Search and Replace

Lincoln A Baxter lab at lincolnbaxter.com
Mon Jan 4 20:56:13 EST 2016


On Sun, 2016-01-03 at 20:16 -0500, R. Victor Klassen wrote:
> I would like to elevate the question to something more along the line
> of a feature request.
> 
> The scenario is as follows:
> 
> During the high season, we have 5-6 invoices per week, largely
> containing the same items, but different quantities, and sometimes
> different prices.  
> Occasionally auto-fill doesn’t happen - for what reason, I don’t
> know.   It is usually in such an instance that an item may have the
> wrong account associated with it.
> But this means that accounts can from time to time get wrong.
> 
> And then for the next - oh, 4-6 weeks, this is applied to every
> invoice with that item, thanks to autofill.  I notice when one of the
> income accounts is surprisingly low or high (usually low, as I’m less
> likely to be suspicious if it is high).
> 
> Here is where a search and replace would be wonderful.   Find all
> occurrences of XXX in the description field of all invoices between
> date D1M1YYY1 and date D2M2YYY2 and change the account to AAA if it
> is not already.   Possibly with a confirm requested on each
> occurrence.

OK, so this looks exactly like a problem I just solved last week! (with
a perl script).

In my case, I had used the ability to delete an account and move all of its transactions to another account (higher in the hierarchy) to remove accounts that never had more than 2 or 3 transactions per year.  But several months later, I decided I had gone too far and wanted to back it out, at least part way. It turned out I had included payroll splits in the consolidation; they happen twice a month, and they created too much noise for how I want to account for the other transactions.

So I needed to bulk move all of the transaction splits referencing one account to a different account, selecting the transactions with a regular expression match on the transaction description.

The script I wrote is attached, along with its text documentation. It does not look at transaction date ranges, but I think that would be useful to add. I'm planning to do some other transaction date processing (to divide up my GC file, which has YEARS of transactions in it, into files with older (archived) data and a current file with at least the most recent year), so adding date range searching is something I'd consider for this script. If others think they would use it, I'll see if I can do that sooner rather than later.  I think the capability would be useful: it would target the moves more narrowly, which adds a degree of safety.

Lincoln
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc_move_splits.pl
Type: application/x-perl
Size: 15614 bytes
Desc: not available
URL: <http://lists.gnucash.org/pipermail/gnucash-user/attachments/20160104/3c207e98/attachment.pl>
-------------- next part --------------
NAME
    gc_move_splits.pl

COPYRIGHT
    Copyright (C) 2016 Lincoln A Baxter

    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation, either version 3 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    Please see the GNU General Public License at
    <http://www.gnu.org/licenses/>

ABSTRACT
    Move transaction splits from one account in the GC file to another.
    Useful if you have consolidated accounts, and then discovered you want
    to separate out some of the transactions into different accounts.

SYNOPSIS
    gc_move_splits.pl [options] GC-file.xml [modified-GC-file.xml]

    options:

     --help                         print this synopsis help text
     --man                          print the full man page (this file's POD)
     --verbose                      increased verbosity (mainly for debugging)
     --fromAcctSplit=AccountPath    path of account the split should be moved FROM
     --toAccountSplit=AccountPath   path of account the split should be moved TO
     --dumpAccount=file|-           create a dump of account paths and guids
     --description=regex            only move splits in transactions matching regex;
                                    may be repeated to specify more than one regex
     --pathSeparator=char           GnuCash account path separator (default=:)

    The second file argument is optional. With no destination output file,
    gc_move_splits.pl runs in analytical/trial mode: it reports all actions
    it would take and produces all specified analysis outputs.

    The source input file is never modified.

    If your gnucash data file is compressed you must uncompress it first (on
    unix based OSes) as follows:

      cat gnucash_data_file | gunzip > uncompressed_gnucash_file.xml

    Or you can just uncheck the "compressed file" option in GnuCash and
    save.
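
    For readers who would rather script the decompression step, here is a
    minimal sketch in Python (not part of gc_move_splits.pl, which is Perl).
    It assumes a compressed GnuCash data file is an ordinary gzip stream,
    which is what the gunzip command above implies. The function name
    uncompress_if_needed is made up for this illustration.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def uncompress_if_needed(src_path, dst_path):
    """Copy src_path to dst_path, gunzipping it first if it is compressed.

    Returns True if the input was gzip-compressed, False otherwise.
    The source file is only read, never modified.
    """
    with open(src_path, "rb") as probe:
        compressed = probe.read(2) == GZIP_MAGIC

    # gzip.open and open both yield a binary file object in "rb" mode.
    opener = gzip.open if compressed else open
    with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

    Checking the two magic bytes means the same call works whether or not
    the "compressed file" option was on when the file was saved.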

DISCUSSION
    The script reads an uncompressed "version 2" GnuCash XML datafile and
    writes a new file with the modified splits.

    Splits to be moved are found by

      1. Getting the Guid for the account from which the split is to be removed
      2. Getting the Guid for the account into which the split should be moved

    Then traversing all transactions in the file looking for

      1. transactions with descriptions matching the input regex
      2. finding the split with the guid to be removed
      3. replacing the guid to be removed with the guid of the destination account

    The script reads the GnuCash XML file *as XML*, not as text. It uses the
    XML::LibXML CPAN perl module to read, traverse, modify, and output a new
    XML file. It is not dependent on the formatting of the GnuCash XML,
    unlike most perl scripts this author has seen on the GnuCash users email
    list.

    Because this script does not treat the GnuCash file as text, it is not
    subject to the breakage that would occur if the formatting of the XML
    data were to change.

    Instead, this script reads the GnuCash data into a DOM structure, and
    then manipulates that structure.

    The script does not modify the input GnuCash XML datafile. To save the
    changes, the user must specify an output data filename. The results of
    the script's operations are written to this file, which should then be
    opened and checked in GnuCash before replacing the original file.

    To print a usage synopsis: gc_move_splits.pl --help

    To print the synopsis, plus option descriptions: gc_move_splits.pl
    --help --verbose

    To print the entire man page: gc_move_splits.pl --man

ENVIRONMENT
    Because gc_move_splits.pl reads the input XML file *as XML* using the
    perl CPAN XML::LibXML module, using an XPath expression to find the
    bayes matching data slots in each account, the script requires that your
    perl environment have the CPAN XML::LibXML module installed.

    The command

      perl -c -MXML::LibXML </dev/null

    will report an error if the module is not installed in your environment.
    Of course the script will report this also, because if it is not
    present, the script will not compile.

  Unix/Linux environments
    Most Linux distributions make this available via their standard package
    managers. On Debian-based distributions it can be installed with the
    following command:

       sudo apt-get install libxml-libxml-perl

    On Unix/Linux environments this script should be made executable with
    the chmod command

       chmod +x gc_move_splits.pl

  Windows environments
    XML::LibXML is also available in ActivePerl, and in the Cygwin
    environment.

    This author has not tested this script on Windows, but knows of no
    reason why it will not work once the required environment is installed.

    On Windows, the easiest way to run the script would be by using perl
    from a cmd prompt:

       perl gc_move_splits.pl

  Macintosh environments
    This author is not familiar with the OS X environment, but knows it is
    based on BSD Unix. The script should run from a terminal prompt once
    the requisite XML::LibXML module is installed in the perl environment.
    Patches to these instructions are welcome.

OPTIONS
    All options may be abbreviated as long as the option is distinct from
    all other options.

  --description='regex'
    This regular expression is used for identifying transaction splits to be
    moved or retargeted to another account.

    This switch may be specified multiple times to specify multiple Regexes:

       --descri='^Check' --descr='United Way'

    The first will find "Check" at the beginning of a transaction
    description. The second will find "United Way" anywhere in the
    transaction description.
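
    The difference between the anchored and unanchored patterns can be
    checked with a few lines of Python, whose regex syntax behaves the same
    as Perl's for these simple cases. The sketch assumes a transaction
    qualifies when at least one of the repeated regexes matches; the
    documentation does not state whether any or all must match.
    description_matches is a made-up helper name.

```python
import re

patterns = ["^Check", "United Way"]  # the two example regexes from above


def description_matches(description, patterns):
    """True if at least one pattern is found in the description."""
    return any(re.search(p, description) for p in patterns)


print(description_matches("Check #1234", patterns))             # True
print(description_matches("Donation to United Way", patterns))  # True
print(description_matches("Deposit Check", patterns))           # False
```

    The last description is rejected because "Check" is not at the start
    and "United Way" does not appear at all.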

  --verbose
    Print very verbose output to STDOUT. Used for debugging. Don't bother.
    Besides, this is also not very well implemented in this script at this
    time.

  --help (or -h)
    Print help text.

  --path(Separator)=:
    Specifies the character used as the path separator in your GC datafile.
    The default value is a colon (:)

  --man
    Print the full man page documentation

  --fromAcctSplit=
    Full path of account we want to move the split FROM.

  --toAcctSplit=
    Full path of account we want to move the split TO.

  --description
    Specifies the regex that will be looked for in the transaction
    description. May be repeated.

  --dumpAcct=file|-
    Dump the full paths for all the accounts in the GC file to either the
    file specified or to STDOUT, if "-" is used.
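
    As an illustration of what such a dump involves, the sketch below
    builds full account paths by following each account's parent link. It
    is written in Python rather than the script's Perl, and it assumes the
    file stores accounts as gnc:account elements with act:name, act:id and
    act:parent children. account_paths is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match those declared in a GnuCash v2 XML file.
NS = {
    "gnc": "http://www.gnucash.org/XML/gnc",
    "act": "http://www.gnucash.org/XML/act",
}


def account_paths(root, separator=":"):
    """Return {full account path: guid} for every account under root."""
    names, parents = {}, {}
    for acct in root.iter("{%s}account" % NS["gnc"]):
        guid = acct.findtext("act:id", namespaces=NS)
        names[guid] = acct.findtext("act:name", default="", namespaces=NS)
        parents[guid] = acct.findtext("act:parent", namespaces=NS)  # None at top

    def full_path(guid):
        # Walk up the parent chain, then reverse to get top-down order.
        parts = []
        while guid in names:
            parts.append(names[guid])
            guid = parents[guid]
        return separator.join(reversed(parts))

    return {full_path(guid): guid for guid in names}
```

    The separator argument plays the role of --pathSeparator. If the file
    has a single top-level root account, this sketch would include its name
    as the first path component unless it is filtered out.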

AUTHOR
    Lincoln A. Baxter email: my initials (all three) (at) lincolnbaxter (dot)
    com


