Matching problems when importing QFX transactions

Lincoln A Baxter lab at lincolnbaxter.com
Thu Oct 1 00:49:06 EDT 2015


Hi Larry,

On Tue, 2015-09-29 at 11:14 -0400, Larry Bradley wrote:
> Running gnucash 2.6.3 on Ubuntu.
> I am importing transactions from my bank in QFX format - thus using the
> generic importer with Bayesian matching. Some of the sources that once
> imported to the proper category no longer do.
> 
> For example, we shop at several grocery stores. All of them except one
> get assigned to the GROCERIES account. The 'bad' one gets assigned to
> Imbalance. It used to work fine. Various other sources also now
> misbehave.
> 
> The generic import parameters are set as follows:
> Match display threshold=1
> Auto-add threshold=3
> auto-clear threshold=6
> 
> I have the ability to change the database tables if this is what is
> necessary to fix things up. I can also write external programs to
> manipulate the QFX file before GC sees it.
> 
> I have searched the archives but not found anything useful.
> 
> Just looking thru the XML and found this entry:
> 
>        <slot>
>           <slot:key>Metro</slot:key>
>           <slot:value type="frame">
>             <slot>
>               <slot:key>Expenses:Food:Dining Out</slot:key>
>               <slot:value type="integer">5</slot:value>
>             </slot>
>             <slot>
>               <slot:key>Expenses:Food:Groceries</slot:key>
>               <slot:value type="integer">96</slot:value>
>             </slot>
>             <slot>
>               <slot:key>Imbalance-CAD</slot:key>
>               <slot:value type="integer">3</slot:value>
>             </slot>
>           </slot:value>
>         </slot>
> 
> 
> Metro is one of the failing matches. The proper match is
> "Food:Groceries". I get "Imbalance:CAD" when I import. Can I delete the
> "Imbalance" entry?
> 
> Any help would be appreciated.
> 

There have been various posts to this list over the years about the
Bayesian transaction matcher, including a number of posts in the last year.

The matcher will degrade over time, especially if:

1) You have renamed accounts. The keys end up referencing accounts that
no longer exist, because the implementation does not reference accounts
by the account UUID (guid). (I seem to do this a lot.)

2) You have been inconsistent in where you send transactions when
you import them. Then it can't make up its mind, and you have to pick,
or it will make the wrong decision for this transaction. (Everyone
does this.)

3) You allow transactions to be imported without balancing accounts
because you want to get the transactions imported and fix the imbalance
transactions later. These will be balanced by default to the Imbalance
account, and the Bayes slot data will remember it. Arguably it should
not remember these transactions at all, but it does. (I have done this
intentionally when I'm not sure where transactions should go, but I
want to finish the import.)

Previous users have posted perl scripts to manipulate GC XML files by
treating the files as text, and frankly I have not liked these scripts
because XML data is structured data, and these scripts assume specific
formatting, which the XML standard explicitly does not require. GC
could change the formatting of the XML it writes, and this would break
any script that does not read the data AS XML data. 
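
To illustrate the point about formatting, here is a minimal sketch in Python (not the attached Perl script). It parses the same Bayes slot data from two differently formatted serializations and gets identical results, which a line-oriented text script could not guarantee. The wrapping <root> element and the namespace URI bound to the "slot:" prefix are assumptions made so the fragment parses on its own; the function name token_counts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "slot:" prefix; check the root element
# of your own GnuCash file for the actual declaration.
NS = {"slot": "http://www.gnucash.org/XML/slot"}

# Pretty-printed fragment, wrapped in a hypothetical <root> so it parses alone.
PRETTY = """<root xmlns:slot="http://www.gnucash.org/XML/slot">
  <slot>
    <slot:key>Metro</slot:key>
    <slot:value type="frame">
      <slot>
        <slot:key>Expenses:Food:Groceries</slot:key>
        <slot:value type="integer">96</slot:value>
      </slot>
      <slot>
        <slot:key>Imbalance-CAD</slot:key>
        <slot:value type="integer">3</slot:value>
      </slot>
    </slot:value>
  </slot>
</root>"""

# The same data on a single line, with single-quoted attributes.
COMPACT = "".join(line.strip() for line in PRETTY.splitlines()).replace('"', "'")


def token_counts(xml_text):
    """Return {token: {account name: count}} for the token slots under the root."""
    root = ET.fromstring(xml_text)
    result = {}
    for token_slot in root.findall("slot"):
        token = token_slot.find("slot:key", NS).text
        frame = token_slot.find("slot:value", NS)
        result[token] = {
            s.find("slot:key", NS).text: int(s.find("slot:value", NS).text)
            for s in frame.findall("slot")
        }
    return result


# Both layouts yield the same structured data.
print(token_counts(PRETTY) == token_counts(COMPACT))
```

A regular expression written against the indented layout would miss every slot in the one-line version, while the parser does not care.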

So, several months ago, the itch got so bad that I spent a weekend or two
writing several scripts to process GC XML data _as_ XML, so that I
could clean up my 10+ year old GC file and improve transaction
matching, which had gotten pretty bad. At first I wanted to just remove
all Bayes data, and I implemented that. But as I got to know the data
better, I realized I could be much smarter about it: remove only the
Bayes slots that were confusing the matching, rename references to
orphaned accounts, and so on.
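
The selective pruning idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the logic of prune_bayes_data.pl: the function name prune_bayes, the <frame> wrapper, the "OldShop" token, the "Expenses:Renamed Account" key, and the namespace URI are all hypothetical or assumed. It removes account slots whose key matches a regular expression, removes slots naming accounts that are not in a known set, and then drops token slots left empty.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "slot:" prefix in GnuCash XML files.
NS = {"slot": "http://www.gnucash.org/XML/slot"}


def prune_bayes(bayes_frame, known_accounts, rem_regex):
    """Prune account slots in place and return counts of what was removed.

    bayes_frame    -- element whose <slot> children are token slots
    known_accounts -- set of full account names that still exist
    rem_regex      -- account slots whose key matches this are removed
    """
    pattern = re.compile(rem_regex)
    removed = {"regex": 0, "orphan": 0, "empty_parent": 0}
    for token_slot in list(bayes_frame.findall("slot")):
        counts = token_slot.find("slot:value", NS)
        for acct_slot in list(counts.findall("slot")):
            name = acct_slot.find("slot:key", NS).text
            if pattern.search(name):
                counts.remove(acct_slot)
                removed["regex"] += 1
            elif name not in known_accounts:
                counts.remove(acct_slot)
                removed["orphan"] += 1
        # A token slot with no account slots left carries no information.
        if not counts.findall("slot"):
            bayes_frame.remove(token_slot)
            removed["empty_parent"] += 1
    return removed


# Hypothetical sample: the Metro slot from the question plus one orphaned token.
SAMPLE = """<frame xmlns:slot="http://www.gnucash.org/XML/slot">
  <slot>
    <slot:key>Metro</slot:key>
    <slot:value type="frame">
      <slot><slot:key>Expenses:Food:Dining Out</slot:key><slot:value type="integer">5</slot:value></slot>
      <slot><slot:key>Expenses:Food:Groceries</slot:key><slot:value type="integer">96</slot:value></slot>
      <slot><slot:key>Imbalance-CAD</slot:key><slot:value type="integer">3</slot:value></slot>
    </slot:value>
  </slot>
  <slot>
    <slot:key>OldShop</slot:key>
    <slot:value type="frame">
      <slot><slot:key>Expenses:Renamed Account</slot:key><slot:value type="integer">4</slot:value></slot>
    </slot:value>
  </slot>
</frame>"""

frame = ET.fromstring(SAMPLE)
known = {"Expenses:Food:Dining Out", "Expenses:Food:Groceries"}
stats = prune_bayes(frame, known, "Imbalance")
print(stats)
```

On this sample, one slot is removed by the regex, one as an orphan, and the now-empty "OldShop" token slot is dropped, leaving "Metro" with its two real accounts. Always work on a copy of the GC file, never the original.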

I have attached the script I wrote for this (prune_bayes_data.pl) to
this email. It has a number of options that can be used for analyzing
large GC files and manipulating the Bayes data. The script does NOT
modify the input data, but the input must be uncompressed XML.
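
If your GC file was saved with compression turned on, it is gzip data rather than plain XML. A minimal Python sketch for producing an uncompressed copy follows; the function name ensure_uncompressed is hypothetical, and it assumes the compressed form is ordinary gzip, detected by its two-byte magic number.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def ensure_uncompressed(src_path, dst_path):
    """Write an uncompressed copy of src_path to dst_path.

    Returns True if src_path was gzip-compressed, False if it was
    already plain. The source file is never modified.
    """
    with open(src_path, "rb") as f:
        compressed = f.read(2) == GZIP_MAGIC
    opener = gzip.open if compressed else open
    with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return compressed
```

Called as ensure_uncompressed("orig.GC.file", "orig.GC.file.xml"), it leaves the original alone and writes a plain XML copy that can be handed to the script.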

It does require that the CPAN module XML::LibXML be installed in your
Perl environment. Minimal instructions are provided in the comments at
the head of the script. When I find time I will create a GitHub
repository to make these scripts easily supportable.

In your case (once you have a Perl environment that meets the
requirements), you could run this script as in the following example, and
it would remove all your Imbalance Bayes slot keys, as well as
orphaned slots referencing accounts that no longer exist:

     perl prune_bayes_data.pl --remRegex=Imbalance orig.GC.file.xml modified.GC.file.xml
     # The "perl" prefix is only necessary on Windows, which does not
     # understand the #! line at the top of the script. On Unix/Linux-based
     # OSes, just make the script executable with chmod +x.

Output will look like this:

    Will remove slot keys matched by regular expression: Imbalance
    There are 212 accounts in this file
    8 accounts contain import-map-bayes slot data
    Processing import-map-bayes slots in account: 0 Checking
    Processing import-map-bayes slots in account: 2 Savings
    Processing import-map-bayes slots in account: 1 Shared Checking
    Processing import-map-bayes slots in account: Dad MMS+
    Processing import-map-bayes slots in account: 3 Chase Freedom
    Processing import-map-bayes slots in account: LL Bean Card
    Processing import-map-bayes slots in account: 1 Red
    Processing import-map-bayes slots in account: 2 Black
    Removed 23 slots matching regexes: Imbalance
    Removed 323 slots which reference accounts that don't exist
    The above totals 346 slots explicitly removed
    Removed a total of 490 slots (including empty parent slots created by explicit removals)
    modified.GC.file.xml contains the rewritten GnuCash file.

HTH.

Lincoln

-------------- next part --------------
A non-text attachment was scrubbed...
Name: prune_bayes_data.pl
Type: application/x-perl
Size: 27453 bytes
Desc: not available
URL: <http://lists.gnucash.org/pipermail/gnucash-user/attachments/20151001/8e19cc00/attachment-0001.pl>

