Bayesian matching - Imbalance

John Ralls jralls at ceridwen.us
Wed Jul 15 23:39:27 EDT 2015


> On Jul 15, 2015, at 8:09 PM, Lincoln A Baxter <lab at lincolnbaxter.com> wrote:
> 
> On Tue, 2015-07-14 at 09:10 -0700, John Ralls wrote:
>>> On Jul 14, 2015, at 8:32 AM, C <Peace at AleksandrSolzhenitsyn.net> wrote:
>>> 
>>> I'm running GnuCash (built from r21973 on 2013-01-03) on Linux.
>>> 
>>> When importing a QFX file into my credit card account, many of the
>>> imported transactions are "matched" incorrectly. For example, "Sunoco
>>> Car fuel" ends up in the transfer column with the name Expense: Food,
>>> which is incorrect.
>>> 
>>> Bayesian matching is turned on, but I don't know what levels it should
>>> be set to... or whether that will solve any problems.
>>> 
>>> Many of the imported QFX transactions end up in the "Imbalanced USD"
>>> account.
>>> 
>>> Fixing this stuff takes way too much time.
>>> 
>>> Do you have any idea what should be done and how to fix matching so the
>>> purchases are correctly labeled in the "transfer" column?
>>> 
>> 
>> You have to train, or perhaps retrain, the Bayesian matcher. That
>> means reviewing and correcting the transfer accounts every time you
>> do an import, before accepting the matches. Depending on the
>> variability of the descriptions and how long you’ve allowed the bad
>> matches to persist, it may take many imports’ worth of corrections to
>> overcome the bad scores. There’s no “clear the history” button
>> implemented to let you start over from scratch.
> 
> There may be no "clear the history" button, but it would be very
> helpful to many (including me) if we could just wipe it out and start
> over. This is far from the first complaint I have read.
> 
> Why couldn't we just remove the files or replace them with fresh
> (empty/untainted) copies? This and the additional complications you
> name below make the Bayesian matching far less useful than it could
> otherwise be, even if we could just wipe it out...
> 
> This is annoying enough to me that it might be worth my pulling down
> the sources, at least fixing the account name problem, and submitting
> a patch.
> 
> For now, where are the file(s)?

There are no files. The match data are in the GnuCash data file as slots in each account into which you have imported, with the keys “import-map” for the string matcher and “import-map-bayes” for the Bayesian mapper. If you’re careful you could delete those slots from your file with an editor, ideally one that knows about keeping XML tags in balance. Make a backup first!
You’ll have to re-train the matcher from scratch.
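For an XML book, the slot deletion described above can be scripted instead of done by hand. The following is a minimal Python sketch, not a GnuCash tool; the script and function names (strip_import_maps, register_prefixes) are hypothetical. It assumes each slot is stored as <slot><slot:key>KEY</slot:key><slot:value ...>...</slot:value></slot> inside an account's <act:slots>, and since GnuCash usually gzip-compresses XML books by default, it checks for the gzip magic bytes before parsing. It writes a new file so the original stays untouched as the backup.

```python
# Hypothetical helper, not part of GnuCash. Assumes an XML book where each
# slot looks like <slot><slot:key>KEY</slot:key><slot:value ...>...</slot:value></slot>.
import gzip
import io
import sys
import xml.etree.ElementTree as ET

# Slot keys used by the string matcher and the Bayesian matcher.
IMPORT_KEYS = {"import-map", "import-map-bayes"}


def local(tag):
    """Strip the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def strip_import_maps(root, keys=IMPORT_KEYS):
    """Remove each <slot> whose key text is in `keys`; return how many were removed."""
    removed = 0
    for parent in list(root.iter()):  # snapshot, because the loop mutates the tree
        for child in list(parent):
            if local(child.tag) != "slot":
                continue
            key = next((c for c in child if local(c.tag) == "key"), None)
            if key is not None and (key.text or "").strip() in keys:
                parent.remove(child)  # drops the whole frame of scores under it
                removed += 1
    return removed


def register_prefixes(data):
    """Reuse the file's own prefixes (gnc:, act:, slot:, ...) when writing it back."""
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        ET.register_namespace(prefix, uri)


def main(src, dst):
    with open(src, "rb") as fh:
        data = fh.read()
    compressed = data[:2] == b"\x1f\x8b"  # gzip magic bytes
    if compressed:
        data = gzip.decompress(data)
    register_prefixes(data)
    root = ET.fromstring(data)
    removed = strip_import_maps(root)
    payload = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    if compressed:
        payload = gzip.compress(payload)
    with open(dst, "wb") as fh:  # new file; the original stays as the backup
        fh.write(payload)
    print(f"removed {removed} import-map slot(s); wrote {dst}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: strip_import_maps.py IN.gnucash OUT.gnucash")
```

Run it with GnuCash closed, for example `python3 strip_import_maps.py books.gnucash books-clean.gnucash`, then open the new file and check it before retiring the original. The sketch only covers XML books; SQLite and other SQL books keep their slots in a database table instead.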

We can’t just switch from account names to GUIDs; that would introduce a file incompatibility. So:
* maint needs a patch to detect which of the two (names or GUIDs) the file uses, and to use that.
* master needs to just use GUID, but also
** a “scrub” function to convert all the existing scores to GUIDs
** a Feature setting (which maint can use to decide which to use) to prevent the file from being loaded by older versions that don’t know about the GUID switch.
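The maint/master plan above can be illustrated with a small model of the Bayesian data. This Python sketch is hypothetical and is not GnuCash's code: it models the map as token → {account reference → count}, treats a reference as a GUID when it is 32 lowercase hex digits (the form GnuCash GUIDs take in its files), and uses an invented feature name, guid-bayes, in place of whatever the real Feature setting would be called. uses_guids stands in for the maint-side detection, scrub_to_guids for the master-side scrub, and pick_scheme shows the feature flag taking precedence over detection.

```python
# Hypothetical model of the plan above; not GnuCash's implementation.
import re
from typing import Dict, Set

# GnuCash GUIDs appear in its files as 32 lowercase hex digits.
GUID_RE = re.compile(r"[0-9a-f]{32}")
FEATURE_GUID_BAYES = "guid-bayes"  # invented feature name, for illustration only

BayesMap = Dict[str, Dict[str, int]]  # token -> {account reference -> count}


def is_guid(ref: str) -> bool:
    return GUID_RE.fullmatch(ref) is not None


def uses_guids(bayes: BayesMap) -> bool:
    """maint side: detect the scheme from the stored account references."""
    refs = [ref for accounts in bayes.values() for ref in accounts]
    # An empty or mixed map is treated as name-based, i.e. the older format.
    return bool(refs) and all(is_guid(r) for r in refs)


def scrub_to_guids(bayes: BayesMap, name_to_guid: Dict[str, str]) -> BayesMap:
    """master side: rewrite name-keyed scores as GUID-keyed ones."""
    out: BayesMap = {}
    for token, accounts in bayes.items():
        merged: Dict[str, int] = {}
        for ref, count in accounts.items():
            guid = ref if is_guid(ref) else name_to_guid.get(ref)
            if guid is None:
                continue  # renamed or deleted account: the stale score is dropped
            merged[guid] = merged.get(guid, 0) + count
        if merged:
            out[token] = merged
    return out


def pick_scheme(book_features: Set[str], bayes: BayesMap) -> str:
    """When the feature flag is set it wins; otherwise fall back to detection."""
    if FEATURE_GUID_BAYES in book_features:
        return "guids"
    return "guids" if uses_guids(bayes) else "names"
```

Here name_to_guid is assumed to be keyed by full account names such as Expenses:Auto:Fuel. Counts for references that resolve to the same account are summed, and scores for names that no longer resolve are discarded rather than guessed at, which is one possible design choice for the scrub.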

I intend to do that at some point, but it isn’t a high enough priority for me to promise it for 2.8. If you want to have a go at it, by all means look at the code, and then we can talk about it on gnucash-dev or IRC.

Regards,
John Ralls




More information about the gnucash-user mailing list