[GNC] Bayesian Matching is not working very well

john jralls at ceridwen.us
Wed Feb 2 11:59:49 EST 2022



> On Feb 1, 2022, at 10:44 PM, David Carlson <david.carlson.417 at gmail.com> wrote:
> 
> I am currently running GnuCash 3.8 in Ubuntu 20.04.  This release has the
> 'New' generic importer with the revised Bayesian matching.  I have found
> that sometimes it is good at assigning accounts to incoming transactions
> and other times it is awful.  One example is transactions where Starbucks
> recharges my wife's online app with $25.00 two or three times a month.
> GnuCash has seen this transaction 130 times according to the Import Map
> Editor, yet it cannot assign this transaction to the correct expense
> account.  There are several other common transactions at certain businesses
> that do not get assigned even though they have appeared many times.  It
> used to work better in the old version that was in the 2.6.x releases. Is
> this a bug?

Maybe; it depends on why the matcher gets it wrong. The matcher uses spaces as separators to break the description into tokens. Every time you match a transfer account to a transaction, each token in the transaction is added to that account, or its count on that account is incremented if it is already there. The Import Map Editor shows those token-account pairings with their scores.
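The bookkeeping described above can be sketched like this. It is a minimal illustration, not GnuCash's code: `new_token_map`, `tokenize` and `learn` are hypothetical names, and the nested dictionary only stands in for the per-account data the Import Map Editor displays.

```python
from collections import defaultdict


def new_token_map():
    # Hypothetical stand-in for the per-account import map:
    # token -> {account name -> number of times the token was matched to it}.
    return defaultdict(lambda: defaultdict(int))


def tokenize(description):
    # Break the description into tokens at whitespace; str.split() with no
    # argument also drops the empty strings that repeated spaces would produce.
    return description.split()


def learn(token_map, description, account):
    # Called each time the user accepts `account` as the transfer account:
    # every token is added to that account, or its count is incremented.
    for token in tokenize(description):
        token_map[token][account] += 1
```

After two accepted matches of "STARBUCKS CARD RELOAD" to a dining account, the token "STARBUCKS" has a count of 2 on that account; any other account it was ever matched to keeps its own separate count.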

The match algorithm is a variation on a "naive Bayes filter" commonly used in spam detectors like SpamAssassin; see https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering for an explanation.
The modification is that a spam filter makes a single yes-or-no decision, while the match algorithm has to pick the most likely account from two or more candidates.
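To make the "most likely account" idea concrete, here is a hedged sketch that scores every candidate account with the product formula from the Wikipedia article above and picks the highest score. It is an assumption-laden illustration, not GnuCash's actual matcher: the function names, the per-token probability (count on the account divided by the token's total count) and the `EPSILON` clamp are all choices made for this example.

```python
from math import prod

# Keep individual probabilities away from 0 and 1 so a single token cannot
# force the combined result to exactly 0 or 1 (an assumption of this sketch).
EPSILON = 0.01


def token_probability(token_map, token, account):
    # Share of this token's matches that went to `account`, or None if the
    # token has never been seen and so carries no evidence.
    counts = token_map.get(token, {})
    total = sum(counts.values())
    if total == 0:
        return None
    p = counts.get(account, 0) / total
    return min(max(p, EPSILON), 1 - EPSILON)


def combine(probs):
    # Naive Bayes combination of independent per-token probabilities:
    # p1...pN / (p1...pN + (1 - p1)...(1 - pN))
    yes = prod(probs)
    no = prod(1 - p for p in probs)
    return yes / (yes + no)


def best_account(token_map, description):
    # Score every account that any token has been matched to, then return the
    # highest-scoring one (the "most likely" decision) together with all scores.
    tokens = description.split()
    candidates = {acct for t in tokens for acct in token_map.get(t, {})}
    scores = {}
    for account in candidates:
        probs = [token_probability(token_map, t, account) for t in tokens]
        probs = [p for p in probs if p is not None]
        scores[account] = combine(probs)
    best = max(scores, key=scores.get) if scores else None
    return best, scores
```

Where a spam filter would compute one such score and compare it to a cutoff, this version computes one score per candidate account and takes the maximum. A real matcher might also decline to choose when even the best score is low; that refinement is left out here.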

Now that you know what to look for, you can use the counts in the Import Map Editor to analyze the tokens in your Starbucks recharge transactions and work out why the matching algorithm might be getting them wrong.
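One way to organize that analysis is to list, for each token in the description, every account it has been matched to and the share of matches each account received. The helper below is hypothetical and the counts in its usage example are invented; you would type in the real numbers from the Import Map Editor. A token whose matches mostly went to some other account (the made-up "CARD" row here) is the kind of entry that can drag a combined score away from the account you expect.

```python
def token_breakdown(token_map, description):
    # One row per (token, account) pair: the raw count and the share of that
    # token's matches the account received. Tokens with no entries get a row
    # with account None, since they contribute nothing to the match.
    rows = []
    for token in description.split():
        counts = token_map.get(token, {})
        total = sum(counts.values())
        if total == 0:
            rows.append((token, None, 0, 0.0))
            continue
        # Most frequent account first, so competing accounts stand out.
        for account, n in sorted(counts.items(), key=lambda kv: -kv[1]):
            rows.append((token, account, n, n / total))
    return rows


if __name__ == "__main__":
    # Made-up counts for illustration; copy the real ones from the editor.
    example = {
        "STARBUCKS": {"Expenses:Dining": 130, "Expenses:Groceries": 10},
        "CARD": {"Expenses:Dining": 5, "Liabilities:Visa": 200},
    }
    for token, account, n, share in token_breakdown(example, "STARBUCKS CARD RELOAD"):
        print(f"{token:<10} {account or '(no data)':<20} {n:>4} {share:6.1%}")
```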

Regards,
John Ralls


