Understanding the Bayesian import matching algorithm

Christian Gruber christian.gruber at posteo.de
Thu Jul 2 15:10:53 EDT 2020


While further studying the Bayesian import matching algorithm, I have now 
reached the point where I want to understand how Bayes' formula is 
applied to the problem of matching transactions to accounts using 
tokens. But I need more information, since it is not clear to 
me what is actually calculated there.

The implementation can be found in the following functions in Account.cpp:

  * get_first_pass_probabilities()
  * build_probabilities()
  * highest_probability()

Actually, the last one can be left out of consideration, as it only 
selects the account with the highest matching probability.

Judging from the code and the sparse comments in the implementation, it 
seems to be a variant of the naive Bayes classifier, 
with the tokens used as (independent) "features" and the accounts used 
as "classes". But comparing this algorithm to the code leaves several 
questions open.
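
For reference, here is a minimal sketch of the textbook naive Bayes classifier that the comparison refers to, with tokens as features and accounts as classes. It is not taken from Account.cpp and may well differ from what GnuCash actually computes. The names TokenCounts, naive_bayes_posterior() and most_probable() are hypothetical, and the uniform prior over accounts and the add-one smoothing are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Hypothetical training data: token -> (account -> times seen together).
using TokenCounts = std::map<std::string, std::map<std::string, int>>;

// Textbook naive Bayes: P(account | tokens) is proportional to
// P(account) * product over tokens of P(token | account).
// Assumptions: uniform prior P(account), add-one (Laplace) smoothing,
// tokens never seen in training are ignored.
std::map<std::string, double>
naive_bayes_posterior(const TokenCounts& counts,
                      const std::vector<std::string>& accounts,
                      const std::vector<std::string>& tokens)
{
    assert(!accounts.empty());

    // Total number of token occurrences recorded for each account.
    std::map<std::string, double> total;
    for (const auto& tok : counts)
        for (const auto& acc : tok.second)
            total[acc.first] += acc.second;

    const double vocab = static_cast<double>(counts.size());

    // Work with log probabilities to avoid underflow on long token lists.
    std::map<std::string, double> log_score;
    double max_log = -INFINITY;
    for (const auto& account : accounts) {
        double s = 0.0;  // log of a uniform prior is a constant, so drop it
        for (const auto& token : tokens) {
            auto t = counts.find(token);
            if (t == counts.end())
                continue;  // unknown token carries no information
            double n = 0.0;
            auto a = t->second.find(account);
            if (a != t->second.end())
                n = a->second;
            s += std::log((n + 1.0) / (total[account] + vocab));
        }
        log_score[account] = s;
        if (s > max_log)
            max_log = s;
    }

    // Normalize so that the posteriors sum to 1.
    std::map<std::string, double> posterior;
    double sum = 0.0;
    for (const auto& kv : log_score) {
        posterior[kv.first] = std::exp(kv.second - max_log);
        sum += posterior[kv.first];
    }
    for (auto& kv : posterior)
        kv.second /= sum;
    return posterior;
}

// Selection step: return the account with the highest posterior.
std::string most_probable(const std::map<std::string, double>& posterior)
{
    std::string best;
    double best_p = -1.0;
    for (const auto& kv : posterior)
        if (kv.second > best_p) {
            best_p = kv.second;
            best = kv.first;
        }
    return best;
}
```

In this sketch the per-token factors are simply multiplied, which is exactly the independence assumption of naive Bayes. Any place where the GnuCash code combines its per-token probabilities differently would be a deviation from this textbook form.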

Does anybody know of a more precise description of the algorithm on 
which the implementation in GnuCash is based?


