[GNC-dev] Understanding the Bayesian import matching algorithm

Christian Gruber christian.gruber at posteo.de
Thu Jul 2 15:10:53 EDT 2020


Hi,

while further studying the Bayesian import matching algorithm, I have now
reached the point where I want to understand how Bayes' formula is
applied to the problem of matching transactions to accounts using
tokens. However, I need further information, because it is not clear to
me what is actually being calculated there.
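For reference, here is Bayes' theorem in this setting. $A$ stands for a candidate account and $t_1, \dots, t_n$ for the tokens extracted from an imported transaction. This notation is chosen here for illustration and is not taken from the code.

```latex
P(A \mid t_1, \dots, t_n) = \frac{P(A)\, P(t_1, \dots, t_n \mid A)}{P(t_1, \dots, t_n)}
```

The denominator is the same for every candidate account, so ranking the accounts only requires comparing the numerators.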

The implementation can be found in the following functions in Account.cpp:

  * get_first_pass_probabilities()
  * build_probabilities()
  * highest_probability()

The last of these can be left aside, as it only selects the account
with the highest matching probability.

Judging from the code and the sparse comments on the implementation, it
seems to be a variant of the naive Bayes classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model>
with the tokens used as (independent) "features" and the accounts used
as "classes". But comparing the textbook algorithm with the code leaves
several questions open.
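The sketch below may help when comparing the code against the textbook. It implements the multinomial naive Bayes classifier from the linked article, with tokens as features and accounts as classes. It is not the GnuCash implementation. The following are all hypothetical choices made for illustration:

- the function name `naive_bayes_posteriors`
- the dictionary layout of the training counts
- the use of per-account transaction counts as priors
- the Laplace smoothing constant `alpha`

```python
import math
from collections import Counter


def naive_bayes_posteriors(token_counts, txn_counts, tokens, alpha=1.0):
    """Textbook multinomial naive Bayes: return P(account | tokens).

    Hypothetical data layout (not GnuCash's import map):
      token_counts: {account: Counter(token -> occurrences)}
      txn_counts:   {account: number of training transactions}
      tokens:       tokens of the transaction to classify
      alpha:        additive (Laplace) smoothing constant
    """
    # Vocabulary of all tokens seen in training plus the query tokens.
    vocab = set(tokens)
    for counts in token_counts.values():
        vocab.update(counts)
    total_txns = sum(txn_counts.values())

    log_scores = {}
    for account, counts in token_counts.items():
        # Log prior: share of training transactions booked to this account.
        score = math.log(txn_counts[account] / total_txns)
        denom = sum(counts.values()) + alpha * len(vocab)
        for tok in tokens:
            # Naive independence assumption: add each token's smoothed
            # log-likelihood P(token | account) separately.
            score += math.log((counts.get(tok, 0) + alpha) / denom)
        log_scores[account] = score

    # Normalise with log-sum-exp so the posteriors sum to 1.
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


# Illustrative training data (invented for this example).
token_counts = {
    "Expenses:Groceries": Counter({"supermarket": 8, "card": 5}),
    "Expenses:Fuel": Counter({"petrol": 6, "card": 4}),
}
txn_counts = {"Expenses:Groceries": 8, "Expenses:Fuel": 6}

post = naive_bayes_posteriors(token_counts, txn_counts, ["supermarket", "card"])
best = max(post, key=post.get)  # analogous to picking the highest probability
```

Working in log space avoids floating-point underflow when many small per-token probabilities are multiplied. The final log-sum-exp step converts the scores back into probabilities that sum to 1.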

Does anybody know of a more precise description of the algorithm on
which the GnuCash implementation is based?

Regards,
Christian

