[GNC-dev] Understanding the bayesian import matching algorithm
Christian Gruber
christian.gruber at posteo.de
Thu Jul 2 15:10:53 EDT 2020
Hi,
while studying the Bayesian import matching algorithm further, I have
reached the point where I want to understand how Bayes' formula is
applied to the problem of matching transactions to accounts using
tokens. I need more information, though, because it is not clear to me
what is actually calculated there.
The implementation can be found in the following functions in Account.cpp:
* get_first_pass_probabilities()
* build_probabilities()
* highest_probability()
Strictly speaking, the last one can be left aside, as it only selects
the account with the highest matching probability.
Judging from the code and the sparse comments on the implementation, it
seems to be a variant of the naive Bayes classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model>
with the tokens used as (independent) "features" and the accounts used
as "classes". But comparing that algorithm with the code leaves several
questions open.
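
To make the comparison concrete, here is a minimal sketch of the
textbook naive Bayes classifier from the Wikipedia article, with tokens
as features and accounts as classes. It is not the GnuCash code and is
not taken from Account.cpp. All names (Training, naive_bayes_posteriors,
most_probable_account) are hypothetical, and both the add-one smoothing
and the prior derived from token totals are assumptions made only for
illustration.

```cpp
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical training data: account name -> (token -> times seen).
using TokenCounts = std::map<std::string, int>;
using Training = std::map<std::string, TokenCounts>;

// Textbook naive Bayes, NOT the GnuCash implementation:
//   P(account | tokens) ~ P(account) * prod_i P(token_i | account)
// Assumptions: the prior is the account's share of all token counts,
// and add-one (Laplace) smoothing is used for unseen tokens.
std::map<std::string, double> naive_bayes_posteriors(
    const Training& training, const std::vector<std::string>& tokens)
{
    std::map<std::string, int> account_totals;
    std::set<std::string> vocabulary;
    long grand_total = 0;
    for (const auto& [account, counts] : training) {
        int& total = account_totals[account];
        for (const auto& [token, n] : counts) {
            total += n;
            grand_total += n;
            vocabulary.insert(token);
        }
    }

    std::map<std::string, double> result;
    if (grand_total == 0)
        return result;  // nothing learned yet, no probabilities

    // Work in log space so products of small numbers do not underflow.
    std::map<std::string, double> log_scores;
    double max_log = -INFINITY;
    const double vocab_size = static_cast<double>(vocabulary.size());
    for (const auto& [account, counts] : training) {
        const double total = account_totals[account];
        double log_score = std::log(total / grand_total);  // log prior
        for (const auto& token : tokens) {
            auto it = counts.find(token);
            const double n = (it == counts.end()) ? 0.0 : it->second;
            log_score += std::log((n + 1.0) / (total + vocab_size));
        }
        log_scores[account] = log_score;
        if (log_score > max_log)
            max_log = log_score;
    }

    // Normalize so that the posteriors sum to 1.
    double sum = 0.0;
    for (const auto& [account, log_score] : log_scores) {
        result[account] = std::exp(log_score - max_log);
        sum += result[account];
    }
    for (auto& [account, p] : result)
        p /= sum;
    return result;
}

// Returns the account with the largest posterior, or "" if there is none.
std::string most_probable_account(const std::map<std::string, double>& post)
{
    std::string best;
    double best_p = -1.0;
    for (const auto& [account, p] : post) {
        if (p > best_p) {
            best_p = p;
            best = account;
        }
    }
    return best;
}
```

In this sketch the first function plays the role of computing
per-account probabilities and the second only picks the maximum. Whether
the functions in Account.cpp compute the same quantities is exactly the
open question.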
Does anybody know of a more precise description of the algorithm on
which the implementation in GnuCash is based?
Regards,
Christian