Import transactions change proposal

Greg Stark gsstark at mit.edu
Sun Feb 9 12:09:38 CST 2003


Derek Atkins <warlord at MIT.EDU> writes:

> Chris and I were talking on #gnucash about potentially expanding the
> match mapper to use some sort of Bayesian filtering to determine the
> destination account mapping.  However I'm not sure how such a system
> would work -- or where the necessary databases would get stored (or
> even what the databases would look like).

That would be what I proposed a few weeks ago. I've been thinking about it
further, and I'm not sure the database would have to be stored anywhere. When
the import begins you scan the last 1,000 or so transactions on the account
you're importing to, load them into an in-memory database, and use that.
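
As a rough illustration of the "scan the last 1,000 or so transactions" idea,
here is a minimal Python sketch. The Txn record and its field names are
hypothetical and are not GnuCash's actual data model; the "in-memory database"
is assumed to be nothing more than a list rebuilt at the start of each import.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Txn:
    # Hypothetical record; field names are illustrative only.
    posted: date
    description: str
    amount: float
    dest_account: str


def load_recent(txns: Iterable[Txn], limit: int = 1000) -> List[Txn]:
    """Return the `limit` most recent transactions, newest first.

    Nothing is persisted: the list is rebuilt each time an import begins.
    """
    return sorted(txns, key=lambda t: t.posted, reverse=True)[:limit]
```

Since the list is derived from the account itself, there would be no separate
file to store or keep in sync.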

> Adding in other information to the bayesian mix would certainly be
> possible, once we come up with an architecture.  But you really don't
> have a lot of information to work with when trying to choose a
> destination account.

My thinking was to use the Levenshtein distance (same idea as agrep) for the
text fields, the percentage difference between the amounts, the day of the
month, the day of the week, etc.
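
To make those attributes concrete, here is a small Python sketch of each one.
The function names are made up for illustration, and the normalisation choices
(lower-casing the text, dividing by the larger amount) are assumptions rather
than anything settled in this thread.

```python
from datetime import date
from typing import Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical (case-insensitive)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def amount_pct_diff(x: float, y: float) -> float:
    """Difference between two amounts as a percentage of the larger one."""
    base = max(abs(x), abs(y))
    return 0.0 if base == 0 else abs(x - y) / base * 100.0


def day_features(d: date) -> Tuple[int, int]:
    """Return (day of month, day of week), with Monday == 0."""
    return d.day, d.weekday()
```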

The algorithm would be a bit different from e-mail spam matching though.
Instead of pulling out hundreds of attributes from an e-mail message and using
an index to find the weights quickly, gnucash would have only a half dozen or
so attributes but would have to scan the database completely to find
approximate matches.
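
A sketch of what that full scan might look like, in Python. Several things here
are assumptions: the record layout is hypothetical, the weights and threshold
are arbitrary untuned values, the standard library's difflib.SequenceMatcher
stands in for a Levenshtein-based text measure, and the final step is a simple
similarity-weighted vote rather than a proper Bayesian estimate.

```python
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical layouts: (posted, description, amount[, destination account])
Incoming = Tuple[date, str, float]
Record = Tuple[date, str, float, str]


def similarity(new: Incoming, old: Record) -> float:
    """Combine a handful of attribute comparisons into one score in [0, 1]."""
    n_date, n_desc, n_amt = new
    o_date, o_desc, o_amt, _ = old
    text = SequenceMatcher(None, n_desc.lower(), o_desc.lower()).ratio()
    base = max(abs(n_amt), abs(o_amt))
    amount = 1.0 if base == 0 else 1.0 - min(abs(n_amt - o_amt) / base, 1.0)
    same_dom = 1.0 if n_date.day == o_date.day else 0.0
    same_dow = 1.0 if n_date.weekday() == o_date.weekday() else 0.0
    # Illustrative weights only; they sum to 1.0 but are not tuned.
    return 0.6 * text + 0.2 * amount + 0.1 * same_dom + 0.1 * same_dow


def guess_account(new: Incoming, history: List[Record],
                  threshold: float = 0.5) -> Optional[str]:
    """Scan every stored transaction (no index) and let near matches vote."""
    votes: Dict[str, float] = defaultdict(float)
    for old in history:
        score = similarity(new, old)
        if score >= threshold:
            votes[old[3]] += score
    return max(votes, key=votes.get) if votes else None
```

With only about a thousand records and a half dozen cheap comparisons per
record, a linear scan like this should be affordable for each imported
transaction.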

-- 
greg


