Import transactions change proposal
Derek Atkins
warlord at MIT.EDU
Mon Feb 10 19:35:41 CST 2003
Christopher Browne <cbbrowne at cbbrowne.com> writes:
> > > 1. Ifilter performance gets increasingly /greatly/ ugly as the number
> > > of categories grows. Doing this totally automagically means that each
> > > transaction is a "category," and if there are thousands of transactions,
> > > that's not terribly nice.
> >
> > No, each *account* is a category. This is *NOT* duplicate matching.
>
> Ah, yes, you're right. "Thousands of categories" might be ugly, but the
> objection falls away pretty neatly when there is a natural
> already-present, trivial-to-fix-if-it-gets-it-wrong categorization.
Well, cmorgan and I had this conversation on #gnucash last night. We
basically decided that we need an architecture with the following set
of maps for each import account. I'm showing it as a "tree" for
simplicity, since it's a multi-layer hierarchy:
  <token1>/
      <acct1> == <token_count>
      <acct2> == <token_count>
      ...
  <token2>/
      <acct1> == <token_count>
      <acct2> == <token_count>
      ...
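A minimal in-memory sketch of that per-import-account map, assuming a
plain nested Python dict (token -> account -> token_count). The function
names, the "GROCERY" token and the account names are hypothetical
illustrations, not GnuCash code:

```python
from collections import defaultdict


def make_token_map():
    """Empty map: token -> (account -> token_count).

    Inner per-token maps and counts are created on first access.
    """
    return defaultdict(lambda: defaultdict(int))


def render_tree(token_map):
    """Render the map in the "<token>/  <acct> == <count>" layout."""
    lines = []
    for token in sorted(token_map):
        lines.append(f"{token}/")
        for acct in sorted(token_map[token]):
            lines.append(f"    {acct} == {token_map[token][acct]}")
    return "\n".join(lines)


# Usage: counts recorded for one hypothetical token.
m = make_token_map()
m["GROCERY"]["Expenses:Food"] += 3
m["GROCERY"]["Expenses:Household"] += 1
print(render_tree(m))
```

This prints "GROCERY/" followed by the two indented "<acct> == <count>"
lines, mirroring the tree above.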
Based on this layout, the "find account" algorithm would look
something like:

  1-  for each token
      1a- look up the token map
      1b- build the partial percentages for the potential accounts of
          that token
  2-  combine all the partial percentages for all the potential accounts
  3-  choose the account with the highest percentage (or none if some
      threshold is not met)
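A sketch of the "find account" steps in Python, assuming the nested-dict
map form (token -> {account: count}). The post doesn't say how step 2
combines the partial percentages; this sketch assumes the
product-and-normalize rule used by many naive-Bayes spam filters, and
simple averaging would fit the description just as well. The threshold
value is a hypothetical placeholder:

```python
import math


def find_account(token_map, tokens, threshold=0.9):
    """Guess an account for a transaction's tokens, or return None.

    token_map: token -> {account: token_count}
    threshold: hypothetical cut-off; the post leaves its value open.
    """
    # 1) For each known token, turn its counts into per-account fractions.
    partials = {}  # account -> fractions, one per token it appeared under
    for token in tokens:
        counts = token_map.get(token)
        if not counts:
            continue  # token never seen before: no evidence either way
        total = sum(counts.values())
        for acct, n in counts.items():
            partials.setdefault(acct, []).append(n / total)

    # 2) Combine each account's fractions (naive-Bayes style, assumed).
    best_acct, best_score = None, 0.0
    for acct, probs in partials.items():
        yes = math.prod(probs)
        no = math.prod(1.0 - p for p in probs)
        score = yes / (yes + no)  # yes > 0 because every stored count >= 1
        if score > best_score:
            best_acct, best_score = acct, score

    # 3) Choose the best account, or none if it misses the threshold.
    return best_acct if best_score >= threshold else None


token_map = {
    "GROCERY": {"Expenses:Food": 3, "Expenses:Household": 1},
    "STORE": {"Expenses:Food": 1, "Expenses:Household": 1},
}
print(find_account(token_map, ["GROCERY", "STORE"], threshold=0.5))
```

With these counts Expenses:Food scores 0.75 and Expenses:Household 0.25,
so the call prints Expenses:Food; at the default threshold of 0.9 it
returns None instead.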
The algorithm to "store account" would look like:

  1-  for each token
      1a- look up the token map
      1b- increment the token count for the account (or add the account
          to the map)
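The "store account" steps are a simple counter update. A minimal Python
sketch, again assuming the nested-dict map form; counting each distinct
token only once per transaction is an assumption the post doesn't
settle, and the names are hypothetical:

```python
def store_account(token_map, tokens, account):
    """Record that a transaction with these tokens went to `account`."""
    # Assumption: each distinct token counts once per transaction;
    # dict.fromkeys() drops repeats while keeping the original order.
    for token in dict.fromkeys(tokens):
        accounts = token_map.setdefault(token, {})        # 1a) look up (or create) the token's map
        accounts[account] = accounts.get(account, 0) + 1  # 1b) increment, adding the account if new


token_map = {}
store_account(token_map, ["GROCERY", "STORE"], "Expenses:Food")
store_account(token_map, ["GROCERY", "GROCERY"], "Expenses:Food")
store_account(token_map, ["GROCERY"], "Expenses:Household")
print(token_map)
```

After these three calls GROCERY maps to Expenses:Food == 2 and
Expenses:Household == 1, and STORE maps to Expenses:Food == 1.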
If you look closely, you'll notice that this hierarchy maps quite
nicely to the KVP tree ;)
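One way to read the KVP remark: each <token>/<acct> pair becomes a
slash-separated key path whose value is the count. The sketch below
uses a plain Python dict in place of a KVP frame and a made-up root key
"import-map"; it does not use GnuCash's actual KVP API:

```python
def to_kvp_paths(token_map, prefix="import-map"):
    """Flatten token -> {account: count} into slash-separated key paths.

    "import-map" is a made-up root key; neither it nor any token may
    contain "/".
    """
    return {
        f"{prefix}/{token}/{acct}": count
        for token, accounts in token_map.items()
        for acct, count in accounts.items()
    }


def from_kvp_paths(paths, prefix="import-map"):
    """Rebuild the nested map from paths produced by to_kvp_paths()."""
    token_map = {}
    for path, count in paths.items():
        root, token, acct = path.split("/", 2)  # account names keep any "/" they contain
        if root == prefix:
            token_map.setdefault(token, {})[acct] = count
    return token_map


token_map = {"GROCERY": {"Expenses:Food": 3, "Expenses:Household": 1}}
paths = to_kvp_paths(token_map)
print(paths)
```

The round trip from_kvp_paths(to_kvp_paths(m)) gives back the original
map, which is the sense in which the two-level hierarchy fits a key-path
tree.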
-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available
More information about the gnucash-devel mailing list