Import transactions change proposal

Mon Feb 10 19:35:41 CST 2003

Christopher Browne <cbbrowne at cbbrowne.com> writes:

> > > 1.  Ifilter performance gets increasingly /greatly/ ugly as the number
> > > of categories grows.  Doing this totally automagically means that each
> > > transaction is a "category," and if there are thousands of transactions,
> > > that's not terribly nice.
> > 
> > No, each *account* is a category. This is *NOT* duplicate matching.
> 
> Ah, yes, you're right.  "Thousands of categories" might be ugly, but the
> objection falls away pretty neatly when there is a natural
> already-present, trivial-to-fix-if-it-gets-it-wrong categorization.

Well, cmorgan and I had this conversation on #gnucash last night.  We
basically decided that what we need is an architecture where we have
the following set of maps for each import account.  I'm trying to show
this as a "tree" for simplicity, as it's a multi-layer hierarchy:

<token1>/
  <acct1> == <token_count>
  <acct2> == <token_count>
  ...
<token2>/
  <acct1> == <token_count>
  <acct2> == <token_count>
  ...
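
As a rough illustration, the layout above can be modelled as a nested
dictionary. This is only a sketch in Python, not GnuCash code; the
token strings and account names are made up for the example.

```python
from collections import defaultdict

# token -> account -> number of times that token was seen on a
# transaction assigned to that account.
# All tokens and account names here are hypothetical sample data.
token_map = defaultdict(lambda: defaultdict(int))

token_map["SAFEWAY"]["Expenses:Groceries"] = 12
token_map["SAFEWAY"]["Expenses:Dining"] = 1
token_map["SHELL"]["Expenses:Auto:Gas"] = 7

# Print the map in the same "tree" form used in the message.
for token, accounts in token_map.items():
    print(f"{token}/")
    for account, count in accounts.items():
        print(f"  {account} == {count}")
```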

Based on this layout, the "find account" algorithm would look
something like:

1- for each token
  1a- look up the token map
  1b- build the partial percentages for the potential accounts of that token
2- combine all the partial percentages for all the potential accounts
3- choose the account with the highest percentage (or none if some threshold
   is not met)
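
The steps above could be sketched as follows. The message does not say
how the partial percentages are combined, so this sketch assumes a
plain average over the tokens that have a map; other combination rules
would fit the same outline. The function name, the 0.5 default
threshold, and the sample data are all assumptions for illustration.

```python
def find_account(token_map, tokens, threshold=0.5):
    """Return the best-scoring account, or None if it misses the threshold."""
    summed = {}        # account -> sum of partial percentages
    known_tokens = 0   # tokens that actually have a map
    for token in tokens:
        accounts = token_map.get(token)           # 1a: look up the token map
        if not accounts:
            continue
        known_tokens += 1
        total = sum(accounts.values())
        for account, count in accounts.items():   # 1b: partial percentages
            summed[account] = summed.get(account, 0.0) + count / total
    if not summed:
        return None
    # 2: combine by averaging (an assumption, see the note above)
    scores = {acct: s / known_tokens for acct, s in summed.items()}
    best = max(scores, key=scores.get)            # 3: highest percentage
    return best if scores[best] >= threshold else None


# Hypothetical sample data.
sample_map = {
    "SAFEWAY": {"Expenses:Groceries": 9, "Expenses:Dining": 1},
    "STORE": {"Expenses:Groceries": 1, "Expenses:Misc": 1},
}
print(find_account(sample_map, ["SAFEWAY", "STORE"]))
print(find_account(sample_map, ["NEVER-SEEN"]))
```

With this sample data, the first call averages 0.9 and 0.5 for
Expenses:Groceries and clears the threshold; the second call finds no
token map at all and returns None.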

The algorithm to "store account" would look like:

1- for each token
  1a- look up the token map
  1b- increment the token count for the account (or add the account to the map)
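
A minimal sketch of the "store account" steps, again using a nested
dictionary as a stand-in for the real storage. The function name and
the sample tokens and account are hypothetical.

```python
def store_account(token_map, tokens, account):
    """Record that each token was seen on a transaction filed under account."""
    for token in tokens:
        accounts = token_map.setdefault(token, {})        # 1a: look up the token map
        accounts[account] = accounts.get(account, 0) + 1  # 1b: increment, or add at 1


# Hypothetical usage: two imported transactions confirmed by the user.
learned = {}
store_account(learned, ["SAFEWAY", "STORE"], "Expenses:Groceries")
store_account(learned, ["SAFEWAY"], "Expenses:Groceries")
print(learned)
```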

If you look closely, you'll notice that this hierarchy maps quite
nicely to the KVP tree ;)
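
To make the mapping concrete, here is a sketch that flattens the
two-level hierarchy into slash-separated key paths with an integer
value at each leaf. It does not use the actual KVP API; the "/tokens"
prefix and the path format are assumptions chosen for illustration.

```python
def flatten(token_map, prefix="/tokens"):
    """Turn token -> account -> count into path -> count entries."""
    return {
        f"{prefix}/{token}/{account}": count
        for token, accounts in token_map.items()
        for account, count in accounts.items()
    }


# Hypothetical sample data.
paths = flatten({"SAFEWAY": {"Expenses:Groceries": 2, "Expenses:Dining": 1}})
for path, count in paths.items():
    print(path, count)
```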

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available