Import transactions change proposal

Derek Atkins warlord at MIT.EDU
Sun Feb 9 13:45:18 CST 2003


Greg Stark <gsstark at MIT.EDU> writes:

> Derek Atkins <warlord at MIT.EDU> writes:
> 
> > The problem is that data import is "lossy", you don't necessarily have
> > all the import information in the GNC Transaction.  For example, you
> > lose the QIF Category name, but you DEFINITELY want to be able to map
> > from QIF Category to GNC Account.  
> 
> Well, my first reaction is that importing shouldn't be lossy. Having lossy
> steps closes options such as being able to show the user where a transaction
> came from and what information the bank presented. I could see that
> being useful in a dispute with a bank or debugging problems after the fact.

Arguably it has to be, considering the GNC format is very different
from the QIF format, which is very different from OFX or HBCI...  So
by definition you're going to have SOME loss.  The question is what
info you lose.  For example, QIF has the concept of a Payee, a Memo, a
Category, and a Class...  There is no direct mapping between all of
these...  We try to map as best we can.  The key information is the
source account, date, and amount; then the Payee/Memo try to go into
the GNC Description/Memo, and the Category/Class try to be combined
into a GNC Account.  But that doesn't always work.

> > In order to just load txns and build the map at runtime you'd need to be
> > able to store all this information. You'd also lose badly when you try to go
> > across Accounting Periods.
> 
> That sounds like a case of denormalizing the underlying data representation in
> order to implement a presentation level feature. I would have expected
> accounting periods to simply mark date boundaries or mark individual
> transactions as unmodifiable. I wouldn't have expected to actually move the
> transactions around and make accessing them require special actions.

Perhaps, but there is a good reason for this.  There is a real call
for using separate data stores for different periods, for example one
XML data file per year.  The reason is that XML is LARGE, and you
necessarily have to read and parse the complete XML document.  After
several years your data file can grow to tens or even hundreds of
megabytes -- why force users to load all that extraneous data?

Periods are much more than just a presentation layer.
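
As a toy sketch of the idea (hypothetical file naming, nothing to do
with the actual backend code):

    # Hypothetical: split a book into one data file per year so that
    # opening the current period doesn't mean parsing a decade of XML.
    from collections import defaultdict

    def split_by_year(txns):
        # txns: iterable of (iso_date, txn) pairs, iso_date = "YYYY-MM-DD"
        books = defaultdict(list)
        for date, txn in txns:
            books[date[:4]].append((date, txn))
        return books

    def save_books(books, write_book):
        # write_book is a stand-in for whatever serializes one period
        for year, txns in sorted(books.items()):
            write_book("mybook-%s.xml" % year, txns)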

> > Choosing a destination account is much more tricky -- you've got
> > potentially hundreds of choices to match into.  If you have ideas for
> > a decent matching algorithm I'd love to hear it.  Code would be
> > better, but we should work on designs before coding, IMHO.
> 
> I had a plan for a matching heuristic, but I think the bayesian filter is a
> better idea. Any hard coded heuristic will work well for some people but fail
> completely for others. A bayesian filter should adapt to various systems with
> different data formats much better.

I haven't looked at ifilter, so I don't know what kind of data store
it requires.  I still want to maintain a persistent data store rather
than trying to rebuild it at runtime.  I just have no clue what would
need to be stored in such a database.
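
Thinking out loud about what would go in it: for a naive-Bayes matcher
the minimum persistent state is per-account token counts.  A toy
sketch (this is NOT ifilter's format, just an illustration):

    # Toy persistent store for Bayesian account matching: token counts
    # per destination account, trained from past import decisions.
    import json, math
    from collections import defaultdict

    class MatchStore:
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))
            self.totals = defaultdict(int)

        def train(self, description, account):
            # Record which tokens appeared for txns filed to 'account'
            for tok in description.lower().split():
                self.counts[account][tok] += 1
            self.totals[account] += 1

        def best_account(self, description):
            # Naive Bayes with add-one smoothing over the stored counts
            tokens = description.lower().split()
            grand = sum(self.totals.values())
            best, best_score = None, -math.inf
            for acct, total in self.totals.items():
                score = math.log(total / float(grand))
                for tok in tokens:
                    seen = self.counts[acct].get(tok, 0)
                    score += math.log((seen + 1) / float(total + 2))
                if score > best_score:
                    best, best_score = acct, score
            return best

        def save(self, path):
            # defaultdicts are dict subclasses, so json handles them
            with open(path, "w") as f:
                json.dump({"counts": self.counts,
                           "totals": self.totals}, f)

Whatever format we actually pick, that's the shape of thing that would
have to survive across sessions (and across accounting periods).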

> greg

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available

