[GNC-dev] avoid the brain dead import

David Cousens davidcousens at bigpond.com
Wed Aug 29 18:52:49 EDT 2018


I have seen the importer trying to match data outside the range of
dates in the current import. From memory, it only happened when I
first changed over to version 3.0. The matcher appeared to have lost
all memory of which accounts to assign in the changeover from 2.6.
However, I found that after importing 1-2 months of data it was
functioning normally again. I have been using the OFX importer for 3-4
years without any significant problems.

Your point about large data files sounds valid. I haven't looked at the
code for the match picker, so I don't know how it works or whether it
works on the historical data to extract the information it needs to
choose an account to assign or data to match.

As it is a Bayesian mechanism at some point it has to examine the
existing data and construct some sort of probability table, so my guess
would be that this could be a step which is taking so long. Being able
to set a preference for a date range or period to use in constructing
the initial probability tables is probably a good idea if this is the

My experience on the changeover from 2.6 to 3.0, when it appeared to
have lost any memory of previous import assignments, suggested that the
importer was constructing those tables from the data it imports and not
from the historical data, but I could be wrong. I would expect it to
be using a Kalman filtering approach on the input data but can't be
sure until I get a good look at the code. Initially it did attempt to
match transactions that were otherwise similar to transactions in the
previous month or two. I only have data going back about 8 years and
have been retired for a large percentage of that, so my files aren't
huge. I may not be hitting your problem if it does look further back.

I think the decision about whether to import a small number of
transactions by hand is really one for the user and not the importer to
make. I would import small batches, maybe 20-30, to test the importer
function and ensure it was working as expected before attempting to
import 10k.

On Wed, 2018-08-29 at 22:00 +0100, Wm via gnucash-devel wrote:
> On 25/08/2018 07:22, David Cousens wrote:
> i thank David for his posting which i have read, I don't address all
> he said
> > Keep trying. The brain dead importer does get less brain dead with
> > repeated use.
> i'm not sure it does get better as implemented because 2 of the bits of
> brain dead-ity are
> 1. the universe against which the importer is comparing imported tx is
> going to be growing so as a strategy it is doomed to sluggishness and
> eventually not being used unless there is some limit to the universe
> (week / month / quarter / year / decade)
> 2. unless there is something better users are going to try and use it
> and become more frustrated and stop using it.
> ====
> fairly easy to think about ways of fixing 1. like "do you want the
> importer to really, really, really compare the imported tx against your
> stuff from the 1980's ?  y/N"  at the moment this is defaulting to Y
> without asking and I don't think that makes sense.
> I mean, think of inflation?  Why would one of anything in 2018 be
> sensibly matched against the same thing 30 years ago?
> There isn't even the opportunity to time limit the universe and some
> folk have stuff going back much longer than me and have many more tx
> than me.
> fixing 2. just involves some thought about the user, almost no
> programming.  Redundant questions for the user would be, "you are
> importing 3 tx, you have 10K tx in your file, this could take fucking
> hours, do you want to continue or just type them in by hand?  if you
> want my advice by hand is quicker"
> See?  the importer has no idea of scale, 3 tx incoming ?  I'll do it by
> hand.
