Transaction import date range (was: Re: OFX Importer and Transaction Matcher)

Christian Stimming stimming at tuhh.de
Mon Feb 14 04:40:38 EST 2005


Oops,

sorry, Benoit. I didn't think through these implications of my change. 
They were not intended, and we can certainly restore the previous 
behaviour.

Let me explain what I was trying to do. I got complaints about the poor 
performance of the matcher dialog (it took a very long time to appear 
when importing more than 10 transactions), so I tried to figure out 
where the computation time was being spent. However, I didn't do any 
real profiling of the importer dialog, so my changes there were based 
only on rough guesses about where the cycles were being wasted. I'm 
sorry that I broke the behaviour for other people. I'm sure we will be 
able to improve performance just as well while still keeping the large 
date range from before.

From looking at the code, it was obvious that for each imported 
transaction there was a foreach-loop over *all* splits of the given 
account. If the account has many transactions, e.g. because it covers 
several years, then 1. this takes a long time, and 2. the vast majority 
of these splits won't even come close to being a matching candidate. So 
I thought it would be fine to set a hard date limit, which reduces the 
number of iterations of the foreach-loop dramatically. This hard limit 
is enforced by the call to xaccQueryAddDateMatchTT() in 
gnc_import_find_split_matches(), so the foreach-loop only sees splits 
within that date range. (The hard limit in split_find_match() was a 
first try, but it didn't improve performance enough, so I introduced 
the QueryAddDateMatch.)

I *thought* the existing MATCH_DATE_NOT_THRESHOLD would be fine for such 
a hard limit, but maybe we should simply add a separate 
MATCH_DATE_NOT_HARDLIMIT of something like 4-6 weeks and use that for 
the QueryAddDateMatch. Additionally, we can certainly revert the 
MATCH_DATE_NOT_THRESHOLD section in split_find_match() to the old 
behaviour, where exceeding it is not a hard limit but gives a -5 
penalty. (Right now, that check is never triggered, because splits with 
such a date difference have already been excluded by the 
QueryAddDateMatch.)

Would that be OK with everyone? In any case, my original goal of 
speeding up the importer dialog didn't work out very well. A lot of 
time is clearly being spent somewhere during the import, but I haven't 
yet figured out how to fix it. I'd need some real profiling tools 
(maybe valgrind's cachegrind?) to find where the problem is.

Benoit Grégoire wrote:
> On 2004-10-08, I changed MATCH_DATE_NOT_THRESHOLD from 3 weeks to two 
> weeks, and dropped the punishment from -10 to -5 to avoid your very problem.
> Checks cashed more than two weeks later are actually a fairly common 
> occurrence.
> Unfortunately, on 2004-11-27 Christian changed this check to a hard limit, 
> thus removing ANY transaction more than two weeks apart from consideration, 
> no matter how well it matches. It's easy to understand: from what I heard, 
> checks are mostly extinct in Germany.

Yes, you're right: with the HBCI import, transactions are usually 2-4 
days off, occasionally 7 days, but almost never more than that. We 
don't have checks either, so this use case simply doesn't come up here. 
Oh, maybe this hard limit could internally be made a function argument: 
the OFX importer could use a large time interval, and the HBCI importer 
a smaller one.

> Your only solution for now is to downgrade to GnuCash 1.8.9.

Well, if you've compiled from source, you can also edit the file 
src/import-export/import-backend.c and change the value of 
MATCH_DATE_NOT_THRESHOLD (line 62) from 14 to 5*7 (i.e. five weeks) or 
whatever you like. That should work around your problem for now, I think.

> Actually it's quite mature, but you've hit a recently introduced bug.

Yes, sorry for that.

Christian



More information about the gnucash-user mailing list