Transaction import date range (was: Re: OFX Importer and Transaction Matcher)

Derek Atkins warlord at MIT.EDU
Mon Feb 14 09:10:39 EST 2005


Christian Stimming <stimming at tuhh.de> writes:

> From looking at the code, it was obvious that for each imported 
> transaction, there was a foreach-loop over *all* splits of the given 
> account. If the account has many transactions, e.g. it covers several 
> years, then 1. this takes a long time, and 2. the vast majority of these 
> splits won't even come close to being a matching candidate. So I thought 
> it would be fine to set a date hard-limit, thus reducing the cycles in 
> the foreach-loop dramatically. This hard-limit is enforced by the 
> xaccQueryAddDateMatchTT() in gnc_import_find_split_matches(). (The 
> hard-limit in split_find_match() was a first try, but didn't give a good 
> enough performance increase, thus I introduced the QueryAddDateMatch.) I 
> *thought* the existing MATCH_DATE_NOT_THRESHOLD would be fine for such a 
> hard-limit, but maybe we should simply add another 
> MATCH_DATE_NOT_HARDLIMIT which is something like 4-6 weeks, and then use 
> this for the QueryAddDateMatch. Additionally, we can surely revert the 
> MATCH_DATE_NOT_THRESHOLD section in the split_find_match() to the old 
> behaviour where this is not a hardlimit but gives a -5 penalty. (Right 
> now, this check will never be used because the splits with this time 
> difference have already been excluded by the QueryAddDateMatch.)

As soon as you run gncQueryRun() you're already iterating through
all the transactions in the QofBook.  However, if the speed decreases
with the number of txns being imported, then the issue is more likely
something that is per-import-txn, not per-book-txn.

Also, the current query code is pretty darn fast, even with lots of
existing transactions.  It does a lot of internal caching to speed up
the query.
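
To make the distinction between the two date checks concrete, here is a
minimal sketch of the scheme Christian describes: a hard limit that
drops candidates before scoring, and a softer threshold that only costs
a -5 penalty.  The type, function names and day values are hypothetical
stand-ins, not the actual code in import-backend.c.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical values for illustration only; not taken from import-backend.c. */
#define SKETCH_PENALTY_DAYS    14  /* beyond this, the candidate gets a penalty */
#define SKETCH_HARDLIMIT_DAYS  42  /* beyond this, the candidate is dropped (6 weeks) */
#define SKETCH_SECONDS_PER_DAY 86400L

/* Stand-in for a split; the real Split type carries far more state. */
typedef struct {
    time_t date;
} SketchSplit;

/* Whole days between two timestamps, ignoring direction. */
static long sketch_day_distance(time_t a, time_t b)
{
    double seconds = difftime(a, b);
    if (seconds < 0)
        seconds = -seconds;
    return (long)(seconds / SKETCH_SECONDS_PER_DAY);
}

/* Stage 1: cheap pre-filter.  Nonzero means the candidate is worth scoring. */
int sketch_within_hardlimit(const SketchSplit *candidate, time_t import_date)
{
    assert(candidate != NULL);
    return sketch_day_distance(candidate->date, import_date)
           <= SKETCH_HARDLIMIT_DAYS;
}

/* Stage 2: date contribution to the match score.
 * 0 inside the penalty threshold, -5 outside it. */
int sketch_date_score(const SketchSplit *candidate, time_t import_date)
{
    assert(candidate != NULL);
    if (sketch_day_distance(candidate->date, import_date) > SKETCH_PENALTY_DAYS)
        return -5;
    return 0;
}
```

With separate values, a split 20 days off survives the pre-filter and is
merely penalized.  If both checks share one threshold, the penalty branch
can never fire, which is the situation described in the quoted text.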

> Would that seem ok to everyone? Anyway, my original goal of speeding up 
> the importer dialog didn't work out too well in any case. Obviously 
> there is a lot of time spent somewhere during this import, but I haven't 
> yet figured out how to fix it. I'd need some real profiling tools (maybe 
> valgrind's cachegrind?) to see where the problem is.

I've found cachegrind to be a useful tool.  It's certainly helped me
in the past when I was trying to optimize the query code.

> Yes, you're right -- for the HBCI import, the transactions are at most 
> 2-4 days off and rarely 7 days, but almost never more than that. We 
> don't have checks at all either, so this use case simply doesn't show 
> up here. Oh, maybe internally this hardlimit could be made a 
> function argument: The OFX importer could use a large time interval 
> whereas the HBCI importer a smaller one.

FWIW, QIF has the same issue as OFX.
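
The idea quoted above, passing the hard limit in as a function argument
so each importer picks its own window, could look roughly like this.
This is only a sketch: WindowSplit and window_count_candidates() are
hypothetical names, and the 7 and 42 day values in the note below are
examples rather than agreed settings.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for a split; only the posting date matters here. */
typedef struct {
    time_t date;
} WindowSplit;

/* Count the splits whose date lies within window_days of import_date.
 * The window is a parameter, so each importer can choose its own. */
size_t window_count_candidates(const WindowSplit *splits, size_t n_splits,
                               time_t import_date, int window_days)
{
    size_t count = 0;

    assert(window_days >= 0);
    assert(splits != NULL || n_splits == 0);

    for (size_t i = 0; i < n_splits; i++) {
        double seconds = difftime(splits[i].date, import_date);
        if (seconds < 0)
            seconds = -seconds;
        if (seconds <= (double)window_days * 86400.0)
            count++;
    }
    return count;
}
```

An HBCI-style caller might pass 7 here, while an OFX or QIF caller,
where checks can clear weeks after they were written, might pass 42.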

>> Your only solution for now is to downgrade to GnuCash 1.8.9.
>
> Well, if you've compiled from source code, then you can also edit the 
> file src/import-export/import-backend.c and change the value for 
> MATCH_DATE_NOT_THRESHOLD (line 62) from 14 to 5*7 or whatever you like. 
> This should fix your problem for now, I guess.
>
>> Actually it's quite mature, but you've hit a recently introduced bug.
>
> Yes, sorry for that.
>
> Christian

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available

