Transaction import date range (was: Re: OFX Importer and
Transaction Matcher)
Christian Stimming
stimming at tuhh.de
Mon Feb 14 04:40:38 EST 2005
Oops,
sorry, Benoit -- I didn't think enough about these potential
implications of my change. Those were not intended, and we can surely
revert that behaviour to the previous one.
Let me explain what I tried to do: I got complaints about poor
performance of the matcher dialog (i.e. it took a very long time to
build up when importing more than 10 transactions), so I tried to figure
out where this computation time is spent. However, I didn't actually
do any real profiling of the importer dialog, so any changes in that
context were based only on rough guesses about where the cycles are wasted.
I'm sorry that I broke the behaviour for other people -- I'm sure we
will be able to improve performance just as well, while still retaining
the large date range from before.
From looking at the code, it was obvious that for each imported
transaction, there was a foreach-loop over *all* splits of the given
account. If the account has many transactions, e.g. it covers several
years, then 1. this takes a long time, and 2. the vast majority of these
splits won't even come close to being a matching candidate. So I thought
it would be fine to set a date hard-limit, thus reducing the cycles in
the foreach-loop dramatically. This hard-limit is enforced by the
xaccQueryAddDateMatchTT() in gnc_import_find_split_matches(). (The
hard-limit in split_find_match() was a first try, but didn't give a good
enough performance increase, thus I introduced the QueryAddDateMatch.) I
*thought* the existing MATCH_DATE_NOT_THRESHOLD would be fine for such a
hard-limit, but maybe we should simply add another
MATCH_DATE_NOT_HARDLIMIT which is something like 4-6 weeks, and then use
this for the QueryAddDateMatch. Additionally, we can surely revert the
MATCH_DATE_NOT_THRESHOLD section in the split_find_match() to the old
behaviour where this is not a hardlimit but gives a -5 penalty. (Right
now, this check will never be used because the splits with this time
difference have already been excluded by the QueryAddDateMatch.)
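To make the proposed two-level check concrete, here is a rough sketch in C. It is not the code from import-backend.c: MATCH_DATE_NOT_HARDLIMIT does not exist yet, its value of 6 weeks is just one end of the range suggested above, and the helper name date_heuristic() and the use of plain day numbers instead of real timestamps are assumptions made for illustration only.

```c
#include <stdlib.h>

/* All values in days. MATCH_DATE_NOT_HARDLIMIT is the proposed new
 * constant (hypothetical); 14 is the threshold discussed in this thread. */
enum {
    MATCH_DATE_NOT_THRESHOLD = 14,
    MATCH_DATE_NOT_HARDLIMIT = 6 * 7
};

/* Penalty applied beyond the soft threshold (the "-5" mentioned above). */
#define DATE_PENALTY 5

/* Hypothetical helper: returns 0 if the split is too far away in time to
 * be considered at all, otherwise returns 1 and lowers *score when the
 * date difference exceeds the soft threshold. */
static int date_heuristic(long trans_day, long split_day, int *score)
{
    long diff = labs(trans_day - split_day);

    if (diff > MATCH_DATE_NOT_HARDLIMIT)
        return 0;                 /* excluded, no matter how well it matches */

    if (diff > MATCH_DATE_NOT_THRESHOLD)
        *score -= DATE_PENALTY;   /* still a candidate, but penalized */

    return 1;
}
```

With these values, a check cashed 20 days after it was entered would stay in the candidate list with a -5 penalty, while a split 50 days away would be dropped outright.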
Would that seem ok to everyone? Anyway, my original goal of speeding up
the importer dialog didn't work out too well in any case. Obviously
there is a lot of time spent somewhere during this import, but I haven't
yet figured out how to fix it. I'd need some real profiling tools (maybe
valgrind's cachegrind?) to see where the problem is.
Benoit Grégoire wrote:
> On 2004-10-08, I changed the date MATCH_DATE_NOT_THRESHOLD from three weeks to
> two weeks, and dropped the punishment from -10 to -5 to avoid your very problem.
> Checks cashed more than two weeks later are actually a fairly common
> occurrence.
> Unfortunately, on 2004-11-27 Christian changed this check to a hard limit,
> thus removing ANY transaction more than two weeks apart from consideration, no
> matter how well it matches. It's easy to understand: from what I heard,
> checks are mostly extinct in Germany.
Yes, you're right -- for the HBCI import, the transactions are at most
2-4 days off and rarely 7 days, but almost never more than that. We
don't have checks at all either, so this use case simply doesn't show
up here. Oh, maybe internally this hardlimit could be made a
function argument: The OFX importer could use a large time interval
whereas the HBCI importer could use a smaller one.
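As a rough illustration of the function-argument idea, the date window could be passed in by the caller rather than compiled in. The sketch below is self-contained C and makes several assumptions: the function name filter_by_date_window(), the two window constants and their values, and the representation of dates as plain day numbers are all hypothetical and do not correspond to the existing importer API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-importer windows, in days. */
enum {
    OFX_DATE_WINDOW_DAYS  = 6 * 7,  /* checks may be cashed weeks later */
    HBCI_DATE_WINDOW_DAYS = 7       /* transactions are rarely more than 7 days off */
};

/* Writes into out[] the indices of all splits whose date lies within
 * window_days of trans_day and returns how many were kept.
 * out[] must have room for n entries. */
static size_t filter_by_date_window(const long *split_days, size_t n,
                                    long trans_day, long window_days,
                                    size_t *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (labs(split_days[i] - trans_day) <= window_days)
            out[kept++] = i;
    }
    return kept;
}
```

The OFX importer would then call this with OFX_DATE_WINDOW_DAYS and the HBCI importer with HBCI_DATE_WINDOW_DAYS, so each backend only pays for the candidates it can realistically match.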
> Your only solution for now is to downgrade to GnuCash 1.8.9.
Well, if you've compiled from source code, then you can also edit the
file src/import-export/import-backend.c and change the value for
MATCH_DATE_NOT_THRESHOLD (line 62) from 14 to 5*7 or whatever you like.
This should fix your problem for now, I guess.
> Actually it's quite mature, but you've hit a recently introduced bug.
Yes, sorry for that.
Christian