improving import transaction matching
Christian Stimming
stimming at tuhh.de
Wed Nov 28 03:50:44 EST 2007
Quoting Tom Brown <tomgcdev5 at thecap.org>:
> I manually entered receipts for a few months' purchases and then
> imported an OFX file from my credit card company. The process of
> matching existing and imported splits seems like it has lots of room
> for improvement.
Yes, absolutely. Nobody is working on these parts at the moment, so
you're free to try out whatever comes to mind.
> As far as I can tell the matching mostly happens in
> src/import-export/import-backend.c
Yes. As you can see there, a "Bayesian matcher" is used for some
matching tasks if the corresponding preference is set. As far as I
remember, though, it was unexpectedly used for only some of the
matching tasks and not all of them. I don't recall which ones were
(optionally) Bayesian and which ones were always rule-based.
> A very simple first improvement is to use case insensitive comparisons
> of the description and memo text fields. I'd also like to make these
> comparisons give some prob points when the words in these fields are
> reordered.
I take it you are referring to the rule-based matching. Sure, just go
ahead and try whatever improves the matching results here, and submit
patches.
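To make the two ideas quoted above concrete, here is a minimal sketch
of a case-insensitive, word-order-insensitive comparison. It is not
the code in import-backend.c; the names split_words and
word_overlap_score, the fixed buffer sizes, and the 0..100 score scale
are all assumptions made up for illustration. It also handles ASCII
only, so a real patch would more likely build on GLib's UTF-8 helpers
such as g_utf8_casefold().

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS 32     /* hypothetical limit on words per field */
#define MAX_WORD_LEN 64  /* hypothetical limit on word length */

/* Split text into lowercased alphanumeric words.
 * Words beyond max_words are dropped; overlong words are truncated.
 * Returns the number of words stored. */
static int split_words(const char *text, char words[][MAX_WORD_LEN],
                       int max_words)
{
    int count = 0;
    size_t len = 0;
    const char *p;

    for (p = text; ; ++p) {
        unsigned char c = (unsigned char)*p;
        if (c != '\0' && isalnum(c)) {
            if (count < max_words && len < MAX_WORD_LEN - 1)
                words[count][len++] = (char)tolower(c);
        } else {
            if (len > 0) {          /* a word just ended */
                words[count][len] = '\0';
                ++count;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}

/* Hypothetical score: percentage (0..100) of words shared by a and b,
 * ignoring case and word order. Each word in b can match only once.
 * The divisor is the larger word count, so extra words lower the score. */
int word_overlap_score(const char *a, const char *b)
{
    char wa[MAX_WORDS][MAX_WORD_LEN];
    char wb[MAX_WORDS][MAX_WORD_LEN];
    int used[MAX_WORDS] = {0};
    int na = split_words(a, wa, MAX_WORDS);
    int nb = split_words(b, wb, MAX_WORDS);
    int larger = na > nb ? na : nb;
    int matched = 0;
    int i, j;

    if (larger == 0)
        return 0;   /* two empty fields carry no evidence */

    for (i = 0; i < na; ++i) {
        for (j = 0; j < nb; ++j) {
            if (!used[j] && strcmp(wa[i], wb[j]) == 0) {
                used[j] = 1;
                ++matched;
                break;
            }
        }
    }
    return 100 * matched / larger;
}
```

With this sketch, "AMAZON Marketplace" and "marketplace amazon" score
100, while "Coffee Shop Downtown" against "coffee shop" scores 66. How
such a score would translate into the matcher's probability points is
left open here.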
> I can't find any obvious tests for this code. What is the normal way
> you test this kind of code?
Unfortunately I never had any easy test case available. Even the OFX
file import isn't a good test case because the OFX files have unique
identifiers for each transaction, so that the second import will
(correctly) immediately match the transactions from the first import.
In this case I always had to close gnucash, edit the data file to
remove all "online-id" KVP values, and reload the data file into
gnucash to have the OFX import trigger the matching again.
If you've built with HBCI support you should have the MT940 import
available, which will trigger the matcher on each import. There is an
MT940 test file at doc/examples/downloaded.mt940, but it contains only
a single transaction, so it is not much good for testing the matching
either.
If you can come up with some easier test infrastructure and/or some
unit tests, feel free to implement those inside the
import-export/test/ directory and submit them as patches.
> I've been building and playing with the trunk.
>
> README.svn suggests I speak before diving into making changes so here
> I am.
You're very welcome. We are looking forward to patches or any pieces
of code :-)
Christian