improving import transaction matching

Christian Stimming stimming at tuhh.de
Wed Nov 28 03:50:44 EST 2007


Quoting Tom Brown <tomgcdev5 at thecap.org>:
> I manually entered receipts for a few months' purchases and then
> imported an OFX file from my credit card company. The process of
> matching existing and imported splits seems like it has lots of room
> for improvement.

Yes, absolutely. Nobody is working on these parts at the moment, so  
you're free to try out whatever comes to mind.

> As far as I can tell the matching mostly happens in
> src/import-export/import-backend.c

Yes. As you can see there, a "Bayesian matcher" is used for some
matching tasks if the corresponding preference is set. As far as I
remember, though, the Bayesian matcher was, unexpectedly, used only for
some of the matching tasks and not all of them. I don't recall which
ones were (optionally) Bayesian and which ones were always rule-based.

> A very simple first improvement is to use case-insensitive comparisons
> of the description and memo text fields. I'd also like to make these
> comparisons give some prob points when the words in these fields are
> reordered.

You are evidently referring to the rule-based matching. Sure, just go
ahead, try whatever improves the matching here, and submit patches.
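
As a rough illustration of the idea above (this is not code from
import-backend.c), the sketch below scores two description or memo
strings by the words they share, ignoring case and word order. The names
word_overlap_score and split_words, the 0..100 scale, and the fixed
buffer sizes are all assumptions made for this example. It uses only the
standard C library, so it treats non-ASCII bytes as separators in the
default locale; inside GnuCash, GLib's g_utf8_casefold() would probably
be a better basis for case folding.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 32     /* hypothetical limit, chosen for this sketch */
#define MAX_WORD_LEN 64  /* longer words are truncated */

/* Split text into lowercased alphanumeric words; returns the word count. */
static size_t split_words(const char *text, char words[][MAX_WORD_LEN])
{
    size_t count = 0, len = 0;
    for (const char *p = text; ; ++p) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c)) {
            /* Words beyond MAX_WORDS are silently ignored. */
            if (count < MAX_WORDS && len < MAX_WORD_LEN - 1)
                words[count][len++] = (char)tolower(c);
        } else {
            if (len > 0) {           /* a separator ends the current word */
                words[count][len] = '\0';
                ++count;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}

/* Returns 0..100: shared words as a percentage of the longer word list,
 * ignoring case and word order. Each word in b can be matched only once. */
int word_overlap_score(const char *a, const char *b)
{
    char wa[MAX_WORDS][MAX_WORD_LEN], wb[MAX_WORDS][MAX_WORD_LEN];
    int used[MAX_WORDS] = {0};
    size_t na = split_words(a, wa), nb = split_words(b, wb);
    size_t matched = 0;

    if (na == 0 || nb == 0)
        return 0;

    for (size_t i = 0; i < na; ++i) {
        for (size_t j = 0; j < nb; ++j) {
            if (!used[j] && strcmp(wa[i], wb[j]) == 0) {
                used[j] = 1;
                ++matched;
                break;
            }
        }
    }
    size_t longer = na > nb ? na : nb;
    return (int)(100 * matched / longer);
}
```

With this sketch, word_overlap_score("AMAZON Marketplace", "marketplace
amazon") gives 100, and word_overlap_score("Coffee Shop", "COFFEE shop
#12") gives 66, because two of the three words match. How such a score
would be folded into the matcher's existing probability points is left
open here.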

> I can't find any obvious tests for this code. What is the normal way
> you test this kind of code?

Unfortunately, I never had an easy test case available. Even the OFX
file import isn't a good test case, because OFX files have a unique
identifier for each transaction, so a second import will (correctly)
match the transactions from the first import immediately. To get around
this, I always had to close GnuCash, edit the data file to remove all
"online-id" KVP values, and reload the data file into GnuCash so that
the OFX import would trigger the matching again.

If you've built with HBCI support, you should have the MT940 import
available, which triggers the matcher on each import. There is an
MT940 test file at doc/examples/downloaded.mt940, but it contains just
a single transaction, so it is also not very good for testing the
matching.

If you can come up with some easier test infrastructure and/or some
unit tests, feel free to implement those inside the import-export/test/
directory and submit them as patches.

> I've been building and playing with the trunk.
>
> README.svn suggests I speak before diving into making changes so here
> I am.

You're very welcome. We are looking forward to patches or any pieces  
of code :-)

Christian
