[GNC-dev] Is the import match map still required?

Christian Gruber christian.gruber at posteo.de
Sun May 24 17:26:14 EDT 2020

On 24.05.20 at 01:52, David Cousens wrote:
> Christian,
> I guess it depends on whether there is a performance advantage in using the
> previously stored data for the transfer account associations over
> constructing the frequency table on the fly. The search for matching
> transactions only takes place within a narrow time window around the date of
> import, so it is unlikely to canvass enough transactions to be able to
> construct a valid frequency table from tokenized data within that window.
> The stored frequency table would generally contain data from a much wider
> range of transactions and would take much longer to construct on the fly
> each time it was needed.
I'm only thinking about account matching (Bayesian matching), not
transaction matching. For this it would of course be necessary to work
with all historical data, not just the few transactions within a
narrow time window. Can you tell whether constructing the frequency
table on the fly from all historical transactions related to a transfer
account would be a considerable performance load?
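To make clear what I mean by "on the fly", here is a minimal sketch in
Python. It is not GnuCash code: the whitespace tokenizer, the function
names and the sample data are all assumptions for illustration only. It
makes a single pass over the historical transactions and counts, per
token, how often each other account was chosen.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lower-cased whitespace splitting, only for illustration.
    return text.lower().split()


def build_frequency_table(transactions):
    """Count, per token, how often each other account was used.

    `transactions` is an iterable of (description, other_account) pairs;
    this shape is hypothetical and chosen to keep the sketch small.
    """
    table = defaultdict(Counter)
    for description, account in transactions:
        for token in tokenize(description):
            table[token][account] += 1
    return table


# Made-up sample history.
history = [
    ("ALDI SUED MUNICH", "Expenses:Groceries"),
    ("ALDI NORD BERLIN", "Expenses:Groceries"),
    ("SHELL STATION MUNICH", "Expenses:Fuel"),
]
table = build_frequency_table(history)
```

The cost of such a pass grows with the number of transactions times the
number of tokens per transaction, so the open question is only how large
that gets for a real book.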
> I have also pondered whether it could be usefully augmented by using data
> from transactions entered manually which have not been imported for the file
> associations.  Could be of value where you have a good set of historical
> records but it would only need a one off run through the existing
> transactions to gather the data. Unless you confined it to running on a
> specific set of accounts to which you import data it might cause bloat of
> the data file with unnecessary and unused information.

A possible advantage of constructing the frequency table on the fly
is that it would always be up to date. For instance, if the user sets
the "wrong" other account during import and corrects it afterwards, the
import match map currently keeps the wrong matching information and
is not corrected after the import.
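A small sketch of that difference, again in Python with hypothetical
names and made-up data (this is not how GnuCash stores the map): a
snapshot taken at import time keeps the wrong account, while a table
rebuilt from the transactions reflects the correction.

```python
from collections import Counter, defaultdict


def build_table(transactions):
    # Hypothetical helper: token -> Counter of other accounts.
    table = defaultdict(Counter)
    for description, account in transactions:
        for token in description.lower().split():
            table[token][account] += 1
    return table


# The user picks the wrong other account during import.
transactions = [("SHELL STATION", "Expenses:Groceries")]
stored_map = build_table(transactions)  # snapshot taken at import time

# The user corrects the transaction after the import.
transactions[0] = ("SHELL STATION", "Expenses:Fuel")
fresh_table = build_table(transactions)  # rebuilt on the fly

# stored_map still points "shell" at Expenses:Groceries,
# fresh_table points it at Expenses:Fuel.
```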

Manually entered transactions would also be considered, right.

A one-off manual run through all transactions to update the import match 
map could be a good alternative to constructing it on the fly. Sounds good.

Why do you think a run through all transactions "might cause bloat of
the data file"? The current import match map also contains all the data
from all matched accounts, whether it is used or not. I am still
assuming here that the import match map is related to one transfer
account only, which already limits the set of accounts from which it
is constructed.

> I have examined the stored data in my data file with the import map editor
> and found that there was a lot of data stored which contributes little to
> the matching for the transfer account (dates, connectors (a, and, the,
> etc.), transaction amounts?) which often have a fairly uniform frequency
> for all accounts which were used as transfer accounts. After a bit of
> pruning of the stored data my matching reliability seemed to improve a bit.
Ok, I see. If the import match map has to be pruned to get reliable
results from the Bayesian matching algorithm, then a frequency table
that is constructed on the fly or rebuilt in a one-off run has a big
disadvantage. If it is constructed on the fly, nothing can be pruned.
And if it is rebuilt, all pruned data will be back after the run.
> I don't know at the moment if the tokens stored for transfer account
> matching are a subset of the tokens used for transaction matching (haven't
> checked) but restricting the set of tokens used may possibly improve
> performance and reduce the amount of data stored if all tokens associated
> with a transaction are currently being stored in the frequency table which
> is what I suspect from examining my import map data.
Yes, this is the current situation: every token is stored. Do you have
suggestions for how tokens could be pruned automatically in a
meaningful way?
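Just to have something concrete to discuss, here is one conceivable
heuristic, sketched in Python. It follows your observation that the
useless tokens have a fairly uniform frequency across accounts: a token
is dropped if it occurs with several accounts and no single account
dominates. The function name and both thresholds are assumptions I made
up, not anything that exists in GnuCash, and raw counts are used without
normalizing for account size.

```python
from collections import Counter


def prune_uniform_tokens(table, max_share=0.6, min_accounts=2):
    """Drop tokens that are spread too evenly over accounts.

    `table` maps token -> Counter of accounts. Both thresholds are
    hypothetical and would need tuning against real data.
    """
    kept = {}
    for token, counts in table.items():
        total = sum(counts.values())
        top_share = max(counts.values()) / total
        # Seen with several accounts and no clear favourite: not informative.
        if len(counts) >= min_accounts and top_share < max_share:
            continue
        kept[token] = counts
    return kept


# Made-up example table.
table = {
    "the": Counter({"Expenses:Groceries": 5, "Expenses:Fuel": 5, "Expenses:Rent": 4}),
    "aldi": Counter({"Expenses:Groceries": 9, "Expenses:Fuel": 1}),
    "shell": Counter({"Expenses:Fuel": 7}),
}
pruned = prune_uniform_tokens(table)
```

With these sample numbers "the" is dropped (its largest share is 5 of
14), while "aldi" and "shell" are kept.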
> David Cousens
