[GNC-dev] Is the import match map still required?

Christian Gruber christian.gruber at posteo.de
Wed Jun 3 16:48:12 EDT 2020


I created two enhancement issues on Bugzilla regarding this topic:

  * https://bugs.gnucash.org/show_bug.cgi?id=797778
  * https://bugs.gnucash.org/show_bug.cgi?id=797779


On 30.05.20 at 14:37, Christian Gruber wrote:
> David,
>
> thanks for your detailed explanations. Implementing a procedure that
> can be run as needed and that updates the frequency table from the
> current transactions of an account seems a sensible first step. It
> could then be used to measure performance, and based on that we could
> decide whether the procedure can also run on the fly.
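A minimal Python sketch of such a rebuild procedure, assuming a simplified transaction record (description, memo, assigned transfer account) and a whitespace/alphanumeric tokenizer. `Transaction`, `tokenize` and `rebuild_frequency_table` are hypothetical names, not GnuCash's code. Because the table is derived from scratch on each run, entries belonging to deleted or re-categorised transactions disappear on the next run.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical, simplified view of one transaction in an import account.
    description: str
    memo: str
    transfer_account: str  # the "other" account the user assigned


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of letters and digits.
    return re.findall(r"[0-9a-z]+", text.lower())


def rebuild_frequency_table(transactions):
    """Derive token -> Counter(transfer account -> count) from scratch.

    Each token is counted at most once per transaction. The previous
    table is not consulted, so stale entries cannot survive a rebuild.
    """
    table = defaultdict(Counter)
    for txn in transactions:
        tokens = set(tokenize(txn.description)) | set(tokenize(txn.memo))
        for token in tokens:
            table[token][txn.transfer_account] += 1
    return dict(table)
```

Running it again after a transaction has been moved to another account simply yields the new counts; no manual pruning is needed.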
>
> I also gave more thought to the user's side of the interaction with
> the frequency table. At present it is a bit like "hacking" the
> frequency table to get better matching results: you can remove
> entries that seem to be wrong or that seem to spoil the matching
> results. If, instead of changing the frequency table directly, the
> user could set personal preferences on how the data is used, those
> preferences would not be affected by running the procedure that
> updates the frequency table. Regular updates of the frequency table
> would then reliably remove wrong or outdated entries and keep the
> data up to date. For example, the user could exclude tokens that are
> not relevant to him from the Bayesian algorithm.
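One way to keep such preferences separate from the table is to apply them only at match time. The Python sketch below assumes a table of token -> Counter(account -> count) and a user-maintained set of excluded tokens; `suggest_account` is a hypothetical name, and its scoring (summing per-token P(account | token)) is a simplified stand-in, not GnuCash's actual Bayesian combination.

```python
from collections import Counter


def suggest_account(table, tokens, excluded_tokens=frozenset()):
    """Return the best-scoring transfer account for ``tokens``, or None.

    ``table`` maps token -> Counter(account -> count) and may be rebuilt
    at any time. ``excluded_tokens`` is a user preference stored
    separately, so a rebuild never discards it.
    """
    scores = Counter()
    for token in set(tokens) - set(excluded_tokens):
        counts = table.get(token)
        if not counts:
            continue  # unseen token: no evidence either way
        total = sum(counts.values())
        for account, n in counts.items():
            # Simplified scoring: sum of per-token P(account | token).
            scores[account] += n / total
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

Excluding a noisy token such as a card-network name changes the suggestion without touching the stored counts.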
>
> Christian
>
>
> On 25.05.20 at 01:13, David Cousens wrote:
>> Christian,
>>
>> I haven't experimented to find out whether constructing the frequency
>> table on the fly creates a performance bottleneck, but I am guessing
>> the original developer thought it might. It would require a detailed
>> look at the code involved, but my suspicion is that the performance
>> penalty would be significant.
>>
>> My comment about bloat is that at present, data is only maintained
>> for accounts you specifically import data into, and only if that
>> data is stored. If it isn't stored, bloat obviously doesn't apply.
>> Any sort of generalized procedure could allow selecting the accounts
>> for which Bayesian matching is required, i.e. those into which data
>> is entered by importing. My initial thought was that you would run it
>> for all accounts, but it is really only necessary for the specific
>> subset of accounts into which you import data. It would then require
>> the ability to run the procedure on an account if that account
>> occurred in the import data but didn't have existing account-matching
>> data. If it runs on the fly, there is no problem: it can run whenever
>> a new account being imported into appears in the imported data. The
>> most common use case is probably importing data into one specific
>> account, but GnuCash also allows the account being imported into to
>> be specified in the import data itself. I haven't looked at how the
>> frequency table is currently stored in memory, but I am guessing it
>> is constructed in memory when the data file is read in.
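A minimal Python sketch of building matching data lazily, only for the import target accounts that appear in an import file and lack data so far. The store (a dict of account -> table) and the `load_transactions` callback are assumptions for illustration; `build_table` and `ensure_tables` are hypothetical names, not GnuCash functions.

```python
from collections import Counter, defaultdict


def build_table(transactions):
    # transactions: iterable of (description, transfer_account) pairs.
    table = defaultdict(Counter)
    for description, transfer in transactions:
        for token in set(description.lower().split()):
            table[token][transfer] += 1
    return table


def ensure_tables(import_rows, store, load_transactions):
    """Build matching data only for target accounts that lack it.

    import_rows: iterable of (target_account, description) from the file.
    store: dict target_account -> frequency table (assumed persisted).
    load_transactions: hypothetical callback returning the existing
        (description, transfer_account) pairs of an account.
    Returns the accounts that were built, in first-seen order.
    """
    built = []
    # dict.fromkeys de-duplicates while preserving first-seen order.
    for account in dict.fromkeys(acct for acct, _ in import_rows):
        if account not in store:
            store[account] = build_table(load_transactions(account))
            built.append(account)
    return built
```

Accounts that already have data are left alone, so the common single-account import pays the build cost at most once.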
>>
>> Keeping the data up to date is one advantage. Also, if the current
>> procedure is changed to improve performance, that change is not
>> hampered by the presence of historical data, because that data would
>> be updated automatically when the procedure is run. If the table is
>> stored as it is at present and a procedure were available to trawl
>> the current transactions of an account, the table could be kept up
>> to date by running that procedure periodically. Data from manually
>> entered transactions would then be incorporated, whether the
>> procedure runs on the fly or only as required.
>>
>> Having a standalone procedure that trawls an existing file to update
>> the stored data for an account would allow us to explore whether
>> running it on the fly is likely to be a significant performance hit,
>> so that could perhaps be a first step. The core code to store the
>> data must already exist in the matcher; it would be a case of
>> wrapping it in a loop over the existing transactions of an account
>> and setting up a GUI to select the accounts to run it on.
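The "wrap the existing storage step in a loop" idea, plus a timer to judge whether an on-the-fly run is affordable, might look like this Python sketch. `add_transaction_tokens` is a hypothetical stand-in for the matcher's per-transaction storage code (not shown in this thread), and the ledger layout (account -> list of (tokens, transfer_account)) is an assumption.

```python
import time
from collections import Counter, defaultdict


def add_transaction_tokens(table, tokens, transfer_account):
    # Hypothetical stand-in for the matcher's existing per-transaction
    # storage step: count each token once for the assigned account.
    for token in set(tokens):
        table[token][transfer_account] += 1


def trawl_accounts(ledger, selected_accounts):
    """Run the storage step over every transaction of the selected accounts.

    ledger: dict account -> list of (tokens, transfer_account) pairs.
    Returns (tables, seconds); the elapsed time gives a first estimate
    of what an on-the-fly rebuild would cost.
    """
    start = time.perf_counter()
    tables = {}
    for account in selected_accounts:
        table = defaultdict(Counter)
        for tokens, transfer in ledger.get(account, []):
            add_transaction_tokens(table, tokens, transfer)
        tables[account] = table
    return tables, time.perf_counter() - start
```

Timing this against a large real book, rather than a toy ledger, is what would actually answer the performance question.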
>>
>> The problem with pruning the data is that GnuCash has no way of
>> knowing a priori which tokens are most relevant. I would think that
>> date information is not really relevant, and that amount/value
>> information does little in most cases to identify a transfer account.
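If date and amount tokens are indeed rarely useful, one option is to filter them out before they reach the table. The Python sketch below assumes simple regular expressions for "date-like" and "amount-like" tokens; the patterns and the `relevant_tokens` name are illustrative assumptions, not GnuCash's tokenizer.

```python
import re

# Assumed patterns: dates such as 2020-05-24 or 24/05/2020, and amounts
# with exactly two decimals such as 49.95 or -1200,00.
DATE_RE = re.compile(r"^\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}$")
AMOUNT_RE = re.compile(r"^[-+]?\d+[.,]\d{2}$")


def relevant_tokens(text):
    """Split on whitespace and drop date-like and amount-like tokens."""
    return [
        tok
        for tok in text.lower().split()
        if not DATE_RE.match(tok) and not AMOUNT_RE.match(tok)
    ]
```

Plain integers such as reference numbers are deliberately kept, since they sometimes do identify the counterparty.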
>>
>> The main difficulty I have with transfer account assignment is that
>> some regular transactions use a unique code in the description each
>> time they occur, with no separate unique identifier of the
>> transaction source. My wife and I both have separate gym membership
>> subscriptions, and the transaction descriptions identify neither the
>> gym nor which of us the transaction applies to. The options are to
>> persuade the source to include specific data or to use only a single
>> account to record both, but I like to track both our individual and
>> joint expenses.
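A small worked example of why such descriptions defeat token-based matching: a code that is unique per transaction is never seen again, and the remaining shared tokens are split evenly between the two accounts. The Python sketch uses a simplified sum-of-probabilities score (an assumption, not GnuCash's exact formula); the descriptions and account names are hypothetical.

```python
from collections import Counter, defaultdict


def token_account_probabilities(history):
    """history: list of (description, account). Returns token -> {account: P}."""
    counts = defaultdict(Counter)
    for description, account in history:
        for tok in set(description.lower().split()):
            counts[tok][account] += 1
    return {
        tok: {acct: n / sum(c.values()) for acct, n in c.items()}
        for tok, c in counts.items()
    }


def evidence(probs, description):
    """Sum per-token probabilities per account; unseen tokens add nothing."""
    scores = Counter()
    for tok in set(description.lower().split()):
        for acct, p in probs.get(tok, {}).items():
            scores[acct] += p
    return scores


history = [
    ("GYM DD REF8841", "Expenses:Gym:David"),
    ("GYM DD REF9932", "Expenses:Gym:Partner"),
    ("GYM DD REF4410", "Expenses:Gym:David"),
    ("GYM DD REF5127", "Expenses:Gym:Partner"),
]
probs = token_account_probabilities(history)
scores = evidence(probs, "GYM DD REF1234")  # new, never-seen reference
```

Here both accounts end up with the same score, so no amount of history lets the matcher choose; only extra information in the source data would.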
>>
>> Some regular transactions also get matched to previous payments
>> within the date range window of the transaction matching, where the
>> amounts and descriptions are usually identical. The current 42-day
>> window captures both fortnightly and monthly regular income
>> transactions, for example. This only affects a few transactions each
>> month, and I don't have huge numbers of transactions to process now
>> that I have retired, but that may not be the case for other users.
>> Making the date range window adjustable rather than fixed might cure
>> this. Setting it to less than 14 days would cure the problems I
>> have, for example, but again that would not work for everybody.
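The effect of the window size can be illustrated with a Python sketch that treats an existing transaction as a match candidate when amount and description are equal and the dates lie within a symmetric window. Both the equality test and the symmetric window are simplifying assumptions; the real matcher scores candidates rather than filtering them this way, and `match_candidates` is a hypothetical name.

```python
from datetime import date, timedelta


def match_candidates(imported, existing, window_days=42):
    """Return existing (date, amount_cents, description) entries that
    could be matched to ``imported``: same amount and description, and
    a date no more than ``window_days`` away in either direction.
    """
    imp_date, imp_amount, imp_desc = imported
    window = timedelta(days=window_days)
    return [
        e
        for e in existing
        if e[1] == imp_amount
        and e[2] == imp_desc
        and abs(e[0] - imp_date) <= window
    ]


existing = [
    (date(2020, 5, 1), 250000, "Salary ACME"),   # 28 days earlier
    (date(2020, 5, 15), 250000, "Salary ACME"),  # 14 days earlier
]
imported = (date(2020, 5, 29), 250000, "Salary ACME")
```

With the default 42-day window both earlier fortnightly payments are candidates for the new one; with a 13-day window neither is, which is the trade-off described above.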
>>
>> I am currently committed to some work on the documentation front, so
>> I am unlikely to consider this in the near future other than in
>> general terms, but someone else may be willing to take it up.
>>
>> David
>>
>>
>>
>> -----
>> David Cousens

