[GNC-dev] Is the import match map still required?
Christian Gruber
christian.gruber at posteo.de
Wed Jun 3 16:48:12 EDT 2020
I created two enhancement issues on Bugzilla regarding this topic:
* https://bugs.gnucash.org/show_bug.cgi?id=797778
* https://bugs.gnucash.org/show_bug.cgi?id=797779
On 30.05.20 at 14:37, Christian Gruber wrote:
> David,
>
> thanks for your detailed explanations. Implementing a procedure that
> can be run as needed and that updates the frequency table from the
> current transactions of an account seems to be a meaningful first
> step. It could then be used to measure performance, and after that it
> could be decided whether the procedure can also run on the fly.
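
A minimal sketch of such an update procedure, in Python rather than in
GnuCash's own C/C++ code. Everything in it is an assumption made for
illustration: the function names, the word-splitting rule, and the idea
that a transaction can be reduced to a (description, transfer account)
pair. It is not how the GnuCash matcher stores its data.

```python
import re
from collections import Counter, defaultdict


def tokenize(description):
    """Split a description into lowercase word tokens (hypothetical rule)."""
    return re.findall(r"[a-z0-9]+", description.lower())


def rebuild_frequency_table(transactions):
    """Count, for each token, how often each transfer account was used.

    `transactions` is an iterable of (description, transfer_account) pairs
    taken from the account's current transactions, so entries belonging to
    deleted or re-assigned transactions simply do not reappear.
    """
    table = defaultdict(Counter)
    for description, transfer_account in transactions:
        for token in tokenize(description):
            table[token][transfer_account] += 1
    return table


# Made-up example data for one imported-into account.
transactions = [
    ("ACME Gym membership", "Expenses:Gym"),
    ("ACME Supermarket", "Expenses:Groceries"),
    ("Gym membership fee", "Expenses:Gym"),
]
table = rebuild_frequency_table(transactions)
```

Because the table is rebuilt from scratch each time, timing a call to
rebuild_frequency_table on a large account would give a first idea of
whether running it on the fly is affordable.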
>
> I also thought more about the user's interaction with the frequency
> table. The current situation is a bit like "hacking" the frequency
> table to achieve better matching results: you can remove entries that
> seem to be wrong or that seem to corrupt the matching results. If the
> user did not change the frequency table directly, but could instead set
> personal preferences on how the data is used, those preferences would
> not be affected by running the procedure that updates the frequency
> table. And with regular updates of the frequency table, wrong or
> outdated entries are removed reliably and the data stays up to date.
> The user could, for example, exclude tokens that are not relevant to
> them from the Bayesian algorithm.
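
To illustrate the idea of keeping preferences separate from the table,
here is a small Python sketch. The scoring is a simplified Bayes-style
score with add-one smoothing, chosen for illustration only; it is not
the algorithm GnuCash actually uses, and best_account, the `excluded`
parameter and the example counts are all hypothetical.

```python
import math
from collections import Counter


def best_account(tokens, table, excluded=frozenset()):
    """Pick the account with the highest log score, skipping excluded tokens.

    `table` maps token -> Counter of transfer accounts. `excluded` is the
    user's preference set; the table itself is never modified.
    """
    accounts = sorted({acct for counts in table.values() for acct in counts})
    if not accounts:
        return None
    scores = dict.fromkeys(accounts, 0.0)
    for token in tokens:
        if token in excluded or token not in table:
            continue
        counts = table[token]
        total = sum(counts.values())
        for acct in accounts:
            # Add-one smoothing avoids log(0) for unseen (token, account) pairs.
            scores[acct] += math.log((counts[acct] + 1) / (total + len(accounts)))
    return max(accounts, key=lambda acct: scores[acct])


# Made-up frequency table: "payment" is a generic token that drowns out "gym".
table = {
    "gym": Counter({"Expenses:Gym": 5}),
    "payment": Counter({"Expenses:Rent": 20, "Expenses:Gym": 1}),
}
without_pref = best_account(["gym", "payment"], table)
with_pref = best_account(["gym", "payment"], table, excluded={"payment"})
```

With these counts the generic token "payment" pulls the result towards
Expenses:Rent, while excluding it lets "gym" decide. Since the exclusion
lives outside the table, rebuilding the table does not lose it.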
>
> Christian
>
>
> On 25.05.20 at 01:13, David Cousens wrote:
>> Christian,
>>
>> I haven't experimented to know whether constructing the frequency table
>> on the fly creates a performance bottleneck or not, but am guessing the
>> original developer thought it might. It would require a detailed look at
>> the code involved, but my suspicion would be that the performance penalty
>> is likely to be significant.
>>
>> My comment about bloat is that at present data is only maintained for
>> accounts you specifically import data into and if that data is stored.
>> If it isn't, then bloat obviously doesn't apply. Any sort of generalized
>> procedure could allow selection of the accounts for which Bayesian
>> matching is required, i.e. those for which importing is used to input
>> data. My initial thought was that you would run it for all accounts, but
>> it is really only necessary for the specific subset of accounts into
>> which you import data. It would then require the ability to run the
>> procedure on an account if it occurred in import data but didn't have
>> existing account matching data. If the procedure runs on the fly, this is
>> no problem: it can run whenever a new account being imported into appears
>> in the imported data. The most common use case is probably importing data
>> to one specific account, but GnuCash also allows the account being
>> imported into to be specified in the import data itself. I haven't looked
>> at how the frequency table is currently stored in memory, but I am
>> guessing it is constructed in memory when the data file is read in.
>>
>> The up-to-date aspect is one advantage, and if the current procedure is
>> changed to improve performance, then that is not hampered by the presence
>> of historical data, which would be updated automatically when the
>> procedure is run. If the table is stored as it is at present and a
>> procedure were available to trawl the current transactions for an
>> account, then it could be kept up to date by running that procedure
>> periodically. Data from manually entered transactions would then be
>> incorporated whether the procedure runs on the fly or is just run as
>> required.
>>
>> Having a standalone procedure to trawl an existing file and update the
>> stored data for an account would allow exploration of whether this is
>> likely to be a significant performance hit if it were run on the fly, so
>> that could perhaps be a first step. The core part of the code to store
>> the data has to exist in the matcher code already, and it will be a case
>> of wrapping this in a loop through the transactions existing in an
>> account and setting up the GUI to select the accounts to run on.
>>
>> The problem with pruning the data is that GnuCash has no way of knowing
>> a priori which tokens are most relevant. I would think that date
>> information is not really relevant and amount/value information does
>> little in most cases to identify a transfer account.
>>
>> The main difficulty I have with transfer account assignment is that some
>> regular transactions use a unique code in the description each time they
>> occur, with no separate unique identifier of the transaction source. My
>> wife and I both have separate gym membership subscriptions, and the
>> transaction descriptions identify neither the gym nor which of us the
>> transaction applies to. The options are to persuade the source to include
>> specific data or to use only a single account to record both, but I like
>> to track both our individual and joint expenses.
>>
>> Some regular transactions also get matched to previous payments in the
>> transaction matching, because they fall within the date range window and
>> their amounts and descriptions are usually identical. The current 42 day
>> window captures both fortnightly and monthly regular income transactions,
>> for example. This only affects a few transactions each month, and I don't
>> have huge numbers of transactions to process now that I have retired, but
>> that may not be the case for other users. Making the date range window
>> adjustable rather than fixed might be a cure for this. Setting it at <14
>> days would cure the problems I have, for example, but that again would
>> not work for everybody.
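
A small Python sketch of what an adjustable window would change. It is
only an illustration under stated assumptions: transactions are reduced
to (date, amount) pairs, the window is applied symmetrically around the
imported date, and find_match_candidates and window_days are made-up
names, not GnuCash functions or settings.

```python
from datetime import date, timedelta


def find_match_candidates(imported, existing, window_days=42):
    """Return existing transactions with the same amount within the window.

    `imported` is a (date, amount) pair and `existing` a list of such pairs.
    The window is assumed to be +/- window_days around the imported date.
    """
    window = timedelta(days=window_days)
    imported_date, imported_amount = imported
    return [
        (d, amount)
        for d, amount in existing
        if amount == imported_amount and abs(d - imported_date) <= window
    ]


# Fortnightly income of the same amount (in cents), already in the account.
existing = [(date(2020, 5, 1), 100000), (date(2020, 5, 15), 100000)]
imported = (date(2020, 5, 29), 100000)

wide = find_match_candidates(imported, existing)                    # 42 days
narrow = find_match_candidates(imported, existing, window_days=13)  # < 14 days
```

With the 42 day default both earlier payments are offered as possible
matches for the new one; with a window below 14 days neither is.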
>>
>> I am currently committed to a bit of work on the documentation front, so
>> I am unlikely to consider this in the near future in other than general
>> terms, but someone else may be willing to take it up.
>>
>> David
>>
>>
>>
>> -----
>> David Cousens