[GNC] Why do Imported Transactions NEED to be Matched?

Sat Apr 25 02:39:08 EDT 2020

I have a few thousand transactions in a bank statement csv file tagged with
a transfer account. Why can't I skip the import matching process?

Firstly, assume this test data from the Concepts Guide checking account
with the addition of a typical bank statement description and transfer
account. It is loaded into gcashdata_3emptyAccts, created in the guide's
exercises.

Date,Number,Memo,Amount,Balance,Account,Entity
05/03/06,101,GG25j1546 Groceries    wtf 15:57 061124,-45.21,413.05,Expenses:Groceries,Big Food
06/03/06,,Transfer to J&J Doe Savings Acc 5765-8397 589654259587,100,513.05,Savings,[Savings]
14/03/06,,Direct Credit Salary from Employers R Us,670,1183.05,Income:Salary,Employers R Us
28/03/06,,Mmvoin515b  Internet Company bg??,-20,1163.05,Expenses:Internet,FastFibre
28/03/06,102,Light Company Big City Branch 9g8k863,-78,1085.05,Expenses:Electricity,Light Company
28/03/06,103,Phone Company Name    Autodebit 595642583,-45,1040.05,Expenses:Phone,Phone Company Name
28/03/06,104,April Rent        5 Short Road,-350,690.05,Expenses:Rent,HighTower

Adding the following record to the test data might help explain this issue:
28/04/06,104,May Rent        5 Short Road,-350,340.05,Expenses:Rent,HighTower
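
To make the layout concrete, here is a minimal Python sketch that reads a few
of the records above (one record per line) and checks two things: that every
row already names a transfer account, and that each Balance follows from the
previous Balance plus the Amount. The function names are hypothetical and
this is not how GnuCash reads the file; it only illustrates that the data
carries everything needed to post a transaction without asking the user.

```python
import csv
import io
from decimal import Decimal

# Three records from the test data above, one per line.
SAMPLE = """\
Date,Number,Memo,Amount,Balance,Account,Entity
05/03/06,101,GG25j1546 Groceries    wtf 15:57 061124,-45.21,413.05,Expenses:Groceries,Big Food
06/03/06,,Transfer to J&J Doe Savings Acc 5765-8397 589654259587,100,513.05,Savings,[Savings]
14/03/06,,Direct Credit Salary from Employers R Us,670,1183.05,Income:Salary,Employers R Us
"""


def rows_missing_transfer_account(text):
    """Return 1-based data row numbers whose Account column is empty."""
    reader = csv.DictReader(io.StringIO(text))
    return [n for n, row in enumerate(reader, start=1)
            if not row["Account"].strip()]


def balances_consistent(text):
    """Check that each Balance equals the previous Balance plus Amount."""
    previous = None
    for row in csv.DictReader(io.StringIO(text)):
        balance = Decimal(row["Balance"])
        if previous is not None and previous + Decimal(row["Amount"]) != balance:
            return False
        previous = balance
    return True


print(rows_missing_transfer_account(SAMPLE))
print(balances_consistent(SAMPLE))
```

Decimal is used instead of float so that amounts such as 45.21 compare
exactly.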

1. So both accounts are given, the Checking Account and the Transfer
Account. It is ok to present the account matching screen but the user
should be able to just select the Next option without making any changes
(to just accept the import). If there are new accounts I'd expect them to
be created as part of the import process unless they are invalid.

2. I understand that duplicate transactions need to be avoided. For example,
importing the checking account statement records a transfer to a credit card
account as a transaction in both accounts; importing the credit card
statement then records the same transfer from the checking account in both
accounts again. The result is two transactions in each account where there
should be only one.

For a large import where there are no transactions in any other accounts,
this can't happen, and the user should be able to go straight to the Next
step. The same should apply if other bank accounts do contain data but only
from before the period being imported.

There are still reconciliation steps after an import. If there are existing
records in another bank account representing the same transactions being
imported the user should have the option to go to the Next step, have the
program flag the duplicates and allow the user to consolidate them into a
single record.
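
A flagging step of that kind could be sketched as follows. This is only an
illustration under stated assumptions, not GnuCash's duplicate detection:
the row layout (date, statement account, transfer account, signed amount),
the account names and both function names are hypothetical, and it only
catches exact date and amount matches, whereas real statements often post
the two sides a day or more apart.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def transfer_key(row):
    """Describe a row as money moving from one account to another."""
    when, account, other, amount = row
    if amount < 0:
        source, dest = account, other   # money left the statement account
    else:
        source, dest = other, account   # money arrived in the statement account
    return (when, source, dest, abs(amount))


def duplicate_candidates(rows):
    """Group rows that describe the same transfer seen from both statements."""
    groups = defaultdict(list)
    for row in rows:
        groups[transfer_key(row)].append(row)
    # Only flag groups whose rows come from different statement accounts.
    return [g for g in groups.values() if len({r[1] for r in g}) > 1]


rows = [
    # From the checking statement: groceries and a credit card payment.
    (date(2006, 3, 5), "Assets:Checking", "Expenses:Groceries", Decimal("-45.21")),
    (date(2006, 3, 6), "Assets:Checking", "Liabilities:Credit Card", Decimal("-200.00")),
    # From the credit card statement: the same payment, seen from the other side.
    (date(2006, 3, 6), "Liabilities:Credit Card", "Assets:Checking", Decimal("200.00")),
]

for group in duplicate_candidates(rows):
    print(group)
```

Requiring the rows in a group to come from different statements keeps two
genuinely separate payments of the same amount on the same day, within one
statement, from being flagged.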

Let's assume this scenario. The checking account has transfers with a cash
management account and a credit card. If all the cash management transfers
were to/from the checking account, you wouldn't load the cash management
statement electronically; you would just enter the monthly interest payments
manually. If there were a few checking account transfers to/from the credit
card each month, it would be preferable to just load the statement and fix
the duplicates of the same transaction at reconciliation.

This issue has been raised and answered a couple of times already, but I
don't think the responses explain the NEED for matching to occur in THIS
scenario.

-------------------------------------------------

https://github.com/Gnucash/gnucash-docs/pull/132#issuecomment-619119386

>> If data is tagged with an account why does the account need to be
>> matched with identical GnuCash accounts [exported from GnuCash]?
>
> Sorry, do you mean if the CSV already has the "other" account why does it
> need to run the matcher? I don't think that it does.
>
>> Why is transaction matching required?
>
> Because many, perhaps even most, imports don't have the "other" account
> identified, just a description. That's the case for your bank account
> example. The matcher, once trained, provides automatic assignment of the
> "other" account based on the description.
>
>> Furthermore, if one transaction is matched in a list, why isn't the
>> transaction list updated to match other identical transactions?
>
> Because matching a transaction list takes significant time and re-running
> it every time a user matched a transaction would be annoyingly slow. So it
> works the other way around as of IIRC 3.7 or so but that's not yet
> documented: You can select several transactions at once in the matcher and
> right-click for a context menu with the single entry for picking a matching
> account for all of the selected transactions. Once the matcher is trained
> it will match all of the transactions.

https://lists.gnucash.org/pipermail/gnucash-user/2020-April/090627.html

>> I have a bigger issue. I have a few thousand transactions in a bank
>> statement csv. How do I get through the transaction matching process?
>
> The transaction matching has two components. The first is the avoidance
> of duplicate transactions. The second is the assignment of the second
> account for the transaction, which is not normally specified explicitly
> in a bank statement record. The Bayesian approach in GnuCash works well
> on this second problem. However, you need to understand how it works to
> optimize it. One of the reasons I started working on the documentation of
> the importing was that the documentation was pretty poor and I didn't
> understand what was happening myself.
>
> You cannot import your file with a thousand transactions in one hit and
> expect GnuCash to correctly assign accounts. The account assignment is
> done by tokenizing information in the description, date and amount fields
> of the transaction and constructing a table of the frequency of
> occurrence of the tokens for a particular account that has been assigned
> as the second account. When a transaction is imported, its tokenized
> information is compared with the frequency table, a score of the matches
> of the tokens with each possible account assignment is calculated, and
> the account with the highest score is selected and presented as the
> assigned account. You can manually override that automatic assignment in
> the matcher window.
>
> When all of the transactions displayed have had the correct accounts
> assigned to them and you click the OK button on the matcher window, the
> token data is updated into the frequency table in the data file at that
> point only. If you have never imported data, that table is totally
> empty. Note: The frequency table contains no information derived from
> transactions which may have already been recorded manually in GnuCash
> without using the import matcher.
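
The mechanism described above can be illustrated with a much simplified
Python sketch. It is not GnuCash's code: the class and method names are
hypothetical, it tokenizes only the description (the reply says date and
amount are used as well), and it adds up raw token counts rather than
computing Bayesian probabilities. It does show the two points that matter
here: an untrained table yields no guess, and the guess is only as good as
the assignments that were confirmed.

```python
from collections import Counter, defaultdict


class AccountGuesser:
    """Toy token-frequency table: token -> how often each account was chosen."""

    def __init__(self):
        self.table = defaultdict(Counter)

    @staticmethod
    def tokenize(description):
        # Assumption: split on whitespace and ignore case.
        return description.lower().split()

    def train(self, description, account):
        """Record a confirmed assignment (the 'click OK' step)."""
        for token in self.tokenize(description):
            self.table[token][account] += 1

    def guess(self, description):
        """Return the account with the highest summed token count, or None."""
        scores = Counter()
        for token in self.tokenize(description):
            for account, count in self.table.get(token, {}).items():
                scores[account] += count
        if not scores:
            return None
        return scores.most_common(1)[0][0]


guesser = AccountGuesser()
print(guesser.guess("Light Company Big City Branch 9g8k863"))  # table is empty

guesser.train("Light Company Big City Branch 9g8k863", "Expenses:Electricity")
guesser.train("Phone Company Name    Autodebit 595642583", "Expenses:Phone")
print(guesser.guess("Light Company Big City Branch 7h2m111"))
```

Confirming an import with a wrong or Imbalance account would call train()
with that account, so the same tokens would later pull new transactions
towards it.
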
> The best strategy is to import data in small batches at first, making
> sure that you manually assign the correct accounts in each case before
> hitting OK to actually import the data. It is only after OK is clicked
> that the frequency table in the data file is updated. If you import data
> with incorrect account assignments, or leave transactions which are
> assigned to the Imbalance accounts in the import, you are training the
> system to assign the wrong accounts. You should notice that after a few
> successful imports GnuCash's guesses at the account improve and most
> transactions will generally be assigned to the accounts you want. At this
> point you can start increasing the size of the batches of data you
> import. Splitting a csv file up is fairly easy in a text editor.
>
> If you have not been completing the imports as described, or have been
> correcting the account assignments in GnuCash after importing, your data
> file is going to contain frequency table information which will misdirect
> the account assignment. Tools->Import Map Editor allows editing of the
> stored tokens. Any associations with Imbalance accounts should be
> deleted. This is a relatively new feature and is on my list of future
> documentation projects. Use with caution. I improved the matching
> performance considerably by editing out data for files which were being
> assigned incorrectly fairly frequently.
>
> The matcher is never going to work perfectly unless the imported data
> explicitly specifies the second account for the transaction. In this case
> GnuCash also constructs a map of accounts specified in a Transfer account
> field to specific accounts in the GnuCash internal account hierarchy.
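
The batch splitting suggested in the reply above does not have to be done by
hand in a text editor. A minimal Python sketch, assuming the file has a
single header row and one record per line (split_csv is a hypothetical
helper, and it works on strings here so the example is self-contained):

```python
import csv
import io


def split_csv(text, batch_size):
    """Split CSV text into chunks of batch_size rows, repeating the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    batches = []
    for start in range(0, len(rows), batch_size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + batch_size])
        batches.append(out.getvalue())
    return batches


statement = """\
Date,Number,Memo,Amount,Balance,Account,Entity
28/03/06,,Mmvoin515b  Internet Company bg??,-20,1163.05,Expenses:Internet,FastFibre
28/03/06,102,Light Company Big City Branch 9g8k863,-78,1085.05,Expenses:Electricity,Light Company
28/03/06,103,Phone Company Name    Autodebit 595642583,-45,1040.05,Expenses:Phone,Phone Company Name
"""

for number, batch in enumerate(split_csv(statement, 2), start=1):
    print(f"--- batch {number} ---")
    print(batch, end="")
```

Each batch could then be written to its own file and imported in turn,
starting with small batches and increasing the size once the account guesses
improve.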