[GNC] [GNC-dev] Questions about Import Map Data

john jralls at ceridwen.us
Tue Jul 5 12:01:16 EDT 2022



> On Jul 4, 2022, at 11:07 PM, Lincoln A Baxter <lab at lincolnbaxter.com> wrote:
> 
> Hi, 
> 
> It's been a very long time since I asked a question on this list,
> and a very long time since I last did a "cleanup" of my GC file (with
> Perl scripts back then).
> 
> I started keeping my accounting data in GC back in March of 2005.  Now
> I'm looking to reorganize my chart of expense accounts (again) to
> simplify things... when I do that, I'll want to remove the obsolete
> Bayes data (again).
> 
> I have a general idea of how the Bayesian import mapper works. Years
> ago I wrote some Perl scripts to prune its data out of the uncompressed
> XML file, so I could start it over.  But I'm now looking at Tools ->
> Import Map Editor, which I will (belatedly) say is a huge improvement
> over mucking about with the XML slot data (which now looks to be
> simplified), especially if all one wants to do is zero it out.  But
> this has led me to some questions:
> 
> Most of the Bayes data is space-tokenized transaction description data.
> I get that part.  My understanding is that the matcher uses the scores
> in this data to map transactions to a balancing account.  But I see
> data that could not have come from transaction descriptions.  Why
> are days of the week (Monday, Tuesday, etc.) in the Bayes data?  Where
> does this data come from?  How does this help the transaction mapper?
> 
> It looks like the non-Bayesian data consists of full (non-tokenized)
> transaction descriptions... at least a few of them (nowhere near all of
> them... not even close).  Given how few I've got (and how old they
> appear to be), I don't understand why this data is here.  It is almost
> like this was an early attempt to create transaction matching data
> that might have existed since before the Bayesian matcher "matured."
> So:
> 
> Why does GnuCash have these records (and how did they get created)?
> Are these records still used in transaction mapping, or does
> this display exist simply for the purpose of allowing one to nuke them?


The day of the week of the transaction's posted date is one of the tokens, as are the "words" of the description and of each split's memo. The logic is coded in TransactionGetTokens, https://github.com/Gnucash/gnucash/blob/e9df8d41d2fc838046bc02aed0b05bde53ca9dcd/gnucash/import-export/import-backend.c#L442, and hasn't changed significantly since the original contribution in GnuCash 1.9.1; see https://github.com/Gnucash/gnucash/commit/b2ccbf62cf04adddbf3464875c6521582b353964.
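
To make the token sources concrete, here is a rough Python sketch of the idea. It is not the GnuCash code (TransactionGetTokens is C); the function name, argument names, token order, and sample transaction are all made up for illustration, and it splits on any whitespace, which may differ from the real tokenizer.

```python
from datetime import date

# English weekday names, indexed by date.weekday() (Monday == 0).
# Hard-coded here so the sketch does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def transaction_tokens(description, posted, memos):
    """Illustrative only: gather the kinds of tokens described above.

    description -- the transaction description string
    posted      -- the posted date (datetime.date)
    memos       -- one memo string per split
    """
    # The "words" of the description.
    tokens = description.split()
    # The day of the week of the posted date is a token too.
    tokens.append(WEEKDAYS[posted.weekday()])
    # The "words" of each split's memo.
    for memo in memos:
        tokens.extend(memo.split())
    return tokens


# Hypothetical transaction posted on 5 July 2022.
print(transaction_tokens("ACME GROCERY 0042", date(2022, 7, 5), ["weekly shop"]))
```

This is why "Monday", "Tuesday", and so on appear in the Bayes data even though no description contains them.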

You're correct that the non-Bayesian matching is just the transaction description. You might have those in your book because at some point you disabled Bayes matching in Preferences, but more likely you imported a QIF file, or a CSV file before Geert rewrote the CSV importer for GnuCash 3.0. The QIF importer doesn't support Bayesian matching, so if you import a QIF file those match records are what the importer will use.
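
By contrast, a non-Bayesian record amounts to a lookup on the whole description. A minimal Python sketch of that idea follows; the dictionary, function name, and account names are hypothetical, and the real importer stores these records in the book rather than in a Python dict.

```python
def match_account(description, import_map):
    """Illustrative only: return the account mapped to this exact
    description, or None when there is no record for it."""
    return import_map.get(description)


# Hypothetical records: full description -> balancing account.
import_map = {"ACME GROCERY 0042": "Expenses:Groceries"}

print(match_account("ACME GROCERY 0042", import_map))  # exact match found
print(match_account("ACME GROCERY 0043", import_map))  # no record, so None
```

Because the key is the full, untokenized description, a small difference in the text means no match, which is consistent with only a few such records accumulating.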

Regards,
John Ralls



