[GNC] Fixing bad Bayesian data

Steve Cohen stevecoh2 at gmail.com
Tue Dec 18 19:34:11 EST 2018


OK, I figured out that the .gnucash extension does not indicate the file
format: the file is either compressed or non-compressed XML, depending on
the compression setting you choose.
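
In case it helps anyone else, here is a minimal Python sketch for reading
such a file whichever way it was saved. It assumes the XML backend (not
SQL), that the compression is gzip, and that the text is UTF-8. The
function name read_gnucash_xml is just my own label, not anything from
GnuCash itself.

```python
import gzip

# gzip streams start with these two bytes; plain XML starts with "<?xml".
GZIP_MAGIC = b"\x1f\x8b"


def read_gnucash_xml(path):
    """Return the XML text of a GnuCash XML data file, compressed or not.

    Assumption: the file was saved with the XML backend and is UTF-8.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    # Pick the opener based on the file contents, not on the extension.
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as f:
        return f.read().decode("utf-8")
```

Calling read_gnucash_xml() on a copy of the data file should return the
same XML text whether or not compression was switched on when it was
saved.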

So I switched to non-compressed and looked at the Bayesian elements, and
it's not what I would have expected.  The expressions that are mapped
from are not phrases but "words."  Something like

     <slot>
       <slot:key>import-map-bayes/INDEPENDENCE/f5cb4b5b31decc01c394dd7170078254</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INDIA/af48360c2fb9b039b4707ad7d7517950</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INGTON/94ec6c9aae683c9125fb0dd2b1bb8846</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INN/c6447afebc9564fded7d1bafbe1e026e</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INTEREST/b572baae5a56a30ce384ab58ff12ed7d</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INTUIT/9c204d33baf137f4f0b078f9b61531d1</slot:key>
       <slot:value type="integer">1</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/INVESTM/5920c9dbe1d24308893a5eeb32d01e09</slot:key>
       <slot:value type="integer">3</slot:value>
     </slot>
     <slot>
       <slot:key>import-map-bayes/IS/c6447afebc9564fded7d1bafbe1e026e</slot:key>
       <slot:value type="integer">4</slot:value>
     </slot>
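
To look at these entries in bulk rather than by eye, a rough Python sketch
like the following pulls the word, account GUID, and count out of each
slot. It is only a sketch: it assumes the key layout is exactly
import-map-bayes/WORD/GUID as in the sample above, that the word itself
contains no "/", and that the integer value follows the key directly. The
name bayes_counts is my own.

```python
import re
from collections import defaultdict

# Matches one slot shaped like the sample above (assumed layout):
#   <slot:key>import-map-bayes/WORD/GUID</slot:key>
#   <slot:value type="integer">N</slot:value>
SLOT_RE = re.compile(
    r"<slot:key>import-map-bayes/(?P<token>[^/<]+)/(?P<guid>[0-9a-f]{32})</slot:key>\s*"
    r'<slot:value type="integer">(?P<count>\d+)</slot:value>'
)


def bayes_counts(xml_text):
    """Return {word: {account_guid: count}} for every matching slot."""
    counts = defaultdict(lambda: defaultdict(int))
    for m in SLOT_RE.finditer(xml_text):
        # Sum, in case the same word/GUID pair occurs more than once.
        counts[m["token"]][m["guid"]] += int(m["count"])
    return {token: dict(by_guid) for token, by_guid in counts.items()}


sample = """
<slot>
<slot:key>import-map-bayes/INVESTM/5920c9dbe1d24308893a5eeb32d01e09</slot:key>
  <slot:value type="integer">3</slot:value>
</slot>
<slot>
<slot:key>import-map-bayes/IS/c6447afebc9564fded7d1bafbe1e026e</slot:key>
  <slot:value type="integer">4</slot:value>
</slot>
"""

for token, by_guid in bayes_counts(sample).items():
    print(token, by_guid)
```

Grouping by word this way makes it easy to spot words that point at more
than one account, or several words (like INN and IS above) that point at
the same GUID.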

So I am trying to understand how these are applied.  I get that the long 
hex numbers are GUIDs representing accounts and that the expressions 
before this are bits of the transaction description.  But what if the 
transaction description is multiple words, each mapping to a different 
account?  Obviously "INVESTM" and "IS" are going to be pulled in many 
different directions.  How does "INGTON" get in there?  Why isn't it 
"WASHINGTON"? So I'm trying to understand how this works at all.

I know that it does, but I can't imagine how.

The long hex numbers are GUIDs corresponding to accounts.
On 12/18/18 5:59 PM, Stephen M. Butler wrote:
> On 12/18/18 3:31 PM, Steve Cohen wrote:
>> Thanks.
>>
>> Seems like none of these solutions will work if your data is stored as 
>> a .gnucash file, they only work with .xml files.
>>
>> Is there a way to convert this?
>>
>> Is the Bayesian matching applied to entries that are corrected in the 
>> account editor, or is it only applied to entries made in the importer?
>>
>> I am somewhat comfortable with the bleeding edge, but, when is the 
>> release of version 4 expected?
>>
>>
>> On 12/18/18 5:17 PM, David Cousens wrote:
>>> Steve
>>>
>>> These may help.
>>> https://wiki.gnucash.org/wiki/Bayes
>>> https://lists.gnucash.org/pipermail/gnucash-user/2016-July/066299.html
>>> http://gnucash.1415818.n4.nabble.com/Fixing-confused-bayesian-matching-data-td4685819.html 
>>>
>>> http://blog.jdlh.com/en/2016/07/29/resetting-gnucashs-import-transaction-matching/ 
>>>
>>>
>>> Make a backup of your data file and only work on a copy until you are
>>> sure it is working after changing it if you attempt any of the solutions
>>> mentioned in the above posts.
>>>
>>> The importer stores the map data and probabilities during the final
>>> step of the import process. If you let transactions go through to
>>> Imbalance then it obviously gets no data to work with. If you assign all
>>> transactions to a specific transfer account before import and continue
>>> to do that, it will eventually correct itself. There are a few
>>> situations in which the Bayesian matcher does not work. I find that
>>> where there is a unique transaction number which changes with each
>>> periodic transaction, the matcher seems to run into problems. A number
>>> identifying the payer/payee and not the transaction itself is OK. Some
>>> of mine have both.
>>>
>>> There will be a feature added in GnuCash V4 which allows multiple
>>> selection of transactions and assignment of a single transfer account
>>> in the import matcher, which speeds up the transaction matching process
>>> significantly. It can be incorporated in V3.x as a patch if you build
>>> GnuCash from source, but the risk is that future bug fixes in the
>>> importer which change the two affected files could result in a
>>> non-working GnuCash. It is incorporated in the master branch of the
>>> GitHub repository and can be built from that if you are comfortable
>>> working with the bleeding edge.
>>>
>>> David Cousens
> 
> 
> Steve,
> 
> In GnC, click on the Tools menu and then on the Import Map Editor.  Once 
> on the new screen you can see all the mappings that have been generated.
> 
> In my case, I did some restructuring of my accounts and found that the 
> existing mappings no longer worked.  I highlighted the top levels and 
> clicked on the DELETE key.  That reset everything for me and I'm in the 
> process of building the new set of mappings.
> 
> The high level is based on the imports you do.  I had three: Checking 
> account, Credit Card, and Savings account.  The last one is used so 
> little that it isn't worth the hassle of downloading the 1-2 entries 
> each month so I now enter them by hand.  That will leave me with just 
> two imports -- which I plan to do multiple times each month to keep the 
> number of transactions low.
> 
> Anyway, if you decide to clear everything out, the above is a nice and 
> easy way to do that.
> 
> --Steve
> 


