OFX Bayesian import not working for me

John Ralls jralls at ceridwen.us
Sat Nov 28 11:03:29 EST 2015


> On Nov 28, 2015, at 4:40 AM, Eliot Rosenbloom <eliot at ejr.me> wrote:
> 
> Hi John,
> 
> I have finally found the time and "courage" to face the backlog of accounting.
> 
> I would much prefer to use "," as my separator, rather than ":". (It's a lot more convenient to type a "," when entering accounts.)
> 
> I examined my Preferences:  It shows "," as my separator.
> 
> I just imported Visa transactions from 8/2014 and manually assigned the funds to accounts. I then imported Visa transactions from 9/2014.  Many of the accounts again had to be manually assigned.  This seems to be a repeat of what I had tried many times in the past.
> 
> As I mentioned, I never intentionally tried to change the separator from , to :  -- but perhaps it happened inadvertently.
> 
> If there are duplicate accounts, is there a way I can delete the ones using ":" ?  (Without losing my data!)  Will that allow Bayesian auto-assignment to work?  Otherwise, is my only option to resign myself to using ":" and keep assigning until it gets a "higher score"?
> 
> I've attached pics of the 8/2014 and 9/2014 transactions about to be imported (prior to manual assignment) in case that is helpful.
> 
> Let me know if I can provide you with more information. I really hope we can get this cleared up.

Eliot,

This should go on the list; it might be useful for others. Also, Robert Fewell has a pull request open on the development branch to add a token viewer/editor, and this case might be helpful to him.

You can edit the data file, though it can get complicated, as I’ll explain below. I suggest making a copy first. TextEdit will do fine for editing; just be sure to save as plain text. Delete whole XML elements. In the example I used earlier,

       <slot>
         <slot:key>MEDICARE</slot:key>
         <slot:value type="frame">
           <slot>
             <slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
             <slot:value type="integer">2</slot:value>
           </slot>
           <slot>
             <slot:key>Expenses,Medical Expenses,Medicare</slot:key>
             <slot:value type="integer">8</slot:value>
           </slot>
           <slot>
             <slot:key>Expenses:Medical Expenses:Medicare</slot:key>
             <slot:value type="integer">6</slot:value>
           </slot>
         </slot:value>
       </slot>

you want to make sure that you delete each complete <slot>...</slot> element, from its opening tag through its matching closing tag. So if you want the key MEDICARE to map to Expenses,Medical Expenses,Health Insurance, you would delete the other two inner slots. I suggest that you also change the score of the remaining slot to the sum of all three (2 + 8 + 6 = 16):

       <slot>
         <slot:key>MEDICARE</slot:key>
         <slot:value type="frame">
           <slot>
             <slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
             <slot:value type="integer">16</slot:value>
           </slot>
         </slot:value>
       </slot>
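If you'd rather script this edit than make it by hand in TextEdit, the following Python sketch performs the same collapse on the MEDICARE fragment: it keeps one account slot, gives it the sum of all the scores (2 + 8 + 6 = 16), and removes the other account slots. It is only an illustration under stated assumptions: the namespace URI and the `collapse_token` helper are hypothetical, it works on an in-memory copy of the fragment rather than your real data file (which contains many other slots and may be compressed), and it simply acts on the first slot whose key matches the token. Work on a copy of your file if you adapt it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI. A real GnuCash file declares the "slot:" prefix
# on its root element; this wrapper exists only so the fragment parses.
SLOT_NS = "urn:example:gnucash-slot"
NS = {"slot": SLOT_NS}
ET.register_namespace("slot", SLOT_NS)  # keep the "slot:" prefix on output

FRAGMENT = f"""<root xmlns:slot="{SLOT_NS}">
  <slot>
    <slot:key>MEDICARE</slot:key>
    <slot:value type="frame">
      <slot>
        <slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
        <slot:value type="integer">2</slot:value>
      </slot>
      <slot>
        <slot:key>Expenses,Medical Expenses,Medicare</slot:key>
        <slot:value type="integer">8</slot:value>
      </slot>
      <slot>
        <slot:key>Expenses:Medical Expenses:Medicare</slot:key>
        <slot:value type="integer">6</slot:value>
      </slot>
    </slot:value>
  </slot>
</root>"""


def collapse_token(root, token, keep_account):
    """Hypothetical helper: under the slot keyed `token`, keep only the
    account slot keyed `keep_account`, set its score to the sum of all the
    account scores, and delete the other account slots. Returns the sum."""
    for token_slot in root.iter("slot"):
        key = token_slot.find("slot:key", NS)
        if key is None or key.text != token:
            continue
        frame = token_slot.find("slot:value", NS)
        account_slots = frame.findall("slot")  # a new list, safe to mutate frame
        keepers = [s for s in account_slots
                   if s.find("slot:key", NS).text == keep_account]
        if not keepers:
            # Check before deleting anything so a typo can't empty the frame.
            raise ValueError(f"{keep_account!r} not found under {token!r}")
        total = sum(int(s.find("slot:value", NS).text) for s in account_slots)
        for s in account_slots:
            if s is not keepers[0]:
                frame.remove(s)
        keepers[0].find("slot:value", NS).text = str(total)
        return total  # stop before iter() walks the modified subtree
    raise KeyError(token)


if __name__ == "__main__":
    root = ET.fromstring(FRAGMENT)
    collapse_token(root, "MEDICARE", "Expenses,Medical Expenses,Health Insurance")
    print(ET.tostring(root, encoding="unicode"))
```

Checking for the account you want to keep before removing anything means a mistyped account name leaves the frame untouched instead of deleting every slot under the token.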

Before you dive in, it’s important to understand that the Bayesian matcher tokenizes the description string on spaces and assigns scores to each token. If the description in the import is “CMS MEDICARE” (invented for illustration; I didn’t look at the OFX file), then there will be another slot with key CMS and sub-slots with various accounts and scores.

Here’s where you need to be careful: you might have other transactions with the string “CMS” in the description that, combined with some word other than “MEDICARE”, should go to a different account. The sum of the scores across all of a description’s tokens is what actually determines which account the matcher selects, which is why I suggested keeping the total MEDICARE score the same.

Note that the same applies to the MEDICARE key: if you have different transactions with MEDICARE in the description, some of which should go to Expenses,Medical Expenses,Medicare and others to Expenses,Medical Expenses,Health Insurance, then you should delete only the slot whose key has ‘:’ separators and check that the slots for the other tokens all make sense. Depending on how much token duplication there is and how many tokens each description has, the combinations can get rather daunting, so you may elect to just delete the inner slots whose account keys use ‘:’ separators.
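To make the token arithmetic concrete, here is a small Python model of the matcher as described above: split the description on spaces, add up each account's scores across the tokens, and pick the highest total. It is a simplified sketch, not GnuCash's code; the real matcher may weigh the scores differently (for example, as probabilities). The MEDICARE scores are from the example above; the CMS scores and all helper names are invented for illustration.

```python
from collections import defaultdict

# Token map: token -> {account key: score}, mirroring the nested <slot>
# frames in the data file. MEDICARE scores come from the example above;
# the CMS scores are invented for illustration.
TOKEN_MAP = {
    "MEDICARE": {
        "Expenses,Medical Expenses,Health Insurance": 2,
        "Expenses,Medical Expenses,Medicare": 8,
        "Expenses:Medical Expenses:Medicare": 6,
    },
    "CMS": {
        "Expenses:Medical Expenses:Medicare": 5,
        "Expenses,Medical Expenses,Health Insurance": 1,
    },
}


def tokenize(description):
    """Split the description on spaces (any run of whitespace here)."""
    return description.split()


def account_totals(token_map, description):
    """Sum each account's scores over all tokens in the description."""
    totals = defaultdict(int)
    for token in tokenize(description):
        for account, score in token_map.get(token, {}).items():
            totals[account] += score
    return dict(totals)


def best_account(token_map, description):
    """Return the account with the highest total, or None if no token is known."""
    totals = account_totals(token_map, description)
    if not totals:
        return None
    return max(totals, key=totals.get)


def drop_colon_accounts(token_map):
    """Return a copy without account entries whose keys use ':' separators."""
    return {
        token: {acct: score for acct, score in accounts.items() if ":" not in acct}
        for token, accounts in token_map.items()
    }


if __name__ == "__main__":
    print(account_totals(TOKEN_MAP, "CMS MEDICARE"))
    print(best_account(TOKEN_MAP, "CMS MEDICARE"))
    print(best_account(drop_colon_accounts(TOKEN_MAP), "CMS MEDICARE"))
```

With these numbers the ':'-keyed Medicare entry wins for "CMS MEDICARE" (6 + 5 = 11, against 8 for the ','-keyed one) only because CMS adds to it; after dropping the ':'-keyed entries, the ','-keyed Medicare account wins with 8 against 3 for Health Insurance. That cross-token effect is why editing one token's slots can change the match for descriptions that share other tokens.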

Regards,
John Ralls


More information about the gnucash-user mailing list