OFX Bayesian import not working for me

Eliot Rosenbloom eliot at ejr.me
Tue Dec 1 20:35:59 EST 2015


Ok, I think I understand now:  If GC identifies a single keyword 
("token") in the transaction's description, then it assigns the Amount 
to the account with the highest "score."  If the transaction has more 
than one token, then GC totals the scores for each account (across all 
the relevant tokens) and, again, assigns the Amount to the account with 
the highest (total) score.
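
To check my understanding, here is a minimal Python sketch of that scoring rule as it was described to me in this thread. It is not GnuCash's actual code: the table layout, the function name pick_account, and the CMS scores are invented for illustration.

```python
from collections import defaultdict

# Hypothetical token -> {account: score} table, loosely modeled on the
# import-match-bayes slots quoted later in this thread. The CMS scores
# are invented.
TOKEN_SCORES = {
    "MEDICARE": {
        "Expenses,Medical Expenses,Health Insurance": 2,
        "Expenses,Medical Expenses,Medicare": 8,
    },
    "CMS": {
        "Expenses,Medical Expenses,Medicare": 3,
        "Expenses,Taxes": 1,
    },
}


def pick_account(description, token_scores):
    """Split the description on whitespace, total each account's scores
    across all known tokens, and return the top-scoring account (or None
    if no token is known)."""
    totals = defaultdict(int)
    for token in description.split():
        for account, score in token_scores.get(token, {}).items():
            totals[account] += score
    if not totals:
        return None
    return max(totals, key=totals.get)


print(pick_account("CMS MEDICARE", TOKEN_SCORES))
```

With these invented numbers, "CMS MEDICARE" totals 11 for Medicare against 2 for Health Insurance and 1 for Taxes, so the Medicare account is selected.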

I did choose to delete the whole import-match-bayes slot for each 
relevant account, and it seems to be working fine!  I'm VERY appreciative!

Does anyone know if GC looks for tokens only in the Description / <NAME> 
field, or does it also examine the Memo field?  My credit union often 
fills the Description field with generic info such as "ACH Withdrawal" 
and puts the more specific, helpful info into the Memo field.  :-(

I found that a global find-and-replace in the .ofx import file, changing, 
for example:
    FROM:  "<NAME>ACH Withdrawal<MEMO>"
    TO:    "<XXX>dummy<NAME>ACHW:  "
moved the useful information into the Description field (prepended with 
"ACHW:  ").

Making 2-3 similar global changes on each month's .ofx file is not 
prohibitively time consuming, but if there were an easier way, I'd be 
happy to hear it.
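
In case it helps anyone else, the same edit could probably be scripted. This is only a sketch: the function name promote_memo, the GENERIC_NAMES table, and the sample transaction are my own inventions, and it assumes that <MEMO> directly follows <NAME> (allowing for whitespace or a line break in between), as in my credit union's files. It works on a string, so read the text from a copy of the .ofx file and write the result to a new file.

```python
import re

# Hypothetical mapping: generic <NAME> text -> short prefix for the new name.
GENERIC_NAMES = {"ACH Withdrawal": "ACHW"}


def promote_memo(ofx_text, generic_names=GENERIC_NAMES):
    """Replace '<NAME>generic<MEMO>' with '<XXX>dummy<NAME>PREFIX:  ' so
    that the memo text becomes the description."""
    for generic, prefix in generic_names.items():
        pattern = re.compile(r"<NAME>" + re.escape(generic) + r"\s*<MEMO>")
        replacement = "<XXX>dummy<NAME>" + prefix + ":  "
        # A function replacement avoids any backslash handling surprises.
        ofx_text = pattern.sub(lambda match: replacement, ofx_text)
    return ofx_text


# Invented one-line transaction, just to show the effect.
sample = "<STMTTRN><NAME>ACH Withdrawal<MEMO>CITY WATER DEPT</STMTTRN>"
print(promote_memo(sample))
```

Adding another generic description would then mean adding one entry to the table rather than repeating the find-and-replace by hand.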

Again, many thanks, John!

Eliot
> John Ralls <mailto:jralls at ceridwen.us>
> November 28, 2015 at 2:38 PM
>
>
> Eliot,
>
> Please remember to copy the list on all replies. “Reply all” works well.
>
> It’s the .gnucash file without a date. That file is compressed with gzip, and you 
> can uncompress it on the command line with gunzip or you can unselect 
> “Compress Files” in Preferences>General and the next save will be 
> uncompressed.
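
For anyone who would rather not use the command line, a small Python sketch can do the same job as gunzip. The function name and the file names in the comment are hypothetical; it writes a separate uncompressed copy and leaves the original file untouched.

```python
import gzip
import shutil


def uncompress_gnucash(src, dst):
    """Write an uncompressed copy of a gzip-compressed file to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Example with hypothetical file names:
# uncompress_gnucash("books.gnucash", "books-plain.xml")
```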
>
> I summed across all three in the first MEDICARE example because I 
> deleted two of them with the unstated assumption that only one was 
> correct. I explained that that was just an example and that a real 
> case would be more complicated, which I thought that I’d clarified 
> later by explaining the way Bayesian matching tokenizes descriptions, 
> scores the token - account pair, and then sums the scores across the 
> tokens to select the matching account.
>
> So to “make sense” of a set of token scores you need to run that 
> process yourself for the tokens you intend to change: From a set of 
> import files find the descriptions containing each token you’re 
> contemplating changing, find the other tokens in those descriptions, 
> look at the token-account scores for each and work out what account 
> the matcher will select in each case. If it appears that the matcher 
> will do the right thing, remove only the “:” delimited account tokens; 
> you probably don’t need to change the scores of the remaining ones, 
> because the token-account scores for “:” delimited accounts were all 
> created together and you’ve decided that the other scores provide the 
> right answer. Deleting the “:” scores is still helpful because the 
> matcher won’t have to look at those scores any more and that will 
> speed it up. If the matcher is guessing wrong then by working out the 
> match process by hand you’ll understand why and can remove or adjust 
> token-account scores as necessary. If all of that seems like too much 
> work you can just delete the whole import-match-bayes slot and start 
> over generating new matches.
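
A rough sketch of the "remove only the ':' delimited account tokens" step, assuming the scores have already been read into a plain token -> {account: score} dictionary (that representation and the function name are my own, not anything GnuCash provides). Note that it would also drop any account whose name legitimately contains a ':' character.

```python
def drop_colon_accounts(token_scores):
    """Return a copy of the token -> {account: score} table without any
    account whose full name uses ':' as the separator. Tokens left with
    no accounts are dropped entirely."""
    pruned = {}
    for token, accounts in token_scores.items():
        kept = {acct: score for acct, score in accounts.items()
                if ":" not in acct}
        if kept:
            pruned[token] = kept
    return pruned


# Scores taken from the MEDICARE example in this thread.
table = {
    "MEDICARE": {
        "Expenses,Medical Expenses,Health Insurance": 2,
        "Expenses,Medical Expenses,Medicare": 8,
        "Expenses:Medical Expenses:Medicare": 6,
    },
}
print(drop_colon_accounts(table))
```

The remaining scores are left unchanged, in line with the advice above.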
>
> If I understand your question about posting, just “reply all”. The 
> list is in the CC field of this message and “reply all” will ensure 
> that it’s in your reply as well.
>
> Regards,
> John Ralls
>
>
> Eliot Rosenbloom <mailto:eliot at ejr.me>
> November 28, 2015 at 1:03 PM
> Thanks John!
>
> 1.  I hate to be dumb, but where do I find the file with <slot> tags?  
> I see only 3 types of files:  .log (with minimal info in them);  and 
> .gnucash with and without a date (both are "garbage" when opened with 
> TextEditor).
>
> 2.  "look at slots for the other tokens to see that they all make 
> sense." -- I'm a bit fuzzy on the relation (below) between the two 
> entries for "Medicare" [I would delete the one with":" , I assume]  
> AND the one for "Health Insurance," and why you summed across all 3?  
> And can you say a bit more about what "making sense" means?  What 
> should I be looking to find or avoid?
>
> 3.  To post this (obviously w/o my personal files I sent you), do I 
> remember that there is an email address I can forward it to ... 
> probably preceded by a brief statement of the problem?
>
> Thanks!
>
> Eliot
> Mac  OS 10.11
> (And BTW:  I downloaded GC v. 2.6.9, but in Get Info and the Finder 
> regular column listing, it shows up as 2.6.7.  The .dmg was named 
> Intel 2.6.9-1)
>
> John Ralls <mailto:jralls at ceridwen.us>
> November 28, 2015 at 10:03 AM
>
>
> Eliot,
>
> This should go on the list, it might be useful for others. Also, 
> Robert Fewell has a pull request in on the development branch to add a 
> token viewer/editor and this case might be helpful to him.
>
> You can edit the data file, though it can get complicated as I’ll 
> explain later. I suggest making a copy first. TextEdit will do fine 
> for editing, just be sure to save as plain text. Delete whole xml 
> elements, so in the example I used earlier,
>
> <slot>
> <slot:key>MEDICARE</slot:key>
> <slot:value type="frame">
> <slot>
> <slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
> <slot:value type="integer">2</slot:value>
> </slot>
> <slot>
> <slot:key>Expenses,Medical Expenses,Medicare</slot:key>
> <slot:value type="integer">8</slot:value>
> </slot>
> <slot>
> <slot:key>Expenses:Medical Expenses:Medicare</slot:key>
> <slot:value type="integer">6</slot:value>
> </slot>
> </slot:value>
> </slot>
>
> you want to make sure that you delete the corresponding <slot>..</slot>. 
> So if you want to make it use Expenses,Medical Expenses,Health 
> Insurance with key MEDICARE you would delete the other two slots. I 
> suggest that you also change the score of the remaining slot to the 
> sum of all three (2 + 8 + 6 = 16):
>
> <slot>
> <slot:key>MEDICARE</slot:key>
> <slot:value type="frame">
> <slot>
> <slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
> <slot:value type="integer">16</slot:value>
> </slot>
> </slot:value>
> </slot>
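
The same edit could be sketched in Python with the standard xml.etree.ElementTree module instead of a text editor. Treat this as an illustration under assumptions, not a tested tool: the function name collapse_token is invented, and the namespace URI is what I believe GnuCash binds to the "slot:" prefix, so check the root element of your own uncompressed file. As above, work on a copy.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI GnuCash binds to the "slot:" prefix. Check the root
# element of your own uncompressed file before relying on it.
NS = {"slot": "http://www.gnucash.org/XML/slot"}

# The MEDICARE slot from this thread, with the namespace declared on the
# outer element so that the fragment parses on its own.
SAMPLE = """<slot xmlns:slot="http://www.gnucash.org/XML/slot">
<slot:key>MEDICARE</slot:key>
<slot:value type="frame">
<slot>
<slot:key>Expenses,Medical Expenses,Health Insurance</slot:key>
<slot:value type="integer">2</slot:value>
</slot>
<slot>
<slot:key>Expenses,Medical Expenses,Medicare</slot:key>
<slot:value type="integer">8</slot:value>
</slot>
<slot>
<slot:key>Expenses:Medical Expenses:Medicare</slot:key>
<slot:value type="integer">6</slot:value>
</slot>
</slot:value>
</slot>"""


def collapse_token(token_slot, keep_account):
    """Keep only the sub-slot for keep_account and set its score to the
    sum of all the sub-slot scores. Returns the new score."""
    frame = token_slot.find("slot:value", NS)
    subs = frame.findall("slot")
    kept = [s for s in subs if s.find("slot:key", NS).text == keep_account]
    if len(kept) != 1:
        raise ValueError("expected exactly one sub-slot for " + keep_account)
    total = sum(int(s.find("slot:value", NS).text) for s in subs)
    for sub in subs:
        if sub is not kept[0]:
            frame.remove(sub)
    kept[0].find("slot:value", NS).text = str(total)
    return total


root = ET.fromstring(SAMPLE)
print(collapse_token(root, "Expenses,Medical Expenses,Health Insurance"))
```

The account is checked before anything is removed, so a mistyped account name raises an error instead of emptying the frame.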
>
> Before you dive in it’s important to understand that the Bayesian 
> matcher tokenizes the description string on spaces and assigns scores 
> to each token. If the description in the import is “CMS MEDICARE” 
> (invented for illustration, I didn’t look at the OFX file) then there 
> will be another slot with key CMS and sub-slots with various accounts 
> and scores. Here’s where you need to be careful: You might have other 
> transactions with the string “CMS” in the description and when 
> combined with some word other than “MEDICARE” will go to a different 
> account. The sum of the scores for each token is what actually 
> determines which account the matcher selects, which is why I suggested 
> keeping the total MEDICARE score the same.  Note that the same applies 
> to the MEDICARE key: If you have different transactions with MEDICARE 
> in the description, some of which should go to Expenses,Medical 
> Expenses,Medicare and others to Expenses,Medical Expenses,Health 
> Insurance then you should delete only the slot whose key has ‘:’ 
> separators and look at slots for the other tokens to see that they all 
> make sense. Depending on how much token duplication there is and how 
> many tokens each description has the combinations can get rather 
> daunting, so you may elect to just delete the value tokens with ‘:’ 
> separated keys.
>
> Regards,
> John Ralls
> Eliot Rosenbloom <mailto:eliot at ejr.me>
> November 28, 2015 at 6:40 AM
> Hi John,
>
> I have finally found the time and "courage" to face the backlog of 
> accounting.
>
> I would much prefer to use "," as my separator, rather than ":".  
> (It's a lot more convenient to type a "," when entering accounts.)
>
> I examined my Preferences:  It shows "," as my separator.
>
> I just imported Visa transactions from 8/2014 and manually assigned 
> the funds to accounts. I then imported Visa transactions from 9/2014.  
> Many of the accounts again had to be manually assigned.  This seems to 
> be a repeat of what I had tried many times in the past.
>
> As I mentioned, I never intentionally tried to change the separator 
> from , to :  -- but perhaps it happened inadvertently.
>
> If there are duplicate accounts, is there a way I can delete the ones 
> using ":" ?  (Without losing my data!)  Will that allow Bayesian 
> auto-assignment to work?  Otherwise, is my only option to resign 
> myself to using ":" and keep assigning until it gets a "higher score"?
>
> I've attached pics of the 8/2014 and 9/2014 transactions about to be 
> imported (prior to manual assignment) in case that is helpful.
>
> Let me know if I can provide you with more information.  I'd really 
> hope we can get this cleared up.
>
> Many thanks,
>
> Eliot
>
>
>
>
> Eliot Rosenbloom <mailto:eliot at ejr.me>
> July 14, 2015 at 8:19 PM
> John,  Thanks so much for responding.  I was about to write again to 
> see what had happened.
>
> I never intentionally tried to change the separator from , to :  -- 
> but perhaps it happened inadvertently.
>
>


