OFX Bayesian import not working for me
John Ralls
jralls at ceridwen.us
Fri May 5 12:05:06 EDT 2017
> On May 5, 2017, at 8:26 AM, Eliot Rosenbloom <eliot at ejr.me> wrote:
>
> John -
>
> I don't know if I ever thanked you: I wanted to see if the Bayes algorithms would work for several months. It's been working great!! Thanks for all your research and help. It's very much appreciated!
Eliot,
You're welcome, but please do remember to copy the list on all replies.
Regards,
John Ralls
>> Eliot Rosenbloom <mailto:eliot at ejr.me> December 1, 2015 at 11:35 PM
>> For the benefit of other readers: I am presuming my first paragraph is correct, and your "It seems not," refers to non-examination of the Memo field for matching and/or the non-existence of "an easier way" to consider the detailed info in the Memo field.
>>
>> (I'm not sure what to make of the comment you cite, but things do seem to be working.) - Eliot
>>
>>
>> John Ralls <mailto:jralls at ceridwen.us> December 1, 2015 at 10:00 PM
>>
>>
>> Eliot,
>>
>> It seems not. The following comment may or may not explain:
>> /* Disable matching by memo, until bayesian filtering is implemented.
>> * It's currently unlikely to help, and has adverse effects,
>> * causing false positives, since very often the type of the
>> * transaction is stored there.
>>
>> Unfortunately the author of that comment didn’t explain what he meant by bayesian filtering anywhere.
>>
>> Regards,
>> John Ralls
>>
>> Eliot Rosenbloom <mailto:eliot at ejr.me> December 1, 2015 at 7:35 PM
>> Ok, I think I understand now: If GC identifies a single key-word ("token") in the transaction's description, then it assigns the Amount to the account with the highest "score." If the transaction has more than one token, then GC totals the scores for each account (across all the relevant tokens) and, again, assigns the Amount to the account with the highest (total) score.
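The scoring rule described above (split the description into tokens, total each account's token scores, pick the highest total) can be sketched in C. This is a simplified model that follows the summing description in this thread, not GnuCash's actual matcher, which may combine token scores differently. The `TokenScore` table and `best_account` function are hypothetical, and so is the tokenizing: whitespace-separated, case-sensitive words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token-account score, standing in for one entry of the
 * import-match-bayes data; not GnuCash's real storage format. */
typedef struct {
    const char *token;
    const char *account;
    int score;
} TokenScore;

#define MAX_ACCOUNTS 32

/* Return the account with the highest summed score over the tokens in
 * `description`, or NULL if none of its tokens has a score. Tokens are
 * split on spaces and tabs; descriptions longer than 255 chars are cut. */
const char *
best_account(const char *description, const TokenScore *table, size_t n)
{
    const char *accounts[MAX_ACCOUNTS];
    int totals[MAX_ACCOUNTS];
    size_t n_accounts = 0;
    char buf[256];
    const char *best = NULL;
    int best_total = 0;

    assert(description != NULL && table != NULL);
    /* strtok modifies its input, so tokenize a private copy. */
    strncpy(buf, description, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(table[i].token, tok) != 0)
                continue;
            /* Find (or create) the running total for this account. */
            size_t j = 0;
            while (j < n_accounts && strcmp(accounts[j], table[i].account) != 0)
                j++;
            if (j == n_accounts) {
                if (n_accounts == MAX_ACCOUNTS)
                    continue;  /* no room: ignore further accounts */
                accounts[n_accounts] = table[i].account;
                totals[n_accounts++] = 0;
            }
            totals[j] += table[i].score;
        }
    }

    /* Select the account whose summed score is highest. */
    for (size_t j = 0; j < n_accounts; j++) {
        if (best == NULL || totals[j] > best_total) {
            best = accounts[j];
            best_total = totals[j];
        }
    }
    return best;
}
```

Because the totals are compared with a strict `>`, a tie goes to whichever account was encountered first.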
>>
>> I did choose to delete the whole import-match-bayes slot for each relevant account, and it seems to be working fine! I'm VERY appreciative!
>>
>> Does anyone know if GC looks for tokens only in the Description / <NAME> field, or does it also examine the Memo field? My credit union often fills the Description field with generic info such as "ACH Withdrawal" and puts the more specific, helpful info into the Memo field. :-(
>>
>> I found that doing a global change in the .ofx import file CHANGING, for example:
>> "<NAME>ACH Withdrawal<MEMO>"
>> TO: "<XXX>dummy<NAME>ACHW: "
>> moved the useful information into the Description field (pre-pended with "ACHW: ").
>>
>> Making 2-3 similar global changes on each month's .ofx file is not prohibitively time consuming, but if there were an easier way, I'd be happy to hear it.
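The monthly global change could be scripted rather than done by hand. Below is a minimal sketch that assumes the .ofx file has already been read into memory as one string. `replace_all` is a hypothetical helper, not part of GnuCash. It does a literal, case-sensitive match, so each of the 2-3 patterns needs its own call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `src` with every non-overlapping
 * occurrence of `from` replaced by `to`; the caller frees the result.
 * Returns NULL if `from` is empty or allocation fails. */
char *
replace_all(const char *src, const char *from, const char *to)
{
    size_t from_len, to_len, count = 0;
    const char *p;

    assert(src != NULL && from != NULL && to != NULL);
    from_len = strlen(from);
    to_len = strlen(to);
    if (from_len == 0)
        return NULL;

    /* First pass: count matches so the output can be sized exactly. */
    for (p = strstr(src, from); p != NULL; p = strstr(p + from_len, from))
        count++;

    char *out = malloc(strlen(src) + count * to_len - count * from_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text between matches, substituting `to`. */
    char *q = out;
    const char *start = src;
    for (p = strstr(src, from); p != NULL; p = strstr(start, from)) {
        memcpy(q, start, (size_t)(p - start));
        q += p - start;
        memcpy(q, to, to_len);
        q += to_len;
        start = p + from_len;
    }
    strcpy(q, start);  /* copy the tail and the terminating NUL */
    return out;
}
```

For the example above, the call would be `replace_all(ofx_text, "<NAME>ACH Withdrawal<MEMO>", "<XXX>dummy<NAME>ACHW: ")`. `ofx_text` is a placeholder for the file contents. The match is literal, so it only works if the file lays out the tags exactly as in the example, with no line break between the `<NAME>` value and `<MEMO>`.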
>>
>> Again, many thanks, John!
>>
>> Eliot
>>
>> John Ralls <mailto:jralls at ceridwen.us> November 28, 2015 at 2:38 PM
>>
>>
>> Eliot,
>>
>> Please remember to copy the list on all replies. “Reply all” works well.
>>
>> It’s the .gnucash file without a date. It’s compressed with gzip; you can uncompress it on the command line with gunzip, or you can unselect “Compress Files” in Preferences>General and the next save will be uncompressed.
>>
>> I summed across all three in the first MEDICARE example because I deleted two of them with the unstated assumption that only one was correct. I explained that that was just an example and that a real case would be more complicated, which I thought that I’d clarified later by explaining the way Bayesian matching tokenizes descriptions, scores the token - account pair, and then sums the scores across the tokens to select the matching account.
>>
>> So to “make sense” of a set of token scores, you need to run that process yourself for the tokens you intend to change: from a set of import files, find the descriptions containing each token you’re contemplating changing, find the other tokens in those descriptions, look at the token-account scores for each, and work out which account the matcher will select in each case.
>>
>> If it appears that the matcher will do the right thing, remove only the “:” delimited account tokens. You probably don’t need to change the scores of the remaining ones, because the token-account scores for “:” delimited accounts were all created together and you’ve decided that the other scores provide the right answer. Deleting the “:” scores still helps, because the matcher won’t have to look at them any more, which will speed it up.
>>
>> If the matcher is guessing wrong, then by working out the match process by hand you’ll understand why and can remove or adjust token-account scores as necessary. If all of that seems like too much work, you can just delete the whole import-match-bayes slot and start over generating new matches.
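The first step of that by-hand check, finding the imported descriptions that contain a given token, can be sketched in C. This assumes the descriptions have already been pulled out of the import files into an array of strings, and that a token is a whitespace-separated word compared case-sensitively. GnuCash's own tokenizer may differ. `has_token` and `find_descriptions` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `token` occurs in `description` as a whole word, i.e. bounded
 * by spaces, tabs, or the ends of the string. */
bool
has_token(const char *description, const char *token)
{
    assert(description != NULL && token != NULL);
    size_t len = strlen(token);
    const char *p = description;

    if (len == 0)
        return false;
    while ((p = strstr(p, token)) != NULL) {
        bool starts = (p == description) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t';
        if (starts && ends)
            return true;
        p++;  /* keep searching after this partial-word hit */
    }
    return false;
}

/* Store in `out` the indices of the descriptions containing `token` and
 * return how many were found; `out` must have room for `n` entries. */
size_t
find_descriptions(const char *const *descriptions, size_t n,
                  const char *token, size_t *out)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++)
        if (has_token(descriptions[i], token))
            out[found++] = i;
    return found;
}
```

The other tokens in each matching description are then the ones whose token-account scores need checking.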
>>
>> If I understand your question about posting, just “reply all”. The list is in the CC field of this message and “reply all” will ensure that it’s in your reply as well.
>>
>> Regards,
>> John Ralls
>>
>>
>> Eliot Rosenbloom <mailto:eliot at ejr.me> November 28, 2015 at 1:03 PM
>> Thanks John!
>>
>> 1. I hate to be dumb, but where do I find the file with <slot> tags? I see only 3 types of files: .log (with minimal info in them), and .gnucash with and without a date (both are "garbage" when opened with TextEdit).
>>
>> 2. "look at slots for the other tokens to see that they all make sense." -- I'm a bit fuzzy on the relation (below) between the two entries for "Medicare" [I would delete the one with ":", I assume] AND the one for "Health Insurance," and why you summed across all 3. And can you say a bit more about what "making sense" means? What should I be looking to find or avoid?
>>
>> 3. To post this (obviously w/o my personal files I sent you), do I remember correctly that there is an email address I can forward it to ... probably preceded by a brief statement of the problem?
>>
>> Thanks!
>>
>> Eliot
>> Mac OS 10.11
>> (And BTW: I downloaded GC v. 2.6.9, but in Get Info and the Finder regular column listing, it shows up as 2.6.7. The .dmg was named Intel 2.6.9-1)
>>
>
>