Bayesian matching- Imbalance

David T. sunfish62 at yahoo.com
Thu Jul 16 11:11:49 EDT 2015


I will mention that some time ago, a user wrote in about pruning the Bayesian matching portion of their data file. You might search the archives and see whether the code for that was made available or not…

On Jul 16, 2015, at 10:20 AM, John Ralls <jralls at ceridwen.us> wrote:

> 
>> On Jul 16, 2015, at 5:34 AM, Lincoln A Baxter <lab at lincolnbaxter.com> wrote:
>> 
>> Off list intentionally...
>> 
>>> On Wed, 2015-07-15 at 20:39 -0700, John Ralls wrote:
>>>>> Why could we not just remove the files or replace them with fresh
>>>>> (empty/untainted) copies?  This and the additional complications you
>>>>> name below make the Bayesian matching way less useful than it could
>>>>> otherwise be, even if we could just wipe it out...
>>>>> 
>>>>> This is annoying enough to me that it might be worth my pulling down
>>>>> the sources and at least fixing the account name problem, and
>>>>> submitting a patch.
>>>>> 
>>>>> For now, Where are the file(s)?  
>>>> 
>>>> There are no files. The match data are in the GnuCash data file as 
>>>> slots in each account into which you have imported, with the keys 
>>>> “import-map” for the string matcher and “import-map-bayes” for the 
>>>> Bayesian mapper. If you’re careful you could delete those slots 
>>>> from your file with an editor, ideally one that knows about keeping 
>>>> XML tags in balance. Make a backup first!
>>>> You’ll have to re-train the matcher from scratch.
>>>> 
>>>> We can’t just change from names to GUIDs, that would introduce a 
>>>> file incompatibility. So:
>>>> * maint needs a patch to detect which is being used in the file, 
>>>> and to use it.
>>>> * master needs to just use GUID, but also
>>>> ** a “scrub” function to convert all the existing scores to GUIDs
>>>> ** a Feature setting (which maint can use to decide which to use) 
>>>> to prevent the file from being loaded by older versions that don’t 
>>>> know about the GUID switch.
>>>> 
>>>> I intend to do that at some point, but it’s not high enough a 
>>>> priority that I can promise it for 2.8. If you want to have a go at 
>>>> it, by all means have a look at the code and then we can talk about
>>>> it in gnucash-dev or irc.
>>>> 
>>>> Regards,
>>>> John Ralls
>>>> 
>> 
>> Thank you John!  A most helpful response!
>> 
>> I was afraid the answer would be something like this.  (Data embedded
>> in the data file -- it clearly has the advantage of portability of the
>> data file -- and the disadvantage of making it harder to separately
>> manipulate.)  I completely see the backwards compatibility problem... 
>> 
>> I might take a look at writing a script to remove the keys from the
>> file... at least as a separate "utility..." as a first step to
>> scratching my personal itch.  
>> 
>> I still work full time (at least for the next several years) so I'm not
>> sure when I'll find the time to do the above development, but thank you
>> for being so clear about what needs to be done, I agree completely that
>> at least the above list would have to be implemented...  
>> 
>> In addition to learning the code, I'd have the additional learning
>> curve of having to learn Scheme (Lisp), I suspect. I have professionally
>> used probably at least 20 different programming languages over the
>> years, and am fully competent with OO, but Lisp is not a language I have
>> worked in. It might be that a "project" like this (with a little
>> coaching) would be a fun way to learn, and in the end I might be able to
>> contribute in other ways.  If not this, then perhaps some other GC
>> functionality.  I think this is a great program.
> 
> Please remember to copy the list on all replies.
> 
> If it’s a one-off it would be easier to just remove the <import-map-bayes> element and its children from the account in question in a text editor. If it’s something you expect to do more than once, or you want to publish it for others, then the easiest approach would be an XSLT script, which I’d think could do the job in a very few lines.
> 
> For the SQL backends, save as XML, do the edit or apply the XSLT script, and re-save the result back to SQL. The way slots are stored in SQL makes it considerably more difficult to work with.
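
For anyone who would rather script the removal than hand-edit, here is a minimal Python sketch of the same idea, offered as an illustration rather than a tested tool. It uses Python instead of the XSLT suggested above. It assumes the match data sit in the account's slots as a <slot> whose key child holds the text "import-map-bayes", as described earlier in the thread. The sample document, the function name prune_import_maps and the helper local_name are made up for the example, and the exact layout of a real data file should be checked before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample. The element layout is an assumption
# based on the "slots with keys" description in this thread.
SAMPLE = """<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc"
        xmlns:act="http://www.gnucash.org/XML/act"
        xmlns:slot="http://www.gnucash.org/XML/slot">
  <gnc:account version="2.0.0">
    <act:name>Checking</act:name>
    <act:slots>
      <slot>
        <slot:key>import-map-bayes</slot:key>
        <slot:value type="frame">
          <slot>
            <slot:key>GROCER</slot:key>
            <slot:value type="frame">
              <slot>
                <slot:key>Expenses:Groceries</slot:key>
                <slot:value type="integer">3</slot:value>
              </slot>
            </slot:value>
          </slot>
        </slot:value>
      </slot>
      <slot>
        <slot:key>notes</slot:key>
        <slot:value type="string">keep me</slot:value>
      </slot>
    </act:slots>
  </gnc:account>
</gnc-v2>"""


def local_name(tag):
    """Return a tag without the '{namespace}' part ElementTree prepends."""
    return tag.rsplit("}", 1)[-1]


def prune_import_maps(root, keys=("import-map-bayes",)):
    """Remove top-level slots whose key text is in `keys`.

    Only direct children of a 'slots' element are considered, so nested
    frames that merely contain a matching token are left alone.
    Returns the number of slots removed.
    """
    removed = 0
    for parent in list(root.iter()):
        if local_name(parent.tag) != "slots":
            continue
        for child in list(parent):
            if local_name(child.tag) != "slot":
                continue
            key = next((c for c in child if local_name(c.tag) == "key"), None)
            if key is not None and (key.text or "").strip() in keys:
                parent.remove(child)  # drops the slot and all its children
                removed += 1
    return removed


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("slots removed:", prune_import_maps(root))
```

Passing keys=("import-map", "import-map-bayes") would drop the string matcher's data as well. On a real file the same function could be called on ET.parse(path).getroot() and the tree written to a new path, but note that ElementTree may rewrite namespace prefixes on output and whether GnuCash accepts the result has not been checked here. Work on a backup copy, and decompress the file first if it was saved compressed.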
> 
> No Scheme required for this one, which is a good thing because I’m not the one to be coaching anyone in Scheme. The matchers are in C, though new code should be in C++11. I think Geert is planning to work over the import/export engine as his exercise in refreshing and updating his C++ skills.
> 
> Regards,
> John Ralls
> 
> 
> _______________________________________________
> gnucash-user mailing list
> gnucash-user at gnucash.org
> https://lists.gnucash.org/mailman/listinfo/gnucash-user
> -----
> Please remember to CC this list on all your replies.
> You can do this by using Reply-To-List or Reply-All.