Merging two GNCBook* objects (was creating invoices in gnucash)

Fri Apr 9 11:39:10 EDT 2004

hi,

This conversation should remain on gnucash-devel so other developers
have the chance to see it and comment.

Neil Williams <linux at codehelp.co.uk> writes:

> On Tuesday 06 April 2004 8:42, you wrote:
>> >> NOTE: a general API to "merge two books" would be a good potential
>> >> solution.
>> >
>> > I agree, but this task is already big enough for me. I'm OK in C but I'm
>> > no wizard!
>>
>> Well, this "merge two books" is the code you'd need to write
>> regardless of whether you decide to re-write the parsers.  If you
>> consider this too much work for you then give up now and save yourself
>> the headache.  The "merge" function is necessary AND sufficient for
>> the functionality you want.  The XML parser re-write is neither
>> necessary nor sufficient.
>
> Just to let you know, in case you think I've gone awfully quiet, I am still 
> planning this but I've had a lot of (paid) work to do in the last few days 
> (pharmacist by trade) plus I've got a few of those annoying loose ends in 
> other projects that I want to complete. 

No worries.  It happens to all of us!

> Something along the lines of:
> GNCBook* g_merge(GNCBook* main, GNCBook* import) {}
>
> I anticipate creating another two books in memory:
> GNCBook* collision and GNCBook* parsed - collision would be offered to
> the user for confirmation / amendment and then the (amended) collision +
> parsed would be committed to main and returned. This way, if the user
> aborts (because of the number / type of collisions), I can just delete
> collision, import and parsed and leave main untouched. Otherwise, I'd use
> the amended collision object to add / modify records in main and add
> parsed - containing records that are simple imports with no collision
> problems, like new transactions in accounts possibly modified by the merge.

While good in theory, I don't think this is exactly the best approach.
I think you want to break the import down into pieces, and I'm not
convinced that storing them in a new GNCBook* is the right thing to do
(there is a lot of overhead in a GNCBook*).  I may be wrong here -- I'm
hoping some of the other import developers can speak up.
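A lighter alternative to holding collisions in a second GNCBook* is a plain work list. Each entry records one imported object, its candidate match in main (if any), how it was classified, and what the user decided. The sketch below assumes exactly that. All names in it (MatchKind, MergeItem, MergeList, merge_list_add) are hypothetical, and it holds objects as opaque pointers instead of real GnuCash types.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: these are NOT GnuCash types. */
typedef enum { MATCH_SAME, MATCH_MAYBE, MATCH_NEW } MatchKind;

typedef struct {
    const void *import_obj;   /* object from the imported data */
    const void *main_obj;     /* candidate match in main, or NULL if new */
    MatchKind   kind;
    int         use_import;   /* user decision: 1 = import wins, 0 = main wins */
} MergeItem;

typedef struct {
    MergeItem *items;
    size_t     len;
    size_t     cap;
} MergeList;

/* Append one entry, growing the array as needed.
 * Returns 0 on success, -1 on allocation failure. */
int merge_list_add(MergeList *list, const void *import_obj,
                   const void *main_obj, MatchKind kind)
{
    if (list->len == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 16;
        MergeItem *p = realloc(list->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        list->items = p;
        list->cap = new_cap;
    }
    list->items[list->len++] = (MergeItem){ import_obj, main_obj, kind, 0 };
    return 0;
}

/* Aborting the merge only discards the list; main is never modified. */
void merge_list_free(MergeList *list)
{
    free(list->items);
    list->items = NULL;
    list->len = list->cap = 0;
}
```

Because nothing is written to main until the list is walked and applied, a user abort is just merge_list_free().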

> dummy outline:
>
> GNCBook* g_merge(GNCBook* main, GNCBook* import) {
> // Create the rule set object
> // Use the set to make decision 1: Is this data going to conflict with main.
> // Yes -> GNCBook* collision, No -> GNCBook* parsed
> // repeat until import is exhausted
> // I'll need some kind of tally of how collision has been amended by the
> // user
> // That tally can then tell me how to resolve each collision and the two
> // books can be added to main.
>
> return main;
> }
>
> (all pretty predictable and generic so far.)

I see no reason to "return main".  Just modify main in place and return
an error code.
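A minimal sketch of that calling convention, assuming a hypothetical, heavily simplified Book that holds only account names (a real GNCBook is far richer). The names Book, MergeResult and book_merge are illustrative. The merge modifies main_book in place, reports status through an error code, and checks capacity before writing anything, so a failed merge leaves main untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified book: just a fixed array of account
 * names.  Only the "modify in place, return an error code" shape matters. */
#define MAX_ACCOUNTS 64

typedef struct {
    const char *names[MAX_ACCOUNTS];
    size_t      count;
} Book;

typedef enum {
    MERGE_OK = 0,
    MERGE_ERR_NULL_ARG,
    MERGE_ERR_FULL      /* main has no room for the new objects */
} MergeResult;

static int book_has(const Book *b, const char *name)
{
    for (size_t i = 0; i < b->count; i++)
        if (strcmp(b->names[i], name) == 0)
            return 1;
    return 0;
}

/* Merge `import` into `main_book` in place.  Capacity is checked before
 * anything is written, so on error main_book is left unchanged. */
MergeResult book_merge(Book *main_book, const Book *import)
{
    if (main_book == NULL || import == NULL)
        return MERGE_ERR_NULL_ARG;

    /* Count names missing from main (may over-count duplicates in import,
     * which only makes the capacity check more conservative). */
    size_t added = 0;
    for (size_t i = 0; i < import->count; i++)
        if (!book_has(main_book, import->names[i]))
            added++;
    if (main_book->count + added > MAX_ACCOUNTS)
        return MERGE_ERR_FULL;

    for (size_t i = 0; i < import->count; i++)
        if (!book_has(main_book, import->names[i]))
            main_book->names[main_book->count++] = import->names[i];
    return MERGE_OK;
}
```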

> My biggest concern is getting tied up in the detail of a GTK dialog when I 
> want to concentrate on the rule set and collision logic. Would someone else 
> in the Gnucash team be willing to create an empty dialog box for me, later? I 
> haven't done any GUI work in Gnome yet. The dialog would presumably need just 
> a list control (or maybe a large text label) with 2 radio buttons per 
> collision and a method of showing say 10 collisions at a time and rolling 
> forward like a wizard. I'd anticipate passing a copy of collision and 
> receiving an array, arrayname[collisionID] = response. 

I'm working on some "generic druid" code in the g2 branch.  You can
probably use that.  This interface closely matches the "transaction
duplicate detection" interface that already exists, except it would
need to be extended to other data objects.

> Two types of actions to resolve collisions:
> 1. Main overrides import
> 2. Import overrides main

I think the list of rules is going to depend on the object-type.  For
example, how you "merge" an account is going to be different than how
you merge a transaction, and how you merge an invoice is going to be
different than how you merge the invoice item-list.
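One way to let each object type plug in its own rule is a small registry keyed by type name, in the spirit of (but not copying) the per-type registration that qofobject does. This is a sketch under assumptions. MergeAction, merge_register_rule, merge_apply_rule and the two example rules are hypothetical, and the decisions the example rules return are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-type merge rules.  Names and the callback signature
 * are illustrative only; this is not the QofObject API. */
typedef enum { KEEP_MAIN, TAKE_IMPORT, ASK_USER } MergeAction;

typedef MergeAction (*MergeRuleFn)(const void *main_obj, const void *import_obj);

typedef struct {
    const char *type_name;   /* e.g. "Account", "Trans", "Invoice" */
    MergeRuleFn rule;
} MergeRuleEntry;

#define MAX_RULES 16
static MergeRuleEntry rule_table[MAX_RULES];
static size_t rule_count;

/* Register (or replace) the rule for one object type.  The type_name
 * string must outlive the table.  Returns 0 on success, -1 if full. */
int merge_register_rule(const char *type_name, MergeRuleFn rule)
{
    for (size_t i = 0; i < rule_count; i++)
        if (strcmp(rule_table[i].type_name, type_name) == 0) {
            rule_table[i].rule = rule;
            return 0;
        }
    if (rule_count == MAX_RULES)
        return -1;
    rule_table[rule_count++] = (MergeRuleEntry){ type_name, rule };
    return 0;
}

/* Dispatch to the rule for a type; unknown types fall back to the user. */
MergeAction merge_apply_rule(const char *type_name,
                             const void *main_obj, const void *import_obj)
{
    for (size_t i = 0; i < rule_count; i++)
        if (strcmp(rule_table[i].type_name, type_name) == 0)
            return rule_table[i].rule(main_obj, import_obj);
    return ASK_USER;
}

/* Two illustrative rules showing that types can merge differently. */
MergeAction account_rule(const void *m, const void *i)
{
    (void)m; (void)i;
    return KEEP_MAIN;   /* assumption: keep the existing account settings */
}

MergeAction invoice_rule(const void *m, const void *i)
{
    (void)m; (void)i;
    return ASK_USER;    /* assumption: invoices need a human decision */
}
```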

> So a simple tally of user response for each collision event can be resolved to 
> parse the collision book.

Maybe.

You might find that a simple list is not sufficient.  I don't know,
but my gut tells me that somewhere in the rules you need to determine
for each object in import whether it's:

"the same"		(guids match)
"maybe the same"	(guids don't match but something else matches,
                         like maybe the account name or invoice owner/date)
"new"			("clearly" new)

What we're testing here is whether the object refers to the same
semantic concept.  E.g. there is a semantic concept of the top-level
Asset account.  But if I'm merging your account tree into my account
tree, the guids will differ even though, semantically, the accounts
are the same.  Hence, these accounts are "maybe the same".  You
probably need user intervention here to properly map the "maybe the
same" objects.
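The three-way test can be sketched for accounts as follows. A matching guid means "the same". Failing that, a matching full account name means "maybe the same". Otherwise the account is "new". The sketch assumes a hypothetical AccountRec in which a 16-byte array stands in for a GnuCash GUID, and it uses the full name as the only "something else matches" heuristic. A real rule set would likely use more fields, and per type (e.g. invoice owner/date).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified account record: a 16-byte id stands in for a
 * GnuCash GUID, and full_name is the account's path in the tree. */
typedef struct {
    unsigned char guid[16];
    const char   *full_name;   /* e.g. "Assets:Bank" */
} AccountRec;

typedef enum { MATCH_SAME, MATCH_MAYBE, MATCH_NEW } MatchKind;

/* Classify one imported account against every account in main.
 * A guid match anywhere wins over a name match.  If `found` is non-NULL
 * it receives the matching main account (NULL when the result is NEW). */
MatchKind classify_account(const AccountRec *imp,
                           const AccountRec *main_accts, size_t n,
                           const AccountRec **found)
{
    const AccountRec *maybe = NULL;

    for (size_t i = 0; i < n; i++) {
        if (memcmp(imp->guid, main_accts[i].guid, sizeof imp->guid) == 0) {
            if (found) *found = &main_accts[i];
            return MATCH_SAME;                 /* guids match */
        }
        if (maybe == NULL &&
            strcmp(imp->full_name, main_accts[i].full_name) == 0)
            maybe = &main_accts[i];            /* same name, different guid */
    }
    if (found) *found = maybe;
    return maybe ? MATCH_MAYBE : MATCH_NEW;
}
```

A MATCH_MAYBE result is exactly the case that would be handed to the user to confirm or re-map.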

If objects are "the same" or "maybe the same" then you might need to
determine whether they contain different data.  For example, if I
import my OWN account tree again (say, merging a backup file), I might
have changed a description in an account, or I might have changed a
transaction.
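Once two objects are matched, the next step is to find which fields actually differ, so only real conflicts go to the user. Below is a minimal sketch for a hypothetical AccountFields record with three editable strings. It returns a bitmask of differing fields, where 0 means there is nothing to resolve. The field set and the DIFF_* flag names are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical account fields a user might edit between two backups. */
typedef struct {
    const char *name;
    const char *description;
    const char *notes;
} AccountFields;

/* Bit flags naming which fields differ. */
enum {
    DIFF_NAME        = 1u << 0,
    DIFF_DESCRIPTION = 1u << 1,
    DIFF_NOTES       = 1u << 2
};

/* NULL-safe string comparison: two NULLs are equal, NULL vs text differs. */
static int str_differs(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a != b;
    return strcmp(a, b) != 0;
}

/* Return a bitmask of differing fields; 0 means nothing to resolve. */
unsigned account_diff(const AccountFields *main_acct,
                      const AccountFields *import_acct)
{
    unsigned diff = 0;
    if (str_differs(main_acct->name, import_acct->name))
        diff |= DIFF_NAME;
    if (str_differs(main_acct->description, import_acct->description))
        diff |= DIFF_DESCRIPTION;
    if (str_differs(main_acct->notes, import_acct->notes))
        diff |= DIFF_NOTES;
    return diff;
}
```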

Then you also need to keep all the references correct.
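Keeping references correct mostly means this: once an imported object is mapped onto an existing object in main, everything in the import that pointed at the old id must be rewritten to point at main's id. The sketch assumes hypothetical Id, IdMapping and SplitRec types, with splits referring to accounts by id. It applies a mapping table and leaves unmapped references (genuinely new accounts) alone. A linear scan keeps it short, and a hash table would be the obvious choice for real book sizes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte id standing in for a GnuCash GUID. */
typedef struct { unsigned char bytes[16]; } Id;

/* One result of the matching step: imported object `from` is the same
 * semantic object as `to` in main. */
typedef struct { Id from; Id to; } IdMapping;

/* Hypothetical imported split that refers to its account by id. */
typedef struct { Id account; long amount_cents; } SplitRec;

static int id_equal(const Id *a, const Id *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
}

/* Rewrite each split's account reference using the mapping table.
 * References with no mapping are left unchanged.
 * Returns the number of references rewritten. */
size_t remap_split_accounts(SplitRec *splits, size_t n_splits,
                            const IdMapping *map, size_t n_map)
{
    size_t rewritten = 0;
    for (size_t s = 0; s < n_splits; s++)
        for (size_t m = 0; m < n_map; m++)
            if (id_equal(&splits[s].account, &map[m].from)) {
                splits[s].account = map[m].to;
                rewritten++;
                break;
            }
    return rewritten;
}
```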

> I'm hoping to start the rule set this holiday weekend by setting out which 
> parts of GNCBook would have to be overwritten and which would have to be 
> appended after user confirmation. (Basically sorting settings from 
> transactions) and then creating a test program for development that 
> implements two basic GNCBook's and outputs the impact of rule changes on 
> each.

What do you mean "overwritten"?  Oh, I think you mean trying to
determine which objects are the "same", "maybe the same", and "new"?

Also, I _HIGHLY_ suggest you work from CVS HEAD and _NOT_ from the 1.8
tree.  Otherwise it's just adding work, and frankly this code won't be
getting into 1.8, so you might as well work from "current" code.

Also, just make sure you can plug in rules per object type.  :)
See the qofobject code in CVS HEAD to see what I mean.

Good Luck!

> I'll leave the XML parser stuff until next time.

As I've said, you can consider the parser "done" and use the existing
parsing tools.  Once you get the merge function done you can consider
going back and rewriting the parsers to be schema-based.

-derek

PS: Feel free to pop into the #gnucash channel on irc.gnome.org to
discuss it with us.  Many of the devs hang out there and answer tech
questions for each other.

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available