Merging two GNCBook* objects

Derek Atkins warlord at MIT.EDU
Fri Apr 9 14:12:54 EDT 2004


Neil Williams <linux at codehelp.co.uk> writes:

> I was beginning to work the same way. There will need to be separate handling 
> of the various component objects. I didn't put that in the summary as I 
> wasn't sure whether those functions should be part of the API or just for 
> internal scope.

At some level they need to be part of the API, because a plug-in module
needs to register the API.  See the QofObject code for examples of API
registration.

> In the internals yes, I should have specified:
>
> Two types of *user* actions to resolve collisions as presented in the dialog:
> 1. Main overrides import
> 2. Import overrides main
>
> i.e. for each specific collision (no matter how far down the tree), the user 
> only needs to decide whether to keep the original or import the new. I'd 
> rather not add a burdensome 'edit-in-place' feature.
>
>> example, how you "merge" an account is going to be different than how
>> you merge a transaction, and how you merge an invoice is going to be
>> different than how you merge the invoice item-list.

This makes sense...  You might, however, also need user help to even
determine if an object is a "duplicate" or new.

>> You might find that a simple list is not sufficient.  I don't know,
>
> I feared as much. I like to start simple - that way more bits stay simple!

Keep it as simple as possible, but no simpler.  I'm just trying to make
sure you don't make it "simpler". ;)

> the same - I'll get the import engine to ignore the import data, leave main 
> untouched. I'd appreciate comments on just how strict this has to be:
> e.g. If the description field doesn't match but the date, account, amount and 
> category do match - I'd still list that as a collision but what if it's only 
> a difference in capitalisation? 'Lunch' instead of 'lunch' - it would save a 
> lot of queries to look for matches with case-insensitive patterns. There 
> would still be anomalies with abbreviations, whitespace etc.

If the GUID matches it's always the same semantic object, regardless
of whether other data fields are changed.

If the GUID doesn't match then you need some algorithm to detect
whether the objects are the same semantically.  This algorithm is
going to be object-type specific.  How you detect
semantically-equivalent transactions is different than how you detect
semantically-equivalent customers.
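
As a rough illustration of that rule, here is a minimal sketch in C.
Everything in it is hypothetical: MergeObj, ObjType, the comparator
table and is_same_object() are stand-ins invented for this example, not
the real GNCBook or QofObject interfaces, and the per-type tests
(amount plus case-insensitive description, or name only) are just
placeholder assumptions for whatever each object type really needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical stand-ins; these are NOT the real GnuCash types. */
typedef enum { OBJ_TRANSACTION, OBJ_CUSTOMER, OBJ_TYPE_COUNT } ObjType;

typedef struct {
    ObjType     type;
    char        guid[33];     /* 32 hex digits plus terminating NUL */
    const char *description;  /* transaction description or customer name */
    long        amount_cents; /* only meaningful for transactions */
} MergeObj;

/* One comparator per object type: "are these semantically the same?" */
typedef bool (*SameSemanticsFn)(const MergeObj *a, const MergeObj *b);

/* Assumed rule: same amount and same description, ignoring case. */
static bool same_transaction(const MergeObj *a, const MergeObj *b)
{
    return a->amount_cents == b->amount_cents
        && strcasecmp(a->description, b->description) == 0;
}

/* Assumed rule: same customer name, ignoring case. */
static bool same_customer(const MergeObj *a, const MergeObj *b)
{
    return strcasecmp(a->description, b->description) == 0;
}

static const SameSemanticsFn comparators[OBJ_TYPE_COUNT] = {
    [OBJ_TRANSACTION] = same_transaction,
    [OBJ_CUSTOMER]    = same_customer,
};

/* A GUID match settles it; otherwise fall back to the type-specific test. */
bool is_same_object(const MergeObj *a, const MergeObj *b)
{
    assert(a != NULL && b != NULL);
    if (a->type != b->type)
        return false;
    if (strcmp(a->guid, b->guid) == 0)
        return true;   /* same GUID: same object, whatever else differs */
    return comparators[a->type](a, b);
}
```

The function-pointer table is only one way to keep the type-specific
logic out of the generic merge loop; a comparator registered along with
each object type would serve the same purpose.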

You _MAY_ need user input at this stage, too, to help map e.g. an
"import" customer to an "existing" customer.  Granted, you might be
able to combine the "is this a duplicate?" and "what shall we do with
it?"  in the same user query.
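
One way to picture the combined query is a single three-way answer per
collision, sketched below in C. The names (CollisionAnswer,
MergeAction, action_for_answer) are made up for this example and the
mapping is only an assumption based on the choices discussed in this
thread: "not a duplicate" appends the import data as new, "main
overrides import" leaves main untouched, and "import overrides main"
overwrites it.

```c
#include <assert.h>

/* Hypothetical: what the user can say about one reported collision. */
typedef enum {
    ANSWER_NOT_DUPLICATE,  /* the objects are different; import is new */
    ANSWER_KEEP_MAIN,      /* duplicate, main overrides import */
    ANSWER_TAKE_IMPORT     /* duplicate, import overrides main */
} CollisionAnswer;

/* Hypothetical: what the merge engine then does with the import data. */
typedef enum {
    ACTION_APPEND_IMPORT,
    ACTION_IGNORE_IMPORT,
    ACTION_OVERWRITE_MAIN
} MergeAction;

/* One answer covers both "is it a duplicate?" and "what do we do?". */
MergeAction action_for_answer(CollisionAnswer answer)
{
    switch (answer) {
    case ANSWER_NOT_DUPLICATE: return ACTION_APPEND_IMPORT;
    case ANSWER_KEEP_MAIN:     return ACTION_IGNORE_IMPORT;
    case ANSWER_TAKE_IMPORT:   return ACTION_OVERWRITE_MAIN;
    }
    assert(!"unknown CollisionAnswer");
    return ACTION_IGNORE_IMPORT;  /* safest fallback: leave main untouched */
}
```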

> maybe the same, maybe different - report to user using the collision object 
> (probably not a complete GNCBook). Merge action is dictated by the nature of 
> the import data involved in that specific collision.
>
> new - don't list in the collision dialog, just store in case the user aborts 
> and then commit when the other collisions are resolved.
>
> There's a balance to be struck here between giving the user 2,000 collisions 
> and leaving the user 100 transactions to adjust manually.

Agreed.

> Yes, from our previous discussions, I was not going to put a lot of weight on 
> matching guid's - it's only a part of the match and the match would still 
> fail if other parts of the object differed.

I would put weight into a guid match in one direction: they match, it
*IS* the same object.  If they do NOT match, then you need to do more
work.

> :-) Ho-hum. The rule set and the references are going to take the most care.

Devil in the details... ;)

> Yes, identifying which data objects within a GNCBook object in RAM cannot be 
> duplicated / repeated and which therefore, in the event of user confirmation, 
> would need to have that data overwritten to reflect the imported data. 
> i.e. an account can only have one name but many transactions - the account 
> name would be overwritten if the user chose to allow that data to be 
> imported. Transactions may be overwritten (confirmed collisions) or appended 
> (new).

Ok.  Sounds reasonable.

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available

