Welcome back

Tue Aug 17 22:10:14 EDT 2004

On Tue, Aug 17, 2004 at 09:20:37PM +0100, Neil Williams was heard to remark:
> On Tuesday 17 August 2004 2:44, Derek Atkins wrote:
> > linas at linas.org (Linas Vepstas) writes:
> > >> A book-merge: function is an excellent idea. I'll get that done.
> > >
> > > OK, let us know how this goes. Although we could add support for lists 
> > > and trees into qof (to handle the lists of splits and the account tree)
> > > I'm not sure that I want to burn the brain cells right now to figure out
> > > the 'right' way of doing this, not just yet.   By contrast, the
> > > book-merge-per-object call seemed like a simple catch-all.
> 
> Ah - first gotcha. I think a name change would be helpful here. I can see that 
> book-merge: would be misleading as we have already diverged. 

??  I don't understand the above sentence.  Oh, OK, maybe I do, see
below.  OK, let's pick a different name. Let's call it "object-merge".

> Any per-object merge runs into user problems. I am all too familiar with the 
> problems of per-collision data handling, as I mentioned in one of the very 
> early posts on this code. I got stuck in a loop with a Palm sync that just 
> went on and on and on and on and ON presenting new conflicts for resolution 
> without ever giving a hint of how many were still to come. I gave up at 200. 
> Most other users would never get beyond 50. 

Can you count the number of potential conflicts by doing a 'dry run'
first?  Speaking of which, how do the GUI and book-merge interact?
That is, what is the sequence of calls that must be made to merge?

> The fatal flaw in this approach 
> is then obvious - those first 200 conflicts were committed to the final 
> database as you went along because each object was dealt with in turn. If the 
> user aborts, the final book is in complete chaos. YUK!

The GUI that uses book-merge needs to make a copy of the book first.
If the merge is aborted mid-way, you still have a copy of the original. 
Yes, this sounds cpu-sucking to me.   And we don't have an
infrastructure for making copies at this time.  So we'd need to deal
with that ... 
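
A minimal sketch of the copy-first idea, in C. Every name here (Obj, Book,
MergeStep, merge_with_backup, example_step) is hypothetical and stands in
for real book and object types; none of it is existing QOF or GnuCash API.
The book is a fixed-size array so that a plain struct assignment can serve
as the "copy", which a real book would not allow.

```c
#include <stddef.h>

#define MAX_OBJS 64

/* Hypothetical stand-ins: an object is an id plus one value. */
typedef struct { int id; int value; } Obj;
typedef struct { Obj objs[MAX_OBJS]; size_t count; } Book;

/* Merges one import object into the book.
 * Returns 0 to continue, non-zero if the user aborted. */
typedef int (*MergeStep)(Book *book, const Obj *import);

/* Copy the book, merge object by object, restore the copy on abort.
 * Returns 0 if the merge completed, -1 if it was aborted. */
int merge_with_backup(Book *book, const Book *import, MergeStep step)
{
    Book backup = *book;              /* the expensive full copy */

    for (size_t i = 0; i < import->count; i++) {
        if (step(book, &import->objs[i]) != 0) {
            *book = backup;           /* abort: put the original back */
            return -1;
        }
    }
    return 0;
}

/* Example step: overwrite a matching object or append a new one.
 * A negative id simulates the user pressing Cancel. */
int example_step(Book *book, const Obj *import)
{
    if (import->id < 0)
        return 1;
    for (size_t i = 0; i < book->count; i++) {
        if (book->objs[i].id == import->id) {
            book->objs[i].value = import->value;
            return 0;
        }
    }
    if (book->count == MAX_OBJS)
        return 1;                     /* no room: treat as an abort */
    book->objs[book->count++] = *import;
    return 0;
}
```

The cost is visible in the first line of merge_with_backup: the whole book
is copied before a single object is looked at, whether or not the user
ever aborts.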

> The merge runs on a per-book basis for two simple reasons. 
> 
> 1. When the Init() is complete, the user intervention loop has a clear and 
> accurate value for the total number of conflicts to be resolved by the user. 
> To do this, all import objects must be compared before any user intervention 
> is sought. The user then gets a simple choice based on accurate information - 
> if the number of conflicts is too high, the user can abort immediately. The 
> level that determines 'too high' cannot be determined in advance.

Good, yes.

> 2. Any abort prior to the commit stage is absolutely safe, nothing is changed 
> in the target book until every conflict is resolved. If the user aborts at 
> any point before the final commit, everything is left exactly as it was 
> found.

OK, right.  So ... explain to me what the steps are to do this.  I will
then turn around and propose the same API back to you, except that API will
now be per-object, instead of per-book.
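
For concreteness, here is a minimal sketch in C of the sequence as I read
it from the two points quoted above: compare everything first, let the
user resolve conflicts, and touch the target only at commit. All names
(merge_init, merge_resolve, merge_commit, Rule, and so on) are
hypothetical, and an object is reduced to an id plus one value; this is
an assumption about the shape of the API, not the actual code.

```c
#include <stddef.h>

#define MAX_OBJS 64

/* Hypothetical stand-ins: an object is an id plus one value. */
typedef struct { int id; int value; } Obj;
typedef struct { Obj objs[MAX_OBJS]; size_t count; } Book;

typedef enum { RULE_NEW, RULE_SAME, RULE_CONFLICT } RuleKind;
typedef enum { CHOICE_NONE, CHOICE_KEEP_TARGET, CHOICE_TAKE_IMPORT } Choice;

typedef struct {
    const Obj *import;   /* object from the import book */
    Obj       *match;    /* target object with the same id, or NULL */
    RuleKind   kind;
    Choice     choice;   /* user's decision, for conflicts only */
} Rule;

typedef struct {
    Book  *target;
    Rule   rules[MAX_OBJS];
    size_t n_rules;
    size_t unresolved;   /* conflicts still waiting for the user */
} Merge;

static Obj *book_find(Book *b, int id)
{
    for (size_t i = 0; i < b->count; i++)
        if (b->objs[i].id == id)
            return &b->objs[i];
    return NULL;
}

/* Step 1: compare every import object; the target is only read.
 * Returns the total number of conflicts. */
size_t merge_init(Merge *m, Book *target, const Book *import)
{
    m->target = target;
    m->n_rules = 0;
    m->unresolved = 0;
    for (size_t i = 0; i < import->count && i < MAX_OBJS; i++) {
        Rule *r = &m->rules[m->n_rules++];
        r->import = &import->objs[i];
        r->match  = book_find(target, r->import->id);
        r->choice = CHOICE_NONE;
        if (r->match == NULL) {
            r->kind = RULE_NEW;
        } else if (r->match->value == r->import->value) {
            r->kind = RULE_SAME;
        } else {
            r->kind = RULE_CONFLICT;
            m->unresolved++;
        }
    }
    return m->unresolved;
}

/* Step 2: record the user's decision for one conflict.
 * The target is still untouched. Returns 0 on success, -1 on misuse. */
int merge_resolve(Merge *m, size_t rule_index, Choice c)
{
    if (rule_index >= m->n_rules || c == CHOICE_NONE)
        return -1;
    Rule *r = &m->rules[rule_index];
    if (r->kind != RULE_CONFLICT)
        return -1;
    if (r->choice == CHOICE_NONE)
        m->unresolved--;
    r->choice = c;
    return 0;
}

/* Step 3: refuse while anything is unresolved, then apply all rules. */
int merge_commit(Merge *m)
{
    size_t n_new = 0;

    if (m->unresolved != 0)
        return -1;
    for (size_t i = 0; i < m->n_rules; i++)
        if (m->rules[i].kind == RULE_NEW)
            n_new++;
    if (m->target->count + n_new > MAX_OBJS)
        return -1;                    /* check room before any write */

    for (size_t i = 0; i < m->n_rules; i++) {
        Rule *r = &m->rules[i];
        if (r->kind == RULE_NEW)
            m->target->objs[m->target->count++] = *r->import;
        else if (r->kind == RULE_CONFLICT && r->choice == CHOICE_TAKE_IMPORT)
            r->match->value = r->import->value;
    }
    return 0;
}
```

Aborting is just a matter of never calling merge_commit: nothing has been
written to the target before that point, so there is nothing to undo.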

> > > own merge routine to 'do the right thing' for that object type.  For
> > > us 'lazy' object developers, provide a generic merge routine that
> > > we can use if an object is fully qof compliant.
> 
> Fully qof-compliant objects wouldn't need an object-specific merge routine 
> defined.

No, I was proposing that it be the 'default routine' ... 

> Having a value 'book-merge: NULL,' would be misleading to other developers. It 
> could easily lead people to think that the object will NOT be merged. Yet if 
> an object is fully QOF compliant, it's all handled by the book merge routine, 
> so no special object behaviour is required.

Can't we just move the merge function out of the book-merge routine, and 
put it into the per-object routine?  
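
A minimal sketch of what I mean by a default routine, in C. The names
(Entity, MergeFunc, ObjectDef, generic_merge, keep_larger_merge,
object_merge) are hypothetical and are not the existing QOF object
definition; the point is only that an unset per-object hook falls back
to the generic routine rather than meaning "do not merge".

```c
#include <stddef.h>

/* Hypothetical stand-in for a QOF entity: a single value. */
typedef struct { int value; } Entity;

/* Per-object merge hook: fold `import` into `target`, return 0 on success. */
typedef int (*MergeFunc)(Entity *target, const Entity *import);

typedef struct {
    const char *type_name;
    MergeFunc   merge;     /* NULL means "use the generic routine" */
} ObjectDef;

/* Generic routine for fully compliant objects: copy the parameters. */
static int generic_merge(Entity *target, const Entity *import)
{
    target->value = import->value;
    return 0;
}

/* Example of an object-specific routine: keep the larger value. */
int keep_larger_merge(Entity *target, const Entity *import)
{
    if (import->value > target->value)
        target->value = import->value;
    return 0;
}

/* Dispatch: the object's own routine if it has one, else the default. */
int object_merge(const ObjectDef *def, Entity *target, const Entity *import)
{
    MergeFunc fn = (def->merge != NULL) ? def->merge : generic_merge;
    return fn(target, import);
}
```

Whether the default lives in the book-merge code or behind the
per-object slot, the caller sees one entry point either way.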

> Instead, I need the object to help me with any non-compatible parameters, 
> difficult objects, awkward values and post-commit code that simply doesn't 
> fit in to a generic book merge module.
> 
> e.g. If an object needs to recalculate balances or lists of Splits after a 
> commit but before control is returned to the main GnuCash process for 
> editing.
> 
> I was thinking of what might be better termed: merge-helper: 
> rather than the possibly misleading book-merge:

what would merge-helper do?

> > I agree.  I think you can safely ignore the lists in most objects
> > (except, perhaps, transactions) as those objects will get merged into
> > the book later.  For example, you don't need to worry about the
> > Account SplitList, because that's really "controlled" by the Splits
> > themselves.  When you merge in the splits you get their "parent"
> > account just fine.
> 
> Will this work reliably if the parent is a new account created by the merge? 
> (Possibly in a new AccountGroup too?) At present, I don't have any rules to 
> stipulate that certain objects should be compared or committed before or 
> after any others. That could be something for merge-helper: to define, 
> although I'd rather not (for speed reasons). 

I can't answer that without understanding the specific steps that you go
through during merge, together with some usage scenarios.

[... uninitialized memory ...]

> Provided that no editing is allowed until the commit is finished, is that a 
> problem?

I think so; most of the heavy processing happens during commit, so it's
at that time that all pointers etc. should be valid.

> > The only potential issue I can imagine is merging transactions -- you
> > somehow need to detect that what you're merging has fewer splits
> > (e.g. if you change a transaction and remove splits).  I'm not sure
> > how this can be done easily....
> 
> Isn't a merge always going to add data? When would an import result in fewer 
> Splits? I've got no code (so far) that removes data from the target book or 
> any objects in the book.

I'm not sure I understand how merge will be used.  I think Derek is
thinking about a conflict case, where there are two transactions that
should be the "same" transaction, and thus should be "merged" (?) 
except that one transaction has 2 splits and the other has 3.   
Clearly the "right answer" is not necessarily a transaction with 3
splits.  Presumably, the "correct answer" is to have the user pick one 
of the two transactions, and throw the other (and its splits) away.

In this case, we don't want to run "object-merge" on the two
transactions at all... 

--linas

-- 
pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas at linas.org>
PGP Key fingerprint = 8305 2521 6000 0B5E 8984  3F54 64A9 9A82 0104 5933