Welcome back

Neil Williams linux at codehelp.co.uk
Wed Aug 18 13:40:43 EDT 2004


On Wednesday 18 August 2004 3:10, Linas Vepstas wrote:

> Can you count the number of potential conflicts by doing a 'dry run'
> first?

The number of objects does not indicate the number of collisions, nor the 
nature of those collisions. To make the output sensible to the user, I need 
to report how many of those collisions actually require intervention, not a 
grand total. Calculating that in a dry run and then again for real is pointless; 
I might as well store the dry run result and use it live. That is exactly what 
I do - so it isn't a dry run at all. This is the most intense part of the 
code; the commit is easy by comparison.

> Speaking of which, how does the gui and book-merge interact? 
> That is, what is the sequence of calls that must be made to merge?

It's all in the doxygen output:
qof_book_mergeInit (QofBook *importBook, QofBook *targetBook)

This sets up and performs all comparisons, identifies the target QofEntity and 
decides which collisions need to be reported to the user.

The user intervention routines are open and flexible - there is a reasonable 
amount of code required to implement the routine but this allows a lot of 
freedom in exactly what data is presented and in what output format.

typedef void(* qof_book_mergeRuleForeachCB )(qof_book_mergeRule *, guint)
This defines the callback type for the routine that displays the information 
and retrieves the user actions. There's a working example here:
http://www.codehelp.co.uk/code/example-gncBookMerge.c
It uses console interaction with the user and displays the results of the 
merge.
The routine is called using:
void 
qof_book_mergeRuleForeach (qof_book_mergeRuleForeachCB, qof_book_mergeResult)

This allows each different result to be called in turn, complete with a count 
of how many results of that type remain. The only essential call is for the 
qof_book_mergeResult MERGE_REPORT, which lists all collisions that MUST be 
resolved by the user before a commit.

The callback retrieves the parameter data for the import QofEntity and the 
target QofEntity as strings using:
char * 
qof_book_merge_param_as_string (QofParam *qtparam, QofEntity *qtEnt)
(a convenience function that saves defining gnc_numeric handlers etc.)
This makes it easy to display the content to the user but should NOT be used 
to alter the data itself. You don't have to use this if you don't want to.

The decision of the user is set using:
int 
qof_book_mergeUpdateResult (qof_book_mergeRule *resolved, qof_book_mergeResult 
tag)

That completes the work of the user intervention routine - be it a console or 
GUI design. The routine is called for each collision that requires user 
intervention. At its simplest, the routine gets the strings that represent 
the collision for the user to understand, takes input for the user's decision 
and sets a result in the rule for that collision; once for each collision. 
The user can abort at any time.
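
In sketch form, a minimal console callback might look like this. (Illustration 
only: the rule members mergeParam, importEnt and targetEnt, the result values 
MERGE_UPDATE and MERGE_DUPLICATE, and the assumption that the returned strings 
are caller-owned are all mine - check the doxygen output for the real names.)

#include <stdio.h>
#include <glib.h>
#include "qof_book_merge.h"

static void
console_rule_cb (qof_book_mergeRule *rule, guint remainder)
{
    GSList *node;
    char decision[8];

    printf("%u collision(s) of this type left to resolve\n", remainder);
    /* show each compared parameter as a string - display only,
       never used to alter the data itself */
    for (node = rule->mergeParam; node != NULL; node = node->next)
    {
        QofParam *qtparam = node->data;
        char *imp = qof_book_merge_param_as_string(qtparam, rule->importEnt);
        char *tgt = qof_book_merge_param_as_string(qtparam, rule->targetEnt);
        printf("%s: import=\"%s\" target=\"%s\"\n",
               qtparam->param_name, imp, tgt);
        g_free(imp);   /* assuming the strings are caller-owned */
        g_free(tgt);
    }
    printf("Overwrite the target with the import data? (y/n) ");
    if (fgets(decision, sizeof(decision), stdin) && decision[0] == 'y')
        qof_book_mergeUpdateResult(rule, MERGE_UPDATE);
    else
        qof_book_mergeUpdateResult(rule, MERGE_DUPLICATE);
}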

Finally, call int  qof_book_mergeCommit (void) to commit the data to the 
target book. This is the only time that the target book is modified. The 
function runs only once and it frees all merge data from memory once 
complete.
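
Putting it together, the sequence of calls from a console or GUI front-end is 
just the following (sketch only - the return type of qof_book_mergeInit isn't 
shown above, so treating it as an int with 0 for success is an assumption):

int
merge_books (QofBook *importBook, QofBook *targetBook)
{
    /* 1. set up and perform all comparisons */
    if (qof_book_mergeInit(importBook, targetBook) != 0)
        return -1;

    /* 2. the only essential pass: every MERGE_REPORT collision
          must be resolved by the user (console_rule_cb above) */
    qof_book_mergeRuleForeach(console_rule_cb, MERGE_REPORT);

    /* 3. write the results into the target book; this also frees
          all the merge data */
    return qof_book_mergeCommit();
}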

> The GUI that uses book-merge needs to make a copy of the book first.

It needs to make a QofBook; the data source itself is of no interest to 
qof_book_merge, and any memory allocated to that can be freed once the QofBook 
is done. You can't merge anything without the data being in memory, but the 
merge only requires one instance of the import data.

> If the merge is aborted mid-way, you still have a copy of the original.
> Yes, this sounds cpu-sucking to me.   And we don't have an

How can I update the target if the import book is freed from memory?

To compare and commit, I need to have two QofBook structures in memory at the 
same time PLUS the merge code. It's the price of providing a book level 
merge. However, the merge code doesn't contain copies of the data itself, it 
merely compares the current parameter, stores a pointer and the result enum 
and moves on. 
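
Roughly speaking, each stored comparison amounts to no more than something 
like this (illustrative only; the real qof_book_mergeRule holds more, and 
these member names are assumptions):

struct rule_sketch
{
    QofEntity            *importEnt;   /* pointer into the import book      */
    QofEntity            *targetEnt;   /* pointer into the target book      */
    GSList               *mergeParam;  /* the parameters that were compared */
    qof_book_mergeResult  result;      /* MERGE_REPORT, etc.                */
};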

> infrastructure for making copies at this time.  So we'd need to deal
> with that ...

I know; I can't scale up testing until that is done. Current testing is 
limited to how many objects I'm willing to code by hand instead of reading 
from an existing file source.

> OK, right.  So ... explain to me what the steps are to do this. 

1. Use QOF to iterate through all registered objects.
2. Within that iteration, use QOF to iterate over any instances of the current 
object that exist in the import book.
These first two ensure that no registered objects get skipped by the merge. 
Everything registered with QofClass and QofObject at compile time will be 
available. No changes are needed in qof_book_merge.c or qof_book_merge.h to 
cope with completely new QofObject types, unless they aren't quite QOF 
compatible.

3. Retrieve the parameter list from QofObject for each registered entity in 
the import book.
4. Look up the GUID of the import entity in the target book and store a pointer 
to any absolute match.
5. Look up the closest match if no GUID match exists (this code needs 
improvement, it's a little inflexible and sterile at the moment).
6. If nothing at all can be found to even remotely match, automatically set it 
as new and move on.
7. Compare each parameter across the import book entity and target book 
entity. There are different paths here for those entities with and without an 
absolute GUID match but basically if some parameters match and some do not, 
both paths lead to the entity being reported to the user for resolution 
later.

Step 5 is the most intensive step of the entire merge. Wherever possible, a 
GUID match is favoured. Checking every instance of a specific object type for 
a distance-calculated, close-ish match in a target book that may contain years 
of data is likely to take time. The current code takes a little bit of a 
shortcut; once I can test with full-sized books, the code will be improved.
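
As a rough sketch, the compare phase (steps 1 to 7 above) has this shape. 
qof_object_foreach_type() and qof_object_foreach() are the QOF iterators I'm 
assuming for steps 1 and 2; find_guid_match(), find_closest_match(), 
tag_as_new() and compare_entity() are hypothetical helpers standing in for 
steps 4 to 7:

/* hypothetical helpers - not part of QOF or qof_book_merge */
static QofEntity *find_guid_match    (QofEntity *importEnt);
static QofEntity *find_closest_match (QofEntity *importEnt);
static void       tag_as_new         (QofEntity *importEnt);
static void       compare_entity     (QofEntity *importEnt, QofEntity *targetEnt);

static void
merge_one_entity (QofEntity *importEnt, gpointer user_data)
{
    /* step 4: absolute match on GUID in the target book */
    QofEntity *targetEnt = find_guid_match(importEnt);

    if (!targetEnt)
        /* step 5: the expensive part - closest match by distance */
        targetEnt = find_closest_match(importEnt);
    if (!targetEnt)
    {
        /* step 6: nothing even remotely matches - tag as new */
        tag_as_new(importEnt);
        return;
    }
    /* steps 3 and 7: walk the registered parameter list and compare
       each parameter across the import and target entities */
    compare_entity(importEnt, targetEnt);
}

static void
merge_one_type (QofObject *merge_obj, gpointer user_data)
{
    QofBook *importBook = user_data;
    /* step 2: every instance of this object type in the import book */
    qof_object_foreach(merge_obj->e_type, importBook, merge_one_entity, NULL);
}

static void
compare_phase (QofBook *importBook)
{
    /* step 1: every object type registered with QofObject */
    qof_object_foreach_type(merge_one_type, importBook);
}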

Once out of all these loops, every entity in the import QofBook has been 
matched, compared and preliminary results allocated. Now the user gets the 
reports and must resolve all reported collisions. The gncExampleBookMerge.c 
code is the best example of how this works. There are lots of options - users 
can be offered only the essential resolutions, or all resolutions, or any of 
the tags in between. The only rule is that all collisions tagged as REPORT 
must be resolved before a Commit can succeed.

Finally, the Commit runs through each of the rules containing new or updated 
data in turn: it creates all required new objects, updates all appropriate 
targets and ignores everything the user has asked to be ignored. The data to 
set into the target book is read directly from the import book. Only when the 
commit is complete can the import QofBook be freed from memory. (Although the 
rules are progressively freed as each is updated, so there will be some 
savings here.)
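
In outline, each rule is committed along these lines (a sketch only; the 
result member name, the result values other than MERGE_REPORT and the helper 
routines are all assumptions, not the real commit internals):

/* hypothetical helpers, standing in for the real commit internals */
static void create_target_entity (qof_book_mergeRule *rule, QofBook *targetBook);
static void update_target_entity (qof_book_mergeRule *rule, QofBook *targetBook);
static void free_rule            (qof_book_mergeRule *rule);

static void
commit_one_rule (qof_book_mergeRule *rule, QofBook *targetBook)
{
    switch (rule->result)
    {
    case MERGE_NEW:
        /* create a new entity in the target book and copy each
           parameter straight from the import entity */
        create_target_entity(rule, targetBook);
        break;
    case MERGE_UPDATE:
        /* overwrite the matched target entity's parameters with
           the values read directly from the import book */
        update_target_entity(rule, targetBook);
        break;
    case MERGE_DUPLICATE:
    default:
        /* the user asked for this one to be ignored */
        break;
    }
    /* each rule is freed as soon as it has been applied */
    free_rule(rule);
}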

> I will 
> then turn around and propose the same API back to you, except that API will
> now be per-object, instead of per-book.

Then you would need to have FOUR routines per object, not one.

qof_match:
qof_compare:
qof_user:
qof_commit:

You cannot run these together without trapping the user in an endless stream 
of reports with no end in sight.

You'd need to call qof_match and qof_compare for all objects in the import 
before getting a total. Then offer the user intervention and commit all 
objects at the same time. That's what I already do at the book level.

> > Fully qof-compliant objects wouldn't need an object-specific merge
> > routine defined.
>
> No, I was proposing that it be the 'default routine' ...

The default routine is a book level routine.

> > Having a value 'book-merge: NULL,' would be misleading to other
> > developers. It could easily lead people to think that the object will NOT
> > be merged. Yet if an object is fully QOF compliant, it's all handled by
> > the book merge routine, so no special object behaviour is required.
>
> Can't we just move the merge function out of the book-merge routine, and
> put it into the per-object routine?

Not if the user is to have any realistic control over the merge.

> > Instead, I need the object to help me with any non-compatible parameters,
> > difficult objects, awkward values and post-commit code that simply
> > doesn't fit in to a generic book merge module.
> >
> > e.g. If an object needs to recalculate balances or lists of Splits after
> > a commit but before control is returned to the main GnuCash process for
> > editing.
> >
> > I was thinking of what might be better termed: merge-helper:
> > rather than the possibly misleading book-merge:
>
> what would merge-helper do?

Perform object-specific work to retain data integrity - jobs that a generic 
routine simply cannot hope to understand or access on such a specific level. 
I only need merge-helper to help with the unresolved problems still listed in 
the table on my site. In the scheme of things, these don't add up to a whole 
lot but they are, when seen together, a significant hurdle to a competent 
merge.

> I can't answer that without understanding the specific steps that you go
> through during merge, together with some usage scenarios.

I've written copious amounts of documentation already; it should all be clear 
from the doxygen and docbook output on codehelp.co.uk/code/

> > Provided that no editing is allowed until the commit is finished, is that
> > a problem?
>
> I think so; most of the heavy processing happens during commit, so its
> at that time that all pointer & etc. should be valid.

Not so. There's more processing in the compare than the commit. After all, 
there are a substantial number of entities that will be ignored as duplicates 
in many merges. Deciding that they ARE duplicates takes more effort than 
freeing the memory afterwards!

> I'm not sure I understand how merge will be used.  I think Derek is
> thinking about a conflict case, where there are two transactions that
> should be the "same" transaction, and thus should be "merged" (?)
> except that one transaction has 2 splits and the other has 3.
> Clearly the "right answer" is not necessarily a transaction with 3
> splits.  Presumably, the "correct answer" is to have the user pick one
> of the two transactions, and throw the other (and its splits) away.

This, the code will do.

> In this case, we don't want to run "object-merge" on the two
> transactions at all...

Exactly, we need a book level merge with object level helpers for the odd bit 
of code.


-- 

Neil Williams
=============
http://www.codehelp.co.uk/
http://www.dclug.org.uk/
http://www.isbn.org.uk/
http://sourceforge.net/projects/isbnsearch/

http://www.biglumber.com/x/web?qs=0x8801094A28BCB3E3