merge logic cascade

Neil Williams linux at codehelp.co.uk
Sun Jul 18 11:08:33 EDT 2004


From previous discussions, every entity in the incoming book needs to be 
classified as one of three options:
"the same"              (guids match)
"maybe the same"        (guids don't match but something else matches,
                         like maybe the account name or invoice owner/date)
"new"                   ("clearly" new)

I just want to check that there is actually a fourth situation: a semantic 
match that contains new data, i.e. an update, where the guids do match but 
some parameters contain modified data. These need to be handled differently 
from entities where the guid AND all parameter data match exactly.

I'm working on the comparison routines now and I've designed a simple (well, 
it started off as simple . . . . ) logic cascade to cope with ALL objects. 
The cascade starts at MERGE_UNDEF - an undefined value used in internal error 
routines. Every object in the import book that is also registered with 
qofclass is checked and all registered parameters of that object are to be 
compared. Each object is then compared as an entity - i.e. two separate 
accounts are compared separately, with separate results. 

If someone corrupts the import data to create rogue objects that are not 
registered properly with qofclass, gncBookMerge silently ignores the objects. 
At present, I don't have a way of informing the user of such corruptions 
UNLESS the parameter names, types or values are invalid. However, as the 
functions only accept GNCBook (i.e. QofBook) then it's down to the import 
code to sift out such corruptions, and for my code to not crash noisily if 
some are not caught. Yes?

Each parameter in the entity from the input book is compared with the 
corresponding parameter of the matching entity in the target book (main 
book). The cascade starts with the first listed parameter by setting an enum 
value that depends on the result of that first comparison and on the result 
of the GUID comparison.

New rules start at:

MERGE_ABSOLUTE (GUIDs match; first parameter matches) OR
MERGE_DUPLICATE (GUIDs do NOT match; first parameter DOES match) OR
MERGE_NEW (GUIDs do NOT match; first parameter does NOT match).
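
As a rough sketch of these starting rules (not the actual gncBookMerge code): 
the enum ordering and the helper name merge_start_rule() are assumptions made 
for illustration. The fourth combination (GUIDs match, first parameter 
differs) is not listed above; the sketch assumes it goes straight to an 
update, using MERGE_UPDATE as a stand-in name for the insert/append case.

```c
#include <stdbool.h>

/* Result codes named after the values in this message. The ordering, and
   MERGE_UPDATE as a stand-in for insert/append, are assumptions. */
typedef enum {
    MERGE_UNDEF,      /* not yet compared / internal error value */
    MERGE_ABSOLUTE,   /* GUIDs match, data matches */
    MERGE_DUPLICATE,  /* GUIDs differ, data matches */
    MERGE_NEW,        /* GUIDs differ, data differs */
    MERGE_REPORT,     /* conflict, the user must resolve it */
    MERGE_UPDATE      /* GUIDs match, data differs */
} MergeResult;

/* Hypothetical helper: choose the starting rule from the GUID comparison
   and the comparison of the first listed parameter. */
MergeResult merge_start_rule(bool guid_match, bool first_param_match)
{
    if (guid_match)
        /* Assumption: a GUID match with a differing first parameter is
           treated as an update straight away. */
        return first_param_match ? MERGE_ABSOLUTE : MERGE_UPDATE;
    return first_param_match ? MERGE_DUPLICATE : MERGE_NEW;
}
```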

These values remain unless and until another parameter in the SAME entity 
fails a match with the corresponding entity in the target book. This is to 
save work - absolute matches and duplicates are silently ignored. 
(Essentially, a MERGE_DUPLICATE is a MERGE_ABSOLUTE from an external, 
non-GnuCash source. The guid isn't in the external data (maybe a Palm or 
spreadsheet etc.) and is created on-the-fly by being put into a GNCBook prior 
to calling gncBookMergeBuildRules(). As such, the guid is not important: the 
data matches and therefore the entity is ignored. It would therefore be 
advisable to always export the guid, to make a subsequent import less work.)

If any one subsequent parameter in the same entity FAILS a match:
(there is a proviso to this at the end of this message)

MERGE_DUPLICATE falls back to MERGE_REPORT
(GUID does not match and some parameters do NOT match, some do.) The user 
must resolve the conflict. This is the "maybe the same" category from 
earlier.

MERGE_NEW falls back to MERGE_REPORT
(GUID does not match and some parameters now DO match as well as some that 
don't.) Again, the user must resolve this conflict. Instead of being unique, 
this entity is now also classed as "maybe the same".

MERGE_ABSOLUTE falls back to MERGE_INSERT or MERGE_APPEND
(GUID matches but some parameters differ.) The target book will be updated 
with the entity from the input book; the method, insert or append, depends 
on the object type.

I'd like to check whether APPEND is necessary. Are there situations where a 
SINGLE entity should not have certain parameter values overwritten but 
instead the new data should be appended somewhere? e.g. GList or GSList 
etc. - are those used?

e.g. An import contains a new transaction in an existing account. The account 
is unchanged (MERGE_ABSOLUTE) other than this new transaction. The 
transaction entity would be reported as MERGE_NEW. I'd expect that creating 
the new transaction using standard routines would update the account, so 
there would be no need to change anything in the account separately?

Can I therefore combine MERGE_INSERT and MERGE_APPEND into a single value, 
MERGE_OVERWRITE? (or perhaps MERGE_UPDATE if that is clearer).

Remaining MERGE_ABSOLUTE and MERGE_DUPLICATE results are ignored.
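
The fallback rules above could be sketched roughly as follows. This is only 
an illustration, not the real comparison routine: the function names, the 
enum ordering and the use of a plain array of match results are all 
assumptions, and MERGE_UPDATE stands in for the combined INSERT/APPEND value 
suggested above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result codes named after the values in this message (ordering assumed). */
typedef enum {
    MERGE_UNDEF, MERGE_ABSOLUTE, MERGE_DUPLICATE,
    MERGE_NEW, MERGE_REPORT, MERGE_UPDATE
} MergeResult;

/* Hypothetical helper: apply one further parameter comparison to the
   current rule of an entity. */
MergeResult merge_next_rule(MergeResult current, bool param_match)
{
    switch (current) {
    case MERGE_ABSOLUTE:   /* GUID matches, one parameter now differs */
        return param_match ? MERGE_ABSOLUTE : MERGE_UPDATE;
    case MERGE_DUPLICATE:  /* GUID differs, one parameter now differs */
        return param_match ? MERGE_DUPLICATE : MERGE_REPORT;
    case MERGE_NEW:        /* GUID differs, one parameter now matches */
        return param_match ? MERGE_REPORT : MERGE_NEW;
    default:               /* REPORT and UPDATE do not change again */
        return current;
    }
}

/* Run the cascade over one entity. matches[i] says whether parameter i
   of the import entity matched the target entity. */
MergeResult merge_entity_rule(bool guid_match, const bool *matches, size_t n)
{
    MergeResult rule;
    size_t i;

    if (matches == NULL || n == 0)
        return MERGE_UNDEF;
    /* Starting rule from the GUID and the first listed parameter. */
    if (guid_match)
        rule = matches[0] ? MERGE_ABSOLUTE : MERGE_UPDATE;
    else
        rule = matches[0] ? MERGE_DUPLICATE : MERGE_NEW;
    for (i = 1; i < n; i++)
        rule = merge_next_rule(rule, matches[i]);
    return rule;
}
```

With this sketch, an entity whose GUID matches and whose parameters compare 
as match, match, mismatch ends up as MERGE_UPDATE, while the same parameter 
results with a differing GUID end up as MERGE_REPORT.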

All MERGE_REPORT results will be made available for a GUI dialog control 
procedure that can offer the choices to the user and resolve each 
MERGE_REPORT into either MERGE_NEW or MERGE_INSERT (or MERGE_OVERWRITE, if 
APPEND is ditched).
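
A minimal sketch of that resolution step, assuming the dialog hands back one 
of two choices per entity. MergeChoice, its values and 
merge_resolve_report() are hypothetical names, and MERGE_UPDATE again stands 
in for the combined insert/overwrite value.

```c
/* Result codes named after the values in this message (ordering assumed). */
typedef enum {
    MERGE_UNDEF, MERGE_ABSOLUTE, MERGE_DUPLICATE,
    MERGE_NEW, MERGE_REPORT, MERGE_UPDATE
} MergeResult;

/* Hypothetical choices a dialog could offer for a MERGE_REPORT entity. */
typedef enum {
    CHOICE_IMPORT_AS_NEW,  /* keep both: add the import entity as new */
    CHOICE_UPDATE_TARGET   /* same entity: overwrite the target's data */
} MergeChoice;

/* Turn a MERGE_REPORT into a final rule; other rules pass through
   unchanged because they never needed the user's input. */
MergeResult merge_resolve_report(MergeResult rule, MergeChoice choice)
{
    if (rule != MERGE_REPORT)
        return rule;
    return choice == CHOICE_IMPORT_AS_NEW ? MERGE_NEW : MERGE_UPDATE;
}
```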

At present, I've not written an "ignore" handler. Should the user be allowed 
to ignore certain entities in the import book that conflict with the target 
book or will this corrupt the final book (perhaps by omitting important 
transactions)? If the import data contains spurious data that should not be 
imported, isn't that something for the user to change externally?

The code, as is, would only allow the user to abort the import if it contains 
new/modified data that they do not want merged into the main book. Is that 
sufficient?

(As discussed previously, a user abort would leave the main book completely 
untouched.)

My docbook pages on the design:
http://www.codehelp.co.uk/code/index.html
the full doxygen output, including my addition to import-export
http://www.codehelp.co.uk/doxygen/index.html
(I thought of just putting the odd pages up but it was easier to put the whole 
lot up. It saved having broken links / writing a perl script to change the 
URL of certain links to cvs.gnucash.org.)

The proviso mentioned earlier:
There are certain parameters that will always match - if a user imports data 
using the same currency (as a gnc_commodity), there is no need to list all 
those as MERGE_REPORT if nothing else has changed - are there other 
parameters that should be deemed to be irrelevant UNLESS other values in the 
same entity have also changed? e.g. two transactions, both in GBP. If the 
currency is ignored, the two entities have no matching parameters - this 
would have to go as a NEW transaction.
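
One possible way to express this proviso, purely as a sketch: flag the 
parameters that nearly always match (currency in the example) and only count 
them once some other parameter in the same entity matches too. The 
ParamCompare struct, the "weak" flag and merge_count_matches() are 
assumptions for illustration, not existing code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one parameter comparison. */
typedef struct {
    const char *name;  /* parameter name, for illustration only */
    bool matches;      /* import and target values compared equal */
    bool weak;         /* expected to match nearly always, e.g. currency */
} ParamCompare;

/* Count matching parameters, ignoring weak matches unless at least one
   other (non-weak) parameter in the same entity also matches. */
size_t merge_count_matches(const ParamCompare *p, size_t n)
{
    size_t strong = 0, weak = 0, i;

    for (i = 0; i < n; i++) {
        if (!p[i].matches)
            continue;
        if (p[i].weak)
            weak++;
        else
            strong++;
    }
    return strong > 0 ? strong + weak : 0;
}
```

For two transactions that share nothing but the GBP currency, this returns 0, 
so the import transaction would stay MERGE_NEW rather than being listed as 
MERGE_REPORT.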

There is no intrinsic need (that I can see) for gncBookMerge to be restricted 
to only merging into the currently active book - it is possible to specify 
two other books and merge those without affecting the active book. The 
purpose of such a merge is up for debate but it might be useful. 
:-)
I propose leaving the initial call to gncBookMerge as requiring that both 
books be explicitly specified, import and target. Perhaps it would be 
preferable to use a default argument?

PS: What is in KVP parameters? e.g. in Account, what would be contained in the 
QOF_TYPE_KVP parameter?

-- 

Neil Williams
=============
http://www.codehelp.co.uk/
http://www.dclug.org.uk/
http://www.isbn.org.uk/
http://sourceforge.net/projects/isbnsearch/

http://www.biglumber.com/x/web?qs=0x8801094A28BCB3E3