merge logic cascade
Neil Williams
linux at codehelp.co.uk
Sun Jul 18 11:08:33 EDT 2004
From previous discussions, every entity in the incoming book needs to be
classified as one of three options:
"the same" (guids match)
"maybe the same" (guids don't match but something else matches,
like maybe the account name or invoice owner/date)
"new" ("clearly" new)
I just want to check that there is actually a fourth situation, that of a
semantic match that contains new data - i.e. an update, guids do match but
some parameters contain modified data. These need to be handled differently to
entities where the guid AND all parameter data match exactly.
I'm working on the comparison routines now and I've designed a simple (well,
it started off as simple . . . . ) logic cascade to cope with ALL objects.
The cascade starts at MERGE_UNDEF - an undefined value used in internal error
routines. Every object in the import book that is also registered with
qofclass is checked and all registered parameters of that object are to be
compared. Each object is then compared as an entity - i.e. two separate
accounts are compared separately, with separate results.
If someone corrupts the import data to create rogue objects that are not
registered properly with qofclass, gncBookMerge silently ignores the objects.
At present, I don't have a way of informing the user of such corruptions
UNLESS the parameter names, types or values are invalid. However, as the
functions only accept GNCBook (i.e. QofBook) then it's down to the import
code to sift out such corruptions, and for my code to not crash noisily if
some are not caught. Yes?
Each parameter in the entity from the input book is compared with the
corresponding entity in the target book (main book) and the cascade starts
with the first listed parameter by setting an enum value, dependent on the
first match and the result of the GUID comparison.
New rules start at:
MERGE_ABSOLUTE (GUIDs match; first parameter matches) OR
MERGE_DUPLICATE (GUIDs do NOT match; first parameter DOES match) OR
MERGE_NEW (GUIDs do NOT match; first parameter does NOT match).
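
As a rough illustration of the starting rules listed above, here is a minimal sketch in C. The enum ordering and the helper name merge_initial_rule are assumptions for illustration only, not the actual gncBookMerge code. The list above does not cover the case where the GUIDs match but the first parameter differs; the sketch assumes that case goes straight to the update value, which is only one possible reading.

```c
#include <stdbool.h>

/* Result values named in this message. The ordering and numeric values
 * are assumptions made for illustration only. */
typedef enum {
    MERGE_UNDEF,      /* starting value, also used by internal error routines */
    MERGE_ABSOLUTE,   /* GUIDs match, parameters match */
    MERGE_DUPLICATE,  /* GUIDs differ, parameters match */
    MERGE_NEW,        /* GUIDs differ, parameters differ */
    MERGE_REPORT,     /* conflict that the user must resolve */
    MERGE_INSERT,     /* target book is updated from the import book */
    MERGE_APPEND
} MergeResult;

/* Hypothetical helper: pick the starting rule from the GUID comparison
 * and the comparison of the first listed parameter. */
static MergeResult merge_initial_rule(bool guid_match, bool first_param_match)
{
    if (guid_match && first_param_match)
        return MERGE_ABSOLUTE;
    if (!guid_match && first_param_match)
        return MERGE_DUPLICATE;
    if (!guid_match && !first_param_match)
        return MERGE_NEW;
    /* Assumption: GUIDs match but the first parameter differs. This case
     * is not in the list of starting rules; it is treated here as an
     * update, matching the "fourth situation" described earlier. */
    return MERGE_INSERT;
}
```

For example, merge_initial_rule(false, true) gives MERGE_DUPLICATE, the case of data coming from an external source that carries no guid.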
These values remain unless and until another parameter in the SAME entity
fails a match with the corresponding entity in the target book. This is to
save work - absolute matches and duplicates are silently ignored.
(Essentially, a MERGE_DUPLICATE is a MERGE_ABSOLUTE from an external, non
GnuCash source. The guid isn't in the external data (maybe a Palm or
spreadsheet etc.) and is created on-the-fly by being put into a GNCBook prior
to calling gncBookMergeBuildRules(). As such, the guid is not important, the
data matches and therefore it is ignored. It would therefore be advisable to
always export the guid - to make a subsequent import less work.)
If any one subsequent parameter in the same entity FAILS a match:
(there is a proviso to this at the end of this message)
MERGE_DUPLICATE falls back to MERGE_REPORT
(GUID does not match and some parameters do NOT match, some do.) The user
must resolve the conflict. This is the "maybe the same" category from
earlier.
MERGE_NEW falls back to MERGE_REPORT
(GUID does not match and some parameters now DO match as well as some that
don't.) Again, the user must resolve this conflict. Instead of being unique,
this entity is now also classed as "maybe the same".
MERGE_ABSOLUTE falls back to MERGE_INSERT or MERGE_APPEND
(GUID matches but some parameters differ)
(Target book will be updated with the entity from the input book, the method
- insert or append - depends on object type.) The target book needs to be
updated with data from the input book - I'd like to check whether APPEND is
necessary? Are there situations where a SINGLE entity should not have certain
parameter values overwritten but instead the new data should be appended
somewhere? e.g. GList or GSList etc. - are those used?
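
The fallback rules above can be sketched as a small state transition applied once per remaining parameter. This is a sketch under stated assumptions, not the real implementation: the names merge_fallback and merge_apply_rest are hypothetical, MERGE_INSERT stands in for "insert or append", and MERGE_NEW is read as falling back when a later parameter DOES match, as the parenthetical for that rule describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same illustrative enum as in the message; ordering is an assumption. */
typedef enum {
    MERGE_UNDEF,
    MERGE_ABSOLUTE,
    MERGE_DUPLICATE,
    MERGE_NEW,
    MERGE_REPORT,
    MERGE_INSERT,
    MERGE_APPEND
} MergeResult;

/* Hypothetical helper: fold the comparison of one further parameter
 * into the running result for the entity. */
static MergeResult merge_fallback(MergeResult current, bool param_match)
{
    switch (current) {
    case MERGE_ABSOLUTE:
        /* GUID matches but a parameter differs: update the target. */
        return param_match ? MERGE_ABSOLUTE : MERGE_INSERT;
    case MERGE_DUPLICATE:
        /* GUID differs and parameters now disagree: ask the user. */
        return param_match ? MERGE_DUPLICATE : MERGE_REPORT;
    case MERGE_NEW:
        /* GUID differs but a parameter now matches: ask the user. */
        return param_match ? MERGE_REPORT : MERGE_NEW;
    default:
        /* MERGE_REPORT, MERGE_INSERT and MERGE_APPEND do not change
         * once set; MERGE_UNDEF is passed through untouched. */
        return current;
    }
}

/* Hypothetical helper: apply the fallback to every remaining parameter. */
static MergeResult merge_apply_rest(MergeResult start,
                                    const bool *matches, size_t count)
{
    MergeResult result = start;
    for (size_t i = 0; i < count; i++)
        result = merge_fallback(result, matches[i]);
    return result;
}
```

Once a result has fallen back it stays put, so the remaining comparisons for that entity cannot change the outcome.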
e.g. An import contains a new transaction in an existing account. The account
is unchanged (MERGE_ABSOLUTE) other than this new transaction. The
transaction entity would be reported as MERGE_NEW. I'd expect that creating
the new transaction using standard routines would update the account, there
would be no need to change anything in the account separately?
Can I therefore combine MERGE_INSERT and MERGE_APPEND into a single value,
MERGE_OVERWRITE? (or perhaps MERGE_UPDATE if that is clearer).
Remaining MERGE_ABSOLUTE and MERGE_DUPLICATE results are ignored.
All MERGE_REPORT results will be made available for a GUI dialog control
procedure that can offer the choices to the user and resolve each
MERGE_REPORT into MERGE_NEW, or MERGE_INSERT/OVERWRITE. (if APPEND is
ditched.)
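
To make the resolution step concrete, here is a minimal sketch of how a dialog's answer might be folded back into the result. It assumes that INSERT and APPEND are combined under the name MERGE_UPDATE as suggested above; the UserChoice type and merge_resolve are hypothetical names, not existing code.

```c
/* Illustrative enum, assuming INSERT and APPEND are combined into a
 * single MERGE_UPDATE value. */
typedef enum {
    MERGE_UNDEF,
    MERGE_ABSOLUTE,
    MERGE_DUPLICATE,
    MERGE_NEW,
    MERGE_REPORT,
    MERGE_UPDATE
} MergeResult;

/* Hypothetical choices a GUI dialog could offer for each conflict. */
typedef enum {
    USER_IMPORT_AS_NEW,   /* keep both: add the import entity as new */
    USER_UPDATE_TARGET    /* overwrite the target entity with the import */
} UserChoice;

/* Hypothetical helper: only MERGE_REPORT results are changed by the
 * user's decision; every other result is returned as it is. */
static MergeResult merge_resolve(MergeResult result, UserChoice choice)
{
    if (result != MERGE_REPORT)
        return result;
    return (choice == USER_IMPORT_AS_NEW) ? MERGE_NEW : MERGE_UPDATE;
}
```

An "ignore" choice is deliberately absent here, since whether one should exist is the open question that follows.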
At present, I've not written an "ignore" handler. Should the user be allowed
to ignore certain entities in the import book that conflict with the target
book or will this corrupt the final book (perhaps by omitting important
transactions)? If the import data contains spurious data that should not be
imported, isn't that something for the user to change externally?
The code, as is, would only allow the user to abort the import if it contains
new/modified data that they do not want merged into the main book. Is that
sufficient?
(As discussed previously, a user abort would leave the main book completely
untouched.)
My docbook pages on the design:
http://www.codehelp.co.uk/code/index.html
the full doxygen output, including my addition to import-export
http://www.codehelp.co.uk/doxygen/index.html
(I thought of just putting the odd pages up but it was easier to put the whole
lot up. It saved having broken links / writing a perl script to change the
URL of certain links to cvs.gnucash.org.)
The proviso mentioned earlier:
There are certain parameters that will always match - if a user imports data
using the same currency (as a gnc_commodity), there is no need to list all
those as MERGE_REPORT if nothing else has changed - are there other
parameters that should be deemed to be irrelevant UNLESS other values in the
same entity have also changed? e.g. two transactions, both in GBP. If the
currency is ignored, the two entities have no matching parameters - this
would have to go as a NEW transaction.
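
One way to picture this proviso is a per-parameter flag that keeps "always matching" parameters out of the count of meaningful matches. The sketch below is only an illustration: the ParamResult struct, the ignorable flag and the parameter names in the usage note are all assumptions, not part of qofclass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one parameter comparison. */
typedef struct {
    const char *name;   /* parameter name, for reporting only */
    bool matches;       /* did import and target values compare equal? */
    bool ignorable;     /* assumption: true for parameters such as the
                         * currency that match for almost every entity */
} ParamResult;

/* Count only the matches that say something about the entity, i.e.
 * skip parameters flagged as ignorable. */
static size_t significant_matches(const ParamResult *params, size_t count)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (params[i].matches && !params[i].ignorable)
            found++;
    }
    return found;
}
```

With two transactions that share only the currency, e.g. { "currency", true, true }, { "description", false, false }, { "amount", false, false }, the count is zero, so the entity would stay MERGE_NEW instead of becoming MERGE_REPORT.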
There is no intrinsic need (that I can see) for gncBookMerge to be restricted
to only merging into the currently active book - it is possible to specify
two other books and merge those without affecting the active book. The
purpose of such a merge is up for debate but it might be useful.
:-)
I propose leaving the initial call to gncBookMerge as requiring that both
books be explicitly specified, import and target. Perhaps it would be
preferable to use a default argument?
PS: What is in KVP parameters? e.g. in Account, what would be contained in the
QOF_TYPE_KVP parameter?
--
Neil Williams
=============
http://www.codehelp.co.uk/
http://www.dclug.org.uk/
http://www.isbn.org.uk/
http://sourceforge.net/projects/isbnsearch/
http://www.biglumber.com/x/web?qs=0x8801094A28BCB3E3