Repeat imports with QSF XML.

Neil Williams linux at codehelp.co.uk
Wed Mar 2 04:39:19 EST 2005


On Wednesday 02 March 2005 5:30 am, TMaynard wrote:
> I realize what you are tackling will open doors for a lot
> of new features for Gnucash, and I only partially understand the
> implications. I should add one more qualifier for the kind of
> maneuver that could be supported earlier in the list of incoming
> features. I am going to add to the A and B restrictions already stated:
>
>     C: the export and graft is *one way* only (big warning, never any
> re-import or return graft)

On the contrary, the code needs to handle repeat imports. 

It is the user who needs to be clear about whether a repeat import is what is 
actually required. That's why GUIDs in the export data are so vital. When the 
exported data contains valid GUIDs that are retained into the import, you can 
import that data as many times as you like and you'll have no problems.

Start position:
Account A is called "Old Account". That account is exported and the XML (QSF) 
is edited to read "New Account".
When you import that XML, "Old Account" will be renamed "New Account", and if 
it is already named "New Account", nothing else will happen. So repeating the 
import does absolutely nothing.
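
To make that concrete, here's a minimal sketch of the idea (hypothetical 
struct and function names, NOT the real GnuCash/QOF code): the GUID decides 
identity, so the first import renames the account and every repeat import 
finds nothing left to do.

/* Illustrative only: hypothetical types and names, not the real QOF/GnuCash
 * API. Shows why an import keyed on GUIDs is idempotent: a matching GUID
 * means "same entity", so the second (and every later) import finds nothing
 * to change. */
#include <stdio.h>
#include <string.h>

struct account {
    char guid[33];   /* 32 hex characters + NUL, as stored in a QSF file */
    char name[64];
};

/* Apply one imported account to the book: rename on a GUID match,
 * leave everything untouched if the data is already identical. */
static void import_account(struct account *book, size_t count,
                           const struct account *incoming)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(book[i].guid, incoming->guid) != 0)
            continue;                       /* not the same entity */
        if (strcmp(book[i].name, incoming->name) == 0) {
            printf("GUID match, data identical: nothing to do\n");
        } else {
            printf("GUID match: renaming '%s' to '%s'\n",
                   book[i].name, incoming->name);
            strcpy(book[i].name, incoming->name);
        }
        return;
    }
    printf("no GUID match: user intervention needed\n");
}

int main(void)
{
    struct account book[] = {
        { "3c9a1b0e4f6d48c2a1b2c3d4e5f60718", "Old Account" },
    };
    /* the exported XML, edited to read "New Account", GUID unchanged */
    struct account incoming =
        { "3c9a1b0e4f6d48c2a1b2c3d4e5f60718", "New Account" };

    import_account(book, 1, &incoming);  /* renames Old Account -> New Account */
    import_account(book, 1, &incoming);  /* repeat import: nothing happens */
    return 0;
}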

If the GUID in the XML is simply changed to another valid GUID, then once an 
account with that GUID has been created (if the user chooses to do so), that 
account will generate a GUID match when the import is repeated, so again 
nothing else will happen.

Note that if you later rename that account within GnuCash to "My Bank" or 
whatever, then import the data (with GUIDs) that was exported before that 
change, the name will be changed back to "New Account". It is up to the user 
to ensure that the data being imported is recent enough for this not to be a 
problem.

A matching GUID is a semantic match: it IS always the same entity, even if the 
data within the entity has changed. That's a golden rule of GUIDs that both 
the code and the user must follow.

> Thankfully the bank doesn't
> have to be concerned with ever receiving return data or to be concerned
> with a unique identifier or reimportation problem.

GnuCash does need to handle such situations.

>    This seems important to enable between certain "consenting" gnucash
> COA's. I hope this type of restricted graft can be a high priority and
> maybe such a tool can be the venue for some of the first look and feel
> additions.

No, there will be no hard-coded prevention of repeat imports. Instead, repeats 
will be handled intelligently and gracefully.

To do this, there are two fundamentally different situations:
1. Where the GUID is retained from export through import - the application 
handling both export and import understands GUIDs and knows how to store 
them.
2. Where the GUID of the import data has had to be generated 'on-the-fly' by 
QOF because a third-party application doesn't store or use GUIDs internally.

Situation 1 is a GnuCash export to QSF followed by a GnuCash import. If the 
same data file is loaded at import as was loaded at export, the GUIDs will 
match (unless some entities have been deleted in the meantime). If a 
different GnuCash data file is loaded at import time, there are likely to be 
NO GUID matches. Only if the same XML exported from file A is imported into 
data file B more than once will the GUIDs match on import.

If there is no GUID match for any specific entity, the user will be required 
to resolve the collision.

Situation 2 cannot generate a GUID match. The GUID is created afresh every 
time the data is loaded from the third-party application. This will usually 
result in collisions that the user needs to handle - typically by simply 
confirming that the incoming data is genuinely new.
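
A minimal sketch of the difference between the two situations (hypothetical 
code, not the QOF implementation - a real GUID is a random 128-bit value, a 
simple counter stands in for it here):

/* Illustrative only: hypothetical code, not the QOF implementation.
 * Situation 1: the GUID is read back from the QSF file, so a repeat import
 * presents the same GUID and can match.
 * Situation 2: the application has nowhere to store a GUID, so one has to be
 * minted afresh on every load and a match is impossible. */
#include <stdio.h>

static unsigned long counter = 0;

/* stand-in for generating a brand-new GUID */
static unsigned long guid_new(void)
{
    return ++counter;
}

/* Situation 1: GUID stored in the export, returned unchanged on every load */
static unsigned long load_guid_from_qsf(void)
{
    static const unsigned long stored = 42;   /* came from the XML file */
    return stored;
}

/* Situation 2: third-party data with no stored GUID, minted afresh each load */
static unsigned long load_guid_from_third_party(void)
{
    return guid_new();
}

int main(void)
{
    printf("QSF load 1: %lu, load 2: %lu  -> can match\n",
           load_guid_from_qsf(), load_guid_from_qsf());
    printf("third-party load 1: %lu, load 2: %lu  -> never matches\n",
           load_guid_from_third_party(), load_guid_from_third_party());
    return 0;
}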

> Neil has used the metaphor of "collisions" and I see that 
> he really is constructing a coordinated traffic light system for the
> whole city.

Collisions only happen when the code cannot find a GUID match. 

These collisions are handled through user intervention, offering three 
choices - ignore, import or new. Ignore that section of the import data 
completely; import it by updating the best match GnuCash can find with the 
import data; or make the import data a new entity within the GnuCash data 
file.

> In the mean time some of the traffic could carefully turn 
> Right on Red (with the warning that they should know where they want to
> go

Yes, but no warning will be issued. A semantic match will always be allowed to 
do whatever is described in the import - an entity cannot collide with 
itself.

The outline is:
GUID match and data identical - ignore.
GUID match and data differs - update.
No GUID match but data exactly matches an existing entity - ignore.
No GUID match and data differs, even slightly, from best match - report.
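
As a rough sketch of that outline (hypothetical names, not the actual QOF 
merge API), the classification reduces to one small function; only the final 
case is handed back to the user as a collision to resolve with ignore, import 
or new:

/* Illustrative only: hypothetical names, not the real QOF merge API.
 * The GUID decides identity, the data comparison decides whether anything
 * needs to change, and only "no GUID match, data differs" is reported back
 * to the user as a collision. */
#include <stdio.h>
#include <stdbool.h>

enum merge_result {
    MERGE_IGNORE,   /* nothing to do, or the user chose to skip this entity */
    MERGE_UPDATE,   /* same entity, data has changed: update it */
    MERGE_REPORT,   /* collision: ask the user - ignore, import or new */
    MERGE_NEW       /* user chose to create a new entity from the import */
};

static enum merge_result classify(bool guid_match, bool data_identical)
{
    if (guid_match)
        return data_identical ? MERGE_IGNORE : MERGE_UPDATE;
    return data_identical ? MERGE_IGNORE : MERGE_REPORT;
}

int main(void)
{
    struct { bool guid_match, data_identical; const char *desc; } cases[] = {
        { true,  true,  "GUID match and data identical" },
        { true,  false, "GUID match and data differs" },
        { false, true,  "no GUID match but data matches exactly" },
        { false, false, "no GUID match and data differs" },
    };
    const char *names[] = { "ignore", "update", "report", "new" };

    for (int i = 0; i < 4; i++)
        printf("%-40s -> %s\n", cases[i].desc,
               names[classify(cases[i].guid_match, cases[i].data_identical)]);
    return 0;
}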

It is possible to inform the user of how a GUID-matched entity will be 
updated before any commit takes place. The current merge code doesn't use 
that, but the API does support it if other uses require it. Developers can 
also use the merge API to inform the user about entities that will be 
ignored, if that were to become useful.

> and they can't return someday with a different cargo.) 

Yes they can. It is up to the user to establish the accuracy of the import 
data - that's why it's all in plain text XML - but the code will always 
require user confirmation before data without a GUID match is altered.

If the user deliberately sets GnuCash to query a third-party QOF application 
(like pilot-link) with the same data more than once, it is likely that some 
entities will simply be ignored and the rest will be reported for user 
intervention - typically those, like accounts, that use the GUID of another 
entity as a parameter.

This is because, whilst a GUID in a static XML file won't change, 
applications that create QOF data but have no way to store or use GUIDs 
themselves can only create a new one each time the data is loaded - entities 
that use GUIDs as parameters will therefore carry the new value from the 
referenced entity each time.
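
A minimal sketch of why that happens (hypothetical code, not pilot-link or 
QOF itself): an entity that carries the GUID of another entity gets a 
different reference value on every load, so its data can never match a 
previous import.

/* Illustrative only: hypothetical code, not pilot-link or QOF itself.
 * An application that cannot store GUIDs gets fresh ones on every load, so
 * an entity that refers to another entity by GUID (like an account referring
 * to its parent) carries a different value each time. */
#include <stdio.h>

static unsigned long next_guid = 0;

/* stand-in for generating a brand-new GUID */
static unsigned long guid_new(void)
{
    return ++next_guid;
}

struct account {
    unsigned long guid;
    unsigned long parent_guid;   /* a GUID used as a parameter of this entity */
};

/* Simulate loading the same third-party data twice: the content is
 * unchanged, but every GUID, including the referenced one, is minted
 * afresh. */
static struct account load_child_account(void)
{
    unsigned long parent = guid_new();
    struct account child = { guid_new(), parent };
    return child;
}

int main(void)
{
    struct account first  = load_child_account();
    struct account second = load_child_account();

    printf("first load : guid=%lu parent=%lu\n", first.guid, first.parent_guid);
    printf("second load: guid=%lu parent=%lu\n", second.guid, second.parent_guid);
    printf("parent reference identical? %s\n",
           first.parent_guid == second.parent_guid ? "yes" : "no - collision");
    return 0;
}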

However, these will still be unique GUIDs and there will be no GUID match, so 
a collision will result and the user will be required to resolve it. If the 
user repeatedly imports such data and selects the "new" option in the user 
intervention dialog every time, the data can be created repeatedly - but only 
because that is what the user claimed s/he wanted.

i.e. the user must first repeat the same procedure to call pilot-link with 
exactly the same data. Then the user must confirm in a dialog that the 
imported data is to be created as a new entity, not ignored or used to update 
the existing entity.

Repeat imports will be allowed but only by requiring the user to confirm how 
the collisions should be handled. At any point prior to the final commit, the 
user can cancel the import without changing any data.

-- 

Neil Williams
=============
http://www.dcglug.org.uk/
http://www.nosoftwarepatents.com/
http://sourceforge.net/projects/isbnsearch/
http://www.neil.williamsleesmill.me.uk/
http://www.biglumber.com/x/web?qs=0x8801094A28BCB3E3


