file versioning [WAS: Re: r15486 - gnucash/trunk/src - SX "enabled" patch from Peter McAlpine <peter at aoeu.ca>.]

Fri Feb 2 08:35:44 EST 2007

On Thu, 2007-02-01 at 11:58 -0500, Derek Atkins wrote:
> Ideally, EVERYTHING would be "like" a KVP-frame.  But using GObject
> properties and GValue (or perhaps GNCValue, so we can define our own
> GValue types)...  Then we always load (and save) all properties, and
> hopefully don't have to know a priori all the properties we might want
> (or need) to save.

We might be talking past each other, but I don't consider a named
property (gobject property or otherwise) any different from a named
struct/object field ... but I do consider it different from an arbitrary
dictionary of name/value pairs (e.g. a kvp-style structure).  I
certainly don't want every GNC object to be struct GncFooBar
{ GHashTable *fields; }, but I don't think that's what you're
suggesting...

...we could consider having an "overflow" bucket on objects: all known
tags are copied into their fields, and unknown fragments are just copied
literally into this overflow area.  Then, on file-write, we could
re-emit the "unknown" sections literally in their containers.
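To make the overflow idea concrete, here's a rough sketch in plain C (no GLib, to keep it self-contained). Everything in it is hypothetical: `GncThing` stands in for any QOF object, and the loader is assumed to hand over each unrecognized child element as a raw XML string. Known fields get parsed; unknown fragments are copied verbatim and written back inside the same container on save.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object: known fields plus an "overflow" bucket holding
 * unrecognized child elements as raw XML text. */
typedef struct {
    const char *name;       /* a known field, parsed normally    */
    char      **unknown;    /* verbatim fragments we didn't grok */
    size_t      n_unknown;
} GncThing;

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Loader hook: called for each child element with an unknown tag. */
static int thing_keep_unknown(GncThing *t, const char *fragment)
{
    char *copy = copy_string(fragment);
    if (copy == NULL)
        return -1;
    char **grown = realloc(t->unknown, (t->n_unknown + 1) * sizeof *grown);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[t->n_unknown++] = copy;
    t->unknown = grown;
    return 0;
}

/* Writer: known fields first, then every retained fragment, re-emitted
 * literally inside the same container.  No XML escaping, for brevity.
 * Returns a malloc'd string the caller must free. */
static char *thing_to_xml(const GncThing *t)
{
    static const char head[] = "<gnc:thing><thing:name>";
    static const char sep[]  = "</thing:name>";
    static const char tail[] = "</gnc:thing>";
    size_t len = strlen(head) + strlen(t->name) + strlen(sep) + strlen(tail) + 1;
    for (size_t i = 0; i < t->n_unknown; i++)
        len += strlen(t->unknown[i]);

    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    strcpy(out, head);
    strcat(out, t->name);
    strcat(out, sep);
    for (size_t i = 0; i < t->n_unknown; i++)
        strcat(out, t->unknown[i]);
    strcat(out, tail);
    return out;
}

/* Release the overflow bucket. */
static void thing_clear_unknown(GncThing *t)
{
    for (size_t i = 0; i < t->n_unknown; i++)
        free(t->unknown[i]);
    free(t->unknown);
    t->unknown = NULL;
    t->n_unknown = 0;
}
```

A real version would need escaping for the known fields and would probably keep a parsed subtree rather than a flat string, but the load/keep/re-emit cycle is the same.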

The XML world has two terms that are useful in thinking about this:
must-understand and must-ignore.  In must-understand, processors must
refuse to load the file if there is an element they don't understand.
In must-ignore, processors must simply ignore them [*].

[*: In HTML/document-oriented processing, there are two variants of
must-ignore [1]: must-ignore-all (the whole subtree is ignored) and
must-ignore-container (the intervening container is ignored, but any
recognized child elements (tags and text) are presented as if the
container wasn't there). We're talking about must-ignore-all, here.]

[1] http://dev2dev.bea.com/pub/a/2004/05/soa_orchard.html
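A toy illustration of the two must-ignore variants from the footnote, assuming a processor that only knows `book`, `account` and `split` (a made-up vocabulary) and meets an unknown `mumble` element wrapping a `split`. The `Node` type and `visit` function are hypothetical.

```c
#include <string.h>

typedef enum { IGNORE_ALL, IGNORE_CONTAINER } IgnoreMode;

typedef struct Node {
    const char *tag;
    const struct Node *kids[4];
    int n_kids;
} Node;

/* The (made-up) vocabulary this processor understands. */
static int is_known(const char *tag)
{
    static const char *const known[] = { "book", "account", "split" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(tag, known[i]) == 0)
            return 1;
    return 0;
}

/* Append the tags the processor actually "sees" to out, space-separated.
 * out must be NUL-terminated and cap bytes long. */
static void visit(const Node *n, IgnoreMode mode, char *out, size_t cap)
{
    if (is_known(n->tag)) {
        if (out[0] != '\0')
            strncat(out, " ", cap - strlen(out) - 1);
        strncat(out, n->tag, cap - strlen(out) - 1);
    } else if (mode == IGNORE_ALL) {
        return;  /* must-ignore-all: skip the whole subtree */
    }
    /* Known element, or an unknown container under must-ignore-container:
     * descend, so recognized children appear as if it weren't there. */
    for (int i = 0; i < n->n_kids; i++)
        visit(n->kids[i], mode, out, cap);
}
```

For a tree `book(account, mumble(split))`, `IGNORE_ALL` yields `book account` while `IGNORE_CONTAINER` yields `book account split`.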

In some technologies (e.g., SOAP), these concepts are explicit in the
XML data, as attributes on elements; an element indicates whether the
processor must understand it or may ignore it.  I don't think
we want this level of granularity.  We have the overall file version
number as a global "must-understand" field. :)

I guess for processors that re-emit the content, there are two variants of
must-ignore processing: retain and discard.  We're talking about
"must-ignore+retain", here.

Of course, this is not without potential problems; here's a
semi-grounded example. Let's call the file-format and software versions
v1 and v2, for simplicity; these don't correspond to gnc data files v1
and v2, they're just sequential abstract versions.  In v2, the concept
of "Mumbles" is introduced.  These Mumbles hold references to Accounts.
As part of the v2 code, when an Account is removed, the user is asked
what to do with any sub-Mumble that references the Account.
This is all pretty plausible; SX template transactions behave similarly.

If v1 opens the v2 data, under must-ignore+discard, the Mumbles are
discarded, and thus can never become inconsistent!  :)

If v1 opens the v2 data, under must-ignore+retain, the Mumbles are
retained but ignored (and we have informed the user as much :).  Invariably,
the user deletes an Account that a Mumble references.  :) As this is v1
code, it doesn't understand that the consistency of the Mumble must be
maintained.  v1 writes out the revised Accounts, and the unmodified
Mumbles.  When v2 opens the v1-saved datafile, it must run some sort of
consistency check to look for that potential condition, and query the
user for the repair activity.
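The load-time check v2 would need might look something like this sketch. `Mumble` is the hypothetical type from the example, and integer ids stand in for the GUIDs real gnc objects use; the function only finds the dangling references, leaving the "query the user" part to the UI.

```c
#include <stddef.h>

/* Hypothetical v2 object referring to an Account by id. */
typedef struct {
    int id;
    int account_id;
} Mumble;

static int account_exists(const int *accounts, size_t n_accounts, int id)
{
    for (size_t i = 0; i < n_accounts; i++)
        if (accounts[i] == id)
            return 1;
    return 0;
}

/* Collect the Mumbles whose Account has vanished (e.g. deleted by v1,
 * which retained the Mumbles without understanding them).  Writes their
 * indices into dangling[] and returns how many were found. */
static size_t find_dangling_mumbles(const Mumble *m, size_t n_mumbles,
                                    const int *accounts, size_t n_accounts,
                                    size_t *dangling)
{
    size_t n = 0;
    for (size_t i = 0; i < n_mumbles; i++)
        if (!account_exists(accounts, n_accounts, m[i].account_id))
            dangling[n++] = i;
    return n;
}
```

Each index returned is one repair question for the user.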

I guess that this is really how gnc should behave anyway.  Anything
could happen to the data file, and the data file could contain anything.
But it bears noting that we don't really do much in the way of load-time
consistency checks now.

Must-ignore+discard is "pessimistic" versioning; it's impossible to solve
all problems, so we refuse to allow any problems, at very high cost.
Must-ignore+retain is "optimistic" versioning; the Mumbles will
hopefully, probably not become inconsistent in most cases ... the user
can hopefully do the right thing ... the receiver (the v2 software)
makes it right, with respect to consistency of data only it can
understand.
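Putting must-understand, pessimistic and optimistic handling side by side, here's a hypothetical loader loop (the names `UnknownPolicy`, `load_children` and `file_version_ok` are made up for illustration). It also shows the overall file version number acting as the one global must-understand check.

```c
/* Hypothetical policies for an element the loader doesn't recognize. */
typedef enum {
    MUST_UNDERSTAND,      /* refuse to load the file at all          */
    MUST_IGNORE_DISCARD,  /* pessimistic: drop it, never write it    */
    MUST_IGNORE_RETAIN    /* optimistic: keep it and re-emit on save */
} UnknownPolicy;

/* The global must-understand check: a file newer than this build
 * knows about is refused outright. */
static int file_version_ok(int file_version, int newest_understood)
{
    return file_version <= newest_understood;
}

/* Process one container's children; known[i] says whether child i was
 * recognized.  Returns how many unknown children were retained (always 0
 * under discard), or -1 if the load must be refused. */
static int load_children(const int *known, int n, UnknownPolicy policy)
{
    int retained = 0;
    for (int i = 0; i < n; i++) {
        if (known[i])
            continue;            /* parsed into real object fields */
        switch (policy) {
        case MUST_UNDERSTAND:
            return -1;
        case MUST_IGNORE_DISCARD:
            break;               /* forgotten */
        case MUST_IGNORE_RETAIN:
            retained++;          /* would go into the overflow bucket */
            break;
        }
    }
    return retained;
}
```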

For completeness, what are the use cases, here?

-- The "failed" upgrade: v1->v2->v1.

In this case, the user tries out the new version (2.0), tries new
features (budgets), but decides to revert to the old version (1.8).
Here, they probably do want to discard the novel data, so they have a
"pure 1.8" datafile again.

-- The "lateral", switching between processors.

In this case, the user or developer is switching between roughly the
same versions, but with different processing options enabled.  For
instance:

- gnucash-stable
- gnucash-stable, but built with OFX support
- gnucash-svn
- some "cashutil"/"gnc-cli" that only processes core/engine objects in
  a simplified manner on the command-line.

Especially in the modularized world we live in, modules (business, ofx,
even maybe user-specific reports) have module-specific content to save.
I also think we should do away with all data storage that's not the
user's datafile (import account mapping data, report options, &c.), so it
needs to support them all.

In this case, the user definitely wants to retain the ignored elements.
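In the lateral case, "unknown" often just means "belongs to a module this build doesn't have." A hypothetical sketch, assuming elements are namespaced by prefix (like `sx:` above) and that each loaded module registers its prefix; the prefix list and the `ofx:mapping` tag are invented for illustration.

```c
#include <string.h>

/* Assumed: namespace prefixes whose modules are built into this copy. */
static const char *const loaded_prefixes[] = { "gnc", "act", "trn", "sx" };

/* Returns 1 if some loaded module claims the element; 0 means the caller
 * should retain it verbatim (must-ignore+retain). */
static int claimed_by_loaded_module(const char *tag)
{
    const char *colon = strchr(tag, ':');
    if (colon == NULL)
        return 0;
    size_t plen = (size_t)(colon - tag);
    for (size_t i = 0; i < sizeof loaded_prefixes / sizeof loaded_prefixes[0]; i++)
        if (strlen(loaded_prefixes[i]) == plen &&
            strncmp(tag, loaded_prefixes[i], plen) == 0)
            return 1;
    return 0;
}
```

A build without OFX support would see `ofx:mapping` as unclaimed and keep it for the next save, instead of silently dropping the user's import mappings.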

It is then clearer what the to-do list for the "good thing to fix for
2.0.{5,6}" is... :/

- modify XML layer to retain unknown elements, and keep a tree of same.
- modify QOF objects to have catch-all structures (KVP frames(++)?)
- modify XML layer to re-emit unknown elements.
- modify UI to handle the "unknown elements have been loaded; continue
  or discard?" prompt
  - figure out how to make this sufficiently detailed without being
    overwhelming.
- modify code to support consistency checks.

----------

As for SXes in particular, there are two ways we could emit them for 2.2.
The naïve approach is the current flat SX list with an <sx:enabled>
field.  When 2.0.6 reads this (assuming 2.0.6 contains the
aforementioned improvements), it will ignore the enabled flag, and treat
disabled SXes as enabled, and generally muck things up.

Another option is for 2.2 to segregate the disabled SXes into a separate
tree in the XML file, like so:

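    <gnc:schedXactions>
      <gnc:schedXaction>...<sx:enabled>true</sx:enabled>...</gnc:schedXaction>
      <gnc:schedXaction>...<sx:enabled>true</sx:enabled>...</gnc:schedXaction>
      ...
      <gnc:disabledSchedXactions>
        <gnc:schedXaction>...<sx:enabled>false</sx:enabled>...</gnc:schedXaction>
        <gnc:schedXaction>...<sx:enabled>false</sx:enabled>...</gnc:schedXaction>
        ...
      </gnc:disabledSchedXactions>
    </gnc:schedXactions>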

This way, 2.0.6 would keep the disabled SXes out of the mix.  One could
also argue for modeling the SXes this way in the objects themselves (as
separate "enabled" and "disabled" lists), to keep the object/XML
impedance mismatch low; I'm not convinced.
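A small model of the two layouts, under the assumption stated above that 2.0.6 retains but ignores what it doesn't understand. `SxRecord` and the `runs_in_*` predicates are hypothetical: in the flat layout 2.0.6 can't see `<sx:enabled>`, so a disabled SX still runs; in the segregated layout the whole `<gnc:disabledSchedXactions>` container is unknown to it, so its SXes are retained but never loaded.

```c
#include <stddef.h>

/* Hypothetical view of one scheduled transaction as read from the file. */
typedef struct {
    int enabled;           /* value of <sx:enabled>; 2.0.6 ignores it   */
    int in_disabled_tree;  /* was it under <gnc:disabledSchedXactions>? */
} SxRecord;

/* 2.0.6 (assumed must-ignore+retain): only top-level SXes are loaded. */
static int runs_in_2_0_6(const SxRecord *sx) { return !sx->in_disabled_tree; }

/* 2.2 understands the flag directly. */
static int runs_in_2_2(const SxRecord *sx) { return sx->enabled; }

static size_t count_running(const SxRecord *sxs, size_t n,
                            int (*runs)(const SxRecord *))
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (size_t)(runs(&sxs[i]) != 0);
    return c;
}
```

With one enabled and one disabled SX, the flat layout has 2.0.6 running both while 2.2 runs one; with the segregated layout both versions run only the enabled one.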

-- 
...jsled
http://asynchronous.org/ - a=jsled;b=asynchronous.org;echo ${a}@${b}