Dirty entity identification.

Thu Jul 21 18:22:39 EDT 2005

On Thursday 21 July 2005 10:04 pm, Chris Shoemaker wrote:
> If by "incremental storage system" you mean something that commits
> only what has changed, then we're on the same page.

Yes.

> (Incidentally, 
> even "immediate-commit" systems sometimes fall back to "delayed-commit"
> systems when they're in "offline" mode.)

Yes.

> > I think it would be too large to inflict on all users at all times for
> > the odd occasion that it might be useful.
>
> I think you may misunderstand.  Both the linear search and the tree
> search are retrospective, and the cost of the linear search for dirty
> instances of all types will *always* be equal to or greater than the
> tree search, and usually (in the cases where not everything is dirty)
> it will be MUCH greater.
>
> Proof: To find all the dirty instances of one type with a linear
> search where at least one instance is dirty in a collection by type,
> you must check every instance in the collection.  With a tree search
> you need not check any instance whose referent hasn't been marked as
> "containing something dirty".

My problem here is that the tree search is difficult to do in QOF because 
there is no tree that QOF can understand. This would be one of the logic 
functions in the intermediate library that is also being discussed - a 
function specific to GnuCash and CashUtil.
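
To make the cost argument in the quoted proof concrete, here is a minimal
sketch in C. It is not QOF or GnuCash code: DemoSplit, DemoAccount, the
contains_dirty flag and both search functions are hypothetical names invented
for this illustration, and it assumes that the application (not QOF) keeps
the per-account flag up to date. The checked counter records split
inspections only and ignores the cost of testing the account flag itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
 * real GnuCash or QOF structures. */
typedef struct {
    int dirty;              /* set when this split has been edited */
} DemoSplit;

typedef struct {
    int contains_dirty;     /* set when any split below is dirty */
    DemoSplit *splits;
    size_t n_splits;
} DemoAccount;

/* Linear search: every split of every account is inspected.
 * Returns the number of dirty splits found; *checked receives the
 * number of splits that had to be inspected. */
size_t demo_linear_search(const DemoAccount *accts, size_t n_accts,
                          size_t *checked)
{
    size_t found = 0;
    assert(checked != NULL);
    *checked = 0;
    for (size_t a = 0; a < n_accts; a++) {
        for (size_t s = 0; s < accts[a].n_splits; s++) {
            (*checked)++;
            if (accts[a].splits[s].dirty)
                found++;
        }
    }
    return found;
}

/* Tree search: accounts whose flag is clear are skipped entirely,
 * so only the splits of flagged accounts are inspected. */
size_t demo_tree_search(const DemoAccount *accts, size_t n_accts,
                        size_t *checked)
{
    size_t found = 0;
    assert(checked != NULL);
    *checked = 0;
    for (size_t a = 0; a < n_accts; a++) {
        if (!accts[a].contains_dirty)
            continue;       /* nothing dirty below this account */
        for (size_t s = 0; s < accts[a].n_splits; s++) {
            (*checked)++;
            if (accts[a].splits[s].dirty)
                found++;
        }
    }
    return found;
}
```

With three accounts of three splits each and a single dirty split, the
linear search inspects nine splits and the tree search inspects three. When
every account is flagged the two counts are equal, which matches the "equal
to or greater" wording of the quoted claim.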

> > Currently, I can only see this as a solution in search of a problem.
>
> Maybe you're right, but let me play devil's advocate:

:-)

> I don't know the 
> current state of the backends, but imagine this scenario: Backend is
> remote server, and connection to server goes down.  What happens?

Currently? I think GnuCash should fall back to a file:// URL and save the 
entire book to GnuCash XML v2. Actually, there is a note in the source about 
this:
/* If there is a backend, and the backend is reachable
 * (i.e. we can communicate with it), then synchronize with 
 * the backend.  If we cannot contact the backend (e.g.
 * because we've gone offline, the network has crashed, etc.)
 * then give the user the option to save to the local disk. 
 *
 * hack alert -- FIXME -- XXX the code below no longer
 * does what the words above say.  This needs fixing.
 */

http://code.neil.williamsleesmill.me.uk/gnome2/qofsession_8c-source.html#l01226
(scroll down to line 1325)

I'll look at fixing that.

There is code in the backend handlers that falls back to file:// if the 
preferred access method is not usable. That could easily be extended.
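
As a rough illustration of that fallback idea, the choice between the
preferred access method and a local file could look something like the
sketch below. Everything here is hypothetical: demo_choose_url, the probe
callback and the example URLs are invented for this note and do not
correspond to the actual backend handler code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: returns non-zero when the access method named
 * by url can currently be used. */
typedef int (*demo_probe_fn)(const char *url);

/* Return the preferred URL when it is usable, otherwise the local
 * fallback.  This is an assumption about the shape of the logic,
 * not a copy of the real handler. */
const char *demo_choose_url(const char *preferred, const char *fallback,
                            demo_probe_fn usable)
{
    assert(fallback != NULL && usable != NULL);
    if (preferred != NULL && usable(preferred))
        return preferred;
    return fallback;
}

/* Example probe imitating an offline machine: only file:// URLs
 * are treated as reachable. */
int demo_offline_probe(const char *url)
{
    return strncmp(url, "file://", 7) == 0;
}
```

Calling demo_choose_url("postgres://example.invalid/books",
"file:///tmp/book.xml", demo_offline_probe) returns the file:// URL, because
the probe rejects the remote one.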

> One 
> option is that GC prevents the user from continuing to edit the data
> on the screen.  Option two is that GC alerts the user that the
> connection went down and that changes will be committed to the server
> when the connection comes back, if ever.  Let's say we want option
> two.  The user adds/changes some splits and the connection comes back
> so we want to commit what has changed.  But how?

I think it's risky to offer option 2 without some kind of fallback - what if 
the server is actually local and the problem is a sign of something more 
serious - the user's system has become unstable etc.? Alternatively, the user 
might just need to do something else and cannot keep GnuCash running until 
the server comes back online.

That said, the SQL backend can use last_update to identify those instances 
that have changed, both during the outage and afterwards, once the connection 
is restored.
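
The last_update comparison can be modelled in a few lines of C. This is only
a sketch of the idea: DemoRow and demo_changed_since are hypothetical names,
and it assumes last_update holds the time of the most recent change to each
instance. In the SQL backend itself the same test would presumably be a
comparison on that column rather than a loop in the application.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical row model: an instance id plus the time of its most
 * recent change (an assumed meaning for last_update). */
typedef struct {
    int id;
    time_t last_update;
} DemoRow;

/* Copy the ids of rows changed strictly after last_sync into out,
 * which has room for max_out entries.  Returns the number written. */
size_t demo_changed_since(const DemoRow *rows, size_t n, time_t last_sync,
                          int *out, size_t max_out)
{
    size_t count = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n && count < max_out; i++) {
        /* difftime(a, b) yields a - b in seconds */
        if (difftime(rows[i].last_update, last_sync) > 0.0)
            out[count++] = rows[i].id;
    }
    return count;
}
```

No dirty flags are needed for this: anything with a last_update later than
the last successful synchronisation is selected, regardless of what happened
to the in-memory flags in the meantime.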

I'd envisage the user taking the option to save to a local file as the 
HIG-compliant, intuitive action. Then, once the problem is fixed, the user 
can reload the file (edited or not) and use Save As... to re-establish the 
connection to the remote server. Just as in any other situation where the 
backend receives a whole new file, there will be increased network traffic 
until the two are synchronised.

Saving to a local file will automatically reset all dirty flags anyway. We 
cannot expect to preserve dirty flags if we give the user the (expected) 
intuitive option to save to a local file in the event of a remote failure.

> Several options: 
>   1) We cached the changes as they were made (as you describe in your
> "predictive" method.)  We just clear the cache.

Yuk. I only gave that example to show how it wouldn't work!
:-)

>   2) We just send the entire Split collection to the backend and let
> it figure out what changed.

SQL can cope with that. All that happens is that on resuming the connection, 
the network traffic increases until the SQL backend is back in sync.

After all, we are not the only application to use a remote connection to a SQL 
server and this problem is not uncommon. As it is the server that deals with 
the most events of this kind, I don't think it's unreasonable to expect the 
server to have efficient code to handle the results of a connection restart, 
independent of which application is using the server. In some situations, 
it's even built into the protocol.

>   3) We do a linear search through the Split collection to find the
> few changes and commit those.

QOF isn't optimised to do that; SQL probably is.

>   4) We do a tree search that finds that only one Account is marked as
> "contains dirty Splits" so our linear search through Splits is only
> through that Account's Splits instead of all Splits.  We find the
> changes and commit them.

To me, this is doing the work of the backend in the UI. Remember, the backend 
- like the book - knows nothing about the tree. The only routines that know 
anything about the conceptual hierarchy of Account over Split are the GUI 
tree model functions.

> Any of those options would work.  But if this is something that
> happens often, 2) and 3) will probably be unacceptably expensive.

I'm still not convinced that this should be done in the UI. Any backend that 
utilises a remote connection should be capable of handling outages in that 
connection. That is the responsibility of the backend and it is a job best 
left to the backend to sort out.

> Maybe GC will never have to address this issue because it will never
> support an "offline" mode with a remote backend.

It should, and I'll look at making the file:// fallback work.

> If it does, 4) will 
> be easy to implement as long as instances store a reference to their
> "parent", like Split does.  The implementation is simply to do the
> same thing to the parent's "contains something dirty" flag as you
> currently want to do to the Collections "dirty" flag.

The same problem keeps getting in the way. The book, the backend, the 
collection and the entire query framework know nothing about the parental 
relationship between Account and Split other than that it is an available 
parameter of the relevant objects.

The tree is too specific - QOF is generic and does not get into the specific 
conceptual relationships.
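
For reference, the flag propagation proposed in the quoted text might be
sketched as below. All of the names (DemoParent, DemoChild, DemoCollection,
demo_mark_dirty) are hypothetical and none of this is existing QOF API. Note
that the sketch has to hard-code the parent pointer, which is exactly the
application-specific knowledge that a generic library does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not QOF or GnuCash code. */
typedef struct {
    int contains_dirty;     /* "contains something dirty" flag */
} DemoParent;

typedef struct {
    int dirty;              /* collection-level dirty flag */
} DemoCollection;

typedef struct {
    int dirty;              /* per-instance dirty flag */
    DemoParent *parent;     /* e.g. the Account owning a Split */
} DemoChild;

/* Mark an instance dirty and do the same thing to the parent's flag
 * as is done to the collection's flag. */
void demo_mark_dirty(DemoChild *child, DemoCollection *coll)
{
    assert(child != NULL);
    child->dirty = 1;
    if (coll != NULL)
        coll->dirty = 1;
    if (child->parent != NULL)
        child->parent->contains_dirty = 1;
}
```

Marking one child dirty sets its own flag, the collection flag and the flag
of its own parent only; other parents stay clean, which is what lets a later
search skip them.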

-- 

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
