Dirty entity identification.

Thu Jul 21 21:33:02 EDT 2005

On Thu, Jul 21, 2005 at 11:22:39PM +0100, Neil Williams wrote:
> On Thursday 21 July 2005 10:04 pm, Chris Shoemaker wrote:
> > If by "incremental storage system" you mean something that commits
> > only what has changed, then we're on the same page.
> 
> Yes.
> 
> > (Incidentally, 
> > even "immediate-commit" systems sometimes fall back to "delayed-commit"
> > systems when they're in "offline" mode.)
> 
> Yes.
> 
> > > I think it would be too large to inflict on all users at all times for
> > > the odd occasion that it might be useful.
> >
> > I think you may misunderstand.  Both the linear search and the tree
> > search are retrospective, and the cost of the linear search for dirty
> > instances of all types will *always* be equal to or greater than the
> > tree search, and usually (in the cases where not everything is dirty)
> > it will be MUCH greater.
> >
> > Proof: To find all the dirty instances of one type with a linear
> > search where at least one instance is dirty in a collection by type,
> > you must check every instance in the collection.  With a tree search
> > you need not check any instance whose referent hasn't been marked as
> > "containing something dirty".
> 
> My problem here is that the tree search is difficult to do in QOF because 
> there is no tree that QOF can understand. This would be one of the logic 
> functions in the intermediate library that is also being discussed - a 
> function specific to GnuCash and CashUtil.

I haven't really been following that thread closely, but maybe QOF
isn't the right place for a tree search.  I don't really know enough
to say.

> 
> > > Currently, I can only see this as a solution in search of a problem.
> >
> > Maybe you're right, but let me play devil's advocate:
> 
> :-)
> 
> > I don't know the 
> > current state of the backends, but imagine this scenario: Backend is
> > remote server, and connection to server goes down.  What happens?
> 
> Currently? I think GnuCash should fall back to a file:// URL and save the 
> entire book to GnuCash XML v2. Actually, there is a note in the source about 
> this:

No, not currently.  Ideally.

> /* If there is a backend, and the backend is reachable
>  * (i.e. we can communicate with it), then synchronize with 
>  * the backend.  If we cannot contact the backend (e.g.
>  * because we've gone offline, the network has crashed, etc.)
>  * then give the user the option to save to the local disk. 
>  *
>  * hack alert -- FIXME -- XXX the code below no longer
>  * does what the words above say.  This needs fixing.
> http://code.neil.williamsleesmill.me.uk/gnome2/qofsession_8c-source.html#l01226
> (scroll down to line 1325)
> 
> I'll look at fixing that.
> 
> There is code in the backend handlers that falls back to file:// if the 
> preferred access method is not usable. That could easily be extended.
> 
> > One 
> > option is that GC prevents the user from continuing to edit the data
> > on the screen.  Option two is that GC alerts the user that the
> > connection went down and that changes will be committed to the server
> > when the connection comes back, if ever.  Let's say we want option
> > two.  The user adds/changes some splits and the connection comes back
> > so we want to commit what has changed.  But how?
> 
> I think it's risky to offer option 2 without some kind of fallback - what if 
> the server is actually local and the problem is a sign of something more 
> serious - the user's system has become unstable etc.? Alternatively, the user 
> might just need to do something else and cannot keep GnuCash running until 
> the server comes back online.

That doesn't sound very good.

> 
> That said, the SQL backend can use last_update to identify those instances 
> that have changed, both during the outage and afterwards, once the connection 
> is restored.
> 
> I'd envisage the user taking the option to save to a local file as the HIG / 
> intuitive action. Then, once the problem was fixed, the file (edited or not) 
> could be reloaded and use Save As... to re-establish the connection to the 
> remote server. Just as in any other situation where the backend receives a 
> whole new file, there will be increased network traffic until the two are 
> synchronised.
> 
> Saving to a local file will automatically reset all dirty flags anyway. We 
> cannot expect to preserve dirty flags if we give the user the (expected) 
> intuitive option to save to a local file in the event of a remote failure.

You're describing the worst-case scenario.  Going offline is not
necessarily an error condition.  Many frontend/backend systems are
*designed* to do this as a regular part of operation.  For those
systems, the solutions you describe are not acceptable.  It may be
impossible to save everything to a local file, because the client may
have only the portion of the data that he was editing/viewing.  And
retransmitting *everything* he does have every time he comes back
online isn't feasible either.

> 
> > Several options: 
> >   1) We cached the changes as they were made (as you describe in your
> > "predictive" method.)  We just clear the cache.
> 
> Yuk. I only gave that example to show how it wouldn't work!
> :-)

Yeah, it's complicated, but some systems do this.
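
To make option 1 concrete, here is a minimal sketch of a change cache
that is filled while offline and flushed when the connection returns.
Everything in it is an assumption for illustration: PendingLog,
pending_record, pending_flush and commit_count are hypothetical names,
instances are identified by plain ints, and none of it is existing QOF
or GnuCash API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 64

/* Hypothetical cache of instance ids changed while offline. */
typedef struct {
    int ids[PENDING_MAX];
    size_t count;
} PendingLog;

/* Hypothetical commit callback: returns true if the backend accepted the id. */
typedef bool (*CommitFn)(int id, void *user_data);

/* Record a changed instance.  Repeated changes to the same instance are
 * collapsed into one entry.  Returns false only if the cache is full. */
static bool pending_record(PendingLog *log, int id)
{
    assert(log != NULL);
    for (size_t i = 0; i < log->count; i++)
        if (log->ids[i] == id)
            return true;
    if (log->count == PENDING_MAX)
        return false;
    log->ids[log->count++] = id;
    return true;
}

/* Flush the cache in order.  Stops at the first failed commit and keeps
 * the uncommitted entries for the next attempt.  Returns the number of
 * entries committed. */
static size_t pending_flush(PendingLog *log, CommitFn commit, void *user_data)
{
    assert(log != NULL && commit != NULL);
    size_t done = 0;
    while (done < log->count && commit(log->ids[done], user_data))
        done++;
    for (size_t i = done; i < log->count; i++)
        log->ids[i - done] = log->ids[i];
    log->count -= done;
    return done;
}

/* Stand-in for a real backend call: just counts what it was asked to commit. */
static bool commit_count(int id, void *user_data)
{
    (void)id;
    (*(size_t *)user_data)++;
    return true;
}
```

The cost is paid at edit time (one pending_record call per change), and
the cache has to survive for as long as the client stays offline, which
is part of why this approach is complicated in practice.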

> 
> >   2) We just send the entire Split collection to the backend and let
> > it figure out what changed.
> 
> SQL can cope with that. All that happens is that on resuming the connection, 
> the network traffic increases until the SQL backend is back in sync.

And how often would we resend the entire collection of splits?  Every
time the SQL connection goes down and comes back up.  Which could be
every 30 seconds.  It's a good thing there are alternatives.

> 
> After all, we are not the only application to use a remote connection to a SQL 
> server and this problem is not uncommon. As it is the server that deals with 
> the most events of this kind, I don't think it's unreasonable to expect the 
> server to have efficient code to handle the results of a connection restart, 
> independent of which application is using the server. In some situations, 
> it's even built into the protocol.
> 
> >   3) We do a linear search through the Split collection to find the
> > few changes and commit those.
> 
> QOF isn't optimised to do that, SQL probably is.

You can't optimize a linear search.  You *have* to check every
instance.

> 
> >   4) We do a tree search that finds that only one Account is marked as
> > "contains dirty Splits" so our linear search through Splits is only
> > through that Account's Splits instead of all Splits.  We find the
> > changes and commit them.
> 
> To me, this is doing the work of the backend in the UI. Remember, the backend

You're saying the frontend doesn't need to know which instances are dirty?

> - like the book - knows nothing about the tree. The only routines that know 
> anything about the conceptual hierarchy of Account over Split are the GUI 
> tree model functions.

Well, the functions in the engine know that an Account has a list of splits.
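
A rough sketch of the difference between options 3 and 4, assuming
simplified stand-in types: the Account and Split structs below are
hypothetical and much smaller than the real engine structs, and
split_mark_dirty, linear_search and tree_search are illustrative names,
not existing functions.  The only thing borrowed from the discussion is
that a Split holds a reference to its parent Account and that the
Account carries a "contains something dirty" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real GnuCash/QOF structs. */
typedef struct Account Account;

typedef struct Split {
    Account *parent;        /* back-reference to the owning Account */
    bool dirty;
} Split;

struct Account {
    Split *splits;          /* the Account's list of Splits */
    size_t n_splits;
    bool contains_dirty;    /* "contains something dirty" flag */
};

/* Mark a Split dirty and set the flag on its parent. */
static void split_mark_dirty(Split *s)
{
    assert(s != NULL);
    s->dirty = true;
    if (s->parent != NULL)
        s->parent->contains_dirty = true;
}

/* Option 3: inspect every Split of every Account.
 * *checked counts Split inspections. */
static size_t linear_search(const Account *accts, size_t n, size_t *checked)
{
    size_t found = 0;
    *checked = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < accts[i].n_splits; j++) {
            (*checked)++;
            if (accts[i].splits[j].dirty)
                found++;
        }
    }
    return found;
}

/* Option 4: skip every Account whose flag is clear, so only the Splits
 * of flagged Accounts are inspected.  *checked counts Split inspections
 * only; reading one flag per Account is extra. */
static size_t tree_search(const Account *accts, size_t n, size_t *checked)
{
    size_t found = 0;
    *checked = 0;
    for (size_t i = 0; i < n; i++) {
        if (!accts[i].contains_dirty)
            continue;
        for (size_t j = 0; j < accts[i].n_splits; j++) {
            (*checked)++;
            if (accts[i].splits[j].dirty)
                found++;
        }
    }
    return found;
}
```

Both searches return the same set of dirty Splits.  The tree search
never inspects more Splits than the linear one, and inspects far fewer
when only a few Accounts are flagged.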

> 
> > Any of those options would work.  But if this is something that
> > happens often, 2) and 3) will probably be unacceptably expensive.
> 
> I'm still not convinced that this should be done in the UI. Any backend that 
> utilises a remote connection should be capable of handling outages in that 
> connection. That is the responsibility of the backend and it is a job best 
> left to the backend to sort out.

I disagree.  Both ends need to be intelligent.  It is easy to put all
responsibility on the backend, but resending all the data to the
backend just because the frontend has no concept of what's dirty is
inefficient.

> 
> > Maybe GC will never have to address this issue because it will never
> > support an "offline" mode with a remote backend.
> 
> It should and I'll look at making the file:// fallback work.

Well, bailing out to a file might be a nice way to handle severe
errors, but it doesn't make GC support "offline" mode.  Like I said,
maybe it never will.

> 
> > If it does, 4) will 
> > be easy to implement as long as instances store a reference to their
> > "parent", like Split does.  The implementation is simply to do the
> > same thing to the parent's "contains something dirty" flag as you
> > currently want to do to the Collection's "dirty" flag.
> 
> The same problem keeps getting in the way. The book, the backend, the 
> collection and the entire query framework know nothing about the parental 
> relationship between Account and Split other than that it is an available 
> parameter of the relevant objects.
> 
> The tree is too specific - QOF is generic and does not get into the specific 
> conceptual relationships.

In your view, where exactly are those relationships best represented?

-chris