Dirty entity identification.

Fri Jul 22 15:23:13 EDT 2005

On Fri, Jul 22, 2005 at 10:43:39AM -0400, Derek Atkins wrote:
> >> Unfortunately not. The engine cannot know that a dirty "account" (whatever 
> >> that is) means a dirty Split (whatever that might be) exists somewhere. There 
> >> is no relationship. The engine only knows that this instance is dirty, that 
> >> collection is dirty and therefore the book is dirty. End.
> >
> > You missed the point here.  Forget about the engine.  This was a
> > *mathematical* statement about searches.  Having information about the
> > presence (or absence) of what you're searching for in a subset of all
> > the places you could look at a cost less than the cost of actually
> > looking in all those places makes your search cheaper.  This has
> > nothing to do with everything that the "engine cannot know."
> 
> True, but caching this information (which is effectively what you're
> doing) comes at a cost.  You need to store extra data somewhere
> (increase storage cost) in order to reduce (time) cost of a search.
> All well and good, provided you know a priori which searches you need
> to optimize.

Updating a boolean in the account doesn't cost any more than updating
a boolean in the collection.  (Of course, you'd probably want both, so
it'll cost twice.)
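
To make the cost argument concrete, here is a minimal sketch in C. The
Split and Account structs and both function names are hypothetical
simplifications for illustration, not the real GnuCash or QOF types.
It assumes each account carries one boolean that is set whenever one
of its splits is dirtied, so a search for dirty splits can skip clean
accounts without looking inside them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real GnuCash/QOF structures. */
typedef struct {
    bool dirty;
} Split;

typedef struct {
    bool dirty;       /* true if any contained split has been dirtied */
    Split *splits;
    size_t n_splits;
} Account;

/* Dirty one split and record that on its account: two boolean stores. */
static void mark_split_dirty(Account *acct, size_t i)
{
    assert(acct != NULL && i < acct->n_splits);
    acct->splits[i].dirty = true;
    acct->dirty = true;
}

/* Count dirty splits, skipping clean accounts entirely.
 * *visited reports how many splits were actually examined. */
static size_t count_dirty_splits(const Account *accts, size_t n_accts,
                                 size_t *visited)
{
    size_t found = 0;

    assert(accts != NULL && visited != NULL);
    *visited = 0;
    for (size_t a = 0; a < n_accts; a++) {
        if (!accts[a].dirty)
            continue;                 /* nothing dirty in here */
        for (size_t s = 0; s < accts[a].n_splits; s++) {
            (*visited)++;
            if (accts[a].splits[s].dirty)
                found++;
        }
    }
    return found;
}
```

With two accounts of three and two splits, dirtying one split in the
second account means the search examines two splits instead of five.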

> 
> QOF is a general search engine and really does NOT understand some of
> the optimizations that can be made.  For example, we actually lost a
> particular optimization in the move from a "Search Accounts for
> Splits/Transactions" to QOF: we lost the ability to reduce the
> search-time by limiting the search to only Splits in particular
> Accounts.  This lossage happened necessarily because QOF does not
> understand that Accounts contain Splits.
> 
> _Accounts_ know that Accounts contain Splits, but QOF does not...  And
> it's QOF that performs that search.

Little red flags just popped up!  I know that QOF offered generalized
search and that's powerful, but let me just think (out loud) for a sec
about what a financial app like GnuCash actually needs to do.

What's one of the most common operations?  Maybe opening a view of all
the splits in an account, viewed as transactions.  Therefore, what's
probably the *most common* query?  I'm guessing it's probably the
query that finds all the splits in an account.  That query is probably
run 100 times more frequently than any other.

What's the most common object type?  Probably splits; there are
probably 10 times as many splits as the next most common object
(probably Transactions).  (I have mostly Transactions with many
splits, but for a different user, this may only be more like 3 times.)

So, by far the most common query has to iterate over the most common
object every time we open or refresh a register.  And even though the
application-specific, non-generic, financial-logic-containing Account
objects have exactly the list we need already stored, we want to use
the generic, powerful, QOF-Query that's so flexible but has to iterate
over every split in my book and check its Account just to return the
list that the Account already had!  

Please, seriously, please tell me I'm making all this up.
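
The contrast I'm describing can be sketched in C as follows.
Everything here is a hypothetical simplification: the structs and the
two function names are made up for illustration and are not the QOF
or GnuCash API.  The first function stands in for a generic query
that has to examine every split in the book; the second simply hands
back the list the account already holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the real GnuCash/QOF structures. */
typedef struct Account Account;

typedef struct {
    const Account *account;   /* the account this split belongs to */
    long amount;
} Split;

struct Account {
    const Split **splits;     /* the account's own list of its splits */
    size_t n_splits;
};

/* Generic approach: look at every split in the book and keep those
 * whose account matches.  Cost grows with the size of the whole book.
 * Returns the number of matches written to out (at most out_cap). */
static size_t find_splits_by_scan(const Split *all, size_t n_all,
                                  const Account *acct,
                                  const Split **out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n_all; i++) {
        if (all[i].account == acct && n < out_cap)
            out[n++] = &all[i];
    }
    return n;
}

/* Anchored approach: return the list the account already stores.
 * Cost does not depend on how many other splits the book contains. */
static const Split **account_get_splits(const Account *acct, size_t *n_out)
{
    assert(acct != NULL && n_out != NULL);
    *n_out = acct->n_splits;
    return acct->splits;
}
```

Both return the same splits; only the amount of work differs.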

Here's my take on this: We shouldn't be constrained from using the
relationships between financial objects just because some generic
library can't interpret them.  Use the library for what it can do well
(storage? generalized search?).  Use the application domain
relationships in the application where it makes sense.

Implications?  1) My re-written register will allow "anchored" account
cases where QofQuery is not even used, along with ones where it is.
2) I don't see any problem at all with dirtiness propagating back to a
flag in the book wherever containment relationships exist among the
financial objects.
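
A minimal sketch of point 2 in C, assuming every object keeps a
pointer to its container (split -> account -> book).  The Node type
and mark_dirty() are hypothetical names for illustration, not
existing engine code.  The sketch also assumes the flag is only ever
set through mark_dirty(), so a dirty object always has dirty
ancestors and the walk can stop early; clearing the flags on save is
not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical containment node; not a real GnuCash/QOF structure. */
typedef struct Node {
    struct Node *parent;   /* container, or NULL for the book */
    bool dirty;
} Node;

/* Mark an object dirty and propagate the flag up to the book.
 * Stops at the first ancestor that is already dirty, since under the
 * stated assumption everything above it is dirty as well. */
static void mark_dirty(Node *n)
{
    assert(n != NULL);
    while (n != NULL && !n->dirty) {
        n->dirty = true;
        n = n->parent;
    }
}
```

Dirtying a split this way flags its account and the book, and leaves
sibling accounts untouched.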

-chris