(AUDIT?) Re: r14892 - gnucash/trunk - Add a new QOF_TYPE_NUMSTRING to add numeric sorts. (#150799).

Tue Sep 26 12:20:55 EDT 2006

On Tue, Sep 26, 2006 at 11:26:04AM -0400, Derek Atkins wrote:
> Quoting Chris Shoemaker <c.shoemaker at cox.net>:
> 
> >>That doesn't work with SQL backends when you want to return a subset of
> >>the responses.
> >
> >Playing the devil's advocate...
> >
> >A) We don't really have a working SQL backend.  B) and no one is
> >really working on one.  But ignoring that for now...
> 
> I concede A, but B is certainly on my mind..  If I can work up the
> energy I certainly plan to work on it, but probably not in the timeframe
> for 2.2.
> 
> >>For example, if I wanted to do something like:
> >>
> >> select * from Split where <...> order by <...> limit 10;
> >>
> >>In this case, the "order by" clause is important in the underlying
> >>query.    If you don't get the 'order by' correct then you'll get
> >>the "wrong" objects returned, which probably isn't what you want.
> >
> >Well, you get the "right" objects, just in the wrong order.  If the user
> >changes the sort from ascending to descending, do you want to requery
> >the backend with different SQL?  Of course not.  You just reorder all
> >the objects you already have.  This is true for any sorting operation.
> 
> Not really.  Assume you have 100 objects in the database, but you want
> to see the most recent 10 objects.  If you only ask SQL for 10 objects,
> then the 10 objects it returns may not be the 10 objects you want to
> display unless the 'sort' matches.  For example, if the sort is backwards,
> you might want to see objects 1-10 but it gives you 91-100.  Or even
> worse, if you're sorting on the wrong thing it might give you some
> "random" set of the items between 1 and 100.
> 

Oh, I missed that "limit 10" part.  This is really conflating
filtering with sorting.  Does _GnuCash_ really have a use for "filter
N"?  _Even_ if we want to support remote datasets larger than RAM, you
already have filtering by "where".  So, you're describing a case when
you don't even want to return full query results!  I just don't see
this being even remotely possible for "personal and small-business"
accounting software.
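
For what it's worth, here is a minimal sketch of the "limit" case being
discussed, using Python's sqlite3 module.  The table name, columns, and
the fetch_limited() helper are all made up for illustration; none of
this is GnuCash or QOF code.  It only shows that with a LIMIT the
backend's sort order decides _which_ rows come back, so re-sorting
locally can't recover the others, whereas without a LIMIT a local
re-sort would be enough.

```python
import sqlite3

def fetch_limited(conn, order_by, limit=10):
    """Return the ids of the first `limit` rows under the given ORDER BY."""
    # order_by is interpolated directly; acceptable here only because the
    # callers below pass fixed strings, never user input.
    rows = conn.execute(
        f"SELECT id FROM split ORDER BY {order_by} LIMIT ?", (limit,)
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE split (id INTEGER PRIMARY KEY, posted INTEGER)")
# 100 hypothetical splits; the split with id n was posted on day n.
conn.executemany("INSERT INTO split VALUES (?, ?)", [(n, n) for n in range(1, 101)])

oldest = fetch_limited(conn, "posted ASC")   # ids 1..10
newest = fetch_limited(conn, "posted DESC")  # ids 100..91

# Re-sorting the ten oldest rows locally cannot produce the ten newest.
resorted = sorted(oldest, reverse=True)      # ids 10..1
print(oldest, newest, resorted)
```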

> Now, one approach to work around this is to assume you have regular
> checkpointing in the database (e.g. when you "close the books") and
> then you always pull in all objects since the last checkpoint.  Then
> you don't have to worry about it, except in the cases where you want
> to "go back in time" and see things that happened in the closed-out
> periods..  Then you just need to pull in periods "atomically" -- i.e.
> you always grab a full period of data from the database.
> 
> >>Either that or you need a full working copy of all your data
> >>in core, which can get very expensive in large data sets.
> >
> >By "core" do you mean L1 data cache or just RAM?  Either way, I'm
> >_very_ skeptical of design decisions made with this motivation.
> >Assuming you mean RAM, I would assert that the number of users who:
> 
> I'm not thinking about it in terms of CPU cache usage.  I'm thinking
> about it in terms of what's stored in QOF, and what QOF has to do
> in order to give you results.
> 
> >a) would consider using GnuCash and
> >
> >b) have a financial dataset whose in memory (not on disk)
> >representation is even 1/10 of the amount of RAM that came in the
> >machine they want to use for GnuCash
> >
> >is actually zero.
> 
> I dunno.  Go grab Don Paolo's data set..  1000 accounts.   100,000
> transactions.

Well, I figure the on-disk representation is probably 2-4 times larger
than the in-memory size (totally a guess).  So I wouldn't worry unless
his datafiles are > 0.5 GB.

> Then tell me that it's okay to have it all in QOF's RAM "cache"..

I would say it's okay to have it all in RAM, and I don't think it
needs any special "cache" at all.

> Now imagine going out to 20 years of data, hundreds of thousands of
> transactions...

10 years, 20 years, 100 years... Datasets grow linearly.  RAM doesn't.
To find the cross-over point when personal and small-business
accounting data approached sizes larger than average RAM, I think we'd
have to go back to the 1980s.

> Wouldn't you rather have a smaller working set?  I know I would.

From a user's POV, smaller memory requirements traded for increased
latency isn't a clear win.  From a developer's POV, having uniform
access to the whole dataset is a clear benefit.

> >Yes, I understand that QOF was designed to handle NASA's multi-
> >petabyte image databases.  I just think it's unnecessarily burdensome
> >to perpetuate that design requirement in GnuCash's QOF when it doesn't
> >benefit any GnuCash users.
> 
> I wasn't really thinking in those terms...  But I do think that requiring
> QOF to operate on 20 years of data for every operation is sub-optimal.

I don't really think of it as "QOF" operating.  I think of it as
"GnuCash" operating.  And I think GnuCash should have immediate access
to all of the data in a "book", even if that's 20 years.  Now, book
closing is a nice feature, too....

> >I think it's _especially_ beneficial to drop the "our database might
> >be bigger than RAM" ideas as we consider options for
> >extending/rewriting QOF in the future.
> 
> I disagree...  but perhaps we can just agree to disagree..  If this is
> what you wanted then we might as well forego the SQL and just turn the
> data file into a transaction log.  Every time you "commit" an operation
> you just append to the log.  When you load it in, you just read the log
> and parse it into RAM.
> 
> So, why don't we do it this way?   

Well, this is essentially exactly the way GnuCash's only supported
backend works, except we only append in RAM and only save when asked
to.
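
To make the append-and-replay idea concrete, here is a rough sketch in
Python.  The JSON-lines format, the field names, and the commit()/load()
helpers are my own assumptions for the example; this is not GnuCash's
file format or backend API.  Amounts are integers (say, cents) to keep
the arithmetic exact.

```python
import json
import os
import tempfile

def commit(log_path, op):
    """Append one committed operation to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")
        f.flush()
        os.fsync(f.fileno())  # push the record to disk before returning

def load(log_path):
    """Replay the log from the start and rebuild the in-RAM balances."""
    balances = {}
    if not os.path.exists(log_path):
        return balances
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            op = json.loads(line)
            balances[op["account"]] = balances.get(op["account"], 0) + op["amount"]
    return balances

# Hypothetical usage: two entries appended, then the whole log replayed.
log_path = os.path.join(tempfile.mkdtemp(), "book.log")
commit(log_path, {"account": "Checking", "amount": -2500})
commit(log_path, {"account": "Groceries", "amount": 2500})
state = load(log_path)
print(state)
```

Appending on every commit is what would give you autosave; the
difference from what we do today is only _when_ the append reaches the
disk.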

> It would get the autosave feature
> that everyone is asking for.  It would mean that everything is in RAM
> for you.  The only thing it wouldn't solve is the multi-user problem.

Exactly true.  So what do we think about multi-user?  The thing is,
for multi-user access, partial loading is just an _optimization_.
It's not required for correctness.  The thing that's _not_ optional is
correct locking.  I don't know if GnuCash will _ever_ support
multi-user (I certainly hope so) but just allowing partial loads
doesn't solve the multi-user problem either.  I'd rather get locking
right first without worrying about partial loads, and then see if
partial loads are worth it (but I suspect not).
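
As a strawman for "locking first, partial loads later", the crudest
correct thing is an exclusive whole-book lock.  The sketch below is
only an illustration in Python: the ".lock" suffix and the
acquire_lock()/release_lock() names are invented here, and a real
multi-user design would need something much finer-grained than this.

```python
import os
import tempfile

def acquire_lock(book_path):
    """Atomically create book_path + '.lock'; return True if we got it."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can succeed in creating the lock file.
        fd = os.open(book_path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
    os.close(fd)
    return True

def release_lock(book_path):
    """Remove the lock file so another user can open the book."""
    os.remove(book_path + ".lock")

# Hypothetical usage: the second opener is refused until the first releases.
book = os.path.join(tempfile.mkdtemp(), "book.dat")
first = acquire_lock(book)
second = acquire_lock(book)
release_lock(book)
third = acquire_lock(book)
print(first, second, third)
```

Nothing here depends on how much of the book is loaded, which is the
point: correctness comes from the lock, and partial loading would only
change performance.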

> 
> >BTW, I don't object to this current changeset, or even backporting it.
> >This is just the way QOF is today.  I'm only concerned that we
> >re-evaluate that design decision going forward.
> 
> I think this conversation is completely orthogonal to the changeset.  I'm
> working on approach #2 and I plan to send a patch to -devel once I
> get it working that way..  Then we can decide which patch we'd prefer.

Absolutely agreed.

-chris

> 
> >Just my $(small number)/$(xaccCommodityGetFraction(comm)) 
> >$(gnc_get_default_currency()).
> 
> Heh.
> 
> >-chris
> >
> 
> 
> 
> -- 
>       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>       Member, MIT Student Information Processing Board  (SIPB)
>       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>       warlord at MIT.EDU                        PGP key available
>