(AUDIT?) Re: r14892 - gnucash/trunk - Add a new QOF_TYPE_NUMSTRING to add numeric sorts. (#150799).

Chris Shoemaker c.shoemaker at cox.net
Tue Sep 26 12:20:55 EDT 2006

On Tue, Sep 26, 2006 at 11:26:04AM -0400, Derek Atkins wrote:
> Quoting Chris Shoemaker <c.shoemaker at cox.net>:
> >>That doesn't work with SQL backends when you want to return a subset of
> >>the responses.
> >
> >Playing the devil's advocate...
> >
> >A) We don't really have a working SQL backend.  B) and no one is
> >really working on one.  But ignoring that for now...
> I concede A, but B is certainly in my mind.  If I can gather up the
> energy I certainly plan to work on it, but probably not in the timeframe
> for 2.2.
> >>For example, if I wanted to do something like:
> >>
> >> select * from Split where <...> order by <...> limit 10;
> >>
> >>In this case, the "order by" clause is important in the underlying
> >>query.    If you don't get the 'order by' correct then you'll get
> >>the "wrong" objects returned, which probably isn't what you want.
> >
> >Well, you get the "right" objects, just in the wrong order.  If the user
> >changes the sort from ascending to descending, do you want to requery
> >the backend with different SQL?  Of course not.  You just reorder all
> >the objects you already have.  This is true for any sorting operation.
> Not really.  Assume you have 100 objects in the database, but you want
> to see the most recent 10 objects.  If you only ask SQL for 10 objects,
> then the 10 objects it returns may not be the 10 objects you want to
> display unless the 'sort' matches.  For example, if the sort is backwards,
> you might want to see objects 1-10 but it gives you 91-100.  Or even
> worse, if you're sorting on the wrong thing it might give you some
> "random" set of the items between 1 and 100.
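To make Derek's point concrete, here is a minimal sketch (hypothetical
one-column schema, SQLite via Python) showing that once a LIMIT clause is
involved, the ORDER BY direction decides *which* rows come back, not merely
their order:

```python
# Sketch with a made-up "split" table: with LIMIT, flipping the ORDER BY
# direction returns a *different set* of rows, not a reordered one.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE split (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO split (amount) VALUES (?)",
                 [(n,) for n in range(1, 101)])  # 100 "objects"

asc = [r[0] for r in conn.execute(
    "SELECT amount FROM split ORDER BY amount ASC LIMIT 10")]
desc = [r[0] for r in conn.execute(
    "SELECT amount FROM split ORDER BY amount DESC LIMIT 10")]

print(asc)   # objects 1-10
print(desc)  # objects 91-100: a disjoint set, so no client-side
             # re-sort of the first result can produce the second
```

So if the sort lives only in the client, the backend has already thrown
away the rows you actually wanted.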

Oh, I missed that "limit 10" part.  This is really conflating
filtering with sorting.  Does _GnuCash_ really have a use for "limit
N"?  _Even_ if we want to support remote datasets larger than RAM, you
already have filtering via "where".  So, you're describing a case where
you don't even want the full query results returned!  I just don't see
that being even remotely necessary for "personal and small-business"
accounting software.

> Now, one approach to work around this is to assume you have regular
> checkpointing in the database (e.g. when you "close the books") and
> then you always pull in all objects since the last checkpoint.  Then
> you don't have to worry about it, except in the cases where you want
> to "go back in time" and see things that happened in the closed-out
> periods..  Then you just need to pull in periods "atomically" -- i.e.
> you always grab a full period of data from the database.
> >>Either that or you need a full working copy of all your data
> >>in core, which can get very expensive in large data sets.
> >
> >By "core" do you mean L1 data cache or just RAM?  Either way, I'm
> >_very_ skeptical of design decisions made with this motivation.
> >Assuming you mean RAM, I would assert that the number of users who:
> I'm not thinking about it in terms of CPU cache usage.  I'm thinking
> about it in terms of what's stored in QOF, and what QOF has to do
> in order to give you results.
> >a) would consider using GnuCash and
> >
> >b) have a financial dataset whose in memory (not on disk)
> >representation is even 1/10 of the amount of RAM that came in the
> >machine they want to use for GnuCash
> >
> >is actually zero.
> I dunno.  Go grab Don Paolo's data set..  1000 accounts.  100,000
> transactions.

Well, I figure the on-disk representation is probably 2-4 times larger
than the in-memory size (totally a guess).  So I wouldn't worry unless
his data files are > 0.5GB.

> Then tell me that it's okay to have it all in QOF's RAM "cache"..

I would say it's okay to have it all in RAM, and I don't think it
needs any special "cache" at all.

> Now imagine going out to 20 years of data, hundreds of thousands of
> transactions...

10 years, 20 years, 100 years... Datasets grow linearly.  RAM doesn't.
To find the cross-over point where personal and small-business
accounting data last exceeded average RAM sizes, I think we'd have to
go back to the 1980s.

> Wouldn't you rather have a smaller working set?  I know I would.

From a user's POV, smaller memory requirements traded for increased
latency isn't a clear win.  From a developer's POV, having uniform
access to the whole dataset is a clear benefit.

> >Yes, I understand that QOF was designed to handle NASA's
> >multi-petabyte image databases.  I just think it's unnecessarily
> >burdensome to perpetuate that design requirement in GnuCash's QOF
> >when it doesn't benefit any GnuCash users.
> I wasn't really thinking in those terms...  But I do think that requiring
> QOF to operate on 20 years of data for every operation is sub-optimal.

I don't really think of it as "QOF" operating.  I think of it as
"GnuCash" operating.  And I think GnuCash should have immediate access
to all of the data in a "book", even if that's 20 years.  Now, book
closing is a nice feature, too....

> >I think it's _especially_ beneficial to drop the "our database might
> >be bigger than RAM" ideas as we consider options for
> >extending/rewriting QOF in the future.
> I disagree...  but perhaps we can just agree to disagree.  If this is
> what you wanted then we might as well forgo the SQL and just turn the
> data file into a transaction log.  Every time you "commit" an operation
> you just append to the log.  When you load it in, you just read the log
> and parse it into RAM.
> So, why don't we do it this way?

Well, this is essentially the way GnuCash's only supported backend
works, except we only append in RAM and only save when asked.
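For illustration, here's a minimal sketch of the transaction-log idea
Derek describes (this is not GnuCash's actual backend; the record format
and helper names are made up): "commit" appends one record to the log,
and "load" replays the whole log into RAM:

```python
# Hypothetical append-only transaction log: commits are appended,
# never rewritten in place; loading replays the log in order, so a
# later record for the same id supersedes an earlier one.
import json, os, tempfile

def commit(log_path, txn):
    with open(log_path, "a") as log:
        log.write(json.dumps(txn) + "\n")   # append-only

def load(log_path):
    book = {}
    with open(log_path) as log:
        for line in log:                    # replay in commit order
            txn = json.loads(line)
            book[txn["id"]] = txn           # last write wins
    return book

path = os.path.join(tempfile.mkdtemp(), "book.log")
commit(path, {"id": 1, "desc": "Groceries", "amount": -42})
commit(path, {"id": 1, "desc": "Groceries", "amount": -45})  # edit
commit(path, {"id": 2, "desc": "Paycheck", "amount": 1000})
book = load(path)
print(len(book))  # 2
```

Appending on every commit is also exactly what would give you autosave
for free.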

> It would get the autosave feature
> that everyone is asking for.  It would mean that everything is in RAM
> for you.  The only thing it wouldn't solve is the multi-user problem.

Exactly true.  So what do we think about multi-user?  The thing is,
for multi-user access, partial loading is just an _optimization_.
It's not required for correctness.  The thing that's _not_ optional is
correct locking.  I don't know if GnuCash will _ever_ support
multi-user (I certainly hope so) but just allowing partial loads
doesn't solve the multi-user problem either.  I'd rather get locking
right first without worrying about partial loads, and then see if
partial loads are worth it (but I suspect not).
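As a sketch of the sort of thing "correct locking" could start from
(assuming a POSIX system; this is not anything GnuCash does today),
whole-file advisory locking is the simplest scheme that is correct
without any partial loading:

```python
# Hypothetical whole-file advisory lock using POSIX fcntl.flock:
# a second writer cannot take the exclusive lock while the first
# writer holds it, which is the correctness property that matters
# before any partial-load optimization does.
import fcntl, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "book.lock")

f1 = open(path, "w")
fcntl.flock(f1, fcntl.LOCK_EX)            # first writer takes the lock

f2 = open(path, "w")                      # second, independent open
try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking try
    got_lock = True
except BlockingIOError:
    got_lock = False                      # excluded, as it should be

print(got_lock)  # False while f1 holds the lock
fcntl.flock(f1, fcntl.LOCK_UN)
```

A real multi-user backend would need finer-grained locks than one big
file lock, but coarse-and-correct beats fine-and-racy.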

> >BTW, I don't object to this current changeset, or even backporting it.
> >This is just the way QOF is today.  I'm only concerned that we
> >re-evaluate that design decision going forward.
> I think this conversation is completely orthogonal to the changeset.  I'm
> working on approach #2 and I plan to send a patch to -devel once I
> get it working that way..  Then we can decide which patch we'd prefer.

Absolutely agreed.


> >Just my $(small number)/$(xaccCommodityGetFraction(comm)) 
> >$(gnc_get_default_currency()).
> Heh.
> >-chris
> >
> -- 
>       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>       Member, MIT Student Information Processing Board  (SIPB)
>       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>       warlord at MIT.EDU                        PGP key available
