(AUDIT?) Re: r14892 - gnucash/trunk - Add a new QOF_TYPE_NUMSTRING to add numeric sorts. (#150799).

Tue Sep 26 11:26:04 EDT 2006

Quoting Chris Shoemaker <c.shoemaker at cox.net>:

>> That doesn't work with SQL backends when you want to return a subset of
>> the responses.
>
> Playing the devil's advocate...
>
> A) We don't really have a working SQL backend.  B) and no one is
> really working on one.  But ignoring that for now...

I concede A, but B is certainly on my mind.  If I can work up the
energy I certainly plan to work on it, but probably not in the timeframe
for 2.2.

>> For example, if I wanted to do something like:
>>
>>  select * from Split where <...> order by <...> limit 10;
>>
>> In this case, the "order by" clause is important in the underlying
>> query.  If you don't get the 'order by' correct then you'll get
>> the "wrong" objects returned, which probably isn't what you want.
>
> Well, you get the "right" objects, just in the wrong order.  If the user
> changes the sort from ascending to descending, do you want to requery
> the backend with different SQL?  Of course not.  You just reorder all
> the objects you already have.  This is true for any sorting operation.

Not really.  Assume you have 100 objects in the database, but you want
to see the most recent 10 objects.  If you only ask SQL for 10 objects,
then the 10 objects it returns may not be the 10 objects you want to
display unless the 'sort' matches.  For example, if the sort is backwards,
you might want to see objects 1-10 but it gives you 91-100.  Or even
worse, if you're sorting on the wrong thing it might give you some
"random" set of the items between 1 and 100.
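
To make that concrete, here is a minimal sketch in Python using the
standard sqlite3 module.  The table "split" and its columns are made up
for illustration; this is not GnuCash's schema or the QOF query API.
It only shows that "limit 10" returns a different set of rows depending
on the "order by" clause.

```python
import sqlite3


def build_db(n=100):
    """Create an in-memory table of n hypothetical splits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE split (id INTEGER PRIMARY KEY, posted INTEGER, amount INTEGER)"
    )
    # 'posted' grows with id; 'amount' is deliberately unrelated to posting order.
    rows = [(i, i, (i * 37) % 101) for i in range(1, n + 1)]
    conn.executemany("INSERT INTO split VALUES (?, ?, ?)", rows)
    return conn


def top_ids(conn, order_by, limit=10):
    """Return the ids of the first `limit` rows under the given ordering.

    order_by is interpolated into the SQL text, so this sketch must only
    be called with fixed, trusted strings.
    """
    cur = conn.execute(f"SELECT id FROM split ORDER BY {order_by} LIMIT ?", (limit,))
    return [row[0] for row in cur]


conn = build_db()
newest = top_ids(conn, "posted DESC")     # the ten most recent rows
oldest = top_ids(conn, "posted ASC")      # sort backwards: the ten oldest rows
by_amount = top_ids(conn, "amount DESC")  # wrong sort key: an unrelated ten
```

All three queries return ten rows, but only the first returns the ten
rows you would want to display as "most recent".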

Now, one approach to work around this is to assume you have regular
checkpointing in the database (e.g. when you "close the books") and
then you always pull in all objects since the last checkpoint.  Then
you don't have to worry about it, except in the cases where you want
to "go back in time" and see things that happened in the closed-out
periods.  Then you just need to pull in periods "atomically" -- i.e.
you always grab a full period of data from the database.
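
A rough sketch of that "whole period" loading in Python, under the
assumption that checkpoints are kept as a sorted list of day numbers and
that a period runs from one checkpoint up to (but not including) the
next.  The function names and the (day, description) tuples are
hypothetical, not anything that exists in QOF today.

```python
import bisect


def period_bounds(checkpoints, day):
    """Return (start, end) of the period containing `day`.

    `checkpoints` is a sorted list of days on which the books were closed.
    start is None before the first checkpoint; end is None for the
    still-open period after the last checkpoint.
    """
    idx = bisect.bisect_right(checkpoints, day)
    start = checkpoints[idx - 1] if idx > 0 else None
    end = checkpoints[idx] if idx < len(checkpoints) else None
    return start, end


def load_period(transactions, checkpoints, day):
    """Return every transaction in the period containing `day`.

    `transactions` is an iterable of (day, description) tuples standing in
    for whatever the backend would return.
    """
    start, end = period_bounds(checkpoints, day)
    return [
        t for t in transactions
        if (start is None or t[0] >= start) and (end is None or t[0] < end)
    ]


checkpoints = [100, 200]
transactions = [(50, "a"), (100, "b"), (150, "c"), (199, "d"), (200, "e"), (250, "f")]
closed_period = load_period(transactions, checkpoints, 150)
open_period = load_period(transactions, checkpoints, 250)
```

Because a period is always fetched in full, sorting within it can be
done locally without asking the backend again.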

>> Either that or you need a full working copy of all your data
>> in core, which can get very expensive in large data sets.
>
> By "core" do you mean L1 data cache or just RAM?  Either way, I'm
> _very_ skeptical of design decisions made with this motivation.
> Assuming you mean RAM, I would assert that the number of users who:

I'm not thinking about it in terms of CPU cache usage.  I'm thinking
about it in terms of what's stored in QOF, and what QOF has to do
in order to give you results.

> a) would consider using GnuCash and
>
> b) have a financial dataset whose in memory (not on disk)
> representation is even 1/10 of the amount of RAM that came in the
> machine they want to use for GnuCash
>
> is actually zero.

I dunno.  Go grab Don Paolo's data set: 1000 accounts, 100,000
transactions.  Then tell me that it's okay to have it all in QOF's
RAM "cache".  Now imagine going out to 20 years of data, hundreds of
thousands of transactions...

Wouldn't you rather have a smaller working set?  I know I would.

> Yes, I understand that QOF was designed to handle NASA's
> multi-petabyte image databases.  I just think it's unnecessarily burdensome
> to perpetuate that design requirement in GnuCash's QOF when it doesn't
> benefit any GnuCash users.

I wasn't really thinking in those terms...  But I do think that requiring
QOF to operate on 20 years of data for every operation is sub-optimal.

> I think it's _especially_ beneficial to drop the "our database might
> be bigger than RAM" ideas as we consider options for
> extending/rewriting QOF in the future.

I disagree, but perhaps we can just agree to disagree.  If this is
what you want then we might as well forgo the SQL and just turn the
data file into a transaction log.  Every time you "commit" an operation
you just append it to the log.  When you load the file, you just read
the log and replay it into RAM.

So, why don't we do it this way?   It would get the autosave feature
that everyone is asking for.  It would mean that everything is in RAM
for you.  The only thing it wouldn't solve is the multi-user problem.
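
For illustration, a minimal sketch of the append-and-replay idea in
Python.  The JSON-lines record format, the "account" and "amount" field
names, and the use of an in-memory io.StringIO in place of the data file
are all assumptions made for this example; nothing here reflects the
actual GnuCash file format.

```python
import io
import json


def commit(log, op):
    """Append one committed operation to the log as a single JSON line."""
    log.write(json.dumps(op) + "\n")
    log.flush()


def replay(log):
    """Rebuild in-memory account balances by reading the log from the start."""
    balances = {}
    log.seek(0)
    for line in log:
        op = json.loads(line)
        account = op["account"]
        balances[account] = balances.get(account, 0) + op["amount"]
    return balances


# An in-memory stand-in for the data file; amounts are integer cents.
log = io.StringIO()
commit(log, {"account": "checking", "amount": 10000})
commit(log, {"account": "checking", "amount": -3000})
commit(log, {"account": "cash", "amount": 3000})

balances = replay(log)
```

Every commit is one append, which is where the "autosave" behaviour
would come from; the cost is that loading means replaying the whole log.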

> BTW, I don't object to this current changeset, or even backporting it.
> This is just the way QOF is today.  I'm only concerned that we
> re-evaluate that design decision going forward.

I think this conversation is completely orthogonal to the changeset.  I'm
working on approach #2 and I plan to send a patch to -devel once I
get it working that way.  Then we can decide which patch we'd prefer.

> Just my $(small number)/$(xaccCommodityGetFraction(comm)) 
> $(gnc_get_default_currency()).

Heh.

> -chris
>

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available