(AUDIT?) Re: r14892 - gnucash/trunk - Add a new QOF_TYPE_NUMSTRING to add numeric sorts. (#150799).
c.shoemaker at cox.net
Tue Sep 26 12:20:55 EDT 2006
On Tue, Sep 26, 2006 at 11:26:04AM -0400, Derek Atkins wrote:
> Quoting Chris Shoemaker <c.shoemaker at cox.net>:
> >>That doesn't work with SQL backends when you want to return a subset of
> >>the responses.
> >Playing the devil's advocate...
> >A) We don't really have a working SQL backend. B) and no one is
> >really working on one. But ignoring that for now...
> I concede A, but B is certainly in my mind.. If I can gin up the
> energy I certainly plan to work on it, but probably not in the timeframe
> for 2.2.
> >>For example, if I wanted to do something like:
> >> select * from Split where <...> order by <...> limit 10;
> >>In this case, the "order by" clause is important in the underlying
> >>query. If you don't get the 'order by' correct then you'll get
> >>the "wrong" objects returned, which probably isn't what you want.
> >Well, you get the "right" objects, just in the wrong order. If the user
> >changes the sort from ascending to descending, do you want to requery
> >the backend with different SQL? Of course not. You just reorder all
> >the objects you already have. This is true for any sorting operation.
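To make that concrete: flipping the sort direction is a purely local operation on the cached results. A minimal sketch (the `Split` class here is a stand-in for illustration, not the real GnuCash type):

```python
# Minimal sketch: once query results are cached in RAM, flipping the sort
# direction is a local operation -- no new backend query is needed.
# "Split" is a stand-in for illustration, not the real GnuCash type.
from dataclasses import dataclass

@dataclass
class Split:
    date: str
    amount: int

cached = [Split("2006-01-03", 10), Split("2006-02-14", -5), Split("2006-01-20", 7)]

ascending = sorted(cached, key=lambda s: s.date)
descending = sorted(cached, key=lambda s: s.date, reverse=True)

# Same objects either way, just reordered: no requery required.
assert {id(s) for s in ascending} == {id(s) for s in descending}
```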
> Not really. Assume you have 100 objects in the database, but you want
> to see the most recent 10 objects. If you only ask SQL for 10 objects,
> then the 10 objects it returns may not be the 10 objects you want to
> display unless the 'sort' matches. For example, if the sort is backwards,
> you might want to see objects 1-10 but it gives you 91-100. Or even
> worse, if you're sorting on the wrong thing it might give you some
> "random" set of the items between 1 and 100.
Oh, I missed that "limit 10" part. This is really conflating
filtering with sorting. Does _GnuCash_ really have a use for "limit
N"? _Even_ if we want to support remote datasets larger than RAM, you
already have filtering by "where". So, you're describing a case where
you don't even want to return the full query results! I just don't see
this being even remotely plausible for "personal and small-business" use.
> Now, one approach to work around this is to assume you have regular
> checkpointing in the database (e.g. when you "close the books") and
> then you always pull in all objects since the last checkpoint. Then
> you don't have to worry about it, except in the cases where you want
> to "go back in time" and see things that happened in the closed-out
> periods.. Then you just need to pull in periods "atomically" -- i.e.
> you always grab a full period of data from the database.
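In SQL terms, that checkpoint approach replaces LIMIT with a date bound: always pull the entire open period since the last book close. A sketch (invented schema):

```python
# Sketch of the checkpoint approach: instead of LIMIT, pull the *entire*
# open period since the last book-closing date. Schema invented for
# illustration, not the real backend tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER, post_date TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?)",
                 [(1, "2004-06-01"), (2, "2005-03-15"),
                  (3, "2006-02-01"), (4, "2006-09-01")])

last_checkpoint = "2006-01-01"  # e.g. the most recent "close the books"
open_period = conn.execute(
    "SELECT id FROM txn WHERE post_date >= ? ORDER BY post_date",
    (last_checkpoint,)).fetchall()

assert [r[0] for r in open_period] == [3, 4]  # the full period, grabbed atomically
```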
> >>Either that or you need a full working copy of all your data
> >>in core, which can get very expensive in large data sets.
> >By "core" do you mean L1 data cache or just RAM? Either way, I'm
> >_very_ skeptical of design decisions made with this motivation.
> >Assuming you mean RAM, I would assert that the number of users who:
> I'm not thinking about it in terms of CPU cache usage. I'm thinking
> about it in terms of what's stored in QOF, and what QOF has to do
> in order to give you results.
> >a) would consider using GnuCash and
> >b) have a financial dataset whose in memory (not on disk)
> >representation is even 1/10 of the amount of RAM that came in the
> >machine they want to use for GnuCash
> >is actually zero.
> I dunno. Go grab Don Paolo's data set.. 1000 accounts. 100,000
Well, I figure the on-disk representation is probably 2-4 times larger
than the in memory size (totally a guess). So I wouldn't worry unless
his datafiles are > .5GB.
> Then tell me that it's okay to have it all in QOF's RAM "cache"..
I would say it's okay to have it all in RAM, and I don't think it
needs any special "cache" at all.
> Now imagine going out to 20 years of data, hundreds of thousands of
10 years, 20 years, 100 years... Datasets grow linearly. RAM doesn't.
To find the cross-over point when personal and small-business
accounting data approached sizes larger than average RAM, I think we'd
have to go back to the 1980s.
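A rough back-of-envelope check of that claim (every number below is a guess for illustration, not a measurement):

```python
# Back-of-envelope: even a busy small business stays far below modern RAM.
# All numbers are guesses for illustration, not measurements.
years = 20
txns_per_year = 10_000   # generous for a small business
bytes_per_txn = 2_000    # in-memory Transaction plus its Splits, guessed

dataset_bytes = years * txns_per_year * bytes_per_txn
assert dataset_bytes == 400_000_000  # ~0.4 GB: well under the RAM of a 2006 desktop
```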
> Wouldn't you rather have a smaller working set? I know I would.
From a user's POV, smaller memory requirements traded for increased
latency isn't a clear win. From a developer's POV, having uniform
access to the whole dataset is a clear benefit.
> >Yes, I understand that QOF was designed to handle NASA's
> >multi-petabyte image databases. I just think it's unnecessarily burdensome
> >to perpetuate that design requirement in GnuCash's QOF when it doesn't
> >benefit any GnuCash users.
> I wasn't really thinking in those terms... But I do think that requiring
> QOF to operate on 20 years of data for every operation is sub-optimal.
I don't really think of it as "QOF" operating. I think of it as
"GnuCash" operating. And I think GnuCash should have immediate access
to all of the data in a "book", even if that's 20 years. Now, book
closing is a nice feature, too....
> >I think it's _especially_ beneficial to drop the "our database might
> >be bigger than RAM" ideas as we consider options for
> >extending/rewriting QOF in the future.
> I disagree... but perhaps we can just agree to disagree.. If this is
> what you wanted then we might as well forego the SQL and just turn the
> data file into a transaction log. Every time you "commit" an operation
> you just append to the log. When you load it in, you just read the log
> and parse it into RAM.
> So, why don't we do it this way?
Well, this is essentially exactly the way GnuCash's only supported
backend works, except we only append in RAM and only save when asked to.
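The transaction-log idea fits in a few lines (a toy model, not the actual backend format):

```python
# Toy model of the "data file as transaction log" idea: every commit
# appends one record; loading replays the log into RAM. The JSON record
# format is invented for illustration, not the real backend format.
import json, os, tempfile

log_path = os.path.join(tempfile.mkdtemp(), "book.log")

def commit(record):
    with open(log_path, "a") as f:   # append-only: autosave for free
        f.write(json.dumps(record) + "\n")

def load():
    book = []
    with open(log_path) as f:        # replay the whole log into RAM
        for line in f:
            book.append(json.loads(line))
    return book

commit({"op": "add-txn", "id": 1})
commit({"op": "add-txn", "id": 2})
assert [r["id"] for r in load()] == [1, 2]
```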
> It would get the autosave feature
> that everyone is asking for. It would mean that everything is in RAM
> for you. The only thing it wouldn't solve is the multi-user problem.
Exactly true. So what do we think about multi-user? The thing is,
for multi-user access, partial loading is just an _optimization_.
It's not required for correctness. The thing that's _not_ optional is
correct locking. I don't know if GnuCash will _ever_ support
multi-user (I certainly hope so) but just allowing partial loads
doesn't solve the multi-user problem either. I'd rather get locking
right first without worrying about partial loads, and then see if
partial loads are worth it (but I suspect not).
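A minimal whole-book lock of the kind I mean could look like this (a sketch using an exclusive lock file; a real backend would also need stale-lock recovery):

```python
# Sketch of "get locking right first": one exclusive lock for the whole
# book, acquired before any write. Stale-lock recovery is deliberately
# omitted; this only shows the mutual-exclusion shape.
import os, tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "book.lock")

def try_lock():
    try:
        # O_EXCL makes creation atomic: exactly one process can win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True          # we own the book now
    except FileExistsError:
        return False         # someone else has it open

def unlock():
    os.remove(lock_path)

assert try_lock() is True    # first user gets the lock
assert try_lock() is False   # second user is refused
unlock()
assert try_lock() is True    # released, so it is available again
```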
> >BTW, I don't object to this current changeset, or even backporting it.
> >This is just the way QOF is today. I'm only concerned that we
> >re-evaluate that design decision going forward.
> I think this conversation is completely orthogonal to the changeset. I'm
> working on approach #2 and I plan to send a patch to -devel once I
> get it working that way.. Then we can decide which patch we'd prefer.
> >Just my $(small number)/$(xaccCommodityGetFraction(comm))
> Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> Member, MIT Student Information Processing Board (SIPB)
> URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
> warlord at MIT.EDU PGP key available