SQL backend performance

Mon Feb 22 18:24:01 EST 2010

As I've mentioned in other posts, I have a pretty large gnucash
datafile -- more than 20 Mb uncompressed. I've been testing the SQL
backend and I'm concerned about the performance, particularly startup
performance.

I've been doing this testing on an inexpensive little HP desktop
machine, dual-core 2.8 Ghz AMD Athlon, 2 Gb memory, 300 Gb 7200 rpm
SATA drive, Arch Linux, 2.6.32 kernel. It may have been cheap, but
it's damned fast (I've been in the computer business since 1964 until
I retired last Oct. and the cost-performance of today's hardware just
boggles my mind, especially when I think about what we put up with 20
or 30 years ago; you haven't lived until you've tried getting your
work done *and* stayed sane while sitting in front of a VT100 talking
to an overloaded Vax 780 running 4.1BSD; motto: "It was hard to build,
it ought to be hard to use"). From my gnucash xml file, I've created
sqlite3 and postgresql databases containing the same data.

Here are the average data-load timings, in seconds, from just a couple
of tests per storage method (this is from the point gnucash says
"reading file" until it's up):

xml	9.10
sqlite3	45.41
postgresql	45.46

My mental model, which may be wrong, of what the SQL backend is doing
is that it is reading the entire database at startup into memory, as
opposed to retrieving the data as-needed (which is the way I'd guess
gnucash+database would be architected if it was being written from
scratch; I'll guess further that given the file-based history of
gnucash, it's existing architecture precluded using the incremental
approach when adding the database backend). I'm guessing this because
of the amount of time it takes to start up when pointed either at
postgresql or sqlite3 versions of the same data. I am further assuming
that, with the SQL backend, as the user adds new things (accounts,
transactions, etc.), new objects get inserted into the database and
modified objects get updated, on the spot. I'm guessing this because
as I make changes, the 'save' button remains grayed-out. So the
primary advantage of the database backend is that file saves are not
necessary. But, at least in my case, I am paying about a 36 second
price at startup to avoid the file saves. File saves on my machine
take about 2 seconds, much faster than reading the file, probably
because the data is being written to the buffer cache. So I'd need to
do 18 file saves during a gnucash session to be worse off (kept
waiting) with the xml file than with the data in either flavor of
database. And that assumes that I am always waiting for file saves to
complete, which is not always the case (I frequently do them after
completing a chunk of work, e.g., entering an investment transaction
from a statement, and the file-save happens while I turn my attention
to the paperwork to figure out what I need to do next).

While I didn't do a lot of timed tests and I've used the word "guess"
an awful lot above (corrections to what I've said above from someone
who actually knows what (s)he is talking about are perfectly welcome),
the numbers confirm what I already knew from many more untimed tests,
that at least in my case, startup takes an unacceptably long time. If
this is the way the released version performs and my
back-of-the-envelope analysis of the tradeoff is correct, I don't
think I'd use the database stuff and just continue with the xml file.
But if you think I've missed something here, please feel free.

/Don