SQL backend performance

Donald Allen donaldcallen at gmail.com
Tue Feb 23 11:23:36 EST 2010


On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens
<janssens-geert at telenet.be> wrote:
> On Tuesday 23 February 2010, Donald Allen wrote:
>> As I've mentioned in other posts, I have a pretty large gnucash
>> data file -- more than 20 MB uncompressed. I've been testing the SQL
>> backend and I'm concerned about the performance, particularly startup
>> performance.
>>
>> I've been doing this testing on an inexpensive little HP desktop
>> machine, dual-core 2.8 GHz AMD Athlon, 2 GB memory, 300 GB 7200 rpm
>> SATA drive, Arch Linux, 2.6.32 kernel. It may have been cheap, but
>> it's damned fast (I've been in the computer business from 1964 until
>> I retired last Oct. and the cost-performance of today's hardware just
>> boggles my mind, especially when I think about what we put up with 20
>> or 30 years ago; you haven't lived until you've tried getting your
>> work done *and* stayed sane while sitting in front of a VT100 talking
>> to an overloaded Vax 780 running 4.1BSD; motto: "It was hard to build,
>> it ought to be hard to use"). From my gnucash xml file, I've created
>> sqlite3 and postgresql databases containing the same data.
>>
>> Here are the average data-load timings, in seconds, from just a couple
>> of tests per storage method (this is from the point gnucash says
>> "reading file" until it's up):
>>
>> xml           9.10
>> sqlite3      45.41
>> postgresql   45.46
>>
>> My mental model, which may be wrong, of what the SQL backend is doing
>> is that it is reading the entire database at startup into memory, as
>> opposed to retrieving the data as-needed (which is the way I'd guess
>> gnucash+database would be architected if it were being written from
>> scratch; I'll guess further that given the file-based history of
>> gnucash, its existing architecture precluded using the incremental
>> approach when adding the database backend). I'm guessing this because
>> of the amount of time it takes to start up when pointed at either the
>> postgresql or sqlite3 version of the same data. I am further assuming
>> that, with the SQL backend, as the user adds new things (accounts,
>> transactions, etc.), new objects get inserted into the database and
>> modified objects get updated, on the spot. I'm guessing this because
>> as I make changes, the 'save' button remains grayed-out. So the
>> primary advantage of the database backend is that file saves are not
>> necessary. But, at least in my case, I am paying about a 36-second
>> price at startup to avoid the file saves. File saves on my machine
>> take about 2 seconds, much faster than reading the file, probably
>> because the data is being written to the buffer cache. So I'd need to
>> do 18 file saves during a gnucash session to be worse off (kept
>> waiting) with the xml file than with the data in either flavor of
>> database. And that assumes that I am always waiting for file saves to
>> complete, which is not always the case (I frequently do them after
>> completing a chunk of work, e.g., entering an investment transaction
>> from a statement, and the file-save happens while I turn my attention
>> to the paperwork to figure out what I need to do next).
>>
>> While I didn't do a lot of timed tests and I've used the word "guess"
>> an awful lot above (corrections to what I've said above from someone
>> who actually knows what (s)he is talking about are perfectly welcome),
>> the numbers confirm what I already knew from many more untimed tests,
>> that at least in my case, startup takes an unacceptably long time. If
>> this is the way the released version performs and my
>> back-of-the-envelope analysis of the tradeoff is correct, I don't
>> think I'd use the database stuff and just continue with the xml file.
>> But if you think I've missed something here, please feel free to say so.
>>
> Your assumptions on how things work are correct.
>
> And I noticed this performance decrease as well.
>
> There is one difference between the xml and the sql backends that may
> influence this (at least in part): the sql backend writes lots of debug
> information to gnucash.trace at present. I don't know how much impact this
> has; I haven't tested without debug information, but if we disable the debug
> information before the 2.4 release, it will surely narrow the gap.

I'm seeing trace files on the order of 0.5 MB. As I mentioned earlier,
saving my xml file takes about 2 seconds. It's about 2.5 MB compressed
(over 20 MB uncompressed), and the 2 seconds includes the time to
compress it. Writing the trace file is not nearly as hard a job, and
the periodic writes should go to the buffer cache on any reasonable
machine. So
I'll guess (again) that the gap-narrowing won't amount to much. I hope
I'm wrong :-)
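
If anyone wants to sanity-check that guess, here is a quick Python
sketch that times a buffered write of roughly trace-file size (the
path and the 0.5 MB figure are just placeholders matching what I'm
seeing):

#!/usr/bin/env python
# Time a single buffered write of ~0.5 MB, about the size of the
# gnucash.trace files mentioned above. The path is scratch space.
import os
import time

payload = b"x" * (512 * 1024)   # ~0.5 MB standing in for trace output

start = time.time()
with open("/tmp/trace-write-test", "wb") as f:
    f.write(payload)            # lands in the buffer cache, not yet on disk
elapsed = time.time() - start

print("wrote %d bytes in %.4f seconds" % (len(payload), elapsed))
os.remove("/tmp/trace-write-test")

On hardware like the machine described above, that should complete in
a few milliseconds, which is why I doubt the trace output explains
much of the 36-second gap.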

>
> In the future this may still be improved such that data is only queried
> when needed, but this requires a lot of internal modifications. This would not
> have been possible for the next stable release.

I understand why this was done as it was; I said so in my previous
email. If I had been running this project, I'm quite sure I'd have
done the same thing, so I'm not at all critical. I also note that I'm
a bit of an extreme case. I've been tracking my finances with a
computer for a long time (I started using Quicken in 1994, when I
bought my first PC after Windows NT 3.5 became available; I was not
about to use Windows 3.1/MS-DOS!); some of that ancient history is in
my gnucash file, so I'm pretty sure I'm on the right tail of the
file-size distribution. People with more reasonable-sized xml files
will not notice the performance hit as much, and the database backend
frees them from having to worry about file-saves and replaying log
files in the event of a crash.
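
To make the two access patterns concrete, here is a toy Python/sqlite3
sketch of the difference; the schema and data are invented for the
example and are not the actual gnucash schema:

#!/usr/bin/env python
# Contrast loading everything at startup with querying on demand,
# using an in-memory sqlite3 database. Schema and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions "
             "(guid TEXT PRIMARY KEY, post_date TEXT, description TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(str(i), "2010-02-23", "txn %d" % i) for i in range(10000)])

# What the backend does today, as I understand it: read every row up
# front, so startup cost grows with the size of the whole data set.
all_rows = conn.execute("SELECT * FROM transactions").fetchall()

# What an incremental backend could do instead: fetch only the rows a
# register window actually needs, when it needs them.
recent = conn.execute("SELECT * FROM transactions WHERE post_date >= ? "
                      "LIMIT 50", ("2010-01-01",)).fetchall()

# Either way, an edit can be written back immediately, which is why
# the Save button stays grayed out with the SQL backend.
conn.execute("UPDATE transactions SET description = ? WHERE guid = ?",
             ("edited", "42"))
conn.commit()

print("full load: %d rows; on demand: %d rows" % (len(all_rows), len(recent)))

The first SELECT's cost grows with the size of the whole file, which
is what people with large files like mine are paying at startup; the
second grows only with what is actually displayed.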

/Don

>
> Geert
>

