SQL backend performance

Wed Feb 24 11:41:40 EST 2010

On Wed, Feb 24, 2010 at 10:32 AM, Phil Longstaff <plongstaff at rogers.com> wrote:
> On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:
>> Donald Allen <donaldcallen at gmail.com> writes:
>>
>> >> I think true measurements will be the only way to find out what causes delays
>> >> where.
>> >
>> > Of course. I spent a big chunk of my career doing performance analysis
>> > on various bits of complicated software and learned very young (the
>> > hard way) that if you think you know how your software behaves and
>> > where the time is going, you are probably wrong. Measurement, done
>> > correctly, is the only way to get to the truth reliably. I sometimes
>> > had to insist on measurement by people who worked for me who were as
>> > cocky (and wrong) as I was when I was young :-)
>> >
>> > But until the measurements are done, there's no harm in doing some
>> > educated guessing, so long as the guessing doesn't replace the
>> > measuring. If you are frequently right, it can help you set your
>> > measurement priorities. If you are frequently wrong, it reminds you
>> > that you aren't too good at modeling the behavior of software in your
>> > head.
>>
>> For what it's worth, the old Postgres backend was dog slow too.
>>
>> I certainly encourage you to perform profiling to determine where our
>> bottlenecks are.
>
> Another thing that I haven't done too much of is trying to add extra
> indexes or optimize queries.  All SQL statements are logged to
> gnucash.trace.  Feel free to add indexes and/or change queries to
> improve performance.
>
> In general, one major problem is that certain areas of the code just
> assume that the data is loaded.  Until we remove those assumptions or
> provide alternatives, it seemed the safer route to just load all data at
> start time.

I have one quick data point for you: I ran 'top' while loading my data
a few times from Postgresql. 'top' is not exactly a surgical
measurement tool, but it can get you started in the right direction by
letting you know what the bottleneck resource is, e.g., I/O-limited,
cpu-limited, etc. What I'm seeing is that for the vast majority of the
time while the data is loading, gnucash-bin is using 100% of a
processor (2 core system). A postgres server process shows up a
distant second occasionally, and then there's a brief period at the
end of the loading where there's a burst of cpu activity by the
postgres server process. But most of the time is spent waiting while
the gnucash-bin process computes like crazy. This is 99% user-mode
time. Now the trick is to get more specific about where the time is
going.

I will offer one of my usual guesses: I don't *think* that missing
indices (resulting in full-table scans) would produce behavior like
this, because I believe the query processing is done on the server
side, so I'm postulating that in that situation, you would see high
cpu utilization by the server, which is not the case. If I'm right,
then this might be good news, if the bulk of the time is being spent
in actual gnucash code (which can be improved once you understand the
problem), as opposed, say, to libpq code. Anyway, as we discussed
earlier, my guessing is not a substitute for actual measurement.

/Don

>
> Phil
>
> _______________________________________________
> gnucash-devel mailing list
> gnucash-devel at gnucash.org
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
>