GDA: PostgreSQL empty slots table
mrj001 at shaw.ca
Sun Feb 10 01:31:07 EST 2008
Phil Longstaff wrote:
> I just saw an e-mail on the libgda mailing list which said that
> GdaQuery might be deprecated in the 4.0 series.
I just saw that too. It is disappointing. You have put in a lot of
work using the GdaQuery.
I had compiled the V4 branch with the idea of testing it with
gnucash-gda. Not necessarily a good idea as it would be testing both
gda and gnucash-gda simultaneously. Thus, so far, I have not installed
it. As you will see below, I am having enough trouble with the current
(3.1.2) PostgreSQL provider.
>>> It looks as though the postgresql backend has a bug. Using the
>>> object model, I create an XML-based representation of the table. I
>>> create the 'slot_id' column with type integer and add the 'AUTOINC'
>>> flag to it. For mysql and sqlite, this is an INTEGER AUTO_INCREMENT
>>> column, while for postgresql, it should be a SERIAL column.
>> Agreed. I have tracked this far enough to file a reasonable bug
>> report against the PostgreSQL backend. I just ran out of time to do
>> it. I'll get to it in the next couple of days.
> Unfortunately, any fix for this will take time to propagate to the
> distributions and make it out to people who want to use the GDA
> backend with postgresql. We might need a workaround.
Yes, I think an "ALTER TABLE....." might work as a workaround. It would
not be a good thing to require the user to do, however. Then one gets
into code like "for this version of this
Another possibility would be to do a "Select MAX(slot_id) FROM slots;"
followed by an INSERT with a specified slot_id. Two problems. I don't
like imposing this overhead on non-PostgreSQL users. And the PostgreSQL
provider has especially poor performance with its GdaQuery, so that
imposing an extra round trip on it has a higher penalty than it should.
That's why they're called "workarounds" - they aren't good enough to be
On the other hand, there is time required to finish this gnucash
backend. Though, I doubt if that would be enough time.
I have now filed a bug report about this SERIAL problem:
>>> I don't really want to special-case the code for various backends
>>> since libgda is supposed to remove that need. There are other cases
>>> where it might be required.
>> Agreed. Not having special cases for different backends is a major
>> reason to use libgda. And it has enough backends that supporting
>> special cases would be an undesirable amount of work.
> Well, we could limit support to sqlite, mysql and postgresql. They
> are the 3 backends for which the provider status is "fully functional".
I hadn't noticed that. I had planned to limit my testing to those three
backends though. It would simply be too much work for me to test a
dozen libgda backends.
I wonder what not "fully functional" implies. For example, when you
execute a GdaQuery, it checks to see if the provider object implements a
function for this purpose. If so, it calls the function on the
provider. If not libgda uses fallback code to implement a GdaCommand
object and calls that. PostgreSQL's provider implements the function
(poorly from a performance point of view), and MySql's does not. Both work.
The reason I say PostgreSQL's provider does this poorly from a
performance point of view is that it converts all such queries into
prepared statements. This entails three round-trips to the db - one to
prepare the statement, one to execute it, and another to deallocate it!
This is done for every query. There is a "FIXME" in the code here, but
it looks like it will only be useful when the client uses GdaQuery with
parameters, which gnucash-gda does not. I foresee a risk that the fix
to this FIXME may cause an extremely large buildup of cached prepared
statements, which, in turn, could cause its own performance degradation.....
Additionally, for every query, some other part of the PostgreSQL
provider executes two queries per table column - checking for unique and
primary keys. I am told this comes from
gda_data_model_describe_column(), but I haven't studied the code to see
why that is called for each gnucash-gda query. (Eg. would this happen
for a SELECT MAX(slot_id).. scalar query?) I have filed a performance
bug on this one:
Investigating why none of these queries ever returned a single row led
Added to the SERIAL problem above, I seem to have found just as many
problems with the PostgreSQL provider as with the software I am
nominally testing - namely gnucash-gda.
>>> SQLite, for example, does not allow more than one row to be added in
>>> an INSERT statement, whereas mysql and postgresql do. I could speed
>>> up initial save to a blank db by writing all accounts at once, for
>>> example, but sqlite couldn't handle it and would need them to be
>>> written one at a time. Note that the libgda object model does not
>>> allow more than 1 row to be written either, so that would need to be
>>> done with SQL directly.
>> I wasn't aware of this limitation of SQLite. I have not had occasion
>> to use it. I've recently looked a little into the manual. Enough to
>> see that it supports sub-queries and only does joins through nested
>> loops. That may cause performance issues with my suggested queries,
>> especially the last one.
> Would the performance be any worse than "SELECT DISTINCT tx_guid FROM
> splits..." followed by "SELECT * from SPLIT where tx_guid IN (...)"
> which is a nested loop broken up into 2 pieces?
These are queries 1 and 2 from the register code. In my other posting
("GDA register open queries"), I propose merging them into one. I do
believe that would improve performance. (Testing of said belief is
I can't do any real benchmarking right now - my test system is building
the latest gnucash-gda. However, roughly, Query A appears to be faster
than query 1 followed by query 2 . I believe that Query A will be
easier for you to maintain - eg. no code to split up long query
strings. Later, I'll do this more rigorously without the unpredictable
load of building gnucash-gda concurrently, and not run the .db file on
an NFS mount. And also for MySql, and PostgreSQL.
Here is how I've benchmarked it:
$ time sqlite3 mydata.db <query1.sql >/dev/null
The first time I ran the above, I sent its output to a file, and edited
the file to produce query2.sql.
Additionally, combining the two queries is one less IPC for MySQL (three
less (plus two for each column in each of the two tables) for PostgreSQL
More information about the gnucash-devel