GDA referential integrity
Mark Johnson
mrj001 at shaw.ca
Fri Jan 18 00:38:55 EST 2008
Phil Longstaff wrote:
> Mark Johnson wrote:
>
>> Having successfully loaded my xml file into gnucash-gda, I saved it to a
>> new postgresql database. As Phil mentioned, the performance was poor.
>> I have several years of data, and it took hours to save. The save
>> completed sometime during the night, so I don't know exactly how long it
>> took.
>>
>>
>
> I think the basic performance problem is that on the initial save, I am
> saving each object individually rather than batched. During regular
> gnucash use, if you create a new account and then commit it, the gda
> backend needs to commit the commodity, because it might not have been
> committed. Then, a SELECT is done to see if the account already exists
> and is followed by an INSERT if it does not, or an UPDATE if it does. I
> know some databases allow INSERT OR UPDATE i.e. INSERT if it doesn't
> exist and UPDATE if it does, but GDA does not support that, and not all
> GDA backends do, so I can't count on it. In addition, when saving
> (creating or modifying) an account, the old set of saved slots needs to
> be deleted and then the new slots need to be saved. However, on an
> initial save from XML into a GDA backend, 1) those checks and deletes
> are unnecessary because the tables are empty and 2) a batched mode
> INSERT INTO accounts VALUES
> (...account1...),(...account2...),(...account3...),... could be used. I
> think GDA supports this, and if not I can fall back to SQL, but I
> haven't implemented it yet.
>
This would certainly cause very slow performance. Here are some sample
table sizes to give you an idea of how much data I have in gnucash. I
don't think it is especially large. I have seen some users post who
have quite a few more years of data in gnucash than I.
gnucash_db=# select count(*) from accounts;
count
-------
491
(1 row)
gnucash_db=# select count(*) from splits;
count
-------
11224
(1 row)
gnucash_db=# select count(*) from transactions;
count
-------
3587
(1 row)
Regarding performance, my Postgresql is running locally on the test
system and its data file is stored on the local hard drive (not the NFS
share :-). I haven't tried connecting it to a remote Postgresql, but
before gnucash-gda is ready for release, I do want to test this.
Poor performance continues beyond the initial save. Opening a register
takes an inordinate amount of time. I did not time it exactly, but it
is at least tens of seconds, possibly a couple of minutes. (On a later
try, I timed it to approximately 1 minute 35 seconds.)
When entering a transaction, each time I tab to a new split, another
long wait occurs (again tens of seconds). I would guess this is building
the list of accounts for the split to choose from. (I tried again for
timing, and it was about 47 seconds.) This happened for every split in
the transaction.
Once I had entered a transaction, going to a new blank transaction was
also very slow (tens of seconds).
In all cases, "top" shows that gnucash and postgres are both consuming
significant CPU resources.
Every transaction in the register shows the date 1901-12-13. I have
selected the ISO date format in my preferences. Naturally, the
transactions are not ordered correctly.
I tried to change the register sort order. Opening this dialog was very
slow as well (tens of seconds). Not sure why, but there was a lot of db
activity to do it.
Here's something nice about this backend: it is very easy to look at the
data with sql select statements. Much easier than with a text editor
(at least for those of us who know SQL).
I have looked at the splits for the account in question. Some of the
reconcile_dates don't make a great deal of sense to me. All of the
non-reconciled splits have a reconcile date of 1969-12-31. Can I assume
this is an artifact of the date format?
The post dates and enter dates for transactions with splits for this
account look correct. It is simply the display that is incorrect.
I am seeing many copies of the following showing on the console where I
started gnucash:
Error: xaccOpenLog(): cannot open journal
2 No such file or directory
I assume this relates to the log file gnucash has always kept. I would
not have thought this logging necessary with a db backend where,
presumably, the db is transactional and does its own logging. I don't
know for sure whether that would be true of sqlite, though.
Discarding a transaction on a register close is also surprisingly slow,
with much CPU activity for both the db and gnucash.
For the first time since importing the data to postgresql, I closed and
re-opened gnucash. It re-opened the db connection, just as expected. I
did not time opening the db, but it was felt comparable to opening the
xml file. Now, balances in the accounts tab are all zero (I believe
Phil said to expect this). Also, balances are all zero in the account
register I opened (I'm not sure Phil said to expect this.).
I'm not sure if this is a gnucash-gda issue or libgda. I note that
pgadmin3 indicates that the connection user is postgres. However, my
connection config has a different user. Is this something that gnucash
might override?
Mark
More information about the gnucash-devel
mailing list