GDA referential integrity

Mark Johnson mrj001 at shaw.ca
Fri Jan 18 00:38:55 EST 2008


Phil Longstaff wrote:
> Mark Johnson wrote:
>   
>> Having successfully loaded my xml file into gnucash-gda, I saved it to a 
>> new postgresql database.  As Phil mentioned, the performance was poor.  
>> I have several years of data, and it took hours to save.  The save 
>> completed sometime during the night, so I don't know exactly how long it 
>> took.
>>   
>>     
>
> I think the basic performance problem is that on the initial save, I am 
> saving each object individually rather than batched.  During regular 
> gnucash use, if you create a new account and then commit it, the gda 
> backend needs to commit the commodity, because it might not have been 
> committed.  Then, a SELECT is done to see if the account already exists 
> and is followed by an INSERT if it does not, or an UPDATE if it does.  I 
> know some databases allow INSERT OR UPDATE, i.e. INSERT if the row 
> doesn't exist and UPDATE if it does, but GDA does not support that, and 
> not all of the underlying databases do either, so I can't count on it.  
> In addition, when saving 
> (creating or modifying) an account, the old set of saved slots needs to 
> be deleted and then the new slots need to be saved.  However, on an 
> initial save from XML into a GDA backend, 1) those checks and deletes 
> are unnecessary because the tables are empty and 2) a batched mode 
> INSERT INTO accounts VALUES 
> (...account1...),(...account2...),(...account3...),... could be used.  I 
> think GDA supports this, and if not I can fall back to SQL, but I 
> haven't implemented it yet.
>   
This would certainly cause very slow performance.  Here are some sample 
table sizes to give you an idea of how much data I have in gnucash.  I 
don't think it is especially large; I have seen posts from users who have 
quite a few more years of data in gnucash than I do.

gnucash_db=# select count(*) from accounts;
 count
-------
   491
(1 row)

gnucash_db=# select count(*) from splits;
 count
-------
 11224
(1 row)

gnucash_db=# select count(*) from transactions;
 count
-------
  3587
(1 row)
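
To make Phil's point concrete, the per-object save is doing something like 
this for every one of those rows (the column names and values here are just 
placeholders, not the real gnucash-gda schema):

  SELECT guid FROM accounts WHERE guid = 'aaa111';
  -- not found: insert the row
  INSERT INTO accounts (guid, name) VALUES ('aaa111', 'Checking');
  -- found: update it instead
  UPDATE accounts SET name = 'Checking' WHERE guid = 'aaa111';

whereas a batched initial save could push many rows in one statement:

  INSERT INTO accounts (guid, name) VALUES
    ('aaa111', 'Checking'),
    ('bbb222', 'Savings'),
    ('ccc333', 'Groceries');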

Regarding performance, my PostgreSQL server is running locally on the 
test system, and its data is stored on the local hard drive (not the NFS 
share :-).  I haven't tried connecting to a remote PostgreSQL server, but 
I do want to test that before gnucash-gda is ready for release.

Poor performance continues beyond the initial save.  Opening a register 
takes an inordinate amount of time.  I did not time it exactly at first, 
but it is at least tens of seconds, possibly a couple of minutes.  (On a 
later try, I timed it at approximately 1 minute 35 seconds.)

When entering a transaction, each time I tab to a new split, another 
long wait occurs (again tens of seconds). I would guess this is building 
the list of accounts for the split to choose from.  (I tried again for 
timing, and it was about 47 seconds.)  This happened for every split in 
the transaction.

Once I had entered a transaction, going to a new blank transaction was 
also very slow (tens of seconds).

In all cases, "top" shows that gnucash and postgres are both consuming 
significant CPU resources.

Every transaction in the register shows the date 1901-12-13.  I have 
selected the ISO date format in my preferences.  Naturally, the 
transactions are not ordered correctly.
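
Interestingly, 1901-12-13 is the date you get from the most negative value 
of a signed 32-bit time_t, so I suspect a date conversion is going wrong 
somewhere.  A quick check in psql (assuming to_timestamp() is available in 
this PostgreSQL version):

  SELECT to_timestamp(-2147483648);
  -- lands on 1901-12-13 in UTC and in any North American timezone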

I tried to change the register sort order.  Opening that dialog was very 
slow as well (tens of seconds).  I'm not sure why, but there was a lot of 
db activity just to open it.

Here's something nice about this backend: it is very easy to look at the 
data with SQL SELECT statements, much easier than with a text editor 
(at least for those of us who know SQL).
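
For example, something along these lines pulls up the splits for a single 
account (I am guessing at the exact column names here; adjust them to 
whatever the schema actually uses):

  SELECT t.post_date, t.description, s.reconcile_state, s.reconcile_date
    FROM splits s
    JOIN transactions t ON t.guid = s.tx_guid
    JOIN accounts a ON a.guid = s.account_guid
   WHERE a.name = 'Checking'
   ORDER BY t.post_date;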

I have looked at the splits for the account in question.  Some of the 
reconcile_dates don't make a great deal of sense to me. All of the 
non-reconciled splits have a reconcile date of 1969-12-31.  Can I assume 
this is an artifact of the date format?
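
For what it's worth, 1970-01-01 00:00 UTC is still 1969-12-31 in my local 
timezone, so a reconcile date stored as zero seconds since the epoch would 
display exactly like this.  That is easy to check in psql:

  SELECT to_timestamp(0);
  -- 1970-01-01 00:00:00 UTC, which is 1969-12-31 local time here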

The post dates and enter dates for transactions with splits for this 
account look correct.  It is simply the display that is incorrect.

I am seeing many copies of the following message on the console where I 
started gnucash:
Error: xaccOpenLog(): cannot open journal
         2 No such file or directory
I assume this relates to the log file gnucash has always kept.  I would 
not have thought this logging necessary with a db backend, where, 
presumably, the db is transactional and does its own logging.  I don't 
know for sure whether that would be true of SQLite, though.

Discarding a transaction on a register close is also surprisingly slow, 
with much CPU activity for both the db and gnucash.

For the first time since importing the data into PostgreSQL, I closed and 
re-opened gnucash.  It re-opened the db connection, just as expected.  I 
did not time opening the db, but it felt comparable to opening the xml 
file.  Now, balances in the accounts tab are all zero (I believe Phil 
said to expect this).  Also, balances are all zero in the account 
register I opened (I'm not sure Phil said to expect this).

I'm not sure whether this is a gnucash-gda issue or a libgda one.  I note 
that pgadmin3 indicates that the connection user is postgres.  However, 
my connection config specifies a different user.  Is this something that 
gnucash might override?
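
One way to double-check which user the backend actually authenticated as 
is to look at the server's activity view from a separate psql session 
while gnucash is connected:

  SELECT datname, usename, client_addr FROM pg_stat_activity;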

Mark



