GDA missing records retest (long)

Mark Johnson mrj001 at shaw.ca
Fri Feb 29 16:28:36 EST 2008


Now that I've managed to get the data saved to the DBs and the gnucash's 
exitted, it is time to check the data in the DBs.

As a check on the DB data, I created an awk script to count the numbers 
of accounts, transactions, and splits in the xml data file.  Here is the 
awk script:

BEGIN {
     nAccounts = 0;
     nTransactions = 0;
     nSplits = 0;
     }
/^<gnc:account/   { nAccounts ++ }
/^<gnc:transaction/   { nTransactions ++ }
/<trn:split>/         { nSplits ++ }
END {
   printf("Number of Accounts %d\n", nAccounts);
   printf("Number of Transactions %d\n", nTransactions);
   printf("Number of Splits %d\n", nSplits);
   }

Here is the output from the script:
Number of Accounts 523
Number of Transactions 3663
Number of Splits 11487

The following are the results for each DB:
(editted for clarity)

SQLITE:
sqlite> select count(*) from accounts;
522
sqlite> select count(*) from transactions;
3665
sqlite> select count(*) from splits;
11494
sqlite> select count(*) from slots;
1637

MySQL:
mysql> select count(*) from accounts;
|      522 |
mysql> select count(*) from transactions;
|     3665 |
mysql> select count(*) from splits;
|    11494 |
mysql> select count(*) from slots;
|     1637 |

PostgreSQL:
gnucash_db=# select count(*) from accounts;
   522
gnucash_db=# select count(*) from transactions;
  3665
gnucash_db=# select count(*) from splits;
 11494
gnucash_db=# select count(*) from slots;
  1637

The good news is that now all three DBs have the same number of records.

However, it does not agree with the output of my awk script.

Since I have one extra account showing in the XML file, I checked to see 
if the DBs have the ROOT account in them.  They do not.  
QUESTION:  Is this a problem?

The next important question I had to look at is:  why do I have 2 extra 
transactions and 7 extra splits in the DBs? (Most likely those scheduled 
transactions from when I opened the XML files.)

Comparison of the guids for transactions and splits in each DB revealed 
3 transactions guids were different in each DB.  The ones from MySql 
were not present in the XML.  Similarly there were 7 split guids that 
did not match in the different DBs.  I did not check for them in the 
XML.  This is consistent with 3 scheduled transactions being created 
with 7 splits.  Now, I wish I'd looked closer at that scheduled 
transaction dialog when I first opened the XML yesterday....  Despite 
this, I believe that these three are scheduled transactions.  
QUESTION:  Why do two of them have a entered_date in 2005?  Checking one 
against the XML shows that the last couple of entered_dates were in 
2005, but before that they looked normal.

However, there was an excess of TWO transactions in the DBs, but THREE 
differing guids.  I was able to track this to a missing template 
transaction for a scheduled transaction.  Here is the corresponding 
entry in the PostgreSQL error log:
ERROR:  null value in column "currency_guid" violates not-null constraint
STATEMENT:  INSERT INTO transactions (guid, currency_guid, num, 
post_date, enter_date, description) VALUES 
('16097c3be93a538e66e1de61ad743b0a', NULL, '', '2005-04-27', 
'2005-04-27', 'Telus')

Here is the first portion of the XML:
<gnc:transaction version="2.0.0">
  <trn:id type="guid">16097c3be93a538e66e1de61ad743b0a</trn:id>
  <trn:date-posted>
    <ts:date>2005-04-27 00:00:00 -0600</ts:date>
  </trn:date-posted>
  <trn:date-entered>
    <ts:date>2005-04-27 12:26:51 -0600</ts:date>
  </trn:date-entered>
  <trn:description>Telus</trn:description>
  <trn:splits>

Here is the first portion of another template transaction:
<gnc:transaction version="2.0.0">
  <trn:id type="guid">df0c8dbca68109b354d2b5cb035bf623</trn:id>
  <trn:currency>
    <cmdty:space>ISO4217</cmdty:space>
    <cmdty:id>CAD</cmdty:id>
  </trn:currency>
  <trn:date-posted>
    <ts:date>2005-09-08 00:00:00 -0600</ts:date>
  </trn:date-posted>
  <trn:date-entered>
    <ts:date>2005-09-08 12:10:41 -0600</ts:date>
  </trn:date-entered>
  <trn:description>Primerica</trn:description>
  <trn:splits>

NOTE the missing <trn:currency> tag in the former.
PROBLEM: a missing XML tag has led to a failed data insert.

While looking at the PostgreSQL error log, I also noted the following error:
ERROR:  relation "books" does not exist
STATEMENT:  INSERT INTO books (guid, root_account_guid, 
root_template_guid) VALUES ('be5d5b80c1a8f36ec33139cb27440ca7', 
'8f436420bffa2ba45515192c895b083b', '2c54584a7995ce449f2fd371771ef562')
PROBLEM: the books table is not created.  I checked all 3 DBs.

Also, from the PostgreSQL error log:
ERROR:  null value in column "fullname" violates not-null constraint
STATEMENT:  INSERT INTO commodities (guid, namespace, mnemonic, 
fullname, cusip, fraction, quote_flag, quote_source, quote_tz) VALUES 
('d6d52116cd6e092ea4f525ede20f38d0', 'template', 'template', NULL, NULL, 
0, 0, NULL, '')
There were many similar entries.  I'm not sure what this means.  The XML 
data does not match this guid.  All three DBs have 30 commodities.  The 
XML file has the following near the beginning:
<gnc:count-data cd:type="commodity">29</gnc:count-data>
PROBLEM: spurious commodities inserts.

I found the following in the XML file:
<gnc:account version="2.0.0">
  <act:name/>
  <act:id type="guid">6b516dc592018e5b3968bcbd5625c7cf</act:id>
  <act:type>ROOT</act:type>
  <act:commodity>
    <cmdty:space>template</cmdty:space>
    <cmdty:id>template</cmdty:id>
  </act:commodity>
  <act:commodity-scu>1</act:commodity-scu>
  <act:parent type="guid">dd5fa0a84ddbdfc6fde7b6bcb810a5b7</act:parent>
</gnc:account>

This is inside the <gnc:template-transactions> tag, so I suppose it is 
related to scheduled transactions.  I guess it is the source of the 
spurious commodities inserts.

I found a commodity in the DBs with namespace, mnemonic and fullname all 
set to "template".  This appears to the the reason for the differing 
counts between the XML and the DBs.
QUESTION: should this be a commodity?  If so, why does gnucash-gda 
attempt to insert a bunch more copies of it?

The above is quite long.  I've highlight "PROBLEM"s and "QUESTION"s with 
those strings respectively.  This is now a tremendous improvement over 
the previous situation.

Mark



More information about the gnucash-devel mailing list