GDA missing records retest (long)
Mark Johnson
mrj001 at shaw.ca
Fri Feb 29 16:28:36 EST 2008
Now that I've managed to get the data saved to the DBs and the gnucash's
exitted, it is time to check the data in the DBs.
As a check on the DB data, I created an awk script to count the numbers
of accounts, transactions, and splits in the xml data file. Here is the
awk script:
BEGIN {
nAccounts = 0;
nTransactions = 0;
nSplits = 0;
}
/^<gnc:account/ { nAccounts ++ }
/^<gnc:transaction/ { nTransactions ++ }
/<trn:split>/ { nSplits ++ }
END {
printf("Number of Accounts %d\n", nAccounts);
printf("Number of Transactions %d\n", nTransactions);
printf("Number of Splits %d\n", nSplits);
}
Here is the output from the script:
Number of Accounts 523
Number of Transactions 3663
Number of Splits 11487
The following are the results for each DB:
(editted for clarity)
SQLITE:
sqlite> select count(*) from accounts;
522
sqlite> select count(*) from transactions;
3665
sqlite> select count(*) from splits;
11494
sqlite> select count(*) from slots;
1637
MySQL:
mysql> select count(*) from accounts;
| 522 |
mysql> select count(*) from transactions;
| 3665 |
mysql> select count(*) from splits;
| 11494 |
mysql> select count(*) from slots;
| 1637 |
PostgreSQL:
gnucash_db=# select count(*) from accounts;
522
gnucash_db=# select count(*) from transactions;
3665
gnucash_db=# select count(*) from splits;
11494
gnucash_db=# select count(*) from slots;
1637
The good news is that now all three DBs have the same number of records.
However, it does not agree with the output of my awk script.
Since I have one extra account showing in the XML file, I checked to see
if the DBs have the ROOT account in them. They do not.
QUESTION: Is this a problem?
The next important question I had to look at is: why do I have 2 extra
transactions and 7 extra splits in the DBs? (Most likely those scheduled
transactions from when I opened the XML files.)
Comparison of the guids for transactions and splits in each DB revealed
3 transactions guids were different in each DB. The ones from MySql
were not present in the XML. Similarly there were 7 split guids that
did not match in the different DBs. I did not check for them in the
XML. This is consistent with 3 scheduled transactions being created
with 7 splits. Now, I wish I'd looked closer at that scheduled
transaction dialog when I first opened the XML yesterday.... Despite
this, I believe that these three are scheduled transactions.
QUESTION: Why do two of them have a entered_date in 2005? Checking one
against the XML shows that the last couple of entered_dates were in
2005, but before that they looked normal.
However, there was an excess of TWO transactions in the DBs, but THREE
differing guids. I was able to track this to a missing template
transaction for a scheduled transaction. Here is the corresponding
entry in the PostgreSQL error log:
ERROR: null value in column "currency_guid" violates not-null constraint
STATEMENT: INSERT INTO transactions (guid, currency_guid, num,
post_date, enter_date, description) VALUES
('16097c3be93a538e66e1de61ad743b0a', NULL, '', '2005-04-27',
'2005-04-27', 'Telus')
Here is the first portion of the XML:
<gnc:transaction version="2.0.0">
<trn:id type="guid">16097c3be93a538e66e1de61ad743b0a</trn:id>
<trn:date-posted>
<ts:date>2005-04-27 00:00:00 -0600</ts:date>
</trn:date-posted>
<trn:date-entered>
<ts:date>2005-04-27 12:26:51 -0600</ts:date>
</trn:date-entered>
<trn:description>Telus</trn:description>
<trn:splits>
Here is the first portion of another template transaction:
<gnc:transaction version="2.0.0">
<trn:id type="guid">df0c8dbca68109b354d2b5cb035bf623</trn:id>
<trn:currency>
<cmdty:space>ISO4217</cmdty:space>
<cmdty:id>CAD</cmdty:id>
</trn:currency>
<trn:date-posted>
<ts:date>2005-09-08 00:00:00 -0600</ts:date>
</trn:date-posted>
<trn:date-entered>
<ts:date>2005-09-08 12:10:41 -0600</ts:date>
</trn:date-entered>
<trn:description>Primerica</trn:description>
<trn:splits>
NOTE the missing <trn:currency> tag in the former.
PROBLEM: a missing XML tag has led to a failed data insert.
While looking at the PostgreSQL error log, I also noted the following error:
ERROR: relation "books" does not exist
STATEMENT: INSERT INTO books (guid, root_account_guid,
root_template_guid) VALUES ('be5d5b80c1a8f36ec33139cb27440ca7',
'8f436420bffa2ba45515192c895b083b', '2c54584a7995ce449f2fd371771ef562')
PROBLEM: the books table is not created. I checked all 3 DBs.
Also, from the PostgreSQL error log:
ERROR: null value in column "fullname" violates not-null constraint
STATEMENT: INSERT INTO commodities (guid, namespace, mnemonic,
fullname, cusip, fraction, quote_flag, quote_source, quote_tz) VALUES
('d6d52116cd6e092ea4f525ede20f38d0', 'template', 'template', NULL, NULL,
0, 0, NULL, '')
There were many similar entries. I'm not sure what this means. The XML
data does not match this guid. All three DBs have 30 commodities. The
XML file has the following near the beginning:
<gnc:count-data cd:type="commodity">29</gnc:count-data>
PROBLEM: spurious commodities inserts.
I found the following in the XML file:
<gnc:account version="2.0.0">
<act:name/>
<act:id type="guid">6b516dc592018e5b3968bcbd5625c7cf</act:id>
<act:type>ROOT</act:type>
<act:commodity>
<cmdty:space>template</cmdty:space>
<cmdty:id>template</cmdty:id>
</act:commodity>
<act:commodity-scu>1</act:commodity-scu>
<act:parent type="guid">dd5fa0a84ddbdfc6fde7b6bcb810a5b7</act:parent>
</gnc:account>
This is inside the <gnc:template-transactions> tag, so I suppose it is
related to scheduled transactions. I guess it is the source of the
spurious commodities inserts.
I found a commodity in the DBs with namespace, mnemonic and fullname all
set to "template". This appears to the the reason for the differing
counts between the XML and the DBs.
QUESTION: should this be a commodity? If so, why does gnucash-gda
attempt to insert a bunch more copies of it?
The above is quite long. I've highlight "PROBLEM"s and "QUESTION"s with
those strings respectively. This is now a tremendous improvement over
the previous situation.
Mark
More information about the gnucash-devel
mailing list