save corrupted gnucash file

Zhang Weiwu zhangweiwu at realss.com
Thu Sep 23 22:00:25 EDT 2010


Hello. The story has to be told as a timeline. Skip the indented content
for fast reading, but if you decide to reply, please do read the
indented details. Thanks in advance!

2010-08-18: I Found Invoices With Duplicated IDs, And "Fixed" Them By
Manually Editing The GnuCash XML File.

    Detail:

    After using gnucash for 2 years, all invoice numbers from 000000 to
    000044 are taken, and I found two problems:

       1. Invoices with duplicated IDs exist. There are two invoices
          numbered 000012, two 000020 and two 000028. In each case, one
          is posted and paid, the other is not.
       2. A customer with a duplicated ID was found. One customer has
          two records, both 000011. One has many invoices associated
          with it, the other has none.
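
    A minimal sketch of how such duplicates could be listed from the
    command line. It assumes that the IDs are stored one per line as
    <invoice:id>...</invoice:id> and <cust:id>...</cust:id>, and that
    "gzip -dcf" (GNU gzip) passes an uncompressed file through
    unchanged. The helper name dup_ids and the sample data are made up
    for illustration; this is not how GnuCash itself checks IDs.

```sh
#!/bin/sh
# List values that occur more than once under a given tag.
# Assumption: each ID sits on a single line as <tag>VALUE</tag>.
dup_ids() {
    tag=$1; file=$2
    # gzip -dcf decompresses gzip files and copies plain files as-is.
    gzip -dcf "$file" |
        sed -n "s|.*<$tag>\([^<]*\)</$tag>.*|\1|p" |
        sort | uniq -d
}

# Demo on a tiny made-up fragment, not a real GnuCash book.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<invoice:id>000011</invoice:id>
<invoice:id>000012</invoice:id>
<invoice:id>000012</invoice:id>
<cust:id>000011</cust:id>
<cust:id>000011</cust:id>
EOF
dup_invoices=$(dup_ids invoice:id "$sample")
dup_customers=$(dup_ids cust:id "$sample")
echo "duplicate invoice IDs: $dup_invoices"
echo "duplicate customer IDs: $dup_customers"
rm -f "$sample"
```

    On a real book, the same function would be called with the path of
    the gnucash file instead of the sample.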

    It's not clear how automatically generated IDs can be duplicated,
    but the duplicates must have been created manually rather than
    generated by GnuCash, because each duplicate has content that
    differs from the other record with the same ID, and the differences
    are of a kind that only a human would produce (e.g. "Service"
    versus "Services").

    I fixed it this way:

       1. Emptied all entries on the duplicated not-posted invoices.
       2. Backed up the current gnucash file.
       3. Drawing on several years of XML document processing
          experience, I worked directly on the gnucash file, removing
          the duplicated not-posted invoices and the duplicated
          customers without invoices by studying the XML structure and
          deleting the entries. I might also have removed a few
          references to the entries.
       4. Started gnucash and checked that these duplicated invoices
          and customers were really gone. But I did not check whether I
          had lost other invoices too.

2010-09-21: More Than A Month Later I Found The Previous XML Fix Might
Be Wrong, But Is A Month Later Too Late To Repair?

    I checked the backup I made on 2010-08-18 and found that invoices
    000039 to 000044, present in that backup, are gone from my current
    working file. It is hard to explain /why/ they are gone, but I might
    have an answer for /when/. A check of the .xac files shows they were
    already gone in the earliest .xac file, dated 2010-08-23. What
    happened between 2010-08-18 and 2010-08-23 is unclear, because the
    files from that period have been removed by gnucash (gnucash wipes
    historical records more than a month old). Thus it is not clear
    whether the invoices were accidentally deleted by a user or lost as
    a result of my modification to the XML source. One thing is clear:
    if any user operation was done on the gnucash file between
    2010-08-18 and 2010-08-23, it must be so insignificant that we can
    afford to lose it, because during that period all our customers
    were on holiday.
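
    A minimal sketch of how a set of backup files could be scanned for
    one invoice ID, to see from which file onward it is missing. It
    assumes the ID appears as <invoice:id>...</invoice:id> and that
    "gzip -dcf" (GNU gzip) passes uncompressed files through. The
    helper name has_invoice and the file names are made up for
    illustration.

```sh
#!/bin/sh
# Succeed if the given invoice ID is present in the given file.
has_invoice() {
    id=$1; file=$2
    gzip -dcf "$file" | grep -q "<invoice:id>$id</invoice:id>"
}

# Two made-up backup files stand in for real .xac files.
workdir=$(mktemp -d)
printf '<invoice:id>000039</invoice:id>\n' > "$workdir/book.20100818.xac"
printf '<invoice:id>000038</invoice:id>\n' > "$workdir/book.20100823.xac"

# The glob expands in sorted order, so dated names are visited oldest first.
report=""
for f in "$workdir"/*.xac; do
    if has_invoice 000039 "$f"; then state=present; else state=missing; fi
    report="$report$(basename "$f"):$state "
done
echo "$report"
rm -rf "$workdir"
```

    Note that an ID match only shows that some invoice with that number
    exists in the file. It does not show that it is the same invoice.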

    During the last month, one Bill was created and paid, and one
    Invoice was created and posted but not paid. Their IDs are 000039
    and 000040, overlapping the lost invoices.


Story over. What follows are my attempts to solve this problem.

I can think of two approaches:

   1. Recover the lost invoices, i.e. move the lost invoices from the
      backup of 2010-08-18 to the current working file.
   2. Replay the changes, i.e. take the backup of 2010-08-18 and replay
      all changes from the log files onto it.


I believe recovering the invoices is a very difficult task, because of
the multiple complicated internal XML references to these paid invoices.
Replaying the changes sounds much easier, even counting the effort of
solving the duplicated invoices/customers problem again.

I tried to replay all changes and failed.

    Detail:

    To replay all the changes, I first need to collect them. I did it with this:

    $ head -n 2 rss_gnucash_20100923210659.log > /tmp/merged.log
    $ for i in rss_gnucash_2010082714*.log rss_gnucash_201009[01]*.log; do
    >     tail -n +2 "$i" | sed -e 's/000039/000045/' -e 's/000040/000046/' >> /tmp/merged.log
    > done

    Note that the 'sed' command shifts the invoice/bill numbers created
    in the last month. I have double-checked with diff that 'sed' did
    not make any unintended replacements in this case.
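
    A minimal sketch of that kind of diff check, on made-up lines (real
    GnuCash .log files have a different layout). The second sample line
    is there on purpose: it shows how a plain pattern such as 000039
    can also match inside a longer number, which is the kind of
    unintended replacement the diff review is meant to catch.

```sh
#!/bin/sh
workdir=$(mktemp -d)
# Made-up sample lines, not a real GnuCash log.
cat > "$workdir/orig.log" <<'EOF'
invoice 000039 created
amount 1000039.00
invoice 000040 posted
EOF

# Same substitutions as in the command above.
sed -e 's/000039/000045/' -e 's/000040/000046/' \
    "$workdir/orig.log" > "$workdir/shifted.log"

# In default diff output, lines starting with ">" are the rewritten lines.
diff "$workdir/orig.log" "$workdir/shifted.log" | grep '^>'
count=$(diff "$workdir/orig.log" "$workdir/shifted.log" | grep -c '^>')
echo "rewritten lines: $count"
rm -rf "$workdir"
```

    If the count is higher than the number of lines that were expected
    to change, each rewritten line needs to be read individually.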

    Then I loaded GnuCash with the backup of 2010-08-18, asked it to
    import merged.log, and saved. Now:

       1. In the saved gnucash XML file, the lost invoices are still
          there, and the bill and invoice newly created in the last
          month are also in the XML file.
       2. If I load GnuCash with this file and search for all invoices,
          the newly created bill and invoice do not appear in the list.
          So they are somehow hidden. I checked the transactions of the
          bill and found 3 transactions associated with it.

    So, the replay was not successful.


What do you suggest I do from here? Try a different way to replay the
logs? Manually re-enter everything from the last month by reading the
log files? Or move the lost invoices over from the backup?

I will go for manual re-entry if no good way out exists. It might take 2
working days to re-enter everything from the last month.

Thanks in advance!

