save corrupted gnucash file
Derek Atkins
warlord at MIT.EDU
Fri Sep 24 09:51:31 EDT 2010
Hi,
Zhang Weiwu <zhangweiwu at realss.com> writes:
> Hello. The story has to be told as a timeline. Skip the indented content
> for fast reading, but if you decide to reply, please do read the
> indented details. Thanks in advance!
>
> 2010-08-18: I Found Invoices With Duplicated IDs, And "Fixed" It By
> Manually Editing the GnuCash XML File.
First, you should NEVER manually edit the GnuCash XML file. It's an
unsupported operation and can completely destroy your data if done
incorrectly.
> Detail:
>
> After using gnucash for 2 years, all invoice numbers from
> 000000 to 000044 are taken, and I found two problems:
>
> 1. Invoices with duplicate IDs exist. There are two invoice
> 000012s, two 000020s and two 000028s. In each case, one is
> posted and paid, the other is not.
There is no requirement that InvoiceIDs be unique. While GnuCash does
keep an internal counter, there is no requirement that you use it.
Moreover, manual entry of invoice IDs will NOT affect the counter, so if
you enter an ID of X by hand, the internal counter can still produce 'X'
when it gets to it in the count.
However, THIS IS NOT A BUG. There is nothing wrong with having multiple
Invoices with the same ID (as far as GnuCash is concerned). The
InvoiceID is *NOT* the key into the database, so yes, you can have two
Invoices with the same ID and different content. The key is that each
invoice has a unique GUID which is completely internal.
The invoice ID is 100% for human consumption; it has no meaning to
GnuCash. It's just a string.
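To make the distinction concrete, here is a toy model in Python. It is not GnuCash code: the `Book` and `Invoice` classes, the six-digit formatting and the counter's starting value are all assumptions made for illustration. It only shows that records are stored under an internal GUID, while the visible ID is a plain string that a manual entry and the counter can both end up producing.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Invoice:
    """Hypothetical stand-in for an invoice record (not GnuCash's real class)."""
    invoice_id: str          # free-form label shown to the user
    notes: str = ""
    guid: str = field(default_factory=lambda: uuid.uuid4().hex)  # internal key


class Book:
    """Toy book: invoices are stored by GUID, never by invoice_id."""

    def __init__(self) -> None:
        self.invoices: Dict[str, Invoice] = {}
        self.counter = 0  # only advances when the counter itself is used

    def add_with_counter(self, notes: str = "") -> Invoice:
        self.counter += 1
        return self._store(Invoice(f"{self.counter:06d}", notes))

    def add_with_manual_id(self, invoice_id: str, notes: str = "") -> Invoice:
        # Typing an ID by hand does not touch the counter.
        return self._store(Invoice(invoice_id, notes))

    def _store(self, inv: Invoice) -> Invoice:
        self.invoices[inv.guid] = inv
        return inv

    def find_by_id(self, invoice_id: str) -> List[Invoice]:
        return [i for i in self.invoices.values() if i.invoice_id == invoice_id]


book = Book()
manual = book.add_with_manual_id("000001", "Service")   # entered by hand
counted = book.add_with_counter("Services")             # counter arrives at the same ID
duplicates = book.find_by_id("000001")
```

In this sketch `duplicates` holds two distinct records that both display 000001, and neither overwrites the other because the lookup key is the GUID.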
> 2. Customers with duplicate IDs have been found. One customer has
> two records, both 000011. One has many invoices
> associated with it, the other has none.
See above. There is no requirement that Customer IDs be unique, either,
for the same reason as invoices. And again, the customer ID is purely
for human consumption.
> It's not clear how automatically generated IDs can end up
> duplicated, but the duplicates must have been created manually
> rather than by GnuCash, because each duplicate always has
> content that differs from the other record with the same ID, and
> the differences are of a kind that only a human would create (e.g.
> "Service" and "Services").
Yes, most likely you created entries by hand, or you're doing something
funky with some merges?
> I fixed it in this way:
>
> 1. Empty all entries on the duplicated not-posted invoices.
> 2. Back up current gnucash file.
> 3. With XML document processing experience from past years, I
> worked directly on the gnucash file, removing the duplicated
> not-posted invoices and the duplicated customers without any
> invoices by observing the XML structure and
> removing the entries. I might also have removed a few references to
> the entries.
> 4. Started gnucash and checked that these duplicated invoices and customers
> are really gone. But I did not check whether I lost other invoices too.
>
> 2010-09-21: More Than A Month Later I Found The Previous XML Fix Might
> Be Wrong, But Is A Month Later Too Late To Repair?
>
> I checked the backup I made on 2010-08-18 and found that invoices 000039 to
> 000044 in that backup are gone from my current working file. It is hard
> to explain /why/ they are gone, but I might have an answer for
> /when/. A check of the .xac files shows they were already gone in
> the earliest .xac file, from 2010-08-23, but the period between
> 2010-08-18 and 2010-08-23 is unclear because the files from then were
> removed by gnucash (gnucash wipes historical records more than a month
> old), so it is not clear whether the invoices were accidentally deleted
> by a user or lost as a result of my modification of the XML source. One
> thing is clear: if any user operation was done on the gnucash file
> between 2010-08-18 and 2010-08-23, it must be so insignificant that we
> can afford to lose it, because during that period all our customers
> were on holiday.
>
> During the last month, one Bill was created and paid; one Invoice
> was created and posted but not paid. Their IDs are 000039 and 000040,
> overlapping the lost invoices.
Another thing to keep in mind is that in modern GnuCash versions there
are multiple counters for Customer Invoices and Vendor Bills. So these
numbers can (and WILL) overlap. However in the XML data they are all
GncInvoice objects; the only way to differentiate between an Invoice and
a Bill is by dereferencing the Owner.
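A small Python sketch of that idea follows. It is only an illustration under assumed names: `Owner`, `GncInvoiceLike`, the "customer"/"vendor" labels and the six-digit numbering are hypothetical and are not GnuCash's actual data structures. It shows one counter per owner kind producing overlapping numbers, and the Invoice/Bill distinction being read off the owner rather than the record.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass
class Owner:
    kind: str   # "customer" or "vendor" (hypothetical labels)
    name: str


@dataclass
class GncInvoiceLike:
    """Toy record: invoices and bills share one shape; only the owner differs."""
    number: str
    owner: Owner


counters: DefaultDict[str, int] = defaultdict(int)


def new_document(owner: Owner) -> GncInvoiceLike:
    # One counter per owner kind, so the two number sequences are independent.
    counters[owner.kind] += 1
    return GncInvoiceLike(f"{counters[owner.kind]:06d}", owner)


def describe(doc: GncInvoiceLike) -> str:
    # The record itself does not say "invoice" or "bill"; follow the owner.
    return "Bill" if doc.owner.kind == "vendor" else "Invoice"


invoice = new_document(Owner("customer", "Example Customer"))
bill = new_document(Owner("vendor", "Example Vendor"))
```

Both documents end up with the same number in this toy model, and only `describe` (by looking at the owner) can tell them apart.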
> Story over. Now following are trials to solve this problem.
>
> I can think of:
>
> 1. Recover lost invoices, a.k.a. moving the lost invoices from the
> backup of 2010-08-18 to the current working file.
> 2. Replay changes, a.k.a. take the backup of 2010-08-18, replay all
> changes of the log file to it.
#2 cannot work -- there is no log of Business entries, and replaying a log
that has business entries *WILL* destroy your database.
> I believe recovering invoices is a very difficult task, because of
> multiple complicated XML internal references to these paid invoices.
> Replaying the changes sounds much easier, even counting the effort
> needed to solve the duplicated invoices/customers problem again.
Well, you screwed yourself by manually editing your database in the
first place.
> I tried to replay all changes and failed.
Yep.
> Detail:
>
> To replay all changes, first I need to collect them. I did it with this:
>
> $ head -n 2 rss_gnucash_20100923210659.log > /tmp/merged.log; for i in rss_gnucash_2010082714*.log rss_gnucash_201009[01]*.log ; do tail -n +2 $i | sed -e 's/000039/000045/' -e 's/000040/000046/'>> /tmp/merged.log ; done;
>
> Note the 'sed' statement is to shift the invoice/bill numbers
> created in the last month. I have double-checked with diff that 'sed' did
> not do any stupid replacement in this case.
>
> Then I loaded GnuCash with the backup of 2010-08-18, asked it to import
> merged.log, and saved. Now:
>
> 1. In the saved gnucash XML file, the lost invoices are still
> there, and the bill and invoice newly created in the last month
> are also in the XML file.
> 2. If I load GnuCash with this file and search for all invoices, the
> newly created bill and invoice do not appear in the list. So
> they are somehow hidden. I checked the transactions of the bill
> and found 3 transactions associated with it.
>
> So, the replay is not successful.
Correct.
> What do you suggest I do from here? Try a different way to replay the
> logs? Manually re-enter everything from the last month by reading the log
> files? Or move the lost invoices over from the backup?
I suggest you revert to your backup file and re-enter all the data that
happened since then.
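To work out what needs re-entering, a read-only comparison of the two files may help. The Python sketch below never writes to either file. It assumes that each invoice is a `GncInvoice` element whose direct children include one ending in `guid` and one ending in `id`; those child names are an assumption about the file layout and should be checked against your own file. It matches on local tag names so that no namespace URI has to be guessed.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_xml(path: str) -> bytes:
    """Read a file that may or may not be gzip-compressed (read-only)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def invoice_ids_by_guid(xml_bytes: bytes) -> Dict[str, str]:
    """Map invoice GUID -> human-readable ID (child names are assumptions)."""
    result: Dict[str, str] = {}
    for elem in ET.fromstring(xml_bytes).iter():
        if local(elem.tag) != "GncInvoice":
            continue
        # Only direct children are inspected, so nested owner data is ignored.
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        if "guid" in fields:
            result[fields["guid"]] = fields.get("id", "")
    return result


def missing_from_current(backup: bytes, current: bytes) -> Dict[str, str]:
    """Invoices present in the backup but absent from the current file."""
    old, new = invoice_ids_by_guid(backup), invoice_ids_by_guid(current)
    return {guid: old[guid] for guid in old.keys() - new.keys()}
```

A call such as `missing_from_current(read_xml("backup.gnucash"), read_xml("current.gnucash"))`, where both file names are placeholders, would list the GUIDs and IDs that exist only in the backup. The comparison is by GUID because, as noted above, the visible ID is not unique.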
> I will go for manual re-entry if no good way out exists. It might take 2
> working days to re-enter everything from the last month.
Well, this is what you get for manually editing your datafile!
> Thanks in advance!
-derek
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord at MIT.EDU PGP key available