Fwd: Error parsing compressed file but reads uncompressed ok
mta at umich.edu
Sat Nov 16 23:38:42 EST 2013
--On 5 November 2013 14:57, John Ralls <jralls at ceridwen.us> wrote:
>> On Nov 5, 2013, at 6:46 AM, Colin Law <clanlaw at googlemail.com> wrote:
>>> On 5 November 2013 12:39, Colin Law <clanlaw at googlemail.com> wrote:
>>>> I find today that I cannot open my compressed xml accounts file, I
>>>> get a parse error opening the file. I am using version 2.4.13 (on
>>>> Ubuntu 13.10). If I uncompress the file with gunzip it will open
>>>> ok, but if I save it compressed again then again it will not
>>>> reopen. I can read the same compressed file ok with version 2.4.10
>>>> (on Ubuntu 12.04), and if I save it with version 2.4.10 then I can
>>>> re-open it with version 2.4.13.
>>> In fact I see that even if I make a new trivial file and save it
>>> compressed then I cannot re-open it. Either something has got
>>> messed up in my system or an update has introduced a bug.
>> There haven't been any updates to Gnucash-2.4.13. The fact that you
>> fix the problem by unzipping by hand suggests that there's a problem
>> either with libz on Ubuntu 13.10 or with GC's linkage to it.
> In fact I see I was wrong about it failing with a trivial file. I
> have gone back through the backups and find that I can still open the
> file from earlier today but cannot open ones after a certain point, so
> it does seem to be related to this particular file. I have tried on
> another Ubuntu system running Ubuntu Trusty daily build and see the
> same problem, so it is not related to user settings.
> I had a vague recollection of a similar problem reported on the forum
> a few weeks ago, but can't find it now.
> I guess you must be right, for some reason gnucash is unable to
> correctly unzip this particular file. Any suggestions on how to
> progress this further? I could send the file privately to someone if
> they wanted to have a look at it.
This happened to me a few days ago so I was able to look into the
cause. As you suspected, it's not a bug in GnuCash. Instead it's a
bug in libxml2. libxml2 can decompress gzipped XML files itself, so
GnuCash just gives it the compressed file and lets it do its thing.
Rather than just calling gzread to do this, libxml2 contains slightly
modified code copied from libz and it's this code that contains the bug.
After the compressed data in a gzipped file there is an 8-byte trailer
made up of two 32-bit words: a CRC checksum and the byte length of the
uncompressed data. The code in libxml2 reads the compressed file in
1024 byte chunks and if that trailer overlaps two chunks then the part
in the second chunk isn't read. Instead it reads the first part and
then declares a premature EOF. I'll bet that if you look at the file
that failed it will be slightly longer than a multiple of 1024.
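
As a rough illustration of the overlap condition described above, here is
a minimal Python sketch. It is not the libxml2 code; the function names
are made up for this example, and it assumes a single-member gzip file
whose last 8 bytes are the CRC and length words, read in 1024 byte chunks
starting at offset 0.

```python
import gzip
import struct
import zlib

CHUNK = 1024      # read size described in the message
TRAILER_LEN = 8   # CRC-32 + uncompressed length, 4 bytes each (RFC 1952)


def trailer_straddles_chunk(file_size, chunk=CHUNK):
    """Return True if the 8-byte trailer spans two read chunks.

    Hypothetical helper: assumes chunks start at file offset 0.
    """
    if file_size < TRAILER_LEN:
        raise ValueError("file too short to hold a gzip trailer")
    first = file_size - TRAILER_LEN   # offset of the first trailer byte
    last = file_size - 1              # offset of the last trailer byte
    return first // chunk != last // chunk


def read_trailer(data):
    """Unpack (crc32, uncompressed_length) from the end of gzip data."""
    # Both fields are stored as little-endian unsigned 32-bit integers.
    return struct.unpack("<II", data[-TRAILER_LEN:])
```

With these assumptions, file sizes of 1025 through 1031 bytes make
trailer_straddles_chunk return True, while 1024 and 1032 do not, which
matches "slightly longer than a multiple of 1024".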
The patch to libxml2 to fix this is easy. I'll submit it to the
MacPorts maintainer for libxml2 who will presumably submit it upstream.
However, it will take a while for this to propagate to all the
platforms that matter for GnuCash. GnuCash is also quite capable of
decompressing the file itself. I suggest that we change it so that it
never passes a compressed file to libxml2 and instead decompresses it
itself. This isn't a difficult change and unless you think it is a bad
idea, I'll do it.
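
The idea of decompressing first and only then handing plain XML to the
parser can be sketched as follows. This is a Python illustration of the
approach only, not GnuCash code (GnuCash is written in C and uses
libxml2); load_xml is a hypothetical name, and the standard library's
gzip and ElementTree modules stand in for libz and libxml2.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def load_xml(path):
    """Parse an XML file, decompressing it ourselves if it is gzipped."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw[:2] == GZIP_MAGIC:
        # The parser never sees compressed bytes, so its own
        # decompression path is not exercised at all.
        raw = gzip.decompress(raw)
    return ET.fromstring(raw)
```

The same function handles both compressed and uncompressed files, so
callers do not need to know which kind they were given.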