Error parsing compressed file but reads uncompressed ok
jralls at ceridwen.us
Sun Nov 17 11:09:46 EST 2013
On Nov 16, 2013, at 8:38 PM, Mike Alexander <mta at umich.edu> wrote:
> --On 5 November 2013 14:57, John Ralls <jralls at ceridwen.us> wrote:
>>> On Nov 5, 2013, at 6:46 AM, Colin Law <clanlaw at googlemail.com> wrote:
>>>> On 5 November 2013 12:39, Colin Law <clanlaw at googlemail.com> wrote:
>>>>> I find today that I cannot open my compressed xml accounts file, I
>>>>> get a parse error opening the file. I am using version 2.4.13 (on
>>>>> Ubuntu 13.10). If I uncompress the file with gunzip it will open
>>>>> ok, but if I save it compressed again then again it will not
>>>>> reopen. I can read the same compressed file ok with version 2.4.10
>>>>> (on Ubuntu 12.04), and if I save it with version 2.4.10 then I can
>>>>> re-open it with version 2.4.13.
>>>> In fact I see that even if I make a new trivial file and save it
>>>> compressed then I cannot re-open it. Either something has got
>>>> messed up in my system or an update has introduced a bug.
>>> There haven't been any updates to Gnucash-2.4.13. The fact that you
>>> fix the problem by unzipping by hand suggests that there's a problem
>>> either with libz on Ubuntu 13.10 or with GC's linkage to it.
>> In fact I see I was wrong about it failing with a trivial file. I
>> have gone back through the backups and find that I can still open the
>> file from earlier today but cannot open ones after a certain point, so
>> it does seem to be related to this particular file. I have tried on
>> another Ubuntu system running Ubuntu Trusty daily build and see the
>> same problem, so it is not related to user settings.
>> I had a vague recollection of a similar problem reported on the forum
>> a few weeks ago, but can't find it now.
>> I guess you must be right, for some reason gnucash is unable to
>> correctly unzip this particular file. Any suggestions on how to
>> progress this further? I could send the file privately to someone if
>> they wanted to have a look at it.
> This happened to me a few days ago so I was able to look into the cause. As you suspected, it's not a bug in GnuCash. Instead it's a bug in libxml2. It is able to decompress zipped XML files itself so GnuCash just gives it the compressed file and lets it do its thing. Rather than just calling gzread to do this, libxml2 contains slightly modified code copied from libz and it's this code that contains the bug.
> After the compressed data in a zipped file there is a trailer which contains in the first two words a CRC checksum and byte length for the uncompressed data. The code in libxml2 reads the compressed file in 1024 byte chunks and if that trailer overlaps two chunks then the part in the second chunk isn't read. Instead it reads the first part and then declares a premature EOF. I'll bet that if you look at the file that failed it will be slightly longer than a multiple of 1024.
> The patch to libxml2 to fix this is easy. I'll submit it to the MacPorts maintainer for libxml2 who will presumably submit it upstream. However, it will take a while for this to propagate to all the platforms that matter for GnuCash. GnuCash is also quite capable of decompressing the file itself. I suggest that we change it so that it never passes a compressed file to libxml2 and instead decompresses it itself. This isn't a difficult change and unless you think it is a bad idea, I'll do it.
Good job, but file the bug directly to libxml2:
Why wait for a distro maintainer to push it upstream, especially if you have a patch?
That aside, I agree that it would be better for us to do the unzipping, so by all means make that change and backport it to 2.4.
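
For anyone who wants to check a failing file against Mike's description, here is a minimal Python sketch (not GnuCash or libxml2 code). It relies on the gzip format's 8-byte trailer, a little-endian CRC32 followed by the uncompressed length modulo 2^32, and on the 1024-byte chunk size quoted above. The function names are hypothetical, and the boundary check assumes the chunks are aligned to the start of the file, which may not match exactly how libxml2 fills its buffer. The last function illustrates the proposed workaround in general terms: decompress in the application and hand the parser plain XML.

```python
import gzip
import os
import struct
import xml.etree.ElementTree as ET

CHUNK_SIZE = 1024   # read size quoted in the message above
TRAILER_SIZE = 8    # gzip trailer: CRC32 + ISIZE, 4 bytes each


def trailer_straddles_chunk(file_size, chunk_size=CHUNK_SIZE):
    """Return True if the last 8 bytes of the file cross a chunk boundary.

    Assumption: chunks start at offset 0 of the file.
    """
    if file_size < TRAILER_SIZE:
        return False
    first_chunk = (file_size - TRAILER_SIZE) // chunk_size
    last_chunk = (file_size - 1) // chunk_size
    return first_chunk != last_chunk


def read_trailer(path):
    """Return (crc32, isize) read from the gzip trailer of a single-member file."""
    with open(path, "rb") as f:
        f.seek(-TRAILER_SIZE, os.SEEK_END)
        crc32, isize = struct.unpack("<II", f.read(TRAILER_SIZE))
    return crc32, isize


def parse_gzipped_xml(path):
    """Decompress the file ourselves, then give the XML parser plain bytes."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    return ET.fromstring(data)
```

Under that alignment assumption, a file is affected when its size modulo 1024 is between 1 and 7, which matches "slightly longer than a multiple of 1024". Calling `trailer_straddles_chunk(os.path.getsize(path))` on the saved accounts file is enough to test that guess.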