XML size

Glen Ditchfield gjditchfield@acm.org
Wed, 03 Apr 2002 11:08:15 -0600


On April 2, 2002 10:32 am, Jesse Becker wrote:
> ...  If you are worried about space,
> compress the files (like XML lets you do); gzip averages something like
> 50-60% compression on text.

It does rather better on GnuCash files: about a 91% reduction with gzip and 
93% with bzip2 on mine.  I tried compressing my file, "Accounts", with both:

-rw-rw-r--    1 gjditchf gjditchf    91604 Mar 31 08:13 Accounts.bz2
-rw-rw-r--    1 gjditchf gjditchf   121615 Mar 31 08:13 Accounts.gz
-rw-rw-r--    1 gjditchf gjditchf  1350319 Mar 31 08:13 Accounts

gzip took a snappy 0.765 seconds (300MHz K6-2); bzip2 took 9.5 seconds.
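
For anyone who wants to repeat the comparison on their own data file, here is 
a minimal Python sketch.  It uses the standard gzip and bz2 modules rather 
than the command-line tools, so sizes and timings will differ a little from 
the figures above.  The function name compare() is made up for illustration.

```python
import bz2
import gzip
import time


def compare(data: bytes) -> dict:
    """Compress data with gzip and bzip2; report size, ratio and time for each."""
    results = {}
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = {
            "bytes": len(packed),                # compressed size in bytes
            "ratio": len(packed) / len(data),    # fraction of the original size
            "seconds": elapsed,                  # wall-clock time to compress
        }
    return results
```

To try it, read the file in binary mode and pass the bytes in, e.g. 
compare(open("Accounts", "rb").read()), then print the "ratio" entries.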

Every now and then someone on gnucash-devel suggests modifying GnuCash to let 
it read and write gzipped files.  Sounds good to me ...
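
Reading and writing gzipped files could look something like the sketch below.  
This is Python, not GnuCash's actual code, and only an assumption about one 
way to do it: peek at the first two bytes for the gzip magic number (1f 8b) 
and fall back to a plain open otherwise, so old uncompressed files still 
load.  The names open_maybe_gzipped() and write_gzipped() are hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(path):
    """Open path for binary reading, decompressing on the fly if it is gzipped."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    # Not gzipped (or too short to tell): read it as a plain file.
    return open(path, "rb")


def write_gzipped(path, data: bytes):
    """Write data to path as a gzip-compressed file."""
    with gzip.open(path, "wb") as out:
        out.write(data)
```

Sniffing the magic bytes rather than trusting a ".gz" suffix means the file 
name does not have to change when compression is turned on.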

I went looking for XML-aware compression schemes and found surprisingly 
little.  There's some alpha-level code at
    http://www.cs.cornell.edu/People/jcheney/xmlppm/xmlppm.html
and a paper there suggests that it wouldn't be hard to generate compressed 
files that are half the size of the gzipped file.