Performance improvement for xml loads (+comments)

Tyson Dowd trd@cs.mu.OZ.AU
Thu, 7 Dec 2000 16:13:06 +1100


On 06-Dec-2000, Derek Atkins <warlord@MIT.EDU> wrote:
> Nobody is suggesting going back to the old binary format.  I'm
> certainly not.  I *AM*, however, suggesting a NEW binary format.

Any new binary format will have to be at least as extensible as XML.
After all, there's no point writing a nice tight binary format today,
when tomorrow there will be another field that needs to be added.
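
To make the extensibility point concrete, here is a rough sketch in Python
(standard library only). The element names (ledger, split, account, value,
memo) and the record layouts are made up for illustration; they are not the
real file schema and not any format proposed in this thread. It shows a
reader written against the "old" XML still working once a new field appears,
while a naive fixed-layout binary reader does not.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical documents: NEW_DOC adds a <memo> field the old reader never heard of.
OLD_DOC = "<ledger><split><account>Cash</account><value>100</value></split></ledger>"
NEW_DOC = ("<ledger><split><account>Cash</account><value>100</value>"
           "<memo>lunch</memo></split></ledger>")


def read_splits(xml_text):
    """Return (account, value) pairs, silently skipping unknown child elements."""
    root = ET.fromstring(xml_text)
    return [(s.findtext("account"), int(s.findtext("value")))
            for s in root.iter("split")]


# Hypothetical naive binary layouts: 8-byte name + 32-bit value, then a memo added later.
OLD_RECORD = struct.Struct("<8si")
NEW_RECORD = struct.Struct("<8si16s")


def old_binary_reader(data):
    """Reader that only knows the old fixed-size record."""
    name, value = OLD_RECORD.unpack(data)
    return name.rstrip(b"\0").decode("ascii"), value


print(read_splits(OLD_DOC))   # old reader, old file
print(read_splits(NEW_DOC))   # old reader, new file: extra field is ignored

try:
    old_binary_reader(NEW_RECORD.pack(b"Cash", 100, b"lunch"))
except struct.error as exc:
    print("old binary reader failed on new record:", exc)
```

A binary format can of course be designed to skip unknown fields too (tagged
or length-prefixed records, say); the point is only that it has to be
designed in, whereas XML gives it to you for free.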

If you are worried about load times and memory usage, we should consider
using a SAX interface to read in the XML.  See this link for tradeoffs:
http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html
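
For anyone who hasn't used a SAX-style interface, here is a minimal sketch of
the idea. It uses Python's standard xml.sax module rather than libxml's C SAX
API, purely to keep it short, and the element names are again made up. The
parser calls the handler as it streams through the input, so only the data
the handler chooses to keep stays in memory; no document tree is ever built.

```python
import io
import xml.sax


class ValueTotalHandler(xml.sax.ContentHandler):
    """Sum the text of every <value> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.in_value = False
        self.buffer = []
        self.count = 0
        self.total = 0

    def startElement(self, name, attrs):
        if name == "value":
            self.in_value = True
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate.
        if self.in_value:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "value":
            self.total += int("".join(self.buffer))
            self.count += 1
            self.in_value = False


def sum_values(stream):
    """Stream-parse XML from a file-like object; return (count, total)."""
    handler = ValueTotalHandler()
    xml.sax.parse(stream, handler)
    return handler.count, handler.total


# Hypothetical sample data, not the real file format.
SAMPLE = b"""<?xml version="1.0"?>
<ledger>
  <split><account>Cash</account><value>1500</value></split>
  <split><account>Food</account><value>-1500</value></split>
  <split><account>Cash</account><value>250</value></split>
</ledger>
"""

count, total = sum_values(io.BytesIO(SAMPLE))
print(count, total)
```

The price is that the handler has to track its own state (the in_value flag
above), which is the main tradeoff against a tree-building parser.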

The other thing to consider is that I've heard you can generate a
near-optimal binary representation automatically from a DTD.  If you are
suggesting an approach like this for generating a binary format, then
that would be just fine, because it can be extended and maintained
semi-automatically.

libxml seems to handle compression transparently if you use
    xmlSetDocCompressMode(xmlDocPtr doc, int mode);
to set the compression level for a single document, or
    xmlSetCompressMode(int mode);
to set the default level for all documents.

Compression will probably take care of most of the wasted disk space.
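
As a rough illustration of why compression helps so much with XML, the sketch
below builds a repetitive document, gzips it, and reads it back with a loader
that accepts either form. It uses Python's gzip module, not libxml, and the
document contents and helper names are invented for the example.

```python
import gzip


def make_document(n):
    """Build a hypothetical XML document with n near-identical records."""
    rows = "".join(
        f"  <split><account>Cash</account><value>{i}</value></split>\n"
        for i in range(n)
    )
    return ('<?xml version="1.0"?>\n<ledger>\n' + rows + "</ledger>\n").encode("utf-8")


def load_maybe_compressed(data):
    """Return the XML bytes whether or not the input was gzip-compressed."""
    # gzip streams always begin with the two magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data


raw = make_document(1000)
packed = gzip.compress(raw, compresslevel=9)

print("uncompressed bytes:", len(raw))
print("compressed bytes:  ", len(packed))
```

Repeated tag names are exactly what a zlib-style compressor is good at, so
the verbosity of XML costs far less on disk than the raw size suggests.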

Personally, I'm not convinced that performance of the XML routines is
going to be a long-term problem.  Besides, a lot of people feel more
comfortable with XML (or compressed XML) than with being "locked in" to a
binary format (even if the source is available).  I'd much rather see
improvements to the XML-based system than a completely different system,
because there's a lot of synergy to be gained by going with XML.

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd@cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #