Performance improvement for xml loads (+comments)

Tyson Dowd trd@cs.mu.OZ.AU
Thu, 7 Dec 2000 18:20:21 +1100


On 06-Dec-2000, Christopher Browne <cbbrowne@hex.net> wrote:
> > The other thing to consider is that I've heard you can generate a
> > near-optimal binary representation automatically from a DTD.  If you are
> > suggesting an approach like this for generating a binary format, then
> > that would be just fine, because it can be extended and maintained
> > semi-automatically.
> 
> I don't think this is the right answer; the "semi-automatic" part causes
> me concern, in that it doesn't guarantee extensibility.

[semi-digressing here, but still on-topic]

Actually I think you can make it automatic, provided you are willing to
pay the price of using and storing a DTD.

The issue is that you need the exact version of the DTD available to be
sure you can read a given file.  One approach is to embed the DTD at the
start of the file, along with a frequency table for the elements (if you
use one, which you probably should).  In effect the file then carries
everything needed to rebuild the Huffman table.  Old versions can always
be read in, since the files are self-describing (provided you always use
the same algorithm to encode).
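
To make the idea concrete, here is a minimal sketch in Python.  It is
only an illustration under several assumptions: the header layout (one
JSON line), the names pack/unpack/huffman_codes, and the sample DTD are
all made up for this example, only element names are encoded (no
attributes or text), and the bits are kept as a string of '0'/'1'
characters for readability rather than packed into bytes.

```python
import heapq
import json


def huffman_codes(freqs):
    """Build a prefix code from {symbol: count}.

    Symbols are sorted first and ties are broken by insertion order, so
    the same frequency table always produces the same codes.
    """
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def pack(dtd_text, element_stream):
    """Return a self-describing blob: header line (DTD + frequencies), then bits."""
    freqs = {}
    for name in element_stream:
        freqs[name] = freqs.get(name, 0) + 1
    codes = huffman_codes(freqs)
    bits = "".join(codes[name] for name in element_stream)
    # json.dumps escapes newlines, so the header always fits on one line.
    header = json.dumps({"dtd": dtd_text, "freqs": freqs})
    return header + "\n" + bits


def unpack(blob):
    """Rebuild the code table from the embedded header and decode the bits."""
    header_line, bits = blob.split("\n", 1)
    header = json.loads(header_line)
    decode = {code: name for name, code in huffman_codes(header["freqs"]).items()}
    names, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:  # prefix code: first match is the symbol
            names.append(decode[current])
            current = ""
    return header["dtd"], names


# Hypothetical DTD fragment and element sequence, for illustration only.
dtd = "<!ELEMENT account (name, balance)>\n<!ELEMENT name (#PCDATA)>"
stream = ["account", "name", "balance", "account", "name", "balance", "account"]
blob = pack(dtd, stream)
restored_dtd, restored_stream = unpack(blob)
```

The reader never needs an external DTD or code table: it rebuilds both
from the header, which is why the encoder must stay deterministic
across versions.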

Anyway, I believe the technology exists to do this, but I'm not sure
what is available in the free software world.  And I suspect zlib
would get almost the same efficiency in all but extreme cases, without
requiring a DTD.  But I'm not a compression expert, I just talk
with them at lunchtime...
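
For comparison, the zlib route needs nothing but the XML text itself.
The snippet below is a small sketch using Python's standard zlib
module; the <ledger>/<transaction> data is invented for the example and
says nothing about how a real file would compress.

```python
import zlib

# Hypothetical sample data: many similar records, as in a typical XML save file.
records = "".join(
    f'<transaction id="{i}"><amount>{i * 10}</amount></transaction>\n'
    for i in range(200)
)
raw = f"<ledger>\n{records}</ledger>\n".encode("utf-8")

compressed = zlib.compress(raw, 9)      # level 9: best compression, slowest
restored = zlib.decompress(compressed)  # no DTD or side table required

ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes (ratio {ratio:.2f})")
```

Repeated tag names are exactly the kind of redundancy a general-purpose
compressor picks up on its own, which is the intuition behind the guess
above.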

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd@cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #