Performance improvement for xml loads (+comments)

Derek Atkins warlord@MIT.EDU
07 Dec 2000 14:28:19 -0500


Rob Browning <rlb@cs.utexas.edu> writes:

> I think the synergy here is that people think that if you use XML,
> it's more likely that there will be tools that will be available to
> allow you to manipulate your data outside the app.  This is in fact
> true.  Writing a parser/transformer to do some arbitrary thing to an
> XML file (massage it, extract things, etc.), or even to any text file,
> is far easier than it would be for some home-brewed format.  Heck you
> can use emacs/perl/whatever...and I have.

Honestly, I think this is a red herring.  I'm not at all convinced
that such tools, if they existed, would be useful.  Sure, you have a
tagged data tree, but you have to know what the tags mean in order to
do anything with them.  And besides, what would such a tool do with
the data anyway?  Anything that you program to understand the XML
could just as easily be programmed to read a binary format.

I suppose it's useful to use XML for data interchange.  If I wanted to
email you some of my transactions, exporting them in XML and emailing
them would probably be a good thing to support.  OTOH, I don't believe
that a binary format is any more home-brewed than XML.  For example,
using ASN.1 or XDR would require a data format description which is
comparable to a DTD.  The difference is that the binary encoding takes
less storage space (before compression).
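
To make the size point concrete, here is a minimal sketch in Python
that encodes one record both ways.  The record layout (date,
description, amount_cents) is made up for illustration and is not the
actual GnuCash format; the binary side follows the XDR conventions of
big-endian integers and strings padded to a multiple of 4 bytes.

```python
import struct
import xml.etree.ElementTree as ET


def xdr_string(s):
    """XDR-style string: 4-byte length, the bytes, zero padding to 4."""
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_xdr(txn):
    """Pack the hypothetical record as uint32 date, string, int64 amount."""
    return (
        struct.pack(">I", txn["date"])
        + xdr_string(txn["description"])
        + struct.pack(">q", txn["amount_cents"])
    )


def encode_xml(txn):
    """Serialize the same record as a small XML element tree."""
    root = ET.Element("transaction")
    for key in ("date", "description", "amount_cents"):
        ET.SubElement(root, key).text = str(txn[key])
    return ET.tostring(root)


# Hypothetical sample record, invented for this comparison.
txn = {"date": 20001207, "description": "Groceries", "amount_cents": -4250}
xdr_bytes = encode_xdr(txn)
xml_bytes = encode_xml(txn)
print(len(xdr_bytes), len(xml_bytes))
```

In both cases the reader needs an external description of the fields:
the field order and types for the binary form, the tag names for the
XML form.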

> Also, if you do decide to try and whip something up, make sure you're
> aware that we use kvp_frames now, in various places, so you will have
> to be able to accommodate items with arbitrarily deep, recursive
> key/value trees.

I think this can be coped with, but thanks for the warning.
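
One way this might be handled, purely as a sketch: give every value a
type tag and let a "frame" value recurse.  The tag numbers, the
function names, and the use of Python dicts to stand in for kvp_frames
are all assumptions for illustration, not the real kvp_frame layout.

```python
import struct

# Hypothetical type tags for values inside a frame.
TAG_INT, TAG_STR, TAG_FRAME = 0, 1, 2


def _pack_str(s):
    data = s.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def _unpack_str(buf, pos):
    (length,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_frame(frame):
    """Encode a dict as: entry count, then (key, tag, value) entries."""
    out = [struct.pack(">I", len(frame))]
    for key, value in frame.items():
        out.append(_pack_str(key))
        if isinstance(value, dict):
            # Nested frame: recurse, so depth is unbounded.
            out.append(struct.pack(">I", TAG_FRAME) + encode_frame(value))
        elif isinstance(value, int):
            out.append(struct.pack(">Iq", TAG_INT, value))
        else:
            out.append(struct.pack(">I", TAG_STR) + _pack_str(str(value)))
    return b"".join(out)


def decode_frame(buf, pos=0):
    """Inverse of encode_frame; returns (dict, position after the frame)."""
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    frame = {}
    for _ in range(count):
        key, pos = _unpack_str(buf, pos)
        (tag,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if tag == TAG_FRAME:
            value, pos = decode_frame(buf, pos)
        elif tag == TAG_INT:
            (value,) = struct.unpack_from(">q", buf, pos)
            pos += 8
        else:
            value, pos = _unpack_str(buf, pos)
        frame[key] = value
    return frame, pos


nested = {"a": {"b": {"c": 1}}, "note": "x"}
encoded = encode_frame(nested)
decoded, end = decode_frame(encoded)
```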

> > I think I'll actually try to write an XDR-based data storage system
> > and we'll see.  I just don't believe anymore that XML is a reasonable
> > way to store large data sets.  XML is a cool technology, but just
> > because a technology is cool doesn't mean that it's the right tool for
> > the job.
> 
> Of course you're welcome to, but why would you waste time on this
> rather than trying to go forward with trying to integrate an embedded
> MySQL or PostgreSQL?

Well, using MySQL or PostgreSQL is just one part of it.  It's a
storage mechanism, but you still need to create the data formats that
are stored.  You still need to define the transaction objects or split
objects or whatever that get stored in the database.  So, defining a
binary data format now would certainly be useful, IMHO, down the road
when we move to a DBMS.
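
As a rough sketch of that separation, the objects below are defined
once and the database is only where they land.  The Split and
Transaction fields, the table layout, and the use of an in-memory
SQLite database (standing in for MySQL or PostgreSQL) are all
assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Split:
    account: str
    amount_cents: int


@dataclass
class Transaction:
    txn_id: int
    description: str
    splits: list


def save(conn, txn):
    """Write one transaction and its splits to the two tables."""
    conn.execute(
        "INSERT INTO transactions VALUES (?, ?)",
        (txn.txn_id, txn.description),
    )
    conn.executemany(
        "INSERT INTO splits VALUES (?, ?, ?)",
        [(txn.txn_id, s.account, s.amount_cents) for s in txn.splits],
    )


def load(conn, txn_id):
    """Rebuild the Transaction object from its stored rows."""
    (description,) = conn.execute(
        "SELECT description FROM transactions WHERE id = ?", (txn_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT account, amount_cents FROM splits "
        "WHERE txn_id = ? ORDER BY rowid",
        (txn_id,),
    )
    return Transaction(txn_id, description, [Split(a, c) for a, c in rows])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, description TEXT)"
)
conn.execute(
    "CREATE TABLE splits (txn_id INTEGER, account TEXT, amount_cents INTEGER)"
)

# Hypothetical sample data.
original = Transaction(
    1,
    "Groceries",
    [Split("Expenses:Food", 4250), Split("Assets:Checking", -4250)],
)
save(conn, original)
restored = load(conn, 1)
```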

> Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/      PP-ASEL      N1NWH
       warlord@MIT.EDU                        PGP key available