Performance improvement for xml loads (+comments)

Rob Browning rlb@cs.utexas.edu
07 Dec 2000 14:24:48 -0600


Derek Atkins <warlord@MIT.EDU> writes:

> Honestly, I think this is a red herring.  I'm not at all convinced
> that if said tools did exist they would at all be useful.  Sure, you
> have a tagged data tree, but you have to know what the tags mean in
> order to do anything with them.

Well, for me it hasn't been a red herring.  I've already used
perl/sgrep several times to check various things about my data file
(number of accounts, number of transactions, count transactions
containing foo and write total to a file, etc.).  Now granted, most
people won't want/need to do this, but as a developer (and as a
curious higher-level user), I've already found this quite valuable.
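(For the curious: the sort of check described above is a few lines in any scripting language once the data is XML. This sketch uses Python on a made-up sample; the tag names are illustrative, not the actual GnuCash schema.)

```python
# Count accounts, count transactions, and count transactions matching
# some criterion -- the kinds of ad-hoc checks a text/XML format makes
# easy.  Tag names below are hypothetical, for illustration only.
import xml.etree.ElementTree as ET

SAMPLE = """<gnc-data>
  <account><name>Checking</name></account>
  <account><name>Savings</name></account>
  <transaction><description>groceries</description></transaction>
  <transaction><description>rent</description></transaction>
  <transaction><description>groceries</description></transaction>
</gnc-data>"""

root = ET.fromstring(SAMPLE)
n_accounts = len(root.findall("account"))
n_txns = len(root.findall("transaction"))
# "count transactions containing foo":
n_groceries = sum(1 for t in root.findall("transaction")
                  if t.findtext("description") == "groceries")
print(n_accounts, n_txns, n_groceries)
```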

Further, the binary format was completely opaque, and very hard to
debug.  The XML one has been quite easy.  I was able to do a number of
validity checks, and spot errors with obvious fixes just using diff.
You can't say that of non-text formats.

Further, say the file gets minor corruption for some reason.  With the
text file, you can just open it up and fix it with emacs/vi/whatever.
With a binary format, you're probably screwed unless you're *really*
an expert, and have a lot more time.

As I said, you and I may just have different perspectives here.  I've
*already* found the text format useful.

> Well, using MySQL or PostgreSQL is just one part of it.  It's a
> storage mechanism, but you still need to create the data formats
> that are stored.  You still need to define the transaction objects
> or split objects or whatever that get stored in the database.  So,
> defining a binary data format now would certainly be useful, IMHO,
> down the road when we move to a DBMS.

But for the most part, this would just involve defining the SQL tables
we need.  I don't see how that involves a "binary data format".  I
must not understand what you mean.
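(To make the point concrete: "defining the SQL tables" is just plain SQL DDL, which any scripting language can drive directly. This is a hypothetical sketch using Python's sqlite3; the table layout is my own illustration, not a schema proposal.)

```python
# Hypothetical transaction/split tables -- illustrative only.  The
# schema is ordinary SQL; there is no separate "binary data format"
# layered on top of it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    guid TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE transactions (
    guid TEXT PRIMARY KEY,
    posted_date TEXT NOT NULL,
    description TEXT
);
CREATE TABLE splits (
    guid TEXT PRIMARY KEY,
    txn_guid TEXT NOT NULL REFERENCES transactions(guid),
    account_guid TEXT NOT NULL REFERENCES accounts(guid),
    value_cents INTEGER NOT NULL  -- money as integers, not floats
);
""")

# Ordinary inserts and queries against those tables:
conn.execute("INSERT INTO accounts VALUES ('a1', 'Checking')")
conn.execute("INSERT INTO transactions VALUES ('t1', '2000-12-07', 'rent')")
conn.execute("INSERT INTO splits VALUES ('s1', 't1', 'a1', -50000)")
total = conn.execute("SELECT SUM(value_cents) FROM splits").fetchone()[0]
print(total)
```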

-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930