Performance improvement for xml loads (+comments)
Rob Browning
rlb@cs.utexas.edu
07 Dec 2000 14:24:48 -0600
Derek Atkins <warlord@MIT.EDU> writes:
> Honestly, I think this is a red herring. I'm not at all convinced
> that if said tools did exist they would at all be useful. Sure, you
> have a tagged data tree, but you have to know what the tags mean in
> order to do anything with them.
Well, for me it hasn't been a red herring. I've already used
perl/sgrep several times to check various things about my data file
(number of accounts, number of transactions, count transactions
containing foo and write total to a file, etc.). Now granted, most
people won't want/need to do this, but as a developer (and as a
curious higher-level user), I've already found this quite valuable.
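To give a flavor of the kind of ad-hoc check I mean, here is a minimal sketch in Python rather than perl/sgrep. The element names (`account`, `transaction`, `description`, `value`) are hypothetical placeholders for illustration, not the actual tags in the data file:

```python
# Count accounts and transactions in an XML data file, and total the
# transactions whose description contains a given string.
# NOTE: the element names below are invented for this sketch; they are
# not the real file format's tags.
import xml.etree.ElementTree as ET

SAMPLE = """<gnc-data>
  <account name="Checking"/>
  <account name="Savings"/>
  <transaction><description>foo lunch</description><value>7.50</value></transaction>
  <transaction><description>rent</description><value>400.00</value></transaction>
  <transaction><description>foo taxi</description><value>12.00</value></transaction>
</gnc-data>"""

def summarize(xml_text, keyword):
    """Return (# accounts, # transactions, # matching, total of matches)."""
    root = ET.fromstring(xml_text)
    n_accounts = len(root.findall("account"))
    txns = root.findall("transaction")
    matching = [t for t in txns
                if keyword in t.findtext("description", "")]
    total = sum(float(t.findtext("value", "0")) for t in matching)
    return n_accounts, len(txns), len(matching), total

print(summarize(SAMPLE, "foo"))  # -> (2, 3, 2, 19.5)
```

The same one-off query against a binary format would mean writing (or finding) a dedicated parser first.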
Further, the binary format was completely opaque and very hard to
debug. The XML one has been quite easy: I was able to do a number of
validity checks, and spot errors with obvious fixes, using nothing
more than diff. You can't say that of non-text formats.
Further, say the file gets minor corruption for some reason. With the
text file, you can just open it up and fix it with emacs/vi/whatever.
With a binary format, you're probably screwed unless you're *really*
an expert, and have a lot more time.
As I said, you and I may just have different perspectives here. I've
*already* found the text format useful.
> Well, using MySQL or PostgreSQL is just one part of it. It's a
> storage mechanism, but you still need to create the data formats
> that are stored. You still need to define the transaction objects
> or split objects or whatever that get stored in the database. So,
> defining a binary data format now would certainly be useful, IMHO,
> down the road when we move to a DBMS.
But for the most part, this would just involve defining the SQL tables
we need. I don't see how that involves a "binary data format". I
must not understand what you mean.
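For what it's worth, here is roughly what "defining the SQL tables" looks like in my head (a sketch only; the table and column names are invented for illustration, not a proposed schema, and I'm using an in-memory SQLite database just to keep the example self-contained):

```python
# Sketch: the "data format" for a DBMS backend is just table
# definitions. One table of transactions, one of balancing splits.
# Table and column names are illustrative, not a real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    txn_id      INTEGER PRIMARY KEY,
    date_posted TEXT NOT NULL,
    description TEXT
);
CREATE TABLE splits (
    split_id INTEGER PRIMARY KEY,
    txn_id   INTEGER NOT NULL REFERENCES transactions(txn_id),
    account  TEXT NOT NULL,
    value    REAL NOT NULL
);
""")

# A double-entry transaction is one row plus splits that sum to zero.
conn.execute("INSERT INTO transactions VALUES (1, '2000-12-07', 'lunch')")
conn.executemany("INSERT INTO splits VALUES (?, ?, ?, ?)",
                 [(1, 1, 'Expenses:Food', 7.50),
                  (2, 1, 'Assets:Checking', -7.50)])

# The DBMS owns the on-disk representation; we never define one.
balance = conn.execute(
    "SELECT SUM(value) FROM splits WHERE txn_id = 1").fetchone()[0]
print(balance)  # -> 0.0
```

That is, the schema is the format; the on-disk bytes are the database engine's business, which is why I don't see where a separate "binary data format" comes in.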
--
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930