Performance improvement for xml loads (+comments)

Derek Atkins warlord@MIT.EDU
07 Dec 2000 17:44:15 -0500


Patrick Spinler <spinler.patrick@mayo.edu> writes:

> I disagree.  
> 
> Now, I may very well be talking out my butt here, since I've never
> looked at XML closely, but my understanding is that one of the key
> aspects of XML is that there's a tagged description of the data format
> in the data itself, yes ?  That is, XML always includes the meta data
> along with the data, so you can figure out what the data is.

Nope.  I believe the DTD (i.e., the format metadata) COULD be included,
but usually it is not.  Indeed, looking at my GnuCash output, it starts
with:
<?xml version="1.0"?>
<gnc>
  <version>1</version>
  <ledger-data>
    <commodity>
      <restore>
        <space>NASDAQ</space>
        <id>RHAT</id>
        <name>RHAT</name>
...

So, as you can see, there is no data format description.  There is
nothing that describes what "gnc" means, nor what "ledger-data" means,
nor "commodity", etc.  The application that reads the tree must know
how to handle that information.
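To make that concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not GnuCash's actual loader. It assumes the fragment above with the "..." dropped and the open elements closed so that it parses on its own. A generic parser only hands back tag names, nesting and text. The meaning of <commodity>, <space> and <id> lives entirely in the hypothetical load_commodities() function, and its output key names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the GnuCash fragment quoted above, with the "..." removed
# and the open elements closed so that it is well-formed on its own.
GNC_FRAGMENT = """<?xml version="1.0"?>
<gnc>
  <version>1</version>
  <ledger-data>
    <commodity>
      <restore>
        <space>NASDAQ</space>
        <id>RHAT</id>
        <name>RHAT</name>
      </restore>
    </commodity>
  </ledger-data>
</gnc>"""


def describe(xml_text):
    """All a generic parser can report: tag names and text, in document order."""
    root = ET.fromstring(xml_text)
    return [(elem.tag, (elem.text or "").strip()) for elem in root.iter()]


def load_commodities(xml_text):
    """Hypothetical application code: only here do the tags acquire meaning."""
    root = ET.fromstring(xml_text)
    commodities = []
    # The path and the tag-to-field mapping are knowledge baked into the
    # reader; nothing in the file itself says what "space" or "id" mean.
    for restore in root.findall("ledger-data/commodity/restore"):
        commodities.append({
            "namespace": restore.findtext("space"),
            "mnemonic": restore.findtext("id"),
            "fullname": restore.findtext("name"),
        })
    return commodities


print(describe(GNC_FRAGMENT))
print(load_commodities(GNC_FRAGMENT))
# -> [{'namespace': 'NASDAQ', 'mnemonic': 'RHAT', 'fullname': 'RHAT'}]
```

Suppose a later file renamed <id> to something else. describe() would still "work", but load_commodities() would quietly return None for that field, because findtext() returns None when the tag is missing. The reader has to be told what the tags mean.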

> Okay, this characteristic (including the meta-data) is also part of the
> definition of an RDBMS system.  That is, a complete data format and type
> description of all the data in the database is included in that
> database.  You can't call yourself an RDBMS unless you store your
> meta-data inside your database.  Every database I've ever worked with
> (PostgreSQL, Sybase, Ingres, Oracle, MS-SQL Server, Interbase, yadda
> yadda) does this.

I admit that I don't know very much about DBMS systems.  Are columns
in a table labeled?  And can you arbitrarily add a new column to an
existing table?  (I suppose you could create a new table with the
existing column information and add the new column, then destroy the
old table and rename the new table back to the old one.)  I just want
to make sure that you're not dependent upon a particular column
position for data.  (Again, I know next to nothing about DBMS or SQL).

But if this _IS_ how you do it, I still claim that it is equivalent to
reading in all your data and then writing it out in the "new" format
(or, in this case, a new table with an additional column).  I suppose
the only benefit here is that if you run an older application that
doesn't understand the new data against the newer database table with
the newer data, then it could still theoretically understand the rest
of the data.  However, I wouldn't trust it, because that extra column
may have information that the older application NEEDS to know in order
to properly understand the data.
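The column questions can be checked quickly with SQLite through Python's built-in sqlite3 module. This is a minimal sketch. SQLite stands in for "an RDBMS" here, and the table and column names, including the added "fraction" column, are hypothetical rather than any real GnuCash schema. The sketch shows that columns are named, that the schema is stored in the database and can be read back with PRAGMA table_info, and that ALTER TABLE ... ADD COLUMN adds a column to an existing table. It also shows that a reader that selects columns by name gets the same rows afterwards.

```python
import sqlite3

# Assumption: SQLite stands in for "an RDBMS"; table and column names are
# hypothetical, not a real GnuCash schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodity (space TEXT, id TEXT, name TEXT)")
conn.execute("INSERT INTO commodity VALUES ('NASDAQ', 'RHAT', 'RHAT')")


def column_names(conn, table):
    # The schema is stored in the database itself; PRAGMA table_info reads it
    # back as (cid, name, type, notnull, dflt_value, pk) rows.  The table name
    # is interpolated directly, so only pass trusted names here.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def old_reader(conn):
    # An "older application" that asks for columns by name, not by position.
    return conn.execute("SELECT space, id, name FROM commodity").fetchall()


before = column_names(conn, "commodity")
rows_before = old_reader(conn)

# A newer application adds a (hypothetical) column to the existing table.
conn.execute("ALTER TABLE commodity ADD COLUMN fraction INTEGER DEFAULT 100")

after = column_names(conn, "commodity")
rows_after = old_reader(conn)

print(before)                     # ['space', 'id', 'name']
print(after)                      # ['space', 'id', 'name', 'fraction']
print(rows_before == rows_after)  # True
```

This only shows that the older reader does not break. It says nothing about whether that reader should trust what it reads without the new column, which is the concern raised above.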

> So, once you have the data in the database, it's about as easy to
> extend as any other meta-data containing system.

If tables are named and columns are labeled (as opposed to indexed),
then yes, you are correct.

> The other component of XML as I understand it (or of any other meta-data
> containing system) is to have predefined DDLs and components to work
> on those DDLs, so that you don't just know what the meta-data is, but
> what it _means_.  This problem is similar _however_ you store your
> data.  No advantage to any format here, you still have to have code to
> handle it.

Right, so XML doesn't help you here (compared to other data format
systems). XDR provides a DDL as well, as does ASN.1.  So this is a
wash.

> -- Pat

-derek
-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/      PP-ASEL      N1NWH
       warlord@MIT.EDU                        PGP key available