Performance improvement for xml loads (+comments)

Rob Browning rlb@cs.utexas.edu
09 Dec 2000 16:54:18 -0600


Before I continue, let me state for the record that dealing
gracefully with arbitrary hardware/OS failures was never a stated goal
of the file format, so I'm going to ignore the points relating to
that.

I also maintain that all of this talk is essentially wasted, so I'm
probably going to quit responding to "kill the XML" arguments soon.
We have a format, and it's almost certainly a very temporary one,
presuming that the SQL stuff goes as we expect, so it's not worth the
time it takes to type to hassle about it now.  And the XML code (or
some other similar thing) *will* still be used, most likely, as a text
import/export format.

Al Snell <alaric@alaric-snell.com> writes:

> Gzipping and ungzipping are actually quite memory and CPU intensive
> operations.

You assert this as if it's fact, yet I've seen actual tests of our
code showing that gzipping it has a negligible (less than 5 percent as
I recall) effect on write speed.

I've also seen work done here at UT showing that in real cases
compression can sometimes actually be a performance win, depending on
the algorithm and circumstances -- hence the research into compressed
page VM systems.  Disks are still far slower than CPUs and RAM, which
means that compression can increase your effective bandwidth to disk
in some circumstances, so I won't take it as a given that compression
is *always* slower, though it may well be in this case.
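
A claim like this is easy to check by measurement rather than
assertion.  The following is only a rough sketch, not the test referred
to above: it times writing the same payload to a temporary file with
and without gzip.  The `timed_write` helper and the XML-like sample
payload are made up for illustration; real numbers depend on the data,
the disk, and the compression level.

```python
import gzip
import os
import tempfile
import time


def timed_write(payload: bytes, compress: bool) -> tuple[float, int]:
    """Write payload to a temp file; return (seconds taken, bytes on disk)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # gzip.open and the builtin open share the same file-like interface.
        opener = gzip.open if compress else open
        start = time.perf_counter()
        with opener(path, "wb") as out:
            out.write(payload)
        elapsed = time.perf_counter() - start
        return elapsed, os.path.getsize(path)
    finally:
        os.remove(path)


# Hypothetical stand-in for a saved data file: repetitive XML-like text.
sample = b"<transaction><amount>12.34</amount></transaction>\n" * 20000

plain_time, plain_size = timed_write(sample, compress=False)
gz_time, gz_size = timed_write(sample, compress=True)
print(f"plain: {plain_size} bytes in {plain_time:.4f}s")
print(f"gzip:  {gz_size} bytes in {gz_time:.4f}s")
```

Note that this times only the final write; in a real save the cost of
generating the XML is also part of the total, which is why the relative
overhead of compression can end up small.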

> An XDR version would take no time at all; I've got much of it already
> written in a CVS repository. From the .x files I've written, rpcgen
> will create the C type definitions for GnuCash data structures, and
> C code to load and save them. Easy peasy!

If I had known anything about XDR (what is it?) when we were trying to
figure out what to do with the dead binary format, I would certainly
have considered it.  I didn't, and I don't recall anyone else bringing
it up as a trivial solution then either.

But again, if we're going to SQL, and that's the current plan, at
least, then this point is moot.  (Though I am still interested in
knowing more about XDR.)
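
For readers in the same position: XDR is the External Data
Representation standard used by Sun RPC.  Integers are encoded as
big-endian 4-byte words (8 bytes for a "hyper"), and strings as a
4-byte length followed by the bytes, zero-padded to a multiple of 4.
The sketch below hand-encodes one record that way.  It is only an
illustration of the wire format, not rpcgen output and not Al's .x
definitions; the `encode_account` record layout is hypothetical.

```python
import struct


def xdr_int(value: int) -> bytes:
    """XDR signed int: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_hyper(value: int) -> bytes:
    """XDR signed hyper: 8 bytes, big-endian."""
    return struct.pack(">q", value)


def xdr_string(text: str) -> bytes:
    """XDR string: 4-byte length, then bytes zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def encode_account(acct_id: int, name: str, balance_cents: int) -> bytes:
    """Hypothetical record layout: id, name, balance in cents."""
    return xdr_int(acct_id) + xdr_string(name) + xdr_hyper(balance_cents)


record = encode_account(7, "Cash", 123456)
print(record.hex())
```

Every field lands on a 4-byte boundary, which is what makes the format
simple to read and write from generated C code.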

> There's a basic set you can depend on, but it gets hairy above that. I
> design RDBMS schemas for a living... in cross-DBMS
> ("heterogeneous") environments (MySQL and PostgreSQL a speciality -
> www.upmystreet.com runs on my schemas, and since the introduction of the
> classified ads systems, it's taking more load than the GnuCash databases
> of a pretty large organisation will ever take :-)
> 
> Using an embedded SQL "server" within GnuCash will be a Good Thing. From
> an outside perspective, it will just mean that GnuCash uses a file of some
> weird binary format (that nobody has a hope in hell of hand-tweaking); but
> it means that with little more than the flick of a compile-time switch, it
> could also use a "live" RDBMS server, sharing access with other users and
> all that.

Well, it looks like David Merrill may be interested in starting work on
the SQL stuff.  We'd love to have your help/input if you're
interested.
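
To make the embedded-engine idea concrete, here is a minimal sketch
using SQLite through Python's standard `sqlite3` module as a stand-in
embedded SQL engine.  The choice of engine, the `accounts`/`splits`
schema, and the `open_book`/`balance` helpers are all assumptions for
illustration; none of this reflects an actual GnuCash design.

```python
import sqlite3


def open_book(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a book stored in a single embedded database file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS splits ("
        " account_id INTEGER NOT NULL REFERENCES accounts(id),"
        " amount_cents INTEGER NOT NULL)"
    )
    return conn


def balance(conn: sqlite3.Connection, account_id: int) -> int:
    """Sum all splits for an account; 0 if the account has none."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM splits WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    return row[0]


conn = open_book()
conn.execute("INSERT INTO accounts (id, name) VALUES (1, 'Checking')")
conn.executemany(
    "INSERT INTO splits (account_id, amount_cents) VALUES (?, ?)",
    [(1, 10000), (1, -2550)],
)
print(balance(conn, 1))  # 7450
```

The application code only issues SQL, so in principle the same queries
could be pointed at a shared server instead of a local file, provided
they stay within the subset of SQL that both backends support.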

Thanks

-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930