XML size

Robert A. Uhl ruhl@4dv.net
Thu, 4 Apr 2002 08:01:32 -0700


On Wed, Apr 03, 2002 at 11:03:36PM -0800, Cornel DIACONU wrote:
>
> I'm sorry, but I totally disagree with this.  Memory is not at all
> cheap.  Maybe you forgot some months ago when a simple earthquake in
> South-East Asia raised memory costs by about 2-3 times in just a few
> days.  I will never buy memory just to be able to load a flat ASCII
> file of my data to then make a report on it.

That's purely an in-application data representation issue.  The file
itself is or should be loaded into memory essentially a line at a
time--the only stuff that should stay in memory is the actual data
representation.
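
As a rough sketch of what I mean (the sample file is made up and is
not GnuCash's real format, and this is certainly not how GnuCash
itself parses anything), a reader can walk the file one line at a
time and keep only what it derives from it:

```shell
# Hypothetical sample data; the element names are assumptions for illustration.
tmp=$(mktemp)
printf '<account>\n<name>Equipment</name>\n</account>\n' >  "$tmp"
printf '<account>\n<name>Groceries</name>\n</account>\n' >> "$tmp"

# Read one line at a time; only the counter survives each iteration.
accounts=0
while IFS= read -r line; do
    case $line in
        '<account>') accounts=$((accounts + 1)) ;;
    esac
done < "$tmp"

echo "$accounts"
rm -f "$tmp"
```

The memory cost here is one line plus the counter, however large the
file grows.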

> You say that you'd prefer to use regexp on the database file to find
> some kind of info you need.  You really want to say that learning
> the regexp way of searching is EASIER than SELECT ??????

I'd agree:

grep '[Dd]eprec' accounts.gnucash

select * from accounts where comment like '%Deprec%' or comment like '%deprec%';

And what if [Dd]eprec is found in a memo field?  The first example
finds it nonetheless, while the second never will.

For more complicated text searching in a tabular format, awk works
wonders.  And it works only a line at a time, thus using very little
memory.
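
Something along these lines, say, assuming a hypothetical
tab-separated export with date, account, memo and amount columns (the
layout is my invention for the example, not anything GnuCash writes):

```shell
# Hypothetical tab-separated data: date, account, memo, amount.
tmp=$(mktemp)
printf '2002-03-01\tEquipment\tDeprec Q1\t-120.00\n'  >  "$tmp"
printf '2002-03-05\tGroceries\tweekly shop\t-45.10\n' >> "$tmp"
printf '2002-04-01\tEquipment\tdeprec Q2\t-120.00\n'  >> "$tmp"

# awk handles one record at a time; only the running sum stays in memory.
total=$(awk -F '\t' '$3 ~ /[Dd]eprec/ { sum += $4 } END { printf "%.2f\n", sum }' "$tmp")

echo "$total"
rm -f "$tmp"
```

That sums the amount column for every record whose memo matches,
without ever holding more than one record at once.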

> You're VERY wrong here.  A simple UPDATE xxx SET this=that will take
> MUCH MUCH MUCH LESS time to accomplish than ANY search/replace
> that you've suggested here.  Imagine if your flat ASCII database
> file came to a 50M size on disk.  And you said it yourself, it may
> even come to 2-3 or even more separate files as well.

I don't see how.  A simple sed command will do that sort of thing very
quickly.  A db may or may not do so.  SQL does _not_ guarantee speed,
but rather data integrity &c.  If anything, SQL is typically another
inefficient layer.  I'll grant that it has its place in the world.
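
For illustration only, here is roughly the sed equivalent of that
UPDATE, run against a made-up "field: value" file (the format and the
field name are assumptions; adjust the pattern to whatever the real
file looks like):

```shell
# Hypothetical flat file with one "field: value" pair per line.
tmp=$(mktemp)
printf 'account: Equipment\nmemo: this\naccount: Groceries\nmemo: this\n' > "$tmp"

# Stream the file through sed, rewrite matching lines, then swap the result in.
sed 's/^memo: this$/memo: that/' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"

# Count how many lines now carry the new value.
changed=$(grep -c '^memo: that$' "$tmp")
echo "$changed"
rm -f "$tmp"
```

sed makes a single pass over the file, so the cost grows with file
size but memory use does not.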

> And then, why do I have to wait longer for my GnuCash app to even
> launch for the first time, and then to open various accounts,
> compared to the previous binary format of the database?

There's a lot more cruft, and there are many more features, in this
version of GnuCash than previously.

Note that I too am against XML.  It's inherently slow (and _is_ to
blame for some of the slow load of current versions) and overly
verbose.  It takes up too much space.  Far, far better IMHO would be
Scheme forms.  This would also allow one to do some interesting
guile-fu and do things like:

gnucash accounts-model.scm -o accounts-results

And get a useful datum.  But then, I'd like to be able to write normal
guile code which includes GnuCash's transaction model.  Maybe I can
anyway, and just don't know it.

> > Try putting an SQL database under RCS.
>
> That's only you.  I myself don't care to have diffs between accounts.

Well, this isn't just about what's useful to you.  Many of us, I
think, find the ability to check in versions each evening to be rather
invaluable.  Many tools will typically operate much better on ASCII
data than on binary files.
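
A small sketch of why, using plain diff on two made-up nightly
snapshots (the file names and contents are hypothetical; RCS does
much the same comparison when you check in a new revision):

```shell
# Two hypothetical nightly snapshots of a text-format accounts file.
dir=$(mktemp -d)
printf 'Equipment 1000.00\nGroceries 45.10\n' > "$dir/monday"
printf 'Equipment 880.00\nGroceries 45.10\n'  > "$dir/tuesday"

# diff exits with status 1 when the files differ, so don't treat that as failure.
changes=$(diff "$dir/monday" "$dir/tuesday" || true)
echo "$changes"

# Count the old/new lines diff reported (those starting with < or >).
changed_lines=$(printf '%s\n' "$changes" | grep -c '^[<>]')
rm -rf "$dir"
```

With text data the output names the one line that changed; with a
binary file the most diff can tell you is that the two files differ.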

> That's all about it: YOU and probably ONLY YOU have this habit of
> saving your data to an RCS database.  This is not an argument for
> keeping the format of the database ASCII ;-)

Nor are your issues an argument for switching back to binary.

> Try very hard to figure this: for every account record you insert
> into this XML database you have around 10 other lines of text
> inserted around it in the file.  There WILL be some point in time
> when this will knock you down, when even opening the flat ASCII file
> in ANY editor in this world (be it on Linux, on Gates's Windows, on
> any Unices) will become a hell on earth...

That's what's nice about sed and ed.  They--while painfully
primitive--are very well-suited to this sort of thing.  And that is,
incidentally, why Lussier has repeatedly referred to the use of sed.

> Maybe you really should read some more about this (sometimes
> wonderful) thing called SQL.

It's certainly useful within its domain.  OTOH, I suspect that
undergrads and PHBs tend to think it is more applicable than it really
is.

I really do think that Scheme special forms/macros are the way to go.

-- 
Robert Uhl <ruhl@4dv.net>
When you disarm your subjects you offend them by showing that either
from cowardliness or lack of faith, you distrust them; and either
conclusion will induce them to hate you.
      --Niccolo Machiavelli, The Prince