XML size (was: no subject)

Paul Lussier plussier@mindspring.com
Wed, 03 Apr 2002 12:05:23 -0500


In a message dated: 03 Apr 2002 09:54:16 EST
Derek Atkins said:

>Paul Lussier <plussier@mindspring.com> writes:
>
>> I don't know how else to say this, but I LOVE THE ASCII TEXT FILE!
>> Please don't change it, at least not for the average home user.
>
>Why does it matter to you what format it's in?  Are you actually
>LOOKING at the data file?  Using what tool?  And what do you do with
>it?

Yes, I do look at it from time to time, using Emacs.  I also 
occasionally make changes to the file in Emacs.

>Wouldn't an "ASCII Export" be more appropriate for you?  That way
>you can easily get at the data you want.

Are you planning on writing an ASCII Import feature as well?

>> Performance is not everything, and once it's loaded, it's loaded and 
>> you're done waiting.
>
>True, performance is NOT everything, but it is still important.  Sure,
>once you wait for it to load it's there, but why wait when you don't
>have to?  You do realize that even with a relatively small dataset
>your Gnucash application can rise to tens of megabytes of core?
>That's larger than your X server.  That kind of precludes using a
>low-end machine, doesn't it?

I doubt this is going to be a problem for the home user, especially if 
the 'transaction period' code allows you to save off said periods to 
separate files.  Tens of megabytes of core?  Memory is cheap, and so 
is disk space.  I've got over a gig of swap and lots of memory.  Tens 
of megs of core doesn't bother me that much, especially when my 
combined dataset is only 3.6M, and that's spread out over 2 separate 
files.

Again, I doubt this will be a problem for the home user.  For the 
small business user, maybe, and in that case, maybe an SQL backend is 
justified.  But I don't believe it is for the home user.

>> It's very difficult to run sed 's/Salary/Salary:Taxable/g' on a 
>> database.  I don't *want* to have to know SQL to do the same thing, 
>> nor do I want to run an SQL database on my system or have to wait for 
>> *that* to start up.
>
>Note that this will not only fail to do what you want, but could leave
>your data file unreadable and unusable.  This is _EXACTLY_ the kind of
>thing that we DON'T want people to be doing!  If you want to change
>your data you should use the application to do it.  If you don't then
>you could destroy some of the invariants of the data (for instance,
>only one account may exist with any particular name).

Okay, so I used a bad example.  But I have used this exact technique 
(global search/replace) to change things like payee fields and memo 
fields extensively with no damage or harm whatsoever.  As for using 
the application: not when I have to change 35 occurrences of a 
misspelling that got propagated over time.  That simply takes way 
too long.  That's *exactly* why I like the ASCII text file.
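
A search/replace like the one described above can be narrowed so it only
touches free-text fields and never account names. This is a hypothetical
sketch: it assumes an uncompressed XML data file in which payee text lives
in `<trn:description>` elements, memos in `<split:memo>` elements, and
account names in `<act:name>`. Check those tag names against your own file
(and keep a backup) before relying on it.

```python
import re

# Assumed element names for free-text fields; verify against a real data file.
FREE_TEXT_TAGS = ("trn:description", "split:memo")


def replace_in_free_text(xml_text, old, new, tags=FREE_TEXT_TAGS):
    """Replace `old` with `new` only inside the text of the given elements.

    `old` and `new` are matched against the raw XML text, so avoid the
    XML special characters &, < and > in them.
    """
    tag_alt = "|".join(re.escape(t) for t in tags)
    # Group 1: opening tag, group 2: tag name, group 3: element text,
    # group 4: matching closing tag (backreference to the same name).
    pattern = re.compile(r"(<(%s)>)(.*?)(</\2>)" % tag_alt, re.DOTALL)

    def fix(match):
        # Rewrite the element body; leave the tags themselves untouched.
        return match.group(1) + match.group(3).replace(old, new) + match.group(4)

    return pattern.sub(fix, xml_text)


sample = (
    "<act:name>Mortage</act:name>\n"
    "<trn:description>Mortage payment</trn:description>\n"
    "<split:memo>Mortage for April</split:memo>\n"
)
print(replace_in_free_text(sample, "Mortage", "Mortgage"))
```

The account name keeps its old spelling on purpose: renaming an account is
left to the application, which is where the naming invariants are enforced.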

>Also, with an _embedded_ SQL server (which is what I'm talking about)
>there is no startup time.  There is no separate SQL database process.
>Gnucash would start and you're up and running.  That's the whole
>point.  You don't need to load the data file; you access it in
>real-time when you need it.

But how, with this model, do I access the raw data?  I can't.
From the sounds of it, this will even preclude me from running a 
command line SQL query on the data.  And what about if I want to 
quickly check something without firing up gnucash?  I now can't do 
that either.  I often log into my system at home from over the 
internet.  I can't run gnucash, since redisplaying X is too painful 
over such a slow link.  I can often find exactly what I want using 
standard unix tools like grep.
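
Beyond plain `grep`, a quick lookup over a text file can be restricted to
particular fields without starting the application. A hypothetical sketch,
again assuming payee and memo text sit in `<trn:description>` and
`<split:memo>` elements:

```python
import re

# Assumed element names for payee/description and memo text.
FREE_TEXT_TAGS = ("trn:description", "split:memo")


def find_free_text(xml_text, term, tags=FREE_TEXT_TAGS):
    """Return (tag, text) pairs for free-text elements containing `term`.

    The match is case-insensitive; other elements are ignored.
    """
    tag_alt = "|".join(re.escape(t) for t in tags)
    pattern = re.compile(r"<(%s)>(.*?)</\1>" % tag_alt, re.DOTALL)
    needle = term.lower()
    return [
        (m.group(1), m.group(2))
        for m in pattern.finditer(xml_text)
        if needle in m.group(2).lower()
    ]


sample = (
    "<trn:description>Shell station</trn:description>\n"
    "<split:memo>gas</split:memo>\n"
    "<trn:description>Market Basket</trn:description>\n"
)
print(find_free_text(sample, "shell"))
```

Over a slow remote link, this only needs a shell and the data file, just
like `grep`.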

Additionally, if I need to move the file somewhere, a text file 
affords much better compression than a binary file.  My current file 
is 2.1M and compresses to 196k.  That easily fits on a floppy, or can 
even be e-mailed to someone.
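
The figures above work out to roughly 11:1 (2,100 / 196 is about 10.7).
Repetitive markup compresses well because DEFLATE, the algorithm behind
both gzip and zlib, replaces repeated byte strings with short
back-references. The sketch below measures that on made-up, deliberately
repetitive XML-like records; a literal repeat compresses far better than a
real data file would, so treat the number it prints as illustrative only.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by zlib (DEFLATE) compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Hypothetical records standing in for transactions in a text data file.
record = (
    "<gnc:transaction>\n"
    "  <trn:description>Market Basket</trn:description>\n"
    "  <split:memo>groceries</split:memo>\n"
    "</gnc:transaction>\n"
)
sample = (record * 1000).encode("ascii")
print(f"{len(sample)} bytes, ratio {compression_ratio(sample):.1f}:1")
```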

ASCII is simply the most flexible and most portable format there is.
XML may suck, but at least it's ASCII.  An embedded SQL engine with a 
binary data format will essentially take away all the reasons I like 
gnucash.  I hated Quicken precisely because I couldn't 
access the data.

Additionally, I currently have my gnucash file under RCS.  Try 
putting an SQL database under RCS.
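
RCS keeps the newest revision in full and older revisions as reverse,
line-based deltas, so a one-line edit to a text file costs little more than
that line in the history. RCS uses its own delta format; the sketch below
uses Python's `difflib` unified diff only to show the same line-oriented
idea on two made-up versions of a file.

```python
import difflib


def line_delta(old_text, new_text, name="data.xml"):
    """Unified diff between two versions of a text file, line by line."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=name + " (old)",
            tofile=name + " (new)",
        )
    )


old = (
    "<split:memo>Mortage for April</split:memo>\n"
    "<split:memo>groceries</split:memo>\n"
)
new = old.replace("Mortage", "Mortgage")
print(line_delta(old, new))
```

Only the edited line shows up as removed and re-added; the unchanged line
appears as context. That only stays small when the file is line-oriented
text.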

I'm sorry.  But you will never convince me that a binary format, for 
whatever small performance advantage it gains you now, will be a 
better choice than the total flexibility afforded by an ASCII text 
file.  If it's slow now, in less than 18 months processor speed will 
have doubled and your load time will be 50% of what it is now.  Any load 
time speed gain realized today will be completely lost on those with 
the next generation processors.
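
The arithmetic behind the 18-month argument, as a sketch: it assumes load
time is purely CPU-bound and inversely proportional to processor speed,
with speed doubling every 18 months as stated above. Under that model the
load time after m months is t0 / 2^(m/18).

```python
def projected_load_time(t0_seconds, months, doubling_months=18.0):
    """Projected load time after `months`.

    Simplifying assumption: processor speed doubles every
    `doubling_months` and load time scales inversely with speed.
    """
    return t0_seconds / 2 ** (months / doubling_months)


print(projected_load_time(30, 18))  # one doubling period
print(projected_load_time(30, 36))  # two doubling periods
```

A 30-second load becomes 15 seconds after 18 months and 7.5 seconds after
36 months under these assumptions.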
-- 

Seeya,
Paul