XML size

Cornel DIACONU corneld66@yahoo.com
Wed, 3 Apr 2002 23:03:36 -0800 (PST)


Since I started this thread, a lot of thoughts have
been exchanged...
I have something to say myself, especially about Paul's
comments:

>I doubt for the home user this is going be a problem, expecially if
>the 'transaction period' code allows you to save off said periods to
>separate files.  Tens of megabytes of core?  Memory is cheap, and so
>is disk space.  I've got over a gig of swap and lots of memory.  Tens
>of megs of core doesn't bother me that much, especially when my
>combined dataset is only 3.6M, and that's spread out over 2 separate
>files.
I'm sorry, but I totally disagree with this. Memory is
not cheap at all. Maybe you have forgotten that some months
ago a single earthquake in South-East Asia raised
memory prices to about 2-3 times their level in just a few days.
I will never buy memory just to be able to load a flat
ASCII file of my data and then make a report on it!
I'd be much happier if every program ran in just
64K of RAM! (I know that we've come to a
situation where that's not feasible anymore, but maybe
we'll throw away these PCs and use the new
PDAs someday.)
Buying 256M of RAM just to be able to load my
very large GnuCash XML database into it is not how I intend
to spend my money...


You say that you'd prefer to use regexps on the
database file to find the info you need. Do you
really mean that learning the regexp way of
searching is EASIER than SELECT?
You may want to think again. I don't know how much
SQL you know, but trust me, it's MUCH EASIER to
group values in your database with
SELECTs than to extract them with any regexp you can
think of.
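
To make the grouping point concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The "splits" table, its columns and
the sample rows are all made up for illustration; this is not GnuCash's
real schema, and amounts are stored as integer cents only to keep the
sums exact.

```python
import sqlite3

# Hypothetical sample data: (account, payee, amount in cents).
SAMPLE = [
    ("Expenses:Food", "Market", 1250),
    ("Expenses:Food", "Bakery", 250),
    ("Expenses:Rent", "Landlord", 30000),
]


def totals_by_account(rows):
    """Load rows into an in-memory table and total them per account."""
    conn = sqlite3.connect(":memory:")
    # Assumed, illustrative schema (not the real GnuCash layout).
    conn.execute(
        "CREATE TABLE splits (account TEXT, payee TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO splits VALUES (?, ?, ?)", rows)
    # One SELECT does the grouping and the arithmetic.
    result = conn.execute(
        "SELECT account, SUM(amount_cents) FROM splits "
        "GROUP BY account ORDER BY account"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for account, total in totals_by_account(SAMPLE):
        print(account, total)
```

A regexp could find the matching lines in a flat file, but the summing
per account would still have to be written by hand around it.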

And frankly, I hate the vicious circle
that the Wintel cartel (Windows + Intel, remember) wants
to drag us into: never mind that you can't solve this
problem for now, spend more money on our newly
updated
stuff and (maybe) it will solve it (through brute
force, since it is faster...). I'd prefer
intelligent solutions, not brute force...
Memory and disk space will never be cheap (at least in
my country, Romania).


>Okay, so I used a bad example.  But I have used this exact technique
>(global search/replace) to change things like payee fields and memo
>fields extensively with no damage or harm whatsoever.  As for using
>the application, not when I have to change 35 occurences of a
>misspelling which got propogated over time.  That simply takes way
>too long.  That's *exactly* why I like the ascii text file.

You're VERY wrong here. A simple UPDATE xxx SET
this=that will take MUCH, MUCH LESS time to
accomplish than ANY search/replace that you've
suggested here. Imagine that your flat ASCII
database file grows to 50M on disk. And
you said it yourself, it may even be split across 2-3 or
more separate files.
And by the way, you had 35 misspellings in your
records and never noticed them?! Kind of
sloppy, isn't it? ;))
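
For illustration, this is roughly what such an UPDATE looks like from
Python's built-in sqlite3 module. The "transactions" table, its columns
and the misspelled payee are hypothetical, invented for this sketch; it
shows the shape of the statement, not a timing comparison.

```python
import sqlite3


def fix_payee(conn, wrong, right):
    """Rename every occurrence of a misspelled payee; return rows changed."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE transactions SET payee = ? WHERE payee = ?",
            (right, wrong),
        )
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed, illustrative schema (not the real GnuCash layout).
    conn.execute(
        "CREATE TABLE transactions "
        "(id INTEGER PRIMARY KEY, payee TEXT, memo TEXT)"
    )
    # 35 rows carrying the same misspelling, plus one unrelated row.
    rows = [("Grocey Store", "weekly")] * 35 + [("Landlord", "rent")]
    conn.executemany(
        "INSERT INTO transactions (payee, memo) VALUES (?, ?)", rows
    )
    conn.commit()
    changed = fix_payee(conn, "Grocey Store", "Grocery Store")
    remaining = conn.execute(
        "SELECT COUNT(*) FROM transactions WHERE payee = ?",
        ("Grocey Store",),
    ).fetchone()[0]
    conn.close()
    return changed, remaining


if __name__ == "__main__":
    print(demo())
```

Because the WHERE clause matches only the payee column, the statement
cannot accidentally rewrite the same string inside a memo or a tag name,
which a global search/replace on the whole file could do.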

> [...]
>As I said previously, the average home user isn't going to have so
>much data in their file that the size is going grow to such a state
>as to impact them or the performance of their system.

Then why do I have to wait longer for my GnuCash
app to even launch for the first time, and then to
open various accounts, than I did with the previous binary
format of the database? I must point out that my
personal DB is about the same size as yours
(it's about 3.5M).


> Additionally, if I need to move the file somewhere, a text file
> affords much better compression than a binary file.  My current file
> is 2.1M and compresses to 196k.  That easily fits on a floppy, or can
> even be e-mailed to someone.
[...]
> Additionally, I currently have my gnucash file under RCS.  Try
> putting an SQL database under RCS.
That's only you. I don't care about having diffs
between accounts. I'm just uploading the accounts with
their respective amounts and making reports sometimes...
RCS for my DB file is just a useless headache for me
;-)
Anyway, I challenge you to put that file into RCS when
it grows to 50M! Try it at 200M
afterwards...
Do you really think it will make any sense then? I
sincerely doubt that.


> Don't databases keep transaction logs and backup files?  What happens
> when the system crashes because of a power outage in the middle
> of a transaction?  Now you've got a corrupt binary file that contains
> all your data and it's essentially useless.
You obviously don't (want to) know anything about DBMSs
...
I have NEVER (I repeat: NEVER) run into such a case of
corruption of my DB file with any of the DBMSs I've
worked with so far (Oracle, Informix, DB2, PostgreSQL,
MySQL, and you can name any other in this world).
This is just the point with you and your point of
view: you HAVE to trust the DBMS with something (alas,
you have to trust somebody from time to time); you
have to trust that it will keep your data file trustworthy,
that it will maintain the integrity of your data (in
its OWN WAY, different from yours ;)
I actually trust Oracle's (or PostgreSQL's) way of
keeping my data safe, even if I can't look inside the
database file and see that data.
Suppose the XML database file grows to 200M. I
can bet you anything that you will then curse it
yourself, when you have to find some particular
value that lies somewhere at the
end of the file (trust me, I have to do such a thing
often enough with some files I deal with in my
daily job; it's not just ugly, it's
horrible...).

Keep this in mind: with an RDBMS,
you will NEVER end up with a corrupted database file
(I give you 99.9% on that!), and even if you do, it will offer
you the tools and knowledge to deal with the
problem...
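
As a small sketch of what "in the middle of a transaction" means, here
is an example with Python's built-in sqlite3 module. The "accounts"
table and the transfer() helper are hypothetical, and the failure is
only simulated with an exception, so this shows the rollback idea
rather than recovery from a real power outage (which relies on the
DBMS's own journal or log).

```python
import sqlite3


def transfer(conn, src, dst, cents, fail_midway=False):
    """Move cents from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (cents, src),
            )
            if fail_midway:
                # Simulated failure between the two halves of the transfer.
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (cents, dst),
            )
    except RuntimeError:
        pass  # the with-block has already rolled the transaction back


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed, illustrative schema; balances are integer cents.
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("Checking", 10000), ("Savings", 0)],
    )
    conn.commit()
    transfer(conn, "Checking", "Savings", 2500, fail_midway=True)
    after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))
    transfer(conn, "Checking", "Savings", 2500)
    after_success = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return after_failure, after_success


if __name__ == "__main__":
    print(demo())
```

After the simulated failure neither account has changed, so the data
never shows a half-finished transfer.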


>To me, it buys me a HUGE amount.  But that's me.  I don't care it
>there's not another person in the world who works the way I do, this
>works for me, and GnuCash, in it's current state, allows me to do
>this.  To change to SQL back end would not allow me to do what I'm
>currently doing, and would, really, not be any better for me than
>Quicken, other than it runs on Linux.
>But, that's just me.  I really don't expect the architecture designs
>for GnuCash to be made only according to what I want, I'm just
>voicing my opinions in hopes that I'll be heard.  If, ultimately,
>things go against my wishes, well, c'est la vie.  But I'd be derelict
>for not at least voicing how I feel :)

That's all there is to it: YOU, and probably ONLY YOU, have
this habit of saving your data to RCS as well.
That is not an argument for keeping the
database format ASCII ;-)

>I still maintain that for the average home user, the flat ascii text
>file is the best bet.
It may be for now, but can you imagine yourself after
5 years of doing it this way?
Do you really think your database file will stay at
this kind of size?
Try very hard to picture this: for every account record
you insert into this XML database, around 10
other lines of text are inserted around it in the file.
There WILL come a point in time when this will knock
you down, when even opening the flat ASCII file in ANY
editor in this world (be it on Linux, on Gates's
Windows, or on any Unix) will become hell on earth...
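
To get a feeling for that overhead, here is a rough sketch that compares
the actual data characters in an XML record with the total size of the
record. The SAMPLE fragment is invented for illustration and is much
simpler than a real GnuCash file, so the numbers it gives say nothing
precise about GnuCash itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; real GnuCash XML uses different (and more) tags.
SAMPLE = """<transaction>
  <date>2002-04-03</date>
  <payee>Grocery Store</payee>
  <memo>weekly</memo>
  <split>
    <account>Expenses:Food</account>
    <amount>1250</amount>
  </split>
</transaction>"""


def markup_overhead(xml_text):
    """Return (payload_chars, total_chars) for an XML fragment.

    payload_chars counts only the text inside elements, ignoring the
    whitespace used for indentation; total_chars is the whole fragment.
    """
    root = ET.fromstring(xml_text)
    payload = sum(len(text.strip()) for text in root.itertext())
    return payload, len(xml_text)


if __name__ == "__main__":
    payload, total = markup_overhead(SAMPLE)
    print(f"{payload} data characters out of {total} in the file")
```

Running the same function over a real data file (parsed with ET.parse
instead of a string) would show how much of the file is tags and
indentation rather than data.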

[...]
You said:
>> I still maintain that for the average home user, the flat ascii text
>> file is the best bet.
In response, Derek said:
>Again, I disagree with you.  My parents couldn't care less what file
>format their data is.  They care about usability of the application,
>and part of that usability is speed of startup, speed of shutdown, and
>speed of data access/modification.

I totally agree with this.
I am myself a maniac about fast launching of any app,
lower memory consumption, and fast access to the
data I need.
Maybe you only care about looking at that
data yourself (I somehow understood that you don't use any of
GnuCash's report features, because you make your own
reports by just searching through the flat XML file?!
;-) )
Maybe you really should read some more about this
(sometimes wonderful) thing called SQL.
It certainly is not the brightest tool ever invented, but
I bet you will find it much easier to make any kind of report you
want using SQL's SELECT than with any search you may invent
on that flat ASCII file (be it regexp or not).

====
May the Force be with you ...

Cornel.



=====
<<<<Linux user # 179833>>>>
Programming in BASIC causes brain damage. (Edsger Wybe Dijkstra)
