XML size
Paul Lussier
plussier@mindspring.com
Thu, 04 Apr 2002 10:12:44 -0500
I'm not going to quote everything in your reply. I just want to make
some points:
- File size
My file size, on a per-year basis, is about 2.0M. Currently
I split each accounting/calendar year into a separate file.
This keeps them small, and affords a speedy launch time.
Doing this, my files will likely never grow to 50M.
There has been talk for a while that Gnucash would
implement accounting periods, my assumption, though it may
be incorrect, was that they would automate the separation
of files.
- SQL vs. regexp
I never said other people should learn regexps. I simply
stated that I can do this now and I don't want to lose this
capability. As for learning SQL, I know enough to be
dangerous, but how exactly would knowing it even help me
unless what the designers were talking about was
implementing an entire database system as a backend
requiring a server? If they're using an embedded database
as Derek was mentioning, how do I access the data without
loading Gnucash up? Remember, I want/need access to my
data under circumstances which preclude the use of X and
even the Gnucash GUI. If it's an embedded database,
I don't understand how this will be possible, UNLESS
the ASCII import/export feature is also written as
a command-line option which does not launch the GUI.
In which case, the SQL vs. regexp argument is moot, since
that gives us both what we want.
- Cheap memory
I guess cheap is relative. Memory is not $50USD per
megabyte. I remember quite well when it was. Less than
$20USD per megabyte to me is cheap. I guess it's all
relative.
- Wintel circle
I never said that newer, better, faster hardware
should be a requirement for anything. I simply
pointed out that if the performance enhancements
are not overly significant for the current
generation of hardware, that they will be completely
imperceptible on the next generation of hardware.
That's a simple fact. Processor speed doubles
about every 18 months and the prices fall.
That doesn't mean that everyone will run out and buy the
next generation immediately, but over time, the
lowest common denominator does rise. There are
still people running 386 and 486 class machines out there.
Not as many as there were 1, 2, or 3 years ago, but there
are still some. Over time, these people will likely
jump up to a PIII or PIV class machine. Any negligible
performance improvement made by developers today for the
current class of machine will be completely lost on those
with better machines in the future.
There's a difference between that and writing programs
to specifically take advantage of next generation
technology, and then making it a requirement to run the program.
- Undetected/propagated misspellings
I made the mistake once, GnuCash's auto-completion helped me
in propagating the mistake. I didn't notice it until it
was already propagated throughout the file.
Using SELECT may or may not be faster than a regexp, I
don't know. I do know that right now it's a moot point,
since right now I *can* use a regexp to fix it using tools
I already know and can't use SELECT, since it's not
implemented. However, were SELECT implemented, and I had
the choice to use either, I'd opt for using regexp because,
as I said, that's what I know, therefore it's faster for me.
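To make the regexp-fix point concrete, here is a minimal sketch using
sed. The file name and payee name are hypothetical stand-ins, not from
my actual data:

```shell
# A one-line stand-in for the real data file (payee name hypothetical):
printf '<trn:description>Grocerry Store</trn:description>\n' > sample.xac

cp sample.xac sample.xac.bak    # always keep a backup before bulk edits
sed 's/Grocerry Store/Grocery Store/g' sample.xac.bak > sample.xac

grep -c 'Grocerry' sample.xac   # prints 0: the typo is gone everywhere
```

The SQL equivalent would presumably be an UPDATE with a WHERE ... LIKE
clause, but the sed version needs nothing running except a shell.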
- Using RCS/regexps on large files
I have 10 years of back e-mail totalling several gigabytes.
I use regexps on my e-mail all the time without any
significant/unreasonable delay. "Unreasonable delay" being measured
according to what I think it should take given the dataset
size. That said, I don't think I would run into a problem,
given that I tend to keep my data sets small, <5M.
- Trust/knowledge of DBMSes.
You are right on two counts. I do not know a tremendous
amount about databases. I understand their uses and their
benefits to a certain degree. However, I do not trust them
implicitly. I have seen them corrupt their datasets, I have
had to restore the data files from backup tapes for the
DBA, and no, I don't trust anything on a computer system
when there's a human behind how it works. Humans are
fallible, and the programs they write have bugs. It's a
fact of life. I'm a sysadmin, I've seen far too many
people screw things up that have wasted my time and theirs
to ever trust things implicitly. That's why I want a backup
of things. That's why I want to be able to access my data
when the application *can't* be used. I'm not saying it's
something that will be terribly useful all the time, just
that the ability should not be removed.
You are wrong on one count. I am very interested in how
databases work. Not enough to become a DBA, but enough
to want to understand how they are more efficient and how
they can enhance the user's experience, especially from the
perspective of an application like GnuCash.
>>>>> On Wed, 3 Apr 2002, "Cornel" == Cornel DIACONU wrote:
Cornel> Maybe you really should read some more about this (sometimes
Cornel> wonderful) thing called SQL. It certainly is not the
Cornel> brightest tool invented, but I bet you will make much easier
Cornel> any kind of report you want using SQL's SELECT than any
Cornel> search you may invent on that flat ASCII file (being it
Cornel> regexp or not)
Keep in mind, I'm not writing reports, and I don't deny the
usefulness of SQL. All I want to be able to do (most of the time)
is simply 'grep' my file. For example, my wife asked me the other day
roughly when we bought something at a particular store. I didn't have
Gnucash up and running, and didn't really want to load it up since I
was in the middle of other things. All I needed to do was type
less gnucash.xac
and search for the store name. That gave me, very quickly, the dates
we went to that store. From that, it was trivial for me to isolate
which transaction was the relevant one. The whole process took me
less than 3 seconds. The same process using GnuCash would have taken
significantly longer, since there's now a database involved and the
search, rather than letting me type 5 characters, would have
required an entire sentence in SQL.
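As a concrete sketch of that search (the store name and file contents
here are hypothetical):

```shell
# A tiny stand-in for the real data file:
printf '%s\n' '<trn:description>Sears</trn:description>' \
              '<trn:description>Phone bill</trn:description>' > sample.xac

# Find the transaction straight from the flat file -- no GUI, no X,
# no database server, just grep:
grep -n 'Sears' sample.xac
```

The rough SQL equivalent would be something like
SELECT date_posted, description FROM transactions
WHERE description LIKE '%Sears%';
which is exactly the "entire sentence" the grep version avoids typing.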
Granted, if the database backend and the GnuCash frontend were separate,
and the database were currently running, but GnuCash were not, the
search process would probably have been about the same amount of time
from beginning to end (i.e. the amount of time to construct the SQL
statement and get results would be roughly the same as manually
searching the file and mentally figuring out which of the transactions
I really needed. The SQL search itself would have been faster, but the
statement construction would have been slower.)
If it were an embedded database only accessible through the GnuCash
frontend, then the entire process would have been significantly
longer, since I would then need to wait for the application to start,
then I would need to point and click through a bunch of menu items
making selections and switching between keyboard and mouse. The
entire process would probably be on order of at least 1 minute if not
2. To me, that's not progress or enhancement; it's an impediment.
One last thing: I never said I wanted, needed, or liked XML. I only
stated that I wanted, needed, and liked ASCII text. Those are 2 very
different statements. All XML is ASCII (AFAIK :) but not all ASCII
is XML.
However, as I said earlier, since the conversation has shifted to
that of using an embedded database and providing a text import/export
feature, we all get what we feel we want or need. So none of this
discussion really matters much anymore :)
--
Seeya,
Paul