XML size

Paul Lussier plussier@mindspring.com
Thu, 04 Apr 2002 10:12:44 -0500


I'm not going to quote everything in your reply.  I just want to make 
some points:

	- File size

	  My file size, on a per-year basis, is about 2.0M.  Currently
	  I split each accounting/calendar year into a separate file.
	  This keeps them small and affords a speedy launch time.

	  Doing this, my files will likely never grow to 50M.
	  There has been talk for a while that GnuCash would
	  implement accounting periods; my assumption, though it may
	  be incorrect, was that this would automate the separation
	  of files.

	- SQL vs. regexp

	  I never said other people should learn regexps.  I simply
	  stated that I can do this now and I don't want to lose the
	  capability.  As for learning SQL, I know enough to be
	  dangerous, but how exactly would knowing it help me, unless
	  what the designers were talking about was implementing an
	  entire database system as a backend, requiring a server?
	  If they're using an embedded database, as Derek was
	  mentioning, how do I access the data without loading
	  GnuCash?  Remember, I want/need access to my data under
	  circumstances which preclude the use of X and even the
	  GnuCash GUI.  If it's an embedded database, I don't
	  understand how this will be possible, UNLESS the ASCII
	  import/export feature is also written as a command-line
	  option which does not launch the GUI.  In that case, the
	  SQL vs. regexp argument is moot, since that gives us both
	  what we want (see the sketch below).
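
	  For instance, if the embedded database turned out to be a
	  file-based one such as SQLite (purely an assumption on my
	  part; I don't know what the developers actually have in
	  mind), the data would still be reachable from a plain
	  shell, no X or GUI required.  The file and table names
	  here are made up:

	      sqlite3 gnucash.db "SELECT * FROM transactions;"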
	
	- Cheap memory

	  I guess cheap is relative.  Memory is no longer $50 USD
	  per megabyte; I remember quite well when it was.  To me,
	  less than $20 USD per megabyte is cheap.

	- Wintel circle

	  I never said that newer, better, faster hardware
	  should be a requirement for anything.  I simply
	  pointed out that if the performance enhancements
	  are not overly significant for the current
	  generation of hardware, they will be completely
	  imperceptible on the next generation of hardware.
	  That's a simple fact.  Processor speed doubles
	  about every 18 months, and prices fall.
	  That doesn't mean that everyone will run out and buy the
	  next generation immediately, but over time, the
	  lowest common denominator does rise.  There are
	  still people running 386 and 486 class machines out there.
	  Not as many as there were 1, 2, or 3 years ago, but there
	  are still some.  Over time, these people will likely
	  jump up to a PIII or PIV class machine.  Any negligible
	  performance improvement made by developers today for the
	  current class of machine will be completely lost on those
	  with better machines, now and in the future.

	  There's a difference between that and writing programs
	  specifically to take advantage of next-generation
	  technology, and then making that technology a requirement
	  to run the program.

	- Undetected/propagated misspellings

	  I made the mistake once; GnuCash's auto-completion then
	  helped me propagate it.  I didn't notice until it had
	  already spread throughout the file.

	  Using SELECT may or may not be faster than a regexp; I
	  don't know.  I do know that right now it's a moot point:
	  I *can* use a regexp to fix it with tools I already know,
	  and I can't use SELECT, since it isn't implemented.
	  However, were SELECT implemented and I had the choice of
	  either, I'd opt for the regexp because, as I said, that's
	  what I know, and therefore it's faster for me.  A sketch
	  of such a fix follows.
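
	  For example, suppose auto-completion had propagated a
	  hypothetical payee misspelled "Grocey Store".  I'd first
	  count the damage, then rewrite every occurrence into a new
	  file, keeping the original until the result checks out:

	      grep -c 'Grocey Store' gnucash.xac
	      sed 's/Grocey Store/Grocery Store/g' gnucash.xac > gnucash.fixed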

	- Using RCS/regexps on large files

	  I have 10 years of back e-mail totaling several gigabytes.
	  I use regexps on my e-mail all the time without any
	  significant/unreasonable delay, "unreasonable delay" being
	  measured against what I think a search should take given
	  the size of the dataset.  That said, I don't think I would
	  run into a problem, given that I tend to keep my data sets
	  small, <5M.
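
	  That kind of search is nothing more exotic than, say (the
	  folder layout here is just an illustration):

	      egrep -l 'receipt|invoice' ~/Mail/*/*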

	- Trust/knowledge of DBMSes

	  You are right on two counts.  I do not know a tremendous
	  amount about databases.  I understand their uses and their
	  benefits to a certain degree.  However, I do not implicitly
	  trust them.  I have seen them corrupt their datasets; I have
	  had to restore the data files from backup tapes for the
	  DBA; and no, I don't trust anything on a computer system
	  when there's a human behind how it works.  Humans are
	  fallible, and the programs they write have bugs.  It's a
	  fact of life.  I'm a sysadmin; I've seen far too many
	  people screw things up, wasting my time and theirs, ever
	  to trust things implicitly.  That's why I want a backup
	  of things.  That's why I want to be able to access my data
	  when the application *can't* be used.  I'm not saying it's
	  something that will be terribly useful all the time, just
	  that the ability should not be removed.

	  You are wrong on one count.  I am very interested in how
	  databases work.  Not enough to become a DBA, but enough
	  to want to understand how they are more efficient and how
	  they can enhance the user's experience, especially from the
	  perspective of an application like GnuCash.


>>>>> On Wed, 3 Apr 2002, "Cornel" == Cornel DIACONU wrote:

  Cornel> Maybe you really should read some more about this (sometimes
  Cornel> wonderful) thing called SQL.  It certainly is not the
  Cornel> brightest tool invented, but I bet you will find it much
  Cornel> easier to make any kind of report you want using SQL's
  Cornel> SELECT than with any search you may invent on that flat
  Cornel> ASCII file (be it regexp or not)

Keep in mind, I'm not writing reports, and I don't deny the
usefulness of SQL.  All I want to be able to do (most of the time)
is simply 'grep' my file.  For example, my wife asked me the other day
roughly when we bought something at a particular store.  I didn't have
GnuCash up and running, and didn't really want to load it up since I
was in the middle of other things.  All I needed to do was type

	less gnucash.xac

and search for the store name.  That gave me, very quickly, the dates
we went to that store.  From there, it was trivial to isolate
the relevant transaction.  The whole process took me
less than 3 seconds.  The same process using GnuCash would have taken
significantly longer, since there's now a database involved and the
search, rather than letting me type 5 characters, would have
required an entire sentence of SQL.
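
To make that concrete: suppose the store had been, say, Sears (a
made-up stand-in).  The two searches would look something like the
following, where the database file, table, and column names are
entirely my invention:

	grep Sears gnucash.xac

	sqlite3 gnucash.db "SELECT * FROM transactions WHERE description LIKE '%Sears%';"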

Granted, if the database backend and the GnuCash frontend were separate,
and the database were already running but GnuCash were not, the
search would probably have taken about the same amount of time
from beginning to end: constructing the SQL statement and getting
results would take roughly as long as manually searching the file
and mentally picking out the transaction I needed.  (The SQL search
itself would have been faster, but constructing the statement would
have been slower.)

If it were an embedded database only accessible through the GnuCash
frontend, then the entire process would have been significantly
longer, since I would have had to wait for the application to start,
then point and click through a bunch of menu items,
making selections and switching between keyboard and mouse.  The
entire process would probably be on the order of at least 1 minute,
if not 2.  To me, that's not progress or enhancement; it's impedance.

One last thing: I never said I wanted, needed, or liked XML.  I only
stated that I wanted, needed, and liked ASCII text.  Those are 2 very
different statements.  All XML is ASCII (AFAIK :) but not all ASCII
is XML.

However, as I said earlier, since the conversation has shifted to 
that of using an embedded database and providing a text import/export 
feature, we all get what we feel we want or need.  So none of this 
discussion really matters much anymore :)
-- 

Seeya,
Paul