XML size (was: no subject)

Christopher Browne gnucash@cbbrowne.com
Thu, 04 Apr 2002 00:39:17 -0500

> In a message dated: 03 Apr 2002 20:52:15 EST
> Derek Atkins said:
> >SQL is far from inflexible....
> Maybe I'm confusing some terms here.  To me, SQL is the language used 
> for querying the database, not the format the data is actually stored 
> in.  Am I wrong?  I don't consider SQL, the language inflexible, I 
> consider locking it in a binary format inflexible, actually, more like
> inaccessible and less portable.

<sarcasm on>
I suppose you find it offensive when:
 a) People generate tar files, which are a binary format for storing

 b) People distribute binary programs, which use a binary format
    that is only readable using programs like ldd, ar, and linkers.

People should _never_ transfer around programs in binary form; when you
take something written in C, and turn it into a .o file, it becomes
_vastly_ less portable, inflexible, and inaccessible.  

Presumably they should stop doing that.

Beyond that, ext2 is quite offensive; it is a system for storing data in
an inflexible, inaccessible and non-portable binary format.  ReiserFS
is, if anything, worse, since it is only usable on Linux.
<sarcasm off>

Yes, SQL databases store data in binary formats; the wide availability
of tools to provide flexible ways of accessing that data makes the
complaint seem fairly preposterous.

TAR uses a binary format, and I see few reasons to complain about that.
You are _not_ locked into captive user interfaces with PostgreSQL, so
the notion that it's an "inflexible, inaccessible, nonportable" binary
format seems a rather stunning leap.
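
To make the point about generic tooling concrete, here is a minimal sketch
of pulling data out of an SQL store into plain text. It rests on two
assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so
that the example runs without a server, and the table and column names are
hypothetical, invented for illustration rather than taken from GnuCash.

```python
import csv
import io
import sqlite3

# Stand-in database: sqlite3 instead of PostgreSQL (assumption, see above).
# The schema is hypothetical and is not GnuCash's actual table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, memo TEXT, cents INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (memo, cents) VALUES (?, ?)",
    [("groceries", -4250), ("paycheque", 150000)],
)


def export_csv(conn):
    """Dump the table as CSV text using only a generic SQL query."""
    cur = conn.execute("SELECT id, memo, cents FROM transactions ORDER BY id")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names come from the cursor, not from knowledge of the file format.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return out.getvalue()


csv_text = export_csv(conn)
print(csv_text, end="")
```

The on-disk representation never enters into it: any client that speaks
SQL can read the rows back out and write them in whatever text form is
wanted.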

> >> So you're reducing the size of the application by adding code?
> >
> >Yes, because you don't need all the XML parsing and unparsing,
> >so you can "remove" that, and SQL is extremely small and concise.
> >
> >So, yes, you are reducing size AND complexity by moving to SQL.

> Okay, that makes sense, but, aren't you then increasing complexity of
> the GnuCash system as a whole by now requiring an SQL database server?


It does not substantially "increase the complexity" to require that the
user prefix installation of GnuCash by commands like:

# apt-get install postgresql

or

# cd /mnt/cdrom/RPMS; rpm -i postgresql*.rpm

Supposing GnuCash is being installed by some dependency-aware system
like Debian APT or BSD Ports, all that is necessary from the user's
perspective is for the proper dependencies to sit in the proper
dependency files.

If the addition of an SQL data store diminishes the size of GnuCash,
then that's a win.  If it eliminates the need for GnuCash to use an XML
parser to manage the basic data, that's a win.

If other applications out there might make use of the same SQL data
store, then the PostgreSQL dependency fairly quickly starts to pay off
even further.

> >> As I said previously, the average home user isn't going to have so 
> >> much data in their file that the size is going grow to such a state 
> >> as to impact them or the performance of their system.
> >
> >That's not what I've been hearing from other users that I've been
> >talking to.
> That may be.  I'm going based on what I've been seeing on the GnuCash 
> mail lists. Maybe I've missed some posts.  It would be interesting to 
> see what kind of system specs and file sizes we're talking about 
> though.  Are people realizing 20 or 30 megabyte files?  Are these 
> multi-year files?  Would this be solved by the introduction of 
> accounting periods so that for instance, each year was in a separate 
> file?  Just curious.

Why force people to do that when it's not necessary?

> >> I'll buy that argument for a business environment, but not for the 
> >> home user.
> >
> >Where is this line drawn?  Besides, if we're going to do it for the
> >home-business environment, you've just done all the work.

> I would make it modular and optional.  There is already the option to
> build GnuCash with SQL/PostgreSQL support, but it's not required.
> Businesses, or those individuals who wish to use this support may.
> Those who don't have that option as well.  Why does this need to
> change?

If we know that it is straightforward to get PostgreSQL installed on all
the systems that are of interest, which is likely true now, and if we
know that performance can be improved _massively_ for large "installs,"
and we know it improves reliability for all installs, and we know that
it has favorable effects on use of storage, for all installs, then that
seems a "win" all around.

The _problem_ with setting things up to support a whole lot of different
data stores is that those data stores work differently, so that
debugging and performance tuning become more challenging.  

And having multiple code bases for different data stores means that
every API change has to be coded for and tested on each of those data
stores.

It is well and neat to say that there can be a generic interface to a
bunch of kinds of data stores; that introduces a bunch of coding effort,
and in a year that has only 365 days, there may be better things to
spend effort on than synchronizing the behaviour of four different data
stores.
(reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa"))
"The social dynamics  of the net are a direct  consequence of the fact
that nobody  has yet developed  a Remote Strangulation  Protocol."  
-- Larry Wall