XML size

Derek Atkins warlord@MIT.EDU
03 Apr 2002 09:45:50 -0500

Dale Alspach <alspach@math.okstate.edu> writes:

> the user set the time between autosaves. Clean up autosaved files
> automatically rather than forcing the user to acquire/write a script to
> delete old ones. If it was thought necessary these could be handled like
> /var/log files keeping one per session/day/week (user choice?) up to five.

I agree that the log files should go away, or at least be
automatically cleaned up when you have a successful "save".  OTOH,
with a database, you don't need log files because your changes are
stored on-disk immediately.

In my mind, the log file should only be necessary for backends where
the file is re-written at "close", and the log file should be removed
once the data has been successfully saved.  That still leaves the
question of backup files.  I'm not sure what to do about that, honestly.
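
For the rewrite-at-close case, here is a rough sketch of "remove the log
only after a successful save".  The function name and file paths are
hypothetical, not anything in the current code; the point is just the
ordering: write to a temporary file, swap it into place, and only then
delete the log.

```python
import os
import tempfile

def save_and_clear_log(data: bytes, data_path: str, log_path: str) -> None:
    """Rewrite the data file, then drop the log only if the write succeeded.

    Hypothetical helper for illustration; names and paths are assumptions.
    """
    directory = os.path.dirname(os.path.abspath(data_path))
    # Write to a temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the new file into place in a single step.
        os.replace(tmp_path, data_path)
    except BaseException:
        # The save failed: clean up the temp file and keep the log.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Only reached after a successful save.
    if os.path.exists(log_path):
        os.remove(log_path)
```

If the write fails part-way, the old data file and the log are both still
there, so nothing is lost.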

However, with a database (even as a file), all this goes away!

> On the XML vs binary vs database file. I don't like the binary only option.
> I do like to be able to get at the data myself using other software
> when necessary. Thus if a binary format were adopted, there still needs to

I can see your argument, but there are a lot of problems with allowing
other applications to see your data.  Worse, other applications could
make changes to your data and leave it in an inconsistent state.
Granted, with a database file they could do the same thing, but at
least it's more difficult then ;)

> be a way to export the data to tables (ascii) or an XML file. My
> preference would be tables (CSV?) and this would seem to be the most natural
> format since a ledger is a table. There are design issues because of the
> relational structure and some of the attributes that go with the data as to
> how this export should be done.

I agree, and honestly this should happen REGARDLESS of the data file
format.  However, keep in mind that the data file stores more than
just a list of transactions.  There are:

        Chart of Accounts
        Transaction/Split List
        Share-Price Database
        Scheduled Transaction List
        Customer List
        Vendor List
        Invoice List
        Invoice-Entry List

As new data objects get added, external applications have more and
more of a chance to screw something up.
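
To make the export idea concrete, here is a minimal sketch of flattening
the Transaction/Split list into one CSV row per split.  The dictionary
layout and column names are made up for illustration and are not the
actual data model; the other object types in the list above would each
need their own table.

```python
import csv
import io

# Hypothetical in-memory data; field names are illustrative only.
transactions = [
    {
        "id": 1,
        "date": "2002-04-01",
        "description": "Groceries",
        "splits": [("Expenses:Food", "45.00"), ("Assets:Checking", "-45.00")],
    },
]

def export_splits_csv(transactions, out):
    """Write one row per split, repeating the parent transaction's fields."""
    writer = csv.writer(out)
    writer.writerow(["txn_id", "date", "description", "account", "amount"])
    for txn in transactions:
        for account, amount in txn["splits"]:
            writer.writerow(
                [txn["id"], txn["date"], txn["description"], account, amount]
            )

buf = io.StringIO()
export_splits_csv(transactions, buf)
print(buf.getvalue())
```

Repeating the transaction fields on every split row is one way to squeeze
the relational structure into a flat table; a read-only export like this
also avoids the consistency problem, since nothing is written back.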

> In principle having an actual database as the storage gives one the
> ability to extract the data in a variety of forms by using the facilities
> provided with the database. I am not familiar enough with postgres to
> comment on it.

This is one reason.  A major reason, but only one.  Another reason is
that with a database you do not need to cache the whole dataset in
RAM.  Yet another reason is you get immediate storage of your changes
so you don't need to "save" the data explicitly.
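
A small sketch of both points, using Python's sqlite3 module as a
stand-in for a database-as-a-file (the thread mentions postgres; SQLite
and the table layout here are my assumptions, chosen only because the
example then needs no server).  Each change is committed when it is made,
and a later query pulls out just the rows it needs rather than loading
everything.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a scratch directory.
db_path = os.path.join(tempfile.mkdtemp(), "books.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE splits (account TEXT, amount_cents INTEGER)")
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("INSERT INTO splits VALUES (?, ?)", ("Expenses:Food", 4500))
    conn.execute("INSERT INTO splits VALUES (?, ?)", ("Assets:Checking", -4500))
conn.close()

# Reopen the file: the change is already on disk, with no explicit "save",
# and the query reads only the rows for one account.
conn = sqlite3.connect(db_path)
food_total = conn.execute(
    "SELECT SUM(amount_cents) FROM splits WHERE account = ?",
    ("Expenses:Food",),
).fetchone()[0]
conn.close()
print(food_total)
```

Grouping both inserts in one commit also keeps the two halves of a
transaction together, so the file never holds an unbalanced entry.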

> Dale Alspach


       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available