File store (was Re: Salutations)

Christopher Browne cbbrowne@hex.net
Sun, 10 Dec 2000 23:55:06 -0600


On Mon, 11 Dec 2000 15:03:55 +1000, the world broke into rejoicing as
"Phillip Shelton" <shelton@usq.edu.au>  said:
> > -----Original Message-----
> > On Mon, 11 Dec 2000 11:32:49 +1000, the world broke into rejoicing as
> > "Phillip Shelton" <shelton@usq.edu.au>  said:
> > > How does all of this affect `closing the books'?  If the books
> > > are `close-able' then maybe we do not have to read the last 10
> > > years' worth of data in at once?
> > >
> > > Will the closing of the books be easier or harder with a DB?
> >
> > It becomes a decreasingly relevant issue with a "proper database."
> >
> > A prime reason why you _want_ to "close the books" is that as the
> > amount of data grows, the "document" that is "the books" gets
> > large and unmanageable.
> 
> Ok (I am not an accountant).  I just thought that there might be
> some legal stuff that has to be done at year end.  But I suppose
> that is just a matter of a well-written report.

Different industries may have some particular legal requirements; those
generally involve a need to specifically _preserve_ information for
some number of years, as opposed to a need to _get rid_ of data at the
end of the year.

"Getting rid of data" has tended to be driven by the pragmatic issues of:
a) Only needing to report on the current calendar year, and having paper
   sets of books where recalculating things by hand is daunting;
b) Only having 5MB of disk space on early Winchester Hard Drives so that
   you only have space to store very limited amounts of data;
c) Having so Vastly Many Transactions per year that you need to purge
   data out regularly lest you need to dig an extra basement to hold
   the ever-increasing racks of disk drives.  [My employer has a bunker
   of just this nature to hold the 17 mainframes, some of which were
   likely involved the last time you booked an airline ticket; they
   have to purge data out pretty regularly as there's just too much of
   it to comfortably cope with...]

> > With data being stored in a more sophisticated DB, you don't
> > necessarily _need_ to close the books; queries hit the portions
> > that are relevant, so having 10 years' worth in the DB doesn't
> > make it desperately slow.
> 
> It still might be nice to be able to archive, but I suppose that we could
> just use the archiving stuff in the DB for that?

Not really; DB "archive logs" tend to be used to ensure that data
doesn't get lost; the archiving that "takes stuff out of the DB" is at
the other end of the process.

It would make sense to create SQL queries to carefully "throw data away"
when appropriate.
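
Something along these lines would do (a minimal sketch; the txns and
txns_archive names are hypothetical, not taken from any actual
GnuCash schema):

    -- Copy ten-year-old transactions into an archive table, and only
    -- then remove them from the live table, all in one transaction.
    begin;
    insert into txns_archive
      select * from txns where date < '19910101';
    delete from txns where date < '19910101';
    commit;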

> >  After all, if:
> > a) Account balances keep having to be repetitively calculated from the
> >    beginning of time until now, That'll Be Slow.
> 
> Let's hope we can tell the DB to start calculating from the changed
> point only.

Yes, that's doubtless one of the design criteria.  One, by the way, that
effectively rules out MySQL, as it doesn't do triggers...
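
For instance, a trigger could maintain a cached per-account balance so
that nothing needs recomputing from the beginning of time.  A rough
PostgreSQL-flavoured sketch (the txns and balances tables are
hypothetical, and real double-entry splits would complicate it):

    create function bump_balance() returns trigger as $$
      begin
        -- Adjust the cached balance by just the new amount,
        -- rather than re-summing the whole history.
        update balances set amt = amt + new.amt
          where acct = new.acct;
        return new;
      end;
    $$ language plpgsql;

    create trigger txn_insert
      after insert on txns
      for each row execute procedure bump_balance();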

> > b) The code pulls individual records from the DB to satisfy
> >    calculations, so that GnuCash has to do a lot of iterating
> >    where the loops contain DB queries, That'll Be Slow.
> 
> Ugh.

Indeed.  Which is why there need to be both design and prototyping
efforts...
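
To make the contrast concrete (a purely hypothetical sketch, reusing
the invented txns table from above):

    -- Slow: the application loops, issuing one query per record,
    -- paying a round trip to the DB each time.
    select amt from txns where txn_id = 1;
    select amt from txns where txn_id = 2;
    -- ...and 498 more of the same...

    -- Fast: describe the whole set once; the DB returns all the
    -- rows (or just the total) in a single round trip.
    select sum(amt) from txns
     where acct = 'Checking'
       and date between '20000101' and '20000909';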

> > The current set of data structures essentially present the database
> > as a "network" or "hierarchical" database which you walk through in
> > order to calculate/display stuff.
> >
> > An SQL system does _not_ work efficiently for that approach; it
> > expects a somewhat different abstraction where you _describe_ the
> > data that you want, as with:
> >    select date, amt, descr from txns where
> >      date between '20000101' and '20000909' and
> >      acct = 'Checking';
> > which returns a set.
> >
> > Submitting one query that returns 500 records is _vastly_ more
> > efficient than submitting 500 queries that each return 1 record,
> > so that quite a lot of things need to change to reflect the new
> > sort of "data paths."
> 
> Here's hoping that it does not prove impossible.

I think it's more than an overnight process; it would be wishful
thinking to expect that the first iteration of the DB schema would be
perfectly suitable.
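
Purely as a hypothetical illustration of what a first cut might look
like (invented names, and ignoring splits, currencies, and much else a
real schema would need):

    create table accounts (
      name      varchar(80) primary key,
      acct_type varchar(20)        -- asset, liability, income, ...
    );

    create table txns (
      txn_id  integer primary key,
      date    date,
      acct    varchar(80) references accounts(name),
      amt     numeric(12,2),
      descr   varchar(200)
    );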
--
(concatenate 'string "cbbrowne" "@ntlug.org") <http://www.hex.net/~cbbrowne/>
"Look, would it save you all this bother if I just gave up and went
mad now?"  -- Arthur Dent