File store (was Re: Salutations)

David Merrill dmerrill@lupercalia.net
Mon, 11 Dec 2000 08:04:10 -0500


On Sun, Dec 10, 2000 at 10:40:03PM -0600, Christopher Browne wrote:
> On Mon, 11 Dec 2000 11:32:49 +1000, the world broke into rejoicing as
> "Phillip Shelton" <shelton@usq.edu.au>  said:
> > How does all of this affect the `closing the books'.  If the books are
> > `close-able' then maybe we do not have to read the last 10 years worth of
> > data in at once?
> > 
> > Will the closing of the books be easier or harder with a DB?
> 
> It becomes a decreasingly relevant issue with a "proper database."
> 
> A prime reason why you _want_ to "close the books" is the fact that as
> the amount of data grows, the "document" that is "the books" gets large
> and unmanageable.

I'm assuming there are some people here who have actual accounting
experience, so let's find out what there is that needs to be done,
*besides* archiving the data, at the end of the period. There
certainly isn't any need to move data from table to table just so it
doesn't show up in queries. That can be far more efficiently done by
using a field to indicate which period the entry belongs in.
 
> With data being stored in a more sophisticated DB, you don't forcibly
> _need_ to close the books; queries hit the portions that are relevant,
> so that having 10 years worth in the DB doesn't make it desperately
> slow.
> 
> That being said, having a DBMS around will require a fair bit of
> planning to ensure that all the things that need to be done to the
> data remain reasonably efficient.
> 
> After all, if:
> a) Account balances keep having to be repetitively calculated from the
>    beginning of time until now, That'll Be Slow.
> 
> b) The code pulls individual records from the DB to satisfy calculations,
>    so that GnuCash has to do a lot of iterating where the loops contain
>    DB queries, That'll Be Slow.
> 
> The current set of data structures essentially present the database
> as a "network" or "hierarchical" database which you walk through in
> order to calculate/display stuff.
> 
> An SQL system does _not_ work efficiently for that approach; it
> expects a somewhat different abstraction where you _describe_ the
> data that you want, as with:
>    select date, amt, descr from txns where
>      date between "20000101" and "20000909" and
>      acct = "Checking";
> which returns a set.

So we should maintain a table with totals (and probably some other
statistics yet to be identified) for each period. These data could be
calculated at the end of the period, or perhaps it would be better to
maintain them as we go. But there will definitely need to be such a
table. I'll put one in the schema I'm working on.
 
> Submitting one query that returns 500 records is _vastly_ more efficient
> than submitting 500 queries that each return 1 record, so that quite a
> lot of things need to change to reflect the new sort of "data paths."

Yes, there is quite a bit of overhead, both within the db and over the
wire, to setting up, executing, and returning the results of a query.


-- 
Dr. David C. Merrill                     http://www.lupercalia.net
Linux Documentation Project                dmerrill@lupercalia.net
Collection Editor & Coordinator            http://www.linuxdoc.org
                                       Finger me for my public key

Q:	How many WASPs does it take to change a lightbulb?
A:	One.