DB design document

David Merrill dmerrill@lupercalia.net
Sun, 17 Dec 2000 14:36:48 -0500


On Sun, Dec 17, 2000 at 10:18:02AM -0600, Bill Gribble wrote:
> On Sat, Dec 16, 2000 at 12:29:50PM -0500, David Merrill wrote:
> > Unfortunately, however, allowing *very* large text fields has serious
> > drawbacks in databases that it does not have in XML. You pay a price
> > in terms of storage space and indexing, which translates to query
> > execution speed. Not just on the record with large entries, but on all
> > of them.
> 
> We need to be more specific about what we need in terms of query
> efficiency.  It may make sense to use one storage/indexing strategy
> for non-text fields and a completely different one for text fields.
> It's my opinion that having arbitrarily long text fields is a
> prerequisite that other factors need to work around.  

I have gone through the db and changed almost all of the character
fields text, except for UUIDs which really should remain 32 characters
fixed. There are only a few fields where I kept a fixed or restricted
length character format, and for those I have good reasons. For
example, fields that are intended to be mnemonics, e.g., ticker
symbols. See if your needs are now met.

> What I haven't seen here is any questions about how gnucash uses its
> data.  I may be mistaken, but I would think that how the data is
> searched and processed would be at least as important for the database
> design as the types of its fields.
 
It is at least as important, yes. I just started with table schemas
because they help me visualize the entities. I am beginning to give
more thought now to the API and how it should be exposed to the world.
I have already done some work on algorithms, including maintenance of
running balances, checkpointing, and an audit trail.

It's a cart-and-horse thing. Starting with table design usually
simplifies and gives context to other decisions for me and many other
database designers.

What thoughts do you have on how the data should be "searched and
processed"?

> Also, I'm concerned that there hasn't been any discussion about how
> the database backend interacts with the engine Query API.  I think
> there's been consensus among the gnucash developers for a while that
> when we go to a real database backend, we will need to implement some
> kind of query caching system to allow for a single gnucash 'Query' to
> be processed in multiple parts by the SQL backend (this is to prevent
> the GUI from sucking huge amounts of data if it can only fit 20-50
> transactions in a single window).  Presumably this will have some side
> effects on the design of the database schema.

No effect on the schema, but definitely an effect on the API. Data
requests will have to specify something like "get me 100 transactions
against the 'Checking' account starting at 1/1/2000" and get back just
those 100 transactions.

The entine should decide how many records to retrieve, and feel free
to cache them.

This is much simpler to implement, and therefore much more robust,
than getting into server-side cursors and such.

-- 
Dr. David C. Merrill                     http://www.lupercalia.net
Linux Documentation Project                dmerrill@lupercalia.net
Collection Editor & Coordinator            http://www.linuxdoc.org
                                       Finger me for my public key

Under the full moon light we dance
Spirits dance, we dance
Joining hands, we dance
Joining souls rejoice!
		-- Karen Beth