DB design document

David Merrill dmerrill@lupercalia.net
Sat, 16 Dec 2000 13:26:55 -0500


On Sat, Dec 16, 2000 at 09:52:14AM -0600, Rob Browning wrote:
> David Merrill <dmerrill@lupercalia.net> writes:
> 
> > Please take a look at the latest revision of my design document and
> > give me some feedback.

Cool.

> Finally got a chance to glance at it and to catch up on the current
> discussion.  Most of the points I was going to mention "first-pass"
> other people already brought up, but I had a few other bits:
> 
> * Though we've discussed authorization/encryption, I suspect that it
>   might end up being a very installation-dependent thing, and
>   something that we might want to consider abstracting/modularizing so
>   that gnucash can be customized to accommodate local policy.

Good idea.

> * You've made some comments that make it sound like you're focusing on
>   just requiring PostgreSQL, and relying on some of its particular
>   features -- though I may have just misunderstood.  In any case, the
>   issue is still relevant.  What do we think about that?
> 
>   We have a few options I can see
> 
>     * try to be DB agnostic and only use very standard SQLisms.  This
>       means that there will be limitations in how much work we can
>       have the DB handle.  More of the computation will *have* to be
>       done in the client.
> 
>     * choose one DB and go whole hog - use all its clever tricks and
>       features to tailor the DB very closely to what we need.
> 
>     * design an API that's exactly what we need, regardless of the DB
>       involved, and plan to just have a server-side gnucash daemon
>       that manages the DB interface, dealing with any limitations of
>       the DB on the server-side before forwarding data back and forth
>       -- for example, this means that you can pretend like you have a
>       "rational" implementation.  It also means that you can have more
>       flexibility in other areas because you can perform actions on
>       the server side.  This approach does, however, have plenty of
>       drawbacks...

My thoughts, how they ramble...

I'm leaning toward approach #3: implementing an API that could have
any db engine behind it. Initially the db would be Postgres, to avoid
the difficulty of writing a db abstraction layer. A year from now I
hope we will have more powerful db-agnostic layers available, and we
can revisit the topic then.

This allows me to focus on the API, and what's behind that API can be
improved and expanded later to support multiple databases.

I'm not going to go "whole hog" for any database, but I think I might
do a database-specific implementation of the rational number storage,
to take advantage of the user-defined datatype feature of Postgres.

I will keep my SQL as standard as possible.

> * For now, we'd be fine with just a really good simple data store.
>   Even if we just had an embedded PostgreSQL that could just
>   read/write accounts/splits/transactions to a local sandbox, that
>   would be a big win over XML as the primary storage medium for
>   single users.  It looks like you're thinking much longer-term, and I
>   think that's actually preferable, but I just wanted to remind people
>   that we could also benefit from a shorter-term, intermediate
>   solution.

That will be the nature of the first implementation. I always design
much farther than I implement, so that when the later features get
implemented they have already been planned for. At some point,
probably weeks from now, I will put much more serious thought into
drawing the line and deciding how much of the pie-in-the-sky will
actually be in the 1.0 version.

> * We haven't talked a lot about the overall communication setup, and
>   that's probably because no one really knows what we want/need yet,
>   but my mention of the proxy above is a part of that discussion.  I
>   think Linas had always expected that the local engine would
>   eventually become something more like an active cache of the remote
>   data so that local operations aren't too slow.  Of course that
>   raises all kinds of synchronization issues, but always fetching
>   everything remotely doesn't make those completely go away, just
>   narrows the point of failure.  Anyway, overall, we need to be
>   thinking about which of these setups should be our goal (very rough
>   sketch):

I am too ignorant to partake meaningfully in that discussion, though
I'm not uninterested in it. That's the communications setup I mean,
not synchronization; on synchronization I already have some ideas, and
they're in my design document. They are very simplistic right now, but
the topic is definitely on the table.

>   [ parens group things in a single process, double bars indicate
>     physical machine boundaries, and dotted lines indicate remote
>     connections. ]
> 
>     (server) <-> (gnc_proxy) || <- - - - - -> || (gnc_engine <-> ui)
> 
>     (server) || <- - - - - -> || (gnc_engine <-> ui)
> 
>   and we need to be thinking about what we *require* be done where.
>   I.e. does the DB just serve up raw records or do we require it to be
>   able to "do the math" in some cases?

The first one, I think, although gnc_proxy needs much more definition.
Presumably the functionality of the current gnc_engine will be split
along some logical lines, so that a transaction-save call into the
engine then calls transaction save in the proxy, which in turn calls a
stored procedure in the db. Then we can also do whatever we need wrt
securing the data over the wire.

-- 
Dr. David C. Merrill                     http://www.lupercalia.net
Linux Documentation Project                dmerrill@lupercalia.net
Collection Editor & Coordinator            http://www.linuxdoc.org
                                       Finger me for my public key

For Mine is the ecstasy of the spirit
And Mine also is joy on earth.
For My law is love unto all beings.
		-- from The Charge of the Goddess, Doreen Valiente