DB design document

David Merrill dmerrill@lupercalia.net
Sat, 16 Dec 2000 15:40:10 -0500


On Sat, Dec 16, 2000 at 03:19:48PM -0500, Jean-David Beyer wrote:
> David Merrill wrote (in part):
> > 
> > On Sat, Dec 16, 2000 at 09:52:14AM -0600, Rob Browning wrote:
> > > David Merrill <dmerrill@lupercalia.net> writes:
> > >
> > > > Please take a look at the latest revision of my design document and
> > > > give me some feedback.
> 
> I just looked at it and it just showed me how much I do not know about
> double-entry bookkeeping. I am familiar with the Quicken model (if I may
> call it that), and have no predisposition to it. But the GnuCash model
> requires a different mind set. Is there any way to get up to speed on
> this? Books, on-line documentation (other than examining source code),
> etc.? I assume so, but pointers would be appreciated.

Any basic accounting text will give you the concepts you need.

> I suppose one way would be to play around with an existing distribution
> of GnuCash. I tried that some months ago, but could not successfully
> import my Quicken stuff. Probably lack of understanding, but possibly I
> picked up a development version instead of a "stable" one. Any
> recommendations on a version to try? I run Red Hat Linux 6.0 with a lot
> of RPMs applied. E.g., I have glibc-2.1.3-21.

I successfully import several years' worth of data from Quicken. Which
version you try depends on your willingness to debug. Don't put your
only copy of live data into the development versions, of course.

> > > Finally got a chance to glance at it and to catch up on the current
> > > discussion.  Most of the points I was going to mention "first-pass"
> > > other people already brought up, but I had a few other bits:
> > >
> > > * Though we've discussed authorization/encryption, I suspect that it
> > >   might end up being a very installation-dependent thing, and
> > >   something that we might want to consider abstracting/modularizing so
> > >   that gnucash can be customized to accomodate local policy.
> > 
> > Good idea.
> 
> It looks so to me. I know that the DBMSs with which I am familiar each
> try to do their own, and in their own ways. Almost certainly not
> compatible. So if GnuCash wants to enforce yet another policy, it will
> need a way to either use the underlying tools of the dbms, or find a way
> to get around them.

We're going to need a library on the db side, so it will be
architected to provide a standard api regardless of the actual db it
is running against.

Initially I'm going to code against Postgres, but eventually will
support other dbs. This will not affect the api which the engine talks
to.

> As far as encryption is concerned, what encryption are you talking
> about? The encryption that might well be desired in the communication
> between the clients and the servers (however defined, but I am thinking
> of the dbms being on a server and the clients being, perhaps elsewhere)
> just to authenticate users and determine their permissions? Or all
> communications between client and server? Or are you considering
> encrypting the stuff actually stored in the server's database (that
> could be very costly if everytime you want to access a relation or an
> index, you have to decrypt it first, and if you are entering, updating,
> or deleting, you then have to encrypt them all again)? In that case, the
> server may already have an encryption mechanism, surely (by Murphy's
> Law) different for each DBMS. Furthermore, it would not be needed if you
> could trust the server's access policies. You would need a way to
> configure those policies, of course.

I'm not considering encrypting the data within the db. Initially I
will be using a local db, so encryption is something we have time to
design.

> While not pushing IBM's DB2 UDB (though that is the one I am using on my
> machine), I know that the IPC between clients and servers is done with
> TCP/IP. There is a mechanism for having the server authenticate the
> user, and this authentication dialog can be configured to be encrypted,
> though that is not the default. This is kind-of funny since I run the
> clients and server on the same machine, but I am pretty sure I am
> atypical in this respect. I do not know how to hijack their software so
> I could stick ssh or even just ssl in series between DB2's run-time
> libraries at the client and server end, but perhaps someone else does.
> Or perhaps they already encrypt all this, though I doubt it. Do most
> other client-server DBMSs use TCP/IP to communicate between their
> clients and servers? If they do, perhaps the solution to this is already
> known. I know that DB2 is so fussy about who can use a database, that
> the super-user is precluded from doing so because of the risk of
> violating the Unix and Linux security model. The servers run as ordinary
> users, and the data are stored either in regular Unix or Linux files, or
> on raw partitions managed by the dbms itself and ignoring the file
> system. Of course, a determined superuser can do anything.

With all of the source under our control, we can do things you can't
do with DB2.

> > > * You've made some comments that make it sound like you're focusing on
> > >   just requiring PostgreSQL, and relying on some of it's particular
> > >   features -- though I may have just misunderstood.  In any case, the
> > >   issue is still relevant.  What do we think about that?
> > >
> > >   We have a few options I can see
> > >
> > >     * try to be DB agnostic and only use very standard SQLisms.  This
> > >       means that there will be limitations in how much work we can
> > >       have the DB handle.  More of the computation will *have* to be
> > >       done in the client.
> 
> I have moved my main application form postgreSQL to Informix-SE to IBM
> DB2 UDB. I do not believe there is much hope for success in this case.
> While the SQL might be standard, some APIs are still very different from
> others. For example, a trivial piece of SQL in one of my libraries goes
> like this (I happen to write in C++, but that should not make much
> difference here):
> 
> string dbBase::
> ticker_from_company_id(const long int company_id, const db_dates& date)
> {
>     EXEC SQL BEGIN DECLARE SECTION;
>     long int     tfci_co_id = company_id;
>     char         tfci_date[DB_DATE_SIZE + 1];
>     char         tfci_ticker[DB_TICK_SIZE + 1];
>     EXEC SQL END DECLARE SECTION;
> 
>     const string s_date = date.c_string();
> 
>     (void) strncpy(tfci_date, s_date.c_str(), DB_DATE_SIZE);
>     tfci_date[DB_DATE_SIZE] = EOS;
> 
>     EXEC SQL
> 	SELECT tick_symbol INTO :tfci_ticker
> 	  FROM tick
> 	 WHERE tick_company_id = :tfci_co_id
> 	   AND tick_f_date <= :tfci_date
> 	   AND :tfci_date <= tick_l_date;
>     if(CHECKERR("SELECT tick_symbol")) {
> 	cerr << "ticker_from_company_id(" << company_id << ", "
> 	     << date << "): could not find ticker." << endl;
> 	return "[NONE]";
>     }
>     if(db_check(NoData)) {
> 	cout << "ticker_from_company_id(" << company_id << ", "
> 	     << date << "): could not find ticker." << endl;
> 	cout << "\ttfci_date `" << tfci_date << "'." << endl;
> 	return "[NONE]";
>     }
> 
>     return tfci_ticker;
> } // END OF string dbBase::ticker_from_company_id(const string&
> company_id, ...
> 
> The EXEC SQL BEGIN DECLARE SECTION; ... EXEC SQL END DECLARE SECTION;
> kind of stuff varies a lot from one dbms to another.
> 
> While most dbms systems require the (standard) introduction EXEC SQL in
> the host language to alert the compiler (or preprocessor) that something
> outside the host language is going on, they do not all do it that way,
> unfortunately.
> 
> Furthermore, the way to reference host variables, in this case with a
> leading colon, as in :tfci_co_id, varies from one dbms to another.
> 
> Finally, the error handling, such as the CHECKERR and db_check functions
> varies from one dbms to another. So this kind of detail must be hidden
> behind an API or else it will be necessary to choose one dbms and reject
> the others.

I'm not going to worry about this initially. All your points are
valid, but too much to deal with right now.

> > >     * choose one DB and go whole hog - use all it's clever tricks and
> > >       features to tailor the DB very closely to what we need.
> 
> I have done that in the past. Never again! When support for a dbms is
> withdrawn, or when the decision to use the original dbms must be changed
> for other reasons, it is just too difficult to make a switch. It is
> difficult even when the interface to the dbms is hidden behind an API.
> Rewriting the API is not that much trouble, usually. The problems arise
> with all the utility programs that come along with the dbms. I tend to
> drive them with shell scripts, some quite lengthy (I have over a
> megabyte of shell scripts to help populate one database with which I am
> working, and they would all need to be changed in intellectually boring,
> but tedious, ways to switch from one dbms to another. I have done it
> twice.).
> 
> But even if one could, in confidence, pick one dbms and adopt it without
> reservation, the discipline of building an API to separate it from the
> rest of the application generally results in a cleaner, more
> maintainable system.

Yep.

> > >
> > >     * design an API that's exactly what we need, regardless of the DB
> > >       involved, and plan to just have a server-side gnucash daemon
> > >       that manages the DB interface, dealing with any limitations of
> > >       the DB on the server-side before forwarding data back and forth
> > >       -- for example, this means that you can pretend like you have a
> > >       "rational" implementation.  It also means that you can have more
> > >       flexibility in other areas because you can perform actions on
> > >       the server side.  This approach, does, however, have plenty of
> > >       drawbacks...
> > 
> FWIW, I see no good alternative to this third option.

Nor do I.

-- 
Dr. David C. Merrill                     http://www.lupercalia.net
Linux Documentation Project                dmerrill@lupercalia.net
Collection Editor & Coordinator            http://www.linuxdoc.org
                                       Finger me for my public key

I stopped believing in Santa Claus when I was six. Mother took me to see him in a department store and he asked for my autograph.
		-- Shirley Temple