DB design document

Sat, 16 Dec 2000 15:19:48 -0500

David Merrill wrote (in part):
> 
> On Sat, Dec 16, 2000 at 09:52:14AM -0600, Rob Browning wrote:
> > David Merrill <dmerrill@lupercalia.net> writes:
> >
> > > Please take a look at the latest revision of my design document and
> > > give me some feedback.

I just looked at it and it just showed me how much I do not know about
double-entry bookkeeping. I am familiar with the Quicken model (if I may
call it that), and have no predisposition to it. But the GnuCash model
requires a different mind set. Is there any way to get up to speed on
this? Books, on-line documentation (other than examining source code),
etc.? I assume so, but pointers would be appreciated.

I suppose one way would be to play around with an existing distribution
of GnuCash. I tried that some months ago, but could not successfully
import my Quicken stuff. Probably lack of understanding, but possibly I
picked up a development version instead of a "stable" one. Any
recommendations on a version to try? I run Red Hat Linux 6.0 with a lot
of RPMs applied. E.g., I have glibc-2.1.3-21.

> > Finally got a chance to glance at it and to catch up on the current
> > discussion.  Most of the points I was going to mention "first-pass"
> > other people already brought up, but I had a few other bits:
> >
> > * Though we've discussed authorization/encryption, I suspect that it
> >   might end up being a very installation-dependent thing, and
> >   something that we might want to consider abstracting/modularizing so
> >   that gnucash can be customized to accomodate local policy.
> 
> Good idea.

It looks so to me. I know that the DBMSs with which I am familiar each
try to do their own, and in their own ways. Almost certainly not
compatible. So if GnuCash wants to enforce yet another policy, it will
need a way to either use the underlying tools of the dbms, or find a way
to get around them.

As far as encryption is concerned, what encryption are you talking
about? The encryption that might well be desired in the communication
between the clients and the servers (however defined, but I am thinking
of the dbms being on a server and the clients being, perhaps elsewhere)
just to authenticate users and determine their permissions? Or all
communications between client and server? Or are you considering
encrypting the stuff actually stored in the server's database (that
could be very costly if everytime you want to access a relation or an
index, you have to decrypt it first, and if you are entering, updating,
or deleting, you then have to encrypt them all again)? In that case, the
server may already have an encryption mechanism, surely (by Murphy's
Law) different for each DBMS. Furthermore, it would not be needed if you
could trust the server's access policies. You would need a way to
configure those policies, of course.

While not pushing IBM's DB2 UDB (though that is the one I am using on my
machine), I know that the IPC between clients and servers is done with
TCP/IP. There is a mechanism for having the server authenticate the
user, and this authentication dialog can be configured to be encrypted,
though that is not the default. This is kind-of funny since I run the
clients and server on the same machine, but I am pretty sure I am
atypical in this respect. I do not know how to hijack their software so
I could stick ssh or even just ssl in series between DB2's run-time
libraries at the client and server end, but perhaps someone else does.
Or perhaps they already encrypt all this, though I doubt it. Do most
other client-server DBMSs use TCP/IP to communicate between their
clients and servers? If they do, perhaps the solution to this is already
known. I know that DB2 is so fussy about who can use a database, that
the super-user is precluded from doing so because of the risk of
violating the Unix and Linux security model. The servers run as ordinary
users, and the data are stored either in regular Unix or Linux files, or
on raw partitions managed by the dbms itself and ignoring the file
system. Of course, a determined superuser can do anything.
> 
> > * You've made some comments that make it sound like you're focusing on
> >   just requiring PostgreSQL, and relying on some of it's particular
> >   features -- though I may have just misunderstood.  In any case, the
> >   issue is still relevant.  What do we think about that?
> >
> >   We have a few options I can see
> >
> >     * try to be DB agnostic and only use very standard SQLisms.  This
> >       means that there will be limitations in how much work we can
> >       have the DB handle.  More of the computation will *have* to be
> >       done in the client.

I have moved my main application form postgreSQL to Informix-SE to IBM
DB2 UDB. I do not believe there is much hope for success in this case.
While the SQL might be standard, some APIs are still very different from
others. For example, a trivial piece of SQL in one of my libraries goes
like this (I happen to write in C++, but that should not make much
difference here):

string dbBase::
ticker_from_company_id(const long int company_id, const db_dates& date)
{
    EXEC SQL BEGIN DECLARE SECTION;
    long int     tfci_co_id = company_id;
    char         tfci_date[DB_DATE_SIZE + 1];
    char         tfci_ticker[DB_TICK_SIZE + 1];
    EXEC SQL END DECLARE SECTION;

    const string s_date = date.c_string();

    (void) strncpy(tfci_date, s_date.c_str(), DB_DATE_SIZE);
    tfci_date[DB_DATE_SIZE] = EOS;

    EXEC SQL
	SELECT tick_symbol INTO :tfci_ticker
	  FROM tick
	 WHERE tick_company_id = :tfci_co_id
	   AND tick_f_date <= :tfci_date
	   AND :tfci_date <= tick_l_date;
    if(CHECKERR("SELECT tick_symbol")) {
	cerr << "ticker_from_company_id(" << company_id << ", "
	     << date << "): could not find ticker." << endl;
	return "[NONE]";
    }
    if(db_check(NoData)) {
	cout << "ticker_from_company_id(" << company_id << ", "
	     << date << "): could not find ticker." << endl;
	cout << "\ttfci_date `" << tfci_date << "'." << endl;
	return "[NONE]";
    }

    return tfci_ticker;
} // END OF string dbBase::ticker_from_company_id(const string&
company_id, ...

The EXEC SQL BEGIN DECLARE SECTION; ... EXEC SQL END DECLARE SECTION;
kind of stuff varies a lot from one dbms to another.

While most dbms systems require the (standard) introduction EXEC SQL in
the host language to alert the compiler (or preprocessor) that something
outside the host language is going on, they do not all do it that way,
unfortunately.

Furthermore, the way to reference host variables, in this case with a
leading colon, as in :tfci_co_id, varies from one dbms to another.

Finally, the error handling, such as the CHECKERR and db_check functions
varies from one dbms to another. So this kind of detail must be hidden
behind an API or else it will be necessary to choose one dbms and reject
the others.

> >     * choose one DB and go whole hog - use all it's clever tricks and
> >       features to tailor the DB very closely to what we need.

I have done that in the past. Never again! When support for a dbms is
withdrawn, or when the decision to use the original dbms must be changed
for other reasons, it is just too difficult to make a switch. It is
difficult even when the interface to the dbms is hidden behind an API.
Rewriting the API is not that much trouble, usually. The problems arise
with all the utility programs that come along with the dbms. I tend to
drive them with shell scripts, some quite lengthy (I have over a
megabyte of shell scripts to help populate one database with which I am
working, and they would all need to be changed in intellectually boring,
but tedious, ways to switch from one dbms to another. I have done it
twice.).

But even if one could, in confidence, pick one dbms and adopt it without
reservation, the discipline of building an API to separate it from the
rest of the application generally results in a cleaner, more
maintainable system.
> >
> >     * design an API that's exactly what we need, regardless of the DB
> >       involved, and plan to just have a server-side gnucash daemon
> >       that manages the DB interface, dealing with any limitations of
> >       the DB on the server-side before forwarding data back and forth
> >       -- for example, this means that you can pretend like you have a
> >       "rational" implementation.  It also means that you can have more
> >       flexibility in other areas because you can perform actions on
> >       the server side.  This approach, does, however, have plenty of
> >       drawbacks...
> 
FWIW, I see no good alternative to this third option.

-- 
 .~.  Jean-David Beyer           Registered Linux User 85642.
 /V\                             Registered Machine    73926.
/( )\ Shrewsbury, New Jersey
^^-^^ 1:50pm up 11 days, 22:38, 2 users, load average: 2.15, 2.14, 2.10