Trial Balloon: A new DataStore Architecture?

31 Oct 2000 09:22:45 -0600

Derek Atkins <warlord@MIT.EDU> writes:

> So, any comments?

Yep.  How about "You're more or less exactly right." :>

Pieces of this discussion have popped up on the list in various
incarnations over the years, and we all know this kind of thing is
needed, but it's a big job, and no one's exactly sure how it should be
implemented.

A lot of the job is related to the "when and how should we start
supporting an SQL backend" cyclical discussion, and before doing
anything, we really need to ask and answer the question

  Do we need a really fancy, general purpose solution, or can we get
  by with just switching to an SQL-ish backend across the board (using
  an embedded MySQL or PostgreSQL or GNOME-DB) for the single-user,
  personal implementation?

Even if the answer was "SQL across the board", there would still be a lot
of surrounding infrastructure that we'd need (that we already need) that
falls outside the scope of SQL.  Some of this you've mentioned, things
like ways for clients to be notified when important things change,
although even this can be worked around (perhaps store "dirty-bits" in
the database and require clients to "poll them" every so often, and
before critical operations).  No matter what solution you choose wrt
notification, there are going to be serious design trade-offs
(complexity vs performance vs semantics vs...).

Further, real-world constraints are *going* to be a big issue.  For large
datasets, and for small-business clients (and up), we *will* have to
support SQL, and SQL has particular performance qualities (that even
vary from implementation to implementation) that we'll have to
consider.

In the end, I think Linas (and the project in general) have always
felt that the engine interface should eventually be the "wrapper" that
provides the abstraction or API on top of the various backends, but to
actually be that, it's going to need some more modifications.  For
example, it's convenient to get a whole "split list" as the result of
a query, but if you really want to handle fine-grain locking and
multi-user modification issues, you probably need to switch to a more
opaque "iterator" interface or similar.
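
Here's roughly what I mean by the difference between the two styles.
This is only a sketch: it's in Python with sqlite3 standing in for a
backend, and the splits table, query_split_list, and SplitIterator are
all hypothetical names, not the engine's actual API.

```python
import sqlite3

QUERY = "SELECT id, amount FROM splits WHERE account = ? ORDER BY id"


def make_demo_db():
    # Tiny in-memory dataset so the two styles can be compared.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE splits (id INTEGER PRIMARY KEY, account TEXT, amount INTEGER)"
    )
    rows = [("checking", 10), ("checking", 20), ("savings", 5),
            ("checking", 30), ("checking", 40), ("checking", 50)]
    conn.executemany("INSERT INTO splits (account, amount) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def query_split_list(conn, account):
    # "Split list" style: every matching split is materialized up front
    # and handed to the caller in one piece.
    return conn.execute(QUERY, (account,)).fetchall()


class SplitIterator:
    """Opaque handle: the caller only ever asks for the next split."""

    def __init__(self, conn, account, batch_size=2):
        self._cursor = conn.execute(QUERY, (account,))
        self._batch_size = batch_size
        self._buffer = []

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            # Batch boundary: the one place where a real backend could
            # take locks or revalidate against other users' changes.
            self._buffer = self._cursor.fetchmany(self._batch_size)
            if not self._buffer:
                raise StopIteration
        return self._buffer.pop(0)
```

Usage would look like "for split in SplitIterator(conn, 'checking'):
...".  The caller gets the same splits in the same order as with the
list version, but never holds the whole result at once, which leaves
the backend free to decide how much to fetch and when.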

Finally, did you see my recent message about adding a "top-level" data
structure?  I think this is one of the changes we'd need to make to
start moving in the direction you've suggested, and I think that
perhaps the Session* semantics need to become a richer abstraction of
the things you can do with your dataset, and may need to be folded
into the "top-level" datastructure, or perhaps Sessions remain the
top-level datastructure, and we just enhance them (and maybe change
their name to something more obviously "top-level").

> PS: What I'm really talking about here is building our own user-space
> network file system with client-side cache.  Perhaps we could use
> ideas from existing FS designs, such as AFS or Coda?  Thinking about
> the problem in this way allows us to re-use existing solutions, or at
> least base our solutions on their solutions to the same problems.

I tend to think you're talking more about distributed multi-user
databases than network file systems, but the line's getting fuzzier
all the time these days.  We *do* have a relational dataset, though,
so perhaps a straightforward filesystem isn't really a perfect
match.

On the data communication side, there's also CORBA to consider.

(Fingers tired... stopping typing now :>)
-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930