DB design document

17 Dec 2000 17:21:27 -0600

David Merrill <dmerrill@lupercalia.net> writes:

> > We need to be more specific about what we need in terms of query
> > efficiency.  It may make sense to use one storage/indexing strategy
> > for non-text fields and a completely different one for text fields.
> > It's my opinion that having arbitrarily long text fields is a
> > prerequisite that other factors need to work around.  

[...]

> What thoughts do you have on how the data should be "searched and
> processed"?

I think what Bill's getting at is that these text fields, lets take
the "memo" field for example, may very rarely be the subject of
search, or any other operation except setting/getting, and all of us
feel pretty strongly that for a field like this, arbitrary length may
very well be more important than efficiency.

For example, if we did decide that setting/getting for the memo
fields, was the only thing that mattered 99.9% of the time, it might
make sense to store memos in some alternate way in the DB (other than
just in a giant fixed-length column which would be mostly empty) so
that their highly variable lengths don't screw up the efficiency of
operations on the more heavily indexable/computable numeric fields,
and don't waste so much space.

Now I'm not proposing any solution here, and even my example above
might be bogus, but I can imagine a DB that allows an arbitrary length
field (say a blob) that wouldn't be very efficient if stored in a
table with a bunch of other fixed length fields.  In that situation,
perhaps we make a table of all the fixed length (the more
computational/indexable fields), and another table of blobs that are
indexed to the first table by GUID.

I think this is the kind of question Bill's getting at, and regardless
it's certainly one I'd like to see discussed.

-- 
Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930