Offtopic, database design; Was: Re: Get PostgreSQL installed as part of the distros

Al Snell alaric@alaric-snell.com
Mon, 30 Oct 2000 18:57:23 +0000 (GMT)


On Mon, 30 Oct 2000, Patrick Spinler wrote:

> A hint as to why: you can't balance an on-disk tree the same as an in
> memory tree.  Ergo, they don't get balanced often (typically it's a
> manual operation a DBA has to do).  Ergo, with a sequentially increasing
> key you quickly have a badly balanced index, shit poor performance, and
> the need for frequent database service interruptions to rebalance that
> table.  It's similar for hash structured indexes (think hash bucket
> overflows and overflow chains, here) and worst of all for ISAM's.

I beg to disagree; on-disk index trees should be designed to autobalance,
as in they do not need balancing since they use things like B+ trees that
avoid unbalancing in the first place! And hash tables, by their very
nature, will not care about the sequentiality of the key data if you think
about it. The hash of a sequential key won't be anywhere near sequential -
or exhibit any real patterns - unless the hash function sucks.

And ISAMs aren't an index format, they're a data format. Most databases
use a variation on ISAM. The idea is that you just dump records onto the
end of the data file - the Sequential Access - then add index entries -
the I stands for Indexed - to enable rapid finding of the correct file
offset. More advanced versions intelligently reuse the gaps left by
deleted records rather than always sequentially appending; more primitive
versions (eg, PostgreSQL) require a "vacuum" operation to compact the file
to reuse space.

> My advice: if you think you need a sequence in a table, examine your
> assumptions about the data carefully.  Why do you need the data
> retrieved in sequence ?  The only case I have ever encountered have been
> transaction timestamps.

The use of a sequence is just to ensure uniqueness. Reading data in a
sequence is purely a matter of ORDER BY and cursors, not index structures!
Having the ORDER BY an indexed field means the database server can do a
limited tree traverse to get the records in order, and thus save having to
perform a sorting algorithm!

> If you still really need a sequence in a table, I suggest taking a
> sequence and deriving a psuedo-randomized integer from it as a bitswap
> calculation, and using that psuedo-randomized number as the primary key.

That's what a hashing function (for a hashed index on a sequential key)
should do automatically, anyway!

What evil RDBMS have you been using?!?!?

ABS

-- 

 http://www.alaric-snell.com/  http://RF.Cx/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software