GnuCash Daily Source Diff

Fri, 26 Oct 2001 10:38:30 -0500

On Thu, Oct 25, 2001 at 12:22:16AM -0700, Dave Peticolas was heard to remark:
> On Fri, 2001-10-19 at 12:00, Linas Vepstas wrote:
> > I suspect, am not sure, that my biggest design mistake was making the
> > guid the 'primary key'.  I beleive that there might be a big performance
> > boost are (or at least, a more consistent behaviour) if the iguid was
> > made primary key.
> > 
> > sql db's use the 'primary key' is the primary index into whatever
> > hash-table/b-tree they use to index the data. Clearly, an 'int' compare
> > would be much faster than a string compare.  I assumed, falsely, I
> > surmise, that postgres would be smart enough to replace my string
> > primary key with an int, but that doesn't exactly seem to be the case.
> > So I assume that using iguid as the primary key should improve
> > performance.
> 
> Ok, so say I wanted to change the postgres backend to use the iguid
> values as primary keys. I would need to:
> 
> 1. Change the table creation sql commands to create the new tables
>    correctly.
> 2. Add update code to modify existing tables to change the primary
>    keys.
> 3. Modify sql commands to use iguids instead of guids where appropriate.

Yes.

> Anything else?

Not that I can think of ... 

but please do one thing first: design some simple test case, or hack up 
a partial implementation, and make sure that this really will improve
performance.   I've been fooled in the past, thinking that something
will make a difference, and it doesn't ... 

There are some perl scripts in the src/optional/swig directory that
measure performance. 

> > The other thing I noticed was that seemingly inconsequential changes
> > made big differences in performance, sometimes by 20% or so, sometimes
> > by factors of 3 or 5.  This seemed a rather unstable situation, and 
> > ... well, I'm not sure what to recommend.  Just be careful; seemingly
> > idempotent changes to the sql queries can make a big impact on
> > performance.
> 
> Yes, I've noticed that, too. I also noticed the documentation for the
> balance checkpoints and the note to the effect that most queries will
> end up pulling in everything due to the way that checkpointing works.
> Is that still true? 

I beleive so; I would have fixed the docs if I fixed the code ...

> How would you suggest fixing that? Using fixed-time
> intervals for checkpoints?

Well, yes, that would work!  It doesn't have to be fixed-time; just 
anything that stopped balance checkpoint intervals from overlapping 
each other.  The overlaps cause the system to ratchet its way through
all transactions as splits beloning to different intervals are pulled
in. 

But fixed time intervals, once a day or once a month, would be easiest.
I'm not sure what the optimal interval size is; I'm guessing about 100
transactions between checkpoints.

--linas

-- 
pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas@linas.org>
PGP Key fingerprint = 8305 2521 6000 0B5E 8984  3F54 64A9 9A82 0104 5933