GnuCash Daily Source Diff

Linas Vepstas linas@linas.org
Fri, 9 Nov 2001 09:20:07 -0600


Hi Dave,

May I hassle you some more?


On Fri, Nov 09, 2001 at 08:04:24AM -0600, Dave Peticolas was heard to remark:
> Index: src/backend/postgres/gncquery.c
> ===================================================================
> RCS file: /home/cvs/cvsroot/gnucash/src/backend/postgres/gncquery.c,v
> retrieving revision 1.6
> retrieving revision 1.7
> diff -u -r1.6 -r1.7
> --- src/backend/postgres/gncquery.c	2001/11/06 23:02:26	1.6
> +++ src/backend/postgres/gncquery.c	2001/11/09 11:50:23	1.7

> -   /* determine whther the query will need to reference the account
> -    * or commodity tables.  If it doesn't need them, then we can gain
> -    * a significant performance improvement by not specifying them.
> -    * The exact reason why this affects performance is a bit of a 
> -    * mystery to me ... */
> +   /* determine whether the query will need to reference certain
> +    * tables. */

I was somewhat disheartened to see this change; and would really like
to get you to put it back.  This routine would not exist at all, if I
hadn't spent a week performance tuning the sql backend.   Originally,
I specified *all* of the tables that might be needed for a query; the
code for this was very simple.   But then on a hunch, I noticed that 
including a table that wasn't actually referenced could slow performance
by a factor of 10 (!!!!).  I cursed, and then added considerable extra
complexity to te mechanism for the single reason of making sure that
only the tables that were really needed were referenced.   The above
few sentances were the only documentation that explained this
complexity.  It was hard won.   I am concerned that some future
programmer will look at this, and concule that they can safely chop
out hundreds of lines of seemingly idempotent code. 

Why its not idempotent in postgres, I don't know.  Maybe its a postgres
bug. Or feature.  A good SQL optimizer wouldn't have this problem.
But, as of right now, postgres does. 

Specifically, I refer to the 'need_account' and the 'need_entry'
booleans.  These serve no functional purpose:  You'd get exactly
the same results if you just set these to TRUE.  But there are
hundreds of lines of extra logic to compute these for the sole
reason of optimizing performance. 


>              case PR_GUID:
> -               switch (xaccGUIDType (&pd->guid.guid, session))
> +               if (!guid_equal (&pd->guid.guid, xaccGUIDNULL ()))
>                 {
> -                  case GNC_ID_ACCOUNT:
> -                     need_account = TRUE;
> -                     break;
> -                  case GNC_ID_SPLIT:
> -                     need_entry = TRUE;
> -                     break;
> -                  case GNC_ID_NONE:
> -                  case GNC_ID_NULL:
> -                  case GNC_ID_TRANS:
> -                  default:
> -                     break;
> +                 need_account = TRUE;
> +                 need_entry = TRUE;
>                 }
>                 break;

As per note above ....

> +   /* add needed explicit tables. postgres can figure out the rest. */

Bzzzt. I think this note says it all.  Postgres can figure out the 
rest, but it will crater performance as it does so.  You really, really
have got to restore the missing tables.



> -               switch (xaccGUIDType (&pd->guid.guid, session))
> +
> +               sq->pq = stpcpy (sq->pq, " (");
> +
> +               if (guid_equal (&pd->guid.guid, xaccGUIDNULL ()))
>                 {
> -                  case GNC_ID_NONE:
> -                  case GNC_ID_NULL:
> -                  default:
> -                     sq->pq = stpcpy(sq->pq, "FALSE ");
> -                     break;
> -              
> -                  case GNC_ID_ACCOUNT:
> -                     sq->pq = stpcpy(sq->pq, "gncAccount.accountGuid = '");
> -                     sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> -                     sq->pq = stpcpy(sq->pq, "' ");
> -                     break;
> -              
> -                  case GNC_ID_TRANS:
> -                     sq->pq = stpcpy(sq->pq, "gncTransaction.transGuid = '");
> -                     sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> -                     sq->pq = stpcpy(sq->pq, "' ");
> -                     break;
> -              
> -                  case GNC_ID_SPLIT:
> -                     sq->pq = stpcpy(sq->pq, "gncEntry.entryGuid = '");
> -                     sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> -                     sq->pq = stpcpy(sq->pq, "' ");
> -                     break;
> +                 sq->pq = stpcpy(sq->pq, "FALSE ");
>                 }
> -
> -               if (0 == pd->guid.sense)
> +               else
>                 {
> -                  sq->pq = stpcpy (sq->pq, ") ");
> +                 sq->pq = stpcpy(sq->pq, "gncAccount.accountGuid = '");
> +                 sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> +                 sq->pq = stpcpy(sq->pq, "' ");
> +                 sq->pq = stpcpy(sq->pq, " OR ");
> +
> +                 sq->pq = stpcpy(sq->pq, "gncTransaction.transGuid = '");
> +                 sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> +                 sq->pq = stpcpy(sq->pq, "' ");
> +                 sq->pq = stpcpy(sq->pq, " OR ");
> +
> +                 sq->pq = stpcpy(sq->pq, "gncEntry.entryGuid = '");
> +                 sq->pq = guid_to_string_buff (&pd->guid.guid, sq->pq);
> +                 sq->pq = stpcpy(sq->pq, "' ");

The above looks like another performance-cratering change.  Why search
all tables, when you can search the one that you need?  I presume you
made this change because you don't know which table to search ... but
how can you possibly not know which table to search?  Who would be
stupid enough to ask 'get me this GUId, but I don't know if its a
transaction or an account guid' ????

>              case PR_SHRS: {
>                 PINFO("term is PR_SHRS");
>                 sq->pq = stpcpy(sq->pq, 
> -                     "gncEntry.accountGuid = gncAccount.accountGuid AND "
>                       "gncAccount.commodity = account_com.commodity AND ");
>                 AMOUNT_TERM ("gncEntry.amount","account_com");

???? Don't you need to know which account is to be matched ??

--linas

-- 
pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas@linas.org>
PGP Key fingerprint = 8305 2521 6000 0B5E 8984  3F54 64A9 9A82 0104 5933