GnuCash Daily Source Diff

09 Nov 2001 13:26:06 -0800

--=-PDhsUfNXlDSIjJnGAh0E
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Fri, 2001-11-09 at 07:20, Linas Vepstas wrote:
> Hi Dave,
>=20
> May I hassle you some more?

Sure :)

> On Fri, Nov 09, 2001 at 08:04:24AM -0600, Dave Peticolas was heard to rem=
ark:
> > Index: src/backend/postgres/gncquery.c
> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> > RCS file: /home/cvs/cvsroot/gnucash/src/backend/postgres/gncquery.c,v
> > retrieving revision 1.6
> > retrieving revision 1.7
> > diff -u -r1.6 -r1.7
> > --- src/backend/postgres/gncquery.c	2001/11/06 23:02:26	1.6
> > +++ src/backend/postgres/gncquery.c	2001/11/09 11:50:23	1.7
>=20
> > -   /* determine whther the query will need to reference the account
> > -    * or commodity tables.  If it doesn't need them, then we can gain
> > -    * a significant performance improvement by not specifying them.
> > -    * The exact reason why this affects performance is a bit of a=20
> > -    * mystery to me ... */
> > +   /* determine whether the query will need to reference certain
> > +    * tables. */
>=20
> I was somewhat disheartened to see this change; and would really like
> to get you to put it back.  This routine would not exist at all, if I
> hadn't spent a week performance tuning the sql backend.   Originally,
> I specified *all* of the tables that might be needed for a query; the
> code for this was very simple.   But then on a hunch, I noticed that=20
> including a table that wasn't actually referenced could slow performance
> by a factor of 10 (!!!!).

Yes, I know, I was the one who originally pointed out to you
that removing unneeded tables would improve performance (you remember
the 10-minute startup times I saw?). However, upon reading a Postgres
book I found that in Postgres you don't actually need to use the FROM
clause at all, i.e., Postgres can figure out the tables it needs from=20
the rest of the query. So you only need to use FROM if you want to do
a self-join and need to specify a new name for a table. So the new code
only puts in FROM clauses for the self-joined commodity tables if
needed, the others are left out.

> > +   /* add needed explicit tables. postgres can figure out the rest. */
>=20
> Bzzzt. I think this note says it all.  Postgres can figure out the=20
> rest, but it will crater performance as it does so.  You really, really
> have got to restore the missing tables.

Have you actually tested it to see if it craters performance? The=20
problem with the original queries was putting *unreferenced* tables
in the FROM clause, i.e., explicity telling Postgres to search a
table that didn't need to be. Postgres isn't going to search tables
that aren't referenced in the query at all.

> > -               switch (xaccGUIDType (&pd->guid.guid, session))
> > +
> > +               sq->pq =3D stpcpy (sq->pq, " (");
> > +
> > +               if (guid_equal (&pd->guid.guid, xaccGUIDNULL ()))
> >                 {
> > -                  case GNC_ID_NONE:
> > -                  case GNC_ID_NULL:
> > -                  default:
> > -                     sq->pq =3D stpcpy(sq->pq, "FALSE ");
> > -                     break;
> > -             =20
> > -                  case GNC_ID_ACCOUNT:
> > -                     sq->pq =3D stpcpy(sq->pq, "gncAccount.accountGuid=
 =3D '");
> > -                     sq->pq =3D guid_to_string_buff (&pd->guid.guid, s=
q->pq);
> > -                     sq->pq =3D stpcpy(sq->pq, "' ");
> > -                     break;
> > -             =20
> > -                  case GNC_ID_TRANS:
> > -                     sq->pq =3D stpcpy(sq->pq, "gncTransaction.transGu=
id =3D '");
> > -                     sq->pq =3D guid_to_string_buff (&pd->guid.guid, s=
q->pq);
> > -                     sq->pq =3D stpcpy(sq->pq, "' ");
> > -                     break;
> > -             =20
> > -                  case GNC_ID_SPLIT:
> > -                     sq->pq =3D stpcpy(sq->pq, "gncEntry.entryGuid =3D=
 '");
> > -                     sq->pq =3D guid_to_string_buff (&pd->guid.guid, s=
q->pq);
> > -                     sq->pq =3D stpcpy(sq->pq, "' ");
> > -                     break;
> > +                 sq->pq =3D stpcpy(sq->pq, "FALSE ");
> >                 }
> > -
> > -               if (0 =3D=3D pd->guid.sense)
> > +               else
> >                 {
> > -                  sq->pq =3D stpcpy (sq->pq, ") ");
> > +                 sq->pq =3D stpcpy(sq->pq, "gncAccount.accountGuid =3D=
 '");
> > +                 sq->pq =3D guid_to_string_buff (&pd->guid.guid, sq->p=
q);
> > +                 sq->pq =3D stpcpy(sq->pq, "' ");
> > +                 sq->pq =3D stpcpy(sq->pq, " OR ");
> > +
> > +                 sq->pq =3D stpcpy(sq->pq, "gncTransaction.transGuid =
=3D '");
> > +                 sq->pq =3D guid_to_string_buff (&pd->guid.guid, sq->p=
q);
> > +                 sq->pq =3D stpcpy(sq->pq, "' ");
> > +                 sq->pq =3D stpcpy(sq->pq, " OR ");
> > +
> > +                 sq->pq =3D stpcpy(sq->pq, "gncEntry.entryGuid =3D '")=
;
> > +                 sq->pq =3D guid_to_string_buff (&pd->guid.guid, sq->p=
q);
> > +                 sq->pq =3D stpcpy(sq->pq, "' ");
>=20
> The above looks like another performance-cratering change.  Why search
> all tables, when you can search the one that you need?  I presume you
> made this change because you don't know which table to search ... but
> how can you possibly not know which table to search?  Who would be
> stupid enough to ask 'get me this GUId, but I don't know if its a
> transaction or an account guid' ????

I am simply implementing the semantics of the Query predicate, which is
to match a split whose guid, transaction guid, or account guid matches
the given guid. That is what the C code implements. Don't be fooled by
the fact that the engine Query code uses xaccGUIDType -- that function=20
only works if the entity has already been loaded into the engine. The=20
sql backend can't assume that.

> >              case PR_SHRS: {
> >                 PINFO("term is PR_SHRS");
> >                 sq->pq =3D stpcpy(sq->pq,=20
> > -                     "gncEntry.accountGuid =3D gncAccount.accountGuid =
AND "
> >                       "gncAccount.commodity =3D account_com.commodity A=
ND ");
> >                 AMOUNT_TERM ("gncEntry.amount","account_com");
>=20
> ???? Don't you need to know which account is to be matched ??

Yes, but that is now added further up if need_account is true.
You also need that line if you want to sort on account fields,
so I figured to do it the same as need_entry.

dave

--=-PDhsUfNXlDSIjJnGAh0E
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQA77Enu5effKKCmfpIRAkuLAKCge7fXhY2bDHW/m/v9kbPIEfBm+wCff6iE
D460Jd7pQhW6pTo0WPdCYFM=
=JwG0
-----END PGP SIGNATURE-----

--=-PDhsUfNXlDSIjJnGAh0E--