GNCQuery (was Re: GncBusiness v. GNCSession)
Derek Atkins
warlord@MIT.EDU
26 Nov 2001 10:13:01 -0500
And suddenly I find and replace that burned-out bulb in my brain...
"Duh! Of course you are right..."
Sigh. I was
a) forgetting about how Backends work (e.g. stuffing data into
the engine) , and
b) forgetting about how Queries actually work (namely that there
is the engine processing after the Backend processing).
So the real question is: If we change QueryTerms to use strings for
type/member names, can we guarantee that these strings will match the
table names and column names in e.g. a SQL Backend or e.g. a
named-object lookup in the engine? If not, then we need some
convenient way to convert them, so that the Backend still doesn't need
to know, a priori, about all extant data types.
In other words, I'd like the Backend to be data-driven too, if
possible. This may not be possible, but it would still be nice to do.
So in some sense I am beginning to agree with you that dynamic data
typing might be the long-term way-to-go (with some caveats).
Also, as we want Query execution to be _fast_, are we sure that we
want to use strings in the first place?
-derek
linas@linas.org (Linas Vepstas) writes:
> On Sun, Nov 25, 2001 at 07:12:26PM -0500, Derek Atkins was heard to remark:
> > linas@linas.org (Linas Vepstas) writes:
> >
> > > > If we could always assume a SQL backend life would be much easier :)
> > >
> > > No it wouldn't, for the six reasons I described in the previous note.
> > >
> > I was just commenting that if I could assume that all
> > backends can execute a SQL query then I can, as I've shown you how in
> > previous mail, build up the SQL Query String piecemeal in an extremely
> > extensible way.
>
> I think we are talking past each other again, since I thought we'd
> already gone over this territory. Lets assume we are talking about
> a hypothetical GNCNewQuery, which may or may not be built on or dervied
> from the current Query.
>
> 1) Having an "SQL-like" interface to GNCNewQuery would be an interesting
> idea, not without utility. Emphasis on "LIKE" in SQL-like. Similar,
> not the same.
>
> 2) The "SQL-like" interface would be a string interface, possibly
> with queries resembling the following:
>
> "SELECT Transaction WHERE (Transaction.memo='asdf' AND
> Transaction.date<"2001-07-04 12:00:00") OR
> Tranasaction.cleared='N';"
>
> I am open to scheme-form-style queries too:
>
> (select transaction (or (and (= Transaction.memo 'asdf')
> (< Transaction.date "2001-07-04 12:00:00"))
> (= Tranasaction.cleared 'N')))
>
> Either style works for me.
>
> (Back in the gnucash-1.5.x days, I had an XML interface to query
> but that was drowned in the churn, much to my dismay.)
>
>
> 3) It can't be exactly SQL, for reasons listed in the earlier email.
> These reasons include:
> 3a) Some things that can be queried in the engine, such as the
> bank balance, are difficult or imposible to formulate as
> SQL queries. Therefore, one must either limit the application
> and not allow balance queries (that's bad), or one must be
> able to intercept the query and reshape it or worse, instead
> of passing it straight through to the database.
>
> 3b) SQL has not been standardized. There are many dialects.
> Its easy to create non-portable SQL.
>
> 3c) Straight-through SQL prevents the development of interesting
> future backends, such as OO db's, or frame-slot db's or other
> interesting techologies.
>
> 4) GNCNewQuery must provide a C-language 'parser' or other utility
> that allows the query to be converted to some C representation.
> This is so that I could use C code to take apart queries that
> involve bank balances and reformulate them as needed.
>
> 4a) I think this means that an interface somewhat like the current
> Query is unavoidable.
>
>
> > My frustration is that how a query is executed for the XML/File
> > backend is completely different than how it is executed for the
> > postgres backend.
>
> This statement is erroneous. Both backends use *exactly* the
> same execution process. There is absolutely no difference.
>
> > In the Postgres backend you convert the GncQuery
> > into a char* and then pass the char* into the Postgres database.
> >
> > In other words, part 'b', for the Postgres backend, can be broken
> > up into two parts:
> >
> > b1) parse the GNCQuery into a char*
> > b2) pass the char* into the Postgres engine
>
> But that is not all.
>
> b3) push returned data into engine.
> b4) look for splits that have dangling balance info.
> b5) create a new char * that fills out the missing info.
> b6) go to b2), repeat until the all checkpoints are filled out.
> b7) Let the engine query code perform the actual scan over
> the engine data.
>
> This last part (b7) is *identical* to how it would work with
> an XML file, and the returned results are *identical* to what
> the XML backend would have yeilded.
>
> Steps b1)through b6) happen in the postgres backend. They are
> particular to the design of this backend. They would be different
> if we were using MySQL, because MySQL does not support some of the
> constructs we use in step b5). Alternately, someone might invent a
> better way of handling account balances.
>
> Only b7) is done in the engine proper.
>
> It may seem 'wasteful' to run the query twice: once in the database,
> and a second time locally in the engine. But its unavoidable for
> a variety of reasons, including the need to keep transactions
> balanced (so that the ledger works right), and the need to have
> enough data present to correctly compute account balances.
>
> Note also: due to the caching structure, there are situations
> where *no* queries are sent to postgres.
>
> > Part b1 can be done in a distributed manner. You don't necessarily
> > need to the GNCQuery parsing routine to exist in one body of code; you
> > can register new GNCQuery parsing routines at run-time.
>
> I don't understand what you are trying to say. Technically,
> step b1) is not a parse, its a traverse. One parses strings.
> One traverses trees. The current GNCQuery is a tree, not a string.
>
> Note also that b1) is not the hard part. Steps b3) through b6)
> are, because trnasactions have to be complete balanced, and
> be 'up to date' (in case remote users have edited them). b5)
> is tricky because it traverses a graph with lots of loops in it.
> b6) is tricky because its a kind of 'closure': one must keep
> repeating until no more data is returned from postgres.
>
>
> --linas
>
> --
> pub 1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas@linas.org>
> PGP Key fingerprint = 8305 2521 6000 0B5E 8984 3F54 64A9 9A82 0104 5933
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord@MIT.EDU PGP key available