GNCQuery (was Re: GncBusiness v. GNCSession)

Derek Atkins warlord@MIT.EDU
26 Nov 2001 10:13:01 -0500


And suddenly I find and replace that burned-out bulb in my brain...

"Duh!  Of course you are right..."

Sigh.  I was
	a) forgetting about how Backends work (e.g. stuffing data into
	   the engine) , and
	b) forgetting about how Queries actually work (namely that there
	   is the engine processing after the Backend processing).

So the real question is: If we change QueryTerms to use strings for
type/member names, can we guarantee that these strings will match the
table names and column names in e.g. a SQL Backend or e.g. a
named-object lookup in the engine?  If not, then we need some
convenient way to convert them, so that the Backend still doesn't need
to know, a priori, about all extant data types.

In other words, I'd like the Backend to be data-driven too, if
possible.  This may not be possible, but it would still be nice to do.
So in some sense I am beginning to agree with you that dynamic data
typing might be the long-term way-to-go (with some caveats).

Also, as we want Query execution to be _fast_, are we sure that we
want to use strings in the first place?

-derek

linas@linas.org (Linas Vepstas) writes:

> On Sun, Nov 25, 2001 at 07:12:26PM -0500, Derek Atkins was heard to remark:
> > linas@linas.org (Linas Vepstas) writes:
> > 
> > > > If we could always assume a SQL backend life would be much easier :)
> > > 
> > > No it wouldn't, for the six reasons I described in the previous note.
> > >
> > I was just commenting that if I could assume that all
> > backends can execute a SQL query then I can, as I've shown you how in
> > previous mail, build up the SQL Query String piecemeal in an extremely
> > extensible way.
> 
> I think we are talking past each other again, since I thought we'd
> already gone over this territory.   Lets assume we are talking about
> a hypothetical GNCNewQuery, which may or may not be built on or dervied
> from the current Query.
> 
> 1) Having an "SQL-like" interface to GNCNewQuery would be an interesting
>    idea, not without utility.  Emphasis on "LIKE" in SQL-like.  Similar, 
>    not the same.
> 
> 2) The "SQL-like" interface would be a string interface, possibly
>    with queries resembling the following:
> 
>   "SELECT Transaction WHERE (Transaction.memo='asdf' AND
>                     Transaction.date<"2001-07-04 12:00:00") OR 
>                     Tranasaction.cleared='N';"
> 
>    I am open to scheme-form-style queries too:
> 
>    (select transaction (or (and (= Transaction.memo 'asdf') 
>                                 (< Transaction.date "2001-07-04 12:00:00"))
>                            (= Tranasaction.cleared 'N')))
>                             
>    Either style works for me.
>    
>    (Back in the gnucash-1.5.x days, I had an XML interface to query
>    but that was drowned in the churn, much to my dismay.)
> 
> 
> 3) It can't be exactly SQL, for reasons listed in the earlier email.
>    These reasons include:
>    3a) Some things that can be queried in the engine, such as the 
>        bank balance, are difficult or imposible to formulate as
>        SQL queries.  Therefore, one must either limit the application
>        and not allow balance queries (that's bad), or one must be
>        able to intercept the query and reshape it or worse, instead
>        of passing it straight through to the database.
>        
>    3b) SQL has not been standardized.  There are many dialects.
>        Its easy to create non-portable SQL.
> 
>    3c) Straight-through SQL prevents the development of interesting
>        future backends, such as OO db's, or frame-slot db's or other
>        interesting techologies.
> 
> 4) GNCNewQuery must provide a C-language 'parser' or other utility
>    that allows the query to be converted to some C representation.
>    This is so that I could use C code to take apart queries that 
>    involve bank balances and reformulate them as needed. 
> 
>    4a) I think this means that an interface somewhat like the current
>        Query is unavoidable.
> 
> 
> > My frustration is that how a query is executed for the XML/File
> > backend is completely different than how it is executed for the
> > postgres backend.  
> 
> This statement is erroneous.  Both backends use *exactly* the
> same execution process.  There is absolutely no difference.
> 
> > In the Postgres backend you convert the GncQuery
> > into a char* and then pass the char* into the Postgres database.
> >
> > In other words, part 'b', for the Postgres backend, can be broken
> > up into two parts:
> > 
> > 	b1) parse the GNCQuery into a char*
> > 	b2) pass the char* into the Postgres engine
> 
> But that is not all.  
> 
> b3) push returned data into engine.
> b4) look for splits that have dangling balance info.
> b5) create a new char * that fills out the missing info.
> b6) go to b2), repeat until the all checkpoints are filled out.
> b7) Let the engine query code perform the actual scan over 
>     the engine data.  
> 
> This last part (b7) is *identical* to how it would work with 
> an XML file, and the returned results are *identical* to what 
> the XML backend would have yeilded.
> 
> Steps b1)through b6) happen in the postgres backend.  They are
> particular to the design of this backend. They would be different
> if we were using MySQL, because MySQL does not support some of the
> constructs we use in step b5).  Alternately, someone might invent a
> better way of handling account balances.
> 
> Only b7) is done in the engine proper.
> 
> It may seem 'wasteful' to run the query twice: once in the database,
> and a second time locally in the engine.  But its unavoidable for 
> a variety of reasons, including the need to keep transactions 
> balanced (so that the ledger works right), and the need to have
> enough data present to correctly compute account balances.
> 
> Note also: due to the caching structure, there are situations
> where *no* queries are sent to postgres.
> 
> > Part b1 can be done in a distributed manner.  You don't necessarily
> > need to the GNCQuery parsing routine to exist in one body of code; you
> > can register new GNCQuery parsing routines at run-time.  
> 
> I don't understand what you are trying to say.  Technically, 
> step b1) is not a parse, its a traverse.  One parses strings. 
> One traverses trees.  The current GNCQuery is a tree, not a string.
> 
> Note also that b1) is not the hard part.  Steps b3) through b6) 
> are, because trnasactions have to be complete balanced, and 
> be 'up to date' (in case remote users have edited them).  b5)
> is tricky because it traverses a graph with lots of loops in it.
> b6) is tricky because its a kind of 'closure': one must keep 
> repeating until no more data is returned from postgres.
> 
> 
> --linas
> 
> -- 
> pub  1024D/01045933 2001-02-01 Linas Vepstas (Labas!) <linas@linas.org>
> PGP Key fingerprint = 8305 2521 6000 0B5E 8984  3F54 64A9 9A82 0104 5933

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available