Logic ideas - 5 levels.

Neil Williams linux at codehelp.co.uk
Sun Sep 18 08:55:53 EDT 2005


On Saturday 17 September 2005 11:47 pm, Josh Sled wrote:

(oops, this is a really long one!)
:-)

> Data-type constraints are 
> generally above-and-beyond those provided by programming languages, such
> as "positive integer", "bounded integer", "length-limited string",
> "patterned string", &c.

(This will be easier once the gnc-backend-file is replaced - some of these 
bounds can be set in the XML if schemas are used properly. QSF does this to 
ensure that all GUID strings are true hexadecimals and integer fields do not 
contain string characters. That provides an entry-point validation at the XML 
level.)

The data-type constraints of individual parameters distinct from any other 
parameter should always be handled solely within the object - the (static) 
param_getfcn and param_setfcn handlers must enforce their own logic to form 
the lowest level of validation. If a parameter accepts a string but cannot 
deal with a string that contains certain characters, the specific object 
handler code for that parameter needs to do that check.

e.g. If Account->Name - for whatever reason - was not able to cope with 
&%£$*() characters, the xaccAccountSetName code should refuse to set such 
characters.

That's the lowest level and it's the one we already have - albeit it isn't 
always fully utilised.

These two levels are the simplest to implement in all UIs because one can be 
set by the backend (using run-time schema validation) and the other by the 
object code itself - which has to be shared between all UIs anyway.

So these are the levels that I see:

1. Data Source constraints: Includes XML schema / DTD validation of incoming 
data e.g. no GUID read from the XML should fail to verify as hexadecimal - it 
may fail as a GUID (out of range etc.), but not as hexadecimal. Other 
backends can provide different constraints according to their strengths - 
maybe we should seek that each backend implements a minimum standard of data 
constraint.

2. Discrete parameter logic: Each object parameter must enforce those rules 
that are necessary for itself. This includes only those rules that can be 
enforced with reference ONLY to the current parameter in the current entity. 
This is present but needs to be encouraged, widened and supported more 
cleanly.

3. Entity logic: Each entity needs a new mechanism to ensure its own validity 
by validating its parameters *with reference to* its other parameters. 
These rules are the lowest level of what I am proposing as the 'new' logic. 
This is where an entity can verify that it has X parameter OR Y parameter and 
fail if the two are incompatible. This is present in some entities (typically 
those that support clones) but not in a form that can be accessed in a 
standard manner. i.e. there's no typedef or designated callback / foreach 
support.

4. Collection logic: Some objects may need to verify their own place within 
the QofCollection - typically these would be hierarchical objects like 
Account. This would implement "all accounts must have a unique name".

5. Book logic: Rules that determine how objects of different types validate 
their own data and references to other objects. This is where "all splits 
must reference a known account" would be implemented.

Clearly, the result of the unique account name rule and the splits-reference 
rule need to be handled differently - one is a syntax rule, the other is an 
assertion. 
However, the collection and book logic can support both kinds of rules. If 
there is an assertion that fits into the collection logic, that's fine. If 
there's a syntax rule for the book, that's good too.

I'm thinking of a cascade implementation - rules in the higher levels are only 
executed if lower level rules report success.

Also, each user operation decides which levels of logic need to be checked. In 
a dialog, losing the focus on one input box can utilise different logic from 
clicking a radio button in the same dialog. Enabling "OK" would normally 
require most if not all levels to be checked, but results should be cached so 
that each rule is only executed once for each relevant data change.

e.g. the handler for the radio button would query the logic library for the 
parameter relating to the button, passing the value for low-level checks. 
Optionally, it could also request a check on another parameter or another 
rule for the current object.

Then a tag in the instance would indicate rules remaining to be checked and 
the dialog control function would enable "OK" if this returned zero.

The object declares which rules are essential and which are optional 
(assertions and syntax respectively). The UI determines how and when those 
rules are checked and in which sequence. The UI can also upgrade rules - 
handle any syntax/optional rule as if it was essential if appropriate - it 
would not be able to downgrade a rule deemed essential by the object.

> (There is also is more than one level of integrity, here.  For instance,
> gnucash would function perfectly well if no Account had a name or
> description...

That, in my plan, would be 3: Entity logic. Dictated by the QofObject 
definition, it consists of parameter handlers and a new function that 
inspects the entity as a unit, relating parameters to each other, within the 
limits of a single instance.

> the types, guids and parent-guids are all that are 
> strictly required.  But in any practical user interface, every account
> should have a name that is non-null and unique.  I guess this primarily
> extends to user-facing identifiers like names, but I think it's worth
> distinguishing assertion-level constraints like "all splits must
> reference an account" and practical constraints like "all accounts must
> have a unique name".)

That can be done according to *how* the entity complains about an invalid 
value. In the entity logic, an assertion failure could choose to free the 
entire entity (i.e. refuse to set). An "interface" failure (such as the 
account name uniqueness check) would be a simple "try again" complaint, 
leaving the other parameters unchanged.

> The "high" logic, I believe, has two parts: the data-types and functions
> which define the semantics of the application, and the user interface
> which defines the "syntax" of an application, if you will.  These things
> are often very closely related, which is harder to generically abstract.

Can we implement those parts as different *methods* rather than different 
models? i.e. The same logic check can validate the semantics and the syntax 
if that is appropriate for that particular check. The handler reports back a 
more severe failure (including freeing the entity) for the semantics and a 
moderate failure (i.e. try again) on a syntax failure.

> The best that people seem to have come up with so far is the
> Model-View-Controller architecture.
>
> The mortgage loan druid is a decent example: I took care to separate the
> GUI/druid controller from the loan-parameters/options and processing
> model... and even still the model-only code tacitly assumes a druid-like
> interface.  Some other GUI could -- if the appropriate piece was to move
> from the druid into the engine -- use that same model to re-present
> similar functionality.

I'll have to look at that.

> This has pretty low priority on my todo list right now; I'm more than
> happy to review proposals, designs and code, though.

Same here. Besides, this kind of thing suits a slower, more considered, 
development with lots of proposals and designs tossed around.

> I think the validation is hard enough and somewhat valuable, but the
> *real* value is saving any work that would be re-implementation of the
> application logic.

Agreed. The lower levels (1 + 2 above) are small increments from where we are 
now. Level 3 is implemented in a patchy way for certain elements of certain 
objects. Levels 4 and 5 are implemented in two ways: the gnc-backend-file 
refuses to save/load data that doesn't fit the implicit assumptions of that 
backend and the UI refuses to display data that doesn't fit its own versions 
of those assumptions.

I believe we can save a lot of work by putting those assumptions in one place 
so that each backend, each UI and all other components can use the same 
rules.

Partial books would implement partial rules - there's no point in rejecting a 
QSF book that fails collection or book logic (4+5) because that is expressly 
the point of a partial book. However, QSF should never fail data constraint, 
parameter or entity logic (1,2+3). Higher level logic would be implemented 
during or immediately after the book merge when the QSF is loaded back into 
the main data set - just as I currently implement some Account hierarchy 
handling code in the merge druid that only exists in the GnuCash codebase.

In CashUtil, this could be implemented by changing to the QSF backend if the 
higher-level rules fail their checks. This would automatically require those 
checks to succeed when the data is later merged back into the main data set.

> For example, I can easily see a set of rules declared that enable
> runtime input validation for SXes, thus refactoring some code out of the
> current SX editor and as such re-usable by a CLI, but I think there's a
> fair amount of code that is not readily factored out, and I'm not sure
> how to deal with.  I can imagine that most of the non-UI logic can get
> factored out into a better Model, and the CLI View/Controller could call
> it as well as the GUI View/Controller, but I think there's a large
> amount of reimplementation just in the view/controller side, too. :(

I'd hope that the model and the view / controller would all be single units 
that take their parameters from the QofObject - I'd rather not have a 
specific model for SX and a specific model for Account. Instead, a single 
model that can load the rules for SX or Account as required.

I'd like as many of the rules as possible to be declared by the object so that 
new objects are easy to plugin. So the sched-xaction object would define a 
set of rules that express what is currently implicit in the SX editor.

> (I use SXes as my examples here due to familiarity, but I don't think
> there's anything in particular that biases them as an example.)

:-)

> Of course, some of the above costs assume feature parity between the GUI
> and CLI versions of gnucash...  what's your goal for CashUtil?  Is it a
> CLI interface to all of gnucash, or a basic access to a subset of
> gnucash?

I do want CashUtil to be a full CLI interface for all GnuCash data that can be 
represented in a CLI.

The elements that I feel are outside CashUtil are:
1. Reports.
2. Budgets.

1. Reports cannot be handled within the CLI (it would be a waste of code 
IMHO); they are far better handled by using the CLI to parse SQL to select the data 
required for the report and some other (scripting) tool to format the QSF 
output into a highly customisable report. e.g. By generating the data 
*behind* the existing reports as XML from external SQL statements (in a .sql 
file), CashUtil can provide every user with every report they could ever want 
in any format they want and printable in whatever format they can prepare. No 
more concerns about why X report won't print Y data or on Z media. True, 
these reports would not then be within the scope of the GnuCash GUI but it's 
a small price to pay for the freedom to create truly customised reports - and 
if changes are made to the QSF as a result of the report, the data can always 
be merged back in.

The difficulties with doing this with the current gnc-backend-file are that 
the XML is too specific to gnucash, it can be impossible to isolate certain 
instances (due to implicit AccountGroup logic etc.) and it's difficult to 
parse. By having a QSF file that only contains the data specific to the 
intended report, users can use PHP, Perl, Python, whatever, to format that 
data in whatever way they can because the XML structure is always the same.

It also removes the need to set particular financial year-end dates - users 
can prepare their own QSF from and to any particular date, for any selection 
of data. 

e.g. I've just done my tax return and the current GnuCash reports are simply 
not adequate. Using the QSF method, I would export the data between 6th April 
2004 and 5th April 2005, use SQL to include details for some transfers that 
are currently missed, summarise certain details that are too verbose and 
expand others that are too opaque. Then use perl (in my case) to parse the 
QSF XML and produce a full calculation of my tax return figures. Perl could 
even calculate things like my capital allowance and business:private usage 
percentages as well as estimate my payments on account. The final step is to 
wrap the commands in a bash file and the entire process is automated! (Plus I 
don't have to print out any GnuCash reports or load up OOoCalc!)

Additionally, QSF will remain available for data import/export/mining no 
matter which backend is in use for the main data.

2. Budgets: I'm not sure how to proceed with these. The budgets in G2 are not 
currently available to QOF and, until that changes, it's hard to see how the 
budgets can be queried in CashUtil to produce the kind of data that is 
available for external reports. Equally, I haven't yet had time to look at 
how budgets could be represented to QOF.

Elements that could be supported but are not yet:

1. Finance::Quote could be implemented by CashUtil but the code to handle this 
has not even been considered yet.

2. Probably a few others that have slipped off my radar.

If we want CashUtil around the time of G2, some features will have to be 
tagged as TODO and implemented later.

-- 

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
