QSF XML file backend for gnucash-gnome2-dev branch

Josh Sled jsled at asynchronous.org
Wed Jan 26 11:52:42 EST 2005


On Wed, 2005-01-26 at 03:58, Neil Williams wrote:

> All applications using QSF would have their own user editable maps to convert 
> data to other applications. The maps are the real inter-operability stuff. 
> Application maps come with the installation, user edited maps can go with the 
> user edited objects - it's just a case of coding how the library expects the 
> application to 'go fetch'.

I don't quite understand the "user edited maps".  I can see user-edited
data values, but the applicationA<->applicationB maps don't seem like
"user-edited" content per se...?

But, yeah, you're basically trying to do a [meta-]application task as a
library.  I'd be very thoughtful about the interface that the library
both presents to and expects of the application for the purposes of
user- and map-file management ... or maybe better: break that piece of
the puzzle off into a stand-alone "qsf-map-control-console" app.


> QOF will do that - by putting the data in XML as a single lump of objects, QOF 
> can query the QofBook read from that XML and do all kinds of SQL-like things. 

Sure.  And you can do that with XSLT/XQuery if the XML data is in the
application domain, too ... and you can do RDQL/SPARQL if the data is in
RDF ... and straight SQL if it's in a relational DB.

I really do understand what you're trying to do, and understand its
value.  I also think you're re-inventing a /very/ large wheel,
_especially_ with respect to the mapping stuff.  I mean, you've already
had to create a new mini programming language in the map definition
format ... why not just use an existing high-level language?


> Yes, the code checks the schema when determining the file type, when preparing 
> to load the file and then leaves it until a file is ready to write out - at 
> which point it checks the outgoing file against the schema too. That just 
> catches bad use of the API where the object parameter types don't really 

Well, I feel about validation like I do about assertions: great during
development and debugging, but bad at runtime.  Perhaps we can arrange
things so that if the build has libxml2 >= 2.6.0 and --enable-debug,
then the validation is done; otherwise not.
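
Something like the following sketch, say ... QSF_DEBUG is a made-up
symbol that configure would define under --enable-debug; the libxml2
schema calls themselves are real:

    #include <libxml/parser.h>
    #include <libxml/xmlschemas.h>

    /* Validate only in debug builds against libxml2 >= 2.6.0;
     * release builds just trust the parser. */
    static int
    qsf_maybe_validate (xmlDocPtr doc, const char *schema_path)
    {
    #if defined(QSF_DEBUG) && LIBXML_VERSION >= 20600
        xmlSchemaParserCtxtPtr pctxt = xmlSchemaNewParserCtxt (schema_path);
        xmlSchemaPtr schema = pctxt ? xmlSchemaParse (pctxt) : NULL;
        xmlSchemaValidCtxtPtr vctxt = schema ? xmlSchemaNewValidCtxt (schema) : NULL;
        int result = vctxt ? xmlSchemaValidateDoc (vctxt, doc) : -1;

        if (vctxt)  xmlSchemaFreeValidCtxt (vctxt);
        if (schema) xmlSchemaFree (schema);
        if (pctxt)  xmlSchemaFreeParserCtxt (pctxt);
        return result;  /* 0 means valid */
    #else
        (void) doc; (void) schema_path;
        return 0;
    #endif
    }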


> > > It's determining the filetype and validating the content that would
> > > require duplication of the schema *code* so that I know which tags will
> > > occur where.
> >
> > The parser should implicitly have that knowledge.
> 
> Once the schema has been used to identify objects vs maps, yes.

That identification can very easily happen without the schema; it's just
a string comparison against the fully-qualified name of the root element
of the document, as I said before.
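
In libxml2 terms that's about all there is to it ... something like
this, where the element names and namespace URIs are invented for
illustration:

    #include <libxml/parser.h>
    #include <libxml/tree.h>

    /* Hypothetical namespace URIs; substitute whatever QSF publishes. */
    #define QSF_OBJECT_NS "http://qof.sourceforge.net/qsf-object"
    #define QSF_MAP_NS    "http://qof.sourceforge.net/qsf-map"

    typedef enum { QSF_UNKNOWN, QSF_OBJECT_FILE, QSF_MAP_FILE } QsfFileType;

    static QsfFileType
    qsf_identify (xmlDocPtr doc)
    {
        xmlNodePtr root = xmlDocGetRootElement (doc);

        if (!root || !root->ns)
            return QSF_UNKNOWN;
        if (!xmlStrcmp (root->name, BAD_CAST "qsf-object") &&
            !xmlStrcmp (root->ns->href, BAD_CAST QSF_OBJECT_NS))
            return QSF_OBJECT_FILE;
        if (!xmlStrcmp (root->name, BAD_CAST "qsf-map") &&
            !xmlStrcmp (root->ns->href, BAD_CAST QSF_MAP_NS))
            return QSF_MAP_FILE;
        return QSF_UNKNOWN;
    }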


> The schema itself, the xsd, doesn't. The code that handles the schema would - 
> if we went below libxml2 >= 2.5.2

*shudder*  There's no need to re-write a validator.


> Without runtime schematic validation, I would have to implement a method of 
> reliably distinguishing between qsf objects and qsf maps, checking the 
> content of every parameter tag matches the expected definitions, check that 
> each object and each parameter tag has the required attributes . . . . e.g. 

You only need to look at the fully-qualified name of the root element;
you can safely assume the rest...


> I'd have to individually validate every incoming date string to check the xsd 
> format. Every boolean would have to be checked and converted to TRUE instead 
> of true or 1 or T.

Overkill.  If the document claims via the name of its root element that
it conforms to a schema and actually does not, then an error should be
raised during parsing.  Anyways, just because it's validated, you still
don't get to ignore parsing errors... if you go to grab an expected
value and it's not there, then you raise an exception.  If it's
optional, then you still have to check for that in the code.  If it's in
the wrong lexical format, then parsing should raise an error.  And if
the lexical form of the data is valid but the content is garbage, the
validator won't catch that anyways.
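
Whether or not the document was validated first, the loading code ends
up with the same shape ... a sketch, with illustrative element names:

    #include <stdio.h>
    #include <libxml/tree.h>
    #include <libxml/xmlmemory.h>

    /* Return the content of a direct child element, or NULL if absent.
     * Caller frees the result with xmlFree(). */
    static xmlChar *
    get_child_content (xmlNodePtr parent, const char *name)
    {
        xmlNodePtr cur;
        for (cur = parent->children; cur; cur = cur->next)
            if (cur->type == XML_ELEMENT_NODE &&
                !xmlStrcmp (cur->name, BAD_CAST name))
                return xmlNodeGetContent (cur);
        return NULL;
    }

    static int
    read_object (xmlNodePtr obj)
    {
        xmlChar *guid = get_child_content (obj, "guid");  /* required */
        xmlChar *note = get_child_content (obj, "note");  /* optional */

        if (!guid) {
            fprintf (stderr, "qsf: object missing required <guid>\n");
            return -1;  /* the "raise an exception" step */
        }
        /* ... convert guid here; a bad lexical form is yet another
         * error path; use note only if it is non-NULL ... */
        xmlFree (guid);
        if (note)
            xmlFree (note);
        return 0;
    }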


> code the majority of the checks that the schema would have done. These QSF 
> objects are user-edited - all manner of garbage could be in them.

You expect these objects to be _user_ edited?  I thought the whole point
was machine-to-machine integration?  Import/Export stuff...

In any case, I would argue that it is the responsibility of the exporter
to generate correct XML.  If the exporter is a human being, then an
editor that supports schemas is useful, and publishing the schema in a
form that editors can understand and use is a good thing.


> I really can't do any of this without runtime schema validation. This is quite 
> enough work as it is, re-inventing something that has ALREADY been released 
> is more than a little pointless.

It's not "use existing" vs. "write your own", it's "do it" vs. "don't do
it".  Very very few XML-parsing applications use runtime schematic
validation; I'm pretty sure that this doesn't need it either.


> > By hand?  FTR, I was talking about using RelaxNG [compact] instead of
> > XML Schema.  I've found the latter to be painful, and the former to be
> > pain-less, simple, wonderful, &c.
> 
> Personally, I found the opposite. I'm used to the schema, it naturally fits 
> the way the data needs to be handled and verified.

Have you used the Compact syntax?  The RelaxNG XML syntax is still
better than XML Schema IMHO, but it's still XML, which is teh suck to
write.  The compact syntax is like butter.
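
For a taste, here's a fragment in the compact syntax (invented element
names, not the actual QSF schema):

    # zero or more typed, named parameters inside an object
    element object {
      attribute type { text },
      element string  { attribute name { text }, text }*,
      element boolean { attribute name { text }, xsd:boolean }*,
      element date    { attribute name { text }, xsd:dateTime }*
    }

Spelling out the same content model in XML Schema takes an
xs:element/xs:complexType/xs:sequence wrapper for each of those lines.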

...jsled

-- 
http://asynchronous.org/ - `a=jsled; b=asynchronous.org; echo ${a}@${b}`

