Creating Gnucash invoices with XML

Derek Atkins warlord at MIT.EDU
Mon Apr 5 19:52:40 EDT 2004

Neil Williams <linux at> writes:

>> Yes, this is an issue..  How does IIF deal with it?
> IIF? Intuit Interchange Format? Not used it. Sorry.

Yep, that's what I meant.  Supposedly it's a CSV format.  Hence the question.

>> If you wanted to write an importer that took a snippet of a GnuCash
>> XML file and imported it, that would work too.  Indeed, importing an
> Kind of the reverse - take an XML file from other applications and import it. 
> Pre-defined XML formats that are easy to create and convert. XML easily 
> accommodates mixing contents too, so that one XML import can include invoices 
> and payments, accounts and style-sheet settings. Everything that was stored 
> by Gnucash in XML as a data storage format is already available for data 
> export, import, merge and exchange. That makes importing data from a PDA etc. 
> much easier.

Hmm, not sure how you import an XML file from another application
unless it's writing out gnucash-xml objects.  Gnucash does NOT use a
schema or DTD to describe the data; the XML parsers and generators are
all hand-built.  Annoying, but that's the state of the world.
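For reference, the on-disk objects look roughly like this, a hypothetical
account record (tag names from memory and only approximate; GUID values
elided):

```xml
<gnc:account version="2.0.0">
  <act:name>Groceries</act:name>
  <act:id type="guid">(32 hex digits)</act:id>
  <act:type>EXPENSE</act:type>
  <act:commodity>
    <cmdty:space>ISO4217</cmdty:space>
    <cmdty:id>USD</cmdty:id>
  </act:commodity>
  <act:parent type="guid">(32 hex digits)</act:parent>
</gnc:account>
```

With no DTD or schema published, a third-party application has to infer
this shape from the Gnucash source or from existing data files, which is
why "writing out gnucash-xml objects" is really the only import path.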

>> XML file would help in multiple ways.  For example, imagine being able
>> to re-run the Hierarchy Druid in order to add new sets-of-accounts to
>> your data file!  An XML importer would make this much easier.
> If I'm correct in how Gnucash used XML for data storage, then you've already 
> done all that work. In order for Gnucash to save and reopen XML storage 
> files, XML definitions for every saved component must exist, even if not 
> explicitly. Not only that, but mechanisms already exist to convert the 
> incoming XML data (from the data file) into live Gnucash data (for display 
> and manipulation in the GUI).

No, loading a data file is different from merging into an existing
data file.  More on this in a bit.

> XML can sort this out itself and, TBH, it is a job best left to XML to 
> accomplish. I've tried others and failed. CSV is probably the most awkward 
> for repeated use. The work with XML is front-loaded - design enough data 
> formats in the beginning and enforce these as the required formats for later. 
> Most of this work is already done because of the previous XML methods. The 
> previous (understandably deprecated) storage XML definitions can simply be 
> recast into XML export/import definitions. That leaves the data storage for 
> any mechanism you'd like.


Just designing XML formats doesn't solve the merge problem.

> Have I got this right?
> Currently you have Gnucash this way:
> Start app -> open previous data file -> read XML -> populate data structures 
> in RAM -> display GUI -> manipulate data in GList etc -> write to XML on save 
> -> exit.

Close.  You've broken down into multiple steps what is in actuality
one big thing.  Right now "open previous data file," "read XML,"
"populate data structures in RAM" are ALL part of the same subsystem.
They are NOT distinct sets.

Perhaps you mean:

Start App -> Determine previous data file -> load Datafile -> display GUI ->
manipulate data in RAM -> write to data file -> exit

This is much closer to what happens, if you assume the existing XML
File backend.  If, however, you use Postgres, the data is saved at
every "commit" operation (which is effectively every transaction change,
and every time you hit "ok" on a dialog).

> What I propose would be:
> Start app -> open previous data file/source -> read a format to be decided, 
> perhaps SQL -> populate data structures using new mapping -> display GUI -> 
> manipulate data in GList as before -> write/flush to new format on save or 
> just on each operation -> exit.

Uh, modulo what I said above, this is already what happens.  The
problem is that the XML subsystem does not have a "merge".  There is
no intermediate step of "load Datafile" that will merge into an
existing open Datafile.  That's the import step that needs to happen.

Yes, we have the code that will read the data and load it into a bunch
of objects in RAM.  What we do NOT have is the GUI and logic to merge
those objects into an existing datafile-in-RAM.
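The missing step could be sketched roughly like this.  All names here
are hypothetical stand-ins, and even this sketch glosses over the hard
question below: whether a name match really is the same account.

```c
#include <string.h>

/* Hypothetical, stripped-down stand-ins for the engine objects; real
 * Accounts carry KVP frames, commodity pointers, a parent GUID, etc. */
typedef struct {
    char guid[33];   /* 32 hex chars + NUL */
    char name[64];
} Account;

/* Look for a match in the open book: first by GUID, then by name. */
static Account *find_match(Account *book, int n, const Account *imp)
{
    for (int i = 0; i < n; i++)
        if (strcmp(book[i].guid, imp->guid) == 0)
            return &book[i];
    for (int i = 0; i < n; i++)
        if (strcmp(book[i].name, imp->name) == 0)
            return &book[i];   /* same name -- but is it the same account? */
    return NULL;
}

/* Merge imported accounts into the book: update matches, append the
 * rest.  Returns the new book size. */
static int merge_accounts(Account *book, int n, int cap,
                          const Account *imported, int m)
{
    for (int j = 0; j < m; j++) {
        Account *hit = find_match(book, n, &imported[j]);
        if (hit)
            *hit = imported[j];        /* treat as an update */
        else if (n < cap)
            book[n++] = imported[j];   /* insert as new */
    }
    return n;
}
```

Every branch in find_match is a policy decision that really needs user
input, which is exactly why the merge GUI is the hard part.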

> Now, when I want to export data, I re-use the old function calls to save a 
> portion of the data file into the old XML format. When I import an XML file, 
> I re-use the old file-open calls to import the data. All that's needed is a 
> wrapper to cope with partial data and crashes with existing data.

Export from Gnucash is easy.  That's a no-brainer, and not what I'm
worried about.  It's loading partial data into Gnucash that's the
hard part.

Gnucash's XML binding assumes you're loading a full data file.  That's
not what you're asking for.

> Crucially, we'd need to retain all the existing XML <-> GList mappings but 
> instead of loading them every time, they would be pressed into service upon 
> an export or import only. This provides the importer with a ready conduit to 
> all existing Gnucash data structures - meaning that absolutely anything 
> already in Gnucash can be imported and exported.

Well DUH!  That's not the problem.

> There must be some level of XML parsing already being performed within Gnucash 
> file operations. File->Open and File->Save etc.
> This would simply be downgraded to import-export. 

Yes, but you're missing the necessary "merge" logic which currently
does not exist.  Yes, the actual I/O functions exist, that's not the
hard part.

>> The downside is the challenge in mapping the GUIDs of an imported data
>> to an existing data.  How do you know if an account is the same?  Or
>> an invoice?  or a customer?  It's a huge can of worms to build an XML
>> importer (which is why it hasn't been done, yet ;)
> Not necessarily. In the help file that talked about XSLT, there were a whole 
> list of XSLT definitions for components. XML has the advantage over CSV that 
> these formats can be validated and are reliable. Therefore, an XML file that 
> claims to represent an invoice (from the choice of DTD) but actually contains 
> payment data can be rejected in a nice, informative, operation.

Uh, you don't understand.  I'm not at all worried about formats here.
I'm talking about data contents and merging.

There is more data than what's visible to the user, and you have to
pay attention to that.  Also, just because you merge does not mean you
want to keep the GUID.  Indeed, I would argue that you DON'T.  The
issues are murky.  It's not as clear cut as you're making it out to
be.  Beware the devil in the details!  It'll bite your ass if you're
not careful.
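One hedged illustration of why you may not want to keep imported GUIDs:
mint fresh ones on import, and keep a remap table so that references
inside the imported data (a split pointing at its account, say) stay
consistent.  All names here are hypothetical, and the toy generator
stands in for the engine's real GUID code:

```c
#include <string.h>

/* Hypothetical remap table: old (imported) GUID -> new GUID minted
 * on import, so intra-import references stay consistent. */
typedef struct {
    char old_guid[33];
    char new_guid[33];
} GuidRemap;

static unsigned long seed = 12345;

/* Toy GUID generator for the sketch; the engine has a real one. */
static void mint_guid(char out[33])
{
    for (int i = 0; i < 32; i++) {
        seed = seed * 1103515245UL + 12345UL;
        out[i] = "0123456789abcdef"[(seed >> 16) & 0xf];
    }
    out[32] = '\0';
}

/* Return the fresh GUID for an imported object, minting one the
 * first time that object's old GUID is seen. */
static const char *remap_guid(GuidRemap *table, int *count,
                              const char *old_guid)
{
    for (int i = 0; i < *count; i++)          /* already remapped? */
        if (strcmp(table[i].old_guid, old_guid) == 0)
            return table[i].new_guid;
    GuidRemap *e = &table[(*count)++];
    strcpy(e->old_guid, old_guid);
    mint_guid(e->new_guid);
    return e->new_guid;
}
```

The point of the table is that the same imported GUID always maps to
the same new GUID, so object cross-references survive the renaming.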

> Duplications would be handled in exactly the same way as now - by having the 
> unique ID stored / retrieved from the XML, missing ID -> new record.

That's not sufficient.  If I create account "foo" and you create
account "foo", are those accounts the same or not?  What if you
transpose into a non-gnucash data form and then back into gnucash's
data form?

> already using XML for data storage, so (unless I'm in for a shock), the data 
> typing and conversion must already be in code?

As I keep repeating, XML generation and parsing is NOT the problem.
Yes, gnucash already has that code.  Yes, that code can be reused.
But that code alone does not solve the merging problem.

> If you agree (and if my assumptions about Gnucash file operations above are 
> correct) I'd recommend dumping CSV as a data import mechanism and using XML 
> instead. No need for XSLT, by defining the formats, third-party applications 
> can write native Gnucash XML documents ready for import and expect valid XML 
> export documents in the same format. (native as in 'old version native'.)

Unfortunately there are places that still export data in CSV format --
in particular transactional information, or even IIF!  So we still
need a CSV importer..  Even if that "importer" is a program that
converts CSV -> XML and uses an XML importer.

[ note: the following list assumed we were still talking about a CSV
  importer.  Changing the context to an XML importer and keeping the
  list is both unfair and incorrect.  While many of the issues involved
  are the same, the list for an XML importer _is_ different.  -derek ]

>> What needs to be done:
>> * column mapping
> done in XML

That doesn't help real CSV data (see above).  We'd still need a
converter to convert from CSV -> XML for certain web downloads or
other data sources that don't provide XML.  This conversion still
requires column mapping and field parsing to create conformant
XML.  Think "date string" or "monetary string".  Take a look at
the QIF importer if you think this is easy.  It's not!
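The field parsing is fiddly even for the two simplest cases named
above.  A toy sketch, with hypothetical helpers (the real QIF importer
copes with far more date formats, locales, and currency conventions):

```c
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>

/* Parse a US-style MM/DD/YYYY date string into components.  Real
 * importers have to guess among several ambiguous date orderings. */
static bool parse_date(const char *s, int *mon, int *day, int *year)
{
    return sscanf(s, "%d/%d/%d", mon, day, year) == 3
        && *mon >= 1 && *mon <= 12 && *day >= 1 && *day <= 31;
}

/* Parse a monetary string like "$1,234.56" into integer cents,
 * skipping currency symbols and thousands separators. */
static bool parse_money(const char *s, long *cents)
{
    long whole = 0, frac = 0;
    int frac_digits = 0;
    bool seen_digit = false, in_frac = false, neg = false;

    for (; *s; s++) {
        if (*s == '-') neg = true;
        else if (isdigit((unsigned char)*s)) {
            seen_digit = true;
            if (in_frac) {
                if (frac_digits++ < 2)
                    frac = frac * 10 + (*s - '0');
            } else
                whole = whole * 10 + (*s - '0');
        }
        else if (*s == '.') in_frac = true;
        else if (*s == ',' || *s == '$' || *s == ' ') continue;
        else return false;     /* reject anything unrecognized */
    }
    if (!seen_digit) return false;
    if (frac_digits == 1) frac *= 10;   /* "1.5" -> 150 cents */
    *cents = (whole * 100 + frac) * (neg ? -1 : 1);
    return true;
}
```

And this still punts on DD/MM vs MM/DD ambiguity, European decimal
commas, and currencies with other than two fraction digits.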

>> * field parsing (we already have a bunch of generic parsers)
> already implemented in Open/Save and in need of customising to accept only 
> partial input.

Yes, the xml parsers need to be modified to not require a full book.
Not TOO difficult, I don't think.

>> * user verification
> OK, maybe once the XML is parsed, a dialog box showing (some of ) the content? 
> Changing the column mapping ala CSV isn't possible with XML, that would 
> indicate a corrupted import file and requires separate corrective measures.

At this point you don't need to map columns, but you may need to
verify other data, map accounts, etc.

>> * transaction matching
> I'll need help with that. The existing procedures are presumably not 
> anticipating a merge with existing data but are set to be read into an 
> otherwise empty memory allocation.


> Is it acceptable to have a very simple rule?

Depends on the rule.  Regardless, it requires user input.

> Is there a unique ID specified? 

Where?  In the XML?  Yes.  But is there any guarantee that a
non-gnucash data source will provide the object GUID?  I find that
unlikely.  That means you need to half-ignore the GUID and map based
on other input.

> If yes, update the data behind that UID.
> If no, insert as new data.
> Too simple?

Yes.  Too simple.  You cannot guarantee that the GUID will always
be provided in an import mechanism, nor can you guarantee that the
imported GUID matches the data GUID.

>> * data-checking
> To a large extent, covered within XML in terms of the wrong data in the wrong 
> field. Still some work to be done to check data types though - XML parsed 
> character data covers at least 4 different C data types! How does Gnucash 
> currently deal with a corrupt XML data file?

Not what I meant.  You may need to perform a transaction match or
duplicate check.  This has nothing to do with XML input and everything
to do with data coherency.
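A duplicate check in that spirit might look like the sketch below.
The types and thresholds are hypothetical; the real OFX/HBCI matcher
is considerably more sophisticated, scoring candidates rather than
giving a yes/no answer:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical, minimal transaction for the sketch. */
typedef struct {
    int  date;           /* days since some epoch */
    long amount_cents;
    char description[64];
} Txn;

/* Case-insensitive description comparison; bank downloads often
 * mangle payee capitalization. */
static bool desc_equal(const char *a, const char *b)
{
    while (*a && *b)
        if (tolower((unsigned char)*a++) != tolower((unsigned char)*b++))
            return false;
    return *a == *b;
}

/* Same amount on the same day is a strong duplicate signal; within a
 * few days we also want the descriptions to agree. */
static bool likely_duplicate(const Txn *a, const Txn *b)
{
    if (a->amount_cents != b->amount_cents)
        return false;
    int delta = a->date - b->date;
    if (delta < 0) delta = -delta;
    if (delta == 0)
        return true;
    return delta <= 3 && desc_equal(a->description, b->description);
}
```

Note that none of this touches XML at all: it's a question about the
data already in the book versus the data coming in.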

>> * data-insertion
> Hoping to re-use existing conduits.

There are no existing conduits for this.  Actually, that's not true.
The OFX and HBCI importers have a method, and the QIF importer has
another method.  I'm hoping to continue towards a generic system to
merge new data objects into the existing data hierarchy, but right now
all the work has been done to merge accounts and transactions.
Nothing has been done to merge anything "business".

So, no, nothing to reuse here.


       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL:    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available
