Creating Gnucash invoices with XML

Neil Williams linux at
Mon Apr 5 18:05:46 EDT 2004


On Monday 05 April 2004 8:45, Derek Atkins wrote:
> Neil Williams <linux at> writes:
> > My first problem is the repeat within an invoice:
> > The first section needs to deal with the common stuff, customer, start
> > date, job ID. That's then fixed for the rest of that invoice. However,
> > the subsequent data needs to repeat - more than one item per invoice.
> > This would be easier under XML but in CSV, it breaks the standard unless
> > duplicated.
> Yes, this is an issue..  How does IIF deal with it?

IIF? Intuit Interchange Format? Not used it. Sorry.

> > Why was XML deprecated? It would solve these problems.
> Because XML is a HORRIBLE data format.  It's a great _INTERCHANGE_
> format.  So we want to fix that and relegate XML to what it's good at,
> exchanging data.  For storing data we should use a real database.

Totally agree on not using XML as a database. It was never suitable. I use 
MySQL for most projects.

I think XML is a superb format for data exchange. I use XML as a data exchange 
format in other projects, although not yet in C. I'm also more used to 
strictly pre-defined formats for exchange, rather than writing an entire 
parser. This will work to our advantage, see later.

> If you wanted to write an importer that took a snippet of a GnuCash
> XML file and imported it, that would work too.  Indeed, importing an

Kind of the reverse: take an XML file from other applications and import it, 
using pre-defined XML formats that are easy to create and convert. XML easily 
accommodates mixed content too, so a single XML import can include invoices 
and payments, accounts and style-sheet settings. Everything that Gnucash 
stored in XML as a data storage format is already available for data export, 
import, merge and exchange. That makes importing data from a PDA etc. much 
easier.
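For illustration only - the element names below are invented, not an actual 
Gnucash schema - a single import file could mix several record types:

```xml
<!-- Hypothetical mixed import file; element names are invented. -->
<gnc-import>
  <invoice id="000001">
    <customer>ACME Ltd</customer>
    <entry>
      <description>Widgets</description>
      <quantity>10</quantity>
    </entry>
  </invoice>
  <payment invoice="000001">
    <amount>45.00</amount>
  </payment>
</gnc-import>
```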

> XML file would help in multiple ways.  For example, imagine being able
> to re-run the Hierarchy Druid in order to add new sets-of-accounts to
> your data file!  An XML importer would make this much easier.

If I'm correct in how Gnucash used XML for data storage, then you've already 
done all that work. In order for Gnucash to save and reopen XML storage 
files, XML definitions for every saved component must exist, even if not 
explicitly. Not only that, but mechanisms already exist to convert the 
incoming XML data (from the data file) into live Gnucash data (for display 
and manipulation in the GUI).

XML can sort this out itself and, TBH, it is a job best left to XML to 
accomplish. I've tried others and failed. CSV is probably the most awkward 
for repeated use. The work with XML is front-loaded - design enough data 
formats in the beginning and enforce these as the required formats for later. 
Most of this work is already done because of the previous XML methods. The 
previous (understandably deprecated) storage XML definitions can simply be 
recast into XML export/import definitions. That leaves the data storage for 
any mechanism you'd like.

Have I got this right?
Currently you have Gnucash this way:
Start app -> open previous data file -> read XML -> populate data structures 
in RAM -> display GUI -> manipulate data in GList etc. -> write to XML on save 
-> exit.

What I propose would be:
Start app -> open previous data file/source -> read a format to be decided, 
perhaps SQL -> populate data structures using new mapping -> display GUI -> 
manipulate data in GList as before -> write/flush to new format on save or 
just on each operation -> exit.

Now, when I want to export data, I re-use the old function calls to save a 
portion of the data file in the old XML format. When I import an XML file, 
I re-use the old file-open calls to import the data. All that's needed is a 
wrapper to cope with partial data and clashes with existing data.

Crucially, we'd need to retain all the existing XML <-> GList mappings but 
instead of loading them every time, they would be pressed into service upon 
an export or import only. This provides the importer with a ready conduit to 
all existing Gnucash data structures - meaning that absolutely anything 
already in Gnucash can be imported and exported.

There must be some level of XML parsing already being performed within Gnucash 
file operations. File->Open and File->Save etc.

This would simply be downgraded to import-export. 

> Similarly, it would be useful for combining multiple data files.

Yes, precisely because the definitions for every component of every data file 
must already have been defined for the save and open routines to work. Even 
if the definitions are not explicit, it's not hard to create a DTD from 
existing data.
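For example, a DTD reverse-engineered from saved data might look something 
like this (entirely hypothetical element names, for illustration):

```dtd
<!-- Hypothetical invoice DTD sketched from example data;
     not an actual Gnucash definition. -->
<!ELEMENT invoice (customer, date, job-id?, entry+)>
<!ELEMENT customer (#PCDATA)>
<!ELEMENT date     (#PCDATA)>
<!ELEMENT job-id   (#PCDATA)>
<!ELEMENT entry    (description, quantity, unit-price)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT quantity    (#PCDATA)>
<!ELEMENT unit-price  (#PCDATA)>
```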

> The downside is the challenge in mapping the GUIDs of an imported data
> to an existing data.  How do you know if an account is the same?  Or
> an invoice?  or a customer?  It's a huge can of worms to build an XML
> importer (which is why it hasn't been done, yet ;)

Not necessarily. The help file that talked about XSLT listed a whole set of 
XSLT definitions for components. XML has the advantage over CSV that these 
formats can be validated and are reliable. Therefore, an XML file that 
claims to represent an invoice (from the choice of DTD) but actually contains 
payment data can be rejected with a clear, informative error.
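A sketch of that rejection idea, not Gnucash code: Python's standard library 
cannot validate against a DTD, so this checks only that the file is 
well-formed and that its root element matches the type it claims to be. 
Element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_import(xml_text, expected_root):
    """Return the parsed root element, or raise ValueError with an
    informative message when the file is not what it claims to be."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError("not well-formed XML: %s" % err)
    if root.tag != expected_root:
        raise ValueError("file claims to hold '%s' data but the root "
                         "element is '%s'" % (expected_root, root.tag))
    return root
```

A real importer would validate against the full DTD (e.g. via libxml2) rather 
than just the root element, but the reject-with-a-message shape is the same.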

Duplications would be handled in exactly the same way as now - by having the 
unique ID stored in / retrieved from the XML; a missing ID means a new record.

For each category of import, an XML structure needs to be defined and a DTD 
created, hosted on Every XML file that refers to that DTD can be 
verified against it and data only accepted if the validation passes. That 
should reduce the amount of data typing work required. Actually, to prevent 
problems with not having access to, the DTDs would have to be in
the package. (Thinking on the fly here, you know!)

In a real sense, most of the definition work has already been done. That's why 
I was keen on XSLT/XML in my initial query. Converting a binary storage 
program into an XML export program is a major undertaking, but Gnucash was
already using XML for data storage, so (unless I'm in for a shock), the data 
typing and conversion must already be in code?

> The choice is yours, tho.

If you agree (and if my assumptions about Gnucash file operations above are 
correct) I'd recommend dumping CSV as a data import mechanism and using XML 
instead. No need for XSLT, by defining the formats, third-party applications 
can write native Gnucash XML documents ready for import and expect valid XML 
export documents in the same format. (native as in 'old version native'.)

> What needs to be done:
> * column mapping

Done in XML.

> * field parsing (we already have a bunch of generic parsers)

Already implemented in Open/Save; it just needs customising to accept 
partial input.

> * user verification

OK, maybe once the XML is parsed, a dialog box showing (some of) the content? 
Changing the column mapping à la CSV isn't possible with XML; a mismatch 
would indicate a corrupted import file and require separate corrective 
measures.

> * transaction matching

I'll need help with that. The existing procedures presumably don't anticipate 
a merge with existing data; they expect to read into an otherwise empty 
memory allocation.

Is it acceptable to have a very simple rule?
Is there a unique ID specified? 
If yes, update the data behind that UID.
If no, insert as new data.

Too simple?
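The rule above can be sketched like this, using a plain dict keyed by unique 
ID to stand in for the real Gnucash data structures (which it is not):

```python
import uuid

def merge_record(book, record):
    """Merge one imported record into book: update when the unique ID
    is already present, insert as new data otherwise."""
    uid = record.get("uid")
    if uid is not None and uid in book:
        book[uid].update(record)          # known ID: update in place
        return uid
    # Missing or unknown ID: insert as a new record, minting an ID
    # if the import supplied none.
    new_uid = uid if uid is not None else uuid.uuid4().hex
    book[new_uid] = dict(record, uid=new_uid)
    return new_uid
```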

> * data-checking

To a large extent, covered within XML in terms of the wrong data in the 
wrong field. Still some work to be done to check data types, though - XML 
parsed character data covers at least four different C data types! How does 
Gnucash currently deal with a corrupt XML data file?
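That data-typing step might look something like this sketch: parsed character 
data is all strings, so each field needs an explicit, checked conversion. 
Field names and their target types here are hypothetical.

```python
FIELD_TYPES = {
    "quantity": int,
    "unit-price": float,
    "description": str,
    "taxable": lambda s: {"true": True, "false": False}[s.strip().lower()],
}

def convert_field(name, text):
    """Convert one field's character data to its target type, raising
    ValueError with an informative message on bad input."""
    if name not in FIELD_TYPES:
        raise ValueError("unknown field: %s" % name)
    try:
        return FIELD_TYPES[name](text)
    except (KeyError, ValueError, TypeError):
        raise ValueError("field '%s' has a bad value: %r" % (name, text))
```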

> * data-insertion

Hoping to re-use existing conduits.

-- 

Neil Williams

