locale issues with data format when upgrading 1.8 -> 2.0

Josh Sled jsled at asynchronous.org
Fri Feb 3 17:34:35 EST 2006


On Fri, 2006-02-03 at 16:24 -0500, Derek Atkins wrote:
> I think it's a major issue that someone in an ascii-like but
> non-latin1 locale will get garbage during the default upgrade path.
> libxml doesn't really provide a way to do proper detection, and 1.8
> doesn't include an encoding in the data file..  Unfortunately the XML
> spec says that the lack of an encoding parameter means the data is in
> utf-8, but that's not the case in 1.8 -- the data is in whatever
> locale the user was using.
> 
> So, how do we solve this?

We can look for the presence of the "encoding" attribute in the
<?xml ...?> declaration at the top of the file.

If present, then libxml will do the appropriate encoding conversion.

If not, then we assume the file was written by 1.8.  In that case we
should tell libxml that the encoding is the system default, as
reported by g_get_charset():
http://gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html#g-get-charset .
Getting the conversion done may require re-parsing the file; I'm not
sure at what point libxml performs it.
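
One way to sidestep the re-parse question is to peek at the start of the
file before handing it to libxml. What follows is only a rough sketch of
that alternative, not the libxml-based check proposed in this message:
decl_has_encoding() and encoding_override() are hypothetical names, the
scan is deliberately naive (it does not handle UTF-16 input or odd
quoting), and the caller is assumed to have read the first few hundred
bytes of the file into a NUL-terminated buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with an XML
 * declaration that carries an encoding pseudo-attribute, else 0. */
int
decl_has_encoding(const char *head)
{
  const char *end, *p;

  /* Must start with "<?xml" followed by whitespace. */
  if (strncmp(head, "<?xml", 5) != 0 || !isspace((unsigned char) head[5]))
    return 0;
  end = strstr(head, "?>");
  if (end == NULL)
    return 0;

  /* Look for whitespace, "encoding", optional whitespace, '=' before
   * the closing "?>". */
  for (p = strstr(head + 5, "encoding"); p != NULL && p < end;
       p = strstr(p + 1, "encoding"))
    {
      const char *q = p + strlen("encoding");

      if (!isspace((unsigned char) p[-1]))
        continue;
      while (isspace((unsigned char) *q))
        q++;
      if (*q == '=')
        return 1;
    }
  return 0;
}

/* Hypothetical helper: returns NULL when the declaration names an
 * encoding (so the parser should follow it), otherwise the caller's
 * locale charset, on the assumption that 1.8 wrote the file. */
const char *
encoding_override(const char *head, const char *locale_charset)
{
  return decl_has_encoding(head) ? NULL : locale_charset;
}
```

The result is meant to be passed as the second argument of
xmlReadFile(), which takes an encoding name or NULL. The locale charset
itself would come from g_get_charset(), as described above.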

This file [[[

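#include <libxml/parser.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
  xmlDocPtr xml = (argc > 1) ? xmlReadFile(argv[1], NULL, 0) : NULL;
  if (xml == NULL)
    return 1;  /* no file argument, or the parse failed */
  /* encoding is NULL when the declaration carries no encoding */
  printf("encoding: [%s]\n",
         xml->encoding ? (const char *) xml->encoding : "(null)");
  xmlFreeDoc(xml);
  return 0;
}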

]]] compiled with [[[
gcc `xml2-config --cflags` -o xml-test xml-test.c `xml2-config --libs`
]]] shows that (xmlDocPtr)->encoding contains what we want to know: it's
set when the <?xml [...] encoding="whatever"?> declaration names an
encoding, and NULL otherwise.

-- 
...jsled
http://asynchronous.org/ - `a=jsled; b=asynchronous.org; echo ${a}@${b}`

