Testing locale change from gnucash 1.8 to gnucash-g2

Neil Williams linux at codehelp.co.uk
Sat Sep 24 16:43:31 EDT 2005


On Saturday 24 September 2005 5:21 pm, Didier Vidal wrote:
> GnuCash has a problem with encoding:

GnuCash ignores encoding. (Not quite the same issue!)

> it doesn't write the encoding 
> system in the XML files it saves.
> (if encoding is not utf-8 or utf-16, it must be specified:
> http://www.w3.org/TR/REC-xml/#charencoding)

True. I'll fix that. However, AFAICT, all GnuCash data files have actually 
been UTF-8 - certainly in the G2 and 1.8 trees.
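
As a rough illustration of the fix (not GnuCash's actual code, which is C on top of libxml2), here is a minimal Python sketch that serializes a document as UTF-8 and states the encoding in the XML declaration. The element name "account-name" is made up for the example, and xml_declaration needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root: ET.Element) -> bytes:
    """Serialize the tree as UTF-8 and write the encoding in the header."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Hypothetical element name; real GnuCash files use their own schema.
account = ET.Element("account-name")
account.text = "Épargne"

data = serialize_with_declaration(account)
header = data.split(b"\n", 1)[0]  # the <?xml ... ?> declaration line
```

The first line of the output is then an XML declaration that names the encoding, so a reader does not have to guess it.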

> The potential problems are:
>     - If someone switches to a new linux distrib (that uses another
> locale) and wants to use files created on the old distrib

libxml2 does this job on our behalf. All the encoding recognition / conversion 
and other heuristics are in libxml2.

>     - If someone switches to gnucash-g2 and wants to use it in utf8

I can't see a real problem there: the library converts everything to UTF-8 
for internal use, so gnucash has always received UTF-8, no matter what the 
original encoding was.
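
The same behaviour can be sketched in Python. This uses Python's bundled expat-based parser rather than libxml2, so it is only an analogy: whatever encoding the file declares, the application sees the same Unicode text. The "payee" element is an invented example.

```python
import xml.etree.ElementTree as ET

# The same logical document, stored in two different declared encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><payee>Café</payee>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><payee>Café</payee>'
).encode("utf-8")

# The parser honours the declaration and hands back decoded text.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
```

The bytes on disk differ, but both parses give the application the identical string.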

>     - If someone sends a gnucash file by email to a friend who runs
> another locale on the machine

gnucash won't write out a file in an encoding other than UTF-8 - there is no 
code to do the conversion and the original encoding is not retained.

> for users that will migrate to the gnome 2 version of gnucash. In case
> of problems, the workaround would be simple anyway: users should edit
> the xml file and replace
> <?xml version="1.0"?>
> by <?xml version="1.0" encoding="(result of 'locale charmap')"?>

That, unfortunately, is not the solution. GnuCash will pass the file to 
libxml2 which will parse it and convert to UTF-8. GnuCash will then write out 
UTF-8 on the next save.
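
To make that sequence concrete, here is a small Python sketch of the workaround followed by a load and save. It assumes the file really is in ISO-8859-1 and stands in for libxml2 with Python's own parser; declare_encoding and load_and_resave are names invented for the example.

```python
import xml.etree.ElementTree as ET

OLD_HEADER = b'<?xml version="1.0"?>'


def declare_encoding(raw: bytes, charset: str) -> bytes:
    """Apply the suggested workaround: put the charset in the header."""
    new_header = f'<?xml version="1.0" encoding="{charset}"?>'.encode("ascii")
    return raw.replace(OLD_HEADER, new_header, 1)


def load_and_resave(raw: bytes) -> bytes:
    """Parse (decoding to Unicode), then write out again as UTF-8."""
    root = ET.fromstring(raw)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# A file written in the old locale's charset with no encoding declared.
original = OLD_HEADER + "<memo>déjà</memo>".encode("iso-8859-1")

patched = declare_encoding(original, "ISO-8859-1")
resaved = load_and_resave(patched)
```

After the round trip the ISO-8859-1 bytes and the edited header are gone: the saved copy is UTF-8, so the edit only matters for the first load.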

> However, from my tests, gnucash still doesn't follow the XML standard:
>    - It will save your files in the locale's charset without writing the
> encoding in the header.

Are you sure? I can't see how that would happen - isn't it actually UTF-8 with 
no encoding set?
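
One quick way to check a given file is to try decoding its bytes as UTF-8. This is a sketch, not something GnuCash does; is_valid_utf8 is a made-up helper name. Note that a file containing only ASCII passes the check whatever the locale was, so the test is only informative when non-ASCII characters are present.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_sample = "<memo>café</memo>".encode("utf-8")
latin1_sample = "<memo>café</memo>".encode("iso-8859-1")
ascii_sample = b"<memo>cafe</memo>"

utf8_ok = is_valid_utf8(utf8_sample)
latin1_ok = is_valid_utf8(latin1_sample)
ascii_ok = is_valid_utf8(ascii_sample)
```

In practice the bytes would come from reading the data file in binary mode.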

With the UTF-8 encoding stated in the <?xml ... ?> declaration, I believe 
GnuCash would follow the XML standard - at least in G2.

>    - The non-ascii chars seem to be written as entities (eg: &#xE9;).
> They might be read without problem if you are in a wrong locale, but
> will not be converted to the right character. Because libxml2 is smart,
> and can guess encoding, I haven't seen any actual problem if you are
> using your files only with gnucash.

Probably because everything is actually UTF-8.
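
Another factor worth noting: numeric character references such as &#xE9; name Unicode code points, so in the XML specification they mean the same character whatever encoding the file is in. A short Python sketch (again using Python's parser as a stand-in for libxml2, with an invented "payee" element) shows both directions:

```python
import xml.etree.ElementTree as ET

# Pure-ASCII bytes: the non-ASCII character is written as a reference.
doc = b'<?xml version="1.0"?><payee>Caf&#xE9;</payee>'

root = ET.fromstring(doc)
text = root.text  # the reference is resolved to the real character

# With the default (ASCII) output encoding, non-ASCII characters are
# written back as numeric character references.
ascii_bytes = ET.tostring(root)
```

A file written this way contains only ASCII bytes, so it reads the same under any locale.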

> It would be better to write the encoding in the XML file.

Definitely.

-- 

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
