Testing locale change from gnucash 1.8 to gnucash-g2
Neil Williams
linux at codehelp.co.uk
Sat Sep 24 16:43:31 EDT 2005
On Saturday 24 September 2005 5:21 pm, Didier Vidal wrote:
> GnuCash has a problem with encoding:
GnuCash ignores encoding. (Not quite the same issue!)
> it doesn't write the encoding
> system in the XML files it saves.
> (if encoding is not utf-8 or utf-16, it must be specified:
> http://www.w3.org/TR/REC-xml/#charencoding)
True. I'll fix that. However, AFAICT, all GnuCash data files have actually
been UTF-8 - certainly in the G2 and 1.8 trees.
> The potential problems are:
> - If someone switches to a new Linux distribution (that uses another
> locale) and wants to use files created on the old distribution
libxml2 does this job on our behalf. All the encoding recognition, conversion
and other heuristics are in libxml2.
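
As a rough illustration of a parser doing the encoding work for the
application, here is a minimal sketch. It uses Python's standard
xml.etree.ElementTree (built on expat) as a stand-in, not libxml2 and not
GnuCash code; the <desc> element and the parse_text helper are hypothetical
names chosen for the example.

```python
import xml.etree.ElementTree as ET


def parse_text(raw: bytes) -> str:
    """Parse raw XML bytes and return the root element's text as a str."""
    return ET.fromstring(raw).text


# The same content serialised in two encodings, each declared in the header.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><desc>café</desc>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><desc>café</desc>'
).encode("utf-8")

# The bytes on disk differ, but the application sees identical text.
same_text = parse_text(latin1_doc) == parse_text(utf8_doc)
```

The caller never handles the original byte encoding: the parser reads the
declaration and hands back already-decoded text.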
> - If someone switches to gnucash-g2 and wants to use it in utf8
Can't see a real problem there: the library converts everything to UTF-8 for
internal use, so gnucash has always received UTF-8, no matter what the
original encoding.
> - If someone sends a gnucash file by email to a friend who runs
> another locale on the machine
gnucash won't write out a file in an encoding other than UTF-8 - there is no
code to do the conversion and the original encoding is not retained.
> for users that will migrate to the gnome 2 version of gnucash. In case
> of problems, the workaround would be simple anyway: users should edit
> the xml file and replace
> <?xml version="1.0"?>
> by <?xml version="1.0" encoding="(result of 'locale charmap')"?>
That, unfortunately, is not the solution. GnuCash will pass the file to
libxml2 which will parse it and convert to UTF-8. GnuCash will then write out
UTF-8 on the next save.
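
The parse-then-save round trip can be sketched as follows. This is an
illustration only, again using Python's xml.etree.ElementTree in place of
libxml2; resave_as_utf8 and the <desc> element are hypothetical, and the
real GnuCash save path is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def resave_as_utf8(raw: bytes) -> bytes:
    """Parse XML in whatever encoding it declares, then write it as UTF-8."""
    tree = ET.ElementTree(ET.fromstring(raw))
    out = io.BytesIO()
    tree.write(out, encoding="utf-8", xml_declaration=True)
    return out.getvalue()


# A file hand-edited to declare the old locale's charset.
original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><desc>café</desc>'
).encode("iso-8859-1")

saved = resave_as_utf8(original)
```

The hand-edited declaration only helps the first read. After one save the
ISO-8859-1 bytes are gone and the file on disk is UTF-8.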
> However, from my tests, gnucash still doesn't follow the XML standard:
> - It will save your files in the locale's charset without writing the
> encoding in the header.
Are you sure? I can't see how that would happen - isn't it actually UTF-8 with
no encoding set?
With UTF-8 encoding declared in the <?xml ... ?> tag, I believe GnuCash would
follow the XML standard - at least in G2.
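
One quick way to settle whether an existing file holds UTF-8 or
locale-charset bytes is to try a strict UTF-8 decode. A minimal sketch,
assuming the data file is plain uncompressed XML; looks_like_utf8 is a
hypothetical helper, not part of GnuCash.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_sample = "<desc>café</desc>".encode("utf-8")
latin1_sample = "<desc>café</desc>".encode("iso-8859-1")

utf8_ok = looks_like_utf8(utf8_sample)
latin1_ok = looks_like_utf8(latin1_sample)
```

In practice the bytes would come from open(path, "rb").read(). A pure-ASCII
file passes the check too, which is consistent with non-ASCII characters
having been written as character references.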
> - The non-ascii chars seem to be written as entities (eg: &#xE9; for é).
> They might be read without problem if you are in a wrong locale, but
> will not be converted to the right character. Because libxml2 is smart,
> and can guess encoding, I haven't seen any actual problem if you are
> using your files only with gnucash.
Probably because everything is actually UTF-8.
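
Numeric character references are plain ASCII, so they read back the same in
any locale. A minimal sketch with Python's xml.etree.ElementTree (not
libxml2): it emits the decimal form, which is equivalent to the hexadecimal
form &#xE9;. The <desc> element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("desc")
root.text = "café"

# With its default "us-ascii" output encoding, ElementTree escapes every
# non-ASCII character as a numeric character reference.
ascii_bytes = ET.tostring(root)

# Reading the ASCII-only bytes back restores the original character.
round_tripped = ET.fromstring(ascii_bytes).text
```

Because the serialised bytes contain no non-ASCII characters, a missing
encoding declaration cannot cause them to be misread.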
> It would be better to write the encoding in the XML file.
Definitely.
--
Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/