Gnome2 UTF-8 handling

Reinke Bonte reinke.bonte at web.de
Mon Dec 22 02:53:07 CST 2003


On Sun, 21 Dec 2003 23:55:22 -0800
David Hampton <hampton at employees.org> wrote:

> > Of course it is possible, but it is not trivial. The aim is to
> > compile gnucash with libxml2, but the existing parser relies on a
> > bug of libxml1. I don't think you can use the parser the same way as
> > it works now with libxml2.
>  
> Please explain further.  What bug does gnucash rely on?  I've been
> using libxml2 in the gnome2 branch without any noticeable problem.
>  

I am glad to hear that. My understanding was that libxml2 guesses the
input encoding differently from libxml1. gnucash passes strings to
libxml in the locale encoding; libxml1 assumes such a string is latin1,
while libxml2 assumes it is UTF-8.
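
To make the difference concrete, here is a minimal sketch in Python. It
is not gnucash or libxml code; the two helper functions are hypothetical
stand-ins for "a parser that assumes latin1" and "a parser that assumes
UTF-8", and the sample string is made up.

```python
def guess_as_latin1(raw: bytes) -> str:
    # Assumption: models a parser that treats every byte as one latin1 character.
    # Any byte sequence decodes "successfully", whether or not it was latin1.
    return raw.decode("latin-1")


def guess_as_utf8(raw: bytes):
    # Assumption: models a parser that insists on well-formed UTF-8.
    # Returns None where a real parser would report an encoding error.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# A string as it would arrive from a latin1 locale.
locale_bytes = "Überweisung".encode("latin-1")

print(guess_as_latin1(locale_bytes))
print(guess_as_utf8(locale_bytes))
```

The latin1 guess accepts anything, so wrong input goes unnoticed, while
the UTF-8 guess rejects locale-encoded bytes that are not valid UTF-8.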

If all strings in the gnome2 branch are handled as UTF-8 internally,
libxml2 will guess correctly. But I wonder how it will handle old files
that were created under an EUC-JP locale. libxml1 treated every byte of
an EUC-JP multi-byte character as a separate latin1 character. When I
open the same file with libxml2, how will the decoded latin1 characters
be put back together into clean multi-byte EUC-JP?
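
The round trip in question can be sketched in Python as follows. This is
only an illustration under stated assumptions, not what gnucash or
libxml actually do: it assumes the old file ended up holding each EUC-JP
byte as one latin1 character, serialized as UTF-8, and both function
names are hypothetical.

```python
def write_like_latin1_parser(text: str, locale_encoding: str) -> bytes:
    # Assumption: models the old behaviour described above. The locale
    # bytes are misread as latin1, one character per byte, then written
    # out as UTF-8.
    raw = text.encode(locale_encoding)
    return raw.decode("latin-1").encode("utf-8")


def recover(stored: bytes, locale_encoding: str) -> str:
    # Reverse the misreading: decode the file, turn each latin1 character
    # back into its single byte, then decode those bytes in the encoding
    # the user originally worked in.
    as_latin1_chars = stored.decode("utf-8")
    raw = as_latin1_chars.encode("latin-1")
    return raw.decode(locale_encoding)


original = "日本語"
stored = write_like_latin1_parser(original, "euc_jp")

# Read naively, the file contains latin1 characters, not Japanese text.
print(stored.decode("utf-8") == original)

# With the original locale encoding known, the text can be reassembled.
print(recover(stored, "euc_jp") == original)
```

The catch is that recovery needs to know the original locale encoding,
which the file itself does not record in this model.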

I will try to find some time to compile the gnome2 branch and see how
it works with multi-byte encodings.


Reinke


