Gnome2 UTF-8 handling

Reinke Bonte reinke.bonte at web.de
Mon Dec 22 00:04:41 CST 2003


On Tue, 16 Dec 2003 22:57:31 +0100
Eneko Lacunza <listas at enlar.net> wrote:
> It is perfectly possible to handle UTF-8 encoded strings without
> using C wide characters. It's just that you can't use the standard
> str* functions for some tasks, that's all. If I'm not mistaken, the
> GLib/GDK libraries have replacement functions for UTF-8 strings.
> 

Of course you can, but the disadvantage is that you can't use
character-oriented string functions, and you risk splitting a multibyte
UTF-8 character in the middle. And what is the advantage of converting
wchar_t into char? If I remember correctly, the original poster wanted
to convert functions that already use wchar_t to char.


> I don't see why this means we need to use GdkWChar (note: it is not
> wchar_t).

I didn't know that "wide characters" referred to GdkWChar. 

> 
> > > > The hard part is going to be converting the existing XML and
> > > > database data from whatever it's currently using to UTF-8.
> > > We don't currently include an "encoding" in the XML data file.
> > > That could be used as a trigger to ask the user for the old
> > > encoding and then convert the data to UTF-8. A nice touch would
> > > be to scan the file first, looking for any characters with the
> > > high-order bit set, to see if conversion is needed in the first
> > > place.
> > I don't know about the database data, but the XML file is a complete
> > mess. You will not find any high-order bit set in the XML file,
> > because libxml has converted everything into HTML entities, but
> > unfortunately into the wrong entities for every encoding other than
> > Latin-1. In that case the XML file has to be recoded manually, as I
> > have described twice on this mailing list.
> 
> I think it is perfectly possible to parse the XML file with its
> parser and then check all the strings (already decoded from HTML
> entities by the parser).

Of course it is possible, but it is not trivial. The aim is to compile
GnuCash with libxml2, but the existing parser relies on a bug in
libxml1. I don't think the parser can be used with libxml2 the way it
works now.


Sorry for posting with very little background knowledge, but proper
UTF-8 support is very important to me, and I really want it to work.


Reinke



