importing Bills - character encoding

John Ralls jralls at ceridwen.us
Mon Jan 4 15:19:10 EST 2016


> On Jan 4, 2016, at 11:35 AM, Fred Bone <Fred.Bone at dial.pipex.com> wrote:
> 
> On 4 January 2016 at 23:21, tereque said:
> 
>> hi john,
>> 
>> sorry, not 100% sure how to give you the hex.
>> attached is the csv I used for testing. The string that is causing the
>> encoding trouble can be found in 'Invoice Notes' and in 'Description' here
>> it is:   面料 (not sure whether that encodes correctly inline on your
>> end though)
> 
> The CSV contains two correctly-encoded strings each representing (the 
> same) two Chinese characters, U+9762 and U+6599; the UTF8 encoding is 
> E99DA2 E69699.
> 
> The characters embedded in the OP's original message are
> U+00E9 U+009D U+00A2 U+00E6 U+2013 U+2122
> Note that 0x96 in Windows charsets is EN DASH which is U+2013 and 0x99 is 
> TM which is U+2122.
> 
> Apparently the CSV was read as being in Windows encoding and not as being 
> in UTF8. The six bytes were read as individual codepoints.
> 
> No, I do not know how to force a UTF8 interpretation, unless perhaps by 
> prefixing a BOM.

Fred,

Beat me to it. Thanks.
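
For anyone following along, the mismatch Fred describes can be reproduced in a few lines of Python. This is only a sketch: the helper name decode_as_windows_1252 is made up for illustration, and it assumes that bytes undefined in Windows-1252 (such as 0x9D) pass through as the C1 control of the same value, which matches the codepoints Fred lists.

```python
# The two characters from the CSV and their UTF-8 bytes.
text = "面料"                      # U+9762 U+6599
raw = text.encode("utf-8")
print(raw.hex())                   # e99da2e69699


def decode_as_windows_1252(data):
    """Hypothetical helper: decode each byte as Windows-1252.

    Python's cp1252 codec rejects the few bytes the code page leaves
    undefined (0x9D among them), so those are passed through as the
    C1 control character with the same value. That fallback is an
    assumption about how the importing side treated them.
    """
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))
    return "".join(chars)


garbled = decode_as_windows_1252(raw)
codepoints = ["U+%04X" % ord(ch) for ch in garbled]
print(" ".join(codepoints))

# Reading the same six bytes as UTF-8 gives the original text back.
print(raw.decode("utf-8") == text)
```

Each of the six bytes becomes a character of its own, which is why two Chinese characters turn into six unrelated ones.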

Gunnar,

I should have looked a bit deeper at the GLib code. GLib uses $CHARSET to set the input charset for g_locale_to_utf8(), not the locale's codeset. So add
  CHARSET=UTF-8
to the environment and try again.
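
If you start GnuCash from a script rather than a shell, the same setting can be passed along as in the rough sketch below. The function names and the "gnucash" command name are assumptions for illustration; adjust the command to whatever your installation uses.

```python
import os
import subprocess


def env_with_charset(charset="UTF-8", base=None):
    """Return a copy of the environment with CHARSET set.

    `base` defaults to the current process environment; the original
    mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["CHARSET"] = charset
    return env


def launch(command=("gnucash",)):
    """Start the program with CHARSET=UTF-8 in its environment.

    The default command name is an assumption, not a documented path.
    """
    return subprocess.Popen(list(command), env=env_with_charset())


if __name__ == "__main__":
    launch()
```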

Regards,
John Ralls







