importing Bills - character encoding
John Ralls
jralls at ceridwen.us
Mon Jan 4 15:19:10 EST 2016
> On Jan 4, 2016, at 11:35 AM, Fred Bone <Fred.Bone at dial.pipex.com> wrote:
>
> On 4 January 2016 at 23:21, tereque said:
>
>> hi john,
>>
>> sorry, not 100% sure how to give you the hex.
>> attached is the csv I used for testing. The string that is causing the
>> encoding trouble can be found in 'Invoice Notes' and in 'Description' here
>> it is: 面料 (not sure whether that encodes correctly inline on your
>> end though)
>
> The CSV contains two correctly-encoded strings each representing (the
> same) two Chinese characters, U+9762 and U+6599; the UTF8 encoding is
> E99DA2 E69699.
>
> The characters embedded in the OP's original message are
> U+00E9 U+009D U+00A2 U+00E6 U+2013 U+2122
> Note that 0x96 in Windows charsets is EN DASH which is U+2013 and 0x99 is
> TM which is U+2122.
>
> Apparently the CSV was read as being in Windows encoding and not as being
> in UTF8. The six bytes were read as individual codepoints.
>
> No, I do not know how to force a UTF8 interpretation, unless perhaps by
> prefixing a BOM.
Fred,
Beat me to it. Thanks.
Gunnar,
I should have looked a bit deeper at the GLib code. GLib uses $CHARSET to set the input charset for g_locale_to_utf8(), not the locale's codeset. So add
CHARSET=UTF-8
to the environment and try again.
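For anyone who wants to reproduce Fred's analysis, here is a rough Python sketch of what appears to have happened: the six UTF-8 bytes get decoded one at a time as Windows-1252. The helper name is made up for illustration, and the fallback for byte 0x9D (which Python's cp1252 codec treats as undefined) is an assumption chosen to match the U+009D that Fred reports. It says nothing about how GnuCash or GLib actually do the conversion.

```python
def decode_as_windows_1252(data: bytes) -> str:
    """Decode each byte as Windows-1252 (hypothetical helper).

    Bytes that Python's cp1252 codec leaves undefined (such as 0x9D)
    are mapped to the code point with the same value. That is an
    assumption made here to mirror the behaviour Fred observed.
    """
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))
    return "".join(chars)


# The two characters from the CSV: U+9762 and U+6599.
original = "\u9762\u6599"
utf8_bytes = original.encode("utf-8")

# Misread the UTF-8 bytes as single-byte Windows characters.
garbled = decode_as_windows_1252(utf8_bytes)
codepoints = [f"U+{ord(ch):04X}" for ch in garbled]

print(utf8_bytes.hex().upper())
print(" ".join(codepoints))
```

The hex line should match the E99DA2 E69699 bytes Fred quotes, and the code point list should match the six characters he found in the original message.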
Regards,
John Ralls