importing Bills - character encoding
Fred Bone
Fred.Bone at dial.pipex.com
Mon Jan 4 14:35:09 EST 2016
On 4 January 2016 at 23:21, tereque said:
> hi john,
>
> sorry, not 100% sure how to give you the hex.
> attached is the csv I used for testing. The string that is causing the
> encoding trouble can be found in 'Invoice Notes' and in 'Description' here
> it is: 面料 (not sure whether that encodes correctly inline on your
> end though)
The CSV contains two correctly encoded strings, each representing the
same two Chinese characters, U+9762 and U+6599; their UTF-8 encoding is
E99DA2 E69699.
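For anyone who wants to check the byte sequence themselves, here is a minimal Python 3.8+ sketch. It simply encodes the two code points quoted above and prints the UTF-8 bytes, one group per character. It is an illustration only and is not anything GnuCash itself runs.

```python
# The two characters from the CSV: U+9762 U+6599 (面料)
text = "\u9762\u6599"

# Encode to UTF-8; each of these CJK characters takes three bytes
utf8_bytes = text.encode("utf-8")

print([f"U+{ord(c):04X}" for c in text])   # the code points
print(utf8_bytes.hex(" ", 3).upper())      # bytes grouped three per character
```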
The characters embedded in the OP's original message are
U+00E9 U+009D U+00A2 U+00E6 U+2013 U+2122
Note that in the Windows charsets (e.g. Windows-1252), 0x96 is EN DASH,
U+2013, and 0x99 is TRADE MARK SIGN, U+2122.
Apparently the CSV was read as being in a Windows encoding rather than
in UTF-8, so each of the six bytes was read as a separate code point.
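The misreading described above can be reproduced with a short Python sketch. It assumes Windows-1252 as the "Windows encoding". Python's `cp1252` codec treats 0x9D as undefined, so this hypothetical helper falls back to the code point with the same value for such bytes. That fallback matches the U+009D seen in the message, but it is an assumption about how the importer behaved, not a description of GnuCash's code.

```python
def decode_windows_1252(data: bytes) -> str:
    """Decode every byte on its own as Windows-1252 (hypothetical helper).

    Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
    for those bytes we fall back to the code point of the same value,
    e.g. 0x9D -> U+009D.
    """
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)


utf8_bytes = "\u9762\u6599".encode("utf-8")   # E9 9D A2 E6 96 99

correct = utf8_bytes.decode("utf-8")          # read as UTF-8: two characters
misread = decode_windows_1252(utf8_bytes)     # read byte by byte: six characters

print(" ".join(f"U+{ord(c):04X}" for c in correct))
print(" ".join(f"U+{ord(c):04X}" for c in misread))
```

The second line printed should match the six code points listed from the OP's message, which is consistent with each byte having been taken as one character.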
No, I do not know how to force a UTF-8 interpretation, unless perhaps by
prefixing a BOM (the UTF-8 byte order mark, bytes EF BB BF).
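If you want to try the BOM idea, here is a minimal Python sketch that prepends the UTF-8 byte order mark to a CSV's bytes, skipping files that already have one. Whether GnuCash's importer actually honours the BOM is not established here (the suggestion above is only "perhaps"). The helper names `add_utf8_bom` and `add_bom_to_file` are hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix the UTF-8 BOM (EF BB BF) unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(path: Path) -> None:
    """Rewrite a UTF-8 CSV in place with a BOM prepended (hypothetical helper)."""
    path.write_bytes(add_utf8_bom(path.read_bytes()))


# In-memory example standing in for the CSV contents
sample = "Description\n\u9762\u6599\n".encode("utf-8")
with_bom = add_utf8_bom(sample)
print(with_bom[:3].hex(" ").upper())   # the three BOM bytes
```

Calling the function twice leaves the data unchanged, so re-running it on an already-fixed file is harmless. Python's own `utf-8-sig` codec strips the BOM again when decoding.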