importing Bills - character encoding

Fred Bone Fred.Bone at dial.pipex.com
Mon Jan 4 14:35:09 EST 2016


On 4 January 2016 at 23:21, tereque said:

> hi john,
> 
> sorry, not 100% sure how to give you the hex.
> attached is the csv I used for testing. The string that is causing the
> encoding trouble can be found in 'Invoice Notes' and in 'Description' here
> it is:   面料 (not sure whether that encodes correctly inline on your
> end though)

The CSV contains two correctly-encoded strings each representing (the 
same) two Chinese characters, U+9762 and U+6599; the UTF8 encoding is 
E99DA2 E69699.
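
To check those bytes for yourself, here is a small Python sketch. The variable names are mine and nothing in it is specific to GnuCash; it only encodes the two code points as UTF-8 and prints the hex.

```python
# The two characters from the CSV, written by code point.
text = "\u9762\u6599"

# Show the UTF-8 bytes of each character separately.
for ch in text:
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex().upper()}")

# The whole string as one run of hex digits.
utf8_hex = text.encode("utf-8").hex().upper()
print(utf8_hex)
```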

The characters embedded in the OP's original message are
U+00E9 U+009D U+00A2 U+00E6 U+2013 U+2122
Note that 0x96 in Windows charsets is EN DASH which is U+2013 and 0x99 is 
TM which is U+2122. 0x9D is unassigned in Windows-1252, so it came through 
as the control character U+009D.

Apparently the CSV was read as being in Windows encoding and not as being 
in UTF8. Each of the six bytes was read as a separate character instead of 
as part of a three-byte sequence.
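
The misreading can be reproduced with a rough Python sketch. The helper name read_as_windows is made up for illustration, and the fallback for bytes that Python's cp1252 codec leaves undefined (0x9D is one) is my assumption about what the reader did: it passes the byte through as the code point with the same number.

```python
def read_as_windows(data: bytes) -> str:
    """Decode each byte on its own, as a single-byte Windows reader would."""
    chars = []
    for b in data:
        raw = bytes([b])
        try:
            chars.append(raw.decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: bytes undefined in cp1252 (such as 0x9D) are passed
            # through as the code point with the same number.
            chars.append(raw.decode("latin-1"))
    return "".join(chars)


# The six UTF-8 bytes found in the CSV.
utf8_bytes = bytes.fromhex("E99DA2E69699")

garbled = read_as_windows(utf8_bytes)
codepoints = [f"U+{ord(c):04X}" for c in garbled]
print(" ".join(codepoints))
```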

No, I do not know how to force a UTF8 interpretation, unless perhaps by 
prefixing a BOM.
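
If anyone wants to try the BOM idea, a minimal Python sketch follows. The function name add_utf8_bom and the sample data are hypothetical, and I have not tested whether the importer takes any notice of a BOM.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM in front, unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# Hypothetical sample standing in for the contents of the CSV file.
sample = "Description\n\u9762\u6599\n".encode("utf-8")
with_bom = add_utf8_bom(sample)
print(with_bom[:3].hex().upper())
```

For a real file, read it in binary mode, pass the bytes through add_utf8_bom, and write the result to a new file so the original stays untouched.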



