importing Bills - character encoding
John Ralls
jralls at ceridwen.us
Tue Dec 29 11:03:16 EST 2015
> On Dec 29, 2015, at 7:35 AM, Mike Evans <mikee at saxicola.co.uk> wrote:
>
> On Mon, 28 Dec 2015 20:59:06 +0800
> __ <tereque at gmail.com> wrote:
>
>> The feature of importing Bills and invoices from csv files is a great
>> improvement for my little world, as it enables people other than just me to
>> prepare such documents (which is a huge improvement in the workflow if you
>> are not working just by yourself).
>> However, being located in China (and therefore dealing with Chinese
>> characters at times), I face a big obstacle with this, unfortunately. Chinese
>> characters just won't encode right and only display as hieroglyphs after
>> importing.
>>
>>
> I have a fix for this that "works for me". The chars get mangled by a call to:
> line_utf8 = g_locale_to_utf8 (line, -1, NULL, NULL, NULL);
> in dialog-bi-import.c
>
> Removing this (plus some other edits) solves the issue "for me" because my input file is already in UTF-8 Unicode, but I have no idea how this would affect Windows or Mac machines or other file encodings.
>
> I'll commit it to git maint for testing.
>
> I guess I should move this to the devel list?
>
Before you commit, can you test with the line still in place but ensuring that you're using a UTF-8 locale, i.e., en_GB.UTF8 rather than en_GB.ISO8859-1? In that case g_locale_to_utf8() is effectively g_strdup(). IIRC most Linux distributions default to the ISO encoding if it's not specified. You can check your machine by examining the links in /usr/share/locale.
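To see which codeset a process actually gets from the environment, POSIX nl_langinfo(CODESET) reports it after setlocale(); GLib's g_get_charset() exposes similar information. The following is a minimal standalone sketch under that assumption, for a POSIX system; both function names are hypothetical and are not part of GnuCash or GLib.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical helper: true if a codeset name such as "UTF-8", "utf8"
 * or "UTF8" denotes UTF-8. Case is ignored and '-' / '_' are skipped. */
bool codeset_is_utf8(const char *codeset)
{
    const char *want = "utf8";
    assert(codeset != NULL);
    for (; *codeset != '\0'; codeset++) {
        if (*codeset == '-' || *codeset == '_')
            continue;                      /* ignore separators */
        if (*want == '\0' || tolower((unsigned char)*codeset) != *want)
            return false;                  /* mismatch or trailing junk */
        want++;
    }
    return *want == '\0';                  /* consumed all of "utf8" */
}

/* Hypothetical helper: adopt the LC_CTYPE locale from the environment
 * (LANG, LC_ALL, LC_CTYPE) and report whether its codeset is UTF-8.
 * If it is, a locale-to-UTF-8 conversion has nothing to convert. */
bool current_locale_is_utf8(void)
{
    setlocale(LC_CTYPE, "");
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

Running this under LANG=en_GB.UTF-8 and then under an ISO-8859-1 locale shows whether the environment is really what the test assumes.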
Tereque, you should do the same test, with a twist: Ensure that the file you're importing is encoded in the character set specified by your locale.
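For that test it helps to confirm what the file actually contains. GLib provides g_utf8_validate() for this. The standalone strict check below exists only so the example compiles without GLib, and its name is hypothetical. Bytes from a GBK/GB18030-encoded file will typically fail it, while a UTF-8 export passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8
 * (RFC 3629): no overlong forms, no surrogates U+D800..U+DFFF,
 * nothing above U+10FFFF, no truncated sequences. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;                 /* number of continuation bytes */
        unsigned long cp;         /* decoded code point */
        if (c < 0x80) { i++; continue; }          /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;        /* 0x80..0xC1, 0xF5..0xFF cannot lead */
        if (len - i <= n)
            return false;         /* sequence cut off at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;     /* expected a 10xxxxxx continuation */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;         /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return true;
}
```

If the file fails this check while the locale is UTF-8, the file and locale do not match, which is exactly the mismatch the test is meant to rule out.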
For a comprehensive fix, I suppose we shouldn't assume that a file's encoding matches the locale; we should ask the user for the encoding and conditionally use g_convert() rather than g_locale_to_utf8().
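Here is a minimal sketch of that conditional conversion, assuming the user's chosen charset arrives as a name string. It is written against POSIX iconv(3), which GLib's g_convert() is built on, so it compiles without GLib. In GnuCash the equivalent step would presumably be something like g_convert (line, -1, "UTF-8", user_charset, NULL, NULL, &error). The helper name and the user_charset parameter are hypothetical, not existing GnuCash code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: convert `len` bytes of `in`, encoded in `from`
 * (e.g. "GB18030" or "ISO-8859-1"), to a newly malloc'd, NUL-terminated
 * UTF-8 string. Returns NULL if the charset is unknown, the input is
 * invalid in that charset, or memory runs out. */
char *convert_to_utf8(const char *in, size_t len, const char *from)
{
    assert(in != NULL && from != NULL);

    /* User says the file is already UTF-8: copy instead of converting. */
    if (strcasecmp(from, "UTF-8") == 0 || strcasecmp(from, "UTF8") == 0) {
        char *copy = malloc(len + 1);
        if (copy != NULL) {
            memcpy(copy, in, len);
            copy[len] = '\0';
        }
        return copy;
    }

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                       /* charset not supported */

    size_t cap = len * 2 + 16;             /* first guess; grown on E2BIG */
    size_t used = 0;                       /* UTF-8 bytes written so far */
    char *out = malloc(cap);
    char *inp = (char *)in;                /* iconv() takes char **, not const */
    size_t in_left = len;

    while (out != NULL) {
        char *outp = out + used;
        size_t out_left = cap - used - 1;  /* reserve a byte for the NUL */
        size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
        used = (size_t)(outp - out);
        if (rc != (size_t)-1)
            break;                         /* all input converted */
        if (errno != E2BIG) {              /* EILSEQ/EINVAL: bad input */
            free(out);
            out = NULL;
            break;
        }
        char *bigger = realloc(out, cap * 2);  /* output full: grow, retry */
        if (bigger == NULL) {
            free(out);
            out = NULL;
            break;
        }
        out = bigger;
        cap *= 2;
    }

    if (out != NULL)
        out[used] = '\0';
    iconv_close(cd);
    return out;
}
```

For a file in a Simplified Chinese legacy encoding, a caller would pass something like "GB18030", assuming the platform's iconv knows that name (glibc's does). Note that the UTF-8 branch trusts the user's choice and does not validate the bytes.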
Regards,
John Ralls
More information about the gnucash-user mailing list