importing Bills - character encoding

John Ralls jralls at ceridwen.us
Tue Dec 29 11:03:16 EST 2015


> On Dec 29, 2015, at 7:35 AM, Mike Evans <mikee at saxicola.co.uk> wrote:
> 
> On Mon, 28 Dec 2015 20:59:06 +0800
> __ <tereque at gmail.com> wrote:
> 
>> the feature of importing Bills and invoices from csv files is a great
>> improvement for my little world as it enables people other than just me to
>> prepare such documents (which is a huge improvement in the workflow if you
>> are not working only by yourself).
>> However, being located in China (and therefore dealing with Chinese
>> characters at times) I unfortunately face a big obstacle with this. Chinese
>> characters just won't encode right and only display as hieroglyphs after
>> importing.
>> 
>> 
> I have a fix for this that "works for me". The chars get mangled by a call to:
> line_utf8 = g_locale_to_utf8 (line, -1, NULL, NULL, NULL);
> in dialog-bi-import.c
> 
> Removing this (plus some other edits) solves the issue "for me" because my input file is already in UTF-8, but I have no idea how this would affect Windows or Mac machines or other file encodings.
> 
> I'll commit it to git maint for testing.
> 
> I guess I should move this to devel list?
> 

Before you commit, can you test with the line still in place but ensuring that you're using a UTF-8 locale, i.e. en_GB.UTF8 rather than en_GB.ISO8859-1? g_locale_to_utf8() is effectively g_strdup() in that case. IIRC most Linux distributions default to the ISO encoding if it's not specified. You can check your machine by examining the links in /usr/share/locale.
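
If it helps, here is a rough sketch of another way to see which character set the current locale resolves to. It uses only the POSIX calls setlocale() and nl_langinfo(), not GnuCash or GLib code, and the helper names (codeset_is_utf8, current_codeset, report_locale_codeset) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical helper: nonzero if the codeset name denotes UTF-8.
 * Accepts the common spellings "UTF-8" and "UTF8", in any case. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Adopt the locale from the environment (LANG, LC_ALL, ...) and
 * return the codeset name the C library reports for it. */
const char *current_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Print the codeset and whether a locale-to-UTF-8 conversion
 * would have any real work to do. */
void report_locale_codeset(void)
{
    const char *cs = current_codeset();
    printf("locale codeset: %s (%s)\n", cs,
           codeset_is_utf8(cs) ? "already UTF-8" : "needs conversion");
}
```

Calling report_locale_codeset() from a small main() once with LANG set to a UTF-8 locale and once with an ISO-8859-1 locale should show the difference, provided both locales are installed on the machine.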

Tereque, you should do the same test, with a twist: Ensure that the file you're importing is encoded in the character set specified by your locale.

For a comprehensive fix I suppose we shouldn't assume that a file encoding matches the locale; we should ask the user for the encoding and conditionally use g_convert() rather than g_locale_to_utf8().
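
To make the idea concrete, here is a minimal sketch of the conditional conversion. It is not the importer's code: it uses POSIX iconv() directly so that it compiles without GLib, whereas in GnuCash the same step would presumably go through g_convert(), which takes the target and source codeset names in the same way. The function name line_to_utf8 and the from_codeset parameter (standing in for whatever encoding the user picks) are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Convert one NUL-terminated line from `from_codeset` to UTF-8.
 * Returns a malloc'd string the caller must free(), or NULL if the
 * codeset is unknown or the input is not valid in that codeset. */
char *line_to_utf8(const char *in, const char *from_codeset)
{
    assert(in != NULL && from_codeset != NULL);

    size_t in_left = strlen(in);

    /* Input declared as UTF-8: nothing to convert, just copy.
     * Note that this does not validate the bytes. */
    if (strcasecmp(from_codeset, "UTF-8") == 0) {
        char *copy = malloc(in_left + 1);
        if (copy != NULL)
            memcpy(copy, in, in_left + 1);
        return copy;
    }

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;                      /* unsupported codeset */

    /* Generous bound: at most 4 output bytes per input byte. */
    size_t out_size = in_left * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in;            /* iconv() wants char ** */
    char *out_ptr = out;
    size_t out_left = out_size - 1;       /* keep room for the NUL */
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {               /* invalid or truncated input */
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

With something like this, a file saved as GBK would be imported with from_codeset set to "GBK" regardless of the locale, and a NULL return gives the importer a place to tell the user that the chosen encoding does not match the file.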

Regards,
John Ralls




