importing Bills - character encoding

Mike Evans mikee at saxicola.co.uk
Tue Dec 29 11:34:19 EST 2015


On Tue, 29 Dec 2015 08:03:16 -0800
John Ralls <jralls at ceridwen.us> wrote:

> 
> > On Dec 29, 2015, at 7:35 AM, Mike Evans <mikee at saxicola.co.uk> wrote:
> > 
> > On Mon, 28 Dec 2015 20:59:06 +0800
> > __ <tereque at gmail.com> wrote:
> > 
> >> the feature of importing Bills and invoices from csv files is a great
> >> improvement for my little world as it enables people other than just me to
> >> prepare such documents (which is a huge improvement in the workflow if you
> >> are not only working just by yourself).
> >> However being located in China (and therefore dealing with Chinese
> >> characters at times) I face a big obstacle with this unfortunately. Chinese
> >> characters just won't encode right and only display as hieroglyphs after
> >> importing.
> >> 
> >> 
> > I have a fix for this that "works for me". The characters get mangled by this call in dialog-bi-import.c:
> >     line_utf8 = g_locale_to_utf8 (line, -1, NULL, NULL, NULL);
> > 
> > Removing this (plus some other edits) solves the issue "for me" because my input file is already in UTF-8, but I have no idea how this would affect Windows or Mac machines or other file encodings.
> > 
> > I'll commit it to git maint for testing.
> > 
> > I guess I should move this to the devel list?
> > 
> 
> Before you commit, can you test with the line still in place but ensuring that you're using a UTF-8 locale, i.e. en_GB.UTF8 rather than en_GB.ISO8859-1? g_locale_to_utf8() is effectively g_strdup() in that case. IIRC most Linux distributions default to the ISO encoding if it's not specified. You can check your machine by examining the links in /usr/share/locale.
> 
> Tereque, you should do the same test, with a twist: Ensure that the file you're importing is encoded in the character set specified by your locale.
> 
> For a comprehensive fix I suppose we shouldn't assume that a file encoding matches the locale; we should ask the user for the encoding and conditionally use g_convert() rather than g_locale_to_utf8().
> 
> Regards,
> John Ralls
> 
> 

Thanks John. Too late though, I just committed. I also just found some reading about a similar issue here:
https://mail.gnome.org/archives/gtk-list/2003-August/msg00253.html

But yes, we should be testing the string encoding, or the locale, first.
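
As a rough illustration of "testing the string encoding first", the sketch below checks whether a line is already well-formed UTF-8 before any locale conversion is attempted. It is plain C so that it stands alone; inside GnuCash the equivalent check would more likely be GLib's g_utf8_validate(). The names is_valid_utf8() and needs_conversion() are made up for this example and are not taken from dialog-bi-import.c. The check is only a heuristic: passing it does not prove the file was written as UTF-8, although text in a legacy encoding such as GBK rarely passes by accident.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the len bytes at s form well-formed UTF-8.
 * Rejects overlong forms, UTF-16 surrogates, truncated sequences,
 * and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;               /* stray continuation or bad lead byte */
        }

        if (len - i - 1 < need)
            return false;               /* sequence cut off at end of input */

        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;           /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += need + 1;
    }
    return true;
}

/* Hypothetical policy helper: a line that is not valid UTF-8 still
 * needs converting from the locale (or a user-chosen) encoding. */
static bool needs_conversion(const char *line)
{
    return !is_valid_utf8((const unsigned char *)line, strlen(line));
}
```

With this, the UTF-8 bytes for "中文" (E4 B8 AD E6 96 87) would be left alone, while the GBK bytes for the same text (D6 D0 CE C4) fail the check and would still go through g_locale_to_utf8(), or through g_convert() with an encoding chosen by the user.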

I'm going to revert and think some more.

Mike E

-- 
PGP key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x00CDB13500D7AB53  

