umlauts garbled up

Andreas Köhler andi5.py at gmx.net
Fri Mar 10 21:23:55 EST 2006


Hi,

On Friday, 10 Mar 2006, 22:37 CET, Derek Atkins wrote:
> What we probably need to do is write a program that goes through the
> file and finds every non-ascii "character" and asks the user what
> charset the character is from, perhaps giving them a choice of
> different charsets and what the character would be..  (assuming of
> course we can figure that out).  Then of course we can attempt to
> remember those choices..  And then perform the various necessary
> character conversions to produce a utf8 file.
> 
> Someone want to work on a small program to do that?

I have attached a Python script that tries to do a bit of that. It
is a quick hack, so do not expect it to solve all your problems,
and please make backups.

The script reads one file and writes to another. Along the way it
tries to decode every line both in a given charset and in UTF-8. If
only one of the two fails, it chooses the other one. If both fail,
it just writes the input line unchanged (it should rather tell the
user about that, but I did not want to change the script without
testing, so that it at least _works_).
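
As a rough sketch of the decoding step described above (this is not
the attached script; it is written for Python 3, and the names
try_decode and classify_line are made up for illustration):

```python
def try_decode(raw, encoding):
    """Return raw (bytes) decoded as text, or None if it is not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def classify_line(raw, oldenc="iso8859-1"):
    """Try one line in the old charset and in UTF-8.

    Returns (status, candidates), where status is one of
    "both", "old", "utf8" or "neither" (hypothetical labels).
    """
    as_old = try_decode(raw, oldenc)
    as_utf8 = try_decode(raw, "utf-8")
    if as_old is not None and as_utf8 is not None:
        return "both", (as_old, as_utf8)   # ambiguous, needs a decision
    if as_old is not None:
        return "old", (as_old,)            # only the old charset fits
    if as_utf8 is not None:
        return "utf8", (as_utf8,)          # only UTF-8 fits
    return "neither", ()                   # caller writes raw unchanged
```

Note that iso8859-1 assigns a character to every byte value, so with
that setting the old-charset decode never fails and only the "both"
and "old" outcomes can occur.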

If the input line decodes successfully in both charsets, the script
compares the two results character by character and, where they
differ, prints both possibilities (encoded in a given terminal
encoding). One should look good, the other ugly... Choose the nice
one and it will remember that :)
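
One possible way to do the "ask once and remember" part, again only
a Python 3 sketch and not the attached script. The function names,
the ask parameter and the idea of keying the remembered choice on the
runs of non-ASCII bytes in the line are all assumptions made for
this example:

```python
import re
import sys

# One or more consecutive bytes outside the ASCII range.
NON_ASCII_RUN = re.compile(rb"[\x80-\xff]+")


def printable(text, termenc):
    """Replace characters the terminal encoding cannot show with '?'."""
    return text.encode(termenc, "replace").decode(termenc)


def choose(raw, as_old, as_utf8, remembered, ask=input, termenc=None):
    """Pick one of two successful decodings of the bytes in raw.

    remembered maps a tuple of non-ASCII byte runs to "old" or "utf8",
    so the same umlaut in a later line is decided without asking.
    """
    if as_old == as_utf8:               # pure ASCII, nothing to decide
        return as_old
    key = tuple(NON_ASCII_RUN.findall(raw))
    if key not in remembered:
        termenc = termenc or sys.stdout.encoding or "ascii"
        print("1:", printable(as_old, termenc))
        print("2:", printable(as_utf8, termenc))
        answer = ask("Which one looks right? [1/2] ").strip()
        remembered[key] = "old" if answer == "1" else "utf8"
    return as_old if remembered[key] == "old" else as_utf8
```

Passing ask as a parameter is only there so the prompt can be
replaced in a test; interactively it defaults to input().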

How to run:

* check http://docs.python.org/lib/standard-encodings.html for the
  codec needed to decode the gnucash1.8 bytes (probably an 8-bit
  encoding such as iso8859-1 (pre-selected) or koi8_r)

* adjust oldenc (see above) and (if necessary) termenc (python
  should discover that just fine)

* run as ./convert2utf8 ${inputfile} ${outputfile}
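
Before running, the two settings from the steps above can be
sanity-checked along these lines. This is a Python 3 sketch using
only the standard codecs, locale and sys modules; check_codec is a
made-up helper name, and how the attached script itself discovers
termenc is not shown here:

```python
import codecs
import locale
import sys


def check_codec(name):
    """Return Python's canonical name for a codec.

    Raises LookupError if Python does not know the codec at all.
    """
    return codecs.lookup(name).name


# The old charset of the gnucash 1.8 file (adjust as needed).
oldenc = check_codec("iso8859-1")

# Best guess at the terminal encoding, falling back to the locale.
termenc = check_codec(sys.stdout.encoding or locale.getpreferredencoding())
```

A misspelled codec name then fails right away with a LookupError
instead of somewhere in the middle of the conversion.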

I would really like to hear your opinions about that and whether it
is worth improving it a bit.

-- andi5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: convert2utf8.py
Type: text/x-python
Size: 3902 bytes
Desc: not available
Url : http://lists.gnucash.org/pipermail/gnucash-devel/attachments/20060310/86bf0d19/convert2utf8.py

