umlauts garbled up
Andreas Köhler
andi5.py at gmx.net
Fri Mar 10 21:23:55 EST 2006
Hi,
On Friday, 10 Mar 2006, 22:37 CET, Derek Atkins wrote:
> What we probably need to do is write a program that goes through the
> file and finds every non-ascii "character" and asks the user what
> charset the character is from, perhaps giving them a choice of
> different charsets and what the character would be.. (assuming of
> course we can figure that out). Then of course we can attempt to
> remember those choices.. And then perform the various necessary
> character conversions to produce a utf8 file.
>
> Someone want to work on a small program to do that?
I have attached a python script that tries to do a bit of that. It
is a quick hack, so do not expect it to solve all your problems, and
please make backups before you run it.
The script reads one file and writes to another. For every line it
tries to decode the bytes both in a given charset and in utf-8. If only
one of the two fails, it chooses the other one. If both fail, it
just writes the input string unchanged (it should rather tell the user
about that, but I did not want to change the script without testing, so
that it at least _works_).
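The fallback rules above could be sketched roughly like this. This is
not the attached script, only a minimal Python 3 illustration; the names
try_decode and auto_convert are made up for this example, and oldenc
stands for the charset you expect the old file to use.

```python
def try_decode(raw, encoding):
    """Return raw decoded as text, or None if it is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def auto_convert(raw, oldenc="iso8859-1"):
    """Return UTF-8 bytes for one line, or None if a human has to choose.

    Hypothetical helper: it only mirrors the rules described in the text.
    """
    as_old = try_decode(raw, oldenc)
    as_utf8 = try_decode(raw, "utf-8")
    if as_old is None and as_utf8 is None:
        return raw                      # both failed: write the input unchanged
    if as_utf8 is None:
        return as_old.encode("utf-8")   # only oldenc worked: re-encode
    if as_old is None or as_old == as_utf8:
        return raw                      # already valid UTF-8 (or plain ASCII)
    return None                         # both decode but differ: ask the user
```

Note that iso8859-1 assigns a character to every byte value, so with
that codec the "both fail" branch can never trigger; it only matters for
codecs that leave some bytes undefined.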
If the input line decodes successfully in both charsets, the script
compares the results character by character and, where they differ,
prints the two possibilities (encoded in a given terminal encoding).
One should look good, the other ugly... Choose the nice one and it
will remember that :)
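One way to picture the "compare and remember" step is the sketch below.
It is an assumption about how such a step might look, not a copy of the
attached script: it uses difflib from the standard library to line up
the two decodings (they can differ in length, since one utf-8 character
may show up as two iso8859-1 characters), and the names ask_user,
resolve and memory are invented for the example.

```python
import difflib


def ask_user(old_text, utf8_text):
    """Interactive chooser: show both readings and return the one picked."""
    print("1:", old_text)
    print("2:", utf8_text)
    return old_text if input("Which looks right? [1/2] ") == "1" else utf8_text


def resolve(as_old, as_utf8, memory, choose=ask_user):
    """Merge two decodings of the same bytes into one string.

    memory is a dict that maps (old_span, utf8_span) to the reading chosen
    earlier, so each distinct difference is asked about only once.
    """
    matcher = difflib.SequenceMatcher(None, as_old, as_utf8, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(as_old[i1:i2])       # both readings agree here
            continue
        key = (as_old[i1:i2], as_utf8[j1:j2])
        if key not in memory:
            memory[key] = choose(*key)      # new difference: ask once
        out.append(memory[key])
    return "".join(out)
```

Passing the chooser as a parameter keeps the remembering logic separate
from the terminal, so it can be tried out without any user interaction.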
How to run:
* check http://docs.python.org/lib/standard-encodings.html for the
  codec needed to decode the gnucash1.8 bytes (probably an 8-bit
  encoding, like iso8859-1 (pre-selected), koi8_r or such)
* adjust oldenc (see above) and, if necessary, termenc (python
  should discover that just fine)
* run as ./convert2utf8 ${inputfile} ${outputfile}
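To give an idea of what the steps above amount to, here is a small
non-interactive driver in Python 3. It is only a sketch under
assumptions: OLDENC and convert_file are names chosen for this example,
the question-asking step is left out, and a line that is already valid
utf-8 is simply kept as it is.

```python
import codecs
import sys

OLDENC = "iso8859-1"  # assumed codec of the old file; adjust as needed


def convert_file(inpath, outpath, oldenc=OLDENC):
    """Copy inpath to outpath line by line, re-encoding non-UTF-8 lines."""
    codecs.lookup(oldenc)  # raises LookupError early for an unknown codec name
    with open(inpath, "rb") as src, open(outpath, "wb") as dst:
        for raw in src:
            try:
                raw.decode("utf-8")          # already valid UTF-8: keep as is
            except UnicodeDecodeError:
                try:
                    raw = raw.decode(oldenc).encode("utf-8")
                except UnicodeDecodeError:
                    pass                     # undecodable in both: pass through
            dst.write(raw)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

Because the input is never modified and the output goes to a second
file, a failed run can simply be repeated with a different codec.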
I would really like to hear your opinions about that and whether it
is worth improving it a bit.
-- andi5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: convert2utf8.py
Type: text/x-python
Size: 3902 bytes
Desc: not available
Url : http://lists.gnucash.org/pipermail/gnucash-devel/attachments/20060310/86bf0d19/convert2utf8.py