gnome2 utf-8 patch
hampton at employees.org
Tue Mar 2 22:45:28 CST 2004
On Sat, 2004-01-17 at 13:31, Scott Oonk wrote:
Sorry for taking forever to get back around to this. I haven't been
spending any time on gnucash recently. I've checked in your changes
with a couple of modifications as mentioned below.
> > 1) The code fragment:
> > key = g_unichar_islower (key_char_uc)
> > ? g_unichar_toupper (key_char_uc)
> > : key_char_uc;
> > doesn't seem necessary given the description of g_unichar_toupper.
> > g_unichar_toupper returns the result of converting c to uppercase. If c
> > is not a lowercase or titlecase character, or has no upper case
> > equivalent, c is returned unchanged.
> > This code fragment could just be:
> > key = g_unichar_toupper (key_char_uc);
> I agree.
I've changed this in the code.
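
For illustration only, here is the same simplification sketched with the C standard library instead of GLib. It assumes nothing about the gnucash code: toupper() from <ctype.h> stands in for g_unichar_toupper because it has the same "return the argument unchanged if there is nothing to convert" contract, and the function names are made up for the example.

```c
#include <ctype.h>

/* Guarded form, mirroring the shape of the original fragment. */
int upper_guarded(int c)
{
    return islower(c) ? toupper(c) : c;
}

/* Direct form: toupper() already returns c unchanged when there is
 * nothing to convert, so the islower() test adds nothing. */
int upper_direct(int c)
{
    return toupper(c);
}

/* Counts the byte values (0..255) on which the two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int c = 0; c <= 255; c++) {
        if (upper_guarded(c) != upper_direct(c))
            mismatches++;
    }
    return mismatches;
}
```

One difference on the GLib side is worth keeping in mind: the quoted description says g_unichar_toupper also converts titlecase characters. If g_unichar_islower is false for those, as I would expect, the direct call uppercases them while the guarded form left them alone.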
> > 2) In quickfill_insert_recursive you've left a strncmp() of two utf-8
> > strings. This should probably be a comparison of the two strings after
> > they are passed to g_utf8_normalize. Actually, quickfill should
> > probably store only normalized strings.
> I'll update gnc_quickfill_insert to normalize before storing the string.
I added a call to create a normalized string, and some cleanup at the
end of the function.
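
As a small sketch of why a byte-wise comparison is not enough here (plain C, not the quickfill code; the constant and function names are hypothetical): the same visible character can be encoded as one precomposed code point or as a base letter plus a combining mark, and strcmp()/strncmp() treat those as different strings.

```c
#include <stdbool.h>
#include <string.h>

/* "e with acute accent" in two canonically equivalent encodings. */
const char E_ACUTE_COMPOSED[]   = "\xC3\xA9";   /* U+00E9           */
const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";  /* U+0065 + U+0301  */

/* Byte-wise equality, which is all strcmp() can tell us. */
bool bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

The two strings render identically but differ in both length and content, so they only compare equal after both have been brought to the same normalization form, which is what g_utf8_normalize is for. Which normalization mode suits quickfill best is not something this sketch tries to answer.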
> > There's also a string comparison here that
> > should probably compare normalized strings.
> I'll go back through the code and try to find the places where we
> receive the strings from the gui components and normalize there. Do you
> think we need to call g_utf8_validate as well, or can we trust Gnome to
> pass us valid utf-8 strings?
Better safe than sorry. I think we can probably trust Gnome, but then
again we're only talking about a couple of strings at a time, so the
validation overhead should be small by comparison.
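
To give a feel for how little work validation is, here is a rough single-pass checker in plain C. It is a sketch of the general idea only, not GLib's implementation of g_utf8_validate, and utf8_is_valid is a hypothetical name. Unlike GLib it does not treat an embedded NUL byte specially.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the first len bytes of s form well-formed UTF-8. */
bool utf8_is_valid(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t n;               /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }  /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (len - i <= n)
            return false;       /* sequence is cut off by the end of input */

        for (size_t k = 1; k <= n; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return false;   /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += n + 1;
    }
    return true;
}
```

Each byte is looked at once, so for the handful of short strings coming from the gui the cost is negligible.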
> > 4) In gnc_quickfill_cell_modify_verify() where you have a comment about
> > something being non-safe, I think there's a bug. You're passing the
> > number of characters to g_strndup() instead of the number of bytes.
> Yes, that should be newval_len not newval_chars
> > If
> > you know that the byte count includes only whole utf characters, you
> > could just use it here. It's also utf-8 safe I think. I think utf-8
> > guarantees that the null character isn't used anywhere in the string
> > except as a terminator.
> The problem I was worried about here was languages where a single
> lowercase character can map to multiple uppercase characters. In German
> the lowercase letter ß (Eszett), maps to uppercase 'SS'. The byte
> length could easily be different - I'm not even sure the character
> length would be the same (I don't know if the 'SS' is two characters or
Huh. I knew those two were equivalent, but never knew one was upper case
and one was lower case.
I've seen several sources state that a null character is never used as
part of a utf-8 string, so it should be safe to replace this g_strndup
with g_strdup. I didn't change it though.
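
For reference, a minimal sketch of the character-count versus byte-count issue from item 4, in plain C rather than the actual gnc_quickfill_cell_modify_verify() code. The function names are hypothetical, and the input is assumed to be valid, NUL-terminated utf-8.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Number of characters (code points): every byte that is not a
 * 10xxxxxx continuation byte starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Number of bytes occupied by the first n_chars characters of s. */
size_t utf8_prefix_bytes(const char *s, size_t n_chars)
{
    size_t i = 0;
    while (s[i] != '\0' && n_chars > 0) {
        i++;                                    /* lead byte */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                                /* continuation bytes */
        n_chars--;
    }
    return i;
}

/* Copies the first n_chars characters; the caller frees the result. */
char *utf8_dup_prefix(const char *s, size_t n_chars)
{
    size_t n_bytes = utf8_prefix_bytes(s, n_chars);
    char *out = malloc(n_bytes + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, n_bytes);
    out[n_bytes] = '\0';
    return out;
}
```

With "Straße" (6 characters, 7 bytes), passing the character count 5 as a byte count would cut the ß in half, while utf8_prefix_bytes() reports the 6 bytes that the first 5 characters actually occupy. In GLib the corresponding helpers are g_utf8_strlen and g_utf8_offset_to_pointer. The scan relies on the point above: a zero byte in utf-8 only ever encodes U+0000, so it can serve as the terminator.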