gnome2 utf-8 patch

David Hampton hampton at employees.org
Tue Mar 2 22:45:28 CST 2004


On Sat, 2004-01-17 at 13:31, Scott Oonk wrote:

Sorry for taking forever to get back around to this.  I haven't been
spending any time on gnucash recently.  I've checked in your changes
with a couple of modifications as mentioned below.

David


> > 1) The code fragment:
> > 
> >   key = g_unichar_islower (key_char_uc)
> >     ? g_unichar_toupper (key_char_uc)
> >     : key_char_uc;
> > 
> > doesn't seem necessary given the description of g_unichar_toupper.
> > 
> > g_unichar_toupper returns the result of converting c to uppercase. If c
> > is not a lowercase or titlecase character, or has no upper case
> > equivalent, c is returned unchanged.
> > 
> > This code fragment could just be:
> > 
> >   key = g_unichar_toupper (key_char_uc);
> 
> I agree.

I've changed this in the code.
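
For anyone following along without GLib handy, the standard C toupper()
has the same contract for single-byte characters: it hands back its
argument unchanged when there is no upper-case form.  The sketch below
uses toupper() purely as a stand-in for g_unichar_toupper(), and
fold_key() is a made-up name, not something in gnucash.

```c
#include <ctype.h>

/* Hypothetical helper: fold one single-byte key to upper case.
 * toupper() already returns its argument unchanged when the
 * character has no upper-case form, so a preceding islower()
 * test adds nothing. */
static int fold_key(int key_char)
{
    return toupper((unsigned char)key_char);
}
```

Calling fold_key() on 'a', 'A' and '7' shows the three cases: converted,
already upper case, and no case at all.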

> > 2) In quickfill_insert_recursive you've left a strncmp() of two utf-8
> > strings.  This should probably be a comparison of the two strings after
> > they are passed to g_utf8_normalize.  Actually, quickfill should
> > probably store only normalized strings.
> I'll update gnc_quickfill_insert to normalize before storing the string.

I added a call to create a normalized string, and some cleanup at the
end of the function.
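
To illustrate why a raw strncmp() is not enough, here is a small sketch
in plain C (no GLib) with the same visible character written two ways.
The names precomposed, decomposed and same_bytes() are invented for the
example.  Normalizing both strings to one form before storing or
comparing them, as g_utf8_normalize does, is what makes them match.

```c
#include <string.h>

/* "é" as the single code point U+00E9 (bytes C3 A9). */
static const char precomposed[] = "\xC3\xA9";

/* "é" as "e" followed by combining acute accent U+0301
 * (bytes 65 CC 81). */
static const char decomposed[] = "e\xCC\x81";

/* Returns nonzero when the two strings are identical byte for byte.
 * This is all that strcmp()/strncmp() can tell us; it knows nothing
 * about Unicode equivalence. */
static int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

same_bytes(precomposed, decomposed) reports a mismatch even though both
strings render as the same character.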

> > There's also a string comparison here that
> > should probably compare normalized strings.
> I'll go back through the code and try to find the places where we
> receive the strings from the gui components and normalize there.  Do you
> think we need to call g_utf8_validate as well, or can we trust Gnome to
> pass us valid utf-8 strings?

Better safe than sorry.  I think we can probably trust gnome, but then
again we're only talking about a couple of strings at a time, and the
validation overhead should be small by comparison.

> > 4) In gnc_quickfill_cell_modify_verify() where you have a comment about
> > something being non-safe, I think there's a bug.  You're passing the
> > number of characters to g_strndup() instead of the number of bytes.
> Yes, that should be newval_len not newval_chars

Changed.
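
The difference between the two counts is easy to see in a few lines of
plain C.  This is only a sketch: utf8_chars() and utf8_prefix_bytes()
are hypothetical helpers (GLib has its own functions for this), and both
assume the input is already valid utf-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_chars(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Number of bytes occupied by the first `chars` code points of s,
 * or by the whole string if it is shorter than that.  This is the
 * length a byte-oriented copy such as strndup() would need. */
static size_t utf8_prefix_bytes(const char *s, size_t chars)
{
    const char *p = s;
    while (*p != '\0') {
        if (((unsigned char)*p & 0xC0) != 0x80) {
            /* Start of a new code point: stop once enough are taken. */
            if (chars == 0)
                break;
            chars--;
        }
        p++;
    }
    return (size_t)(p - s);
}
```

For "Straße" the string is 6 characters but 7 bytes, so passing the
character count where a byte count is expected cuts the copy short.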

> > If
> > you know that the byte count includes only whole utf characters, you
> > could just use it here.  It's also utf-8 safe I think.  I think utf-8
> > guarantees that the null character isn't used anywhere in the string
> > except as a terminator.
> The problem I was worried about here was languages where a single
> lowercase character can map to multiple uppercase characters.  In German
> the lowercase letter ß (Eszett), maps to uppercase 'SS'.  The byte
> length could easily be different - I'm not even sure the character
> length would be the same (I don't know if the 'SS' is two characters or
> one).  

Huh. I knew those two were equivalent, but never knew one was upper case
and one was lower case.

I've seen several sources state that a null character is never used as
part of a utf-8 string, so it should be safe to replace this g_strndup
with g_strdup.  I didn't change it though.
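
The claim about null bytes can be checked by brute force.  The sketch
below is plain C with invented names (utf8_encode(),
any_embedded_zero()); it encodes every code point from 1 up to U+10FFFF
with the standard utf-8 bit layout and looks for a zero byte.  It does
not reject surrogates, which doesn't matter for this check.

```c
#include <stddef.h>

/* Encode one code point (up to U+10FFFF) as UTF-8 into out[4].
 * Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Returns 1 if any code point other than U+0000 encodes to a
 * sequence containing a zero byte, 0 otherwise. */
static int any_embedded_zero(void)
{
    unsigned long cp;
    unsigned char buf[4];

    for (cp = 1; cp <= 0x10FFFFUL; cp++) {
        size_t i, n = utf8_encode(cp, buf);
        for (i = 0; i < n; i++)
            if (buf[i] == 0)
                return 1;
    }
    return 0;
}
```

Every lead byte of a multi-byte sequence has its top two bits set and
every continuation byte has its top bit set, so the only code point that
produces a zero byte is U+0000 itself.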

David
