r23598 - gnucash/trunk/src/backend/xml - Bug 710824 - GnuCash should sanitise UTF-8 before serialising files

John Ralls jralls at ceridwen.us
Thu Dec 26 10:53:44 EST 2013


On Dec 26, 2013, at 5:41 AM, Derek Atkins <warlord at MIT.EDU> wrote:

> John Ralls <jralls at ceridwen.us> writes:
> 
>>>> Bug 710824 - GnuCash should sanitise UTF-8 before serialising files
>>>> 
>>>> to avoid writing broken unparseable XML.
>>>> This checks for both bad UTF8 and for invalid control characters
>>>> that libxml2 doesn't convert to entities.
>>> 
>>> Are we going to need a similar process for the SQL backend?
>>> 
>> 
>> I don’t think so. SQL won’t refuse to load a database because one
>> field has a character that doesn’t match some spec. In fact, it
>> doesn’t much care what you put into it; as far as the DB is concerned,
>> bytes is bytes.
> 
> Potentially true for the current set of databases, but it does mean that
> if you go from SQL -> XML -> SQL then the resulting second SQL will not
> be the same as the first.

Well, there are two "right" solutions. One is to get libxml2 to convert those characters into entities; I'll see if there's already a bug for that and file one if there isn't. The other is to filter them out at input, which I've already done for OFX import. I can't think of a use case where those characters would be useful in one of our fields. That filter should be extracted into an input module that's called by everything that brings text in from outside GnuCash, including the GUI. After all, bug 710824 itself was probably caused by a copy-and-paste error.
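
To make the second option concrete, here is a minimal sketch of an input filter. It is written in Python purely for illustration (GnuCash itself is C), and the name sanitize_text is hypothetical, not anything in the tree. It assumes the goal is to drop bytes that aren't valid UTF-8 and then drop the characters XML 1.0 doesn't allow in a document: the C0 controls other than tab, LF, and CR, plus U+FFFE and U+FFFF.

```python
import re

# Characters XML 1.0 forbids in a document: C0 controls except
# tab (0x09), LF (0x0A) and CR (0x0D), plus U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def sanitize_text(raw: bytes) -> str:
    """Hypothetical input filter: return text that is safe to serialise as XML 1.0.

    Assumption: silently dropping bad input is acceptable. A real module
    might prefer to log or warn the user instead.
    """
    # Step 1: discard byte sequences that are not valid UTF-8.
    text = raw.decode("utf-8", errors="ignore")
    # Step 2: discard control characters that XML 1.0 cannot represent.
    return _XML_ILLEGAL.sub("", text)
```

Called on a pasted string such as b"Groceries\x01", this returns "Groceries"; tabs and newlines pass through untouched. Filtering once at input, rather than at save time, would also mean the XML and SQL backends store the same text.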

Regards,
John Ralls




