[GNC-dev] backend character encoding (related to Python 2 to 3)

John Ralls jralls at ceridwen.us
Fri Sep 21 10:11:55 EDT 2018

> On Sep 21, 2018, at 1:43 AM, Geert Janssens <geert.gnucash at kobaltwit.be> wrote:
> On Friday, 21 September 2018 10:02:02 CEST, c.holtermann at gmx.de wrote:
>> Dear developers,
>> thinking about moving from python2 to 3 I wonder how character
>> encoding in the backend is done. Can you point me to some docs
>> about that ? Which encoding in sqlite, mysql, xml ? Where does
>> encoding take place, where is it being controlled ? I don't need
>> an extensive answer, just some pointers where to start looking.
> From memory I believe we force our gnucash data to be utf-8 at all times. As 
> this is also the encoding used by glib2 internally there is no need for extra 
> encoding functionality in the backends.
> Although it is not supported, users could edit the xml data outside of the
> application and insert invalid utf-8 that way. To protect against that we
> have one function that validates strings in the xml data file while loading
> and saving. It's called checked_char_cast and is located in:
> https://github.com/Gnucash/gnucash/blob/maint/libgnucash/backend/xml/gnc-xml-helper.cpp
> I don't think we have something similar in the db backends. I think we rely on 
> the dbms to handle this for us.
> I'm hedging a lot here because I don't know this part of gnucash
> very thoroughly. John may correct me if I misunderstood.

Right. checked_char_cast isn’t needed in the SQL backends because the database engines enforce utf-8 for us.

I *think* for Python the only issue is making sure that every string crossing between Python and GnuCash, in either direction, is passed as utf-8.
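
As a minimal sketch of what that could look like on the Python 3 side, assuming the boundary code deals in raw bytes: the helper names below (to_gnucash_bytes, from_gnucash_bytes, is_valid_utf8) are hypothetical and are not part of the GnuCash Python bindings. They only illustrate encoding and decoding strictly as utf-8 at the boundary so that invalid data fails loudly instead of being passed along.

```python
def to_gnucash_bytes(text):
    """Return utf-8 bytes for a value about to be handed to C code.

    Hypothetical helper, not a GnuCash API.
    """
    if isinstance(text, bytes):
        # Already bytes: check that they really are utf-8 instead of trusting them.
        text.decode("utf-8")  # raises UnicodeDecodeError on invalid input
        return text
    return text.encode("utf-8")


def from_gnucash_bytes(raw):
    """Return a Python 3 str for a value received from C code.

    Hypothetical helper, not a GnuCash API.
    """
    if isinstance(raw, str):
        return raw
    # errors="strict" is the default; it is spelled out to make the intent clear.
    return raw.decode("utf-8", errors="strict")


def is_valid_utf8(raw):
    """Report whether a bytes object decodes cleanly as utf-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, to_gnucash_bytes("Bücher") yields the utf-8 bytes for that string, while the same word encoded as Latin-1 (b"B\xfccher") is rejected by is_valid_utf8 rather than being silently passed through.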

Guile is another matter, but it’s not germane to this topic.

John Ralls

