Testing locale change from gnucash 1.8 to gnucash-g2

Neil Williams linux at codehelp.co.uk
Sun Sep 25 05:55:26 EDT 2005


On Sunday 25 September 2005 9:59 am, Didier Vidal wrote:
> So, here is what I understand of the situation with encoding:
>
>    * internally, gnucash-g2 data are in utf-8, whatever the locale used
> to launch gnucash.

True.

> What led me to believe this is a trace I added in 
> xaccAccountSetName, in src/engine/Account.c

It's the libxml2 documentation that should have told you about UTF-8.

"One of the core decisions was to force all documents to be converted to a 
default internal encoding, and that encoding to be UTF-8, "
http://xmlsoft.org/encoding.html

"If there is no encoding declaration, then the input has to be in either UTF-8 
or UTF-16, if it is not then at some point when processing the input, the 
converter/checker of UTF-8 form will raise an encoding error. You may end-up 
with a garbled document, or no document at all ! 

When saving:
if no encoding is given, libxml2 will look for an encoding value associated to 
the document and if it exists will try to save to that encoding,

otherwise everything is written in the internal form, i.e. UTF-8"

G2 does not associate an encoding with the xmlDocPtr, and the GnuCash XML file 
backend does not use xmlDocPtr to write out a file, so libxml2 has no chance 
to alter the encoding. QSF does use xmlDocPtr and I can therefore set it to 
use the local encoding using:
char* locale = setlocale(LC_CTYPE, NULL);

This will be in my next commit (provided it tests successfully). It is only 
for human readability; libxml2 is quite happy with everything in UTF-8.
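
A minimal sketch of how the charset name could be obtained, assuming a POSIX 
system. Note that setlocale(LC_CTYPE, NULL) returns the whole locale name 
(for example "fr_FR.ISO-8859-1"), whereas nl_langinfo(CODESET) returns just 
the charset part. The helper name locale_charset_name is hypothetical and is 
not part of QSF or GnuCash.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: name of the charset used by the current LC_CTYPE
 * locale. Falls back to "UTF-8" if the C library reports nothing. */
static const char *locale_charset_name(void)
{
    const char *codeset = nl_langinfo(CODESET);

    if (codeset == NULL || *codeset == '\0')
        return "UTF-8";
    return codeset;
}
```

The program would need to call setlocale(LC_CTYPE, "") once at startup for 
this to reflect the user's environment; until then the C library reports the 
charset of the default "C" locale. The resulting string is the kind of value 
that could be passed as the encoding argument of a libxml2 save routine such 
as xmlSaveFormatFileEnc.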

> --------
> void
> xaccAccountSetName (Account *acc, const char *str)
> {
>    char * tmp;
>
>
>    printf("xaccAccountSetName: %s\n", str);

? The actual source code doesn't use printf:
void
xaccAccountSetName (Account *acc, const char *str)
{
   char * tmp;

   if ((!acc) || (!str)) return;

   xaccAccountBeginEdit(acc);
   {
     /* make strdup before freeing (just in case str==accountName !!) */
     tmp = g_strdup (str);
     g_free (acc->accountName);
     acc->accountName = tmp;

     mark_account (acc);
   }
   acc->inst.dirty = TRUE;
   xaccAccountCommitEdit(acc);
}

You might have picked up the debug message routine.

>    * the encoding conversion when reading a file seems to be handled
> correctly by libxml2, even if the file doesn't respect the XML spec and
> doesn't specify its encoding.

Correct.

>    * gnucash-g2 seems to write the xml files in ISO-8859-1, whatever the
> locale used to launch gnucash (at least on my machine). I don't yet
> understand why.

I've just tried that on my system and I can find no char set conversion - it's 
output as UTF-8.
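
One way to check which of the two encodings a saved file is actually in: an 
accented character such as e-acute is the single byte 0xE9 in ISO-8859-1 but 
the two bytes 0xC3 0xA9 in UTF-8, and a lone 0xE9 is not well-formed UTF-8. 
The following is only a sketch of such a check; is_valid_utf8 is a 
hypothetical helper, not something in GnuCash or libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects truncated sequences, stray continuation bytes,
 * overlong forms, surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;
        size_t extra, k;

        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;                                /* invalid lead byte */

        if (len - i <= extra) return 0;               /* truncated sequence */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return 0;        /* not a continuation */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;                                 /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;                                 /* not a valid scalar */

        i += extra + 1;
    }
    return 1;
}
```

Running something like this over the account names in a saved file would 
show whether any accented character was written as a bare ISO-8859-1 byte. A 
file that is pure ASCII passes either way, so the check only says something 
when non-ASCII characters are present.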

>    * The code pointed out by Neil
> (fprintf(out, "<?xml version=\"1.0\"?>\n"); in io-gncxml-v2.c)
> is not called when you save a gnucash file.

??? Umm, it is:

Click on save calls qof_session_save.

qof_session_save calls QofBackend->sync_all, which in the case of the GnuCash 
XML File backend is file_sync_all in gnc-backend-file.c. Maybe you missed 
the indirection there: it's a generic pointer to the specific function in the 
backend. Each backend provides its own sync_all routine, so perhaps your 
output mentioned sync_all and not the actual file_sync_all.
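
To illustrate the indirection, here is a stripped-down sketch of the pattern. 
The Backend struct, make_file_backend and session_save are simplified, 
hypothetical stand-ins and not the real QofBackend or qof_session_save 
definitions; only the idea of calling through a sync_all pointer is taken 
from the code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a backend with a sync_all hook. */
typedef struct Backend Backend;
struct Backend {
    const char *name;
    void (*sync_all)(Backend *be);  /* generic pointer, set per backend */
    int sync_count;                 /* illustration only: counts calls */
};

/* The file backend's own implementation of the hook. */
static void file_sync_all(Backend *be)
{
    be->sync_count++;
    printf("%s: writing the whole book to file\n", be->name);
}

/* Each backend fills in the pointer with its specific routine. */
static Backend make_file_backend(void)
{
    Backend be = { "file", file_sync_all, 0 };
    return be;
}

/* Generic code only ever names sync_all, never file_sync_all. */
static void session_save(Backend *be)
{
    if (be != NULL && be->sync_all != NULL)
        be->sync_all(be);
}
```

A trace or grep that follows only the generic caller sees the name sync_all, 
which is why the hop into the backend-specific function is easy to miss.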

file_sync_all calls gnc_file_be_write_to_file,
which calls gnc_book_write_to_xml_file_v2 in io-gncxml-v2.c.
gnc_book_write_to_xml_file_v2 calls gnc_book_write_to_xml_filehandle_v2,
which calls write_v2_header, whose first line is
 fprintf(out, "<?xml version=\"1.0\"?>\n");

That's how the namespace lines that weren't in 1.8 now appear in G2: 
write_v2_header was patched a few weeks ago to make a series of 
gnc_xml2_write_namespace_decl calls, which add the 
xmlns:bt-days="http://www.gnucash.org/XML/bt-days" type lines at the head of 
all subsequent G2 files.

As those lines ARE present in the file you attached previously, it is clear 
that write_v2_header IS being called to create the first dozen or so lines of 
the file, including the first one.

> Anyway, it seems dangerous 
> to me to write an encoding specification in this part if we don't know
> the actual encoding that will be used to write the rest of the file.

The rest of the file follows what libxml2 does - it knows nothing about the 
original encoding and therefore uses the internal UTF-8 encoding of the 
libxml2 data.

It would be dangerous to set write_v2_header to use a different encoding 
string, yes, because each call to xmlElemDump elsewhere in the gnc-backend 
would have to know the intended charset and use conversion routines in 
libxml2 before returning the characters to write.

However, the default is to write UTF-8. If we have to specify the encoding at 
all, it can only be UTF-8. Changing gnc-backend to use the locale charset is 
(IMHO) pointless and wasteful as it is already slated for replacement. 
libxml2 shows no signs of changing its default, certainly not within the 
timeframe that would see gnc-backend-file itself being replaced.

> I 
> agree with David that utf-8 is a good target.

Then adding UTF-8 to write_v2_header is the correct way to state explicitly 
the encoding that has always been implicit.
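
As a sketch only, the change amounts to extending the first line that gets 
written. The function name write_header_with_encoding is hypothetical; the 
real write_v2_header also writes the namespace declarations and is not 
reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: emit an XML declaration that states the encoding
 * the rest of the file is written in. Returns 0 on success, -1 on error. */
static int write_header_with_encoding(FILE *out)
{
    if (fprintf(out, "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n") < 0)
        return -1;
    return 0;
}
```

Because everything after the declaration is already UTF-8, stating it changes 
nothing about the bytes that follow; it only tells other tools what they are 
reading.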

QSF uses libxml2 to write out the XML header, so libxml2 will add the 
encoding declaration there too.

However, none of this is a problem within GnuCash - libxml2 enforces the 
internal UTF-8 and the only omission was not to state that UTF-8 is what is 
written out.

-- 

Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
