Testing locale change from gnucash 1.8 to gnucash-g2
linux at codehelp.co.uk
Sun Sep 25 05:55:26 EDT 2005
On Sunday 25 September 2005 9:59 am, Didier Vidal wrote:
> So, here is what I understand of the situation with encoding:
> * internally, gnucash-g2 data are in utf-8, whatever the locale used
> to launch gnucash.
> What led me to believe this is a trace I added in
> xaccAccountSetName, in src/engine/Account.c
It's the libxml2 documentation that should have told you about UTF-8.
"One of the core decisions was to force all documents to be converted to a
default internal encoding, and that encoding to be UTF-8, "
"If there is no encoding declaration, then the input has to be in either UTF-8
or UTF-16, if it is not then at some point when processing the input, the
converter/checker of UTF-8 form will raise an encoding error. You may end-up
with a garbled document, or no document at all !
if no encoding is given, libxml2 will look for an encoding value associated to
the document and if it exists will try to save to that encoding,
otherwise everything is written in the internal form, i.e. UTF-8"
G2 does not associate an encoding with an xmlDocPtr, and the GnuCash XML file
backend does not use an xmlDocPtr to write out the file, so libxml2 has no
chance to alter the encoding. QSF does use an xmlDocPtr, so I can set it to
use the local encoding, obtained with:
char* locale = setlocale(LC_CTYPE, NULL);
This will be in my next commit (provided it tests successfully). It is only
for human readability; libxml2 is quite happy with everything in UTF-8.
> xaccAccountSetName (Account *acc, const char *str)
> char * tmp;
> printf("xaccAccountSetName: %s\n", str);
? The actual source code doesn't use printf:
void
xaccAccountSetName (Account *acc, const char *str)
{
   char * tmp;
   if ((!acc) || (!str)) return;
   /* make strdup before freeing (just in case str==accountName !!) */
   tmp = g_strdup (str);
   g_free (acc->accountName);
   acc->accountName = tmp;
   acc->inst.dirty = TRUE;
}
You might have picked up the debug message routine.
> * the encoding conversion when reading a file seems to be handled
> correctly by libxml2, even if the file doesn't respect the XML spec and
> doesn't specify its encoding.
> * gnucash-g2 seems to write the xml files in ISO-8859-1, whatever the
> locale used to launch gnucash (at least on my machine). I don't yet
> understand why.
I've just tried that on my system and can find no charset conversion: the
file is written as UTF-8.
> * The code pointed by Neil
> (fprintf(out, "<?xml version=\"1.0\"?>\n"); in io-gncxml-v2.c)
> is not called when you save a gnucash file.
??? Umm, it is:
Clicking Save calls qof_session_save.
qof_session_save calls QofBackend->sync_all, which for the GnuCash XML file
backend is file_sync_all in gnc_backend_file.c. Maybe you missed the
indirection there: sync_all is a generic pointer to the specific function in
the backend. Each backend provides its own sync_all routine, so perhaps your
output mentioned sync_all and not the actual file_sync_all.
file_sync_all calls gnc_file_be_write_to_file,
which calls gnc_book_write_to_xml_file_v2 in io-gncxml-v2.c;
gnc_book_write_to_xml_file_v2 calls gnc_book_write_to_xml_filehandle_v2,
which calls write_v2_header, whose first line is:
fprintf(out, "<?xml version=\"1.0\"?>\n");
That is also how the namespace lines that did not exist in 1.8 now appear in
G2: write_v2_header was patched a few weeks ago to make a series of
gnc_xml2_write_namespace_decl calls, which add the
xmlns:bt-days="http://www.gnucash.org/XML/bt-days" style lines at the head of
every G2 file written since.
As those lines ARE present in the file you attached previously, it is clear
that write_v2_header IS being called to create the first dozen or so lines of
the file, including the first one.
> Anyway, it seems dangerous
> to me to write an encoding specification in this part if we don't know
> the actual encoding that will be used to write the rest of the file.
The rest of the file follows what libxml2 does: it knows nothing about the
original encoding and therefore uses its internal UTF-8 encoding.
Yes, it would be dangerous to make write_v2_header declare a different
encoding, because every call to xmlElemDump elsewhere in the gnc-backend
would have to know the intended charset and use conversion routines in
libxml2 before returning the characters to write.
However, the default is to write UTF-8, so if we specify the encoding at all,
UTF-8 is the only value we can use. Changing gnc-backend to use the locale
charset is (IMHO) pointless and wasteful, as it is already slated for
replacement. libxml2 shows no signs of changing its default, certainly not
within the timeframe that would see gnc-backend-file itself being replaced.
> agree with David that utf-8 is a good target.
Then adding UTF-8 to write_v2_header is the correct way to make explicit the
encoding that has always been implicit.
QSF uses libxml2 to write out the XML header, so libxml2 will add an encoding
declaration there too.
However, none of this is a problem within GnuCash: libxml2 enforces the
internal UTF-8, and the only omission was not stating that UTF-8 is what is
used.