[GNC-dev] More need for speed improvement around XML

John Ralls jralls at ceridwen.us
Fri Jan 4 19:38:26 EST 2019



> On Dec 30, 2018, at 4:52 PM, John Ralls <jralls at ceridwen.us> wrote:
> 
> 
> 
>> On Dec 30, 2018, at 1:25 PM, Christian Stimming <christian at cstimming.de> wrote:
>> 
>> Am Sonntag, 30. Dezember 2018, 00:44:40 CET schrieb John Ralls:
>>>> On Dec 29, 2018, at 2:16 PM, Christian Stimming <christian at cstimming.de>
>>>> wrote:> 
>>>> Am Samstag, 29. Dezember 2018, 01:45:32 CET schrieb John Ralls:
>>>>>> When saving to XML file, for each transaction the call stack with the
>>>>>> expensive code walks down like this:
>>>>>> 
>>>>>> gnc_transaction_dom_tree_create
>>>>>> add_time64
>>>>>> time64_to_dom_tree
>>>>>> gnc_print_time64
>>>>>> GncDateTime::format
>>>>>> GncDateTimeImpl::format
>>>>> 
>>>>> Christian,
>>>>> 
>>>>> I don't have time to fully test it out right now, but give
>>>>> https://github.com/jralls/gnucash/tree/maint a go.
>>>>> 
>>>>> As noted in the HEAD commit it has a side effect of recording times in
>>>>> XML
>>>>> files in UTC, no timezones. Users who want to keep their files in version
>>>>> control will like that, but it needs to be checked for backward
>>>>> compatibility with 2.6 before it can be merged into maint.
>>>> 
>>>> Thanks for working at this. Indeed the main issue is that we don't need a
>>>> user-visible format conversion (which must respect the locale) but instead
>>>> a serialization to a fixed iso8601 format. Fixing this issue also means
>>>> that the XML files will no longer depend on the local timezone, which is
>>>> an issue that has been asked for a long time.
>>>> 
>>>> Thanks for your patch - this fixes performance issues with
>>>> gnc_time64_to_iso8601_buff() and all of its callers. However, the
>>>> Save-to-XML function calls gnc_print_time64() instead (from
>>>> time64_to_dom_tree), which is a different code path. Consequently, your
>>>> patch alone does not modify the saved XML (just verified). Your call to
>>>> GncDateTime::format_iso8601 needs to be used in gnc_print_time64, too,
>>>> like so:
>>>> 
>>>> xmlNodePtr
>>>> time64_to_dom_tree (const char* tag, const time64 time)
>>>> {
>>>> 
>>>>   xmlNodePtr ret;
>>>>   g_return_val_if_fail (time != INT64_MAX, NULL);
>>>> 
>>>> -    auto date_str = gnc_print_time64 (time, "%Y-%m-%d %H:%M:%S %q");
>>>> -    if (!date_str)
>>>> -        return NULL;
>>>> +    GncDateTime gncdt(time);
>>>> +    auto sstr = gncdt.format_iso8601();
>>>> +    gchar* date_str = g_strdup(sstr.c_str());
>>>> 
>>>> 
>>>> However, with this patch, all time64 timestamps in the XML file will then
>>>> change from time+timezone to UTC time only. To my surprise, although we
>>>> knew about this issue for such a long time and changed most timestamps
>>>> appropriately, there are still timestamps that will change date due to
>>>> this
>>>> change. With a quick glance, the "reconciled-date" timestamp contains
>>>> beginning-of-day in a local timezone. Also maybe user-entered commodity
>>>> prices. Others might still show up, oh well.
>>>> 
>>>> I haven't checked for backward compatibility with 2.6, but will do during
>>>> the next days.
>>> 
>>> Rats, I got the two functions scrambled.
>>> 
>>> Geert and I discussed a month or two ago expanding the use of time-neutral
>>> to everywhere that we currently use day-begin and day-end except searches,
>>> but neither of us has gotten to it yet.
>> 
>> For completeness: I checked loading such a file with gnucash-2.6, and as far 
>> as I can tell, this runs just fine. All transactions and their dates show up 
>> as usual even though the timestamps are now read as UTC instead of localtime.
> 
> The timestamps have always been UTC, they've just been written oddly. It's explained in comments somewhere, I'll summarize: We've historically written timestamps as an ISO8607-like string, i.e. YYYY-MM-DD HH:MM:SS with an offset representing a timezone, ±HHMM. For example my TZ offset is -0800, so the date-time right now is 2018-12-30 14:26:24 -0800. The parser adds back the 8 hours and stores whatever 2018-12-30 22:26:24 is in seconds since the epoch. Losing the offset doesn't change anything except to make less work for
> the parser and reducing the diff of two files written with different offsets.
> 
> It's good news that the 2.6 parser reads them correctly.
> 
> Since SQL doesn't understand offsets we've always used plain UTC date times in the SQL backend.

I've merged Chris Carson's PRs to speed up the scrub and to get the C++ locale only once and I've fixed up my format_iso8601 so that it's used in both backends instead of GncDateTime::format and pushed that as well.

Regards,
John Ralls



More information about the gnucash-devel mailing list