[GNC-dev] Introduction, a story, and 50% improvement in XML loading speed

Chris Carson chriscarson60187 at gmail.com
Mon Dec 24 07:38:27 EST 2018


TL;DR:  hi!  I'm a programmer!  The attached patch to one line of code
gives a 50% reduction in XML load CPU use!  Skeptical?  I was.

*Introduction* (of me)

My name is Chris Carson.  I wrote code for a living from 1976-1986, and
then entered management but continued dabbling in code as a hobbyist.  C,
C++, Unix, Linux, blah blah.

*A Story*

I have financial data in Quicken dating back to 1991.  When Intuit sold off
Quicken I decided I needed a path to another financial management package.
I wrote a processor to deal with the QIF file duplicate transfer problem
(that's another story) and imported my data into Gnucash.  The resulting
XML save file is 55.8 MB uncompressed.  (Yes, I store it compressed.)

The XML file takes ~38 seconds of user CPU time to load on my build of the
Gnucash 3.3 maint stream.  (For reference, starting Gnucash with an empty
simple account file takes ~4.5 seconds of user CPU time on my machine.) I
did a relatively tedious run of callgrind, which showed that about half of
that time was being consumed by dom_chars_handler(...) and
checked_char_cast(...), both in the libgnucash/backend/xml directory.

Turns out dom_chars_handler(...) is called with an enormous multi-line
string.  It copies the whole thing and validates it, nibbles off a few
bytes, and returns, only to be called again with the remainder of the
enormous multi-line string to copy, validate, nibble again.
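
To make the cost of that pattern concrete, here is a minimal sketch of it.
It is not the Gnucash code: handle_full_copy, handle_bounded_copy,
feed_in_chunks and CopyStats are hypothetical names, std::string stands in
for the glib copy, and it assumes (as described above) that each call
receives a pointer to the NUL-terminated remainder plus the number of bytes
to consume.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical bookkeeping for the sketch: how many bytes were duplicated
// in total, and what text was actually consumed.
struct CopyStats {
    std::size_t bytes_copied = 0;
    std::string consumed;
};

// Copies everything up to the terminating NUL, then uses only `length` bytes.
void handle_full_copy(const char* text, std::size_t length, CopyStats& stats) {
    std::string copy(text);  // walks and copies the whole remainder
    stats.bytes_copied += copy.size();
    stats.consumed.append(copy, 0, length);
}

// Copies only the `length` bytes that this call consumes.
void handle_bounded_copy(const char* text, std::size_t length, CopyStats& stats) {
    std::string copy(text, length);
    stats.bytes_copied += copy.size();
    stats.consumed.append(copy);
}

// Feeds `buffer` to `handler` in pieces of at most `chunk` bytes, passing a
// pointer to the not-yet-consumed remainder on every call.
template <typename Handler>
CopyStats feed_in_chunks(const std::string& buffer, std::size_t chunk,
                         Handler handler) {
    CopyStats stats;
    for (std::size_t pos = 0; pos < buffer.size(); pos += chunk) {
        std::size_t n = std::min(chunk, buffer.size() - pos);
        handler(buffer.c_str() + pos, n, stats);
    }
    return stats;
}
```

Fed a 1000-byte buffer 10 bytes at a time, the full-copy handler duplicates
50,500 bytes in total while the bounded one duplicates 1,000, and both
consume exactly the same text.  The gap grows roughly with the square of
the buffer size, which is why a very long string hurts so much.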

*The Patch*

I tried a couple of different fixes to this.  The patch below copies off
and validates only the bytes being consumed.  It brings the user CPU time to
start up and load my XML file from ~38 seconds to ~20.5 seconds.  Given
that ~4.5 seconds of that is startup, the load itself drops from ~33.5
seconds to ~16 seconds, and I make that about a 50% improvement in load
speed.  I tried a more aggressive fix for funsies and it wasn't much
better.

I have tested this *ONLY* on the load of my largeish XML file.  But the
patched code reads well.

What would you guys advise as next steps?

Patch included below signature, and separately as a file.

Regards,
Chris Carson
=====================
From b4e1911f774bfc292e97cffd2492a0257d0aee3c Mon Sep 17 00:00:00 2001
From: "Christopher D. Carson" <chriscarson60187 at gmail.com>
Date: Sun, 23 Dec 2018 20:48:02 -0600
Subject: [PATCH] Performance fix in dom_chars_handler: use g_strndup instead
 of g_strdup

Because the origin string can be extraordinarily long, you get more
benefit from this than you would imagine
---
 libgnucash/backend/xml/sixtp-to-dom-parser.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
index e6ba43039..9aba0801a 100644
--- a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
+++ b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
@@ -95,7 +95,7 @@ static gboolean dom_chars_handler (
 {
     if (length > 0)
     {
-        gchar* newtext = g_strdup (text);
+        gchar* newtext = g_strndup (text,length);
         xmlNodeAddContentLen ((xmlNodePtr)parent_data,
                               checked_char_cast (newtext), length);
         g_free (newtext);
-- 
2.19.2
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Performance-fix-in-dom_chars_handler-use-g_strndup-i.patch
Type: application/x-patch
Size: 1066 bytes
Desc: not available
URL: <http://lists.gnucash.org/pipermail/gnucash-devel/attachments/20181224/dfadd330/attachment.bin>

