SQL backend performance

Donald Allen donaldcallen at gmail.com
Wed Feb 24 14:37:11 EST 2010


Some good news:

Doing this the easy way first, I did a little manual pc sampling. I
ran gnucash (today's trunk) under gdb, let it get to the point where
it begins to load my data from postgresql, and periodically ctrl-c'd
in gdb and copied the interrupted location and a backtrace to an emacs
buffer. In the interest of saving space in this message, rather than
just copying the result into the message (I assume that attachments
are not permitted in emails to the gnucash mailing lists), I'll tell
you that I interrupted execution 7 times and the first 6 were
identical to this:

0x00007ffff0471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
(gdb) bt
#0  0x00007ffff0471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
#1  0x00007fffe86d5625 in query_transactions (be=0xa58150, stmt=<value
optimized out>) at gnc-transaction-sql.c:385
#2  0x00007fffe86d5919 in gnc_sql_transaction_load_all_tx
(be=0xa58150) at gnc-transaction-sql.c:768
#3  0x00007ffff0464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
#4  0x00007ffff7bc69e3 in qof_object_foreach_backend
(backend_name=<value optimized out>, cb=0x7fffe86cd220
<initial_load_cb>, user_data=<value optimized out>) at qofobject.c:373
#5  0x00007fffe86cd1f1 in gnc_sql_load (be=0xa58150, book=0x9cfc80,
loadType=<value optimized out>) at gnc-backend-sql.c:193
#6  0x00007fffe88e172b in gnc_dbi_load (qbe=0xa58150, book=0x9cfc80,
loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
#7  0x00007ffff7bccf5c in qof_session_load (session=0xa34ec0,
percentage_func=0x7ffff65a2d10 <gnc_window_show_progress>) at
qofsession.c:1320
#8  0x00007ffff656dd57 in gnc_post_file_open (filename=<value
optimized out>) at gnc-file.c:758
#9  0x000000000040368c in inner_main (closure=<value optimized out>,
argc=<value optimized out>, argv=<value optimized out>) at
gnucash-bin.c:521
#10 0x00007ffff01aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
#11 0x00007ffff018056a in c_body () from /usr/lib/libguile.so.17
#12 0x00007ffff01e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
#13 0x00007ffff0180a17 in scm_i_with_continuation_barrier () from
/usr/lib/libguile.so.17
#14 0x00007ffff0180ab0 in scm_c_with_continuation_barrier () from
/usr/lib/libguile.so.17
#15 0x00007ffff01e5c14 in scm_i_with_guile_and_parent () from
/usr/lib/libguile.so.17
#16 0x00007ffff01aa185 in scm_boot_guile () from /usr/lib/libguile.so.17
#17 0x00000000004032f6 in main (argc=1, argv=0x7fffffffe798) at
gnucash-bin.c:672

I got myself a copy of the glib source code. In the comments preceding
g_list_append, this appears:

 * Note that g_list_append() has to traverse the entire list
 * to find the end, which is inefficient when adding multiple
 * elements. A common idiom to avoid the inefficiency is to prepend
 * the elements and reverse the list when all elements have been added.

This is good advice that is familiar to any of us who have done a lot
of Lisp/Scheme work. So I changed query_transactions to call
g_list_prepend inside the transaction loop and reverse the list after
the loop complete. Now my data loads in about 30 seconds, whereas it
was 45 or so previously (my checking account register looks ok, so I
don't think I broke anything). So I repeated the process, periodically
interrupting data loading in gdb. Now I started to see things like
this:

(gdb) bt
#0  0x00007ffff0471187 in g_list_append () from /usr/lib/libglib-2.0.so.0
#1  0x00007fffe86d55c0 in load_splits_for_tx_list (be=0xa5a300,
stmt=<value optimized out>) at gnc-transaction-sql.c:238
#2  query_transactions (be=0xa5a300, stmt=<value optimized out>) at
gnc-transaction-sql.c:395
#3  0x00007fffe86d5969 in gnc_sql_transaction_load_all_tx
(be=0xa5a300) at gnc-transaction-sql.c:769
#4  0x00007ffff0464643 in g_hash_table_foreach () from /usr/lib/libglib-2.0.so.0
#5  0x00007ffff7bc69e3 in qof_object_foreach_backend
(backend_name=<value optimized out>, cb=0x7fffe86cd270
<initial_load_cb>, user_data=<value optimized out>) at qofobject.c:373
#6  0x00007fffe86cd241 in gnc_sql_load (be=0xa5a300, book=0x9d2480,
loadType=<value optimized out>) at gnc-backend-sql.c:193
#7  0x00007fffe88e172b in gnc_dbi_load (qbe=0xa5a300, book=0x9d2480,
loadType=LOAD_TYPE_INITIAL_LOAD) at gnc-backend-dbi.c:800
#8  0x00007ffff7bccf5c in qof_session_load (session=0xa36b80,
percentage_func=0x7ffff65a2d10 <gnc_window_show_progress>) at
qofsession.c:1320
#9  0x00007ffff656dd57 in gnc_post_file_open (filename=<value
optimized out>) at gnc-file.c:758
#10 0x000000000040368c in inner_main (closure=<value optimized out>,
argc=<value optimized out>, argv=<value optimized out>) at
gnucash-bin.c:521
#11 0x00007ffff01aa1cf in invoke_main_func () from /usr/lib/libguile.so.17
#12 0x00007ffff018056a in c_body () from /usr/lib/libguile.so.17
#13 0x00007ffff01e6cbd in scm_c_catch () from /usr/lib/libguile.so.17
#14 0x00007ffff0180a17 in scm_i_with_continuation_barrier () from
/usr/lib/libguile.so.17
#15 0x00007ffff0180ab0 in scm_c_with_continuation_barrier () from
/usr/lib/libguile.so.17
#16 0x00007ffff01e5c14 in scm_i_with_guile_and_parent () from
/usr/lib/libguile.so.17
#17 0x00007ffff01aa185 in scm_boot_guile () from /usr/lib/libguile.so.17
#18 0x00000000004032f6 in main (argc=1, argv=0x7fffffffe798) at
gnucash-bin.c:672
(gdb)

So I applied the same treatment to load_splits_for_tx_list,
substituting g_list_prepend for g_list_append inside the
split-fetching loop and reversing the list on completion of the loop.
I rebuilt and tried again and now my data loads in about 9 seconds,
approximately the same as the xml file and about a factor of 5
improvement! I haven't tested it yet, but I believe we'll see a nice
improvement with sqlite3 as well, since this code is not specific to
either database.

Here's the diff:

Index: gnc-transaction-sql.c
===================================================================
--- gnc-transaction-sql.c	(revision 18719)
+++ gnc-transaction-sql.c	(working copy)
@@ -235,11 +235,12 @@
 			Split* s;
             s = load_single_split( be, row );
 			if( s != NULL ) {
-				split_list = g_list_append( split_list, s );
+				split_list = g_list_prepend( split_list, s );
 			}
 			row = gnc_sql_result_get_next_row( result );
         }
-
+	split_list = g_list_reverse(split_list);
+	
 		if( split_list != NULL ) {
 			gnc_sql_slots_load_for_list( be, split_list );
 			g_list_free( split_list );
@@ -382,10 +383,12 @@
         while( row != NULL ) {
             tx = load_single_tx( be, row );
 			if( tx != NULL ) {
-				tx_list = g_list_append( tx_list, tx );
+				tx_list = g_list_prepend( tx_list, tx );
 			}
 			row = gnc_sql_result_get_next_row( result );
         }
+	tx_list = g_list_reverse(tx_list);
+	
 		gnc_sql_result_dispose( result );

 		// Load all splits and slots for the transactions

BTW, I found the indentation/formatting of this (and other) files to
be very odd, and it rendered the code pretty unreadable for me. What
you see above is a replication of the changes I made to the original
file. In order to do this work, though, I had to format the two
procedures involved (in emacs with meta-ctrl-q) so I could read them
(I could have cleaned them up some more to eliminate unnecessary
pointy-brackets, but didn't bother). It was that code that I actually
tested, so there could be a typo in the above.

I attribute great importance to code readability and have some strong
opinions about how C code should look. I saw something on
gnucash-devel earlier that gave me the impression there is something
afoot to clean up the code. Is that so? If yes, I'd like to put my
$.02 into that pot.

/Don


More information about the gnucash-devel mailing list