[GNC-dev] Slow user interface in 3.x gnucash (for large files), and potential optimization

Christian Stimming christian at cstimming.de
Fri Jun 15 15:55:49 EDT 2018

Am Freitag, 15. Juni 2018, 10:05:09 schrieb John Ralls:
> >> A very good catch indeed. But pre-constructing the string in
> >> qofbook.cpp only saves two string constructions per invocation as the
> >> vector still has to make its own copies. I guess that its much worse
> >> for you because the ancient gcc on Ubuntu 14.04 (gcc4.8) doesn't do
> >> small-string optimization.
> > 
> > Is there any reason we cant use std::string& in the vector?  Or do we
> > think that we might lose references there?
> In this instance the string is static so it would be safe to use const
> std::string&, but while there's an is_volatile type trait there's no
> is_static so we can't insert a static assert to catch cases where the
> actual std::string is on the stack. That could lead to some ugly bugs.

The std containers are specified in a way so that they make only sense to 
contain the values by-value (ok, this might be formulated a bit too short, but 
not too long ago this was a good rule of thumb). It is up to the 
implementation to make optimizations (copy-on-write and such), but the 
interface requires the programmer to think of the stored type as "by-value".

Nevertheless I think my change made good use of the std::string's already 
existing optimizations. In particular, the saving was so surprisingly large 
here that I think gcc's std::string itself implements some sort of copy-on-
write optimization, and this means its content isn't deep-copied in these 

Some numbers from the valgrind/callgrind evaluation. I got the following 
number of calls, starting from qof_query_run and skipping some uninterestings 
calls in between:

- 6 qof_query_run
- 6 g_list_sort_with_data
- 360,000 (approx.) sort_func - due to the number of splits and txn in my file 
- 360,000 xaccSplitOrder
    The most expensive part in this function is this next callee:
- 360,000 qof_book_use_split_action_for_num_field
- 360,000 (approx.) qof_book_get_property

This function in turn has approx. 1,100,000 calls to std::basic_string 
constructor and destructor. The change in my commit is this:
Before my commit, these calls also caused 1,100,000 calls to 
std::string::_S_create and std::string::_M_destroy i.e. the by-value 
constructor and destructor. After my commit, I see a mere 22,000 calls to 
std::string::_S_create and std::string::_M_destroy. 

> Besides, using strings is still leaving performance on the table: A string
> comparison is n times longer than an int comparison where n is the number
> of consecutive leading characters that are the same in the two strings. I
> got a huge speedup on loading a few months ago because GncGUID's
> operator==() was doing a character-wise comparison of the ascii
> representation instead of comparing the two int64_ts of the actual GUID.

Sounds good. The above numbers seem to indicate that std::string already 
optimizes away a long comparison as long as the string memory itself is 
identical. In my opinion this is good enough of an optimization.

> KVP paths aren't really a good use for std::string anyway: It's a lot of
> weight for something that's used as a human-readable index. Even static
> char[] is heavy for this purpose. We could get a huge boost if we use
> something like Glib's quarks [1].
> Want to have a go at that?

Err... this would mean a major refactoring of any access to the KVP system. 
No, I don't feel motivated to work into that direction. Also, if I understnad 
the above valgrind evaluation correctly, we can already achieve almost optimum 
(i.e. pointer comparison) by ensuring a set of std::string constants to be 
used. I think this is the most efficient way to proceed here, and I would just 
push this commit after some more cleanup.


More information about the gnucash-devel mailing list