[GNC-dev] Slow user interface in 3.x gnucash (for large files), and potential optimization
Christian Stimming
christian at cstimming.de
Fri Jun 15 15:55:49 EDT 2018
Am Freitag, 15. Juni 2018, 10:05:09 schrieb John Ralls:
> >> A very good catch indeed. But pre-constructing the string in
> >> qofbook.cpp only saves two string constructions per invocation as the
> >> vector still has to make its own copies. I guess that its much worse
> >> for you because the ancient gcc on Ubuntu 14.04 (gcc4.8) doesn't do
> >> small-string optimization.
> >
> > Is there any reason we cant use std::string& in the vector? Or do we
> > think that we might lose references there?
>
> In this instance the string is static so it would be safe to use const
> std::string&, but while there's an is_volatile type trait there's no
> is_static so we can't insert a static assert to catch cases where the
> actual std::string is on the stack. That could lead to some ugly bugs.
The std containers are specified in a way so that they make only sense to
contain the values by-value (ok, this might be formulated a bit too short, but
not too long ago this was a good rule of thumb). It is up to the
implementation to make optimizations (copy-on-write and such), but the
interface requires the programmer to think of the stored type as "by-value".
Nevertheless I think my change made good use of the std::string's already
existing optimizations. In particular, the saving was so surprisingly large
here that I think gcc's std::string itself implements some sort of copy-on-
write optimization, and this means its content isn't deep-copied in these
calls.
Some numbers from the valgrind/callgrind evaluation. I got the following
number of calls, starting from qof_query_run and skipping some uninterestings
calls in between:
- 6 qof_query_run
- 6 g_list_sort_with_data
- 360,000 (approx.) sort_func - due to the number of splits and txn in my file
- 360,000 xaccSplitOrder
The most expensive part in this function is this next callee:
- 360,000 qof_book_use_split_action_for_num_field
- 360,000 (approx.) qof_book_get_property
This function in turn has approx. 1,100,000 calls to std::basic_string
constructor and destructor. The change in my commit is this:
Before my commit, these calls also caused 1,100,000 calls to
std::string::_S_create and std::string::_M_destroy i.e. the by-value
constructor and destructor. After my commit, I see a mere 22,000 calls to
std::string::_S_create and std::string::_M_destroy.
> Besides, using strings is still leaving performance on the table: A string
> comparison is n times longer than an int comparison where n is the number
> of consecutive leading characters that are the same in the two strings. I
> got a huge speedup on loading a few months ago because GncGUID's
> operator==() was doing a character-wise comparison of the ascii
> representation instead of comparing the two int64_ts of the actual GUID.
Sounds good. The above numbers seem to indicate that std::string already
optimizes away a long comparison as long as the string memory itself is
identical. In my opinion this is good enough of an optimization.
> KVP paths aren't really a good use for std::string anyway: It's a lot of
> weight for something that's used as a human-readable index. Even static
> char[] is heavy for this purpose. We could get a huge boost if we use
> something like Glib's quarks [1].
> Want to have a go at that?
Err... this would mean a major refactoring of any access to the KVP system.
No, I don't feel motivated to work into that direction. Also, if I understnad
the above valgrind evaluation correctly, we can already achieve almost optimum
(i.e. pointer comparison) by ensuring a set of std::string constants to be
used. I think this is the most efficient way to proceed here, and I would just
push this commit after some more cleanup.
Regards,
Christian
More information about the gnucash-devel
mailing list