What's the source of the slowdown in file loading?

Rob Browning rlb@cs.utexas.edu
27 Oct 2000 16:41:49 -0500

Dave Peticolas <dave@krondo.com> writes:

> We think it's the excessive amount of string copying performed by
> the gnome-xml libraries. Each tag is copied for every instance of
> use.

That's our hypothesis at least, from looking briefly at gprof's output
after I managed to move the engine itself out of the "most expensive"
category with the Account.c Begin/Commit mods.

However, AFAIK, no one's looked at the libxml source yet to be sure.
My guess is that libxml creates a new string for *every* tag in the
tree during output, and since we can't do incremental output, that's
going to be a *lot* of copying and allocation.  A simple fix,
presuming this guess turns out to be correct, would be to add a hash
table to the xmlDoc struct that stores only one copy of each tag name
and puts that copy's pointer into each node that asks for it.  Right
now, given 5000 splits, if the guess is right, we'd have
"<date-reconciled>" and "</date-reconciled>" allocated and copied at
least 5000 times each.
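
To make the hash idea concrete, here is a minimal sketch of a tag-name
intern table in plain C.  None of this is libxml API: InternTable,
intern() and intern_free() are hypothetical names, the bucket count is
arbitrary, and the djb2-style hash is only a stand-in.  It assumes tag
names are NUL-terminated strings and that the table lives at least as
long as the nodes pointing into it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256      /* arbitrary size for this sketch */

typedef struct InternEntry {
    char *str;                  /* the single stored copy of a tag name */
    struct InternEntry *next;   /* next entry in the same bucket */
} InternEntry;

typedef struct {
    InternEntry *buckets[INTERN_BUCKETS];
    size_t unique;              /* number of distinct strings stored */
} InternTable;

/* djb2-style string hash; a stand-in, not libxml's or glib's. */
static unsigned long intern_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the table's shared copy of name, allocating it only the
 * first time the name is seen.  Returns NULL if allocation fails. */
const char *intern(InternTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    unsigned long b = intern_hash(name) % INTERN_BUCKETS;

    for (InternEntry *e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->str, name) == 0)
            return e->str;      /* already stored: no new allocation */

    InternEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    size_t len = strlen(name) + 1;
    e->str = malloc(len);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, name, len);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->unique++;
    return e->str;
}

/* Release every stored string and reset the table to empty. */
void intern_free(InternTable *t)
{
    for (size_t i = 0; i < INTERN_BUCKETS; i++) {
        InternEntry *e = t->buckets[i];
        while (e != NULL) {
            InternEntry *next = e->next;
            free(e->str);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
    t->unique = 0;
}
```

With a zero-initialized InternTable, 5000 calls to intern() for
"date-reconciled" would allocate the string once and return the same
pointer every time, so each node stores a pointer rather than its own
copy.  The catch is ownership: nodes must not free an interned name.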

On the read side, I'm not as sure what's going on, but I suspect there
might be similar string tricks that could help.

Also, ISTR that I looked into the glib default hash functions, and
from at least a cursory inspection, I was worried that they might be a
little weak, but I didn't get a chance to look closely enough to be
sure.  If anyone's motivated, it would be nice to check, and if they
are weak, we might want to grab better ones from somewhere else (maybe
rscheme, STL, guile, wherever...).
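
One hedged way to check a string hash is to feed it a set of keys and
look at the longest bucket chain it produces.  The sketch below is
plain C and does not call glib.  sum_hash is a deliberately weak
example and fnv1a_hash is 32-bit FNV-1a, both stand-ins rather than
the glib functions in question.  StrHashFn and longest_chain() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*StrHashFn)(const char *s);

/* Deliberately weak: adds up the bytes, so anagrams always collide. */
uint32_t sum_hash(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* 32-bit FNV-1a, shown only as a candidate for comparison. */
uint32_t fnv1a_hash(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Hash every key into nbuckets buckets and return the length of the
 * longest chain.  Returns 0 if nkeys is 0 or allocation fails. */
size_t longest_chain(StrHashFn hash, const char *const *keys,
                     size_t nkeys, size_t nbuckets)
{
    assert(hash != NULL && nbuckets > 0);
    size_t *counts = calloc(nbuckets, sizeof *counts);
    if (counts == NULL)
        return 0;

    size_t worst = 0;
    for (size_t i = 0; i < nkeys; i++) {
        size_t b = hash(keys[i]) % nbuckets;
        if (++counts[b] > worst)
            worst = counts[b];
    }
    free(counts);
    return worst;
}
```

Running longest_chain() over the real tag names and account keys, once
per candidate function, would show whether one of them piles keys into
a few buckets.  With sum_hash, for instance, all six orderings of
"abc" land in the same bucket.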

Rob Browning <rlb@cs.utexas.edu> PGP=E80E0D04F521A094 532B97F5D64E3930