KVP and data that contains a forward slash

Fri Feb 12 13:35:29 EST 2016

> On Feb 12, 2016, at 9:26 AM, Derek Atkins <warlord at MIT.EDU> wrote:
> 
> John Ralls <jralls at ceridwen.us> writes:
> 
>> Again, the path argument would be more convincing if we actually used
>> XPath. I don't think it really matters what grand plan the designer
>> (you?) had in mind, what matters is how we're using it now.
> 
> No, it wasn't me; KVP was in there (long?) before I got involved.
> 
>> No, SQL just makes the penalty more severe. Cache misses are
>> expensive; so much so that the current C++ doctrine (based on many
>> simulations) is that it's much faster to sequentially search a
>> std::vector than to use any container that relies on independently
>> allocated nodes, even when the overhead of complete reallocation for
>> an insert operation is accounted for. Every indirection in KVP,
>> i.e., every step involving a KvpFrame means dereferencing the
>> KvpFrame*, dereferencing its GHashTable*, dereferencing the hash
>> table's hashes and getting the next KvpItem*, dereferencing it,
>> getting its content type and ptr, and if it's a KvpFrame, repeating
>> the process. Because each of them is a different type, GSlice will have
>> a different magazine for each of them. If not much else is going on in
>> the system, all of the magazines might be in cache after the first
>> round and subsequent descents will be faster. Or not.
>> 
>> That overhead could be reduced immensely by not having KvpFrames. With
>> the whole path as a key, there'd be a single GHashTable lookup and only
>> three dereferences (the QofInstance's Kvp GHashTable, the KvpItem* it
>> returns, and the KvpItem's contents).
> 
> Is it a structural problem or an implementation/storage problem?

Why, implementation, of course, by the first theorem of CS. ;-)
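
To make the two lookup shapes described in the quoted paragraphs concrete, here is a minimal C++ sketch. `Frame`, `Value`, `Leaf`, `FlatKvp`, `nested_get`, `flat_get` and the `example/leaf` path are hypothetical stand-ins, not GnuCash's `KvpFrame`/`KvpValue` API, and `std::unordered_map` substitutes for `GHashTable`. The nested version does one hash lookup plus a pointer hop into a child frame per path component; the flat version keys on the whole path and does a single lookup. It does not model GSlice allocation or cache behavior, only the number of lookups and indirections per access.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <variant>

// Hypothetical nested representation: a value is a leaf or a child frame.
struct Frame;
using Value = std::variant<long, std::string, std::shared_ptr<Frame>>;
struct Frame {
    std::unordered_map<std::string, Value> items;  // one hash table per frame
};

// Nested lookup: split the path on '/' and descend one frame per component.
// Each step dereferences the frame, searches its table, then follows the
// returned item into the next frame.
const Value* nested_get(const Frame& root, const std::string& path) {
    const Frame* frame = &root;
    const Value* found = nullptr;
    std::istringstream components(path);
    std::string key;
    while (std::getline(components, key, '/')) {
        if (!frame) return nullptr;  // previous component was a leaf, not a frame
        auto it = frame->items.find(key);
        if (it == frame->items.end()) return nullptr;
        found = &it->second;
        auto child = std::get_if<std::shared_ptr<Frame>>(found);
        frame = child ? child->get() : nullptr;
    }
    return found;
}

// Hypothetical flat representation: the whole path is the key, no frames.
using Leaf = std::variant<long, std::string>;
using FlatKvp = std::unordered_map<std::string, Leaf>;

// Flat lookup: a single hash lookup regardless of path depth.
const Leaf* flat_get(const FlatKvp& kvp, const std::string& path) {
    auto it = kvp.find(path);
    return it == kvp.end() ? nullptr : &it->second;
}
```

With `root.items["example"]` holding a child frame whose `leaf` is `42L`, `nested_get(root, "example/leaf")` does two table searches and two frame hops, while `flat_get(flat, "example/leaf")` on a map containing the key `"example/leaf"` does one.
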
> 
>> All of that said, I agree that not implementing KVP in the SQL backend
>> that way was a mistake and I regret it.
> 
> Why do YOU regret it?  I don't think you implemented it that way.

Actually, I did. It was the first real development I did for GnuCash. Phil Longstaff hadn't quite finished the DBI implementation; he had apparently run out of time, and it was holding up the 2.4.0 release, so I dove in. Handling KvpFrames was the biggest missing piece. I made it work like the in-memory implementation without thinking through the query implications.

> 
>> Moving almost all of the Kvp access to being through GObject
>> properties does make all of the access go through the object's API. The next
>> step is to add members to the objects and load them from KVP in the
>> backend. That gets rid of the Kvp performance penalty entirely for the
>> XML backend and makes it one-time for the SQL backend.
> 
> Right.  That works; you just need to change the way the 'write' works.

Yeah, we've been over this ground a couple of times before, and I'm not worried about that part. The two Kvp uses that aren't object extensions, File Properties and Import Matching, will require a bit more effort. File Properties can just be made into an object that's a member of QofBook and loaded with the book. Easy. Import Matching makes a lot of records, and I'd really prefer not to load it into GnuCash's memory at all but just query for results; that would require a schema change with a separate table.
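
A minimal C++ sketch of the "members loaded from KVP in the backend" idea, including File Properties as an object owned by the book. It assumes the backend hands over slots as a flat, path-keyed map; `Slots`, `Account`, `Book`, `FileProperties`, `load_string`, `load_account`, `load_book`, `save_account` and the `notes` and `options/company-name` paths are all hypothetical illustrations, not GnuCash code. KVP is read once at load time into plain members, and the write side regenerates slots from those members, which is the change to the 'write' path mentioned in the thread.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <variant>

// Hypothetical: slots as a backend might hand them over, keyed by full path.
using Slot = std::variant<long, std::string>;
using Slots = std::unordered_map<std::string, Slot>;

// Hypothetical object: accessors read this member instead of walking KVP.
struct Account {
    std::string notes;
};

// Hypothetical File Properties object, owned by the book and loaded with it.
struct FileProperties {
    std::string company_name;
};

struct Book {
    FileProperties properties;
};

// Copy a string slot into a member if it exists and holds a string.
void load_string(const Slots& slots, const std::string& path, std::string& out) {
    auto it = slots.find(path);
    if (it == slots.end()) return;
    if (const auto* s = std::get_if<std::string>(&it->second)) out = *s;
}

// One-time load in the backend; afterwards no KVP lookup is needed.
Account load_account(const Slots& slots) {
    Account acct;
    load_string(slots, "notes", acct.notes);
    return acct;
}

// File Properties are loaded with the book, the same way.
Book load_book(const Slots& slots) {
    Book book;
    load_string(slots, "options/company-name", book.properties.company_name);
    return book;
}

// The write side changes too: slots are regenerated from members on save.
Slots save_account(const Account& acct) {
    Slots slots;
    if (!acct.notes.empty()) slots["notes"] = acct.notes;
    return slots;
}
```

Import Matching is deliberately left out of this sketch, since the suggestion there is a separate table queried on demand rather than anything loaded into memory.
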

Regards,
John Ralls