KVP and data that contains a forward slash

Derek Atkins warlord at MIT.EDU
Thu Feb 11 14:20:23 EST 2016


John Ralls <jralls at ceridwen.us> writes:

>> Originally the KVP path was supposed to be a file-system.  So if there
>> is a desire to move away from that, then we should be explicit about it.
> I hope that's not literally true in the sense of writing it out to
> disk, that would be incredibly stupid. Surely you mean that it was
> meant to have paths *like* a file system, which it does. It's still an
> inefficient design because there's no locality so every step is a (or
> more likely several because of the hash tables) cache miss.

No, of course it's not in the sense of writing out to disk.  That would
be silly!  :-)

Keep in mind it was designed when GnuCash ONLY had XML, and the "paths"
in a KVP tree directly mapped to the XML object hierarchy.

>>> Most of our existing use of nested KVP isn't necessary anyway, but
>>> changing it will require a new file/db version and conversion
>>> routines. No point in introducing new nesting though, and besides that
>>> would also create a file incompatibility between 2.6 and 2.8.
>> I would disagree; it's nice to have some nesting (certainly within the
>> XML framework) to make it easy to remove whole KVP subtrees.  When you
>> view KVP as a file system then it makes total sense to have nesting, and
>> all the benefits that come with it.
> That would carry more weight if we actually used the XML DOM tree
> inside of GnuCash, but we don't. Besides, that's not how we use
> it. With a couple of exceptions (import matching and book properties)
> we use it to add members to classes without changing the XML or SQL
> Schema.  Those paths are two or three deep and for the most part have
> only one or two elements at the bottom. Getting rid of KvpFrames and
> converting the "path" to a string name so that it takes a single hash
> lookup instead of two or three will be more performant with no effect on
> our actual usage. The payoff is even higher when the KVP data isn't
> all in memory: Having to do 3 SQL queries to retrieve an int64_t or a
> string is ridiculous. For most of those uses having to do a separate
> query at all is ridiculous. The members should be part of the object
> record and retrieved when the object is instantiated.

There are two separate issues here:

1) The SQL encoding of KVP.  The fact that it's encoded the way it is,
using frames that require multiple queries, is arguably itself a bug.
I'm not sure why we didn't use full-path encodings in SQL.  But that is
completely separate from how KVPs are used/stored internally.

2) Not loading KVP with the object.  When an object is instantiated it
should, arguably, load all the associated KVP data, too, so that it can
be accessed directly through the object APIs.  Whether it's in core as a
KVP or in core as a table column is mostly irrelevant.

Of course fixing #1 would help with #2.
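
To make the difference concrete, here is a rough sketch of the two
encodings.  This is NOT the actual GnuCash schema or API: the table
names (nested, flat), the columns, the helper functions, and the
example path are all made up for illustration.  It only shows that a
per-frame encoding costs one query per path element, while a full-path
encoding costs one query per value and lets you pull in everything for
an object in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical per-frame encoding: one row per path element.
    CREATE TABLE nested (frame_id INTEGER, name TEXT,
                         value TEXT, child_frame INTEGER);
    -- Hypothetical full-path encoding: one row per leaf value.
    CREATE TABLE flat (obj_id INTEGER, path TEXT, value TEXT);
""")

# The same (made-up) value stored both ways: settings/display/color = blue
conn.executemany("INSERT INTO nested VALUES (?, ?, ?, ?)", [
    (1, "settings", None, 2),   # root frame points at frame 2
    (2, "display", None, 3),    # frame 2 points at frame 3
    (3, "color", "blue", None), # frame 3 holds the leaf value
])
conn.execute("INSERT INTO flat VALUES (?, ?, ?)",
             (7, "settings/display/color", "blue"))
conn.execute("INSERT INTO flat VALUES (?, ?, ?)",
             (7, "settings/notes", "hello"))


def get_nested(conn, root_frame, path):
    """Walk the frames one path element at a time; return (value, queries)."""
    frame, value, queries = root_frame, None, 0
    for name in path.split("/"):
        row = conn.execute(
            "SELECT value, child_frame FROM nested "
            "WHERE frame_id = ? AND name = ?", (frame, name)).fetchone()
        queries += 1
        if row is None:
            return None, queries
        value, frame = row
    return value, queries


def get_flat(conn, obj_id, path):
    """Look the value up by its full path; always exactly one query."""
    row = conn.execute(
        "SELECT value FROM flat WHERE obj_id = ? AND path = ?",
        (obj_id, path)).fetchone()
    return (row[0] if row else None), 1


def load_all(conn, obj_id):
    """Issue #2: fetch every KVP for an object in one query at load time."""
    return dict(conn.execute(
        "SELECT path, value FROM flat WHERE obj_id = ?", (obj_id,)))
```

With this toy data, get_nested(conn, 1, "settings/display/color") needs
three queries to reach the value, get_flat(conn, 7,
"settings/display/color") needs one, and load_all(conn, 7) returns the
whole set for the object at once.  Note that a full-path key like this
cannot distinguish a "/" separator from a "/" that is part of the data,
so the path would need escaping or a different delimiter.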

> Regards,
> John Ralls


       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available
