Splitting the slots table

Mon Jan 3 14:15:23 EST 2011

On Jan 3, 2011, at 9:25 AM, Jeff Kletsky wrote:

> John Ralls writes:
> 
> "I think that the first step is to work through all of the code and make an ERD for the existing model, documenting the use and structure of KVP. (Pretend for the purpose of this exercise that every use of KVP is a separate entity). Then we can normalize it into a good relational model and work out a transition path."
> 
> This, to my ears, is great news!
> 
> Getting the code objects to more closely agree with business objects and the database representation of those objects to more closely represent the structure of the objects is a valuable step in a direction that, in my opinion means:
> 
> * A more comprehensible and testable code base -- which indirectly means that becoming a contributor is a lot easier
> * A database structure that allows the SQL layer to optimise queries, rather than having all kinds of "custom" search/select logic in the code
> * A database structure that reports can be generated against
> * A database structure that can allow multi-user access (be it exclusive or opportunistic locking of objects, rather than the whole database)
> 
> To me, this is a critical "next step" even if/when the languages of choice are changed.
> 
> Perhaps another topic that will need to be discussed is the "end-user contract." As I understand it, previously it was "A data file from version Y will be usable an older version X, although all data may not be understood." To me, that sort of locks one in the past (case in point, budgets, which have some significant structural problems right now, such as being unaware of the difference between data-storage sign and UI sign). The "way around" that, as I understand it, is to stuff anything new into KVPs, which, as John points out, doesn't work well with relational databases.
> 
> I think it is reasonable to:
> 
> 1) Restrict any major non-backwards-compatible-for-read changes to objects or database representation to major releases (2.4, 2.6,...) -- Namely reporting and data-extraction tools that work against the database for 2.4.0 should work for 2.4.x without changes
> 
> 2) Require database upgrade triggers to be run for any release, major or minor -- This means that once you upgrade to Version Y, you're done with Version X
> 
> 3) Violate "Rule 1" when "critical bugs" related to data integrity or security dictate
> 
> 4) Drop XML file support -- If people want a lightweight, single-file transport/backup approach, then SQLite is a great option. (Without this, GNUCash is either just using the database as a data store, or would need to maintain two versions of search/select logic.)
> 
> Part of the proposed end-user contract is that the end user "must" only use the matching version of the app to /write/ to the database. However, they may use other apps to /read/ from it for their own needs.
> 
> Yes, "Rule 2" means that you need to decide which releases you are going to take and that adoption of the "latest and greatest" may be a bit slower. It also means that code and data structures can be refactored as appropriate. In the widely-used Open-Source world, Wordpress does this with great success. My "day job" is administration and billing systems for large insurance carriers and this approach works for them as well.
> 
> 
> 
> On 01/02/2011 05:23 PM, John Ralls wrote:
>> [...]
>> We need to re-think KVP entirely: It doesn't match up very well with the relational model.
>> 
>> A couple of examples:
>> 
>> Splits use KVP to store memos. Good, because not everyone uses them on every split, and there's no point wasting the space. But we can provide a split-memo table with a foreign key into the splits table (or vice-versa). That will be much faster to query (no WHERE name= clause in the join) and the data design will be clearer.
>> 
>> The HBCI (online banking) setup, on the other hand, is contained entirely in a hierarchy of KVPs. This makes some amount of sense in XML, but it's insane in an RDB. RDBs don't like recursion, and there's no way to do arbitrary hierarchies without recursion. HBCI needs its own tables.
>> 
>> I think that the first step is to work through all of the code and make an ERD for the existing model, documenting the use and structure of KVP. (Pretend for the purpose of this exercise that every use of KVP is a separate entity). Then we can normalize it into a good relational model and work out a transition path.
>> 
>> I have some more Gtk stuff to do over the next couple of weeks, but I'll start on the ERD after that.

The policy that Derek declared a couple of weeks ago was that stable has to have added to it the ability to read dev's data. You're absolutely right that it shouldn't be able to write to a higher-rev's database; it should have to "save as".

We don't *have* to drop XML support: There's a very capable XML query language (XQuery) and several backends that support it. To mix it with sql would require an abstraction layer, of course, but we'll want that anyway so that there aren't sql strings scattered around the codebase. It probably makes sense to provide an XML export/import facility as well.

Regards,
John Ralls