Splitting the slots table

Mon Jan 3 12:25:40 EST 2011

John Ralls writes:

"I think that the first step is to work through all of the code and make 
an ERD for the existing model, documenting the use and structure of KVP. 
(Pretend for the purpose of this exercise that every use of KVP is a 
separate entity). Then we can normalize it into a good relational model 
and work out a transition path."

This, to my ears, is great news!

Getting the code objects to more closely agree with business objects and 
the database representation of those objects to more closely represent 
the structure of the objects is a valuable step in a direction that, in 
my opinion means:

* A more comprehensible and testable code base -- which indirectly means 
that becoming a contributor is a lot easier
* A database structure that allows the SQL layer to optimise queries, 
rather than having all kinds of "custom" search/select logic in the code
* A database structure that reports can be generated against
* A database structure that can allow multi-user access (be it exclusive 
or opportunistic locking of objects, rather than the whole database)

To me, this is a critical "next step" even if/when the languages of 
choice are changed.

Perhaps another topic that will need to be discussed is the "end-user 
contract." As I understand it, previously it was "A data file from 
version Y will be usable an older version X, although all data may not 
be understood." To me, that sort of locks one in the past (case in 
point, budgets, which have some significant structural problems right 
now, such as being unaware of the difference between data-storage sign 
and UI sign). The "way around" that, as I understand it, is to stuff 
anything new into KVPs, which, as John points out, doesn't work well 
with relational databases.

I think it is reasonable to:

1) Restrict any major non-backwards-compatible-for-read changes to 
objects or database representation to major releases (2.4, 2.6,...) -- 
Namely reporting and data-extraction tools that work against the 
database for 2.4.0 should work for 2.4.x without changes

2) Require database upgrade triggers to be run for any release, major or 
minor -- This means that once you upgrade to Version Y, you're done with 
Version X

3) Violate "Rule 1" when "critical bugs" related to data integrity or 
security dictate

4) Drop XML file support -- If people want a lightweight, single-file 
transport/backup approach, then SQLite is a great option. (Without this, 
GNUCash is either just using the database as a data store, or would need 
to maintain two versions of search/select logic.)

Part of the proposed end-user contract is that the end user "must" only 
use the matching version of the app to /write/ to the database. However, 
they may use other apps to /read/ from it for their own needs.

Yes, "Rule 2" means that you need to decide which releases you are going 
to take and that adoption of the "latest and greatest" may be a bit 
slower. It also means that code and data structures can be refactored as 
appropriate. In the widely-used Open-Source world, Wordpress does this 
with great success. My "day job" is administration and billing systems 
for large insurance carriers and this approach works for them as well.

On 01/02/2011 05:23 PM, John Ralls wrote:
> [...]
> We need to re-think KVP entirely: It doesn't match up very well with the relational model.
>
> A couple of examples:
>
> Splits use KVP to store memos. Good, because not everyone uses them on every split, and there's no point wasting the space. But we can provide a split-memo table with a foreign key into the splits table (or vice-versa). That will be much faster to query (no WHERE name= clause in the join) and the data design will be clearer.
>
> The HBCI (online banking) setup, on the other hand, is contained entirely in a hierarchy of KVPs. This makes some amount of sense in XML, but it's insane in an RDB. RDBs don't like recursion, and there's no way to do arbitrary hierarchies without recursion. HBCI needs its own tables.
>
> I think that the first step is to work through all of the code and make an ERD for the existing model, documenting the use and structure of KVP. (Pretend for the purpose of this exercise that every use of KVP is a separate entity). Then we can normalize it into a good relational model and work out a transition path.
>
> I have some more Gtk stuff to do over the next couple of weeks, but I'll start on the ERD after that.
>
> Regards,
> John Ralls
>