python GnuCash interface to SQL backend

Sat Nov 15 12:03:30 EST 2014

> On Nov 14, 2014, at 3:18 PM, Sébastien de Menten <sdementen at gmail.com> wrote:
> 
> 
> 
> On Friday, November 14, 2014, John Ralls <jralls at ceridwen.us <mailto:jralls at ceridwen.us>> wrote:
> 
>> On Nov 14, 2014, at 4:28 AM, Sébastien de Menten <sdementen at gmail.com <javascript:_e(%7B%7D,'cvml','sdementen at gmail.com');>> wrote:
>> 
>> 
>> In terms of the implementation itself of the object model, the main things I see not that clean are:
>> - KVP vs field representation 
>> - XXX_denom / XXX_num split (instead of using a Decimal type)
>> But these two points should be hidden for the user/developper in python bindings (official or piecash).
> 
> I experimented with the available decimal libraries over the summer. They’re mostly too slow and they don’t afford enough control over rounding to be sufficiently accurate for accounting use.
> 
> The object model as it stands now has too much interdependence between classes, especially the transaction, split, account, and commodity classes. The implementation has a lot of 
> 
> 
> When looking at the SQL/XML document, what would be the unneeded interdependencies ? The links between account->split->transaction looks meaningful to me. The links transaction->currency and account->commodity also. 
> Is it the issues with the scu/denom/num that is cumbersome to handle (if there is a change in the scu of a currency, all splits related to it should be updated) ?
> 
>  
>> Could you point me to 2 or 3 real case example we would nevertheless end up corrupting the database ? It would really help me to materialise this "risk".
>> With my practical experience with piecash (creating a full account tree structure and importing thousands of transactions from csv file), I haven't found any case of corruption (except when developing the API and having "obvious" bugs).
> 
> I can’t provide concrete examples without doing an extensive code review of piecash, for which I have neither the time nor the inclination. Some obvious trouble spots include cross-commodity transactions, especially involving lots or trading accounts. 
> 
> Have you tested with bad data to see if piecash rejects it? Did you thoroughly analyze the ways that bad data could be created and ensure that you have test cases proving that piecash rejects all of them?
> 
> Well, at first, I wanted to have a simple way to extract data out of GnuCash (a read-only mode). This was easy to do with SQL alchemy and is safe. 
> Afterwards, I wanted to be able to modify a GnuCash book knowing what I was doing (ie being cautious to not create inconsistent objects. This was also ok (as said, I have used piecash to create automatically a complex account structure from an excel file and to import thousands of records without errors). 
> Next is the ability to ensure consistency checks/error detection when the user does changes. This is the less mature part and I hear well your warnings about the complexity of this part. 
>  
>> For piecash, there is no need for a 3N db schema and there is no need to write any logic in SQL !
>> It is through SQLAlchemy that we can handle/encapsulate the extra bit of logic (which is, as far as I am in the development of piecash and given piecash scope, rather limited). Nothing is done in SQL itself.
> 
> I have trouble believing that an ORM will generate a correct implementation with a non-normal schema. 
> 
> In piecash, the ORM does not generate the SQL schema as it already exists (it is the one defined by GnuCash). It just maps the existing schema (that needs to be redescribed in python) to python objects transparently handling type conversion, transactions, guids, relationships, etc 
>> 
>> The table schema is exactly the only piece of code that needs to be written (piecash is in fact just that, table schemas with some metadata). 
>> I do not hope/expect/think to generate this code automatically from C headers (as SWIG does).
> 
> You’re not being consistent. Are you using SQLAlchemy as an ORM or simply as a SQL abstraction layer? You said above that nothing is done in SQL, and earlier that you are doing consistency checks, but here you’re saying that all you’ve written is a table-schema, which would imply that you’re relying on the database to do all of the work. 
> 
> 
> I am using sqlalchemy as a mapper and as a session manager (unit of work pattern). Nothing is done in SQL (meaning I do not write SQL queries nor SQL stored procedures). 
> A cascade delete constrain (if I remove a transaction, all related splits are automatically deleted) is also managed by SA. 
> So I am "essentially" writing a mapping and then adding some methods to either create objects in a consistent way (like having a function "create_transaction(from_acc, to_acc, amount, date, description)" that create always correct objects) or to check objects are consistent before commit (this is not yet done)
> 
>  
>> You’ll get closer working with the XML schema. There’s a reasonably up-to-date version in src/doc/xml/gnucash-v2.rnc.
>> 
>> 
>> As written here above, XML is not easy to work with (at least for me) and does not save me of anything you said before (corruption, etc).
> 
> Sounds like a learning opportunity.  
> 
> But will this save me from all the data corruption issues you mention ? If not, I prefer working with python objects than XML. If so, how is this done ? 
> 
> The object model is the class hierarchy, with its member variables and functions, expressed in C and Scheme. In a proper database-based application there would be a close mapping between the two, but GnuCash isn’t yet one of those. In a proper OO design, everything that directly affects an object’s state would be encapsulated in the cmember functions of the object’s class. GnuCash isn’t one of those either: State-changing code is spread throughout the program.
> 
> If we take a document centric view instead of an application centric view, the only important element is the data (and not the implementation of the engine/class hierarchy/...). 
> As long as another program changes the data while keeping the GnuCash schema and related invariants, we do not really care about the program being OO or not, with encapsulation or not,... In fact, we do not need the full power of the real GnuCash engine to manipulate (for simple cases) the data. 
> 
> I understand from your comment that the status of the GnuCash code is not optimal and you're working on a clean rewrite in C++. This can be an extra argument to try to have python bindings not linked to the GnuCash C code but only relying on the GnuCash data. 

Sébastien,

The close coupling between classes is a OO design issue, not a data issue.

It’s certainly true that there’s no requirement for an accounting program to be OO. GnuCash is, mostly, and many of its design flaws are OO-related. Since your language of choice is Python, which heavily favors OO abstraction mechanisms, it’s surprising that you would want to ignore those flaws.

Yes, if another program changing the data keeps the schema *and invariants*, it will be compatible with GnuCash. The invariants are the catch: In order to maintain them, you have two choices: To use the GnuCash API or to faithfully duplicate GnuCash’s internals. You absolutely do need the full power of GnuCash to manipulate the data, because the full power of GnuCash is required to correctly interpret the stored state.

Regards,
John Ralls