plugin/module architecture proposal for gnucash-1.7

Mon, 11 Jun 2001 16:22:41 -0500

Well, now that the 1.6 release has happened, it's time to start
looking forward to the 1.7 series :) This is a proposal that I have
been wanting to get off my chest for a while.

A module/plugin system for Gnucash-1.7: rationale
-------------------------------------------------

As the functionality of gnucash has grown, so has the complexity of
its codebase and the number of its external dependencies.  Right now,
new developers trying to come to grips with gnucash have a daunting
challenge; there's a lot of code there, and that code is not organized
as cleanly as it could be.

Many people have expressed interest in features that are only of
interest to a limited subset of the Gnucash user base: small business
features are not interesting to home users, interfaces to the German
HBCI computer banking system are not interesting to anyone outside
Germany, etc.

There are some people who want to use the Gnucash codebase to build
custom financial solutions for their own needs, and don't want to drag
in the whole app to do it.

All of this taken together points (to me) directly at a strong
plugin/module architecture for Gnucash, which starts by refactoring
the existing codebase into a family of core modules with well-defined
interfaces and a set of optional modules for extra functionality.  This 
has the following advantages:

1. Gnucash can be factored into a "core" system plus add-ons.  That
would allow people to install gnucash without Guppi, for example, if
they didn't want plots, or without SQL backend support, or without GUI
support when running as a backend server.

2. New functionality can be added as modules that require only the
installation of a new module rather than an upgrade of the whole
system.  Module versioning and version dependencies that track module
APIs will prevent skew. 

3. Useful bits and pieces, in the form of modules, can be pulled out
to be used for other purposes.  When people are using the gnucash
technology for other things, they will extend it, and when they extend
it, those extensions will become part of the gnucash code base,
available for others to use.

4. The Gnucash code base can be shuffled to group like with like,
breaking up the g-wrap definitions, glade user-interface files, C, and
Scheme code into more manageable chunks that go with each module. 

5. The process of modularizing gnucash will require us to look at
where we have "behavior tables" that might need to be modified at run
time.  By converting the ubiquitous "switch" on a type variable to a
table lookup, we can allow almost all of gnucash's low-level
functionality to be expanded by new modules at runtime.
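
To make the "module versioning" idea in point 2 a bit more concrete,
here is a minimal sketch in C of a load-time compatibility check.  It
assumes a libtool-style numbering scheme (a "current" interface number
plus an "age" of still-supported older interfaces); that scheme, and
the names ModuleInterface and module_interface_ok, are illustrative
assumptions and not part of the proposal itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the interfaces a module provides.
 * The module supports every interface number from (current - age)
 * through current, inclusive. */
typedef struct {
    int current;  /* newest interface number implemented */
    int age;      /* how many older interfaces are still supported */
} ModuleInterface;

/* Returns true if a client written against interface number `wanted`
 * can safely use a module that provides `provided`. */
bool module_interface_ok(ModuleInterface provided, int wanted)
{
    assert(provided.age >= 0 && provided.age <= provided.current);
    return wanted <= provided.current
        && wanted >= provided.current - provided.age;
}
```

A loader could refuse to load a dependent module when this check
fails, which is one way API skew might be caught early.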

Technology
----------

The technology has several parts:

  1. The actual structure of the source/compilation system used to build
  the modules

  2. Runtime code to load modules, either through explicit calls or
  autoloads

  3. A mechanism for letting new modules insert behaviors into the
  running system

My proposal goes like this: 

1. Structure of source/compilation for modules.  Each module is
defined by a Scheme source file containing a 'define-module' block.
define-module is a standard part of guile.  The guile module system
has changed fairly dramatically recently, which means that gnucash-1.7
and beyond will probably depend on guile-1.6 (which is not yet
released, but will be before any of these changes appear in gnucash).

define-module allows for a collection of Scheme source code and/or
shared libraries to be combined together as a logical unit.  It has
the added advantage of defining a "namespace" for the Scheme code in
the module; Scheme (define) statements in code that's part of a module
are not visible outside the module unless published with an "export"
in the define-module form.  This means it's no longer necessary to use
the (let () (define ...)) trick to avoid polluting the global
namespace.

Scheme code that's part of the module can be separately compiled (once
rlb finishes whipping the Guile compiler into shape) or just loaded as
source code like it is now.  rlb's experiments have shown that just
taking out our top-level (let ...) "namespaces" in code and replacing
them with module namespaces can speed up loading by a factor of 10 or
more, even without any compilation.

C code that's part of the module will be linked into its own shared
library.  For modules which wish to remain Scheme-free, like the
gnucash engine, there's no need to put any Scheme references in this
library; it will be opened with 'dlopen' during the module load, and
the dlopen initialization function will get run if it's there, but
Scheme just knows that there's a library it needs to 'dlopen' at
module load time.

Any Glade window definitions that are related to the module should be
stored in a .glade XML file specific to the module in question, and
the source code generated by glade should be linked into the module's
shared lib.  I have used 'libglade' some and highly recommend it for
future development, but that's just my preference.

Interfaces to other modules are defined mostly as before, in 3 major
ways:
  1. exporting C function definitions through public headers
  2. exporting Scheme function bindings (now with "export" in the
     module definition rather than just making them top-level
     defines)
  3. exporting C and Scheme global variables that can be viewed or
     modified by other modules.

We want to minimize (3) and instead wrap such functionality in
exported getter/setter functions, keeping the data itself in static
globals inside the module.
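
As a rough illustration of the getter/setter approach, here is a small
C sketch.  The "default currency" setting and the example_* function
names are hypothetical, chosen only to show the pattern; they are not
existing gnucash functions.

```c
#include <assert.h>
#include <string.h>

/* Module-private state: "static" keeps it invisible to other modules. */
static char default_currency[4] = "USD";

/* Exported getter: the only way other modules can read the value. */
const char *example_get_default_currency(void)
{
    return default_currency;
}

/* Exported setter: a single place to validate changes.
 * Returns 0 on success, -1 if `code` is not exactly three characters. */
int example_set_default_currency(const char *code)
{
    assert(code != NULL);
    if (strlen(code) != 3)
        return -1;
    memcpy(default_currency, code, 4);  /* copies the terminating NUL too */
    return 0;
}
```

Because every write goes through the setter, the module can later add
validation or change-notification without touching its callers.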

2. Runtime code to load modules.  Well, that's easy:

  ;; use the SQL backend
  (use-modules (gnucash engine sql-backend))

Since the modules are defined by Scheme code, each one can enumerate
its dependencies by more use-modules in its initialization code.  Any
strictly C libraries can be explicitly loaded from Scheme in the
module init code.
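
Guile's use-modules takes care of dependency loading for us, but the
idea can be sketched in C as a conceptual model.  Everything below
(the Module struct, module_load, the fixed-size dependency array) is a
hypothetical illustration of "load the dependencies first, and load
each module only once", not how Guile is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical in-memory description of a module. */
typedef struct Module {
    const char    *name;
    struct Module *deps[MAX_DEPS];  /* NULL-terminated unless full */
    void         (*init)(void);     /* optional init code, may be NULL */
    bool           loaded;
} Module;

/* Loads the module's dependencies first, then the module itself.
 * Returns the number of modules newly loaded by this call. */
int module_load(Module *m)
{
    assert(m != NULL);
    if (m->loaded)
        return 0;                   /* already loaded: nothing to do */
    m->loaded = true;               /* set early so a cycle terminates */

    int count = 0;
    for (size_t i = 0; i < MAX_DEPS && m->deps[i] != NULL; i++)
        count += module_load(m->deps[i]);

    if (m->init != NULL)
        m->init();                  /* run the module's init code */
    return count + 1;
}
```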

3. A mechanism for allowing new modules to modify the behavior of the
app at runtime.  Modules may have initialization code that gets run at
module load time.  This code must be able to "poke" handlers into
various behavior registries in other modules.

This sort of thing is already implemented in the HTML subsystem of
gnucash.  When an <object classid=foo> tag is seen in HTML, the
classid "foo" is looked up in a table to find a handler for the
object.  Same with HTML form submission methods.  Modules loaded at
runtime could already alter the way reports are viewed by poking new
<object> class handlers into this table.  That's how GPG and the
Gnucash Network handlers are initialized: at startup time, an init
function is called which pokes the new behaviors into the table.  That
could just as easily be called at module load time.
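
A handler registry of this kind can be sketched in a few lines of C.
This is not the actual gnucash HTML code: the function names, the
handler signature, the fixed-size table, and the "example-plot"
classid are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECT_HANDLERS 32

/* Hypothetical handler signature: returns 0 if the object was handled. */
typedef int (*ObjectHandler)(const char *classid);

typedef struct {
    const char   *classid;  /* must outlive the table, e.g. a literal */
    ObjectHandler handler;
} HandlerEntry;

static HandlerEntry handler_table[MAX_OBJECT_HANDLERS];
static size_t       handler_count = 0;

/* Pokes a handler into the table, replacing any existing handler for
 * the same classid.  Returns 0 on success, -1 if the table is full. */
int register_object_handler(const char *classid, ObjectHandler handler)
{
    assert(classid != NULL && handler != NULL);
    for (size_t i = 0; i < handler_count; i++) {
        if (strcmp(handler_table[i].classid, classid) == 0) {
            handler_table[i].handler = handler;
            return 0;
        }
    }
    if (handler_count == MAX_OBJECT_HANDLERS)
        return -1;
    handler_table[handler_count].classid = classid;
    handler_table[handler_count].handler = handler;
    handler_count++;
    return 0;
}

/* Returns the handler for `classid`, or NULL if none is registered. */
ObjectHandler lookup_object_handler(const char *classid)
{
    assert(classid != NULL);
    for (size_t i = 0; i < handler_count; i++) {
        if (strcmp(handler_table[i].classid, classid) == 0)
            return handler_table[i].handler;
    }
    return NULL;
}

/* A handler that a hypothetical plotting module might provide. */
static int show_plot(const char *classid)
{
    (void)classid;
    return 0;
}

/* The module's init function: run once, at module load time. */
void example_plot_module_init(void)
{
    register_object_handler("example-plot", show_plot);
}
```

The HTML code only ever calls lookup_object_handler, so it needs no
changes when a new module registers a new classid.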

There are lots of other places where a C switch statement is used now
but a more generic table lookup could be used instead.  Using dynamic
tables would effectively allow new "case"s to be added at runtime.  We
need to think about what kinds of plugins/modules people will be
writing, and then pull things out into the module's public interface
when it's clear that we will want to add more cases.  Different
backends come immediately to mind, as do custom register types (as
might be needed for an inventory or AP/AR module) and account types.
Menus can be constructed in a similar way, allowing new items to be
added to any menu by a loaded module.
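
Here is a rough C sketch of turning a "switch" on a type code into a
table that modules can extend at runtime, using account types as the
example.  The AccountTypeInfo fields, the two built-in entries, and
the account_type_* function names are made up for illustration and do
not reflect the current engine code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACCOUNT_TYPES 64

/* Hypothetical per-type data that a switch statement would otherwise
 * hard-code in each "case". */
typedef struct {
    const char *name;          /* must outlive the table */
    int         credit_normal; /* 1 if the balance is normally a credit */
} AccountTypeInfo;

/* Built-in types occupy the first slots; modules append after them. */
static AccountTypeInfo type_table[MAX_ACCOUNT_TYPES] = {
    { "Bank",   0 },
    { "Income", 1 },
};
static int type_count = 2;

/* Adds a new "case" at runtime.  Returns the new type id, or -1 if
 * the table is full. */
int account_type_register(const char *name, int credit_normal)
{
    assert(name != NULL);
    if (type_count == MAX_ACCOUNT_TYPES)
        return -1;
    type_table[type_count].name = name;
    type_table[type_count].credit_normal = credit_normal;
    return type_count++;
}

/* Replaces: switch (type) { case BANK: return "Bank"; ... }
 * Returns NULL for an unknown type id. */
const char *account_type_name(int type)
{
    if (type < 0 || type >= type_count)
        return NULL;
    return type_table[type].name;
}

/* Reverse lookup by name.  Returns the type id, or -1 if not found. */
int account_type_lookup(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < type_count; i++) {
        if (strcmp(type_table[i].name, name) == 0)
            return i;
    }
    return -1;
}
```

An inventory module, say, could call account_type_register from its
init code and every table-driven caller would pick up the new type.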

So, overall, the framework is just that we use dlopen shared libraries
wherever possible, have table-driven behavior where possible to allow
modules to modify functionality, and wrap and load each module using a
Guile module definition with dependency loading.  I would guess we can
break Gnucash up into 10-15 modules composed of C and Scheme code,
plus a wrapper that handles command line parsing and loads the
starting set of modules.  We can then reshuffle the code in src/gnome/,
src/guile/, and src/scm/ into the appropriate module directories.

What do you guys think?  Specific thoughts/desires about how it should
be done?

Thanks,
b.g.