Income statement performance (was Re: Reading whole db at startup))

Fri Nov 21 12:19:11 EST 2008

On November 21, 2008 11:39:33 am Phil Longstaff wrote:
> OK.  I have modified code (not committed yet) which doesn't read tx/splits
> at startup.
>
> At startup, it executes:
>
> SELECT account_guid, reconcile_state, sum(quantity_num) AS quantity_num,
> quantity_denom FROM splits GROUP BY account_guid, reconcile_state,
> quantity_denom
>
> This gives me 1 row per guid/reconcile_state/quantity_denom triplet.  From
> these, I set the start balance, start cleared balance and start reconciled
> balance for each account.
>
> When the backend receives a query to load splits for an account, it uses
> this algorithm:
>
> 1: save start and end balance, cleared balance and reconciled balance for
> each account
> 2: stop qof events
> 3: load all transactions which contains splits for the target account. 
> This loads all splits for those transactions, even those in other accounts.
> 4: for each account, get new end balances.  If end balance doesn't match
> end balance from step 1, calculate old_start_balance-(new_end_balance-
> old_end_balance) and set this as the new start balance.
> 5: resume qof events
>
> IOW, if the end balance for an account increases by X because of newly
> loaded splits, step 4 decreases the start balance by X to make the end
> balance match what is should be.  Eventually, once all tx/splits are
> loaded, all start balances are 0.
>
> Each account that is the target of a query for all splits is marked so that
> it's splits are never reloaded.  Done as a performance improvement but
> would need to be revisited for a multi-user back end.
>
> I need a bit more testing, but once I'm happy with this, I'll commit it. 
> It does lead to a delay when opening an account with a lot of splits.  I
> may look at adding a progress bar or other indication that the app is busy.

BTW, when I was testing this, I had a printf() telling me each time a split 
query was done for an account.  When I ran an income statement, it queried 
each expense and income account twice, then repeated that, so that each 
account was queried 4 times.  I know someone was looking at report performance 
earlier.  Might there be more opportunies to improve performance by cutting 
out duplicate queries?  Of course, for a report, adding a mechanism to 
calculate values in the db engine would be even better, but this is off in the 
future.

Phil