generating reports takes minutes

John Clements gnucash at brinckerhoff.org
Wed Nov 1 14:33:26 EST 2006


On Nov 1, 2006, at 10:43 AM, Derek Atkins wrote:

> Quoting John Clements <gnucash at brinckerhoff.org>:
>
>>>
>>> I'm using gnucash with data that starts in 2002 - that is 4 years
>>> of data...
>>> Generating a report seems to take minutes - and I generate  
>>> several at
>>> startup - :(
>>>
>>> Is there a way to improve this "speed" ?
>>> Report generation is becomming a pain, and I feel it's horribly slow
>>> compared to M$ Money which only takes seconds ...
>>>
>>
>> I've experienced this for years, and I recently noticed that it seems
>> to be much slower for finer-grained time intervals.  So let me ask a
>> (possibly insulting) question.  When I'm generating, say, a cash-flow
>> report for N time slices (e.g. quarterly for 4 years = 16 slices),
>> with a database of M transactions, is there an operation in there
>> that's of time NM?
>
> Why would this be insulting.   I suspect the answer is "yes".  In
> fact, it's quite possible that it's WORSE than M*N!
>
>> For instance, I can just imagine this piece of code:
>>
>> (let ([divided (map (lambda (range) (filter (falls-in-range range)
>>                                             all-transactions))
>>                     given-ranges)])
>>   ...)
>>
>> ... which would result in scanning the list of transactions N times.
>> You can tell me the code isn't written this way, and I'll just keep
>> my mouth shut and deal with it.
>
> Oh, it might be.

Hmm... well, I think I can see where this open-source software  
conversation is heading, and I don't like it at all. Is there someone  
who actually has the time & inclination & knowledge to answer this  
question definitively and replace the aforementioned code with  
something like (warning! totally untested code!):

(let ([range-vec (make-vector (+ (vector-length split-dates) 1) null)])
   (for-each (lambda (transaction)
               (let ([i (search split-dates transaction 0 (- (vector- 
length split-dates) 1))])
                 (vector-set! range-vec i (cons transaction (vector- 
ref range-vec i)))))))

(define (search split-dates this-date left right)
   (if (= left right)
     (+ right 1)
     (if (< this-date (vector-ref split-dates left))
       left
       (search split-dates this-date (+ left 1) right))))

?

... and before you point it out, this one is ALSO N*M. The interface  
of search is such that you could drop in a binary search instead, if  
you cared.  My guess is that the improved cache locality of  
considering each transaction once is going to be a big win over the  
code that scans the transaction set N times... if the code is  
actually written this way.

Ah well.

All the best and thanks again,

John Clements
Cal Poly State University



More information about the gnucash-user mailing list