Managing large files

Derek Atkins warlord at MIT.EDU
Mon Feb 25 10:59:12 EST 2008


Hi,

Ian Merrithew <ian.merrithew at ieee.org> writes:

> Hi all,
>
> I've been a gnucash user for several years now, and I've got a data file that 
> goes all the way back to 2002.  It's only now that I'm realizing just how 
> much of a performance hit I'm taking - an income statement report now takes a 
> full 60 seconds to open up.  It's getting quite aggravating.  From reading 
> the archives, I understand it's a consequence of the XML format the data is 
> being stored in, that the larger it gets, the worse the performance gets.

What version of GnuCash are you using?  There was a significant
performance regression in many reports in 2.0 that's being worked on
(and improved) in recent 2.2 releases.  But no, this slowdown isn't
really due to the XML format.  Generally the issue is poor algorithms
in the reports themselves: instead of being O(n) in the number of
transactions, some reports do things that make them O(n^2) or worse.
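
To make that concrete, here's a toy illustration in Python (mine, not
GnuCash's actual report code) of how recomputing a running balance
from scratch for every row turns one pass over n transactions into
O(n^2) work:

    # Quadratic: re-sums the whole prefix for every report row.
    def running_balances_quadratic(amounts):
        return [sum(amounts[:i + 1]) for i in range(len(amounts))]

    # Linear: carries an accumulator, touching each amount once.
    def running_balances_linear(amounts):
        balances, total = [], 0
        for amount in amounts:
            total += amount
            balances.append(total)
        return balances

    amounts = [100, -25, 50]
    assert running_balances_quadratic(amounts) \
        == running_balances_linear(amounts) == [100, 75, 125]

At ten transactions the difference is invisible; over six years of
data the quadratic version is doing millions of redundant additions.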

> So my question is, is there anything I can do to mitigate this?  There's no 
> functionality to export transactions.  There's no alternate file format.  I'm 
> not about to delete old data.  I lack the expertise to develop some sort of 
> XML parser that could pull out, say, this year's transactions and save them 
> to a new file.  Even if I could, there are issues like opening balances and 
> tracking investments that wouldn't be well-served by this approach.

As I said, it has little to do with the XML format.  If you search the
archives, Jonathan Kamens sent a perl script back in 2003 to strip
out old transactions.  I have no idea how smart it is.
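
For what it's worth, the idea behind such a script is easy to sketch.
Below is a rough, untested Python outline of the approach -- my
sketch, not Jonathan's script.  It assumes an uncompressed data file
(gunzip it first if need be) and the stock gnc/trn/ts namespaces; it
does NOT fix up opening balances or the gnc:count-data totals, and
the filenames are placeholders.  Only run something like this on a
copy of your file:

    import xml.etree.ElementTree as ET

    NS = {
        "gnc": "http://www.gnucash.org/XML/gnc",
        "trn": "http://www.gnucash.org/XML/trn",
        "ts":  "http://www.gnucash.org/XML/ts",
    }
    # Keep these prefixes readable in the output; any other
    # namespaces in the file will serialize as ns0, ns1, ...
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    CUTOFF = "2007-01-01"   # drop transactions posted before this

    tree = ET.parse("books.gnucash")
    book = tree.getroot().find("gnc:book", NS)
    for txn in book.findall("gnc:transaction", NS):
        posted = txn.find("trn:date-posted/ts:date", NS).text
        if posted[:10] < CUTOFF:   # "YYYY-MM-DD ..." sorts lexically
            book.remove(txn)
    tree.write("books-trimmed.gnucash", encoding="utf-8",
               xml_declaration=True)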

There's also some experimental book-splitting code in there, but
that's been disabled because it has...issues.

> And question #2 is: is there any plan in place for future releases to address 
> this?  By changing the file format from flat XML to something like a SQL 
> database?  Allowing exports of transactions, splitting of files?

Changing from XML to SQL won't change the reports.  If you fixed the
reports themselves you'd get the speedup regardless of the backend
storage.  The MAIN reason for switching from XML to SQL in the backend
is to enable "save on commit", so you never lose data: your data is
saved every time you commit a transaction.

> I'm feeling a little trapped.  My data's in this application, this format, but 
> how much longer can I continue until I reach a tipping point and it all 
> crashes down, or just gets so slow I can't use it?

Hey, the data file is XML.  You can write an XSLT to convert it to
anything you want.  I'm sorry you feel trapped, but it's not like
the data is in a proprietary format.  Cf. GnuCash2QIF.
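
As a quick proof of that point, here's a minimal sketch (same
assumptions as above: uncompressed file, stock namespaces, placeholder
filenames) that dumps every transaction's posted date and description
to CSV with nothing but the Python standard library:

    import csv
    import xml.etree.ElementTree as ET

    NS = {
        "trn": "http://www.gnucash.org/XML/trn",
        "ts":  "http://www.gnucash.org/XML/ts",
    }
    GNC_TXN = "{http://www.gnucash.org/XML/gnc}transaction"

    root = ET.parse("books.gnucash").getroot()
    with open("transactions.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "description"])
        for txn in root.iter(GNC_TXN):
            date = txn.find("trn:date-posted/ts:date", NS).text[:10]
            desc = txn.findtext("trn:description", default="",
                                namespaces=NS)
            writer.writerow([date, desc])

An XSLT stylesheet can express the same transformation declaratively;
either way, the point is that nothing about the format locks your
data in.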

-derek

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available

