Startup speed (was: Archiving old transactions)

Fri Jun 21 11:44:03 EDT 2013

Hi David,

That's not a dumb question at all; in fact, it's a very smart question.

To give my own answer (which I'm sure is only one of many possible 
answers), I actually need to "go up a level," hence the change in the 
Subject line.

For me, the issue isn't really archiving transactions; archiving is just 
a workaround for the /real/ issue, which is that GnuCash's startup time 
grows linearly with the number of transactions.

People who are saying, "Well, it only takes 30 seconds to start up for 
me, and that seems fine," or, "I don't have a problem waiting patiently 
for it to launch," are in my opinion missing the point, or more 
accurately missing two points.

First of all, let me say it again: the startup time /grows linearly with 
the number of transactions./ I've been using GnuCash to track my 
finances since 2004. I hope to live for another fifty years or so, and I 
hope to continue using GnuCash to track my finances for much of that 
time. I also hope that I will be entering more and more transactions per 
year into GnuCash, not fewer, because I hope to have more money to spend 
as I get older ;-). As time goes on, it's going to take longer and 
longer for me to launch the program, unless the linear growth problem is 
fixed.

Start-up time isn't the only issue. Searching over 60 years of 
transactions is going to be much slower than searching over 10 years of 
transactions, and the search results are going to be cluttered with 
ancient stuff I don't care about. Generating a report over 60 years of 
transactions is going to be much slower than generating a report over 10 
years of transactions, and many of the reports default to the entire 
time span of transactions and don't let you change the default until the 
first version of the report has already been fully generated.

In short, requiring a monolithic GnuCash file and reading the entire 
contents of that file into memory before allowing the user to do 
anything with it simply does not scale. If GnuCash is in it for the long 
haul, and I certainly hope that it is, this problem cannot be ignored 
forever.

Second, anybody with a clue about UX knows that responsiveness is far 
and away the single issue that users care most about. Slowness -- during 
or after start-up -- is guaranteed to irritate people and make them 
switch apps. If you're looking for low-hanging fruit in terms of 
improving the user experience, this is it. (Yes, yes, I know, "GnuCash 
is free, you don't need to use it if you don't want to, if it's too slow 
for you you're welcome to go use something else." Please, spare me the 
free software lecture. I am assuming here that the folks who write and 
maintain GnuCash actually want it to be attractive to users and more 
pleasant for them, themselves, to use.)

Colin claimed that this isn't an issue because computers keep getting 
faster so startup time remains relatively constant. There are a number 
of reasons why I don't buy that, including:

  * I find abhorrent the attitude, which Microsoft let loose on the
    world, that it's OK for software to get slower and more bloated
    because faster hardware will compensate. The software should
    be just as big and slow as it needs to be to do the work it is
    expected to do. Since there are financial applications that manage
    just as much data as GnuCash without this start-up problem, GnuCash
    clearly doesn't /need/ to suffer from this. The fact that it does is
    indicative of a problem with its design, as John Ralls has already
    acknowledged. We should be acknowledging that and aspiring to fix
    it, not rationalizing it.
  * I should not need to upgrade my computer every couple of years just
    to be able to balance my checkbook and credit-card statements. Not
    everybody can afford to do that.
  * In a related vein, the kind of work that GnuCash does really is the
    kind of work that it should be possible to do on a low-end computer.
    We're not talking about Pixar animation, folks, we're talking about
    double-entry accounting.

So, I've now made the case that GnuCash's current architecture is 
problematic because it requires a monolithic GnuCash file / database and 
reads it all on startup. People are, of course, free to disagree with me 
about that. For those who do, the rest of what I'm about to write is 
irrelevant so you can just stop here. ;-) But if there is a problem, 
then what are the possible solutions?

Well, one of them is the one that my Perl script implements, i.e., 
archiving old transactions into a separate file. That solves the startup 
problem, and it also solves the problem of all your searches showing you 
ancient search results you don't care about. However, as others have 
pointed out, it introduces a new problem -- your data is no longer 
searchable all in one place, so if you need to find transactions from 
years ago, you have to go digging through old files. Furthermore, at 
least in my implementation, the files with archived transactions make no 
effort to maintain historical balances. It's not ideal.

Another possible solution is something like this...

  * Enhance the GnuCash file format by adding a section at the top of
    each file summarizing the balances of all the accounts referenced
    by transactions in that file.
  * Allow GnuCash to read transactions from and write transactions to
    multiple files rather than just a single file.
  * By default, the detailed transactions are only read from some of the
    files (how many is configurable by the user); only balances are read
    from the older files and summed to produce initial balances of every
    account.
  * If the user needs to access transactions archived in one of the
    older files, s/he tells GnuCash to read the transactions in that
    file, and the balances previously read from that file are replaced
    by the actual transactions.
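
To make the idea concrete, here is a rough Python sketch of the scheme
in the list above. Everything in it is made up for illustration
(ArchiveFile, Book, load_details, the file names); none of it is
GnuCash code or its real file format. The per-file summary is computed
on the fly here, whereas in the proposal it would be stored at the top
of each file so the transactions need not be read at all.

```python
from collections import defaultdict
from decimal import Decimal


class ArchiveFile:
    """Hypothetical stand-in for one GnuCash file in the proposed scheme."""

    def __init__(self, name, transactions):
        self.name = name
        self.transactions = transactions  # list of (account, Decimal amount)

    def balance_summary(self):
        # The proposed "section at the top": net change per account.
        # A real file would store this; it is computed here for brevity.
        totals = defaultdict(Decimal)
        for account, amount in self.transactions:
            totals[account] += amount
        return dict(totals)


class Book:
    """Reads details from the newest files and only summaries from the rest."""

    def __init__(self, files, detailed_count=1):
        # files must be ordered oldest to newest.
        self._files = {f.name: f for f in files}
        split = max(len(files) - detailed_count, 0)
        self.summaries = {f.name: f.balance_summary() for f in files[:split]}
        self.details = {f.name: list(f.transactions) for f in files[split:]}

    def load_details(self, name):
        # Replace a file's summary with its actual transactions.
        if name in self.summaries:
            del self.summaries[name]
            self.details[name] = list(self._files[name].transactions)

    def balance(self, account):
        total = Decimal(0)
        for summary in self.summaries.values():
            total += summary.get(account, Decimal(0))
        for transactions in self.details.values():
            for acct, amount in transactions:
                if acct == account:
                    total += amount
        return total


old = ArchiveFile("2004.gnucash", [("Checking", Decimal("1000.00")),
                                   ("Checking", Decimal("-25.00"))])
new = ArchiveFile("2013.gnucash", [("Checking", Decimal("-40.00"))])
book = Book([old, new], detailed_count=1)
```

The point of the sketch is that book.balance("Checking") gives the same
answer whether or not load_details("2004.gnucash") has been called; only
the amount of data held in memory changes.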

With the database backend, something like this is even easier to 
implement... Instead of using separate files, transactions can simply be 
queried from the DB by date, and initial balances can be produced by 
doing fast aggregate queries for transactions earlier than the earliest 
date the user currently wants displayed.
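
As a sketch of what those two queries might look like, here is a small
Python example using the standard sqlite3 module. The txn table and its
columns (post_date, account, amount_cents) are a deliberately simplified,
hypothetical schema chosen for illustration; it is not GnuCash's actual
SQL schema, and the function names are likewise made up.

```python
import sqlite3


def make_demo_db():
    # Hypothetical single-table schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (post_date TEXT, account TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?, ?)",
        [
            ("2005-03-01", "Checking", 100000),
            ("2005-03-02", "Checking", -2500),
            ("2005-03-02", "Groceries", 2500),
            ("2013-06-01", "Checking", -4000),
            ("2013-06-01", "Groceries", 4000),
        ],
    )
    return conn


def opening_balances(conn, cutoff):
    # One aggregate query replaces reading every old transaction.
    rows = conn.execute(
        "SELECT account, SUM(amount_cents) FROM txn "
        "WHERE post_date < ? GROUP BY account",
        (cutoff,),
    )
    return dict(rows.fetchall())


def recent_transactions(conn, cutoff):
    # Only transactions on or after the cutoff are read in detail.
    return conn.execute(
        "SELECT post_date, account, amount_cents FROM txn "
        "WHERE post_date >= ? ORDER BY post_date, account",
        (cutoff,),
    ).fetchall()


conn = make_demo_db()
balances = opening_balances(conn, "2013-01-01")
recent = recent_transactions(conn, "2013-01-01")
```

Dates are stored as ISO-8601 strings so that plain string comparison
orders them correctly; an index on post_date would keep both queries
fast as the table grows.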

An enhancement to the above idea is to read /only/ the aggregate 
balances, from /all/ files or rows in the database, on startup, and then 
either (a) read detailed transactions in the background, (b) read 
detailed transactions as they are needed for display, or (c) some smart 
combination of the two.
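
A minimal Python sketch of option (c), under the assumption that
something can supply the balances up front and fetch one year of
transactions at a time. LazyLedger and the fetch_year callback are
hypothetical names invented for this example; this is not how GnuCash
is structured today.

```python
import threading


class LazyLedger:
    """Balances are available at once; transactions load per year on demand."""

    def __init__(self, balances, fetch_year):
        self.balances = dict(balances)  # read at startup, cheap
        self._fetch_year = fetch_year   # hypothetical: year -> transactions
        self._cache = {}
        self._lock = threading.Lock()

    def transactions(self, year):
        # (b) Read as needed: fetch a year the first time it is asked for.
        # Holding the lock during the fetch keeps the sketch simple, at the
        # cost of making other readers wait while a fetch is in progress.
        with self._lock:
            if year not in self._cache:
                self._cache[year] = self._fetch_year(year)
            return self._cache[year]

    def prefetch(self, years):
        # (a) Read in the background: warm the cache on a worker thread.
        def work():
            for year in years:
                self.transactions(year)

        worker = threading.Thread(target=work, daemon=True)
        worker.start()
        return worker
```

The user interface could show balances immediately, call
transactions(year) when a register is opened, and let prefetch() fill in
the remaining years while the user is already working.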

I'm sure there are other possible solutions.

Ultimately, I think the issue, as I mentioned previously (and as John 
Ralls did too), is that there is a fundamental design flaw in requiring 
all transactions to be read at startup, so the /right/ solution is to 
fix the design flaw and allow transactions to be read in the background 
or as needed or both.

Now I'm sort of regretting offering to contribute to a "bounty" to 
convince somebody to implement archiving, because I think I'd rather see 
the developers spending time working on fixing the fundamental design 
flaw than on a workaround for it. Maybe John Ralls is right that doing 
something real about this "requires a complete re-write of Gnucash's 
core functionality," or just maybe it might be possible, with some 
out-of-the-box thinking, to graft something usable onto what's there 
now without rewriting everything?

   jik

On 06/21/2013 10:08 AM, David Carlson wrote:
> Excuse me for being dumb, but what is the definition of archiving, anyway.
> Is it the same for GnuCash as for e-mail?
> Physically where is the stuff moved to and how hard is it to get it back
> if needed?
> Would I use GnuCash to look at that archived stuff?
>
> David C