Startup speed (was: Archiving old transactions)

Fri Jun 21 22:04:47 EDT 2013

I find it hard to believe that reading the data file is the bottleneck, unless very inefficient reads are being used.

Without actually looking at the code, I would assume that the bottleneck is the manner in which the data file is read and processed. Disk I/O is much slower than memory access, but any modern system can read a rather large data file in a single gulp in a fraction of a second. Parsing the file and building a data structure that holds the entire file's contents, now that's another matter. And if the file is being read a line (or a byte) at a time, that would also be slow.
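
To make that distinction concrete, here is a minimal sketch that times the two phases separately: pulling the whole file into memory in one read, and then parsing it into in-memory objects. Everything in it is an assumption for illustration: the record format is made up and is not GnuCash's real file format, and the function names are hypothetical. Actual timings will vary by machine.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET


def write_sample(path, n):
    """Write n synthetic <txn> records (a made-up format, not GnuCash's)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<book>\n")
        for i in range(n):
            f.write(f'<txn id="{i}" date="2013-06-21" amount="{i % 100}.00"/>\n')
        f.write("</book>\n")


def read_whole(path):
    """Read the file in a single gulp, returning the raw bytes."""
    with open(path, "rb") as f:
        return f.read()


def parse_all(raw):
    """Parse the bytes and build one dict per transaction."""
    root = ET.fromstring(raw)
    return [dict(t.attrib) for t in root.iter("txn")]


def measure(n=50_000):
    """Return (transaction count, seconds spent reading, seconds spent parsing)."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        write_sample(path, n)
        t0 = time.perf_counter()
        raw = read_whole(path)
        t1 = time.perf_counter()
        txns = parse_all(raw)
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    return len(txns), t1 - t0, t2 - t1


if __name__ == "__main__":
    count, read_s, parse_s = measure()
    print(f"{count} txns: read {read_s:.4f}s, parse+build {parse_s:.4f}s")
```

Running it prints the two durations side by side, which is the comparison that would need to be made against the real loader before blaming the disk.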

However, I do agree that eliminating the slow startup should be a priority.

On 2013-06-21, at 11:44 AM, Jonathan Kamens wrote:

> Hi David,
> 
> That's not a dumb question at all; in fact, it's a very smart question.
> 
> To give my own answer (which I'm sure is only one of many possible answers), I actually need to "go up a level," hence the change in the Subject line.
> 
> For me, the issue isn't really archiving transactions; archiving is just a workaround for the /real/ issue, which is that GnuCash's startup time grows linearly with the number of transactions.
> 
> People who are saying, "Well, it only takes 30 seconds to start up for me, and that seems fine," or, "I don't have a problem waiting patiently for it to launch," are in my opinion missing the point, or more accurately missing two points.
> 
> First of all, let me say it again, the startup time /grows linearly with the number of transactions./ I've been using GnuCash to track my finances since 2004. I hope to live for another fifty years or so, and I hope to continue using GnuCash to track my finances for much of that time. I also hope that I will be entering more and more transactions per year into GnuCash, not fewer, because I hope to have more money to spend as I get older ;-). As time goes on, it's going to take longer and longer for me to launch the program, unless the linear growth problem is fixed.
> 
> Start-up time isn't the only issue. Searching over 60 years of transactions is going to be much slower than searching over 10 years of transactions, and the search results are going to be cluttered with ancient stuff I don't care about. Generating a report over 60 years of transactions is going to be much slower than generating a report over 10 years of transactions, and many of the reports default to the entire time span of transactions and don't let you change the default until the first version of the report has already been fully generated.
> 
> In short, requiring a monolithic GnuCash file and reading the entire contents of that file into memory before allowing the user to do anything with it simply does not scale. If GnuCash is in it for the long haul, and I certainly hope that it is, this problem cannot be ignored forever.
> 
> Second, anybody with a clue about UX knows that responsiveness is far and away the single issue that users care most about. Slowness -- during or after start-up -- is guaranteed to irritate people and make them switch apps. If you're looking for low-hanging fruit in terms of improving the user experience, this is it. (Yes, yes, I know, "GnuCash is free, you don't need to use it if you don't want to, if it's too slow for you, you're welcome to go use something else." Please, spare me the free software lecture. I am assuming here that the folks who write and maintain GnuCash actually want it to be attractive to users and more pleasant for them, themselves, to use.)
> 
> Colin claimed that this isn't an issue because computers keep getting faster so startup time remains relatively constant. There are a number of reasons why I don't buy that, including:
> 
> * I find abhorrent the attitude, which Microsoft let loose on the world,
>   that it's OK for software to get slower and more bloated because
>   faster hardware will compensate for it. The software should
>   be just as big and slow as it needs to be to do the work it is
>   expected to do. Since there are financial applications that manage
>   just as much data as GnuCash without this start-up problem, GnuCash
>   clearly doesn't /need/ to suffer from this. The fact that it does is
>   indicative of a problem with its design, as John Ralls has already
>   acknowledged. We should be acknowledging that and aspiring to fix
>   it, not rationalizing it.
> * I should not need to upgrade my computer every couple of years just
>   to be able to balance my checkbook and credit-card statements. Not
>   everybody can afford to do that.
> * In a related vein, the kind of work that GnuCash does really is the
>   kind of work that it should be possible to do on a low-end computer.
>   We're not talking about Pixar animation, folks, we're talking about
>   double-entry accounting.
> 
> So, I've now made the case that GnuCash's current architecture is problematic because it requires a monolithic GnuCash file / database and reads it all on startup. People are, of course, free to disagree with me about that. For those who do, the rest of what I'm about to write is irrelevant so you can just stop here. ;-) But if there is a problem, then what are the possible solutions?
> 
> Well, one of them is the one that my Perl script implements, i.e., archiving old transactions into a separate file. That solves the startup problem, and it also solves the problem of all your searches showing you ancient search results you don't care about. However, as others have pointed out, it introduces a new problem -- your data is no longer searchable all in one place, so if you need to find transactions from years ago, you have to go digging through old files. Furthermore, at least in my implementation, the files with archived transactions make no effort to maintain historical balances. It's not ideal.
> 
> Another possible solution is something like this...
> 
> * Enhance the GnuCash file format by adding a section at the top of
>   each file summarizing the balances of all the accounts referenced
>   by transactions in that file.
> * Allow GnuCash to read transactions from and write transactions to
>   multiple files rather than just a single file.
> * By default, the detailed transactions are only read from some of the
>   files (how many is configurable by the user); only balances are read
>   from the older files and summed to produce initial balances of every
>   account.
> * If the user needs to access transactions archived in one of the
>   older files, s/he tells GnuCash to read the transactions in that
>   file, and the balances previously read from that file are replaced
>   by the actual transactions.
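
The multi-file scheme in the quoted list can be sketched in a few lines. This is only an illustration under stated assumptions: `ArchiveFile`, `open_book`, and `balances` are hypothetical names, the "files" are plain in-memory objects, and nothing here reflects how GnuCash actually stores data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ArchiveFile:
    """Hypothetical stand-in for one file with a balance summary at the top."""
    name: str
    transactions: list  # (account, Decimal amount) pairs
    summary: dict = field(init=False)

    def __post_init__(self):
        # The summary section: net change per account within this file.
        totals = defaultdict(Decimal)
        for account, amount in self.transactions:
            totals[account] += amount
        self.summary = dict(totals)


def open_book(files, detailed_count):
    """Load details from the newest `detailed_count` files, summaries from the rest."""
    split = max(len(files) - detailed_count, 0)
    older, recent = files[:split], files[split:]
    opening = defaultdict(Decimal)
    for f in older:
        for account, total in f.summary.items():
            opening[account] += total
    loaded = [txn for f in recent for txn in f.transactions]
    return dict(opening), loaded


def balances(opening, loaded):
    """Current balance per account: opening balances plus loaded transactions."""
    result = defaultdict(Decimal, opening)
    for account, amount in loaded:
        result[account] += amount
    return dict(result)


if __name__ == "__main__":
    files = [
        ArchiveFile("2011", [("Checking", Decimal("100.00")), ("Checking", Decimal("-30.00"))]),
        ArchiveFile("2012", [("Checking", Decimal("50.00")), ("Savings", Decimal("10.00"))]),
        ArchiveFile("2013", [("Checking", Decimal("-5.00"))]),
    ]
    opening, loaded = open_book(files, detailed_count=1)
    print(balances(opening, loaded))
```

The property that matters is that the resulting balances are the same no matter how many files are loaded in detail; only the amount of transaction data held in memory changes.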
> 
> With the database backend, something like this is even easier to implement... Instead of using separate files, transactions can simply be queried from the DB by date, and initial balances can be produced by doing fast aggregate queries for transactions earlier than the earliest date the user currently wants displayed.
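
A minimal sketch of the aggregate-query idea, using Python's built-in sqlite3 module. The `splits` table, its columns, and the sample rows are all assumptions made up for the example; this is not GnuCash's real database schema. Amounts are stored as integer cents and dates as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3


def make_db():
    """Build an in-memory database with a made-up schema (not GnuCash's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE splits (account TEXT, post_date TEXT, amount_cents INTEGER)"
    )
    conn.execute("CREATE INDEX idx_splits_date ON splits (post_date)")
    conn.executemany(
        "INSERT INTO splits VALUES (?, ?, ?)",
        [
            ("Checking", "2004-03-01", 100000),
            ("Checking", "2009-07-15", -25000),
            ("Savings", "2010-01-10", 50000),
            ("Checking", "2013-02-01", -4000),
            ("Savings", "2013-06-01", 1000),
        ],
    )
    return conn


def opening_balances(conn, cutoff):
    """Sum everything before `cutoff` into one opening balance per account."""
    rows = conn.execute(
        "SELECT account, SUM(amount_cents) FROM splits "
        "WHERE post_date < ? GROUP BY account",
        (cutoff,),
    )
    return dict(rows.fetchall())


def recent_transactions(conn, cutoff):
    """Fetch only the detailed rows on or after `cutoff`."""
    rows = conn.execute(
        "SELECT account, post_date, amount_cents FROM splits "
        "WHERE post_date >= ? ORDER BY post_date",
        (cutoff,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(opening_balances(conn, "2013-01-01"))
    print(recent_transactions(conn, "2013-01-01"))
```

With a cutoff of 2013-01-01, the older rows collapse into one number per account and only the 2013 rows are fetched in detail.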
> 
> An enhancement to the above idea is to read /only/ the aggregate balances, from /all/ files or rows in the database, on startup, and then either (a) read detailed transactions in the background, (b) read detailed transactions as they are needed for display, or (c) some smart combination of the two.
> 
> I'm sure there are other possible solutions.
> 
> Ultimately, I think the issue, as I mentioned previously (and so did John Ralls), is that there is a fundamental design flaw in requiring all transactions to be read at startup, so the /right/ solution is to fix the design flaw and allow transactions to be read in the background or as needed or both.
> 
> Now I'm sort of regretting offering to contribute to a "bounty" to convince somebody to implement archiving, because I think I'd rather see the developers spending time working on fixing the fundamental design flaw than on a workaround for it. Maybe John Ralls is right that doing something real about this "requires a complete re-write of Gnucash's core functionality," or just maybe it might be possible, with some out-of-the-box thinking, to graft something usable onto what's there now without rewriting everything?
> 
>  jik
> 
> On 06/21/2013 10:08 AM, David Carlson wrote:
>> Excuse me for being dumb, but what is the definition of archiving, anyway?
>> Is it the same for GnuCash as for e-mail?
>> Physically where is the stuff moved to and how hard is it to get it back
>> if needed?
>> Would I use GnuCash to look at that archived stuff?
>> 
>> David C
>> _______________________________________________
>> gnucash-user mailing list
>> gnucash-user at gnucash.org
>> https://lists.gnucash.org/mailman/listinfo/gnucash-user
>> -----
>> Please remember to CC this list on all your replies.
>> You can do this by using Reply-To-List or Reply-All.
>> 
>> 
> 