libmpdecimal

Sat Aug 23 21:01:15 EDT 2014

On Jul 31, 2014, at 9:38 AM, John Ralls <jralls at ceridwen.us> wrote:

> 
> On Jul 31, 2014, at 7:13 AM, Geert Janssens <info at kobaltwit.be> wrote:
> 
>> On Saturday 19 July 2014 11:53:37 John Ralls wrote:
>>> The libmpdecimal branch now passes make check all the way through.
>>> I've force-pushed a rebase on the latest master to
>>> https://github.com/jralls/gnucash.git for anyone who'd like to play
>>> with it.
>>> 
>>> Next step is to do some tuning to see how much I can shrink it while
>>> getting full 64-bit coefficients (current tests limit it to 44-bits)
>>> then profiling to see if there are any performance differences to
>>> master.
>>> 
>>> Regards,
>>> John Ralls
>> 
>> Nice! Keep up the good work.
>> 
>> I'm curious to hear about the performance difference. Hopefully the performance will be better :)
> 
> Libmpdecimal is about 25% slower as it is now, but I see some good optimization opportunities from the profile. I'm also looking at the Intel and GCC/ICU versions to see if they might prove faster, since mpdecimal is written specifically for Python and so has some extra overhead that doesn't seem to be present in the others.
> 
> The Intel version is particularly interesting because it uses a different encoding scheme and dispenses with contexts, both of which they claim afford much faster execution. Unfortunately their code is rather impenetrable and the documentation is sparse and difficult to understand. It also doesn't appear to expose an interface that can be used to extract a rational expression of the number, so it would require more code changes in the rest of GnuCash to be useable.
> 
> Removing the 44-bit clamp and increasing the range of the denominators from 10^6 to 10^9 passes all tests except test-lots, which fails from being unable to balance the lots in complex cases. I'm still debugging that.

So, having gotten test-lots and all of the other tests working* with libmpdecimal, I studied the Intel library for several days but couldn't figure out how to make it work. I decided instead to try the GCC implementation (labelled decNumber in the results below), which offers a fixed-size 128-bit IEEE 754 format. Since it never calls malloc, I thought it might prove faster, and indeed it is. I haven't finished integrating it -- the library doesn't provide formatted printing -- but it's far enough along that it passes all of the engine and backend tests. Some results:

test-numeric, with NREPS increased to 20000 to get a reasonable execution time for profiling:
    master     9645ms
    mpDecimal 21410ms
    decNumber 12985ms

test-lots:
    master      16300ms
    mpDecimal   20203ms
    decNumber   19044ms

The first shows the relative speed in more or less pure computation; the second shows the overall impact on one of the longer-running tests, which does a lot of other work as well.
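
For anyone who wants the ratios rather than the raw numbers, here is a small Python sketch that works out the slowdown relative to master from the timings above. Only the figures come from the test runs; the dictionary layout and the `slowdown` helper are made up for illustration.

```python
# Timings in milliseconds, copied from the two result tables above.
timings_ms = {
    "test-numeric": {"master": 9645, "mpDecimal": 21410, "decNumber": 12985},
    "test-lots": {"master": 16300, "mpDecimal": 20203, "decNumber": 19044},
}


def slowdown(test, branch):
    """Return how many times longer `branch` took than master on `test`."""
    runs = timings_ms[test]
    return runs[branch] / runs["master"]


if __name__ == "__main__":
    for test in timings_ms:
        for branch in ("mpDecimal", "decNumber"):
            print(f"{test:13s} {branch:10s} {slowdown(test, branch):.2f}x master")
```

That works out to roughly 2.2x (mpDecimal) and 1.35x (decNumber) on test-numeric, and roughly 1.24x and 1.17x on test-lots.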

I haven't investigated Christian's other suggestion of aggressive rounding to eliminate the overflow issue and make room for larger denominators, nor my original idea of replacing gnc_numeric with boost::rational atop a multi-precision class (either boost::mp or gmp). I have noticed that we're doing some dumb things with Scheme, like using double as an intermediate when converting from Scheme numbers to gnc_numeric (Scheme numbers are also rational, so the conversion should be direct) and representing gnc_numerics as a tuple (num, denom) instead of just using Scheme rationals. Neither will work for decimal floats, of course; the whole class will have to be wrapped so that computation takes place in C++. Storage in SQL is also an issue, as is maintaining backward file compatibility.
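
To illustrate why the double intermediate is a problem for rationals, here is a minimal Python sketch using `fractions.Fraction` as a stand-in for both Scheme rationals and gnc_numeric. The function names `via_double` and `direct` are hypothetical; this is not the actual binding code.

```python
from fractions import Fraction


def via_double(value):
    """Convert a rational by way of a double, as the intermediate step does."""
    # float() picks the nearest binary fraction; Fraction() then reads
    # that binary value back exactly, so any rounding error is preserved.
    return Fraction(float(value))


def direct(value):
    """Carry numerator and denominator across unchanged."""
    return Fraction(value.numerator, value.denominator)


if __name__ == "__main__":
    third = Fraction(1, 3)
    print(direct(third) == third)      # direct conversion is exact
    print(via_double(third) == third)  # 1/3 has no exact binary representation
    print(via_double(Fraction(1, 4)) == Fraction(1, 4))  # powers of two survive
```

Values whose denominators are powers of two survive the round trip, which is probably why the problem is easy to miss in casual testing.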

Another issue is equality. To get the tests to pass I've had to implement a fuzzy comparison: both numbers are first rounded to the smaller number of decimal places (2 fewer if there are 12 or more), then compared under two roundings, first truncation and then banker's rounding, and declared unequal only if they're unequal under both. I hate this, but it seems to be necessary to obtain equality when dealing with large divisors (as when computing prices or interest rates). I suspect that we'd have to do something similar if we pursue aggressive rounding to avoid overflows, but the only way to know for certain is to try.
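
A rough sketch of that comparison rule, written with Python's `decimal` module rather than the C/C++ code on the branches. The names `decimal_places` and `fuzzy_equal` are hypothetical, inputs are assumed to be finite, and the sketch assumes that "12 or more" refers to the smaller of the two place counts.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def decimal_places(value):
    """Number of digits after the decimal point (0 for integers)."""
    # Assumes a finite Decimal; NaN and infinities have no numeric exponent.
    return max(0, -value.as_tuple().exponent)


def fuzzy_equal(a, b):
    """Equal if the operands agree under truncation OR banker's rounding."""
    places = min(decimal_places(a), decimal_places(b))
    if places >= 12:
        places -= 2  # assumption: the threshold applies to the smaller count
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    for rounding in (ROUND_DOWN, ROUND_HALF_EVEN):
        if a.quantize(quantum, rounding=rounding) == b.quantize(quantum, rounding=rounding):
            return True
    return False  # unequal under both roundings


if __name__ == "__main__":
    # Truncation gives 1.99 vs 2.00, but banker's rounding gives 2.00 for both.
    print(fuzzy_equal(Decimal("1.999"), Decimal("2.00")))
    print(fuzzy_equal(Decimal("1.23"), Decimal("1.25")))
```

Note that a rule of this kind is not transitive: a can equal b and b can equal c without a equalling c, which is part of what makes it unpleasant.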

I've force-pushed both branches to my github repo for the curious; beware that ATM neither passes "make check".

Regards,
John Ralls

* That didn't last long, though: The latest rebase onto master broke some of the tests. I haven't fixed them yet because I wanted to get this profiling data done.