libmpdecimal

John Ralls jralls at ceridwen.us
Wed Sep 24 09:49:27 EDT 2014


On Sep 24, 2014, at 2:10 AM, Geert Janssens <geert.gnucash at kobaltwit.be> wrote:

> On Saturday 20 September 2014 18:21:44 John Ralls wrote:
>> On Aug 27, 2014, at 10:31 PM, John Ralls <jralls at ceridwen.us> wrote:
>>>> On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-geert at telenet.be> wrote:
>>>> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
>>>>> So, having gotten test-lots and all of the other tests working*
>>>>> with
>>>>> libmpdecimal, I studied the Intel library for several days and
>>>>> couldn't figure out how to make it work, so I decided to try the
>>>>> GCC
>>>>> implementation, which offers a 128-bit IEEE 754 format that's
>>>>> fixed
>>>>> size. Since it doesn't ever call malloc, I thought it might prove
>>>>> faster, and indeed it is. I haven't finished integrating it -- the
>>>>> library doesn't provide formatted printing -- but it's far enough
>>>>> along that it passes all of the engine and backend tests. Some
>>>>> results:
>>>>> 
>>>>> test-numeric, with NREPS increased to 20000 to get a reasonable
>>>>> execution time for profiling:
>>>>>   master     9645ms
>>>>>   mpDecimal 21410ms
>>>>>   decNumber 12985ms
>>>>> 
>>>>> test-lots:
>>>>>   master      16300ms
>>>>>   mpDecimal   20203ms
>>>>>   decNumber   19044ms
>>>>> 
>>>>> The first shows the relative speed in more or less pure
>>>>> computation; the second shows the overall impact on one of the
>>>>> longer-running tests that does a lot of other stuff.
>>>> 
>>>> John,
>>>> 
>>>> Thanks for implementing this and running the tests. The topic was
>>>> last touched before my holidays so it took me a while to refresh
>>>> my memory...
>>>> 
>>>> decNumber clearly performs better, although both implementations
>>>> lag behind our current gnc_numeric performance.
>>>>> I haven't investigated Christian's other suggestion of aggressive
>>>>> rounding to eliminate the overflow issue to make room for larger
>>>>> denominators, nor my original idea of replacing gnc_numeric with
>>>>> boost::rational atop a multi-precision class (either boost::mp or
>>>>> gmp).
>>>> 
>>>> Do you still have plans for either ?
>>>> 
>>>> I suppose aggressive rounding is orthogonal to the choice of data
>>>> type. Christian's argument that we should round as is expected in
>>>> the financial world makes sense to me but that argument does not
>>>> imply any underlying data type.
>>>> 
>>>> How about the boost::rational option ?
>>>> 
>>>>> I have noticed that we're doing some dumb things with Scheme,
>>>>> like using double as an intermediate when converting from Scheme
>>>>> numbers to gnc_numeric (Scheme numbers are also rational, so the
>>>>> conversion should be direct) and representing gnc_numerics as a
>>>>> tuple
>>>>> (num, denom) instead of just using Scheme rationals.
>>>> 
>>>> Does this mean you see potential performance gains in this as we
>>>> clean up the C<->Scheme number conversions ?
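
For illustration, the direct conversion would look something like
this (a sketch against Guile's C API and the engine's
gnc_numeric_create(); scm_to_gnc_numeric_direct is a hypothetical
name, not code that's in the tree):

    #include <libguile.h>
    #include "gnc-numeric.h"

    /* Sketch: convert an exact Scheme rational straight to
     * gnc_numeric instead of going through a lossy double
     * intermediate. */
    static gnc_numeric
    scm_to_gnc_numeric_direct (SCM val)
    {
        SCM exact = scm_inexact_to_exact (val);
        gint64 num   = scm_to_int64 (scm_numerator (exact));
        gint64 denom = scm_to_int64 (scm_denominator (exact));
        return gnc_numeric_create (num, denom);
    }
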
>>>>> Neither will
>>>>> work for decimal floats, of course; the whole class will have to
>>>>> be
>>>>> wrapped so that computation takes place in C++.
>>>> 
>>>> Which means some performance drop again...
>>>> 
>>>>> Storage in SQL is
>>>>> also an issue,
>>>> 
>>>> From the previous conversation I recall sqlite doesn't have a
>>>> decimal type so we can't run calculating queries on it directly.
>>>> 
>>>> But how about the other two: mysql and postgresql. Is the decimal
>>>> type you're using in your tests directly compatible with the
>>>> decimal data types in mysql and postgresql, or compatible enough
>>>> to convert automatically between them ?
>>>>> as is maintaining backward file compatibility.
>>>>> 
>>>>> Another issue is equality: In order to get tests to pass I've had
>>>>> to
>>>>> implement a fuzzy comparison where both numbers are first rounded
>>>>> to
>>>>> the smaller number of decimal places -- 2 fewer if there are 12 or
>>>>> more -- and compared with two roundings, first truncation and
>>>>> second
>>>>> "bankers", and declared unequal only if they're unequal in both. I
>>>>> hate this, but it seems to be necessary to obtain equality when
>>>>> dealing with large divisors (as when computing prices or interest
>>>>> rates). I suspect that we'd have to do something similar if we
>>>>> pursue
>>>>> aggressive rounding to avoid overflows, but the only way to know
>>>>> for
>>>>> certain is to try.
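
In code, the comparison looks roughly like this (a sketch only:
"Decimal", places(), and round_to_places() stand in for the real
decimal type and its rounding entry points):

    #include <algorithm>

    enum class Round { Truncate, Bankers };

    // Hypothetical: round d to 'places' decimal places using 'mode'.
    Decimal round_to_places (const Decimal& d, int places, Round mode);

    bool fuzzy_equal (const Decimal& a, const Decimal& b)
    {
        // Round both to the smaller number of decimal places,
        // backing off two more digits for long (>= 12) fractions.
        int places = std::min (a.places (), b.places ());
        if (places >= 12)
            places -= 2;

        bool trunc_eq = round_to_places (a, places, Round::Truncate)
                     == round_to_places (b, places, Round::Truncate);
        bool bank_eq  = round_to_places (a, places, Round::Bankers)
                     == round_to_places (b, places, Round::Bankers);

        // Unequal only if unequal under *both* roundings.
        return trunc_eq || bank_eq;
    }
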
>>>> 
>>>> Ugh. :(
>>>> 
>>>> So what's the current balance ?
>>>> 
>>>> I see the following pros and cons of your tests so far:
>>>> 
>>>> Pro:
>>>> - using a decimal type gives us more precision
>>>> 
>>>> Con:
>>>> - sqlite doesn't have a decimal data type, so as it currently
>>>> stands we can't run calculations in queries in that database type
>>>> - we lose backward/forward compatibility with earlier versions of
>>>> GnuCash
>>>> - decNumber or mpDecimal are new dependencies
>>>> - their performance is currently less than the original gnc_numeric
>>>> - guile doesn't know of a decimal data type so we may need some
>>>> conversion glue
>>>> - equality is fuzzy
>>>> 
>>>> Please add if I forgot arguments on either side.
>>>> 
>>>> Arguably many of the con arguments can be solved. That will take
>>>> effort, however. And I consider the first two more important than
>>>> the others.
>>>> 
>>>> So do you think the benefits (I assume there will be more than the
>>>> one I mentioned) will outweigh the drawbacks ? Does the work that
>>>> will go into it bring GnuCash enough value to continue on this
>>>> track ?
>>>> 
>>>> It's probably too early to tell for sure but I wanted to get your
>>>> ideas based on what we have so far.
>>> Testing boost::rational is next on the agenda. My original idea was
>>> to use it with boost::multiprecision or gmp, but I'd prefer
>>> something that doesn't depend on heap allocations because it's so
>>> much slower than stack allocation and must be passed by pointer,
>>> which is a major change in the API -- meaning a ton of cleanup work
>>> up front. I think I'll do a straight substitution of the existing
>>> math128 with boost::rational<int64_t> just to see what happens.
>>> 
>>> I think that part of implementing immediate rounding must include
>>> constraining denominators to powers-of-ten. The main reason is that
>>> it makes my head hurt when I try to think about how to do rounding
>>> with arbitrary denominators. If you consider that a big chunk of
>>> the overflow problems arise from denominators and divisors that are
>>> large primes, it becomes quickly apparent that avoiding large prime
>>> denominators might well resolve much of the problem. It's also
>>> true that real-world numbers, as opposed to the randomly-generated
>>> numbers from tests, all have power-of-ten denominators. We'd still
>>> have many-digit-prime divisors to deal with, but constraining
>>> denominators gives us something to round to.
>>> Does that make sense, or does it seem the rambling of a lunatic?
>>> This really does make my head hurt.
>> Boost::Rational is a serious disappointment. Boost::rational<int64_t>
>> didn’t allow a significant increase in precision and is further
>> hampered by not providing any overflow detection. Benchmarks of
>> test-numeric with NREPS set to 20000 (the numbers are a bit different
>> from before because I’m using my Mac Pro instead of my Mac Book Air,
>> and because these are debug builds):
>> 
>> Branch                   Tests      Time
>> master:                  1187558    5346ms
>> libmpdecimal:            1180076    8718ms
>> boost-rational, cppint:  1187558   20903ms
>> boost-rational, gmp:     1187558   34232ms
>> 
>> cppint means boost::multiprecision::checked_int128_t, a 16-byte
>> stack-allocated multi-precision integer. “Checked” means that it
>> throws std::overflow_error instead of wrapping. gmp means the GNU
>> Multiple Precision library. It’s supposed to be faster than cppint,
>> but its performance is killed by having to malloc everything. The
>> fact that our own C code is substantially faster than any library
>> I’ve tried is a tribute to Linas.
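
For reference, the combination benchmarked above looks roughly like
this (a self-contained sketch, not the branch code; the checked
cpp_int backend throws std::overflow_error instead of wrapping):

    #include <boost/rational.hpp>
    #include <boost/multiprecision/cpp_int.hpp>
    #include <iostream>
    #include <limits>
    #include <stdexcept>

    using int128 = boost::multiprecision::checked_int128_t;
    using rational128 = boost::rational<int128>;

    int main ()
    {
        rational128 a (1, 3), b (2, 7);
        std::cout << a + b << '\n';   // prints 13/21

        // Unlike int64_t, the checked backend detects overflow:
        try {
            int128 big = std::numeric_limits<int128>::max ();
            big += 1;                 // throws instead of wrapping
        } catch (const std::overflow_error& e) {
            std::cout << "overflow caught: " << e.what () << '\n';
        }
    }
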
>> 
>> There’s another wrinkle: Boost::Rational immediately reduces all
>> numbers to what we called in my grade school “simplest form”, meaning
>> no common factors between the numerator and denominator. This
>> actually helps prevent overflows, but means that we have to be very
>> careful to supply the SCU as the rounding denominator or we’ll get
>> unexpected rounding results.  Boost::Rational provides no rounding
>> function of its own so I rewrote gnc_numeric_convert into C++ using
>> the overloaded operators from boost::multiprecision. That at least
>> taught me about rounding arbitrary denominators, so my head doesn’t
>> explode any more.
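
The heart of that rewrite is rounding num/den to a fixed power-of-ten
denominator. Roughly (a sketch in the same spirit, not the actual
replacement for gnc_numeric_convert; only banker's rounding is
shown):

    #include <boost/multiprecision/cpp_int.hpp>

    using int128 = boost::multiprecision::checked_int128_t;

    // Round the rational num/den to denominator scu (e.g. 100 for
    // cents) with banker's rounding; the result is the returned
    // numerator over scu.
    int128 round_to_denom_bankers (int128 num, int128 den, int128 scu)
    {
        int128 n = num * scu;   // scale into the target denominator
        int128 q = n / den;     // quotient, truncated toward zero
        int128 r = n % den;     // remainder decides the rounding
        bool positive = (n < 0) == (den < 0);
        if (2 * abs (r) > abs (den) ||
            (2 * abs (r) == abs (den) && (q % 2) != 0))
            q += positive ? 1 : -1;
        return q;
    }

    // e.g. round_to_denom_bankers (1234567, 9973, 100) == 12379,
    // i.e. 1234567/9973 rounds to 123.79.
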
>> 
>> The good news is that using 128-bit numbers for all internal
>> representations, along with aggressive reduction, careful attention
>> to rounding, and a tweak to get_random_gnc_numeric() so that the
>> actual number doesn’t exceed 1E13/1, prevents overflow errors
>> during testing, at least up through test-lots.
>> 
>> Looking a bit more at rounding: with GNC_HOW_RND_NEVER used in only
>> 14 of the 151 gnc_numeric operations in the code base, it doesn’t
>> appear to me that we’re over-using it, and I’m not convinced that it
>> would help much to eliminate those cases.
>> 
>> It looks like the best solution is to work over our existing
>> gnc-numeric with math128 implementation so that the internals are
>> always 128-bit and we don’t declare overflows prematurely.
>> 
> Thanks for the update and the elaborate testing.
> 
> So,... math128 is what we use now, using the rational representation of 
> numbers, do I get that right ? And the best option is to stick with it 
> and improve on it ? Would you still transform it into C++ so it becomes 
> an object with properties and members ?

Yes to all.

Regards,
John Ralls



