libmpdecimal

Wed Sep 24 05:10:00 EDT 2014

On Saturday 20 September 2014 18:21:44 John Ralls wrote:
> On Aug 27, 2014, at 10:31 PM, John Ralls <jralls at ceridwen.us> wrote:
> > On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-geert at telenet.be> wrote:
> >> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
> >>> So, having gotten test-lots and all of the other tests working*
> >>> with
> >>> libmpdecimal, I studied the Intel library for several days and
> >>> couldn't figure out how to make it work, so I decided to try the
> >>> GCC
> >>> implementation, which offers a 128-bit IEEE 754 format that's
> >>> fixed
> >>> size. Since it doesn't ever call malloc, I thought it might prove
> >>> faster, and indeed it is. I haven't finished integrating it -- the
> >>> library doesn't provide formatted printing -- but it's far enough
> >>> along that it passes all of the engine and backend tests. Some
> >>> results:
> >>> 
> >>> test-numeric, with NREPS increased to 20000 to get a reasonable
> >>> execution time for profiling:
> >>>    master       9645ms
> >>>    mpDecimal   21410ms
> >>>    decNumber   12985ms
> >>> 
> >>> test-lots:
> >>>    master      16300ms
> >>>    mpDecimal   20203ms
> >>>    decNumber   19044ms
> >>> 
> >>> The first shows the relative speed in more or less pure
> >>> computation,
> >>> the latter shows the overall impact on one of the longer-running
> >>> tests that does a lot of other stuff.
> >> 
> >> John,
> >> 
> >> Thanks for implementing this and running the tests. The topic was
> >> last touched before my holidays so it took me a while to refresh
> >> my memory...
> >> 
> >> decNumber clearly performs better, although both implementations
> >> lag behind our current gnc_numeric performance.
> >> 
> >>> I haven't investigated Christian's other suggestion of aggressive
> >>> rounding to eliminate the overflow issue to make room for larger
> >>> denominators, nor my original idea of replacing gnc_numeric with
> >>> boost::rational atop a multi-precision class (either boost::mp or
> >>> gmp).
> >> 
> >> Do you still have plans for either ?
> >> 
> >> I suppose aggressive rounding is orthogonal to the choice of data
> >> type. Christian's argument that we should round as is expected in
> >> the financial world makes sense to me but that argument does not
> >> imply any underlying data type.
> >> 
> >> How about the boost::rational option ?
> >> 
> >>> I have noticed that we're doing some dumb things with Scheme,
> >>> like using double as an intermediate when converting from Scheme
> >>> numbers to gnc_numeric (Scheme numbers are also rational, so the
> >>> conversion should be direct) and representing gnc_numerics as a
> >>> tuple
> >>> (num, denom) instead of just using Scheme rationals.
> >> 
> >> Does this mean you see potential performance gains in this as we
> >> clean up the C<->Scheme number conversions ?
> >> 
> >>> Neither will
> >>> work for decimal floats, of course; the whole class will have to
> >>> be
> >>> wrapped so that computation takes place in C++.
> >> 
> >> Which means some performance drop again...
> >> 
> >>> Storage in SQL is
> >>> also an issue,
> >> 
> >> From the previous conversation I recall sqlite doesn't have a
> >> decimal type so we can't run calculating queries on it directly.
> >> 
> >> But how about the other two: mysql and postgresql. Is the decimal
> >> type you're using in your tests directly compatible with the
> >> decimal data types in mysql and postgresql, or compatible enough
> >> to convert automatically between them ?
> >> 
> >>> as is maintaining backward file compatibility.
> >>> 
> >>> Another issue is equality: In order to get tests to pass I've had
> >>> to
> >>> implement a fuzzy comparison where both numbers are first rounded
> >>> to
> >>> the smaller number of decimal places -- 2 fewer if there are 12 or
> >>> more -- and compared with two roundings, first truncation and
> >>> second
> >>> "bankers", and declared unequal only if they're unequal in both. I
> >>> hate this, but it seems to be necessary to obtain equality when
> >>> dealing with large divisors (as when computing prices or interest
> >>> rates). I suspect that we'd have to do something similar if we
> >>> pursue
> >>> aggressive rounding to avoid overflows, but the only way to know
> >>> for
> >>> certain is to try.
> >> 
> >> Ugh. :(
> >> 
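[Editorial sketch] The fuzzy comparison John describes above can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code from his branch: the `Decimal` struct, `ten_to_the`, `rescale` and `fuzzy_equal` are hypothetical names, and a plain 64-bit scaled integer stands in for the decimal-float type he actually tested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point decimal: value = coeff / 10^places.
struct Decimal {
    std::int64_t coeff;
    int places;
};

enum class Rounding { Truncate, Bankers };

// 10^n for small non-negative n (assumed to fit in 64 bits, i.e. n <= 18).
inline std::int64_t ten_to_the(int n)
{
    assert(n >= 0 && n <= 18);
    std::int64_t result = 1;
    while (n-- > 0)
        result *= 10;
    return result;
}

// Reduce d to fewer decimal places and return the new coefficient.
inline std::int64_t rescale(const Decimal& d, int places, Rounding mode)
{
    assert(places >= 0 && places <= d.places);
    const std::int64_t divisor = ten_to_the(d.places - places);
    std::int64_t quotient = d.coeff / divisor;   // truncates toward zero
    std::int64_t remainder = d.coeff % divisor;  // same sign as d.coeff
    if (mode == Rounding::Truncate || remainder == 0)
        return quotient;
    if (remainder < 0)
        remainder = -remainder;
    const std::int64_t twice = 2 * remainder;
    // Round half to even: go away from zero above the midpoint, or at
    // the midpoint when the truncated quotient is odd.
    if (twice > divisor || (twice == divisor && quotient % 2 != 0))
        quotient += (d.coeff < 0) ? -1 : 1;
    return quotient;
}

// Round both values to the smaller number of decimal places (2 fewer
// when that is 12 or more) and call them unequal only if they differ
// under truncation AND under banker's rounding.
inline bool fuzzy_equal(const Decimal& a, const Decimal& b)
{
    int places = std::min(a.places, b.places);
    if (places >= 12)
        places -= 2;
    return rescale(a, places, Rounding::Truncate) ==
               rescale(b, places, Rounding::Truncate) ||
           rescale(a, places, Rounding::Bankers) ==
               rescale(b, places, Rounding::Bankers);
}
```

With these definitions 1.000 and 0.9996 compare equal (truncation gives 1.000 vs 0.999, but banker's rounding gives 1.000 for both), while 1.000 and 0.9994 do not.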
> >> So what's the current balance ?
> >> 
> >> I see following pros and cons of your tests so far:
> >> 
> >> Pro:
> >> - using a decimal type gives us more precision
> >> 
> >> Con:
> >> - sqlite doesn't have a decimal data type, so as it currently
> >>   stands we can't run calculations in queries in that database type
> >> - we lose backward/forward compatibility with earlier versions of
> >>   GnuCash
> >> - decNumber or mpDecimal are new dependencies
> >> - their performance is currently less than the original gnc_numeric
> >> - guile doesn't know of a decimal data type so we may need some
> >>   conversion glue
> >> - equality is fuzzy
> >> 
> >> Please add if I forgot arguments on either side.
> >> 
> >> Arguably many of the con arguments can be solved. That will take
> >> effort, however. And I consider the first two more important than
> >> the others.
> >> 
> >> So do you think the benefits (I assume there will be more than the
> >> one I mentioned) will outweigh the drawbacks ? Does the work that
> >> will go into it bring GnuCash enough value to continue on this
> >> track ?
> >> 
> >> It's probably too early to tell for sure but I wanted to get your
> >> ideas based on what we have so far.
> > 
> > Testing boost::rational is next on the agenda. My original idea was
> > to use it with boost::multiprecision or gmp, but I'd prefer
> > something that doesn't depend on heap allocations because it's so
> > much slower than stack allocation and must be passed by pointer,
> > which is a major change in the API -- meaning a ton of cleanup work
> > up front. I think I'll do a straight substitution of the existing
> > math128 with boost::rational<int64_t> just to see what happens.
> > 
> > I think that part of implementing immediate rounding must include
> > constraining denominators to powers-of-ten. The main reason is that
> > it makes my head hurt when I try to think about how to do rounding
> > with arbitrary denominators. If you consider that a big chunk of
> > the overflow problems arise from denominators and divisors that are
> > large primes, it becomes quickly apparent that avoiding large prime
> > denominators might well resolve much of the problem. It's also true
> > that real-world numbers, as opposed to the randomly generated
> > numbers from tests, all have powers-of-ten denominators. We'd still
> > have many-digit-prime divisors to deal with, but constraining
> > denominators gives us something to round to.
> > Does that make sense, or does it seem the rambling of a lunatic?
> > This really does make my head hurt.
> Boost::Rational is a serious disappointment. Boost::rational<int64_t>
> didn’t allow a significant increase in precision and is further
> hampered by not providing any overflow detection. Benchmarks of
> test-numeric with NREPS set to 20000 (the numbers are a bit different
> from before because I’m using my Mac Pro instead of my Mac Book Air,
> and because these are debug builds):
> 
> Branch                    Tests      Time
> master:                   1187558     5346ms
> libmpdecimal:             1180076     8718ms
> boost-rational, cppint:   1187558    20903ms
> boost-rational, gmp:      1187558    34232ms
> 
> cppint means boost::multiprecision::checked_int128_t, a 16-byte
> stack-allocated multi-precision integer. “Checked” means that it
> throws std::overflow_error instead of wrapping. Gmp means the GNU
> Multiprecision library. It’s supposed to be faster than cppint, but
> its performance is killed by having to malloc everything. The fact
> that our own C code is substantially faster than any library I’ve
> tried is a tribute to Linas.
> 
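[Editorial sketch] To make the "checked" behaviour concrete, here is a minimal illustration of throwing on overflow instead of wrapping, which is what plain int64_t arithmetic inside boost::rational<int64_t> does not do. It assumes GCC or Clang, since `__builtin_mul_overflow` is a compiler extension; `checked_mul` and `overflows` are hypothetical helpers, not part of Boost or GnuCash.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: multiply two 64-bit integers and throw rather
// than silently wrap when the product does not fit.
// __builtin_mul_overflow (GCC/Clang extension) returns true on overflow.
inline std::int64_t checked_mul(std::int64_t a, std::int64_t b)
{
    std::int64_t result = 0;
    if (__builtin_mul_overflow(a, b, &result))
        throw std::overflow_error("64-bit multiplication overflowed");
    return result;
}

// Convenience wrapper: report whether a * b would overflow.
inline bool overflows(std::int64_t a, std::int64_t b)
{
    try {
        checked_mul(a, b);
        return false;
    } catch (const std::overflow_error&) {
        return true;
    }
}
```

An unchecked rational multiply would compute the same products without the test, so a too-large numerator or denominator would go unnoticed.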
> There’s another wrinkle: Boost::Rational immediately reduces all
> numbers to what we called in my grade school “simplest form”, meaning
> no common factors between the numerator and denominator. This
> actually helps prevent overflows, but means that we have to be very
> careful to supply the SCU as the rounding denominator or we’ll get
> unexpected rounding results.  Boost::Rational provides no rounding
> function of its own so I rewrote gnc_numeric_convert into C++ using
> the overloaded operators from boost::multiprecision. That at least
> taught me about rounding arbitrary denominators, so my head doesn’t
> explode any more.
> 
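[Editorial sketch] The rounding step described above, converting an arbitrary fraction to a caller-supplied denominator such as the SCU, might look roughly like this. It is not the rewritten gnc_numeric_convert; `round_to_scu` and `wide_int` are hypothetical names, only banker's rounding is shown, and `__int128` is a GCC/Clang extension used here in place of a Boost multiprecision type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

__extension__ typedef __int128 wide_int;  // GCC/Clang extension

// Hypothetical helper: express num/den as a whole number of 1/scu units
// (scu = 100 for a currency with cents), rounding half to even.
// The caller must pass scu explicitly: once a fraction is reduced to
// simplest form (50/100 becomes 1/2) the intended precision is gone.
inline std::int64_t round_to_scu(std::int64_t num, std::int64_t den,
                                 std::int64_t scu)
{
    assert(den > 0 && scu > 0);
    // The product of two 64-bit values always fits in 128 bits.
    const wide_int scaled = static_cast<wide_int>(num) * scu;
    wide_int quotient = scaled / den;   // truncates toward zero
    wide_int remainder = scaled % den;  // same sign as scaled
    if (remainder < 0)
        remainder = -remainder;
    const wide_int twice = 2 * remainder;
    if (twice > den || (twice == den && quotient % 2 != 0))
        quotient += (scaled < 0) ? -1 : 1;
    if (quotient > std::numeric_limits<std::int64_t>::max() ||
        quotient < std::numeric_limits<std::int64_t>::min())
        throw std::overflow_error("rounded value does not fit in 64 bits");
    return static_cast<std::int64_t>(quotient);
}
```

For example, 1/3 at an SCU of 100 comes out as 33 units, 2/3 as 67, and the exact ties 1/8 and 3/8 (12.5 and 37.5 units) go to the even neighbours 12 and 38.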
> The good news is that using 128-bit numbers for all internal
> representations along with aggressive reduction and a tweak to
> get_random_gnc_numeric() so that the actual number doesn’t exceed
> 1E13/1 and careful attention to rounding prevents overflow errors
> during testing, at least up through test-lots.
> 
> Looking a bit more at rounding, GNC_HOW_RND_NEVER shows up in only 14
> out of 151 gnc_numeric operations in the code base, so it doesn’t
> appear to me that we’re over-using it. I’m not convinced that it would
> help much to eliminate those cases.
> 
> It looks like the best solution is to work over our existing
> gnc-numeric with math128 implementation so that the internals are
> always 128-bit and we don’t declare overflows prematurely.
> 
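[Editorial sketch] The idea of keeping the internals 128-bit and not declaring overflow prematurely can be illustrated like this: compute the products in 128 bits, reduce by the greatest common divisor, and only then check whether the result fits the 64-bit external representation. This is an assumption-laden sketch, not the math128 code; `Rational`, `wide_gcd` and `multiply` are hypothetical names, and `__int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

__extension__ typedef __int128 wide_int;  // GCC/Clang extension

// Hypothetical stand-in for a 64-bit rational; den is assumed positive.
struct Rational {
    std::int64_t num;
    std::int64_t den;
};

// Euclid's algorithm on 128-bit values; the result is non-negative.
inline wide_int wide_gcd(wide_int a, wide_int b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        const wide_int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Multiply with 128-bit intermediates and reduce before checking, so a
// product that only overflows 64 bits temporarily is not rejected.
inline Rational multiply(const Rational& a, const Rational& b)
{
    assert(a.den > 0 && b.den > 0);
    wide_int num = static_cast<wide_int>(a.num) * b.num;
    wide_int den = static_cast<wide_int>(a.den) * b.den;
    const wide_int g = wide_gcd(num, den);  // den > 0, so g > 0
    num /= g;
    den /= g;
    if (num > std::numeric_limits<std::int64_t>::max() ||
        num < std::numeric_limits<std::int64_t>::min() ||
        den > std::numeric_limits<std::int64_t>::max())
        throw std::overflow_error("reduced result does not fit in 64 bits");
    return Rational{static_cast<std::int64_t>(num),
                    static_cast<std::int64_t>(den)};
}
```

Multiplying 4000000000/3 by 3000000000/4 shows the point: the raw numerator 1.2e19 exceeds the 64-bit range, but after reduction the result is 1000000000000000000/1, which fits.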
Thanks for the update and the elaborate testing.

So,... math128 is what we use now, using the rational representation of 
numbers, do I get that right ? And the best option is to stick with it 
and improve on it ? Would you still transform it into C++ so it becomes 
an object with properties and members ?

Geert