libmpdecimal

Fri Sep 26 22:59:08 EDT 2014

Dear Mr. Ralls,

Excellent work! I'm happy to hear the results, although, like you, I'm
disappointed that boost::rational didn't bring something valuable to the
table. I look forward to getting to know that code some day...on an
as-needed basis!

In Christ,
Aaron Laws

On Sat, Sep 20, 2014 at 9:21 PM, John Ralls <jralls at ceridwen.us> wrote:

>
> On Aug 27, 2014, at 10:31 PM, John Ralls <jralls at ceridwen.us> wrote:
>
> >
> > On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-geert at telenet.be>
> wrote:
> >
> >> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
> >>> So, having gotten test-lots and all of the other tests working* with
> >>> libmpdecimal, I studied the Intel library for several days and
> >>> couldn't figure out how to make it work, so I decided to try the GCC
> >>> implementation, which offers a 128-bit IEEE 754 format that's fixed
> >>> size. Since it doesn't ever call malloc, I thought it might prove
> >>> faster, and indeed it is. I haven't finished integrating it -- the
> >>> library doesn't provide formatted printing -- but it's far enough
> >>> along that it passes all of the engine and backend tests. Some
> >>> results:
> >>>
> >>> test-numeric, with NREPS increased to 20000 to get a reasonable
> >>> execution time for profiling:
> >>>    master       9645ms
> >>>    mpDecimal   21410ms
> >>>    decNumber   12985ms
> >>>
> >>> test-lots:
> >>>    master      16300ms
> >>>    mpDecimal   20203ms
> >>>    decNumber   19044ms
> >>>
> >>
> >>> The first shows the relative speed in more or less pure computation,
> >>> the latter shows the overall impact on one of the longer-running
> >>> tests that does a lot of other stuff.
> >> John,
> >>
> >> Thanks for implementing this and running the tests. The topic was last
> touched before my holidays so it took me a while to refresh my memory...
> >>
> >> decNumber clearly performs better, although both implementations lag
> behind our current gnc_numeric performance.
> >>
> >>>
> >>> I haven't investigated Christian's other suggestion of aggressive
> >>> rounding to eliminate the overflow issue to make room for larger
> >>> denominators, nor my original idea of replacing gnc_numeric with
> >>> boost::rational atop a multi-precision class (either boost::mp or
> >>> gmp).
> >> Do you still have plans for either ?
> >>
> >> I suppose aggressive rounding is orthogonal to the choice of data type.
> Christian's argument that we should round as is expected in the financial
> world makes sense to me but that argument does not imply any underlying
> data type.
> >>
> >> How about the boost::rational option ?
> >>
> >>> I have noticed that we're doing some dumb things with Scheme,
> >>> like using double as an intermediate when converting from Scheme
> >>> numbers to gnc_numeric (Scheme numbers are also rational, so the
> >>> conversion should be direct) and representing gnc_numerics as a tuple
> >>> (num, denom) instead of just using Scheme rationals.
> >> Does this mean you see potential performance gains in this as we clean
> up the C<->Scheme number conversions ?
> >>
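The precision lost by going through a double can be shown with a small
sketch. It uses Python's fractions.Fraction as a stand-in for a Scheme
rational; the helper name via_double is hypothetical and this is not the
GnuCash conversion code, only an illustration of the lossy path.

```python
from fractions import Fraction


def via_double(value: Fraction) -> Fraction:
    """Convert a rational through a binary double and back (the lossy path)."""
    return Fraction(float(value))


exact = Fraction(1, 10)      # stands in for the Scheme rational 1/10
lossy = via_double(exact)    # what survives the trip through a double

print(exact)                 # 1/10
print(lossy)                 # 3602879701896397/36028797018963968
print(lossy == exact)        # False

# Values whose denominator is a power of two survive unchanged.
print(via_double(Fraction(1, 4)) == Fraction(1, 4))   # True
```

A direct rational-to-rational conversion would keep 1/10 exact, which is
the point made above.
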
> >>> Neither will
> >>> work for decimal floats, of course; the whole class will have to be
> >>> wrapped so that computation takes place in C++.
> >> Which means some performance drop again...
> >>
> >>> Storage in SQL is
> >>> also an issue,
> >> From the previous conversation I recall sqlite doesn't have a decimal
> type so we can't run calculating queries on it directly.
> >>
> >> But how about the other two: mysql and postgresql. Is the decimal type
> you're using in your tests directly compatible with the decimal data types
> in mysql and postgresql, or compatible enough to convert automatically
> between them ?
> >>
> >>> as is maintaining backward file compatibility.
> >>>
> >>> Another issue is equality: In order to get tests to pass I've had to
> >>> implement a fuzzy comparison where both numbers are first rounded to
> >>> the smaller number of decimal places -- 2 fewer if there are 12 or
> >>> more -- and compared with two roundings, first truncation and second
> >>> "bankers", and declared unequal only if they're unequal in both. I
> >>> hate this, but it seems to be necessary to obtain equality when
> >>> dealing with large divisors (as when computing prices or interest
> >>> rates). I suspect that we'd have to do something similar if we pursue
> >>> aggressive rounding to avoid overflows, but the only way to know for
> >>> certain is to try.
> >> Ugh. :(
> >>
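A minimal sketch of the fuzzy comparison described above, written with
Python's decimal module rather than the libmpdecimal C API. The names
decimal_places and fuzzy_equal are hypothetical, and the sketch assumes
that "2 fewer if there are 12 or more" applies to the smaller of the two
place counts; the actual test code may differ.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def decimal_places(x: Decimal) -> int:
    """Digits after the decimal point (0 for integers); finite values only."""
    return max(0, -x.as_tuple().exponent)


def fuzzy_equal(a: Decimal, b: Decimal) -> bool:
    """Equal if a and b agree under truncation OR under banker's rounding."""
    places = min(decimal_places(a), decimal_places(b))
    if places >= 12:
        places -= 2                          # assumed reading of "2 fewer"
    quantum = Decimal(1).scaleb(-places)     # places=2 gives Decimal("0.01")
    for mode in (ROUND_DOWN, ROUND_HALF_EVEN):
        if a.quantize(quantum, rounding=mode) == b.quantize(quantum, rounding=mode):
            return True
    return False                             # unequal in both roundings


print(fuzzy_equal(Decimal("0.999"), Decimal("1.00")))   # True (banker's agrees)
print(fuzzy_equal(Decimal("1.005"), Decimal("1.00")))   # True (truncation agrees)
print(fuzzy_equal(Decimal("1.02"), Decimal("1.00")))    # False
```

Accepting a match under either rounding is what makes the comparison
forgiving: a value just below a rounding boundary and one just above it
can still compare equal.
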
> >> So what's the current balance ?
> >>
> >> I see following pros and cons of your tests so far:
> >>
> >> Pro:
> >> - using a decimal type gives us more precision
> >>
> >> Con:
> >> - sqlite doesn't have a decimal data type, so as it currently stands we
> can't run calculations in queries in that database type
> >> - we lose backward/forward compatibility with earlier versions of
> GnuCash
> >> - decNumber or mpDecimal are new dependencies
> >> - their performance is currently less than the original gnc_numeric
> >> - guile doesn't know of a decimal data type so we may need some
> conversion glue
> >> - equality is fuzzy
> >>
> >> Please add if I forgot arguments on either side.
> >>
> >> Arguably many of the con arguments can be solved. That will take effort,
> however. And I consider the first two more important than the others.
> >>
> >> So do you think the benefits (I assume there will be more than the one
> I mentioned) will outweigh the drawbacks ? Does the work that will go into
> it bring GnuCash enough value to continue on this track ?
> >>
> >> It's probably too early to tell for sure but I wanted to get your ideas
> based on what we have so far.
> >
> > Testing boost::rational is next on the agenda. My original idea was to
> use it with boost::multiprecision or gmp, but I'd prefer something that
> doesn't depend on heap allocation, because that is so much slower than
> stack allocation and the values must be passed by pointer, which is a major
> change in the API -- meaning a ton of cleanup work up front. I think I'll
> do a straight
> substitution of the existing math128 with boost::rational<int64_t> just to
> see what happens.
> >
> > I think that part of implementing immediate rounding must include
> constraining denominators to powers-of-ten. The main reason is that it
> makes my head hurt when I try to think about how to do rounding with
> arbitrary denominators. If you consider that a big chunk of the overflow
> problems arise from denominators and divisors that are large primes, it
> becomes quickly apparent that avoiding large prime denominators might well
> resolve much of the problem. It's also true that real-world numbers, as
> opposed to the randomly generated numbers in the tests, all have
> powers-of-ten denominators. We'd still have many-digit-prime divisors to
> deal with, but constraining denominators gives us something to round to.
> Does that make sense, or does it seem the rambling of a lunatic? This
> really does make my head hurt.
>
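The effect of large prime denominators can be illustrated with exact
rationals. This is only a sketch using Python's fractions.Fraction, whose
integers never overflow; the two primes and the names below are chosen for
illustration and do not come from the GnuCash tests.

```python
from fractions import Fraction

INT64_MAX = 2**63 - 1

p = 2**61 - 1    # a Mersenne prime
q = 2**31 - 1    # another Mersenne prime

# Adding fractions with coprime denominators needs their product as the
# common denominator, and nothing cancels afterwards.
prime_sum = Fraction(1, p) + Fraction(1, q)
print(prime_sum.denominator == p * q)       # True
print(prime_sum.denominator > INT64_MAX)    # True: too big for an int64

# With power-of-ten denominators the common denominator is just the larger.
decimal_sum = Fraction(1, 10**6) + Fraction(1, 10**8)
print(decimal_sum)                          # 101/100000000
```

A single addition is enough to push the prime case past 64 bits, while
the power-of-ten case stays at the larger of the two denominators.
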
> Boost::Rational is a serious disappointment. Boost::rational<int64_t>
> didn’t allow a significant increase in precision and is further hampered by
> not providing any overflow detection. Benchmarks of test-numeric with NREPS
> set to 20000 (the numbers are a bit different from before because I’m using
> my Mac Pro instead of my Mac Book Air, and because these are debug builds):
>
> Branch                  Tests           Time
> master:                 1187558          5346ms
> libmpdecimal:           1180076          8718ms
> boost-rational, cppint: 1187558         20903ms
> boost-rational, gmp:    1187558         34232ms
>
> cppint means boost::multiprecision::checked_int128_t, a 16-byte
> stack-allocated multi-precision integer. “Checked” means that it throws
> std::overflow_error instead of wrapping. Gmp means the GNU Multiple
> Precision library. It’s supposed to be faster than cppint, but its
> performance is
> killed by having to malloc everything. The fact that our own C code is
> substantially faster than any library I’ve tried is a tribute to Linas.
>
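The difference between an unchecked and a "checked" integer can be
sketched as follows. Python integers are unbounded, so both behaviours are
simulated here: wrap_int64 and checked_int128 are hypothetical helpers,
and OverflowError stands in for std::overflow_error. This is not the Boost
implementation.

```python
def wrap_int64(value: int) -> int:
    """Value a two's-complement 64-bit integer would hold after wrapping."""
    return (value + 2**63) % 2**64 - 2**63


def checked_int128(value: int) -> int:
    """Return value unchanged, or raise if it needs more than 128 signed bits."""
    if not -(2**127) <= value <= 2**127 - 1:
        raise OverflowError("result does not fit in 128 bits")
    return value


product = 2**62 * 4                 # true result is 2**64

print(wrap_int64(product))          # 0: the overflow goes unnoticed
print(checked_int128(product))      # 18446744073709551616: fits in 128 bits

try:
    checked_int128(2**127)          # one past the largest 128-bit value
except OverflowError as err:
    print("overflow detected:", err)
```

The checked variant turns a silently wrong answer into an error the caller
can handle, which is the property missing from boost::rational<int64_t>.
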
> There’s another wrinkle: Boost::Rational immediately reduces all numbers
> to what we called in my grade school “simplest form”, meaning no common
> factors between the numerator and denominator. This actually helps prevent
> overflows, but means that we have to be very careful to supply the SCU as
> the rounding denominator or we’ll get unexpected rounding results.
> Boost::Rational provides no rounding function of its own so I rewrote
> gnc_numeric_convert into C++ using the overloaded operators from
> boost::multiprecision. That at least taught me about rounding arbitrary
> denominators, so my head doesn’t explode any more.
>
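Why the SCU has to be supplied explicitly once fractions are kept in
simplest form can be sketched like this. Python's fractions.Fraction
reduces automatically, much as described above for boost::rational. The
function name convert and the SCU of 100 are assumptions for illustration;
this is not the rewritten gnc_numeric_convert.

```python
from fractions import Fraction
from typing import Tuple


def convert(value: Fraction, scu: int) -> Tuple[int, int]:
    """Round value to the nearest 1/scu using banker's rounding.

    Returns an unreduced (numerator, denominator) pair so that the
    denominator stays equal to the requested SCU.
    """
    numerator = round(value * scu)   # round() on a Fraction rounds ties to even
    return numerator, scu


half = Fraction(50, 100)
print(half)                              # 1/2: the denominator 100 is gone
print(convert(half, 100))                # (50, 100)
print(convert(Fraction(1, 3), 100))      # (33, 100)
print(convert(Fraction(1, 8), 100))      # (12, 100): 12.5 ties to even
print(convert(Fraction(3, 8), 100))      # (38, 100): 37.5 ties to even
```

Because the reduced value no longer carries its original denominator, the
rounding target must come from the caller rather than from the number
itself.
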
> The good news is that using 128-bit numbers for all internal
> representations along with aggressive reduction and a tweak to
> get_random_gnc_numeric() so that the actual number doesn’t exceed 1E13/1
> and careful attention to rounding prevents overflow errors during testing,
> at least up through test-lots.
>
> Looking a bit more at rounding: with GNC_HOW_RND_NEVER used in only 14 of
> 151 gnc_numeric operations in the code base, it doesn’t appear to me that
> we’re over-using it. I’m not convinced that it would help much to eliminate
> those cases.
>
> It looks like the best solution is to work over our existing gnc-numeric
> with math128 implementation so that the internals are always 128-bit and we
> don’t declare overflows prematurely.
>
> But first it’s time to squash some bugs before next week’s release.
>
> Regards,
> John Ralls
>
>
> _______________________________________________
> gnucash-devel mailing list
> gnucash-devel at gnucash.org
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
>