libmpdecimal

Thu Aug 28 01:31:40 EDT 2014

On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-geert at telenet.be> wrote:

> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
> > So, having gotten test-lots and all of the other tests working* with
> > libmpdecimal, I studied the Intel library for several days and
> > couldn't figure out how to make it work, so I decided to try the GCC
> > implementation, which offers a 128-bit IEEE 754 format that's fixed
> > size. Since it doesn't ever call malloc, I thought it might prove
> > faster, and indeed it is. I haven't finished integrating it -- the
> > library doesn't provide formatted printing -- but it's far enough
> > along that it passes all of the engine and backend tests. Some
> > results:
> > 
> > test-numeric, with NREPS increased to 20000 to get a reasonable
> > execution time for profiling:
> >     master       9645ms
> >     mpDecimal   21410ms
> >     decNumber   12985ms
> > 
> > test-lots:
> >     master      16300ms
> >     mpDecimal   20203ms
> >     decNumber   19044ms
> > 
>  
> > The first shows the relative speed in more or less pure computation,
> > the latter shows the overall impact on one of the longer-running
> > tests that does a lot of other stuff.
> John,
>  
> Thanks for implementing this and running the tests. The topic was last touched before my holidays so it took me a while to refresh my memory...
>  
> decNumber clearly performs better, although both implementations lag behind our current gnc_numeric performance.
>  
> > 
> > I haven't investigated Christian's other suggestion of aggressive
> > rounding to eliminate the overflow issue to make room for larger
> > denominators, nor my original idea of replacing gnc_numeric with
> > boost::rational atop a multi-precision class (either boost::mp or
> > gmp).
> Do you still have plans for either ?
>  
> I suppose aggressive rounding is orthogonal to the choice of data type. Christian's argument that we should round as is expected in the financial world makes sense to me but that argument does not imply any underlying data type.
>  
> How about the boost::rational option ?
>  
> > I have noticed that we're doing some dumb things with Scheme,
> > like using double as an intermediate when converting from Scheme
> > numbers to gnc_numeric (Scheme numbers are also rational, so the
> > conversion should be direct) and representing gnc_numerics as a tuple
> > (num, denom) instead of just using Scheme rationals.
> Does this mean you see potential performance gains in this as we clean up the C<->Scheme number conversions ?
>  
> > Neither will
> > work for decimal floats, of course; the whole class will have to be
> > wrapped so that computation takes place in C++.
> Which means some performance drop again...
>  
> > Storage in SQL is
> > also an issue,
> From the previous conversation I recall sqlite doesn't have a decimal type so we can't run calculating queries on it directly.
>  
> But how about the other two: mysql and postgresql. Is the decimal type you're using in your tests directly compatible with the decimal data types in mysql and postgresql, or compatible enough to convert automatically between them ?
>  
> > as is maintaining backward file compatibility.
> > 
> > Another issue is equality: In order to get tests to pass I've had to
> > implement a fuzzy comparison where both numbers are first rounded to
> > the smaller number of decimal places -- 2 fewer if there are 12 or
> > more -- and compared with two roundings, first truncation and second
> > "bankers", and declared unequal only if they're unequal in both. I
> > hate this, but it seems to be necessary to obtain equality when
> > dealing with large divisors (as when computing prices or interest
> > rates). I suspect that we'd have to do something similar if we pursue
> > aggressive rounding to avoid overflows, but the only way to know for
> > certain is to try.
> Ugh. :(
>  
> So what's the current balance ?
>  
> I see the following pros and cons of your tests so far:
>  
> Pro:
> - using a decimal type gives us more precision
>  
> Con:
> - sqlite doesn't have a decimal data type, so as it currently stands we can't run calculations in queries in that database type
> - we lose backward/forward compatibility with earlier versions of GnuCash
> - decNumber or mpDecimal are new dependencies
> - their performance is currently less than the original gnc_numeric
> - guile doesn't know of a decimal data type so we may need some conversion glue
> - equality is fuzzy
>  
> Please add if I forgot arguments on either side.
>  
> Arguably many of the con arguments can be solved. That will take effort, however. And I consider the first two more important than the others.
>  
> So do you think the benefits (I assume there will be more than the one I mentioned) will outweigh the drawbacks ? Does the work that will go into it bring GnuCash enough value to continue on this track ?
>  
> It's probably too early to tell for sure but I wanted to get your ideas based on what we have so far.

Testing boost::rational is next on the agenda. My original idea was to use it with boost::multiprecision or gmp, but I'd prefer something that doesn't depend on heap allocation: it's much slower than stack allocation, and heap-allocated objects must be passed by pointer, which is a major change in the API -- meaning a ton of cleanup work up front. I think I'll do a straight substitution of the existing math128 with boost::rational<int64_t> just to see what happens.

I think that part of implementing immediate rounding must include constraining denominators to powers of ten. The main reason is that it makes my head hurt when I try to think about how to do rounding with arbitrary denominators. If you consider that a big chunk of the overflow problems arise from denominators and divisors that are large primes, it quickly becomes apparent that avoiding large prime denominators might well resolve much of the problem. It's also true that real-world numbers, as opposed to the randomly generated numbers in our tests, all have power-of-ten denominators. We'd still have many-digit-prime divisors to deal with, but constraining denominators gives us something to round to. Does that make sense, or does it seem the rambling of a lunatic? This really does make my head hurt.
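
A minimal sketch of the point about large prime denominators, using Python's arbitrary-precision fractions.Fraction as a stand-in for a 64-bit rational. The helper names fits_int64 and round_to_pow10 are hypothetical, the two Mersenne primes are only illustrative values, and the half-even rounding mode is an assumption; none of this is GnuCash code.

```python
from fractions import Fraction

INT64_MAX = 2**63 - 1

def fits_int64(x: Fraction) -> bool:
    """True if numerator and denominator both fit in a signed 64-bit int."""
    return abs(x.numerator) <= INT64_MAX and x.denominator <= INT64_MAX

def round_to_pow10(x: Fraction, places: int) -> Fraction:
    """Round x to the nearest multiple of 10**-places (ties go to even)."""
    scale = 10 ** places
    # round() on a Fraction returns an int using round-half-even.
    return Fraction(round(x * scale), scale)

# Two values whose denominators are large primes (2**31 - 1 and 2**61 - 1).
a = Fraction(1, 2**31 - 1)
b = Fraction(1, 2**61 - 1)

# The exact sum needs the product of the two primes as its denominator,
# which is far too large for 64 bits.
total = a + b

# Rounding to 12 decimal places gives a denominator that divides 10**12.
rounded = round_to_pow10(total, 12)
```

Here each operand fits in 64 bits but the exact sum does not, while the rounded result does. The price is that the rounded value is no longer exactly equal to the sum.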

I'd modify your "pro" summary to "using a decimal type gives us more significant digits without overflow while maintaining a 128-bit stack-allocatable object size". There are a lot of ways to get more precision, and the necessity of using fuzzy equality rather suggests that we're not really getting more precision with decimal floating point, we're just getting more significant digits. That sounds a bit weird, so I'll offer an example: With a rational number, one can represent 1/3 exactly, but if one constrains the denominator to powers-of-ten, either as a computational rule with rationals or by using a decimal representation, then one can only approximate it with a numerator of as many 3s as will fit in the numerator's type and the equivalent power of ten in the denominator. Decimal floats are a bit more efficient because instead of representing the denominator directly as an int they can represent it as an exponent; that allows more bits to be assigned to the numerator, increasing precision *for any particular object size*. But it doesn't bother me much to have to use 256 bits instead of 128 to store enough bits to deal with Bitcoin and mutual funds vainly trying to get a more accurate representation of 1/3 as long as I don't have to write and maintain the math256.cpp required to pull that off.
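
The 1/3 example can be sketched as follows. This is only an illustration in Python: fractions.Fraction stands in for an exact rational, the 18-digit limit assumes a signed 64-bit denominator (10**18 is the largest power of ten that fits), and a decimal context with 34 digits of precision stands in for the IEEE 754 decimal128 format.

```python
from decimal import Context, Decimal
from fractions import Fraction

# An unconstrained rational represents 1/3 exactly.
third = Fraction(1, 3)

# With a power-of-ten denominator limited to int64, the best available is
# eighteen 3s over 10**18.
approx = Fraction(int("3" * 18), 10**18)
error = third - approx          # what the constraint costs: 1 / (3 * 10**18)

# A 34-digit decimal context (the precision of decimal128) keeps more
# digits in the same 128 bits, but it is still an approximation.
ctx = Context(prec=34)
dec_third = ctx.divide(Decimal(1), Decimal(3))
dec_back = ctx.multiply(dec_third, Decimal(3))   # 0.999...9, not 1
```

Multiplying the exact rational by 3 gives 1; neither approximation does, which is the sense in which more significant digits is not the same thing as more precision.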

Equality is going to be fuzzy in any rounding environment. To some extent we're fooling ourselves now: Our tests are carefully written to avoid rounding or to check for predictable rounding in controlled circumstances. I'm not at all convinced that that reflects real life. That said, our current rational-number system gives us a great deal more control over rounding than the decimal-float libraries do.
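
For reference, the fuzzy comparison described in the quoted message could look roughly like the sketch below. It is written in Python with the standard decimal module rather than against the library used in the tests, the names decimal_places and fuzzy_equal are hypothetical, and reading "2 fewer if there are 12 or more" as applying to the smaller of the two place counts is an assumption.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def decimal_places(x: Decimal) -> int:
    """Number of digits after the decimal point (finite values only)."""
    return max(0, -x.as_tuple().exponent)

def fuzzy_equal(a: Decimal, b: Decimal) -> bool:
    """Compare a and b at the coarser of their two precisions.

    Both values are rounded to the smaller number of decimal places
    (two fewer when that is 12 or more), once by truncation and once with
    banker's rounding. They count as unequal only if both roundings disagree.
    """
    places = min(decimal_places(a), decimal_places(b))
    if places >= 12:
        places -= 2
    quantum = Decimal(1).scaleb(-places)   # e.g. places=2 -> Decimal('0.01')
    for mode in (ROUND_DOWN, ROUND_HALF_EVEN):
        if a.quantize(quantum, rounding=mode) == b.quantize(quantum, rounding=mode):
            return True
    return False
```

With this sketch, 1.019 and 1.02 compare equal (they agree under banker's rounding to two places), while 1.03 and 1.02 do not agree under either rounding.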

The SQL representation is another problem. None of our supported DBs support decimal floats or rationals, but rationals can be worked around. There's at least one more round of experiments, maybe two, before it's time to address storage.

That's kind of a rambling answer, and I doubt that it really adds much besides more stuff to think about. I think that getting the numeric representation right is really important, so I'm willing to keep at it for a bit longer, and I don't yet know enough to say which direction is best, or maybe least-bad. I'll start on the boost::rational implementation next week and see where that goes. 

Regards,
John Ralls