Rethinking Numeric
John Ralls
jralls at ceridwen.us
Sat May 24 15:29:01 EDT 2014
We use rational numbers with int64 numerators and denominators in GnuCash, and every once in a while (mostly in testing) we encounter overflow errors. I've been resisting calls to raise the maximum denominator from the current 6 digits to 8 or 10 to accommodate Bitcoin and certain eastern European mutual funds that use very small decimal fractions when reporting balances and offering a multi-precision replacement as the way forward to eliminate the overflow errors and so to allow the increased fraction size. I was about to start implementing that with boost::multiprecision.
Then I thought about serialization. Not really a problem for XML, just output more digits. The 2.6 numeric handler would have to be adjusted to notice if it encountered a number bigger than an int64 can hold and decline to continue loading the file. So far, so good. SQL is another matter entirely, and for two reasons: One is that SQLite can't handle numbers bigger than 64bits, though MySql and Postgresql both offer a NUMERIC type that is fixed-decimal and can handle up to 65 digits in MySql and 131072 digits in Postgresql. We could of course serialize to a string in any database and so store as many digits in any way we like. But that brings up the other problem, that we want to be able to calculate in a query. With a string serialization, or indeed with our current two-field rational storage, we can't do computations in queries (in the latter case because the database's division function won't return a rational and we can't trust a float to be exact). We'd need to query for e.g. all of the splits in an account and then run a cursor over them to convert each one's value and amount into rationals just to get the account balance. That's not going to be a fast operation if there are a lot of splits.
Another facet of the problem is how to represent multiprecision numbers in structs. Since the size of a multiprecision number can only be determined at runtime, its value must be dynamically allocated so structures or classes must carry a pointer rather than the actual number. Even with a fast allocator like GSlice or boost::pool, the value isn't stored with the rest of the struct which screws up prefetch and caching.
The reality of modern money is that 7/3 of a unit, whether currency or securities, isn't meaningful. It always gets rounded and handled in decimal form.
So I propose to bottle up the multiprecision data in the implementation of GncNumeric. In order to maintain a deterministic representation of GncNumeric, I'd use
typedef struct {
uint64_t upper; /* high order bit indicates sign rather than using two's complement for negative numbers */
uint64_t lower;
} GncNumeric;
const unit64_t denominator = 1000000000000; /* 1E12*/
That translates to a 26.12 fixed-point number which allows two digits of extra precision to a new maximum actual denominator of 10 digits and should be a sufficiently large number to handle any reasonable quantity of any currency or commodity while limiting rounding errors to an insignificant amount.
Serialization is still a problem. MySql and Postgresql can use NUMERIC(38, 12) fields for direct serialization. They're able to compute on that sort of value, and as long as we limit queries to adding and subtracting there should be no problem. We'll have to introduce an element attribute that indicates the new storage method for XML, and the numeric handler in 2.6 will have to signal an error and quit if it encounters a value > 2^63 in a numerator. SQLite3 is a real problem: While in the real world it's unlikely that we'll encounter a number that will lose precision when converted to a float64_t, the possibility is real and we'll have to detect it and tell the user that SQLite3 isn't a suitable backend for that dataset.
Discussion invited...
Regards,
John Ralls
More information about the gnucash-devel
mailing list