Glen Pitt-Pladdy :: Blog

Perl: dealing with extreme numbers using Math::BigFloat
I've been working on a statistical analysis project which involves processing large numbers of data points (currently 15 million and counting) and combining them into an overall ratio-like representation.

While optimising the code I discovered that I sometimes got very different results depending on the order in which the data was processed. That should not happen. It's rather like finding $a * $b != $b * $a.

Further investigation showed that this is due to floating point numbers getting so small that they round to zero. Once a zero is reached all subsequent data is ignored, so the earlier in the data that happens, the more data is lost.

Preserving integrity

First up I decided I had to be able to get the most out of the standard Perl floating point. In my case I am multiplying a load of numbers between 0 and 1, so the overall numbers gradually reduce until they round to zero. As this is a ratio, integrity can be preserved far longer simply by normalising the ratio each time:
    my @ratio = ( 1, 1 );

That can give a considerable amount of additional scope for the use of the basic Perl float. Trouble is, eventually that isn't enough...

Enter Math::BigFloat

Math::BigFloat provides arbitrary length floating point arithmetic, but at a cost: performance. In my tests I found it had a performance hit in the order of 250x, which when processing large data sets is a non-starter. That doesn't mean that it can't be used, just that it needs care.

Hanging on to the bitter end

My approach involves using basic Perl floating point for a temporary ratio until the point that there is a risk of rounding to zero, at which point we use Math::BigFloat to transfer the temporary ratio into the master ratio:
    my @ratio = ( Math::BigFloat->bone(), Math::BigFloat->bone() );

Too long

There is still one catch I ran into with Math::BigFloat: with the number of calculations, the number of significant (non-zero) digits can grow really large. When it comes to dividing out the ratio everything comes to a halt, because dividing these extremely long numbers takes a long time. If you are dealing with these extremes then you may want to round the numbers to something sensible before processing further:

    @ratio = map { $_->fround( 20 ) } @ratio;
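
To make the steps above concrete, here is a minimal sketch of how they might fit together. It is not the code from the project: the sub name combine_ratio, the $FOLD_BELOW threshold of 1e-200 and the choice to normalise against the larger side are all assumptions made for illustration. It calls bround, the b-prefixed rounding method in Math::BigFloat, where the post uses fround.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical threshold (an assumption, not from the original code): fold
# the temporary ratio into the master ratio once either side drops below
# this, long before a native double would underflow to zero.
my $FOLD_BELOW = 1e-200;

# combine_ratio() is an illustrative name. It takes [ $a, $b ] pairs of
# factors, each assumed to be strictly between 0 and 1, and returns the
# product of the $a values divided by the product of the $b values.
sub combine_ratio {
    my @pairs  = @_;
    my @master = ( Math::BigFloat->bone(), Math::BigFloat->bone() );
    my @temp   = ( 1, 1 );    # cheap native floats

    for my $pair (@pairs) {
        $temp[0] *= $pair->[0];
        $temp[1] *= $pair->[1];

        # Normalise so the larger side is 1; the ratio itself is unchanged.
        my $max = $temp[0] > $temp[1] ? $temp[0] : $temp[1];
        @temp = map { $_ / $max } @temp;

        # Only pay for Math::BigFloat when a side is getting dangerously small.
        if ( $temp[0] < $FOLD_BELOW or $temp[1] < $FOLD_BELOW ) {
            $master[$_]->bmul( Math::BigFloat->new( $temp[$_] ) ) for 0, 1;
            @temp = ( 1, 1 );
        }
    }

    # Transfer whatever is left in the temporary ratio.
    $master[$_]->bmul( Math::BigFloat->new( $temp[$_] ) ) for 0, 1;

    # Trim to 20 significant digits before the expensive division.
    @master = map { $_->copy->bround(20) } @master;

    # Assign to a scalar: bdiv() in list context also returns a remainder.
    my $result = $master[0]->copy->bdiv( $master[1], 20 );
    return $result;
}
```

As a rough check, combine_ratio( map { [ 0.5, 0.25 ] } 1 .. 1000 ) should come out close to 2**1000, whereas multiplying 0.25 by itself 1000 times in a plain Perl float underflows to zero. Note that converting a native float with Math::BigFloat->new only carries over the digits Perl prints for it (about 15), so the result is no more precise than the temporary ratio was.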
This is a bunch of random thoughts, ideas and other nonsense, and is not intended to be taken seriously. I'm experimenting and mostly have no idea what I am doing with most of this, so it should be taken with caution and at your own risk. Intrusive technologies are minimised where possible. For the purposes of reducing abuse and other risks hCaptcha is used and has its own policies linked from the widget.
Copyright Glen Pitt-Pladdy 2008-2023