
# Similar Articles

2011-12-22 09:30
Peak Network Bandwidth for Cacti
2011-08-24 19:10
TEMPer under Linux (perl) with Cacti
2011-06-15 09:34
Universal Log Analyser and snmpd extension scripts
2009-11-22 15:20
IMDB ratings for MythTV
2009-06-09 07:43
Look Sharp

# Recent Articles

2019-07-28 16:35
git http with Nginx via Flask wsgi application (git4nginx)
2018-05-15 16:48
Raspberry Pi Camera, IR Lights and more
2017-04-23 14:21
Raspberry Pi SD Card Test
2017-04-07 10:54
DNS Firewall (blackhole malicious, like Pi-hole) with bind9
2017-03-28 13:07
Kubernetes to learn Part 4

## Perl: dealing with extreme numbers using Math::BigFloat

I've been working on a statistical analysis project that involves processing large numbers of data points (currently 15 million and counting) and combining them into an overall ratio-like representation. While optimising the code I discovered that I sometimes get very different results depending on the order in which the data is processed. That should not happen: it's rather like finding $a * $b != $b * $a

Further investigation revealed that this is due to floating point numbers becoming so small that they underflow to zero. Once a value reaches zero it stays zero, so the earlier in the data that happens, the more of the subsequent data is effectively ignored.
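A small sketch of the effect, using made-up values rather than the project's data: the same three factors are multiplied in two different orders. When the two tiny factors meet first, the intermediate product drops below the smallest value a double can hold and becomes 0, and no later factor can bring it back.

```perl
use strict;
use warnings;

# Hypothetical factors: two very small values and one very large one.
my ( $x, $y, $z ) = ( 1e-200, 1e-200, 1e300 );

# 1e-200 * 1e-200 = 1e-400 underflows to 0 before 1e300 is applied.
my $left = ( $x * $y ) * $z;

# 1e-200 * 1e300 = 1e100 first, then * 1e-200 gives roughly 1e-100.
my $right = ( $x * $z ) * $y;

printf "small factors first: %g, large factor first: %g\n", $left, $right;
```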

#### Preserving integrity

First, I wanted to get as much as possible out of Perl's standard floating point. In my case I am multiplying a load of numbers between 0 and 1, so the overall values gradually shrink until they round to zero. Because this is a ratio, only the relative sizes matter, so its integrity can be preserved far longer simply by normalising the ratio after each step:

```perl
my @ratio = ( 1, 1 );
foreach my $data (@dataset) {
    # ... some processing on $data to classify it,
    # setting $classification and $smallnumber ...
    $ratio[$classification] *= $smallnumber;
    # normalise the ratio to avoid premature zero
    my $avg = 0;
    foreach (@ratio) { $avg += $_; }
    $avg /= @ratio;
    @ratio = map { $_ / $avg } @ratio;    # divide every element by the average
}
```
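A self-contained sketch of the normalisation idea, with a hypothetical dataset standing in for the real classification step: 200 items alternate between class 0 and class 1, each contributing a factor of 1e-10. It keeps a raw product alongside the normalised one so the difference is visible. Using List::Util's sum for the average is a stylistic choice, not something from the original.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical data: each entry is [ classification, smallnumber ].
my @dataset = map { [ $_ % 2, 1e-10 ] } 0 .. 199;

my @raw  = ( 1, 1 );    # plain running product, never normalised
my @norm = ( 1, 1 );    # normalised by its average after every step

foreach my $data (@dataset) {
    my ( $classification, $smallnumber ) = @$data;
    $raw[$classification]  *= $smallnumber;
    $norm[$classification] *= $smallnumber;

    # Dividing every element by the same value leaves the ratio unchanged
    # but pulls the magnitudes back towards 1.
    my $avg = sum(@norm) / @norm;
    @norm = map { $_ / $avg } @norm;
}

printf "raw:        %g : %g\n", @raw;
printf "normalised: %g : %g\n", @norm;
```

Each class is multiplied by 1e-10 a hundred times, so the raw products underflow to 0 and the ratio between them becomes meaningless (0 : 0), whereas the normalised version stays close to 1 : 1, which is the true ratio for this data.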

That gives a considerable amount of extra headroom with the basic Perl float. The trouble is that eventually even that isn't enough...

#### Enter Math::BigFloat

Math::BigFloat provides arbitrary-length floating point arithmetic, but at a cost: performance.
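As a quick illustration of what "arbitrary length" buys, here is the same product computed with a native double and with Math::BigFloat (a core Perl module). The values are arbitrary; the point is that Math::BigFloat keeps a decimal mantissa and exponent, so a result far below the double range does not collapse to 0.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Native double: 1e-400 is below the smallest representable value, so it becomes 0.
my $native = 1e-200 * 1e-200;

# Math::BigFloat: mantissa 1, exponent -400, held exactly.
my $big = Math::BigFloat->new('1e-200') * Math::BigFloat->new('1e-200');

print "native:   $native\n";
print "BigFloat: ", $big->bsstr(), "\n";    # bsstr() prints scientific notation
```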

In my tests Math::BigFloat was on the order of 250x slower than native floats, which is a non-starter when processing large data sets. That doesn't mean it can't be used, just that it needs care.
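The 250x figure above is from my own tests; a rough way to check the gap on your own machine is to time the same multiplication both ways. This is only a sketch: the operands and iteration count are arbitrary assumptions, and Time::HiRes wall-clock timing is noisy, so the Benchmark module is a better choice for careful measurements.

```perl
use strict;
use warnings;
use Math::BigFloat;
use Time::HiRes qw(time);

my $n = 10_000;    # hypothetical iteration count; raise it for steadier numbers

# The same multiplication with native floats and with Math::BigFloat.
my ( $fa, $fb ) = ( 0.123456789, 0.987654321 );
my ( $ba, $bb ) = map { Math::BigFloat->new($_) } '0.123456789', '0.987654321';

my $t0 = time;
my $fp;
$fp = $fa * $fb for 1 .. $n;
my $t_float = time - $t0;

$t0 = time;
my $bp;
$bp = $ba * $bb for 1 .. $n;
my $t_big = time - $t0;

# Guard against a zero reading if the native loop is too fast to measure.
my $factor = $t_big / ( $t_float || 1e-6 );
printf "native: %.4fs  BigFloat: %.4fs  (roughly %.0fx)\n",
  $t_float, $t_big, $factor;
```

Because fresh operands are multiplied on every iteration, this measures the per-operation overhead only; it does not include the digit growth described further down, which makes long-running BigFloat products slower still.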

#### Hanging on to the bitter end

My approach uses basic Perl floating point for a temporary ratio until there is a risk of it rounding to zero. At that point the temporary ratio is multiplied into a master ratio held as Math::BigFloat values, and the temporary ratio is reset:

```perl
use Math::BigFloat;

my @ratio    = ( Math::BigFloat->bone(), Math::BigFloat->bone() );
my @tmpratio = ( 1, 1 );
foreach my $data (@dataset) {
    # ... some processing on $data to classify it,
    # setting $classification and $smallnumber ...
    $tmpratio[$classification] *= $smallnumber;
    # check for risk of zeroing
    if ( $tmpratio[$classification] < 1e-200 ) {
        # we need to take action: fold the native floats into the BigFloats
        for my $i ( 0 .. $#tmpratio ) {
            $ratio[$i] *= $tmpratio[$i];
        }
        @tmpratio = ( 1, 1 );
    }
}
# last transfer
for my $i ( 0 .. $#tmpratio ) {
    $ratio[$i] *= $tmpratio[$i];
}
```
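A runnable sketch of the same hybrid scheme, with a hypothetical dataset in place of the real classification: 1000 items alternate between class 0 (factor 1e-10) and class 1 (factor 1e-11). The transfer is pulled into a helper sub named transfer(), which is my own naming, not part of the original. The final products are about 1e-5000 and 1e-5500, far outside the range of a double, yet their quotient of about 1e500 survives.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical data: [ classification, smallnumber ], alternating classes.
my @dataset = map { [ $_ % 2, ( $_ % 2 ? 1e-11 : 1e-10 ) ] } 0 .. 999;

my @ratio    = ( Math::BigFloat->bone(), Math::BigFloat->bone() );
my @tmpratio = ( 1, 1 );

# Fold the native-float partial products into the BigFloat master ratio.
sub transfer {
    $ratio[$_] *= $tmpratio[$_] for 0 .. $#tmpratio;
    @tmpratio = ( 1, 1 );
}

foreach my $data (@dataset) {
    my ( $classification, $smallnumber ) = @$data;
    $tmpratio[$classification] *= $smallnumber;
    transfer() if $tmpratio[$classification] < 1e-200;
}
transfer();    # pick up whatever is left in @tmpratio

# Expected to be close to 1e-5000 / 1e-5500 = 1e500.
my $quotient = $ratio[0] / $ratio[1];
print $quotient->bsstr(), "\n";
```

The threshold of 1e-200 leaves plenty of room: the next native multiplication by a factor of 1e-10 or so cannot underflow before the transfer happens. A much smaller per-item factor would need a higher threshold.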

#### Too long

There is still one catch I ran into with Math::BigFloat: over many calculations, the number of significant (non-zero) digits can grow extremely large. When it comes to dividing out the ratio, everything grinds to a halt, because dividing such long numbers takes a very long time.

If you are dealing with these extremes, you may want to round the numbers to a sensible number of significant digits before processing further:

```perl
# round each value to 20 significant digits
@ratio = map { $_->fround(20) } @ratio;
```
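A sketch that makes the digit growth visible, using an arbitrary 15-digit factor applied 200 times. Math::BigFloat multiplication is exact when no global accuracy or precision is set, so the mantissa grows with every step. The helper sig_digits() is hypothetical, and the sketch calls bround(), the standard significant-digit rounding method; as far as I know fround() is Math::BigFloat's older alias for it.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: number of digits in a positive BigFloat's mantissa.
sub sig_digits { return length( $_[0]->mantissa()->bstr() ) }

# Arbitrary factor with 15 significant digits, multiplied in 200 times.
my $x = Math::BigFloat->bone();
$x *= '0.123456789012345' for 1 .. 200;

my $before = sig_digits($x);    # roughly 14 to 15 new digits per multiplication
$x->bround(20);                 # round in place to 20 significant digits
my $after = sig_digits($x);

print "digits before rounding: $before, after: $after\n";
```

An alternative worth considering is setting a global accuracy with Math::BigFloat->accuracy(20), so every result is rounded as it is produced instead of in one pass at the end; the trade-off is that rounding error then accumulates at each step.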