Glen Pitt-Pladdy :: Blog

Perl: dealing with extreme numbers using Math::BigFloat

I've been working on a statistical analysis project which involves processing large numbers of data points (currently 15 million and counting) and combining them into an overall ratio-like representation. While optimising the code I discovered that I sometimes got very different results depending on the order in which the data was processed. That should not happen: it's rather like finding that $a * $b != $b * $a.

Further investigation showed that this is due to floating point numbers getting so small that they underflow to zero. Once a value has reached zero, everything multiplied into it afterwards is effectively ignored, so the result depends on how early in the data the zero is reached.
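As a minimal sketch of the effect, the snippet below multiplies the same three numbers in two different orders. The values are made up for illustration, and it assumes a perl built with ordinary IEEE 754 double precision floats (a perl built with long doubles has a wider range and would not underflow here).

```perl
use strict;
use warnings;

# Hypothetical values chosen so that one ordering underflows part way through
my @values = ( 1e-200, 1e-200, 1e200 );

# Forward: 1e-200 * 1e-200 is below the range of a double and becomes 0,
# and multiplying 0 by 1e200 cannot bring the value back
my $forward = 1;
$forward *= $_ foreach @values;

# Backward: 1e200 * 1e-200 is about 1, so nothing underflows
my $backward = 1;
$backward *= $_ foreach reverse @values;

print "forward:  $forward\n";
print "backward: $backward\n";
```

The two results differ even though the inputs are identical, which is the order dependence described above.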

Preserving integrity

First up, I wanted to get the most out of standard Perl floating point. In my case I am multiplying a lot of numbers between 0 and 1, so the running values gradually shrink until they round to zero. As this is a ratio, only the relative sizes of the elements matter, so integrity can be preserved far longer simply by normalising the ratio each time:

my @ratio = ( 1, 1 );
foreach my $data (@dataset) {
    # ..... some processing on $data to classify it ....
    # (sets $classification and $smallnumber)
    $ratio[$classification] *= $smallnumber;
    # normalise the ratio to avoid premature zero:
    # divide every element by the average so the values stay near 1
    my $avg = 0;
    foreach (@ratio) { $avg += $_; }
    $avg /= @ratio;
    @ratio = map { $_ / $avg } @ratio;
}
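To see the normalisation at work, here is a small self-contained sketch. The data set is invented for illustration (each item is simply a classification and a small number), and it again assumes ordinary double precision floats. It runs the plain product and the normalised ratio side by side.

```perl
use strict;
use warnings;

# Hypothetical data set: each item is [ classification, small number ]
my @dataset = map { ( [ 0, 0.5 ], [ 1, 0.4 ] ) } 1 .. 2000;

my @naive = ( 1, 1 );    # plain running product, no normalisation
my @ratio = ( 1, 1 );    # normalised after every data point

foreach my $data (@dataset) {
    my ( $classification, $smallnumber ) = @$data;
    $naive[$classification] *= $smallnumber;
    $ratio[$classification] *= $smallnumber;

    # normalise: divide each element by the average of all elements
    my $avg = 0;
    $avg += $_ foreach @ratio;
    $avg /= @ratio;
    @ratio = map { $_ / $avg } @ratio;
}

print "naive:      @naive\n";
print "normalised: @ratio\n";
```

With this data both elements of the plain product underflow to zero, while the normalised ratio keeps both elements non-zero because only their relative size is tracked.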

That can give a considerable amount of additional scope for the use of the basic Perl float. Trouble is eventually that isn't enough...

Enter Math::BigFloat

Math::BigFloat provides arbitrary precision floating point arithmetic, but at a cost: performance.

In my tests I found it had a performance hit of the order of 250x, which is a non-starter when processing large data sets. That doesn't mean that it can't be used, just that it needs care.
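The following sketch shows both sides of that trade-off: a product that would underflow as a native float survives in Math::BigFloat, and a rough timing loop lets you measure the slowdown on your own machine. The operand values and the iteration count are arbitrary choices for illustration, and the timings will vary with hardware and module version.

```perl
use strict;
use warnings;
use Math::BigFloat;
use Time::HiRes qw(time);

# A product far below the range of a native double survives in Math::BigFloat
my $tiny = Math::BigFloat->new('1e-200');
$tiny *= Math::BigFloat->new('1e-200');
print "BigFloat product: ", $tiny->bsstr(), "\n";

# Rough timing of the same multiplication done both ways
my $count = 20_000;                                  # arbitrary iteration count
my $float = 0.123456789;                             # arbitrary operand
my $big   = Math::BigFloat->new('0.123456789');
my $half  = Math::BigFloat->new('0.5');

my $start = time;
for ( 1 .. $count ) { my $p = $float * 0.5; }
my $float_time = time - $start;

$start = time;
for ( 1 .. $count ) { my $p = $big * $half; }
my $big_time = time - $start;

printf "native float: %.4fs, Math::BigFloat: %.4fs\n", $float_time, $big_time;
```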

Hanging on to the bitter end

My approach is to use basic Perl floating point for a temporary ratio until there is a risk of rounding to zero, and at that point use Math::BigFloat to transfer the temporary ratio into the master ratio:

use Math::BigFloat;

# master ratio in arbitrary precision, temporary ratio in native floats
my @ratio = ( Math::BigFloat->bone(), Math::BigFloat->bone() );
my @tmpratio = ( 1, 1 );
foreach my $data (@dataset) {
    # ..... some processing on $data to classify it ....
    # (sets $classification and $smallnumber)
    $tmpratio[$classification] *= $smallnumber;
    # check for risk of zeroing
    if ( $tmpratio[$classification] < 1e-200 ) {
        # we need to take action: fold the temporary ratio into the master
        for ( my $i = 0; $i <= $#tmpratio; ++$i ) {
            $ratio[$i] *= $tmpratio[$i];
        }
        @tmpratio = ( 1, 1 );
    }
}
# last transfer
for ( my $i = 0; $i <= $#tmpratio; ++$i ) {
    $ratio[$i] *= $tmpratio[$i];
}
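Here is a runnable sketch of the same idea with an invented data set, so the classification step is replaced by unpacking a [ classification, small number ] pair. The $transfer helper and the $transfers counter are additions for illustration only.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical data set: each item is [ classification, small number ]
my @dataset = map { ( [ 0, 0.5 ], [ 1, 0.4 ] ) } 1 .. 2000;

my @ratio    = ( Math::BigFloat->bone(), Math::BigFloat->bone() );
my @tmpratio = ( 1, 1 );

# fold the native float temporary ratio into the BigFloat master ratio
my $transfer = sub {
    $ratio[$_] *= $tmpratio[$_] foreach 0 .. $#tmpratio;
    @tmpratio = ( 1, 1 );
};

my $transfers = 0;    # how often the slow path was needed
foreach my $data (@dataset) {
    my ( $classification, $smallnumber ) = @$data;
    $tmpratio[$classification] *= $smallnumber;
    if ( $tmpratio[$classification] < 1e-200 ) {
        $transfer->();
        ++$transfers;
    }
}
$transfer->();    # last transfer

print "transfers during the loop: $transfers\n";
foreach my $i ( 0 .. $#ratio ) {
    # print a rounded copy so the master ratio itself is left untouched
    print "ratio $i = ", $ratio[$i]->copy()->bround(10)->bsstr(), "\n";
}
```

The slow Math::BigFloat multiplications only happen on the occasional transfer, while the bulk of the work stays in native floats.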

Too long

There is still one catch I ran into with Math::BigFloat: over many calculations, the number of significant (non-zero) digits can grow extremely large, and dividing such long numbers takes a very long time. When it came to dividing out the ratio, everything ground to a halt.

If you are dealing with these extremes then you may want to round the numbers to something sensible before processing further:

@ratio = map { $_->fround ( 20 ) } @ratio;
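A short sketch of the digit growth and the effect of rounding follows. The factor and the number of multiplications are arbitrary, and it uses bround(), the documented Math::BigFloat method for rounding to a number of significant digits, on the assumption that it behaves the same as the fround() call above.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Multiply by a 9-digit factor (hypothetical value) many times: every
# exact multiplication adds digits to the mantissa
my $factor = Math::BigFloat->new('0.123456789');
my $x      = Math::BigFloat->bone();
$x *= $factor for 1 .. 200;

my $digits_before = $x->length();    # number of mantissa digits

# Round a copy to 20 significant digits, leaving $x untouched
my $rounded      = $x->copy()->bround(20);
my $digits_after = $rounded->length();

print "digits before rounding: $digits_before\n";
print "digits after rounding:  $digits_after\n";
```

Whether 20 digits is enough depends on how much accuracy the final result needs, so treat the figure as a starting point.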
