We had a very odd bug in a simulation we were writing recently. We were supposed to be sampling from a large pool of possible data, but were getting a very weird distribution of values. After much debugging we found a most unusual cause.
Here is a pop quiz – read through the short script below and take your best guess at what the output will be. Being correct within 5% is good enough.
#!/usr/bin/perl
use warnings;
use strict;
my %seen;
for (1..10000000) {
    my $rand = int(rand(1000000));
    ++$seen{$rand};
}
print "I saw ".(scalar keys %seen)." different values\n";
I should point out that the random number generation here follows the Perl documentation for rand(), which simply says:
“Apply “int()” to the value returned by “rand()” if you want random integers instead of random fractional numbers. For example,
int(rand(10))
returns a random integer between 0 and 9, inclusive.”
OK, have you guessed? Well, the answer we got was:
I saw 32768 different values
Yes, that’s right: after selecting 10 million values from a range of 0-999,999 we saw just over 32 thousand different values. This was the reason our distribution looked odd – we were only seeing around 3% of the values we could have seen.
It turns out that the cause of this oddity is platform-specific. Perl doesn’t itself include code to generate random numbers – it simply calls the random number function supplied by the underlying platform. In our case the code was being run on 64-bit ActivePerl under Windows 7, and the standard Windows random number library is only capable of generating 32768 different values (15 bits of randomness). Scaling one of 32768 possible fractions up to a range of a million can never yield more than 32768 distinct integers, however many times rand() is called.
If we take this exact code and run it under Linux we get:
I saw 999950 different values
...and our simulation returns sensible numbers.
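Both figures can be checked with a little arithmetic. The script below is only a sketch: the helper names reachable_values and expected_distinct are made up for this illustration, and it assumes a limited generator returns evenly spaced fractions k/2**bits, which is a simplification of what a real library does. It also prints the randbits entry from Perl's Config module, which should show how many random bits your own build claims to have.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Config;

# How many random bits this perl build reports for rand().
# An affected build would be expected to report 15 here.
my $randbits = defined $Config{randbits} ? $Config{randbits} : 'unknown';
print "randbits for this perl: $randbits\n";

# Hypothetical helper: count the distinct integers int(rand($range)) can
# produce if rand() only ever returns one of 2**$bits evenly spaced fractions.
sub reachable_values {
    my ($bits, $range) = @_;
    my $steps = 2 ** $bits;
    my %seen;
    ++$seen{ int($_ / $steps * $range) } for 0 .. $steps - 1;
    return scalar keys %seen;
}

# Hypothetical helper: expected number of distinct values after $draws
# uniform draws from $range equally likely values,
# i.e. range * (1 - (1 - 1/range)**draws).
sub expected_distinct {
    my ($range, $draws) = @_;
    return $range * (1 - (1 - 1 / $range) ** $draws);
}

printf "15-bit generator: %d reachable values\n",
    reachable_values(15, 1_000_000);
printf "ideal generator:  about %.0f distinct values expected\n",
    expected_distinct(1_000_000, 10_000_000);
```

The first figure comes out at 32768, matching the Windows result, and the second lands within a few values of the 999950 we saw on Linux, so that run is what a well-behaved generator should produce.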
This appears to be pretty poor – I’m sure the Perl people will just blame the Microsoft library, but this could be worked around in the Perl implementation, or at the very least a note should be added to the rand() documentation to warn specifically that rand has a precision limit, and that this might be very low on some platforms.
Fortunately, however, there are some proper workarounds for this problem. If you need to reliably generate random numbers from a large range in Perl, there are a few modules which provide more fully featured random number generators than the default rand() function. Two of the most popular are Math::Random and Math::Random::MT, either of which will work reliably and consistently on all platforms.
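As a rough sketch of what switching might look like, here is the quiz script reworked around Math::Random::MT. This is not the code from our simulation: the make_int_rand helper is invented for the example, the seed of 42 is arbitrary, and the loop is shortened to 200,000 draws to keep it quick. Math::Random::MT is a CPAN module rather than part of core Perl, so the sketch falls back to the built-in rand() (with a warning) if it isn't installed.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns a code ref producing a random integer in
# the range 0 .. $range-1. Uses Math::Random::MT when it can be loaded,
# otherwise falls back to the built-in rand() and its platform limits.
sub make_int_rand {
    my ($seed) = @_;
    if (eval { require Math::Random::MT; 1 }) {
        my $gen = Math::Random::MT->new($seed);
        return sub { int($gen->rand($_[0])) };
    }
    warn "Math::Random::MT not installed, falling back to built-in rand()\n";
    srand($seed);
    return sub { int(rand($_[0])) };
}

my $int_rand = make_int_rand(42);

my %seen;
++$seen{ $int_rand->(1_000_000) } for 1 .. 200_000;

print "I saw ".(scalar keys %seen)." different values\n";
```

With the module in place the count should come out well above 32768 on any platform. If it sits at or just below 32768, you are most likely on the fallback path with a 15-bit rand().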