I have a large code that at one point samples values from an array according to the probabilities taken from a probability density function (PDF).
To do this I use the numpy.random.choice which worked just fine until numpy 1.8.0
. Here's a MWE (the file pdf_probs.txt
can be downloaded here):
import simplejson
import numpy as np
# Read probabilities from file.
f = open('pdf_probs.txt', 'r')
probs = simplejson.load(f)
f.close()
print sum(probs) # <-- Not *exactly* 1. but very close: 1.00000173042
# Define array.
arr = np.linspace(1., 100., len(probs))
# Get samples using the probabilities in probs.
samples = np.random.choice(arr, size=1000, replace=True, p=probs)
The thing is that after testing it with numpy 1.9.0
the above code fails with the error:
Traceback (most recent call last):
File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
samples = np.random.choice(arr, size=1000, replace=True, p=probs)
File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
ValueError: probabilities do not sum to 1
The sum of the PDF probabilities will not sum to exactly 1. given the small deviations that appear when using very small floats.
From what I can gather the previous version of numpy
(1.8.0
) apparently had a larger tolerance than the new 1.9.0
version, but I could be wrong.
Why does this work with numpy 1.8.0
but not with 1.9.0
? How can I make my code work with the new 1.9.0
version?
I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:
>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([ 1.37635054, 1.1287515 , 1.7229892 , 19.8967587 , 2.07953181])