numpy 1.9.0: ValueError: probabilities do not sum to 1

Gabriel picture Gabriel · Sep 23, 2014 · Viewed 11k times · Source

I have a large code that at one point samples values from an array according to the probabilities taken from a probability density function (PDF).

To do this I use the numpy.random.choice which worked just fine until numpy 1.8.0. Here's a MWE (the file pdf_probs.txt can be downloaded here):

import simplejson
import numpy as np

# Read probabilities from file.
f = open('pdf_probs.txt', 'r')
probs = simplejson.load(f)
f.close()

print sum(probs)  # <-- Not *exactly* 1. but very close: 1.00000173042
# Define array.
arr = np.linspace(1., 100., len(probs))

# Get samples using the probabilities in probs.
samples = np.random.choice(arr, size=1000, replace=True, p=probs)

The thing is that after testing it with numpy 1.9.0 the above code fails with the error:

Traceback (most recent call last):
  File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
    samples = np.random.choice(arr, size=1000, replace=True, p=probs)
  File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
ValueError: probabilities do not sum to 1

The sum of the PDF probabilities will not sum to exactly 1. given the small deviations that appear when using very small floats.

From what I can gather the previous version of numpy (1.8.0) apparently had a larger tolerance than the new 1.9.0 version, but I could be wrong.

Why does this work with numpy 1.8.0 but not with 1.9.0? How can I make my code work with the new 1.9.0 version?

Answer

DSM picture DSM · Sep 23, 2014

I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:

>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([  1.37635054,   1.1287515 ,   1.7229892 ,  19.8967587 ,   2.07953181])