Python random.seed() not keeping the same sequence

Motion4D · Apr 14, 2014 · Viewed 8.2k times

I'm using random.seed() to try to keep the output of random.sample() the same as I sample more values from a list, but at some point the numbers change. I thought the whole purpose of seed() was to keep the numbers the same.

Here's a test I did to show that it doesn't keep the same numbers.

import random

a = range(0, 100)
random.seed(1)
# store the result under a new name so the population `a` is not overwritten
b = random.sample(a, 10)
print(b)

Then make the sample size much larger and the sequence changes (at least for me it always does):

random.seed(1)
# same seed, same population of 100 values, larger sample
b = random.sample(range(0, 100), 40)
print(b)

I'm sort of a newb so maybe this is an easy fix but I would appreciate any help on this. Thanks!

Answer

NPE · Apr 14, 2014

If you draw independent samples from the generator, you get exactly what you're expecting:

In [1]: import random

In [2]: random.seed(1)

In [3]: [random.randint(0, 99) for _ in range(10)]
Out[3]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2]

In [4]: random.seed(1)

In [5]: [random.randint(0, 99) for _ in range(40)]
Out[5]: [13, 84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43 ...]

As you can see, the first ten numbers are indeed the same.
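As a quick check of that prefix property, here is a minimal sketch. The helper name `draws` is made up for illustration, and it uses a private random.Random instance instead of the module-level generator so that other code cannot disturb the seed. The exact values depend on the Python version, so none are shown.

```python
import random

def draws(seed, n):
    """Return n independent draws from 0..99 using a generator seeded with `seed`.

    `draws` is a hypothetical helper, not part of the random module.
    """
    rng = random.Random(seed)  # private generator, independent of random.seed()
    return [rng.randint(0, 99) for _ in range(n)]

short = draws(1, 10)
long_ = draws(1, 40)

# Each draw consumes the generator state in the same order, so the
# longer run starts with the shorter one.
print(short == long_[:10])  # True
```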

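What gets in the way is that random.sample() draws samples without replacement, and it makes no promise that a smaller sample is the start of a larger one drawn from the same seed. For background on this kind of algorithm, see Reservoir Sampling, where later samples can push earlier samples out of the result set. As far as I can tell, CPython's sample() also chooses between different internal strategies depending on the sample and population sizes, which by itself can change the sequence.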
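To try this on your own interpreter, a small sketch follows. The helper name `seeded_sample` is made up for illustration. Whether the two results share a prefix depends on the Python version and on the sizes involved, so the script prints the comparison instead of assuming an answer.

```python
import random

def seeded_sample(seed, population, k):
    """Seed a private generator and draw k items without replacement.

    `seeded_sample` is a hypothetical helper for this comparison.
    """
    rng = random.Random(seed)
    return rng.sample(population, k)

population = list(range(100))
small = seeded_sample(1, population, 10)
large = seeded_sample(1, population, 40)

# Same seed and same k always reproduce the same sample...
print(small == seeded_sample(1, population, 10))  # True

# ...but nothing guarantees the k=10 result is the start of the k=40 result.
print("prefix preserved:", small == large[:10])
```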

One alternative might be to shuffle a list of indices and then take either the first 10 or the first 40 elements:

In [1]: import random

In [2]: a = list(range(0, 100))  # shuffle() needs a real list, not a range object

In [3]: random.shuffle(a)

In [4]: a[:10]
Out[4]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80]

In [5]: a[:40]
Out[5]: [48, 27, 28, 4, 67, 76, 98, 68, 35, 80, ...]
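Combining the shuffle with a seed, a reproducible version might look like the sketch below. `stable_sample` is a hypothetical helper name. It shuffles a copy with a private random.Random instance, so the caller's list and the global generator are left alone.

```python
import random

def stable_sample(population, k, seed):
    """Return the first k items of a seeded shuffle of `population`.

    `stable_sample` is a hypothetical helper. For a fixed seed and
    population, a smaller k always yields a prefix of a larger k.
    """
    pool = list(population)            # copy, so the input is not modified
    random.Random(seed).shuffle(pool)  # the shuffle depends only on the seed
    return pool[:k]

first_10 = stable_sample(range(100), 10, seed=1)
first_40 = stable_sample(range(100), 40, seed=1)

print(first_10 == first_40[:10])  # True
```

The trade-off is that the whole population gets shuffled even when only a few items are needed, which is fine for a list of 100 but wasteful for very large ones.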