I would like a function that generates a pseudo-random sequence of values, with that sequence being repeatable on every run. The data has to be reasonably well distributed over a given range; it doesn't have to be perfect.
I want to write some code which will have performance tests run on it, based on random data. I would like that data to be the same for every test run, on every machine, but I don't want to have to ship the random data with the tests for storage reasons (it might end up being many megabytes).
The library documentation for the random module doesn't appear to say that the same seed will always give the same sequence on any machine.
EDIT: If you're going to suggest I seed the data (as I said above), please provide the documentation that says the approach is valid and will work across a range of machines/implementations.
EDIT: CPython 2.7.1 and PyPy 1.7 on Mac OS X, and CPython 2.7.1 and CPython 2.5.2 on Ubuntu, appear to give the same results. Still, no docs that stipulate this in black and white.
Any ideas?
For this purpose, I've used a repeating MD5 hash. A hash function is deterministic and defined independently of the platform, so the same input always produces the same output on every machine.
import md5

def repeatable_random(seed):
    hash = seed
    while True:
        hash = md5.md5(hash).digest()
        for c in hash:
            yield ord(c)

def test():
    for i, v in zip(range(100), repeatable_random("SEED_GOES_HERE")):
        print v
Output:
184 207 76 134 103 171 90 41 12 142 167 107 84 89 149 131 142 43 241 211 224 157 47 59 34 233 41 219 73 37 251 194 15 253 75 145 96 80 39 179 249 202 159 83 209 225 250 7 69 218 6 118 30 4 223 205 91 10 122 203 150 202 99 38 192 105 76 100 117 19 25 131 17 60 251 77 246 242 80 163 13 138 36 213 200 135 216 173 92 32 9 122 53 250 80 128 6 139 49 94
Essentially, the code takes your seed (any valid string) and hashes it repeatedly, feeding each 16-byte digest back in as the next input and yielding its bytes as integers from 0 to 255.
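For anyone on Python 3, here is a minimal sketch of the same idea using hashlib instead of the old md5 module. The names repeatable_bytes and repeatable_ints are hypothetical, and combining four bytes then taking a modulo is just one simple way I've assumed for mapping the stream onto a range. It introduces a slight bias, which should be fine for "reasonably well distributed" test data.

```python
import hashlib
from itertools import islice


def repeatable_bytes(seed):
    """Yield an endless stream of ints in 0..255 by re-hashing the seed."""
    digest = seed.encode("utf-8")  # hashlib needs bytes, not str
    while True:
        digest = hashlib.md5(digest).digest()
        for byte in digest:  # iterating over bytes yields ints in Python 3
            yield byte


def repeatable_ints(seed, low, high):
    """Yield ints in [low, high], each built from 4 bytes of the stream.

    Hypothetical helper: the modulo step is slightly biased.
    """
    stream = repeatable_bytes(seed)
    span = high - low + 1
    while True:
        chunk = bytes(islice(stream, 4))
        value = int.from_bytes(chunk, "big")
        yield low + value % span


# Two independent runs with the same seed give the same sequence.
first = list(islice(repeatable_ints("SEED_GOES_HERE", 1, 6), 10))
again = list(islice(repeatable_ints("SEED_GOES_HERE", 1, 6), 10))
```

Since the sequence depends only on the MD5 algorithm and the seed bytes, it does not rely on how any particular interpreter implements its random module.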