The Zipf probability distribution is often used to model file size distribution or item access distributions on items in P2P systems. e.g. "Web Caching and Zip like Distribution Evidence and Implications", but neither Boost or the GSL (Gnu Scientific Library) provide an implementation to generate random numbers using this distribution. I have not found a (trustworthy) implementation using the common search engines.
How can random numbers that are distributed according to the Zipf distribution by using a U(0,1) random generator, e.g. the Mersenne twister?
Here's a Python Zipf-like distribution generator for n
items with parameter alpha >= 0
:
import random
import bisect
import math
class ZipfGenerator:
def __init__(self, n, alpha):
# Calculate Zeta values from 1 to n:
tmp = [1. / (math.pow(float(i), alpha)) for i in range(1, n+1)]
zeta = reduce(lambda sums, x: sums + [sums[-1] + x], tmp, [0])
# Store the translation map:
self.distMap = [x / zeta[-1] for x in zeta]
def next(self):
# Take a uniform 0-1 pseudo-random value:
u = random.random()
# Translate the Zipf variable:
return bisect.bisect(self.distMap, u) - 1