Generate random numbers distributed by Zipf

dmeister picture dmeister · Sep 2, 2009 · Viewed 10.4k times · Source

The Zipf probability distribution is often used to model file size distribution or item access distributions on items in P2P systems. e.g. "Web Caching and Zip like Distribution Evidence and Implications", but neither Boost or the GSL (Gnu Scientific Library) provide an implementation to generate random numbers using this distribution. I have not found a (trustworthy) implementation using the common search engines.

How can random numbers that are distributed according to the Zipf distribution by using a U(0,1) random generator, e.g. the Mersenne twister?

Answer

stanga picture stanga · Jan 9, 2012

Here's a Python Zipf-like distribution generator for n items with parameter alpha >= 0:

import random 
import bisect 
import math 

class ZipfGenerator: 

    def __init__(self, n, alpha): 
        # Calculate Zeta values from 1 to n: 
        tmp = [1. / (math.pow(float(i), alpha)) for i in range(1, n+1)] 
        zeta = reduce(lambda sums, x: sums + [sums[-1] + x], tmp, [0]) 

        # Store the translation map: 
        self.distMap = [x / zeta[-1] for x in zeta] 

    def next(self): 
        # Take a uniform 0-1 pseudo-random value: 
        u = random.random()  

        # Translate the Zipf variable: 
        return bisect.bisect(self.distMap, u) - 1