Data structure for loaded dice?

templatetypedef picture templatetypedef · Feb 17, 2011 · Viewed 11.8k times · Source

Suppose that I have an n-sided loaded die where each side k has some probability pk of coming up when I roll it. I'm curious if there is good algorithm for storing this information statically (i.e. for a fixed set of probabilities) so that I can efficiently simulate a random roll of the die.

Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, them to generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value. I rather like this solution, but it seems odd that the runtime doesn't take the probabilities into account. In particular, in the extremal cases of one side always coming up or the values being uniformly distributed, it's possible to generate the result of the roll in O(1) using a naive approach, though my solution will still take logarithmicallh many steps.

Does anyone have any suggestions for how to solve this problem in a way that is somehow "adaptive" in it's runtime?

EDIT: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose's implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!

Answer

mhum picture mhum · Feb 17, 2011

You are looking for the alias method which provides a O(1) method for generating a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) with a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.

The idea is to take your array of probabilities pk and produce three new n-element arrays, qk, ak, and bk. Each qk is a probability between 0 and 1, and each ak and bk is an integer between 1 and n.

We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*N)+1. If qi < s then return ai else return bi. The work in the alias method is in figuring out how to produce qk, ak and bk.