How to randomly sample from a Scala list or array?

Carter · Oct 4, 2015

I want to randomly sample from a Scala list or array (not an RDD). The sample size can be much larger than the length of the list or array. How can I do this efficiently? The sample size can be very large, and the sampling (on different lists/arrays) needs to be done a large number of times.

I know that for a Spark RDD we can use takeSample() to do this. Is there an equivalent for a Scala list/array?

Thank you very much.

Answer

Marius Soutier · Oct 4, 2015

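An easy-to-understand version shuffles the collection and takes the first n elements. Note that this samples without replacement, so it never returns more elements than the input contains; if n is larger than the length of the list or array, you need sampling with replacement instead.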

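import scala.util.Random

// Shuffle the whole collection, then keep the first n elements (without replacement)
Random.shuffle(list).take(n)
Random.shuffle(array.toList).take(n)

// Seeded version: a dedicated Random instance makes the result reproducible
val r = new Random(seed)
r.shuffle(...)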
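Because `shuffle(...).take(n)` is capped at the input's length, the case in the question (a sample larger than the list) calls for sampling with replacement. Below is a minimal sketch of a local analogue of Spark's `RDD.takeSample(withReplacement, num, seed)`. The names `LocalSample` and `takeSample` here are hypothetical, not a Scala standard-library API. It copies the input into an array once (so a `List` gets constant-time indexing), then either draws `num` independent uniform indices (with replacement) or runs a partial Fisher-Yates shuffle that stops after `min(num, length)` swaps (without replacement), which avoids shuffling the whole collection when `num` is small.

```scala
import scala.reflect.ClassTag
import scala.util.Random

object LocalSample {
  /** Hypothetical helper, loosely modelled on Spark's RDD.takeSample; not a Scala standard-library API.
    *
    * @param data            elements to sample from (a List, Vector, or array.toSeq)
    * @param withReplacement if true, each draw is independent, so `num` may exceed `data.length`
    * @param num             requested sample size
    * @param seed            seed for a dedicated Random, so results are reproducible
    */
  def takeSample[T: ClassTag](
      data: scala.collection.Seq[T],
      withReplacement: Boolean,
      num: Int,
      seed: Long
  ): Array[T] = {
    require(num >= 0, "num must be non-negative")
    // Copy once into an array: O(1) random access even when `data` is a List.
    val buf = data.toArray
    val rng = new Random(seed)
    if (buf.isEmpty || num == 0) {
      Array.empty[T]
    } else if (withReplacement) {
      // Draw `num` independent uniform indices; duplicates are expected.
      Array.fill(num)(buf(rng.nextInt(buf.length)))
    } else {
      // Partial Fisher-Yates shuffle: after step i, buf(0) to buf(i) hold a uniform
      // random sample without replacement, so we can stop after k swaps.
      val k = math.min(num, buf.length)
      var i = 0
      while (i < k) {
        val j = i + rng.nextInt(buf.length - i) // pick from the not-yet-chosen suffix
        val tmp = buf(i)
        buf(i) = buf(j)
        buf(j) = tmp
        i += 1
      }
      buf.take(k)
    }
  }
}
```

For example, `LocalSample.takeSample(List(1, 2, 3), withReplacement = true, num = 1000000, seed = 42L)` returns a million-element array drawn from the three values; an array can be passed as `array.toSeq`. When sampling from many different lists, using a different seed per call (or adapting the sketch to accept a shared `Random`) avoids getting identical draw sequences every time.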