I want to randomly sample from a Scala list or array (not an RDD), the sample size can be much longer than the length of the list or array, how can I do this efficiently? Because the sample size can be very big and the sampling (on different lists/arrays) needs to be done a large number of times.
I know for a Spark RDD we can use takeSample() to do it, is there an equivalent for Scala list/array?
Thank you very much.
An easy-to-understand version would look like this:
import scala.util.Random
Random.shuffle(list).take(n)
Random.shuffle(array.toList).take(n)
// Seeded version
val r = new Random(seed)
r.shuffle(...)