Is Collections.shuffle() really random enough? Practical examples seem to deny this statement

basZero picture basZero · Mar 14, 2012 · Viewed 19.2k times · Source

I have 1000 unique objects in a java.util.List, each referring to an image, each image in the 1000-list is unique and now I'd like to shuffle them, so that I can use the first 20 objects and present them to the website-user. The user can then click a button saying "Shuffle", and I retrieve the 1000 images again from scratch and calling again shuffle(). However, it seems that out of 1000 image objects, I very often see the same image again and again between the 20-image-selections.

Something seems to be wrong, any better suggestion, advices?

My code is very simple:

List<String> imagePaths = get1000Images();
Collections.shuffle(imagePaths);

int i = 0;
for (String path: imagePaths) {
  ... do something with the path ...
  i++;
  if (i >= 20) break;
}

I know that Collections.shuffle() is well distributed: see for instance http://blog.ryanrampersad.com/2012/03/03/more-on-shuffling-an-array-correctly/

However, I just have the feeling that the probability of seeing the same image over and over again in a set of 20 images out of 1000 should be much less...

Inputs highly appreciated.

Answer

Peter Lawrey picture Peter Lawrey · Mar 14, 2012

Its human nature to see patterns which are not there. Many people see patterns in the planets and stars as guiding their life.

In the first 1000 digits of PI there are six nines in a row. Does that mean the digits of PI are not random? no. The pattern doesn't occur again any more than your might expect.

Having said that, Random is not completely random and it will repeat after 2^48 calls. (it uses a 48-bit seed) This means its not possible to produce every possible long or double using it. If you want more randomness you can use SecureRandom with shuffle instead.

It sounds like what you want is something like this

List<String> imagePaths = new ArrayList<>();

// called repeatedly
if (imagePaths.size() <= 500) {
    imagePaths = get1000Images();
    Collections.shuffle(imagePaths);
}

for (String path: imagePaths.subList(0, 20)) {
  ... do something with the path ...
}

imagePaths = imagePaths.subList(20, imagePaths.size());

This will ensure that you don't see the same image in the last 500 calls.