Reasons for using the set.seed function

Vignesh · Nov 28, 2012 · Viewed 294.6k times

I have often seen the set.seed function called at the start of an R program. I know it's basically used for random number generation. Is there any specific need to set it?

Answer

Dirk Eddelbuettel · Nov 28, 2012

The need is the desire for reproducible results, which may for example come from trying to debug your program, or of course from trying to redo exactly what it did.

These two results we will "never" reproduce, as I just asked for something "random":

R> sample(LETTERS, 5)
[1] "K" "N" "R" "Z" "G"
R> sample(LETTERS, 5)
[1] "L" "P" "J" "E" "D"

These two, however, are identical because I set the seed:

R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> 
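To check this in a script rather than by eye, the two seeded draws can be compared directly. This is a minimal sketch; the variable names first, second and third are just illustrative. Note that the exact letters you get for a given seed can differ from the ones shown above, because R 3.6.0 changed the default sampling method, but within one R session setup the same seed still yields the same draw.

```r
# Draw twice from the same seed and compare the results programmatically.
set.seed(42)
first <- sample(LETTERS, 5)

set.seed(42)
second <- sample(LETTERS, 5)

identical(first, second)  # TRUE: same seed, same draw

# Without resetting the seed, the generator simply continues its sequence,
# so a further draw is very unlikely to match the first one.
third <- sample(LETTERS, 5)
identical(first, third)
```

The same pattern works for any function that consumes random numbers, such as runif() or rnorm(): reset the seed immediately before the call you want to repeat.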

There is a vast literature on all of this; Wikipedia is a good start. In essence, these RNGs are called Pseudo Random Number Generators because they are in fact fully algorithmic: given the same seed, you get the same sequence. And that is a feature, not a bug.
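To make "fully algorithmic" concrete, here is a toy generator written in plain R. It is only a sketch for illustration: toy_rng is a hypothetical helper, it uses the classic Lehmer (Park-Miller) recurrence, and it is not what set.seed() drives (R's default generator is the Mersenne Twister). The point is only that the whole sequence is a function of the seed.

```r
# Toy Lehmer (Park-Miller) generator: state <- (16807 * state) mod (2^31 - 1).
# Illustration only; this is NOT the algorithm behind R's set.seed().
toy_rng <- function(seed, n) {
  modulus    <- 2^31 - 1
  multiplier <- 16807
  state <- seed                      # seed must be between 1 and modulus - 1
  draws <- numeric(n)
  for (i in seq_len(n)) {
    state <- (multiplier * state) %% modulus
    draws[i] <- state / modulus      # scale into the open interval (0, 1)
  }
  draws
}

a     <- toy_rng(seed = 42, n = 5)
b     <- toy_rng(seed = 42, n = 5)
other <- toy_rng(seed = 43, n = 5)

identical(a, b)      # TRUE: the sequence is determined entirely by the seed
identical(a, other)  # FALSE: a different seed starts a different sequence
```

Nothing here is random in the everyday sense; the numbers only look random. That is exactly why fixing the seed lets you rerun a simulation and get the same answer.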