I'm programming in R. I've got a vector containing, let's say, 1000 values. Now let's say I want to partition these 1000 values randomly into two new sets, one containing 400 values and the other containing 600. How could I do this? I've thought about doing something like this...
firstset <- sample(mydata, size=400)
...but this doesn't partition the data (in other words, I still don't know which 600 values to put in the other set). I also thought about looping from 1 to 400, randomly removing one value at a time and placing it in firstset. This would partition the data correctly, but it's not clear to me how to implement it. Plus, I've been told to avoid for loops in R whenever possible.
Any ideas?
Instead of sampling the values, you could sample their positions.
positions <- sample(length(mydata), size=400) # ucfagls' suggestion
firstset <- mydata[positions]
secondset <- mydata[-positions]
EDIT: ucfagls' suggestion (passing length(mydata) to sample() directly, rather than a vector such as 1:length(mydata)) will be more efficient, especially for larger vectors, since it avoids allocating a vector of positions in R.
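For anyone who wants to try this end to end, here is a minimal self-contained sketch of the position-sampling approach. The contents of mydata (random normal values) and the set.seed() call are assumptions made purely for illustration; any vector of length 1000 should behave the same way.

```r
# Hypothetical example data; any vector of length 1000 would do.
set.seed(42)                       # only to make the example reproducible
mydata <- rnorm(1000)

# Sample 400 positions without replacement (the default for sample()).
positions <- sample(length(mydata), size = 400)

firstset  <- mydata[positions]     # the 400 sampled values
secondset <- mydata[-positions]    # the remaining 600 values

# Sanity checks: the sizes are right and no value is lost or duplicated.
stopifnot(length(firstset) == 400, length(secondset) == 600)
stopifnot(identical(sort(c(firstset, secondset)), sort(mydata)))
```

One caveat: negative indexing with an empty vector selects nothing, so if size could ever be 0, mydata[-positions] would return an empty vector rather than all of mydata. Guarding that case separately is probably worthwhile if the split size is computed rather than fixed.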