Randomly sample a percentage of rows within a data frame

ATMathew · Feb 22, 2013 · Viewed 25.3k times

Related to this question.

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age) 

mydf[ sample( which(mydf$gender=='F'), 3 ), ]

Instead of selecting a fixed number of rows (3 in the case above), how can I randomly select 20% of the rows with "F"? That is, of the five rows with "F", how do I randomly sample 20% of them?
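For reference, the fixed count in the snippet above could be replaced by a size computed from the number of matching rows. This is only a minimal base R sketch: the names frac, f_rows, n_pick, picked and f_sample are made up for illustration, and rounding to the nearest whole row is an assumption about what "20%" should mean when the result is not an integer.

```r
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

frac   <- 0.2                              # fraction of "F" rows to keep
f_rows <- which(mydf$gender == "F")        # row indices of the "F" rows
n_pick <- round(frac * length(f_rows))     # assumption: round to the nearest whole row

# Sample positions within f_rows rather than the values themselves:
# sample(x, n) treats a single number x as 1:x, which would pick the
# wrong rows if only one "F" row existed.
picked   <- f_rows[sample.int(length(f_rows), n_pick)]
f_sample <- mydf[picked, ]
```

With five "F" rows and frac set to 0.2, this draws one row. Calling set.seed() beforehand makes the draw repeatable.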

Answer

Zhen Liang · Apr 7, 2017

You can use the sample_frac() function from the dplyr package.

For example, to sample 20% of all rows, regardless of gender:

library(dplyr)

mydf %>% sample_frac(.2)

To sample 20% of the rows within each gender group instead:

mydf %>% group_by(gender) %>% sample_frac(.2)
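The grouped call returns rows from both genders. If only the "F" rows are wanted, as in the question, one possible variation is to filter first and then sample. This is a sketch rather than part of the original answer, and the name f_sample is made up for illustration.

```r
library(dplyr)

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

f_sample <- mydf %>%
  filter(gender == "F") %>%   # keep only the "F" rows
  sample_frac(0.2)            # then draw 20% of what remains
```

In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(prop = ...). The two may round fractional row counts differently, so it is worth checking the row count on small groups.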