Related to this question.
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)
mydf[ sample( which(mydf$gender=='F'), 3 ), ]
Instead of selecting a fixed number of rows (3 in the case above), how can I randomly select 20% of the rows with "F"? That is, of the five "F" rows, how do I randomly sample 20% of them (one row)?
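In base R, one way is to compute the sample size from the number of "F" rows instead of hard-coding 3. This is a minimal sketch. It assumes you want the count rounded to the nearest integer with `round()`; here 0.2 * 5 is exactly 1, so rounding does not matter. It indexes `f_rows` through `sample.int()` because `sample(x, n)` treats a length-one numeric `x` as `1:x`, which would give wrong rows if only one "F" row existed.

```r
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)

# Row indices of the "F" rows (1, 4, 5, 7, 8 here)
f_rows <- which(mydf$gender == "F")

# 20% of the "F" rows; rounding to nearest is an assumption (0.2 * 5 = 1)
n_draw <- round(0.2 * length(f_rows))

# Draw positions within f_rows, avoiding sample()'s length-one pitfall
picked <- f_rows[sample.int(length(f_rows), n_draw)]
mydf[picked, ]
```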
You can use the sample_frac() function in the dplyr package.
For example, if you want to sample 20% of the whole data frame (ignoring gender):
library(dplyr)
mydf %>% sample_frac(.2)
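Since dplyr 1.0.0, sample_frac() is superseded by slice_sample() with the `prop` argument. The sketch below assumes dplyr 1.0.0 or later is installed. Note that slice_sample() rounds `prop * n()` down, so on these 8 rows it returns floor(1.6) = 1 row, and that row may or may not be an "F" row.

```r
library(dplyr)

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)

# Sample 20% of all rows regardless of gender; prop * n() is rounded down
whole_sample <- mydf %>% slice_sample(prop = 0.2)
whole_sample
```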
If you want to sample 20% within each gender group:
mydf %>% group_by(gender) %>% sample_frac(.2)
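The grouped version samples from the "M" rows as well. If you only want 20% of the "F" rows, as the question asks, you can filter first and then sample. This is a sketch assuming dplyr 1.0.0 or later; slice_sample(prop = 0.2) rounds 0.2 * 5 down to 1 row.

```r
library(dplyr)

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)

# Keep only the "F" rows, then draw 20% of them at random
f_sample <- mydf %>%
  filter(gender == "F") %>%
  slice_sample(prop = 0.2)
f_sample
```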