I have a data frame that contains 70-80 rows of ordered response time (RT) data for each of 228 people, each with a unique id (not everyone has the same number of rows). I want to bin each person's RTs into 5 bins. I want the 1st bin to be their fastest 20 percent of RTs, the 2nd bin to be their next fastest 20 percent of RTs, and so on. Each bin should have the same number of trials in it (unless the total number of trials doesn't divide evenly).
My current dataframe looks like this:
id RT
7000 225
7000 250
7000 253
7001 189
7001 201
7001 225
I'd like my new dataframe to look like this:
id RT Bin
7000 225 1
7000 250 1
After getting my data to look like this, I will aggregate by id and bin.
The only way I can think of to do this is to split the data into a list (using the split command), loop through each person, use the quantile command to get break points for the different bins, and assign a bin value (1-5) to every response time. This feels very convoluted (and would be difficult for me). I'm in a bit of a jam and I would greatly appreciate any help in how to streamline this process. Thanks.
The answer @Chase gave splits the range into 5 groups of equal width (difference of endpoints). What you seem to want is quintiles (5 groups with an equal number of observations in each). For that, you can use the cut2 function in the Hmisc package:
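library("plyr")   # for ddply
library("Hmisc")  # for cut2
# Example data: 10 ids with 10 random values each
dat <- data.frame(id = rep(1:10, each = 10), value = rnorm(100))
# Within each id, cut value into 5 equal-count groups and keep the group number (1-5)
tmp <- ddply(dat, "id", transform, hists = as.numeric(cut2(value, g = 5)))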
tmp now has what you want:
> tmp
id value hists
1 1 0.19016791 3
2 1 0.27795226 4
3 1 0.74350982 5
4 1 0.43459571 4
5 1 -2.72263322 1
....
95 10 -0.10111905 3
96 10 -0.28251991 2
97 10 -0.19308950 2
98 10 0.32827137 4
99 10 -0.01993215 4
100 10 -1.04100991 1
with the same number of observations in each hists group for each id:
> table(tmp$id, tmp$hists)
1 2 3 4 5
1 2 2 2 2 2
2 2 2 2 2 2
3 2 2 2 2 2
4 2 2 2 2 2
5 2 2 2 2 2
6 2 2 2 2 2
7 2 2 2 2 2
8 2 2 2 2 2
9 2 2 2 2 2
10 2 2 2 2 2
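If you'd rather not depend on plyr and Hmisc, here is a minimal base R sketch of the same idea applied to the id/RT layout from the question, followed by the aggregation step. Note that this swaps cut2 for a rank-based rule, so results can differ from cut2 when there are ties. The helper name quintile_bin and the RT values beyond the six shown in the question are made up for illustration.

```r
# Toy data in the question's layout; values beyond the first three
# per id are hypothetical, added only so each id has 10 trials.
dat <- data.frame(
  id = rep(c(7000, 7001), each = 10),
  RT = c(225, 250, 253, 260, 271, 280, 292, 301, 315, 330,
         189, 201, 225, 231, 240, 248, 259, 263, 270, 288)
)

# Rank-based binning (hypothetical helper): the fastest 20% of a
# person's RTs get bin 1, the next 20% get bin 2, and so on.
# ties.method = "first" gives every trial a distinct rank, so
# repeated RTs do not collapse into one bin.
quintile_bin <- function(rt, n_bins = 5) {
  ceiling(n_bins * rank(rt, ties.method = "first") / length(rt))
}

# ave() applies the function within each id and returns the
# results in the original row order.
dat$Bin <- ave(dat$RT, dat$id, FUN = quintile_bin)

# Mean RT per id and bin.
agg <- aggregate(RT ~ id + Bin, data = dat, FUN = mean)
```

When a person's trial count is not a multiple of 5, this rule gives bin sizes that differ by at most one trial. Check table(dat$id, dat$Bin) on your real data to confirm the split looks the way you expect.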