calculate median from data.table columns in R

Dan picture Dan · Jun 1, 2016 · Viewed 10.9k times · Source

I am trying to calculate a median value across a number of columns, however my data is a bit funky. It looks like the following example.

library(data.table)

dt <- data.table("ID" = c(1,2,3,4),"none" = c(0,5,5,3), 
                 "ten" = c(3,2,5,4),"twenty" = c(0,2,3,1))


   ID none ten twenty
1:  1    0   3      0
2:  2    5   2      2
3:  3    5   5      3
4:  4    3   4      1

In the table to column represents the number of occurrences of that value. I am wanting to calculate the median occurrence.

For example for ID = 1

median(c(10, 10, 10))

is the calculation I am wanting to create.

for ID = 2

median(c(0, 0, 0, 0, 0, 10, 10, 20, 20))

I have tried using rep() and lapply() with very limited success and am after some clear guidance on how this might be achieved. I understand for the likes of rep() I would be having to hard code my value to be repeated (e.g. rep(0,2) or rep(10,2)) and this is what I expect. I am just struggling to create a list or vector with the repetitions from each column.

Answer

Arun picture Arun · Jun 2, 2016

Here's another data.table way (assuming unique ID):

dt[, median(rep(c(0, 10, 20), c(none, ten, twenty))), by=ID]
#    ID V1
# 1:  1 10
# 2:  2  0
# 3:  3 10
# 4:  4 10

This is just an attempt to get @eddi's answer without reshaping (which I tend to use as a last resort).