I have data where many values in column x have been split into separate rows because of inconsistent case. I would like to merge these rows, ignoring case, and sum the values in the other columns (y and z).
I have a dataframe like:
x y z
rain 2 40
Rain 4 50
RAIN 7 25
Wind 8 10
Snow 3 9
SNOW 11 25
I want a Dataframe like:
x y z
Rain 13 115
Wind 8 10
Snow 14 34
You could convert the first column to lower case and then aggregate.
Option 1: base R's aggregate()
with(df, aggregate(list(y = y, z = z), list(x = tolower(x)), sum))
# x y z
# 1 rain 13 115
# 2 snow 14 34
# 3 wind 8 10
Alternatively, the formula method could also be used.
aggregate(. ~ x, transform(df, x = tolower(x)), sum)
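Both calls above return lower-case labels, whereas the desired output shows "Rain", "Wind" and "Snow". If that capitalisation matters, one possible approach is to normalise the labels to "first letter upper case" before aggregating. The following is only a sketch: cap_first is a hypothetical helper (not part of base R), and df is rebuilt here so the example runs on its own.

```r
# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# Hypothetical helper: lower-case everything, then upper-case the first letter
cap_first <- function(s) {
  s <- tolower(as.character(s))
  paste0(toupper(substring(s, 1, 1)), substring(s, 2))
}

# Aggregate on the normalised labels; groups come back sorted alphabetically
res <- aggregate(. ~ x, transform(df, x = cap_first(x)), sum)
res
```

Note that aggregate() sorts the groups, so the rows appear as Rain, Snow, Wind rather than in order of first appearance.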
Option 2: data.table. This also keeps the groups in order of first appearance, which matches the order in your desired result.
library(data.table)
as.data.table(df)[, lapply(.SD, sum), by = .(x = tolower(x))]
# x y z
# 1: rain 13 115
# 2: wind 8 10
# 3: snow 14 34
To order the result, use keyby instead of by.
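As a minimal sketch of the keyby variant (assuming the data.table package is installed, and rebuilding the data as dt so the example stands alone), the grouping expression stays the same and only the argument name changes:

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# keyby groups exactly like by, but also sorts the result by the grouping column
res <- dt[, lapply(.SD, sum), keyby = .(x = tolower(x))]
res
```

The result is additionally keyed on x, which can be handy if it is joined to other tables later.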
Option 3: base R's xtabs()
xtabs(cbind(y = y, z = z) ~ tolower(x), df)
#
# tolower(x) y z
# rain 13 115
# snow 14 34
# wind 8 10
Note that this returns a table rather than a data frame (probably not what you want, but worth noting), and I have yet to determine how to change the name on the x result.
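One possible workaround for the naming issue, offered as a sketch rather than a definitive fix: since xtabs() appears to label the dimension after the term on the right-hand side of the formula, lower-casing the column beforehand with transform() lets the term be plain x. The table can then be turned back into a data frame with as.data.frame.matrix(). The names tab and out are purely illustrative.

```r
# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  x = c("rain", "Rain", "RAIN", "Wind", "Snow", "SNOW"),
  y = c(2L, 4L, 7L, 8L, 3L, 11L),
  z = c(40L, 50L, 25L, 10L, 9L, 25L)
)

# Lower-case x first, so the formula term (and the dimension label) is just "x"
tab <- xtabs(cbind(y, z) ~ x, transform(df, x = tolower(x)))
names(dimnames(tab))[1]

# Convert the two-way table to a data frame and move the row names into a column
out <- as.data.frame.matrix(tab)
out <- data.frame(x = rownames(out), out, row.names = NULL)
out
```

This costs an extra conversion step compared with aggregate(), so it is mainly useful if the table form is wanted along the way.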
Data:
df <- structure(list(x = structure(c(1L, 2L, 3L, 6L, 4L, 5L), .Label = c("rain",
"Rain", "RAIN", "Snow", "SNOW", "Wind"), class = "factor"), y = c(2L,
4L, 7L, 8L, 3L, 11L), z = c(40L, 50L, 25L, 10L, 9L, 25L)), .Names = c("x",
"y", "z"), class = "data.frame", row.names = c(NA, -6L))