If I wanted to sum over some variables in a data frame using dplyr, I could do:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> select(iris, starts_with('Petal')) %>% rowSums()
[1] 1.6 1.6 1.5 1.7 1.6 2.1 1.7 1.7 1.6 1.6 1.7 1.8 1.5 1.2 1.4 1.9 1.7 1.7 2.0 1.8 1.9 1.9 1.2 2.2 2.1 1.8 2.0 1.7 1.6 1.8 1.8 1.9 1.6 1.6 1.7 1.4
[37] 1.5 1.5 1.5 1.7 1.6 1.6 1.5 2.2 2.3 1.7 1.8 1.6 1.7 1.6 6.1 6.0 6.4 5.3 6.1 5.8 6.3 4.3 5.9 5.3 4.5 5.7 5.0 6.1 4.9 5.8 6.0 5.1 6.0 5.0 6.6 5.3
[73] 6.4 5.9 5.6 5.8 6.2 6.7 6.0 4.5 4.9 4.7 5.1 6.7 6.0 6.1 6.2 5.7 5.4 5.3 5.6 6.0 5.2 4.3 5.5 5.4 5.5 5.6 4.1 5.4 8.5 7.0 8.0 7.4 8.0 8.7 6.2 8.1
[109] 7.6 8.6 7.1 7.2 7.6 7.0 7.5 7.6 7.3 8.9 9.2 6.5 8.0 6.9 8.7 6.7 7.8 7.8 6.6 6.7 7.7 7.4 8.0 8.4 7.8 6.6 7.0 8.4 8.0 7.3 6.6 7.5 8.0 7.4 7.0 8.2
[145] 8.2 7.5 6.9 7.2 7.7 6.9
That's fine, but I would have thought rowwise() would accomplish the same thing. It doesn't:
> select(iris, starts_with('Petal')) %>% rowwise() %>% sum()
[1] 743.6
What I particularly want to do is select a set of columns and create a new variable, each value of which is the maximum of the selected columns in that row. For example, if I selected the "Petal" columns, the maximum values would be 1.4, 1.4, 1.3 and so on.
I could do it like this:
> select(iris, starts_with('Petal')) %>% apply(1, max)
and that's fine. But I'm just curious as to why the rowwise() approach doesn't work. I realize I am using rowwise() incorrectly; I'm just not sure why it is wrong.
The problem is that the pipe passes the entire data frame as dot (.) to sum(), despite the rowwise(). The row grouping is only respected by dplyr verbs, and sum() is not one of them. To handle this, use do(), which will interpret dot as meaning just the current row. One further problem is that the dot within do() represents the row as a list, so convert it appropriately.
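library(dplyr)
iris %>%
  slice(1:6) %>%                     # first six rows, to keep the output short
  select(starts_with('Petal')) %>%
  rowwise() %>%
  # inside do(), dot is the current row as a list, so convert it to a
  # one-row data frame before adding the sum column
  do( (.) %>% as.data.frame %>% mutate(sum = sum(.)) ) %>%
  ungroup()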
giving:
# A tibble: 6 x 3
  Petal.Length Petal.Width   sum
*        <dbl>       <dbl> <dbl>
1         1.40       0.200  1.60
2         1.40       0.200  1.60
3         1.30       0.200  1.50
4         1.50       0.200  1.70
5         1.40       0.200  1.60
6         1.70       0.400  2.10
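To see the first point concretely, here is a quick check (a sketch using only the built-in iris data; the variable names are just illustrative). It suggests that sum(), being a base R function rather than a dplyr verb, never sees the row grouping: it receives the whole rowwise data frame and adds up every cell, exactly as it would without rowwise().

```r
library(dplyr)

# Keep only the two Petal columns of the built-in iris data.
petals <- select(iris, starts_with("Petal"))

# sum() is a base R function, not a dplyr verb, so the rowwise grouping
# is ignored: the whole data frame arrives as a single argument.
total_rowwise <- petals %>% rowwise() %>% sum()
total_plain <- sum(petals)

# Both are the grand total over every cell.
print(c(total_rowwise, total_plain))

# rowSums(), by contrast, returns one value per row.
row_totals <- rowSums(petals)
print(head(row_totals))
```

The two totals agree, which is why the rowwise() call in the question appeared to have no effect.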
Since this was asked, dplyr 1.0 was released, and it has cur_data(), which can be used to simplify the above, eliminating the need for do(). cur_data() within a rowwise block refers only to the current row.
iris %>%
  slice(1:6) %>%
  select(starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(cur_data())) %>%
  ungroup()
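For the row maximum that the question actually asks about, one further option is c_across(), which was also introduced in dplyr 1.0 and combines the selected columns of the current row into a single vector. The following is a sketch rather than part of the approach above; the names row_max and petal_max are just illustrative choices. As far as I know, cur_data() has been deprecated since dplyr 1.1.0 in favor of pick(), so c_across() may also be the safer spelling on recent versions.

```r
library(dplyr)

row_max <- iris %>%
  slice(1:6) %>%                       # first six rows, as above
  select(starts_with("Petal")) %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into a
  # vector, so max() is computed per row rather than over the whole frame
  mutate(petal_max = max(c_across(everything()))) %>%
  ungroup()

print(row_max)
```

Replacing max() with sum() in the same pattern should reproduce the row sums shown earlier.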