I have a dataframe that looks like this:
Store Temperature Unemployment Sum_Sales
1 1 42.31 8.106 1643691
2 1 38.51 8.106 1641957
3 1 39.93 8.106 1611968
4 1 46.63 8.106 1409728
5 1 46.50 8.106 1554807
6 1 57.79 8.106 1439542
What I can't figure out in R is how to group by and apply. So for each store (grouped), I want to normalize/scale two columns (sum_sales and temperature).
Desired output that I want is the following:
Store Temperature Unemployment Sum_Sales
1 1 1.000 8.106 1.00000
2 1 0.000 8.106 0.94533
3 1 0.374 8.106 0.00000
4 2 0.012 8.106 0.00000
5 2 0.000 8.106 1.00000
6 2 1.000 8.106 0.20550
Here is the normalizing function that I created:
normalit<-function(m){
(m - min(m))/(max(m)-min(m))
}
I'm using the dply package and can't seem to figure out how to group by and apply that function to a column. I tried something like this and get an error:
df2 <- df %.%
group_by('Store') %.%
summarise(Temperature = normalit(Temperature), Sum_Sales = normalit(Sum_Sales)))
Any suggestions/help would be greatly appreciated. Thanks.
The issue is that you are using the wrong dplyr verb. Summarize will create one result per group per variable. What you want is mutate. Mutate changes variables and returns a result of the same length as the original. See http://cran.rstudio.com/web/packages/dplyr/vignettes/dplyr.html. Below two approaches using dplyr:
df %>%
group_by(Store) %>%
mutate(Temperature = normalit(Temperature), Sum_Sales = normalit(Sum_Sales))
df %>%
group_by(Store) %>%
mutate_each(funs(normalit), Temperature, Sum_Sales)
Note: The Store variable is different between your data and desired result. I assumed that @jlhoward got the right data.