To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working if gsub() is not at the beginning of the pipe but occurs later. (Note: this question is not concerned with proper import, but with data cleaning.)
Simple example:
df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C"))
Column A contains character values (numbers in this case, but they could also be arbitrary strings) that need to be cleaned. The steps are:
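library(stringr)  # for str_trim()
df$D <- gsub("\\.", "", df$A)             # drop the thousands separators
df$D <- str_trim(df$D)                    # strip surrounding whitespace
df$D <- as.numeric(gsub(",", ".", df$D))  # decimal comma -> decimal point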
One could easily pipe this:
df$D <- gsub("\\.","",df$A) %>%
str_trim() %>%
as.numeric(gsub(",", ".")) %>%
The problem is the second gsub(), because it asks for its input, which is actually the result of the previous line.
Please, could anyone explain how to use functions like gsub() further down the pipeline? Thanks a lot!
system: R 3.2.3, Windows
Try this:
library(stringr)
df$D <- df$A %>%
{ gsub("\\.","", .) } %>%
str_trim() %>%
{ as.numeric(gsub(",", ".", .)) }
With a pipe, your data are passed as the first argument to the next function, so if you want to use them somewhere else you need to wrap the next call in {} and use . as a data "marker".
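As a small sketch of when the braces matter, assuming magrittr's %>% (the pipe that stringr and dplyr re-export): when the dot appears as a direct argument of the call, magrittr uses it instead of inserting the data as the first argument, so the braces are optional for the first gsub(). When the dot is nested inside another call, as in as.numeric(gsub(...)), the data would still be inserted as the first argument of as.numeric() as well, so the braces are needed there. The vector name a and the use of base trimws() in place of str_trim() are choices made for this illustration only.

```r
library(magrittr)  # provides %>%; stringr is not needed for this sketch

# Same values as column A in the question (illustrative standalone vector)
a <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# The dot is a direct argument of gsub(), so it replaces the default
# first-argument insertion and no braces are required for this step.
step1 <- a %>% gsub("\\.", "", .)

# In the last step the dot is nested inside gsub(), which is itself an
# argument of as.numeric(); the braces stop the data from also being
# inserted as the first argument of as.numeric().
d <- a %>%
  gsub("\\.", "", .) %>%
  trimws() %>%                         # base R equivalent of str_trim() here
  { as.numeric(gsub(",", ".", .)) }

print(d)
```

Wrapping every step in braces, as in the answer above, also works and is arguably easier to read when several steps use the dot.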