R: combine several gsub() functions in a pipe

user2006697 · Oct 12, 2016

To clean some messy data I would like to start using pipes (%>%), but I cannot get the R code to work when gsub() is not at the beginning of the pipe and has to occur later. (Note: this question is not about proper import, but about data cleaning.)

Simple example:

df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C"))

Column A contains character data (numbers in this case, but it could also be arbitrary strings) and needs to be cleaned. The steps are:

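library(stringr)                           # for str_trim()
df$D <- gsub("\\.", "", df$A)              # drop the thousands separator
df$D <- str_trim(df$D)                     # strip surrounding whitespace
df$D <- as.numeric(gsub(",", ".", df$D))   # decimal comma to point, then convert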

It looks as if this could easily be piped:

df$D  <-  gsub("\\.","",df$A) %>%
          str_trim() %>%
          as.numeric(gsub(",", "."))

The problem is the second gsub(): it asks for its input, which should actually be the result of the previous line.

Please, could anyone explain how to use functions like gsub() further down the pipeline? Thanks a lot!

system: R 3.2.3, Windows

Answer

m-dz · Oct 12, 2016

Try this:

library(stringr)

df$D <- df$A %>%
  { gsub("\\.","", .) } %>%
  str_trim() %>%
  { as.numeric(gsub(",", ".", .)) }

With the pipe, your data are passed as the first argument to the next function, so if you want to use them somewhere else you need to wrap that step in {} and use . as a placeholder ("marker") for the data.
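
Two alternatives that avoid the braces may also be worth a look. This is only a sketch, assuming %>% comes from magrittr and that stringr is installed. The plain vector A stands in for df$A, and the names d1 and d2 are made up for illustration. The first variant relies on magrittr not inserting the data as the first argument when . appears as a direct argument of the call, which is why the nested as.numeric(gsub(...)) is split into two steps. The second uses stringr's str_replace_all(), which takes the string first, so no placeholder is needed at all.

```r
library(magrittr)  # assumed source of %>%
library(stringr)   # str_trim(), str_replace_all(), fixed()

# Stand-in for df$A from the question
A <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# Variant 1: `.` is a direct argument of gsub(), so no braces are needed;
# the nested as.numeric(gsub(...)) is split into two pipe steps
d1 <- A %>%
  gsub("\\.", "", .) %>%
  str_trim() %>%
  gsub(",", ".", .) %>%
  as.numeric()

# Variant 2: stringr functions take the string as their first argument;
# fixed() matches "." and "," literally instead of as regular expressions
d2 <- A %>%
  str_replace_all(fixed("."), "") %>%
  str_trim() %>%
  str_replace_all(fixed(","), ".") %>%
  as.numeric()
```

Both variants should give the same numeric vector as the braces version above. Which one reads better is largely a matter of taste.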