Removing Whitespace From a Whole Data Frame in R

Thirst for Knowledge picture Thirst for Knowledge · Dec 24, 2013 · Viewed 72.3k times · Source

I've been trying to remove the white space that I have in a data frame (using R). The data frame is large (>1gb) and has multiple columns that contains white space in every data entry.

Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:

gsub( " ", "", mydata) 

This didn't seem to work, although R returned an output which I have been unable to interpret.

str_replace( " ", "", mydata)

R returned 47 warnings and did not remove the white space.

erase_all(mydata, " ")

R returned an error saying 'Error: could not find function "erase_all"'

I would really appreciate some help with this as I've spent the last 24hrs trying to tackle this problem.

Thanks!

Answer

Adam picture Adam · Mar 9, 2019

A lot of the answers are older, so here in 2019 is a simple dplyr solution that will operate only on the character columns to remove trailing and leading whitespace.

library(dplyr)
library(stringr)

data %>%
  mutate_if(is.character, str_trim)

## ===== 2020 edit for dplyr (>= 1.0.0) =====
df %>% 
  mutate(across(where(is.character), str_trim))

You can switch out the str_trim() function for other ones if you want a different flavor of whitespace removal.

# for example, remove all spaces
df %>% 
  mutate(across(where(is.character), str_remove_all, pattern = fixed(" ")))