How can I remove non-numeric characters from strings using gsub in R?

jair.jr · Oct 9, 2018 · Viewed 20.4k times

I use the gsub function in R to strip unwanted characters from numbers stored as strings. I need to remove every character that is not a digit, ., or -. My problem is that my regular expression does not remove some non-numeric characters, such as d, +, and <.

Below are my regular expression, the gsub execution, and its output. How can I change the regular expression in order to achieve the desired output?

Current output:

gsub(pattern = '[^(-?(\\d*\\.)?\\d+)]', replacement = '', x = c('1.2<', '>4.5', '3+.2', '-1d0', '2aadddab2','1.3h'))
[1] "1.2<"  ">4.5"  "3+.2"  "-1d0"  "2ddd2" "1.3"

Desired output:

[1] "1.2"  "4.5"  "3.2"  "-10"  "22" "1.3"

Thank you.

Answer

Andre Elrico · Oct 10, 2018

Simply use

gsub("[^0-9.-]", "", x)
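To see this on the question's data, here is a minimal sketch that applies the pattern to the same input vector. In the question's pattern the whole expression sits inside [^...], so the parentheses, ?, * and + are read as class members or range endpoints rather than as grouping and quantifiers, which is consistent with +, < and > surviving. The variable names cleaned and values are only illustrative.

```r
# Input vector copied from the question.
x <- c('1.2<', '>4.5', '3+.2', '-1d0', '2aadddab2', '1.3h')

# [^0-9.-] matches any single character that is not a digit, a dot, or a hyphen.
# The hyphen is placed last inside the brackets so it is taken literally,
# not as the middle of a range.
cleaned <- gsub("[^0-9.-]", "", x)
print(cleaned)
# [1] "1.2" "4.5" "3.2" "-10" "22"  "1.3"

# Optional: convert to numeric once the strings are clean.
values <- as.numeric(cleaned)
```

The conversion with as.numeric is an extra step that the question does not ask for; it is shown only because the cleaned strings are usually meant to become numbers.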

If a string can contain more than one - or ., you can handle that with a second regular expression. If you struggle with it, open a new question.
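One possible shape for that second pass is sketched below. It assumes that only a leading - and the first . should survive, which is a guess at the intended rule and not something the question specifies. The helper name keep_first_sign_and_dot and the sample vector x2 are hypothetical.

```r
# Hypothetical helper: assumes only a leading "-" and the first "." are meaningful.
keep_first_sign_and_dot <- function(s) {
  # Pass 1: drop everything except digits, dots, and hyphens.
  s <- gsub("[^0-9.-]", "", s)
  # Pass 2: keep a hyphen only at the very start; the capture group is empty
  # for every other hyphen, so those are replaced by nothing.
  s <- gsub("^(-)|-", "\\1", s, perl = TRUE)
  # Pass 3: keep the text up to and including the first dot, drop later dots.
  gsub("^([^.]*\\.)|\\.", "\\1", s, perl = TRUE)
}

# Made-up inputs with repeated signs and dots.
x2 <- c('1.2.3', '-4-5', '--6.7.', '8-.9x')
cleaned2 <- keep_first_sign_and_dot(x2)
# cleaned2 is c("1.23", "-45", "-6.7", "8.9")
```

Whether a stray . or - should be dropped, as here, or should instead mark the value as invalid depends on the data, so treat this rule as one option among several.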


(Make sure to replace . with , in the pattern if your numbers use a decimal comma.)
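A minimal sketch of the decimal-comma variant, assuming input where , is the decimal separator. The vector y and the other variable names are made up for illustration.

```r
# Hypothetical input that uses a decimal comma.
y <- c('1,2<', '>4,5', '-1d0')

# Same character class as before, with the comma allowed instead of the dot.
cleaned_comma <- gsub("[^0-9,-]", "", y)
# cleaned_comma is c("1,2", "4,5", "-10")

# as.numeric() expects a dot, so swap the comma before converting.
values_comma <- as.numeric(sub(",", ".", cleaned_comma, fixed = TRUE))
```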