I'm working on a project that involves cleaning a list of data on college majors. A lot of the entries are misspelled, so I was hoping to use the function gsub() to replace the misspelled ones with their correct spellings. For example, say 'biolgy' is a misspelling in a list of majors called Major. How can I get R to detect the misspelling and replace it with the correct spelling? I've tried gsub('biol', 'Biology', Major), but that only replaces the first four letters of 'biolgy' (giving 'Biologygy'). If I use gsub('biolgy', 'Biology', Major), it works for that case alone, but it doesn't catch other misspellings of 'biology'.
Thank you!
You should either define a suitable regular expression or use agrep from the base package. The stringr package is another option, and I know people use it, but I'm a huge fan of regular expressions, so it's a no-no for me.
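For the regular-expression route, here is a minimal sketch (the contents of Major are made up for illustration). gsub('biol', 'Biology', Major) leaves 'gy' behind because gsub() replaces only the text the pattern matched; anchoring the pattern with ^ and letting .* consume the rest of the entry replaces the whole string instead.

```r
# Hypothetical example data (not from the original question)
Major <- c("biolgy", "Biology", "biologgy", "Chemistry")

# Unanchored pattern: only the matched "biol" is replaced,
# so "biolgy" becomes "Biologygy"
broken <- gsub("biol", "Biology", Major)

# Anchored pattern that matches the whole entry: any entry that starts
# with "biol" (in any case) is replaced by "Biology"
fixed <- sub("^biol.*$", "Biology", Major, ignore.case = TRUE)
print(fixed)
```

This assumes every entry starting with "biol" means Biology. It would also rewrite something like "Biological Engineering", and it still misses misspellings in the first four letters, such as "bilogy".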
Anyway, agrep should do the trick:
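```r
# agrep(pattern, x) returns the indices of the elements of x
# that approximately match pattern
agrep("biol", "biology")
#> [1] 1
agrep("biolgy", "biology")
#> [1] 1
```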
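To go from detection to replacement, a minimal sketch with a made-up Major vector: agrepl(), the logical counterpart of agrep() that is also in base R, flags the entries that approximately match the correct spelling, and those entries are then overwritten.

```r
# Hypothetical vector of majors (not from the original question)
Major <- c("biolgy", "biology", "bilogy", "chemistry", "biologgy")

# TRUE for entries that approximately contain "biology"; with the default
# max.distance = 0.1 a 7-letter pattern allows at most 1 edit
is_biology <- agrepl("biology", Major)

# Overwrite every approximate match with the correct spelling
Major[is_biology] <- "Biology"
print(Major)
```

Passing value = TRUE to agrep() instead returns the matching entries themselves, which is handy for eyeballing what will be replaced before overwriting anything.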
EDIT: You should also pass ignore.case = TRUE, but be prepared to do some bookkeeping "by hand": approximate matching can catch entries you didn't intend and miss ones you did.
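One way to organize that bookkeeping, sketched under assumptions: you keep a hypothetical vector canonical of correct spellings, an entry is replaced only when exactly one canonical name matches it, and the result is laid out in a table for checking by hand. The names canonical, hits, best and review are illustrative, not part of any package.

```r
# Hypothetical canonical spellings and raw entries (not from the original post)
canonical <- c("Biology", "Chemistry", "Physics")
Major <- c("biolgy", "Chemisty", "phisics", "Biology", "Marine Biology", "History")

# Logical matrix: one row per entry, one column per canonical name;
# TRUE where the canonical name approximately matches the entry
hits <- vapply(canonical,
               function(p) agrepl(p, Major, ignore.case = TRUE),
               logical(length(Major)))

# Use the canonical name only when exactly one candidate matched;
# entries with zero or several matches are left unchanged
n_hits <- rowSums(hits)
best <- apply(hits, 1, function(r) if (sum(r) == 1) names(r)[r] else NA_character_)
Major_clean <- ifelse(is.na(best), Major, best)

# Side-by-side table for reviewing the result by hand
review <- data.frame(original = Major, cleaned = Major_clean, n_hits = n_hits)
print(review)
```

In this made-up data "Marine Biology" is also mapped to "Biology", because agrep matches substrings of each entry; that is exactly the kind of row the review table is meant to catch. "History" matches nothing and stays as it is.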