Remove Unicode characters from text files - sed, other Bash/shell methods

alvas · Dec 19, 2011 · Viewed 83.2k times

How do I remove Unicode characters from a bunch of text files in the terminal?

I've tried this, but it didn't work:

sed 'g/\u'U+200E'//' -i *.txt

I need to remove these Unicode characters from the text files:

U+0091 - a C1 control character that shows up as a weird "control" space
U+0092 - another C1 control character, same sort of weird "control" space
U+00A0 - non-breaking space
U+200E - left-to-right mark

Answer

kev · Dec 19, 2011

Strip all non-ASCII characters from file.txt (both commands print the result to standard output and leave the file itself unchanged):

$ iconv -c -f utf-8 -t ascii file.txt
$ strings file.txt