Removing all special characters from a string in Bash

Marta Koprivnik · Apr 29, 2016

I have a lot of text in lowercase. The only problem is that it contains many special characters, which I want to remove, along with all numbers.

The following command is not strong enough:

tr -cd '[:alpha:]\n '

For characters such as éćščž and some others, it outputs "?", but I want to remove all of them. Is there a stronger command?

I use Linux Mint, with Bash 4.3.8(1)-release.

Answer

Inian · Apr 29, 2016

You can use tr to keep only the printable characters of a string. Run the following command on your input file:

tr -cd "[:print:]\n" < file1   
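As a quick illustration, here is the same command applied to a small, made-up sample piped in with printf instead of read from file1. It sets LC_ALL=C as an assumption, so that tr treats every byte separately: the bytes of UTF-8 letters such as č or é are all 0x80 or above and are not printable in the C locale, so they are deleted.

```shell
# Hypothetical sample: lowercase text with accented letters, digits and punctuation.
# LC_ALL=C (an assumption) makes [:print:] mean ASCII 0x20-0x7E, so every byte
# of the multibyte UTF-8 letters is deleted; newlines are kept via \n in the set.
printf 'čaša vode 42!\nžaba éćščž ok\n' |
  LC_ALL=C tr -cd '[:print:]\n'
# Output:
# aa vode 42!
# aba  ok
```

Note that the digits and the "!" survive, because [:print:] includes them.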

The -d flag deletes the characters in the set given as the argument from the input stream, and -c complements that set (inverts what is provided). Without -c, the command would delete all printable characters from the input stream; with it, the command instead deletes every character that is not printable. We also include the newline character \n in the set to preserve the line endings of the input file. Without it, the output would be one long line.
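The three cases described above can be compared side by side on a tiny, made-up two-line input. This is a sketch that again assumes LC_ALL=C, so that the two UTF-8 bytes of é (0xC3 0xA9) count as non-printable.

```shell
# Hypothetical two-line sample containing one non-ASCII letter.
sample() { printf 'ab é\ncd\n'; }

# -d alone: deletes every printable character and newline,
# leaving only the two bytes of é (shown in hex by od).
sample | LC_ALL=C tr -d '[:print:]\n' | od -An -tx1

# -c -d: deletes the complement (non-printable, non-newline bytes),
# so é disappears and the line structure is preserved.
sample | LC_ALL=C tr -cd '[:print:]\n'

# Same, but without \n in the set: newlines are deleted as well,
# so both lines are joined into "ab cd".
sample | LC_ALL=C tr -cd '[:print:]'
echo  # print a final newline so the prompt starts on a fresh line
```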

[:print:] is a POSIX character class that combines [:alnum:], [:punct:] and the space character. In the C/POSIX locale, [:alnum:] is the same as [0-9A-Za-z], and [:punct:] includes the characters ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
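Because [:print:] still contains the digits of [:alnum:] and everything in [:punct:], it keeps the numbers that the question also wanted removed. A minimal variant, assuming the goal is plain ASCII lowercase text (the question says the input is lowercase), narrows the kept set to [:lower:], space and newline, again under LC_ALL=C:

```shell
# Keep only ASCII lowercase letters, spaces and newlines.
# Under LC_ALL=C (an assumption), [:lower:] is exactly a-z, so digits,
# punctuation and every byte of multibyte UTF-8 letters are deleted.
printf 'čaša vode 42!\nžaba éćščž ok\n' |
  LC_ALL=C tr -cd '[:lower:] \n'
```

The space that preceded a removed token is kept, so "vode 42!" becomes "vode " with a trailing space; squeezing or trimming spaces would need an extra step.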