Removing punctuation and tabs with sed

I0_ol · Feb 8, 2017

I am using the following to remove punctuation and tabs and to convert uppercase text to lowercase in a text file.

sed 's/[[:punct:]]//g' "$HOME/file.txt" | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]'

Do I need to use these two separate sed commands to remove punctuation and tabs or can this be done with a single sed command?

Also, could someone explain what the $ is doing in the second sed command? Without it the command doesn't remove tabs. I looked in the man page but I didn't see anything that mentioned this.

The input file looks like this:

Pochemu oni ne v shkole?
Kto tam?
Otkuda eto moloko?
Chei chai ona p’et?
    Kogda vy chitaete?
    Kogda ty chitaesh’?

Answer

Inian · Feb 8, 2017

You don't need separate sed commands: a single sed with multiple -e expressions can do all three jobs. With FreeBSD sed:

sed -e $'s/\t//g' -e 's/[[:punct:]]//g' -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' file

Here y is sed's transliterate command (not a quantifier). The FreeBSD man page describes it as:

[2addr]y/string1/string2/
      Replace all occurrences of characters in string1 in the pattern 
      space with the corresponding characters from string2.
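A minimal sketch of the FreeBSD-style command, assuming bash (for $'...'). It wraps the three expressions in a function, clean_text, a hypothetical name chosen only for this example, and feeds it a few of the question's sample lines with printf, assuming the indented lines start with a literal tab. It uses [[:punct:]] without a trailing \+: with the g flag every punctuation character is removed anyway, and \+ is a GNU extension to basic regular expressions. Only y and a POSIX bracket expression are used, so it should behave the same under GNU sed.

```shell
#!/usr/bin/env bash
# clean_text is a hypothetical helper name used only in this sketch.
# 1) $'s/\t//g'  : bash turns \t into a real tab, so sed deletes tabs
# 2) [[:punct:]] : delete every punctuation character (g = all on the line)
# 3) y/A-Z/a-z/  : transliterate each uppercase letter to its lowercase twin
clean_text() {
  sed -e $'s/\t//g' \
      -e 's/[[:punct:]]//g' \
      -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' \
      "$@"
}

# A few lines from the question's input; the last one starts with a tab.
printf 'Pochemu oni ne v shkole?\nKto tam?\n\tKogda vy chitaete?\n' | clean_text
```

Passing a filename, as in clean_text "$HOME/file.txt", also works, because "$@" forwards it to sed.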

With GNU sed you can instead use the \L escape in the replacement text, a GNU extension that converts what follows it to lowercase:

sed -e $'s/\t//g' -e 's/[[:punct:]]\+//g' -e 's/./\L&/g' file
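A sketch of the GNU variant, assuming GNU sed and bash; lower_clean is a hypothetical helper name. The answer's s/./\L&/g lowercases one character at a time. Because \L lowercases the rest of the replacement until a \U or \E appears, a single s/.*/\L&/ per line should give the same result.

```shell
#!/usr/bin/env bash
# GNU sed only: \+ (one or more) and \L (lowercase) are GNU extensions.
# lower_clean is a hypothetical helper name used only in this sketch.
lower_clean() {
  sed -e $'s/\t//g' \
      -e 's/[[:punct:]]\+//g' \
      -e 's/.*/\L&/' \
      "$@"
}

# Sample lines from the question; the second one starts with a tab.
printf 'Otkuda eto moloko?\n\tKogda vy chitaete?\n' | lower_clean
```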

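$'...' is bash's ANSI-C quoting, which is why the sed man page doesn't mention it (it is documented under QUOTING in the bash man page). The shell replaces escape sequences such as \t with the actual characters before sed runs, so sed receives a real tab. Without the $, sed gets the two characters \ and t; GNU sed interprets \t as a tab, but some implementations, such as the BSD sed on macOS, do not.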
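A small sketch, assuming bash, of what each quoting style hands to sed: the plain single-quoted script is 7 characters long (backslash and t), while the $'...' version is 6 (one tab). Since the tab is a literal character by the time sed runs, it can also go inside the bracket expression next to [[:punct:]], so a single s command removes both; strip_punct_tabs is a hypothetical name for this example.

```shell
#!/usr/bin/env bash
plain='s/\t//g'    # sed receives: s / \ t / / g
ansi=$'s/\t//g'    # sed receives: s / <TAB> / / g
echo "${#plain} ${#ansi}"   # prints: 7 6

# The literal tab joins [[:punct:]] in one bracket expression,
# so one s command deletes punctuation and tabs together.
strip_punct_tabs() {
  sed $'s/[[:punct:]\t]//g' "$@"
}

printf 'Kto tam?\n\tKogda vy chitaete?\n' | strip_punct_tabs
```

Lowercasing can still be done afterwards with tr '[:upper:]' '[:lower:]', as in the question.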