Remove dot(.) from specific columns using gsub and awk

jamie · Sep 26, 2013 · Viewed 8.8k times

I want to remove the dot (.) only from the 4th and 5th columns of the table.

input
1    10057   .       A       AC      
1    10146   .       AC.      A       
1    10177   .       A       AC      
1    10230   .       AC      .A,AN    
1    10349   .       CCCTA   C,CCCTAA.              
1    10389   .       .AC      A,AN



desired output
1    10057   .       A       AC      
1    10146   .       AC      A       
1    10177   .       A       AC      
1    10230   .       AC      A,AN    
1    10349   .       CCCTA   C,CCCTAA              
1    10389   .       AC      A,AN    

So I tried the following command.

awk 'BEGIN {OFS=FS="\t"} {gsub("\.","",$4);gsub("\.","",$5)}1' input

and I got this result (the whole 4th and 5th columns were emptied):

1    10057   .          
1    10146   .            
1    10177   .        
1    10230   .       
1    10349   .                 
1    10389   .       

Can you please point out what I need to change? Thanks in advance.

Answer

Ed Morton · Sep 26, 2013

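When you use a string to hold an RE (e.g. "\."), the string is parsed twice: once when the script is read by awk and again when the RE is used at execution time. The first pass consumes the backslash, so "\." typically reaches the RE engine as a bare ., which matches any character; that is why every character in your 4th and 5th columns was removed. The result is that you need to escape RE metacharacters twice (e.g. "\\.").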
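As a quick check of the double-escaping rule, here is a minimal sketch that runs the string form with "\\." on one row from the question. It assumes the columns are separated by real tabs; the shell variable names row and fixed are only for illustration.

```shell
# One sample row from the question, rebuilt with real tabs (printf expands \t).
row=$(printf '1\t10146\t.\tAC.\tA')

# "\\." in the awk source becomes the string \. after string parsing,
# which the RE engine then reads as a literal dot.
fixed=$(printf '%s\n' "$row" |
  awk 'BEGIN {OFS=FS="\t"} {gsub("\\.","",$4); gsub("\\.","",$5)} 1')

# The dot in column 3 is left alone; only columns 4 and 5 are cleaned.
printf '%s\n' "$fixed"
```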

The better solution in every way is to specify the RE not as a string but as an RE constant with the appropriate delimiters, e.g. /\./:

awk 'BEGIN {OFS=FS="\t"} {gsub(/\./,"",$4);gsub(/\./,"",$5)}1' input
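To try the RE-constant form without the original file, the sketch below feeds a few of the question's rows through awk from a shell variable. It assumes the data is tab-separated, and it uses a small loop over fields 4 and 5 instead of two separate gsub calls; that loop and the variable names input and cleaned are my own illustration, not part of the answer above.

```shell
# A few rows from the question, rebuilt with real tabs (printf expands \t and \n).
input=$(printf '1\t10146\t.\tAC.\tA\n1\t10230\t.\tAC\t.A,AN\n1\t10349\t.\tCCCTA\tC,CCCTAA.\n1\t10389\t.\t.AC\tA,AN\n')

# /\./ is an RE constant, so a single backslash is enough to match a literal dot.
# The loop applies the same substitution to fields 4 and 5 only.
cleaned=$(printf '%s\n' "$input" |
  awk 'BEGIN {OFS=FS="\t"} {for (i = 4; i <= 5; i++) gsub(/\./, "", $i)} 1')

printf '%s\n' "$cleaned"
```

The loop is only a convenience in case more columns need cleaning later; two explicit gsub calls on $4 and $5 behave the same way.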