Grep all instances of strings that start with certain characters

Stephopolis · Feb 22, 2013 · Viewed 22.8k times

I would like to grep out all instances of strings that start with the characters 'rs' (from just one file) and pipe the full string into a new file. I managed to get the count of the instances but I don't know how to get them into the new file:

grep -c rs < /home/Stephanie/this.txt
698572

An example of a line in the file is:

1203823    forward   efjdhgv   rs124054t8 dhdfhfhs
12045345    back   efjdkkjf   rs12445368 dhdfhfhs

I just want to grab the rs string and move it to a new file. Can someone help me out with the piping? I read around a bit, but what I found wasn't particularly helpful to me. Thanks.

Answer

biophonc · Feb 22, 2013

I'd suggest something like this:

grep -Eo "(\s(rs\S+))" data.txt | cut -d " " -f 2 > newfile.txt

\s matches a single whitespace character, so each match has to begin with whitespace

(rs\S+) then matches "rs" followed by one or more non-whitespace characters
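To see what the pattern captures on its own, here is a small sketch that runs it on a fragment of the question's second sample line and wraps each match in brackets so the whitespace is visible. It assumes GNU grep, since \s and \S are GNU extensions; the bracket-wrapping sed step is only for illustration.

```shell
# Feed one sample fragment to grep and print only the matched part (-o).
# sed wraps every output line in [ ] so a leading space shows up.
printf 'efjdkkjf   rs12445368 dhdfhfhs\n' \
  | grep -Eo '\srs\S+' \
  | sed 's/.*/[&]/'
# prints: [ rs12445368]
```

The match includes the one whitespace character consumed by \s, in front of the rs string.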

The results still contain the leading whitespace, which we don't want, so we "cut" it out before the content gets written to the new file.
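As a possible alternative (not part of the command above), the same extraction can be sketched with awk, which splits each line on any run of spaces or tabs and so needs no cleanup step. This may be useful if the columns are tab-separated, because cut -d " " only splits on a literal space and would leave a leading tab in place. The file names sample.txt and rs_ids.txt are made up for this example, and the sample input just mirrors the two lines from the question.

```shell
# Hypothetical sample input: the question's two lines, one using a tab.
printf '1203823\tforward   efjdhgv   rs124054t8 dhdfhfhs\n' > sample.txt
printf '12045345    back   efjdkkjf   rs12445368 dhdfhfhs\n' >> sample.txt

# Print every whitespace-separated field that begins with "rs", one per line.
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^rs/) print $i }' sample.txt > rs_ids.txt

cat rs_ids.txt
# prints:
# rs124054t8
# rs12445368
```

Because the test is anchored with ^, a field that merely contains "rs" somewhere in the middle is not picked up.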