Print lines in one file matching patterns in another file

Jon · Jan 27, 2014 · Viewed 36.5k times

I have a file with more than 40,000 lines (file1), and I want to extract the lines that match the patterns listed in file2 (about 6,000 lines). I use grep like this, but it is very slow:

    grep -f file2 file1 > out

Is there a faster way to do this using awk or sed?

Here are some extracts from my files:

File1:
scitn003869.2| scign003869 CGCATGTGTGCATGTATTATCGTATCCCTTG
scitn007747.1| scign007747  CACGCAGACGCAGTGGAGCATTCCAGGTCACAA
scitn003155.1| scign003155  TAAAAATCGTTAGCACTCGCTTGGTACACTAAC
scitn018252.1| scign018252  CGTGTGTGTGCATATGTGTGCATGCGTG
scitn004671.2| scign004671  TCCTCAGGTTTTGAAAGGCAGGGTAAGTGCT

File2:
scign000003
scign000004
scign000005
scign004671
scign000013


Answer

glenn jackman · Jan 27, 2014

Try grep -Fwf file2 file1 > out

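The -F option treats each line of file2 as a fixed string rather than a regular expression, so it should be faster because grep does not have to engage the regex engine. The -w option restricts matches to whole words, so an ID such as scign000003 will not match inside a longer identifier.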
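
Since the question also asks about awk, here is a minimal sketch of a hash-lookup alternative. It assumes the ID to match is always the second whitespace-separated field of file1, as in the sample extracts; the question does not guarantee that, so adjust the field number if your real data differs. The scratch directory and the sample files are only there to make the example self-contained.

```shell
# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
scitn003869.2| scign003869 CGCATGTGTGCATGTATTATCGTATCCCTTG
scitn007747.1| scign007747  CACGCAGACGCAGTGGAGCATTCCAGGTCACAA
scitn003155.1| scign003155  TAAAAATCGTTAGCACTCGCTTGGTACACTAAC
scitn018252.1| scign018252  CGTGTGTGTGCATATGTGTGCATGCGTG
scitn004671.2| scign004671  TCCTCAGGTTTTGAAAGGCAGGGTAAGTGCT
EOF

cat > file2 <<'EOF'
scign000003
scign000004
scign000005
scign004671
scign000013
EOF

# While reading file2 (NR==FNR is true only for the first file),
# store each ID as an array key and skip to the next line.
# While reading file1, print lines whose second field is a stored ID.
# Assumption: the ID is field 2 of file1.
awk 'NR==FNR { ids[$1]; next } $2 in ids' file2 file1 > out

cat out
```

Unlike grep -Fwf, this compares only the second field, so an ID that happened to appear elsewhere on a line would not produce a match. Each line of file1 costs one array lookup, whatever the size of file2.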