From the Unix terminal, we can use diff file1 file2 to find the differences between two files. Is there a similar command to show the similarity across two files? (Multiple pipes are allowed if necessary.)
Each file contains one sentence string per line; the files are sorted and duplicate lines have been removed with sort file1 | uniq.
file1: http://pastebin.com/taRcegVn
file2: http://pastebin.com/2fXeMrHQ
The output should contain the lines that appear in both files.
output: http://pastebin.com/FnjXFshs
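I am able to do it in Python as follows, but I think it's a little too much to put into the terminal:
# Collect the stripped lines of each file into a set
x = set(line.strip() for line in open('wn-rb.dic'))
y = set(line.strip() for line in open('wn-s.dic'))
# Lines present in both files
z = x.intersection(y)
# Open the output file for writing (the default mode is read-only)
with open('reverse-diff.out', 'w') as outfile:
    for line in z:
        print(line, file=outfile)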
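If you want a list of the repeated lines without resorting to AWK, you can use the -d flag of uniq. Since each file has already had its own duplicates removed, any line that appears more than once in the combined sorted output must occur in both files:
sort file1 file2 | uniq -d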
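As a rough illustration of the pipeline above, here is a minimal sketch on two small throwaway files. The helper name common_lines and the sample contents are made up for this example and are not taken from the pastebin files; the sketch also assumes, as in the question, that neither file contains duplicate lines of its own.

```shell
# Hypothetical helper (name is illustrative): print lines found in both files.
# Assumes each input file has no duplicate lines of its own; otherwise a line
# repeated inside a single file would also be reported.
common_lines() {
    sort "$1" "$2" | uniq -d
}

# Two small sample files with made-up contents.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1"
printf 'banana\ncherry\ndate\n' > "$tmpdir/file2"

common_lines "$tmpdir/file1" "$tmpdir/file2"
# prints:
# banana
# cherry
```

If a file might still contain internal duplicates, running sort -u on each file first would restore the assumption before combining them.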