Using linux command "sort -f | uniq -i" together for ignoring case

Question 1

linux sorting awk gawk uniq

Steve3p0 · Feb 23, 2013 · Viewed 11.7k times

Answer

You might keep it simple:

sort -uf
# where sort -u = keep only one line from each group of equal lines
#       sort -f = ignore case when comparing
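A minimal sketch of what the one-liner above does, assuming GNU sort and a made-up sample (the variable names sample and unique_ci are hypothetical). Note that it collapses case variants into a single line rather than listing them, so it answers "what are the unique entries?" and not "which entries are duplicated?".

```shell
# Hypothetical sample: three case variants of one line plus one distinct line.
sample='What are you doing?
WHAT ARE YOU DOING?
what are you doing?
Hello'

# -f folds lowercase to uppercase for comparison;
# -u keeps a single line from each run of lines that compare equal.
unique_ci=$(printf '%s\n' "$sample" | LC_ALL=C sort -uf)

printf '%s\n' "$unique_ci"
```

With this input the result should contain two lines: "Hello" and one of the three "what are you doing?" variants.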

Question 2

I am trying to find unique and duplicate data in a list of data with two columns. I really just want to compare the data in column 1.

The data might look like this (separated by a tab):

What are you doing?     Che cosa stai facendo?
WHAT ARE YOU DOING?     Che diavolo stai facendo?
what are you doing?     Qual è il tuo problema amico?

So I have been playing around with the following:

1. Sorting without ignoring case (just "sort", no -f option) gives me fewer duplicates:

gawk '{ FS = "\t" ; print $1 }' EN-IT_Corpus.txt | sort | uniq -i -D > dupes

2. Sorting with ignoring case ("sort -f") gives me more duplicates:

gawk '{ FS = "\t" ; print $1 }' EN-IT_Corpus.txt | sort -f | uniq -i -D > dupes

Am I right to think that #2 is more accurate if I want to find duplicates ignoring case, because it sorts it ignoring case first and then finds duplicates based on the sorted data?

As far as I know, I can't combine the sort and uniq commands into one, because sort doesn't have an option for displaying duplicates.

Thanks, Steve
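The difference between the two pipelines can be reproduced with a small sketch, assuming GNU coreutils (uniq -D is a GNU option) and the C locale, which is forced here so the ordering is predictable. uniq only compares adjacent lines, so it can only report case variants as duplicates if the sort placed them next to each other. The fourth sample row is made up so that something sorts between the variants; the variable names are hypothetical.

```shell
# Hypothetical tab-separated sample; the last row is invented for illustration.
sample=$(printf '%s\t%s\n' \
  'What are you doing?' 'Che cosa stai facendo?' \
  'WHAT ARE YOU DOING?' 'Che diavolo stai facendo?' \
  'what are you doing?' 'Qual è il tuo problema amico?' \
  'Where is it?' 'Dove si trova?')

# Keep column 1 only; cut splits on tabs by default.
col1=$(printf '%s\n' "$sample" | cut -f1)

# Plain sort in the C locale compares bytes, so uppercase sorts before
# lowercase and "Where is it?" lands between "What ..." and "what ...".
plain=$(printf '%s\n' "$col1" | LC_ALL=C sort | LC_ALL=C uniq -i -D)

# sort -f folds case first, so all three variants end up adjacent.
folded=$(printf '%s\n' "$col1" | LC_ALL=C sort -f | LC_ALL=C uniq -i -D)

printf 'sort    | uniq -i -D: %s lines\n' "$(printf '%s\n' "$plain" | wc -l)"
printf 'sort -f | uniq -i -D: %s lines\n' "$(printf '%s\n' "$folded" | wc -l)"
```

With this sample the plain sort reports two of the three variants and sort -f reports all three, which matches the intuition in the question that the sort has to ignore case for uniq -i to see every duplicate. In other locales plain sort may already group case variants together, so the gap can be smaller or absent. One more thing worth checking: assigning FS inside the awk action only takes effect from the next record, so the first line of the file is still split on whitespace; cut -f1 (as above) or gawk -F'\t' '{ print $1 }' avoids that.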
