1.gui Qxx 16
2.gu Qxy 23
3.guT QWS 18
4.gui Qxr 21
I want to sort a file by the value in the 3rd column, so I use:
sort -rnk3 myfile
2.gu Qxy 23
4.gui Qxr 21
3.guT QWS 18
1.gui Qxx 16
Now I need the output to look like this (the line starting with 1.gui is dropped because the line with 4.gui has a greater value):
2.gu Qxy 23
4.gui Qxr 21
3.guT QWS 18
I cannot use head because I have millions of rows and do not know where to cut. I could not figure out a way to use uniq either: it compares each line as a whole, and since I cannot tell uniq to look only at the first column, it sees every line as unique and outputs it, which is normal. I know uniq can skip a fixed number of characters, but as you can see from the example, the first column varies in length.
Please advise.
Try this:
sort -rnk3 myfile | awk -F"[. ]" '!a[$2]++'
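awk removes the duplicates based on the 2nd field, which, with the field separator set to a dot or a space, is the name after the leading number (gui, gu, guT). This is a well-known awk idiom for removing duplicates. An array keyed by the 2nd field keeps a count of how many times each value has been seen. Before a record is printed, its 2nd field is looked up in the array: if it is not there yet, the record is printed; otherwise it is discarded as a duplicate. This is achieved with the ++. The first time a value is encountered, a[$2]++ evaluates to 0 because the increment is postfix, and negating 0 gives true, so the line is printed. Subsequent occurrences see a non-zero count, which becomes false when negated. Because the input is already sorted in descending order, the line kept for each name is the one with the greatest value.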
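To make the idiom easier to follow, here is a small sketch that runs the pipeline on the sample data from the question and also spells out the same logic long-hand. The temporary file and the function names dedup_idiom and dedup_verbose are made up for illustration; only the sort and awk commands come from the answer above.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file
# (stands in for "myfile"; the temp file is an assumption of this sketch).
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
cat > "$tmpfile" <<'EOF'
1.gui Qxx 16
2.gu Qxy 23
3.guT QWS 18
4.gui Qxr 21
EOF

# The one-liner from the answer: sort by the 3rd column (numeric,
# descending), then print a line only the first time its name is seen.
dedup_idiom() {
    sort -rnk3 "$1" | awk -F"[. ]" '!a[$2]++'
}

# The same logic written out step by step (hypothetical helper).
dedup_verbose() {
    sort -rnk3 "$1" | awk -F"[. ]" '
        {
            if (seen[$2] == 0) {   # name not seen yet: this is its largest value
                print
            }
            seen[$2]++             # count it so later lines with this name are skipped
        }'
}

dedup_idiom "$tmpfile"
```

Both functions should print the same three lines; the long-hand version is only there to show what the negated post-increment is doing.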