How to delete duplicate lines in a file without sorting it in Unix?

Vijay picture Vijay · Sep 18, 2009 · Viewed 106.4k times · Source

Is there a way to delete duplicate lines in a file in Unix?

I can do it with sort -u and uniq commands, but I want to use sed or awk. Is that possible?

Answer

Jonas Elfström picture Jonas Elfström · Sep 18, 2009
awk '!seen[$0]++' file.txt

seen is an associative-array that Awk will pass every line of the file to. If a line isn't in the array then seen[$0] will evaluate to false. The ! is the logical NOT operator and will invert the false to true. Awk will print the lines where the expression evaluates to true. The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on.
Awk evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.