Unix sort treatment of underscore character

user145245 · Jul 26, 2009 · Viewed 9.9k times

I have two Linux machines on which Unix sort seems to behave differently. I believe I've narrowed it down to the treatment of the underscore character.

If I run sort tmp, where tmp contains the following two lines:

aa_d_hh
aa_dh_ey

one machine outputs

aa_d_hh
aa_dh_ey

(i.e. '_' precedes 'h') while the other outputs

aa_dh_ey
aa_d_hh

(i.e. 'h' precedes '_'). I need these machines to behave identically, as I later use sort -m to merge very large files.

Is there any way I can force sort to behave in one way or the other?

Thanks.

Answer

Mehmet Ergut · Jul 26, 2009

You can set LC_COLLATE to C, which selects the traditional byte-value sort order, just for your command:

env LC_COLLATE=C sort tmp

This won't change the current environment, only the one in which the sort command executes, so both machines should then give the same output. Note that if LC_ALL is set, it overrides LC_COLLATE; in that case use LC_ALL=C instead.
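As a rough sketch of how this could be applied to both the sort and the later sort -m step, the snippet below forces the C locale for each command individually. The scratch directory and the file names (workdir, a.sorted, b.sorted) are made up for illustration, and LC_ALL=C is used instead of LC_COLLATE=C on the assumption that LC_ALL might be set on one of the machines.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
printf 'aa_dh_ey\naa_d_hh\n' > "$workdir/tmp"

# Print the locale variables that influence collation (values differ per machine).
echo "LC_ALL=${LC_ALL:-} LC_COLLATE=${LC_COLLATE:-} LANG=${LANG:-}"

# Force byte-value collation for this one command only;
# the surrounding shell environment is left untouched.
sorted=$(LC_ALL=C sort "$workdir/tmp")
printf '%s\n' "$sorted"

# Two illustrative inputs, each already in C-locale order.
printf 'aa_d_hh\naa_dh_ey\n' > "$workdir/a.sorted"
printf 'aa_d_zz\naa_da\n' > "$workdir/b.sorted"

# The merge must use the same collation as the original sorts,
# otherwise sort -m may interleave the lines incorrectly.
merged=$(LC_ALL=C sort -m "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$merged"
```

The important point is that every sort and sort -m invocation in the pipeline runs under the same locale; mixing files sorted under different collation rules is what breaks the merge.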