I'm trying to merge many sorted files in a UNIX/Linux script with sort -m, and I noticed that sort first writes the result to a temporary file, then copies it to the destination. My understanding of -m was that it assumes the files are already sorted, so a temporary file should be unnecessary; it wastes both disk space and CPU cycles. (I'm using sort in a pipeline, which gets stuck waiting for sort to output anything.) Is there a way to tell sort not to use temporary files when merging sorted files? Or is there a better version that doesn't?
The exact command line looks like:
$ sort -m -s -t '_' -k 1,1n -k 2,2n <(gunzip <file_1) [...] <(gunzip <file_n) | gzip >output
I'm using sort from GNU coreutils 5.97.
Check out these options from man sort; they might let you minimize the amount of space needed for merging.
--batch-size=NMERGE
merge at most NMERGE inputs at once; for more use temp files
--compress-program=PROG
compress temporaries with PROG; decompress them with PROG -d
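
As a rough sketch of how --batch-size could be applied: going by the description above, sort only falls back to temporary files when there are more inputs than NMERGE, so setting NMERGE to at least the number of inputs should let the merge run in a single pass. The example below assumes the installed sort is recent enough to accept --batch-size (this is not confirmed for coreutils 5.97), and the part_N file names and the count of 20 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: build 20 small pre-sorted inputs, then merge them in one pass.
# Assumes a GNU sort that accepts --batch-size.
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

i=1
while [ "$i" -le 20 ]; do
    # Each file holds two keys of the form N_M, already in sorted order.
    printf '%s_1\n%s_2\n' "$i" "$i" > "$tmpdir/part_$i"
    i=$((i + 1))
done

# NMERGE (20) is at least the number of inputs, so per the man page text
# no intermediate temp files should be needed for this merge.
merged=$(sort -m -s -t '_' -k 1,1n -k 2,2n --batch-size=20 "$tmpdir"/part_*)

printf '%s\n' "$merged" | head -n 3
```

If the batch size cannot be raised far enough to cover all inputs, adding --compress-program=gzip to the same command is the other option listed above; it keeps the temporary files but compresses them.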