Running shell script in parallel

Tony · Apr 5, 2011 · Viewed 88.5k times

I have a shell script which

  1. shuffles a large text file (6 million rows and 6 columns)
  2. sorts the file based on the first column
  3. outputs 1000 files

So the pseudocode looks like this

file1.sh 

#!/bin/bash
for i in $(seq 1 1000)
do

  # Placeholder: generate random numbers here, sort, and output to file$i.txt
  :

done

Is there a way to run this shell script in parallel to make full use of multi-core CPUs?

At the moment, ./file1.sh executes runs 1 to 1000 in sequence, which is very slow.

Thanks for your help.

Answer

Jonathan Dursi · Apr 5, 2011

Another very handy way to do this is with GNU parallel, which is well worth installing if you don't already have it. It is invaluable when the tasks don't all take the same amount of time.

seq 1000 | parallel -j 8 --workdir "$PWD" ./myrun {}

will launch ./myrun 1, ./myrun 2, and so on, keeping 8 jobs running at a time. It can also take a list of nodes if you want to run on several nodes at once, e.g. in a PBS job; our instructions to our users for how to do that on our system are here.
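For the command above to work, ./myrun has to perform exactly one run, selected by the index it receives as its first argument. The following is only a sketch of what such a worker might look like: the function name run_one, the input file name input.txt, and the use of shuf followed by a stable sort on the first column are all assumptions made for illustration, since the question does not show the real loop body.

```shell
#!/bin/bash
# Hypothetical ./myrun: performs one independent run, selected by its index.
# The input name (input.txt) and the shuffle-then-sort steps are assumptions.

run_one() {
  local i=$1
  local input=${2:-input.txt}
  # Shuffle the rows, then stable-sort on the first column so that rows
  # sharing a key keep their shuffled order.
  shuf "$input" | sort -s -k1,1 > "file$i.txt"
}

# When called as "./myrun 7", do run number 7; with no arguments, do nothing.
if [ "$#" -ge 1 ]; then
  run_one "$@"
fi
```

Because each run writes only to its own file$i.txt, the runs do not depend on one another and can safely be executed at the same time.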

Updated to add: You want to make sure you're using GNU parallel, not the more limited utility of the same name that comes in the moreutils package (the divergent history of the two is described here).
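A quick way to tell the two apart is to look at the version banner. This sketch assumes that GNU parallel's --version output starts with the words "GNU parallel" and that the moreutils tool does not print such a banner; the helper name is_gnu_parallel_banner is made up for this example.

```shell
#!/bin/bash
# Returns 0 if the given version banner identifies GNU parallel.
# Assumption: GNU parallel's banner begins with "GNU parallel".
is_gnu_parallel_banner() {
  case "$1" in
    "GNU parallel"*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v parallel >/dev/null 2>&1; then
  # Read from /dev/null so the command cannot wait on standard input.
  banner=$(parallel --version </dev/null 2>/dev/null | head -n 1)
  if is_gnu_parallel_banner "$banner"; then
    echo "GNU parallel found: $banner"
  else
    echo "parallel is installed, but it does not look like GNU parallel"
  fi
else
  echo "parallel is not installed"
fi
```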