Run a Python script in qsub

gabboshow · Mar 7, 2017

I have a python script main_script.py that looks like this:

import os

# placeholder: directory holding the input files (not defined in the original snippet)
path = "."

Files = os.listdir(path)
FilesNumber = len(Files)

for fileID in range(FilesNumber):
    filename = Files[fileID]

    # load file specified in filename and do stuff

Basically, it performs the same operations on each file listed in the variable Files.

I would like to use qsub to parallelize the for loop.

Assume that I have a text file, files.txt, containing all the file names:

//mypath//pathfile1
//mypath//pathfile2
...
//mypath//pathfile100

How can I write the shell script that calls qsub and runs main_script.py? I think I would also need to adapt main_script.py, but I do not know how.

The scheduler is Torque/Maui.

Answer

dbeer · Mar 7, 2017

One way to call any executable from a job script is to simply wrap it inside a bash script:

#!/bin/bash

<full path to call executable>

If you name that script script.sh, and script.sh is executable, then you can execute:

qsub script.sh

and it will be submitted to the batch system. The main gotcha, which you may well already know, is that the job runs on a compute node: if your executable isn't accessible from that node, it won't be found when the job executes. The same is true for any files your script uses, so make sure they are all located somewhere the compute nodes can reach, usually on a network-accessible filesystem.
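To tie this back to the question, one possible approach is to submit one job per line of files.txt and hand the file name to the job through an environment variable. The following is only a sketch under assumptions: the variable name INPUT_FILE and the helper names are hypothetical, and it presumes that script.sh forwards "$INPUT_FILE" to main_script.py. The -v option of qsub exports the listed variables into the job's environment.

```python
import subprocess


def read_file_list(list_path):
    """Return the non-empty lines of the list file, stripped of whitespace."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_qsub_command(input_file, job_script="script.sh"):
    """Build the qsub call for one input file.

    INPUT_FILE is a hypothetical variable name; script.sh is assumed to
    pass "$INPUT_FILE" on to main_script.py.
    """
    return ["qsub", "-v", "INPUT_FILE=" + input_file, job_script]


def submit_all(list_path, job_script="script.sh"):
    """Submit one job per listed file and return whatever qsub printed for each."""
    outputs = []
    for input_file in read_file_list(list_path):
        result = subprocess.run(
            build_qsub_command(input_file, job_script),
            check=True,           # raise if qsub reports an error
            capture_output=True,  # requires Python 3.7 or newer
            text=True,
        )
        outputs.append(result.stdout.strip())
    return outputs
```

Calling submit_all("files.txt") on the submit host would queue one job per listed file. Because -v takes a comma-separated list, this sketch would not work for file names that contain commas.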

If you wanted to directly submit the python script, you can add:

#!/usr/bin/python 

to the top (double-check that Python is in /usr/bin on your system), and then you can qsub your Python script directly. In your case,

qsub main_script.py

When submitted this way, the script no longer has to be in a network-accessible location, but the input files still do.
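As for adapting main_script.py, one option is to have each job process a single entry of files.txt instead of looping over all of them. The sketch below assumes the script is submitted as a Torque job array with qsub -t, in which case Torque sets PBS_ARRAYID to the index of each sub-job, and PBS_O_WORKDIR to the directory qsub was run from. The helper names and the location of files.txt are assumptions, and process_file is a stand-in for the original loop body.

```python
import os


def pick_file(list_path, index):
    """Return the index-th non-empty line (0-based) of the list file."""
    with open(list_path) as handle:
        names = [line.strip() for line in handle if line.strip()]
    return names[index]


def job_index(environ=os.environ):
    """Read the sub-job index that Torque sets for jobs submitted with qsub -t."""
    return int(environ["PBS_ARRAYID"])


def process_file(filename):
    """Placeholder for the original loop body: load the file and do stuff."""
    print("processing", filename)


def main():
    # Assumption: files.txt sits in the directory qsub was called from.
    # Jobs do not necessarily start there, so build the path explicitly.
    workdir = os.environ.get("PBS_O_WORKDIR", ".")
    list_path = os.path.join(workdir, "files.txt")
    process_file(pick_file(list_path, job_index()))
```

With a call to main() at the bottom of main_script.py, something like qsub -t 0-99 main_script.py would cover the 100 entries of files.txt, one per sub-job, provided your Torque version supports job arrays.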