What is the best way to run Python scripts in AWS?

Parijat Bose · May 6, 2019

I have three Python scripts, 1.py, 2.py, and 3.py, each of which takes three runtime arguments.

All three programs are independent of each other. Depending on some configuration, all three may run sequentially in a batch, or only two of them may run.

Manual approach:

  1. Create an EC2 instance, run the Python script, shut it down.
  2. Repeat the above step for the next python script.

The automated way would be to trigger the above process through Lambda and replicate it using some combination of services.

What is the best way to implement this in AWS?

Answer

au kk · May 21, 2019

AWS Batch has a DAG scheduler, so technically you could define job1, job2, and job3 and tell AWS Batch to run them in that order. But I wouldn't recommend that route.

For that to work you would basically need to create three Docker images (image1, image2, image3) and push them to ECR (Docker Hub can also work if you are not using the Fargate launch type).

I don't think that makes sense unless each job is bulky and has its own runtime that's different from the others.

Instead, I would write a Python program that calls 1.py, 2.py, and 3.py, put it in a Docker image, and run it as an AWS Batch job or just an ECS Fargate task.
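For reference, the job-chaining route described above could be sketched roughly as follows with boto3's Batch `submit_job` call and its `dependsOn` parameter. The helper name `submit_chain`, the queue name, and the job definition names are hypothetical; the client is passed in so that the function itself does not need AWS credentials to be read or tested.

```python
def submit_chain(batch_client, job_queue, jobs):
    """Submit jobs so that each one depends on the job submitted before it.

    batch_client: a boto3 Batch client, e.g. boto3.client("batch").
    jobs: list of (job_name, job_definition, command) tuples in run order.
    Returns the list of job IDs in submission order.
    """
    job_ids = []
    previous_id = None
    for name, definition, command in jobs:
        kwargs = {
            "jobName": name,
            "jobQueue": job_queue,
            "jobDefinition": definition,
            # Override the container command to pass the runtime arguments.
            "containerOverrides": {"command": command},
        }
        if previous_id is not None:
            # The job stays pending until the previous job has succeeded.
            kwargs["dependsOn"] = [{"jobId": previous_id}]
        response = batch_client.submit_job(**kwargs)
        previous_id = response["jobId"]
        job_ids.append(previous_id)
    return job_ids
```

A call might look like `submit_chain(boto3.client("batch"), "my-queue", [("job1", "image1-def", ["python3", "1.py", "a", "b", "c"]), ...])`, where the queue, definition names, and arguments are placeholders.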

main.py:

import subprocess

exit_code = subprocess.call("python3 /path/to/1.py", shell=True)

# decide whether you want to call 2.py and so on ...
# 1.py will see the same stdout and stderr as main.py
# with Batch and Fargate you can retrieve these from CloudWatch Logs ...
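To cover the case from the question where all three scripts, or only two of them, run depending on configuration, main.py could be extended along these lines. This is only a sketch: the function name `run_scripts` and the shape of the `steps` list are assumptions, and it passes an argument list to `subprocess.call` instead of a `shell=True` string, which is a swap made here to avoid quoting issues with the runtime arguments.

```python
import subprocess
import sys


def run_scripts(steps, runner=subprocess.call):
    """Run each (script_path, args) step in order and stop at the first failure.

    steps: list of (script_path, list_of_string_args) tuples, in run order.
    runner: callable that takes a command list and returns an exit code;
            defaults to subprocess.call and is replaceable for testing.
    Returns 0 if every step succeeded, else the first non-zero exit code.
    """
    for script, args in steps:
        # sys.executable is the interpreter that is running main.py itself.
        exit_code = runner([sys.executable, script, *args])
        if exit_code != 0:
            return exit_code
    return 0
```

In main.py this could end with something like `sys.exit(run_scripts([("/path/to/1.py", ["a", "b", "c"]), ...]))`, so the Batch job or Fargate task is marked as failed when any script fails. The list of steps could come from an environment variable or the container command, whichever holds the configuration.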

Now you have a Docker image that just needs to run somewhere. Fargate is fast to start up and a bit pricey, and has a 10 GB maximum on temporary storage. AWS Batch is slow on a cold start but can use Spot Instances in your account. You might need to make a custom AMI for AWS Batch to work, e.g. if you want more storage.

Note: for anyone who wants to scream at shell=True, both main.py and 1.py come from the same codebase. It's a batch job, not an internet-facing API that takes the command from a user request.
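If you go the Fargate way, starting the task from code (for example from the Lambda mentioned in the question) could look roughly like this with boto3's ECS `run_task` call. The helper name `launch_fargate_task` and all cluster, task definition, container, and subnet values are hypothetical, and `assignPublicIp="ENABLED"` assumes the task runs in a public subnet and needs outbound access to pull the image.

```python
def launch_fargate_task(ecs_client, cluster, task_definition,
                        container_name, command, subnets):
    """Start one Fargate task and return the ARNs of the started tasks.

    ecs_client: a boto3 ECS client, e.g. boto3.client("ecs").
    command: list of strings that overrides the container's command.
    """
    response = ecs_client.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        # Fargate tasks use awsvpc networking, so subnets must be given.
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "ENABLED",  # assumption: public subnet
            }
        },
        # Pass the scripts to run and their arguments as the command.
        overrides={
            "containerOverrides": [
                {"name": container_name, "command": command}
            ]
        },
    )
    return [task["taskArn"] for task in response["tasks"]]
```

A Lambda handler would then only need to create the client and call this function with the values from its event or environment.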