How to schedule tasks on SageMaker

VicariousAT · Mar 31, 2018 · Viewed 10.4k times

I have a notebook on SageMaker that I would like to run every night. What's the best way to schedule this task? Is there a way to run a bash script and schedule a cron job from SageMaker?

Answer

Guy · Apr 8, 2018

Amazon SageMaker is a set of APIs that can help with various machine learning and data science tasks. These APIs can be invoked from various sources, such as the CLI, an SDK, or scheduled AWS Lambda functions (see the documentation here: https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html).
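To make the "scheduled Lambda function" idea concrete, here is a minimal sketch using boto3. It assumes a Lambda function already exists; the rule name, cron expression, function ARN, and event payload are all hypothetical placeholders. The clients are passed in as arguments so the wiring logic can be tried without touching AWS.

```python
import json


def schedule_lambda(events, lam, rule_name, cron, function_arn, payload):
    """Create a scheduled rule that invokes a Lambda function with a fixed payload.

    `events` is a boto3 "events" client and `lam` a boto3 "lambda" client.
    """
    # 1. Create (or update) the schedule rule. Cron expressions are evaluated in UTC.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=cron,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow the rule to invoke the function. The StatementId must be unique
    #    per function, so calling this twice with the same rule name will fail.
    lam.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Attach the function as the rule's target; `payload` becomes the Lambda event.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn, "Input": json.dumps(payload)}],
    )
    return rule_arn


def main():
    # Hypothetical values; replace them with your own names and ARNs.
    import boto3  # imported here so the helper above has no hard AWS dependency

    schedule_lambda(
        boto3.client("events"),
        boto3.client("lambda"),
        rule_name="stop-notebook-weekdays",
        cron="cron(0 20 ? * MON-FRI *)",  # 20:00 UTC, Monday to Friday
        function_arn="arn:aws:lambda:us-east-1:123456789012:function:sagemaker-scheduler",
        payload={"action": "stop", "notebook_name": "my-notebook"},
    )
```

Calling `main()` from a machine with AWS credentials would create the rule. The same thing can be set up from the console; the code only shows which three calls are involved.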

The main parts of Amazon SageMaker are notebook instances, training and tuning jobs, and model hosting for real-time predictions. Each part calls for a different kind of schedule. The most popular are:

  • Stopping and Starting Notebook Instances - Since notebook instances are used for interactive ML model development, you don't really need them running at night or on weekends. You can schedule a Lambda function to call the stop-notebook-instance API at the end of the working day (8 PM, for example), and the start-notebook-instance API in the morning. Please note that you can also run crontab on the notebook instances (after opening the local terminal from the Jupyter interface).
  • Refreshing an ML Model - Automating the re-training of models on new data that flows into the system all the time is a common problem that is easier to solve with SageMaker. Calling the create-training-job API from a scheduled Lambda function (or even from a CloudWatch Event that is monitoring the performance of the existing models), pointing to the S3 bucket where the old and new data reside, can create a refreshed model that you can then deploy into an A/B testing environment.
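The two schedules above could be handled by a single Lambda function along these lines. This is only a sketch: the event shape (`action`, `notebook_name`, `training`) and every key inside `cfg` are assumptions made for illustration, and the instance type, volume size, and runtime limit are placeholder values. The boto3 calls themselves (`describe_notebook_instance`, `stop_notebook_instance`, `start_notebook_instance`, `create_training_job`) are the SDK counterparts of the APIs named in the list.

```python
import time


def toggle_notebook(sm, name, action):
    """Start or stop a notebook instance, skipping calls SageMaker would reject."""
    status = sm.describe_notebook_instance(NotebookInstanceName=name)["NotebookInstanceStatus"]
    if action == "stop" and status == "InService":
        sm.stop_notebook_instance(NotebookInstanceName=name)
        return "stopping"
    if action == "start" and status == "Stopped":
        sm.start_notebook_instance(NotebookInstanceName=name)
        return "starting"
    return "skipped"  # already in the desired state, or in transition


def start_retraining(sm, cfg, now=None):
    """Launch a training job over the S3 prefix that holds the old and new data."""
    # Training job names must be unique, so append a UTC timestamp.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    job_name = "%s-%s" % (cfg["job_prefix"], stamp)
    sm.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": cfg["image_uri"],  # container image of the algorithm
            "TrainingInputMode": "File",
        },
        RoleArn=cfg["role_arn"],
        InputDataConfig=[{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": cfg["train_s3_uri"],
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": cfg["output_s3_uri"]},
        ResourceConfig={  # placeholder sizing, adjust to the workload
            "InstanceType": cfg.get("instance_type", "ml.m5.xlarge"),
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )
    return job_name


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers can be tested without AWS

    sm = boto3.client("sagemaker")
    if event["action"] == "retrain":
        return start_retraining(sm, event["training"])
    return toggle_notebook(sm, event["notebook_name"], event["action"])
```

The Lambda execution role would need permission for these SageMaker calls, plus `iam:PassRole` on the training role for the retraining path.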

----- UPDATE (thanks to @snat2100 comment) -----

  • Creating and Deleting Real-time Endpoints - If your real-time endpoints are not needed 24/7 (for example, they serve internal company users who work only during business hours on workdays), you can also create the endpoints in the morning and delete them at night.
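A minimal sketch of that endpoint schedule with boto3 might look like this. It assumes an endpoint configuration already exists and is kept around between runs (deleting an endpoint does not delete its configuration); the event keys `action`, `endpoint_name`, and `config_name` are hypothetical.

```python
def set_endpoint(sm, endpoint_name, config_name, action):
    """Create the endpoint from an existing endpoint config, or delete it."""
    if action == "create":
        # Provisioning takes several minutes, so schedule this ahead of first use.
        sm.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    elif action == "delete":
        # Only the endpoint is removed; the config and model stay for the next morning.
        sm.delete_endpoint(EndpointName=endpoint_name)
    else:
        raise ValueError("unknown action: %r" % (action,))
    return action


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper can be tested without AWS

    sm = boto3.client("sagemaker")
    return set_endpoint(sm, event["endpoint_name"], event["config_name"], event["action"])
```

Two scheduled rules, one sending `"action": "create"` in the morning and one sending `"action": "delete"` at night, would be enough to drive it.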