I have a website running on AWS EC2. I need to create a nightly job that generates a sitemap file and submits it to the various search engines. I'm looking for a utility on AWS that allows this functionality. I've considered the following:
1) Generate a request to the web server that triggers it to do this task
2) Create a cron job on the machine the web server is running on to execute this task
3) Create another EC2 instance and set up a cron job to run the task
Are there any other options? Is this a job for Elastic MapReduce?
If I were in your shoes, I'd probably start by trying to run the cron job on the web server each night at low tide and monitor the resource usage to make sure it doesn't interfere with the web server.
If you find that it doesn't play nicely, or you have high standards for the elegance of your architecture (I can admire that), then you'll probably need to run a separate instance.
I agree that it seems like a waste to run an instance 24 hours a day for a job you only need to run once a night.
Here's one approach: the cron job on your primary machine (currently a web server) could fire up a new instance to run the task. It could pass in a user-data script that runs when the instance starts, and the instance could shut itself down when the task completes. With instance-initiated-shutdown-behavior set to "terminate", that shutdown also terminates the instance.
Unfortunately, this misses your desire to enforce separation of concerns, it gets complicated when you start scaling to multiple web servers, and it requires your web server to be alive in order for the job to run.
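For illustration, the cron-launched instance described above could be sketched in Python with boto3. This is a minimal sketch under assumptions, not a tested deployment: the AMI ID, the instance type, and the `/usr/local/bin/generate-sitemap` command are hypothetical placeholders, and the function names are made up for this example. The builder functions are kept separate from the AWS call so the request can be inspected without credentials.

```python
def build_user_data(job_command):
    """Return a shell script that runs the job and then halts the instance."""
    return "\n".join([
        "#!/bin/bash",
        "# Run the nightly job, then halt whether or not it succeeded.",
        job_command,
        "shutdown -h now",
        "",
    ])


def build_launch_params(ami_id, instance_type, job_command):
    """Build keyword arguments for the EC2 run_instances call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": build_user_data(job_command),
        # A shutdown from inside the instance terminates it instead of stopping it.
        "InstanceInitiatedShutdownBehavior": "terminate",
    }


def launch_job_instance(params):
    """Start the one-off instance and return its instance ID (needs AWS credentials)."""
    import boto3  # imported here so the builders above work without boto3 installed

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


# Hypothetical usage from the script that cron runs each night:
#   params = build_launch_params("ami-00000000", "t3.micro",
#                                "/usr/local/bin/generate-sitemap")
#   launch_job_instance(params)
```

The script that cron runs on the web server would call `launch_job_instance` once per night, so the job instance only exists while the task is running.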
A couple of months ago, I came up with a different approach to run an instance on a cron schedule, relying entirely on existing AWS features and with no requirement to have other servers running.
The basic idea is to use Amazon's Auto Scaling with a recurring action that scales the group from "0" to "1" at a specific time each night. The instance can terminate itself when the job is done, and Auto Scaling can scale the group back down much later to make sure it's terminated.
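As a rough sketch of that idea (not the code from the article), the two recurring actions could be registered with boto3's `put_scheduled_update_group_action`. The group name, action names, and cron times below are assumptions chosen for illustration, and the Auto Scaling group itself is assumed to already exist with a launch configuration that starts the job.

```python
def build_nightly_schedule(group_name, start_cron, stop_cron):
    """Build two scheduled actions: scale the group to 1, then back to 0."""
    start = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "nightly-job-start",  # hypothetical action name
        "Recurrence": start_cron,  # cron expression, evaluated in UTC by default
        "MinSize": 0,
        "MaxSize": 1,
        "DesiredCapacity": 1,
    }
    stop = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "nightly-job-stop",  # hypothetical action name
        "Recurrence": stop_cron,
        "MinSize": 0,
        "MaxSize": 1,
        "DesiredCapacity": 0,
    }
    return [start, stop]


def apply_schedule(actions):
    """Register the scheduled actions with Auto Scaling (needs AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    autoscaling = boto3.client("autoscaling")
    for action in actions:
        autoscaling.put_scheduled_update_group_action(**action)


# Hypothetical usage: start at 03:00 UTC and clean up at 03:55 UTC.
#   apply_schedule(build_nightly_schedule("sitemap-job", "0 3 * * *", "55 3 * * *"))
```

The gap between the start and stop times should comfortably exceed the job's expected runtime, since the stop action is only the safety net.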
I've provided more details and a working example in this article:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance