How to design a distributed job scheduler?

coderz · Nov 12, 2014

I want to design a job scheduler cluster: several hosts that together handle cron job scheduling. For example, when a job that needs to run every 5 minutes is submitted to the cluster, the cluster should decide which host fires the next run, making sure that:

  1. Disaster tolerance: as long as at least one host is up, the job fires successfully.
  2. Validity: only one host fires each job run.
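
For the every-5-minutes example, the "only one host" requirement is easier to enforce if every host first agrees on what "the next run" is. A minimal sketch, assuming roughly synchronized host clocks and a fixed interval aligned to the Unix epoch (both are assumptions, not stated in the question): each host derives the next scheduled time from the clock alone, so that timestamp can serve as a shared run ID. The function name `next_run_slot` is hypothetical.

```python
INTERVAL_SECONDS = 5 * 60  # "every 5 minutes" from the example


def next_run_slot(now: float, interval: int = INTERVAL_SECONDS) -> int:
    """Return the epoch second of the first scheduled run strictly after `now`.

    Slots are aligned to the Unix epoch, so any host asking within the same
    window gets the same answer, and the value can double as a run ID.
    """
    return (int(now) // interval + 1) * interval


# Two hosts polling at slightly different times agree on the next run.
print(next_run_slot(1000), next_run_slot(1100))  # 1200 1200
```

Agreeing on the run ID does not by itself stop two hosts from firing it; some shared, atomic claim step is still needed on top of it.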

Because of the disaster-tolerance requirement, a job cannot be bound to a specific host. One way is to have all the hosts poll a DB table (with a lock, of course), which guarantees that only one host gets the next job run. Since this locks the table frequently, is there a better design?
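
The polling approach described above can be sketched without holding a table lock across the poll: each host reads the job row without locking, then tries to advance it with a single conditional UPDATE (a compare-and-set on `next_run`), and only the host whose UPDATE actually changes the row fires the run. This is a minimal sketch assuming a hypothetical `jobs` table and hypothetical helper names. On a server database such as PostgreSQL or MySQL/InnoDB, an UPDATE by primary key typically locks just that one row for the duration of the statement; `sqlite3` is used here only so the example runs on its own (SQLite itself serializes all writers).

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per job with its interval, the scheduled
    # time of its next run, and the host that most recently claimed a run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY,"
        " interval_s INTEGER NOT NULL,"
        " next_run INTEGER NOT NULL,"
        " owner TEXT)"
    )
    conn.commit()


def due_run(conn: sqlite3.Connection, job_id: str, now: int) -> Optional[int]:
    """Plain read, no lock: return the pending run time if it is due, else None."""
    row = conn.execute("SELECT next_run FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None or row[0] > now:
        return None
    return row[0]


def claim(conn: sqlite3.Connection, job_id: str, host: str, expected_run: int) -> bool:
    """Compare-and-set: advance next_run only if it still equals expected_run.

    If another host already claimed this run, next_run has moved on, the
    WHERE clause matches no row, and rowcount is 0.
    """
    cur = conn.execute(
        "UPDATE jobs SET next_run = next_run + interval_s, owner = ?"
        " WHERE id = ? AND next_run = ?",
        (host, job_id, expected_run),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO jobs VALUES ('report', 300, 300, NULL)")
conn.commit()

# Both hosts poll at t=310 and see the same due run.
seen_a = due_run(conn, "report", 310)
seen_b = due_run(conn, "report", 310)
print(claim(conn, "report", "host-a", seen_a))  # True: host-a fires run 300
print(claim(conn, "report", "host-b", seen_b))  # False: run 300 already taken
```

The losing host simply skips this run and polls again later; no host ever waits on a lock held by another host between its read and its claim.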

Answer

Stefan · Nov 12, 2014

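Use the Quartz framework for that. It supports cron-like syntax, can be clustered (through a persistent job store shared by all hosts, typically the JDBC job store backed by one database), and ensures that only one host in the cluster fires a given job run. If a host fails, the remaining hosts keep firing its triggers, and jobs that were in progress on the failed host are re-executed elsewhere if they are marked as requesting recovery.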
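
Fail-over of this kind is commonly built on check-ins (heartbeats): each host periodically records that it is alive, and a surviving host takes over the work of any host that has stopped checking in. The sketch below is not Quartz code and does not use its API; it is a generic, in-memory illustration with assumed names (`Cluster`, `checkin_timeout`, `recover`). A real cluster would keep this state in the shared database and update it atomically, and Quartz additionally limits re-execution to jobs flagged as requesting recovery, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """In-memory stand-in for the shared cluster state (hypothetical)."""

    checkin_timeout: float = 20.0  # assumed: seconds of silence before a host counts as dead
    last_checkin: Dict[str, float] = field(default_factory=dict)  # host -> last check-in time
    running: Dict[str, str] = field(default_factory=dict)  # job id -> host executing it

    def check_in(self, host: str, now: float) -> None:
        self.last_checkin[host] = now

    def start(self, job: str, host: str) -> None:
        self.running[job] = host

    def finish(self, job: str) -> None:
        self.running.pop(job, None)

    def recover(self, host: str, now: float) -> List[str]:
        """Called by a live host: take over jobs owned by hosts that stopped checking in."""
        dead = {
            h for h, t in self.last_checkin.items()
            if h != host and now - t > self.checkin_timeout
        }
        taken = sorted(job for job, owner in self.running.items() if owner in dead)
        for job in taken:
            self.running[job] = host  # this host now re-executes the job
        for h in dead:
            del self.last_checkin[h]  # forget the dead host
        return taken


cluster = Cluster()
cluster.check_in("host-a", 0)
cluster.check_in("host-b", 0)
cluster.start("report", "host-a")
cluster.check_in("host-b", 25)  # host-a has gone silent for 25 s
print(cluster.recover("host-b", 25))  # ['report']
```

The timeout is a trade-off: a short one recovers faster, but a host that is merely slow to check in may be declared dead while its job is still running.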