Distributed Job scheduling, management, and reporting

teabot picture teabot · Dec 16, 2009 · Viewed 11.2k times · Source

I recently had a play around with Hadoop and was impressed with it's scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing the developer to concentrate on the implementation of their jobs.

I am wondering if anything exists in the Java domain for the distributed execution of jobs that are not easily expressed as MapReduce problems? For example:

  • Jobs that require task co-ordination and synchronization. For example, they may involve sequential execution of tasks yet it is feasible to execute some tasks concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU intensive tasks that you'd like to distribute but don't provide any outputs to reduce - image conversion/resizing for example.

So is there a Java framework/platform that provides such a distributed computing environment? Or is this sort of thing acceptable/achievable using Hadoop - and if so are there any patterns/guidelines for these sorts of jobs?

Answer

teabot picture teabot · Jan 4, 2010

I have since found Spring Batch and Spring Batch Integration which appear to address many of my requirements. I will let you know how I get on.