How is the fork/join framework better than a thread pool?

Joonas Pulakka picture Joonas Pulakka · Oct 28, 2011 · Viewed 49.5k times · Source

What are the benefits of using the new fork/join framework over just simply splitting the big task into N subtasks in the beginning, sending them to a cached thread pool (from Executors) and waiting for each task to complete? I fail to see how using the fork/join abstraction simplifies the problem or makes the solution more efficient from what we've had for years now.

For example, the parallelized blurring algorithm in the tutorial example could be implemented like this:

public class Blur implements Runnable {
    private int[] mSource;
    private int mStart;
    private int mLength;
    private int[] mDestination;

    private int mBlurWidth = 15; // Processing window size, should be odd.

    public ForkBlur(int[] src, int start, int length, int[] dst) {
        mSource = src;
        mStart = start;
        mLength = length;
        mDestination = dst;
    }

    public void run() {
        computeDirectly();
    }

    protected void computeDirectly() {
        // As in the example, omitted for brevity
    }
}

Split in the beginning and send tasks to a thread pool:

// source image pixels are in src
// destination image pixels are in dst
// threadPool is a (cached) thread pool

int maxSize = 100000; // analogous to F-J's "sThreshold"
List<Future> futures = new ArrayList<Future>();

// Send stuff to thread pool:
for (int i = 0; i < src.length; i+= maxSize) {
    int size = Math.min(maxSize, src.length - i);
    ForkBlur task = new ForkBlur(src, i, size, dst);
    Future f = threadPool.submit(task);
    futures.add(f);
}

// Wait for all sent tasks to complete:
for (Future future : futures) {
    future.get();
}

// Done!

The tasks go to the thread pool's queue, from which they're executed as worker threads become available. As long as the splitting is granular enough (to avoid having to particularly wait for the last task) and the thread pool has enough (at least N of processors) threads, all processors are working at full speed until the whole computation is done.

Am I missing something? What's the added value of using the fork/join framework?

Answer

A.H. picture A.H. · Oct 28, 2011

I think the basic misunderstanding is, that the Fork/Join examples do NOT show work stealing but only some kind of standard divide and conquer.

Work stealing would be like this: Worker B has finished his work. He is a kind one, so he looks around and sees Worker A still working very hard. He strolls over and asks: "Hey lad, I could give you a hand." A replies. "Cool, I have this task of 1000 units. So far I have finished 345 leaving 655. Could you please work on number 673 to 1000, I'll do the 346 to 672." B says "OK, let's start so we can go to the pub earlier."

You see - the workers must communicate between each other even when they started the real work. This is the missing part in the examples.

The examples on the other hand show only something like "use subcontractors":

Worker A: "Dang, I have 1000 units of work. Too much for me. I'll do 500 myself and subcontract 500 to someone else." This goes on until the big task is broken down into small packets of 10 units each. These will be executed by the available workers. But if one packet is a kind of poison pill and takes considerably longer than other packets -- bad luck, the divide phase is over.

The only remaining difference between Fork/Join and splitting the task upfront is this: When splitting upfront you have the work queue full right from start. Example: 1000 units, the threshold is 10, so the queue has 100 entries. These packets are distributed to the threadpool members.

Fork/Join is more complex and tries to keep the number of packets in the queue smaller:

  • Step 1: Put one packet containing (1...1000) into queue
  • Step 2: One worker pops the packet(1...1000) and replaces it with two packets: (1...500) and (501...1000).
  • Step 3: One worker pops packet (500...1000) and pushes (500...750) and (751...1000).
  • Step n: The stack contains these packets: (1..500), (500...750), (750...875)... (991..1000)
  • Step n+1: Packet (991..1000) is popped and executed
  • Step n+2: Packet (981..990) is popped and executed
  • Step n+3: Packet (961..980) is popped and split into (961...970) and (971..980). ....

You see: in Fork/Join the queue is smaller (6 in the example) and the "split" and "work" phases are interleaved.

When multiple workers are popping and pushing simultaneously the interactions are not so clear of course.