Mule batch processing vs foreach vs splitter-aggregator

mcvkr · Apr 14, 2017

In Mule, I have a large number of records to process, where processing includes some calculations, round trips to the database, etc. We can process collections of records with these options:

  1. Batch processing

  2. ForEach

  3. Splitter-Aggregator

So what are the main differences between them? When should we prefer one over the others?

The Mule batch processing option does not seem to offer a way to define variables scoped to the batch job, for example. Or, what if I want to use multithreading to speed up the overall task? Or, which is better if I want to modify the payload during processing?

Answer

Roger Butenuth · Apr 19, 2017

When you write "quite many", I assume it is too much for main memory. This rules out splitter/aggregator, because it has to collect all records in order to return them as a list.

I assume you have your records in a stream or iterator, otherwise you probably have a memory problem...

So when to use for-each and when to use batch?

For Each

The simplest solution, but it has some drawbacks:

  1. It is single threaded (so may be too slow for your use case)
  2. It is "fire and forget": You can't collect anything within the loop, e.g. a record count
  3. There is no support for handling "broken" records

Within the loop, you can have several steps (message processors) to process your records (e.g. for the mentioned database lookup).

This may be a drawback or an advantage: the loop is synchronous. (If you want to process asynchronously, wrap it in an async scope.)
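
To make the threading behaviour concrete, here is a rough analogy in plain Java. It is not Mule API: the class `ForEachSketch` and the methods `forEachSync` and `forEachAsync` are hypothetical names made up for this illustration. It only shows the two properties described above: records are handled one at a time on the calling thread, and the caller waits unless the whole loop is handed off to another thread.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Conceptual analogy only, not Mule API. All names are hypothetical.
class ForEachSketch {

    // Like a for-each scope: one record at a time, on the calling thread.
    // The caller is blocked until the last record has been processed.
    static <T> void forEachSync(Iterator<T> records, Consumer<T> steps) {
        while (records.hasNext()) {
            // No per-record error handling: an exception here ends the loop.
            steps.accept(records.next());
        }
    }

    // Like wrapping the loop in an async scope: the caller gets control back
    // immediately while the loop runs on another thread.
    static <T> CompletableFuture<Void> forEachAsync(Iterator<T> records, Consumer<T> steps) {
        return CompletableFuture.runAsync(() -> forEachSync(records, steps));
    }

    public static void main(String[] args) {
        Iterator<String> records = Arrays.asList("r1", "r2", "r3").iterator();
        forEachSync(records, r -> System.out.println("processed " + r));

        // join() is used here only so the demo does not exit before the loop ends.
        forEachAsync(Arrays.asList("r4", "r5").iterator(),
                     r -> System.out.println("processed " + r)).join();
    }
}
```

Because the loop pulls from an `Iterator`, only the current record needs to be in memory, which matches the assumption that the records arrive as a stream or iterator.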

Batch

A little more to do and to understand, but more features:

  1. When called from a flow, always asynchronous (this may be a drawback).
  2. Can be standalone (e.g. with a poll inside for starting)
  3. When the data generated in the loading phase is too big, it is automatically offloaded to disk.
  4. Multithreading for free (number of threads configurable)
  5. Handling for "broken records": Batch steps may be executed for good/broken records only.
  6. You get statistics at the end (number of records, number of successful records etc.)
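
The same kind of plain Java analogy can illustrate three of the batch features listed above: asynchronous start, a configurable thread pool, and per-record failure handling with statistics at the end. Again, this is not Mule API. `BatchSketch`, `runBatch`, and `Result` are hypothetical names, and the offloading to disk is not modelled at all (the sketch simply takes an in-memory list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Conceptual analogy only, not Mule API. All names are hypothetical.
class BatchSketch {

    // Statistics reported once every record has been handled.
    static final class Result {
        final int total, successful, failed;
        Result(int total, int successful, int failed) {
            this.total = total; this.successful = successful; this.failed = failed;
        }
    }

    // Returns immediately; records are processed on a pool of `threads` workers.
    static <T> CompletableFuture<Result> runBatch(List<T> records, int threads,
                                                  Consumer<T> step,
                                                  Consumer<T> onFailedRecord) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (T record : records) {
            pending.add(CompletableFuture.runAsync(() -> {
                try {
                    step.accept(record);
                    ok.incrementAndGet();
                } catch (RuntimeException e) {
                    // A broken record does not stop the job; it is counted
                    // and passed to the step reserved for failed records.
                    bad.incrementAndGet();
                    onFailedRecord.accept(record);
                }
            }, pool));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .whenComplete((v, e) -> pool.shutdown())
                .thenApply(v -> new Result(records.size(), ok.get(), bad.get()));
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Result r = runBatch(records, 4,
                n -> { if (n % 5 == 0) throw new IllegalArgumentException("broken " + n); },
                n -> System.out.println("failed record: " + n)).join();
        System.out.println("total=" + r.total + " successful=" + r.successful
                + " failed=" + r.failed); // total=10 successful=8 failed=2
    }
}
```

The order of the "failed record" lines can vary between runs because the records are processed in parallel; the final counts do not.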

So it looks like batch is the better fit for your case.