Where do Apache Samza and Apache Storm differ in their use cases?

blz · Mar 18, 2015

I've stumbled upon this article that purports to contrast Samza with Storm, but it seems to address only implementation details.

Where do these two distributed computation engines differ in their use cases? What kind of job is each tool good for?

Answer

Luis Casillas · Aug 5, 2015

Well, I've been investigating these systems for a few months, and I don't think they differ profoundly in their use cases. I think it's best to compare them along these lines instead:

  1. Age: Storm is the older project, and the original one in this space, so it's generally more mature and battle-tested. Samza is a newer, second-generation project that seems informed by lessons learned from Storm.
  2. Kafka: Samza grew out of the Kafka ecosystem and is very Kafka-centric. For example, the documentation says you can plug in different messaging systems, as long as they provide partitioning, ordering, and replay semantics similar to Kafka's. Storm, being an older system, isn't as specialized for Kafka.
  3. Complexity: Samza strikes me as generally simpler than Storm, in a good way, partly because it makes stronger assumptions about its environment ("you can have any infrastructure you like as long as it works like Kafka") and partly because it's just newer. One perhaps less good way that Samza is simpler is that it (deliberately?) lacks Storm's concept of topologies (complex execution graphs). If you need a complex, multi-stage processor, you have to implement it as independent tasks that communicate through Kafka. This has advantages as well as disadvantages, but Samza makes the choice for you, whereas Storm gives you more options.
  4. State management: Many Storm applications need to use an external store like Redis when they need to maintain a large volume of state to process incoming tuples. This situation seems to be one of the main things that motivated Samza's design; one of Samza's most distinctive features is that it provides its tasks with their own local disk-based key/value store to use for this purpose if they need it.
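
To make points 3 and 4 a bit more concrete, here is a toy sketch of the Samza-style shape: two independent stages that only talk to each other through an intermediate log, with the second stage keeping its running state in a store it owns. This is plain Python, not the Samza or Storm API. The `Log` class is a hypothetical in-memory stand-in for a Kafka topic, a `dict` stands in for the task-local key/value store, and the names `split_stage`, `count_stage`, and `run_pipeline` are made up for illustration.

```python
class Log:
    """Toy stand-in for a Kafka topic: an append-only list of (key, value)."""

    def __init__(self):
        self.messages = []

    def send(self, key, value):
        self.messages.append((key, value))


def split_stage(source, sink):
    """Stage 1: read lines, emit one message per word, keyed by the word."""
    for _, line in source.messages:
        for word in line.split():
            sink.send(word.lower(), 1)


def count_stage(source, sink, store):
    """Stage 2: keep running counts in a store owned by this stage.

    `store` is a plain dict here; it plays the role of the local
    key/value store, so no external system is consulted per message.
    """
    for word, n in source.messages:
        store[word] = store.get(word, 0) + n
        sink.send(word, store[word])


def run_pipeline(lines):
    """Wire the two stages together through intermediate logs only."""
    raw, words, counts = Log(), Log(), Log()
    for line in lines:
        raw.send(None, line)
    split_stage(raw, words)       # stage 1 writes to the "words" log
    local_store = {}
    count_stage(words, counts, local_store)  # stage 2 reads from it
    return local_store, counts.messages


if __name__ == "__main__":
    store, out = run_pipeline(["storm samza", "Samza kafka samza"])
    print(store)
    print(out)
```

The stages never call each other directly; the only coupling is the intermediate log. In a Storm topology the same two steps would instead be wired together inside one execution graph, and a large `store` would often live in something external like Redis.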