Kafka or SNS or something else?

Vor · May 8, 2013 · Viewed 38k times

Sorry if this is a newbie question, but I'm trying to understand what I should use. As far as I understand, Kafka is:

Apache Kafka is a distributed publish-subscribe messaging system.

And SNS is also a pub/sub system.

My goal is to use some queue messaging system on AWS with an application that will be distributed over a few servers (by the way, the main language is Python). And because it is on Amazon, my first thought was to use SNS and SQS. But then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?

Answer

adamw · May 9, 2013

The use-cases for Kafka and Amazon SQS/Amazon SNS are quite different.

Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.
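To make the "multiple readers" and "queue vs. topic" points concrete, here is a toy in-memory model of a single log. It is only a sketch: `ToyLog`, `poll` and `seek` are made-up names, not the Kafka client API, and real Kafka splits work inside a consumer group by assigning partitions to consumers rather than by sharing one offset as this toy does.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one Kafka partition (illustration only)."""

    def __init__(self):
        self.messages = []               # append-only; list index == offset
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        self.messages.append(message)

    def poll(self, group, max_messages=1):
        """Return the next messages for a group and advance its offset."""
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position; works for any offset still retained."""
        self.offsets[group] = offset


log = ToyLog()
for i in range(4):
    log.publish(f"event-{i}")

# Two workers polling as the same group share the messages (queue-like).
worker_a = log.poll("billing", max_messages=2)
worker_b = log.poll("billing", max_messages=2)

# A different group independently sees every message (topic-like).
audit = log.poll("audit", max_messages=4)

# A restarted or late reader can catch up from an earlier offset.
log.seek("billing", 0)
replayed = log.poll("billing", max_messages=4)
```

The key design point is that the log keeps the messages and each group only keeps a position, which is why adding another reader or replaying old messages is cheap.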

An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
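A small sketch of what this means in practice when messages finish processing out of order. The helper name `committable_offset` is hypothetical and no Kafka client is involved; it just computes how far a commit can safely advance, assuming the committed value is the next offset to be consumed.

```python
def committable_offset(committed, processed):
    """Return the new commit position given out-of-order completions.

    committed: next offset to consume according to the last commit
    processed: set of offsets whose processing has finished
    """
    next_offset = committed
    # Advance only across a gap-free run of finished offsets.
    while next_offset in processed:
        next_offset += 1
    return next_offset


# Offsets 5, 6 and 8 are done; 7 is still in flight.
position = committable_offset(5, {5, 6, 8})
# position is 7: committing any further would also mark 7 as processed.
```

Offset 8 stays uncommitted until 7 completes, so after a crash it may be processed again. That is the price of not having per-message acknowledgements.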

SQS/SNS on the other hand:

  • no setup/no maintenance
  • either a queue (SQS) or a topic (SNS)
  • various limitations (on size, how long a message lives, etc)
  • limited throughput: you can use batch and concurrent requests, but achieving high throughput would still be expensive
  • I'm not sure if the messages are replicated; however, the at-least-once delivery guarantee in SQS would suggest so
  • SNS has notifications for email, SMS, SQS, HTTP built-in. With Kafka, you would probably have to code it yourself
  • no "message stream" concept
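For the batch requests mentioned in the list above, a rough Python sketch using boto3 could look like this. It assumes boto3 is installed and AWS credentials are configured; `to_batches` and `send_all` are illustrative names, and the queue URL is whatever your own queue uses. SQS accepts at most 10 messages per `send_message_batch` call, and the batch also has a total payload size limit that this sketch does not check.

```python
def to_batches(bodies, batch_size=10):
    """Split message bodies into entry lists of at most batch_size items."""
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # Each entry needs an Id that is unique within its batch.
        batches.append(
            [{"Id": str(start + i), "MessageBody": body}
             for i, body in enumerate(chunk)]
        )
    return batches


def send_all(queue_url, bodies):
    """Send every body to SQS; return the entries SQS reported as failed."""
    import boto3  # imported lazily so to_batches works without AWS set up

    sqs = boto3.client("sqs")
    failed = []
    for entries in to_batches(bodies):
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed.extend(response.get("Failed", []))
    return failed


batches = to_batches([f"m{i}" for i in range(23)])
```

Each batch is still one billed request, so batching cuts the cost per message by up to a factor of ten but does not remove it.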

So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.