Kafka data types of messages

Fateax · Jul 8, 2017 · Viewed 19.6k times

I was wondering what types of data we can have in Kafka topics. As far as I know, at the application level a message is a key-value pair, and the key and value can be of any type supported by the language. For example, when we send messages to a topic, could they be JSON, Parquet files, or serialized data, or can we only work with messages in plain text format?

Thanks for your help.

Answer

Hans Jespersen · Jul 8, 2017

There are various message formats, depending on whether you are talking about the APIs, the wire protocol, or the on-disk storage.

Some of these Kafka message formats are described in the docs here:

https://kafka.apache.org/documentation/#messageformat

Kafka has the concept of a Serializer/Deserializer or SerDes (pronounced Sir-Deez).

https://en.m.wikipedia.org/wiki/SerDes

A Serializer is a function that takes a message and converts it into the byte array that is actually sent on the wire using the Kafka protocol.

A Deserializer does the opposite: it reads the raw message bytes from the Kafka wire protocol and re-creates the message as you want the receiving application to see it.
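To make the idea concrete, here is a minimal sketch of a serializer/deserializer pair written as plain Python functions. It does not use any Kafka client library, and the function names are made up for illustration. The byte layouts (UTF-8 for strings, an 8-byte big-endian integer for longs) are chosen to resemble what the built-in Java String and Long serializers produce by default.

```python
import struct

def serialize_string(value: str) -> bytes:
    # Text becomes UTF-8 bytes.
    return value.encode("utf-8")

def deserialize_string(data: bytes) -> str:
    # Reverse of serialize_string.
    return data.decode("utf-8")

def serialize_long(value: int) -> bytes:
    # A 64-bit signed integer becomes 8 bytes, big-endian.
    return struct.pack(">q", value)

def deserialize_long(data: bytes) -> int:
    # struct.unpack returns a tuple; take its only element.
    return struct.unpack(">q", data)[0]

# A key-value pair as the application sees it...
key, value = 42, "hello"

# ...and as the raw bytes that would actually travel on the wire.
key_bytes = serialize_long(key)
value_bytes = serialize_string(value)

# The receiving side turns the bytes back into typed objects.
restored = (deserialize_long(key_bytes), deserialize_string(value_bytes))
```

The point is that the broker only ever deals with the byte arrays; what those bytes mean is an agreement between the producing and consuming applications.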

There are built-in SerDes for Strings, Longs, byte arrays, and ByteBuffers, and a wealth of community SerDes libraries for JSON, ProtoBuf, Avro, as well as application-specific message formats.
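As a rough illustration of what a JSON SerDes does, the sketch below uses only Python's standard `json` module. The function names and the sample record are hypothetical, and real community libraries typically add things this sketch leaves out, such as schema handling and error reporting.

```python
import json

def serialize_json(obj) -> bytes:
    # Encode the object as compact JSON text, then as UTF-8 bytes.
    # sort_keys makes the output deterministic for the same input.
    text = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return text.encode("utf-8")

def deserialize_json(data: bytes):
    # Decode UTF-8 bytes back into text, then parse the JSON.
    return json.loads(data.decode("utf-8"))

# Hypothetical message value.
event = {"user": "alice", "clicks": 3}

payload = serialize_json(event)       # bytes that would be sent as the message value
received = deserialize_json(payload)  # dict rebuilt on the consuming side
```

So JSON is not a special message type as far as the topic is concerned: it is text that has been turned into bytes by one side and parsed back by the other.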

You can build your own SerDes libraries as well; see the following question:

How to create Custom serializer in kafka?
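For a feel of what a custom SerDes involves, here is a small sketch for an application-specific type. In the Java client, a custom serializer implements the `Serializer` interface, whose `serialize` method receives the topic name and the object; the Python class below only imitates that shape and is not tied to any client library. The `SensorReading` type, the class name, the topic name, and the 12-byte layout are all assumptions made up for this example.

```python
import struct
from dataclasses import dataclass

@dataclass
class SensorReading:
    # Hypothetical application-specific message type.
    sensor_id: int
    temperature: float

class SensorReadingSerde:
    # Fixed layout: 4-byte big-endian int followed by 8-byte big-endian double.
    _LAYOUT = struct.Struct(">id")

    def serialize(self, topic: str, data: SensorReading) -> bytes:
        # The topic is accepted to mirror the Java method shape; it is unused here.
        return self._LAYOUT.pack(data.sensor_id, data.temperature)

    def deserialize(self, topic: str, data: bytes) -> SensorReading:
        sensor_id, temperature = self._LAYOUT.unpack(data)
        return SensorReading(sensor_id, temperature)

serde = SensorReadingSerde()
reading = SensorReading(sensor_id=7, temperature=21.5)

raw = serde.serialize("readings", reading)    # 12 bytes on the wire
decoded = serde.deserialize("readings", raw)  # back to a SensorReading
```

A fixed binary layout like this is compact but brittle when fields change, which is one reason formats such as Avro or ProtoBuf are popular for message values.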