End-to-end large messages processing with Apache Kafka Streams and Kafka Connect with Philipp Schirmer | Berlin Apache Kafka® Meetup

In many data streaming scenarios, messages are either too large when first published to Kafka, or become too large during processing when they are assembled and pushed to the next Kafka topic or to another data system. For performance reasons, Kafka's default maximum message size is 1 MB. Although this limit can be increased, there will always be messages that exceed the configured limit and are therefore too large for Kafka.

To address this, we implemented a lightweight and transparent approach for publishing and processing large messages with Kafka Streams and Kafka Connect. Messages exceeding a configurable maximum size are stored on an external file system, such as Amazon S3. By using the available Kafka APIs, i.e., SerDes and Kafka Connect Converters, this works transparently without changing any existing code. Our implementation acts as a wrapper around the actual serialization and deserialization and is thus suitable for any data format used with Kafka.
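The wrapper pattern described above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the actual library's API: an in-memory dict stands in for S3, a single flag byte marks whether the payload is inline or externally stored, and the size threshold is tiny for demonstration purposes.

```python
import uuid

MAX_INLINE_SIZE = 8          # assumed threshold in bytes (tiny for the demo)
FLAG_INLINE = b"\x00"        # payload follows directly
FLAG_BACKED = b"\x01"        # payload holds a key into the external store


class FakeBlobStore:
    """Stand-in for an external store such as Amazon S3."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        key = uuid.uuid4().hex
        self._blobs[key] = payload
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def wrap_serialize(store: FakeBlobStore, payload: bytes) -> bytes:
    """Small messages are passed through; large ones are offloaded
    to the store, and only a reference is sent through Kafka."""
    if len(payload) <= MAX_INLINE_SIZE:
        return FLAG_INLINE + payload
    key = store.put(payload)
    return FLAG_BACKED + key.encode("ascii")


def wrap_deserialize(store: FakeBlobStore, data: bytes) -> bytes:
    """Inverse operation: resolve the reference if the flag says
    the payload lives in the external store."""
    flag, rest = data[:1], data[1:]
    if flag == FLAG_INLINE:
        return rest
    return store.get(rest.decode("ascii"))
```

In the real implementation, the inner (wrapped) SerDe handles the actual data format, so the scheme stays format-agnostic: the wrapper only sees bytes.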

If you're interested in future Berlin Apache Kafka® Meetups, check them out here: https://www.meetup.com/Berlin-Apache-Kafka-Meetup-by-Confluent

Slides: https://www.slideshare.net/ConfluentInc/end-toend-large-messages-processing-with-kafka-streams-kafka-connect