Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications


Speaker: Gwen Shapira, Product Manager, Confluent

When you run systems in production, you clearly want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka®… what does “up and running” even mean?

Experienced Apache Kafka users know what is important to monitor, which alerts are critical and how to respond to them. They don’t just collect metrics; they go the extra mile and use additional tools to validate the availability and performance of both the Kafka cluster and their entire data pipelines.

In this presentation, we discuss best practices for monitoring Apache Kafka. We look at which metrics are critical to alert on, which are useful for troubleshooting and which may actually be misleading. We review a few “worst practices”: common mistakes that you should avoid. We then look at what metrics don’t tell you, and how to cover those essential gaps.

This is part 5 of 5 in the Best Practices for Apache Kafka in Production Confluent Online Talk Series.