Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit NYC 2019

Walmart.com generates millions of events per second. At WalmartLabs, I work on a team called the Customer Backbone (CBB), where we wanted to upgrade to a platform capable of processing this event volume in real time and storing the state/knowledge of potentially every Walmart customer produced by that processing. Kafka Streams' event-driven architecture seemed like the obvious choice.
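For context, here is a minimal sketch of the kind of stateful, event-driven topology this implies (the topic names, application id, and broker address are hypothetical, not the actual CBB pipeline): it counts events per customer key and keeps the counts in a local RocksDB state store backed by a changelog topic.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class CustomerEventCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cbb-demo");          // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("customer-events"); // hypothetical topic
        // Count events per customer key; the counts live in a local RocksDB
        // state store that is backed by a compacted changelog topic.
        KTable<String, Long> counts = events
                .groupByKey()
                .count(Materialized.as("customer-event-counts"));
        counts.toStream().to("customer-event-counts-out",
                Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```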
However, Walmart's scale poses a few challenges:
• the clusters need to be large, with all the operational problems that entails;
• infinite retention on changelog topics wastes valuable disk;
• standby task recovery after a node failure is slow (changelog topics hold GBs of data);
• Kafka Streams has no support for repartitioning.
To address these challenges as part of our event-driven development, I'm going to talk about some bold new ideas we developed as features/patches to Kafka Streams to handle the scale required at Walmart.
• Cold Bootstrap: when a Kafka Streams node fails, instead of recovering from the changelog topic, we bootstrap the standby from the active's RocksDB store using JSch, with zero event loss thanks to careful offset management (see the first sketch after this list).
• Dynamic Repartitioning: we added support for repartitioning in Kafka Streams, with state redistributed among the new partitions. We can now elastically scale to any number of partitions and any number of nodes.
• Cloud/Rack/AZ-aware task assignment: no active and standby tasks of the same partition are ever assigned to the same rack (see the second sketch after this list).
• Decreased Partition Assignment Size: with large clusters like ours (over 400 nodes and 3 stream threads per node), the partition assignment of the KS cluster grows to a few hundred MBs, so a rebalance takes a long time to settle; shrinking the assignment addresses this.
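A minimal sketch of the cold-bootstrap idea, using JSch's SFTP channel to copy the active task's RocksDB directory instead of replaying the changelog (host, user, key file, and paths are illustrative, not the actual patch; the real implementation also has to quiesce writes and reconcile offsets before the standby goes live):

```java
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ColdBootstrap {
    /**
     * Copy the active task's RocksDB directory to this (standby) node over
     * SFTP, rather than rebuilding the store from the changelog topic.
     */
    public static void copyStateFromActive(String activeHost, String user,
                                           String keyFile, String remoteStateDir,
                                           String localStateDir) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity(keyFile);                       // ssh private key
        Session session = jsch.getSession(user, activeHost, 22);
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        try {
            Path localDir = Paths.get(localStateDir);
            Files.createDirectories(localDir);
            // Fetch every file in the store directory: SSTs, MANIFEST,
            // and the .checkpoint file recording the changelog offset
            // this snapshot corresponds to, so restoration can resume
            // from exactly that offset with no event loss.
            for (Object o : sftp.ls(remoteStateDir)) {
                ChannelSftp.LsEntry entry = (ChannelSftp.LsEntry) o;
                if (entry.getAttrs().isDir()) continue;  // skip "." and ".."
                sftp.get(remoteStateDir + "/" + entry.getFilename(),
                         localDir.resolve(entry.getFilename()).toString());
            }
        } finally {
            sftp.disconnect();
            session.disconnect();
        }
    }
}
```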
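And a sketch of the rack/AZ-aware placement constraint (the data model here is simplified and hypothetical): when picking a standby node for a task, any candidate on the same rack as the task's active node is filtered out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RackAwareAssignor {
    /**
     * Pick a standby node for a task such that it never lands on the
     * same rack/AZ as the task's active node.
     */
    static Optional<String> pickStandby(String activeNode,
                                        List<String> candidates,
                                        Map<String, String> nodeToRack) {
        String activeRack = nodeToRack.get(activeNode);
        return candidates.stream()
                .filter(node -> !node.equals(activeNode))
                .filter(node -> !nodeToRack.get(node).equals(activeRack))
                .findFirst();  // a real assignor would also balance load
    }
}
```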
Key Takeaways:
• Basic understanding of Kafka Streams.
• Productionizing Kafka Streams at scale.
• Using Kafka Streams as a distributed NoSQL DB.