Distributed Data Quality – Technical Solutions for Organizational Scaling (Justin Cunningham, Yelp) Kafka Summit 2018

Yelp is composed of thousands of aligned but autonomous people. In large organizations, sharing context effectively is vital to maintaining alignment without sacrificing autonomy. Communicating context about data meaning, ownership, authority, availability, lineage, and quality is critical to operating large-scale streaming infrastructure. This talk explores how Yelp uses Apache Kafka and managed schemas to answer questions like “What does this column mean?”, “What data is available?”, “What data should I use?”, “Is this data accurate?”, and “How can I get that data?”