How Kroger embraced a "schema first" philosophy in building real-time data pipelines (Rob Hoeting, Rob Hammonds & Lauren McDonald, Kroger) Kafka Summit SF 2019
Early attempts at real-time business event streaming at Kroger were based on JSON-formatted events. Modifications to the event formats occasionally broke downstream consumers, causing costly downtime. In the course of reimagining what an industrial-strength streaming platform would look like, we decided to focus heavily on schema lifecycle and management as a foundation. The schema registry is a great service, but it is only one part of the schema lifecycle management process.

Here are the core principles around schema management:
(1) Event schemas are expressed in Avro
(2) New versions will be fully compatible with older versions
(3) Event producers create, manage, and fully document event schemas
(4) Avro schemas are managed in git, which represents the source of truth
(5) Complex schemas can be broken into smaller reusable component schemas and referenced in larger schemas

The CI/CD build process, in conjunction with customized Gradle plugins, performs the following:
(1) Constructs the full, registerable event schemas from their component schemas
(2) Generates Java source code based on the event schemas
(3) Checks compatibility with prior registered versions
(4) Registers the new or updated version in the schema registry
(5) Publishes the generated JAR file to Artifactory for producers and consumers
(6) Generates other source code (future)
(7) Publishes the schemas into other metadata tools to make them more discoverable (future)
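
To make two of the build steps concrete, here is a minimal sketch of assembling a full schema from a reusable component and then checking a new version against the prior one. It is not Kroger's Gradle plugin: the schema names (`OrderPlaced`, `Address`, `com.example.events`), the helper functions, and the field names are all hypothetical, and the check is a deliberately conservative simplification of Avro's resolution rules (any type change is rejected, although Avro itself allows some promotions). It relies only on the fact that Avro schemas are JSON documents.

```python
import copy

# Hypothetical reusable component schema, as it might live in its own file in git.
ADDRESS = {
    "type": "record",
    "name": "Address",
    "namespace": "com.example.events",
    "fields": [
        {"name": "city", "type": "string"},
        {"name": "zip", "type": "string"},
    ],
}
COMPONENTS = {"com.example.events.Address": ADDRESS}


def inline_components(schema, components, seen=None):
    """Expand the first reference to each component into its full definition.

    Avro lets a named type be defined once and referenced by name afterwards,
    so later references to the same component are left as plain names.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        if schema in components and schema not in seen:
            seen.add(schema)
            return inline_components(copy.deepcopy(components[schema]), components, seen)
        return schema
    if isinstance(schema, list):  # union
        return [inline_components(s, components, seen) for s in schema]
    if isinstance(schema, dict):
        out = dict(schema)
        if "fields" in out:  # record
            out["fields"] = [
                {**f, "type": inline_components(f["type"], components, seen)}
                for f in out["fields"]
            ]
        for key in ("items", "values"):  # array / map
            if key in out:
                out[key] = inline_components(out[key], components, seen)
        return out
    return schema


def full_compatibility_problems(old, new):
    """Return reasons why two record schemas are not fully compatible.

    Simplified rule set (an assumption of this sketch): fields that are added
    or removed must carry a default, and shared fields must keep their type.
    """
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    problems = []
    for name in sorted(new_fields.keys() - old_fields.keys()):
        if "default" not in new_fields[name]:
            problems.append(f"added field '{name}' has no default")
    for name in sorted(old_fields.keys() - new_fields.keys()):
        if "default" not in old_fields[name]:
            problems.append(f"removed field '{name}' had no default")
    for name in sorted(old_fields.keys() & new_fields.keys()):
        if old_fields[name]["type"] != new_fields[name]["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


# Build the registerable v1 schema from its component.
v1 = inline_components(
    {
        "type": "record",
        "name": "OrderPlaced",
        "namespace": "com.example.events",
        "fields": [
            {"name": "orderId", "type": "string"},
            {"name": "shipTo", "type": "com.example.events.Address"},
        ],
    },
    COMPONENTS,
)

# A compatible change: a new optional field with a default.
v2_ok = copy.deepcopy(v1)
v2_ok["fields"].append({"name": "couponCode", "type": ["null", "string"], "default": None})

# An incompatible change: a new required field without a default.
v2_bad = copy.deepcopy(v1)
v2_bad["fields"].append({"name": "storeId", "type": "string"})

print(full_compatibility_problems(v1, v2_ok))
print(full_compatibility_problems(v1, v2_bad))
```

In a real pipeline this check would more likely be delegated to the Avro library or to the schema registry's compatibility endpoint; the point of running it in the build is that an incompatible change fails before anything is registered or published.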