Making Kafka Queryable with Apache Pinot

Tim Berglund | GOTO Copenhagen 2023

Transcript

Apache Kafka has become the standard infrastructure for event-driven and streaming data systems. The stunningly simple abstraction of the distributed log provides exactly what modern microservices and real-time systems need, but no choice is without its tradeoffs. Logs are an excellent way to keep track of events, but they are notoriously difficult to query. Given a constellation of services exchanging events with each other and reacting to inputs in real time, how can you find out—and gain insight into—what has just happened? How, in other words, do you query a log? This is where Apache Pinot comes in.

Developed at LinkedIn, where Kafka also originated, Pinot is a distributed, real-time analytics database designed to ingest data from Kafka (and other sources) and make it queryable at low latency almost as soon as it arrives, even in the face of a huge number of concurrent requests. All that data tucked neatly away into topics, maintaining an immutable record of how the state of the system has evolved, can now be ingested into Pinot and made accessible through simple SQL queries.
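
To make "accessible through simple SQL queries" concrete, here is a minimal sketch of sending SQL to a Pinot broker over its HTTP query endpoint. It is an illustration under stated assumptions, not material from the talk: the broker address (localhost:8099, the usual quick-start default), the table name orders_events, and the column eventType are all hypothetical stand-ins for a Kafka topic that has already been ingested into Pinot.

```python
import json
import urllib.request

# Assumption: a Pinot broker is reachable at this address. Port 8099 is the
# usual default in local quick-start setups; adjust it for a real cluster.
BROKER_URL = "http://localhost:8099/query/sql"


def build_query_request(sql: str, broker_url: str = BROKER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the broker's SQL endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        broker_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response: dict) -> list[dict]:
    """Pair each row in a broker response's resultTable with its column names."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql: str) -> list[dict]:
    """Send the query to the broker and return the rows as dictionaries."""
    with urllib.request.urlopen(build_query_request(sql)) as resp:
        return rows_as_dicts(json.load(resp))


# Hypothetical table and column names, standing in for events from a Kafka topic.
EXAMPLE_SQL = """
SELECT eventType, COUNT(*) AS events
FROM orders_events
GROUP BY eventType
ORDER BY COUNT(*) DESC
LIMIT 10
"""
```

With a broker running and a matching table defined, run_query(EXAMPLE_SQL) would return one dictionary per event type. Keeping the request construction separate from the network call makes the sketch easy to inspect without a live cluster.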

This talk explores Pinot's internal architecture, how its integration with Kafka is specially optimized, and how Pinot fits architecturally in the modern streaming stack. You'll leave understanding how Pinot works, how it fits together with Kafka, where it has been used successfully in the real world, and what steps to take next in your own Pinot learning journey.

About the speakers