Building a Real-Time Analytics Database: A 'Choose Your Own Adventure' Journey
You need to be signed in to add a collection
Have you ever stopped to think about how to build a database? The thing is, there isn't just one way, as we can see by the massive number of data infrastructure options we have to choose from. It's a nonstop series of tradeoffs, each motivated by the constraints the database wants to satisfy. An in-memory transactional database would be one thing. A general-purpose, single-server relational database would be another. A low-latency, horizontally scalable analytics database would be...the journey we're going to take. In this talk, we'll start by picking a data model, make decisions about serialization and storage, choose indexing strategies, pick a query language, and figure out how to scale, eventually ending up with something that looks remarkably like Apache Pinot, a real-time analytics database. Pinot was built on a journey like this, always optimized for ultra low-latency, user-facing analytics at scale. In the real world, Pinot is used by applications like LinkedIn and UberEats to expose the state of the system not just to internal decision-makers, but to the users of the system itself, including all of us people who consumers of analytical queries. By focusing on the internals of Pinot and the tradeoffs made along the way to build a database of its kind, we'll see how it enables a new class of applications that every user of a system into a decision maker.
Transcript
Have you ever stopped to think about how to build a database? The thing is, there isn't just one way, as we can see by the massive number of data infrastructure options we have to choose from. It's a nonstop series of tradeoffs, each motivated by the constraints the database wants to satisfy. An in-memory transactional database would be one thing. A general-purpose, single-server relational database would be another. A low-latency, horizontally scalable analytics database would be...the journey we're going to take.
In this talk, we'll start by picking a data model, make decisions about serialization and storage, choose indexing strategies, pick a query language, and figure out how to scale, eventually ending up with something that looks remarkably like Apache Pinot, a real-time analytics database. Pinot was built on a journey like this, always optimized for ultra low-latency, user-facing analytics at scale. In the real world, Pinot is used by applications like LinkedIn and UberEats to expose the state of the system not just to internal decision-makers, but to the users of the system itself, including all of us people who consumers of analytical queries. By focusing on the internals of Pinot and the tradeoffs made along the way to build a database of its kind, we'll see how it enables a new class of applications that every user of a system into a decision maker.