
Data Mesh: Delivering Data-Driven Value at Scale

Zhamak Dehghani | Gotopia Bookclub Episode • March 2022


How can modern organizations handle their data in a way that delivers value at scale? Zhamak Dehghani, author of “Data Mesh: Delivering Data-Driven Value at Scale,” covers the key principles of data mesh and how it can help organizations move beyond the data lake to provide meaningful insights. She’s joined by Samia Rahman, director of data and AI at Seagen, as they also explore the concept of the earliest explorable data.


Transcript

Intro

Samia Rahman: Hi, everyone. Welcome to GOTO Book Club. I'm Samia Rahman, currently working as the director of data and AI at Seagen. It's a biotechnology company where we are looking to develop innovative targeted therapies to better the lives of cancer patients. I like to solve complex problems, leveraging data, and our book today is Data Mesh, and I'm super excited to talk to the founder and author of Data Mesh today. I had the privilege to work with her just when the first article on the topic was published back in May 2019 and applied the principles in highly complex domains in large enterprises. So, welcome, Zhamak Dehghani. Would you like to introduce yourself?

Zhamak Dehghani: Samia, it's great to see you. Working with you has been the highlight of my work, especially when we were just creating this concept and you were building it, right, for one of our clients. I remember those whiteboard sessions and workshops together to figure out even how to create a language around this. So, I'm excited to be here and talk to you about this. I work as the director of emerging technologies for Thoughtworks in North America. But that's my day job. My night job is evangelizing and, I guess, refining and applying the concept of data mesh globally. And I'm excited to be here to talk about that.

What is Data Mesh?

Samia Rahman: Awesome. So, to introduce the topic for people in the audience who aren't familiar with it, can you tell us what it means and how it came about, and we can take it from there?

Zhamak Dehghani: Sure. I think...well, data mesh, the problem it's trying to solve is giving us new ways, and new approaches, to be able to get value from data for AI and ML and analytics responsibly, right? So I think that it's a socio-technical approach. It's an approach that considers both architecture and how we decompose and compose the components of the data management system at scale, and also people, the roles and responsibilities and accountabilities and the structure, the working structure, and processes of people, so that we can responsibly share data in a way that, you know, you can really use it for AI and ML. At its heart is a decentralized approach. I guess we have had many different approaches in the past to get data in a shape and form that is usable for analytics and AI, which is different from doing applications and transactions.

We've had many of those architectures in the past, and data mesh is different in that it decentralizes both the architecture and the accountability, meaning the people, roles, and responsibilities. So, that's the problem it's trying to solve. How it's trying to solve it, I think we are still refining the details of that, but its core is based on four principles that need to be applied together, so you can't pick and choose. They are four complementary principles. The first is making sure that data sharing and data usage are really embedded into the domains of an organization. And a domain is basically an area of concern for the business, right, with a specific outcome, specific language, or specific tasks.

So, it brings the responsibility and also the architecture and decentralizes them around this concept of domains. It also creates a new language around what we call data. Data in data mesh is no longer just bits and bytes on disks. It creates a language around data as a product, which is a combination of all of the elements around a particular set of domain data that make that data trustworthy, longstanding, reliable, timely, and also usable, again, usable for analytical use cases. We call this data as a product, and it is very important because analytical use cases can't just go to one database and one area of business and create insights. They really require connecting and stitching and correlating information across data that comes from many different domains. So data as a product tries to create this new notion of exchanging data as a unit of value that contains everything, all the structural components needed to make that data usable, trustworthy, and discoverable for analytical use cases. So, the second pillar is data as a product.

The third pillar is how do we make all of this feasible, right? How is it possible that we can essentially own and be accountable for data and essentially produce data as a product or consume data as a product? It's really the elements of the self-serve platform, built in a way that caters both for the diversity of these domains and teams and for the harmony required to compose data across the teams. It is also a platform that raises the level of abstraction to reduce the cognitive load, so that everyone can be a data developer, right, or a data user. So, it really reduces the complexity that we see with our platforms today. So, that's the platform, the third pillar.

And the last pillar is around addressing all of the quality and security and harmonization concerns, all of these cross-cutting concerns that we traditionally put under the umbrella of governance, in a way that is practical, right? In a way that the governance doesn't get in the way of moving fast and being innovative, right? So it's called federated computational governance, which touches on both the operating model and the technology to implement cross-cutting concerns around all of these data products in a computational and automated fashion. So that's just to make everything really come together at the end. That's the long definition of its principles and why we need yet another approach, a decentralized one, to managing data.

The structure of the book

Samia Rahman: Wow. That was awesome. I think there are so many parts that resonate with me from the project we did together to what I'm doing right now at work. I think my favorite word in the first sentence of your definition is the socio-technical approach. I think that's so key in how we deliver rapidly.

How did you structure your book, and how do you manage people who question decentralization? I think that's a very common challenge a lot of us have: people need some convincing that decentralization does give us value.

Zhamak Dehghani: Yes, absolutely. I think we can unpack those, maybe answer those as two separate questions. So if I forget to answer your second question, please remind me. But in terms of structuring the book, I guess the motivation behind the book was, this is a new concept, right? It's only a few years old. I guess a lot of its principles have been around forever and ever. The principles are not so new; they're principles that I personally saw working over the last, you know, 10 to 15 years in the distribution and decentralization of applications and APIs, and I'm trying to kind of bring that to the world of data. So even though we've seen these principles at work and in practice in adjacent operational systems, they are new to the data people and the data world.

So, because it's a new concept, I wrote this book probably a little bit earlier than I would've liked to. I would've liked to write this book five, seven years down the track when we have many successful cases and we can point to all of those finished case studies and so on. But it was necessary to write this book even three years down the track because the industry showed such a strong interest in the topic. Like it was beyond my imagination. I mean, my imagination was, I'm gonna go talk at this conference and then I will get a text, and then I would, you know, come back home and shut up for the rest of my life, but it was just...I pointed to the problems and perhaps solutions that people had seen and had resonated.

So, the industry spoke, and then the vendors had to respond. And suddenly every other vendor kind of had their products repurposed for data mesh. A lot of people came out and said, "We're kind of already doing it." So it was...I was really overwhelmed that, wow, there's really a need. As you can imagine, in any situation like that, there is a misconception, there are opportunists that take it in a different direction for their own personal benefits and gains. So I felt it was necessary to have a body of work that was not only establishing why we did it, what it is, also establishing what's missing, right? What are the hard problems we have to solve, to...answering your second question, to those, you know, doubts around decentralization? I mean, decentralization is not for free. It's very expensive and you have to have the engineering and infrastructure level. You have to do a lot of work to make it work, so we can talk about that in a minute.

The reason I wrote the book was first and foremost to establish what it is before the industry takes it and turns it into something unrecognizable, so that years down the track there is somewhere to come back to. As you can imagine, the first part of the book is structured around what it is, what it is at the level of principles, without me becoming so prescriptive that we kill the innovation around it, right? The purpose is to be open enough, but also closed enough, so that you allow refinement and creation after it, yet stay cognizant of its boundaries, right?

So that's the first part of the book, what it is. I've structured what it is based on the pillars. But before I even start the book, in the prologue, and I highly encourage readers to read that small chapter, I tell a little story for us to imagine, if we did have data mesh in place, where is it that we wanna get to? So there's a bit of storytelling, putting data mesh in a kind of hypothetical environment and showing what is possible. Then, once I establish that, I feel it's really important to say why. Why are we even talking about this? What are the macro-environmental drivers that are getting us to this inflection point where we have two choices: either continue doing what we're doing, spend more money, and get a plateau of results, or really pivot and change some of the...not everything, but change some of those key elements that haven't changed for a long time and hopefully go to new heights.

The second chapter... the second part of the book is around why we're doing data mesh. I think it's really funny, even within our own company, I sometimes get into discussions with people because we're getting into solutions, how are we gonna build it, and how are we gonna wire this data product. But when I zoom back and look at some of the patterns of behavior that I see, some of the patterns of how organizations are trying to apply this, they actually run counter to why we should be doing this. We're getting to the what and how and forgetting the why, and then the hows are completely misaligned with why we're doing this thing.

So, the second part of the book is really describing the environmental and organizational drivers that require something like data mesh, and describing the benefits and objectives that you're trying to achieve, giving you those litmus tests: am I actually getting closer to this objective, or am I moving away from it despite all of this fancy technology I'm building? I mean, you know, we are both technologists, so we get excited about solutions, but we have to have a litmus test to know why. What are those objectives, and are we getting closer to them? That's kind of the second part of the book.

Then the third part of the book is around how...third and fourth, in fact. Again, I'm in love with technology, so two parts of the book are dedicated to architecture. And again, I talk about architecture at a very high level. I don't talk about which specific technology or open-source tool to use, because those things will change and evolve. But it gives enough guidelines for people to see how technology that exists will fit into that architecture. So, the third part of the book is the overall architecture: what do we mean by multi-plane data infrastructure? What do we mean by computational policies? What do we mean by the data product? What do we mean by, you know, the inputs and outputs and discoverability? So it talks about a decentralized, kind of distributed architecture for that.

And again, in that section, I try to be mindful of the fact that we are early, early in the days. I don't know how many times I apologized in the book that we are still early, guys. Like, this is just a starting point. Don't take this and just, you know, implement it, and assume that this is the target state. I was trying to show a way, an approach to how to architect this, as opposed to, "Here's the blueprint, go and copy it." Although there is some of that. So then, I put myself in the shoes of a data user, a data analyst, a data scientist, a data developer. And you and I have done this work together. You have done this work, in fact. I draw from some of your work on what that journey looks like. Then, thinking about abstracting the complexity out of that journey, what are the APIs of the platform? Again, what are the API interfaces? Not the mechanics underneath them, but just what experiences we expect to create for these folks with the platform. So that's kind of part three.

Then part four. Because the data product, or the more geeky name I sometimes use, the data quantum, is in my opinion such a core ingredient of this architecture that we must get right, I go through quite a lot of chapters on all of these capabilities that we once implemented in a centralized way, like discovery and catalog and observability, with massive pipelines underneath and storage beneath that. How do we still provide some of those affordances, but in a very decentralized way, with these equal nodes of data quanta sharing data with each other, referencing each other's schemas, and so on? And so I go through the details of how to think about the design of your data product as an architectural quantum in these architectures. That's part four of the book.

Then I couldn't really leave people right there, because the majority of the questions are around the organization. Like, how do I get started? Again, I was very cognizant of the question of what that multi-year transformation journey looks like for a large organization. And we don't really know that. We haven't done enough of these cases to generalize patterns and say, "This is what it looks like," but I give enough tools to leaders, to managers, to executives, to people who are influencing this transformation journey, to map out what the adoption of a new innovation in an organization could look like and how to think about these milestones of adoption, and we can go through some of the details of that later. I also give some tools around measuring and monitoring...what is it that we should measure as the measure of success? And again, those are just starting points. We still have to discover which of those are the right ones.

I talk about the organizational structure. And I really draw a lot from the work of Team Topologies, and you are familiar with that. We did that at one of the clients we worked with together to see what the team structures are. And again, this is the starting point. Even on the project you and I were on together, the team grew quite rapidly and we restructured ourselves so many times, right? But some of the elements remained constant or proved relevant over time. So I just extracted some of those elements back into the book on the organizational structure and culture and incentives, and the things about people that we should care about every time we talk about such a shift. So it's what, why, how technically, how from an execution strategy, and how from an organizational perspective. It's not for the faint-hearted, I should say. It turned out to be quite a large body of work.

Samia Rahman: There's just so much there that reminds me of the past, as well as capturing everything we've learned along the way. To me, it's the recipe book right now for how you can start adopting and start incrementally and get to that North Star. I love the entire structure because I can give it to people I work with to adopt those things and then try it out incrementally. So, thank you so much for the book, and all the things you put thoughts into.

Zhamak Dehghani: You're one of the first people I acknowledged in the book for contributing.

Samia Rahman: Oh, thanks. Thanks.

Zhamak Dehghani: Yes. You had a very key part in just, brainstorming, refining, implementing some of these concepts, evangelizing within an environment that was completely unaware of these concepts, and bringing these new concepts to the folks on the ground. So yes, please don't forget what a key role you had, yourself.

Who does the book apply to?

Samia Rahman: Thank you so much. I guess I'll ask a question for the audience here. Who does this book apply to? I don't wanna give the answer. I have my own opinion, but I'd love to hear from you, who should be reading the book over here?

Zhamak Dehghani: Yes. So, that's the tricky part because it takes a village, or maybe the whole organization, to build data mesh. This book is really trying to stay at the breadth, even though I go into depth in many areas, and it tries to incentivize, mobilize, and support a whole multidisciplinary team that is gonna make data mesh real in an organization. Because that's the mission, one of the, I guess, objectives behind it, and that multidisciplinary team has a lot of different roles in it. At the beginning of the book, when I introduce the chapters, I say who gets the most out of each chapter, and which chapters are for everyone to read.

Even if you just wanna talk about data mesh, you must read these chapters, but some of the chapters might be more optional or you can just browse through them if you're in a different role. So those roles are, you know, people who are in influential management or executive places, where you are influencing the direction and even decisions around data mesh, whether we should do it, whether we shouldn't do it, how much budget we should give it. Some of those decision-makers will definitely get a lot of benefit from the introductory material at the beginning, so that you are informed when people talk about data mesh, but you also have some guidance around how to go about executing this and making decisions around it and mobilizing your people. So, a portion of the book is suitable for them.

The other role in that multidisciplinary team is of course architects and technical leads and, you know, kind of solution developers. So again, if you are a developer and right now you are assessing a very specific tool, asking how do I use this particular tool, this book is not gonna give you the answer you're looking for. But if you are, you know, someone in a technical leadership position, in an architect position where you are influencing the direction of the technology and the architecture for the technology, definitely this is the book for you. So, the introductory material is a must-read, and then you go through more of the technical hows, part three and part four.

If you are a member of governance or you're going to be, and you're trying to get your head around the operating model and a very different approach to governance, and you are worried about the security of the data and compliance and legal, and then you're also worried about, "Oh, holy, you know, S. Now I have to give this responsibility out to the teams themselves," and you really want to be participating in structuring a new operating mode of governance, and also be participating in the creation of the policies as code that we need for data, definitely, there are sections for you.

Did I miss anybody else? I would say basically if you're in an operational role in that team, a governance role, a technology role, this book is for you. If you are, let's say, a hands-on data engineer, and right now you are building a data mesh, I think you definitely get value from the architecture part, like the language that data mesh uses, and you and I developed some of that language together. So you can definitely get the benefit, and this could get you started and help your thinking. It's not a command line, you know, or code that you can just copy and paste. It's not at that level. But I do go deep on some areas like schema definition, like what are the existing patterns and what we can learn and what we can just leave behind perhaps, or what we can question. So yeah, technologists definitely can benefit. But again, the assumption was that this is not a one-person job. This is teamwork to make it possible. There are many different roles in this team and this book is for the team to make data mesh real.

Samia Rahman: I think I like to sum it up with, it brings the people, process, and technology together, and you have chapters for each persona that needs to build or consume the various things to make value out of data.

The first principle of data mesh: Domain-oriented ownership

Samia Rahman: Let's talk about the first principle around domain-oriented ownership, and then your second principle, data as a product. I see those as must-haves. There's no way out, especially in complex landscapes. What would you say the audience should take away from that? Or what would be that key call for action for them?

Zhamak Dehghani: A side-effect of those two principles is that we are bringing app developers, people who are working on operational systems, closer to data, and we are bringing business and technology and data close together. We are trying to remove one of the last standing walls in our organizations, the one between data and non-data. So I think the call to action is that if you are part of a technology team and building solutions, first and foremost, I assume that you are aligned, as a longstanding technology team, with a business function. If you are not, and if you are a traditional organization with a giant enterprise IT department where people just move around and you have devs that build, and then you throw what they built over to ops to maintain, and there's no longstanding accountability and alignment with your business, there's no action for you.

Like wait a few years, or wait until your organization has moved to a place where the IT department or technology teams have organized themselves around the business. And to be honest, I've been working with enterprises now for over 10 years with Thoughtworks, and many of the enterprises are in that place already. That's kind of been the trend with digital transformation and bringing tech to the core of the organizations. Of course, there are still enterprises or organizations that see tech not as a core component of the business. They just see it as a solution that I'm gonna buy, and I just hire a bunch of integrator consultants to integrate the pieces together. I don't have the ownership of that technology, I don't have the technology product owner internally.

So, those companies, I think they're gonna have difficulty getting to data mesh, just based on its assumptions. But if you're an organization that has the tech and business alignment, and again, I'll give an example from retail because we are all kind of familiar with it, you have teams, even in your IT department, that are really focused on the best eCommerce experience, and you have the eCommerce folks in your brands that focus on digital channels and digital touchpoints. So you might have that team already, and you might have a team that's focused on retail in the shops, and maybe a team on the point-of-sale tooling, and a team on order management and customers and so on. If you already have those technology teams, and maybe on the other side of the organization you have all of these data people, what domain-oriented ownership says is just, "Let's bring the data ownership back to these domains."

So now you have tech, app dev, data, and analytics right then and there. So now there are the people who are thinking about the best eCommerce experience, but there are also people with data responsibility, like the data product owner and developer, who are thinking about how the data that we're generating as a byproduct of running this application can fuel insights and train machine learning, so that not only can we in eCommerce personalize the experience, but the guys in order management can also see maybe the portion of the orders from online versus the portion of reorders from retail. Also, sales can use that information and combine it with orders and the success of the orders. Sales and marketing can use that together. So, the first action is to find the technology and business-aligned domains that are most conducive to generating, sharing, and using that analytical data to augment their own experiences with analytics and ML. Locate those and start bridging that gap.

The second principle of data mesh: Data as a product

As you do that, then you need to have a new architecture, new components for sharing, and so on. As part of that, then think about data as a product. So, data as a product could be, in the language that we built together, this native or raw or source-aligned data, which is the data that is being generated as customers interact with the eCommerce website. So it could be the history or the events of all of the customers in all of the countries and all of the regions that have interacted with the website. So it could be that, but provided as a reusable component, and it has contracts, so it can change without breaking everything downstream. It has specific APIs and specific code allocated to it, in a data pipeline that keeps modifying and updating it so that the data is not stale. A data product is actually a very active ingredient. It's almost like a data service for your architecture.
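The idea of a data product that "has contracts, so it can change without breaking" downstream consumers can be illustrated with a minimal Python sketch. Nothing here comes from the book or the conversation: the names `Contract`, `DataProduct`, `is_backward_compatible`, `evolve`, and `read` are hypothetical, and the compatibility rule (every previously promised field must survive with the same type) is just one simple assumption about what a contract could enforce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contract:
    """Hypothetical schema contract for a data product's output port."""
    version: int
    fields: dict[str, str]  # column name -> type name


def is_backward_compatible(old: Contract, new: Contract) -> bool:
    """Assumed rule: every field promised by the old contract survives with the same type."""
    return all(new.fields.get(name) == type_ for name, type_ in old.fields.items())


@dataclass
class DataProduct:
    """Hypothetical source-aligned data product owned by a domain."""
    name: str
    domain: str
    contract: Contract
    records: list[dict] = field(default_factory=list)

    def evolve(self, new_contract: Contract) -> None:
        """Accept a contract change only if existing consumers would not break."""
        if not is_backward_compatible(self.contract, new_contract):
            raise ValueError(
                f"{self.name}: contract v{new_contract.version} breaks consumers"
            )
        self.contract = new_contract

    def read(self) -> list[dict]:
        """Output port: expose only the fields promised by the contract."""
        return [{k: r.get(k) for k in self.contract.fields} for r in self.records]
```

In this sketch, adding a `region` field to a `web_events` product would be accepted, while dropping `event` would raise an error before any downstream consumer is affected.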

So start building those, but not just for the sake of having many data products. Build them in order to use those data products to actually generate some value, right? Generate value in terms of new insights and learnings, or generate value in terms of new ML-driven data products that are, you know, generating new classifications of the customers or personalization cohorts and so on, so that you have data that you can act upon: either change your application because people seem to be quitting the shopping cart, or change the services that you provide, however you wanna take action.

Start thinking about what these data products are and how you want to create them. And then while you do that, I think the two other ingredients, the two other pillars, again, are indispensable. Like you cannot just do that and not think about, okay, what is the platform that makes it so easy for people to create data products that have some level of standardization, so that they can talk to each other, right? So you start kind of building your platform. And again, you can prioritize, for example, supporting some consistent set of APIs around your data, or you can prioritize bootstrapping these data products.

I mean, it depends where you are. You can prioritize what the capabilities of the platform are. And then, even from day one, you need to bring the accountability for that data ownership, and even the governance concerns, into the domains. Even when you're starting with two domains, have them be part of a global governance body so that they have both local incentives to build these amazing data products and enable insights, and also global incentives that these data products must be interoperable and must be secure. They cannot compromise privacy. The subject matter experts, the people who understand security, in collaboration with the platform team, can figure out, okay, what are the parts of the data products we need to standardize and have policies as code around security?
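As a rough illustration of "policies as code", the following Python sketch treats each global policy as a small function that inspects a data product's metadata and reports a violation. The metadata shape (`owner`, `columns`, `pii`, `masked`) and the function names are assumptions made for this example only; they are not taken from the book or from any particular governance tool.

```python
from typing import Callable, Optional

# A policy inspects a data product's metadata and returns a violation
# message, or None when the product complies. (Hypothetical shape.)
Policy = Callable[[dict], Optional[str]]


def pii_must_be_masked(meta: dict) -> Optional[str]:
    """Assumed security policy: columns flagged as PII must also be masked."""
    unmasked = [
        c["name"]
        for c in meta.get("columns", [])
        if c.get("pii") and not c.get("masked")
    ]
    return f"unmasked PII columns: {unmasked}" if unmasked else None


def must_have_owner(meta: dict) -> Optional[str]:
    """Assumed accountability policy: every data product names an owner."""
    return None if meta.get("owner") else "no accountable owner declared"


# Agreed globally by the federated governance group, applied automatically
# to every data product in every domain.
GLOBAL_POLICIES: list[Policy] = [pii_must_be_masked, must_have_owner]


def check(meta: dict, policies: list[Policy] = GLOBAL_POLICIES) -> list[str]:
    """Run every policy and collect the violations, in policy order."""
    violations = []
    for policy in policies:
        message = policy(meta)
        if message is not None:
            violations.append(message)
    return violations
```

A platform could run a check like this whenever a data product is built or deployed, so that domains keep their autonomy while the global rules are enforced automatically rather than through manual review.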

Then rinse and repeat, right? Do this with one domain and another and another. You go through this kind of a scale of adoption, with exploration at the beginning and figuring out some of the sensible defaults, and then get ready for the expansion, because once you are successful in showing value in one or two places, and you do a good marketing job and put enablement in place within your organization, you need to be ready to expand. And I think you remember that, when we were part of the platform team, there was this particular client where the executive would say, "We have people from domains knocking at our door who want to use the platform. We've gotta get ready for that expansion." So you go through that expansion phase where now you can really unleash the capabilities and the sensible default practices and so on across the organization.

Then you get to a point where you say, "Okay, for this situation we're in, at this point in time, with this technology, with these domains that we have, we have reached a place of extracting value now from many reusable data products that we have, from many of the features of the platform." And then this scale of course keeps repeating itself as the environment changes. So, I guess just to wrap up, I think just be super pragmatic, have your eye on where you wanna get to, that expansion and extraction and exploiting this, and I can't put this as elegantly as the 3X kind of phases of any sort of technology product development, but be pragmatic about what you need to do to get started with one domain. Which are the best domains to get started with?

And don't compromise. I think I will probably be creating some more content around this: even when you are just getting started, there are things that you cannot leave out early on. There are many things you can leave out, but you can't say, "Look, I'm not gonna think about governance and I'm not gonna think about the source team. I'm just gonna fix the middle, the data team." Well, what you are gonna get is a pretty expensive, overly engineered decentralized data team. You will still have a bottom... And so have that thin slice end to end across all of the pillars, even if it's a tiny use case and a tiny thin slice, as long as it goes across all of the pillars and touches all of them.

The earliest explorable data product

Samia Rahman: That fully resonates with me. I think I've heard things like building the earliest explorable data product and the MVP...it must go to production. I think a lot of people forget about that, that it has to go with the security, trust, et cetera. Then you can go into those later stages of reusability and even derive further insights out of those core explorable data sets.

Zhamak Dehghani: I think there is something there, Samia. I mean, you were part of this, so you can probably speak to this. Today, or traditionally, unlike perhaps operational teams, where open source is so prevalent and it's so easy for us as developers to get started with things and everything can be pushed down to grassroots developers, data initiatives are big and kind of ominous, and have to have this multiyear plan of all of these multimillion-dollar purchases and partnerships of seven years. It's just so scary to even get started, right? And predominantly, it's been driven by which vendor I'm gonna buy from or go into partnership with. Like, heavily, heavily vendor-driven approaches. So, for example, data discoverability and explorability, as you said, is a key feature of a mesh. You need to have a window into your mesh of data products, but I would say people get started with, okay, what big, expensive data catalog do I need to buy? The data catalog space right now, it's like exploding, right? We started with the simplest data catalog and it was fine. You can tell them what we used, right?

Samia Rahman: I think we used Marquez or something. It's an open-source framework. But it can be very lightweight. Tribal knowledge for those first sets of data products in an Excel sheet works great. That's your data catalog. Put it in a SharePoint. People can find it, and you can crowdsource it as you start scaling. And when that tipping point happens, at, let's say, more than 10 data products, that's when you assess which tool fits the way you operate. I think there's a general shift I'm observing, that people are being mindful of what their criteria are and then picking the technology, rather than going technology-first. So there's a bit of pushback from organizations and vendors these days in terms of, no, I wanna first see if this really works in the way I wanna deliver quality in a rapid manner. So, I think the technology and the tooling out there, they're also having to shift their way of delivering things.
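
As a rough illustration of how lightweight that first catalog can be, here is a minimal sketch that treats a sheet exported as CSV as the catalog, supports a simple keyword search, and flags the "more than 10" tipping point mentioned above. The column names, sample entries, and function names are hypothetical and are not taken from the book or from Marquez.

```python
import csv
import io

# Hypothetical columns for a spreadsheet-style catalog (an assumption).
CATALOG_COLUMNS = ["name", "owner", "description", "location"]

# Threshold from the conversation: "let's say, more than 10".
TOOLING_TIPPING_POINT = 10


def load_catalog(csv_text):
    """Parse catalog rows from CSV text exported from a shared sheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(CATALOG_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"catalog is missing columns: {sorted(missing)}")
    return list(reader)


def search(catalog, keyword):
    """Return entries whose name or description mentions the keyword."""
    needle = keyword.lower()
    return [
        row for row in catalog
        if needle in row["name"].lower() or needle in row["description"].lower()
    ]


def should_assess_tooling(catalog):
    """True once the catalog has outgrown the spreadsheet stage."""
    return len(catalog) > TOOLING_TIPPING_POINT


# Made-up sample entries, for illustration only.
SAMPLE = """name,owner,description,location
orders,sales-team,Confirmed customer orders,s3://example-bucket/orders
shipments,logistics-team,Shipment events per order,s3://example-bucket/shipments
"""

catalog = load_catalog(SAMPLE)
print([row["name"] for row in search(catalog, "order")])
print(should_assess_tooling(catalog))
```

With only two entries the sheet is still enough; the check only starts returning true once the catalog passes the threshold, which is the point at which assessing a dedicated tool would make sense under this sketch's assumptions.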

Zhamak Dehghani: I think it's gonna be an interesting few years because the vendors can't shift their strategy overnight. They've been working, some of them for decades, some of them for years, around a particular division of responsibility in the organization and around a particular meaning of data. Data was this passive stuff that we dumped into a lake or a warehouse, and now we have to have all these layers on top of all of that data to build meaning out of it. And that's changing with data mesh. That completely inverts with data mesh, as in data is active. Data, as the tables or streams or files, will have all of the other auxiliary components around it: the APIs, the metadata, the policy, and just everything, so that right then and there, in an isolated, autonomous way, the team can maintain that data, share it, and make it actually usable. So, then this need for this heavy-duty, after-the-fact common overlay metadata, common overlay knowledge, like those tools need to shift so that they can be...they can allow the emergence of knowledge and emerge... They become search tools, right?

Samia Rahman: Exactly.

Zhamak Dehghani: Indexes, they become the Google search. While we're going through this transitional phase of moving from the old way of doing things to the new ways, technologies...I would suggest that people pick the technologies that are the most adaptive to this reconfiguration and repurposing. With the technologies that, as you said, impose the way you work, you're going to spend a lot of time and a lot of money to integrate them into a mesh configuration, and perhaps that'll become a blocker for your actual rapid development of data mesh.
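
The inversion described above, where each data product carries its own metadata and policy and the catalog becomes a search index over what the products publish, could look roughly like the sketch below. The `DataProduct` class, its fields, and `build_search_index` are hypothetical names chosen for illustration; they are not an API from the book or from any vendor tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical self-describing data product (illustrative fields only)."""
    name: str
    owner: str
    output_ports: list   # e.g. table, stream, or file locations
    schema: dict         # column name -> type name
    description: str = ""
    policies: dict = field(default_factory=dict)  # e.g. {"pii": True}

    def describe(self):
        """Metadata the product publishes about itself."""
        return {
            "name": self.name,
            "owner": self.owner,
            "description": self.description,
            "columns": sorted(self.schema),
            "policies": dict(self.policies),
        }


def build_search_index(products):
    """Build a keyword -> product names index from each product's own metadata."""
    index = defaultdict(set)
    for product in products:
        meta = product.describe()
        words = meta["description"].lower().split() + meta["columns"] + [meta["name"]]
        for word in words:
            index[word.lower()].add(meta["name"])
    return index


# Made-up products, for illustration only.
products = [
    DataProduct("orders", "sales-team", ["table:orders"],
                {"order_id": "string", "amount": "decimal"},
                "Confirmed customer orders", {"pii": False}),
    DataProduct("customers", "crm-team", ["table:customers"],
                {"customer_id": "string", "email": "string"},
                "Customer profiles", {"pii": True}),
]

index = build_search_index(products)
print(sorted(index["customer"]))
print(sorted(index["email"]))
```

The design point in this sketch is that the index holds no knowledge of its own: it is rebuilt from whatever each product describes about itself, so ownership of the metadata stays with the product.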

Samia Rahman: I think I just wanna highlight, one part of the book that I really love is this notion of a data product that really lines up with what FAIR calls for in life sciences and healthcare advancement. You need to build out these data products that are interoperable, trustworthy, etc. I was just yesterday listening in on a webinar that the Pistoia Alliance did. I think I got that right, they're a nonprofit where some of the pharma companies came together to talk about being pragmatic and building it from the bottom up. Incrementally, build that ontology with the data in mind, as opposed to, let's get all the data out there, then build all...The old pattern of layer by layer, waterfall delivery no longer works. People need to shift into that agile way of...and I think data mesh really...the frame of the product, all of that really enables us to shift into that agile model of delivery as well.

Zhamak Dehghani: A lot of the concerns shift left, right? Shift left to...

Samia Rahman: Shift left.

Zhamak Dehghani: ...the beginning of a life cycle of a data product, not its end. And the beginning could be right from the team that is generating the data or the team that is consuming and generating new sets of aggregates and insights. So it's really the beginning of a life cycle of your data, whether your data is aligned to the source application, or whether your data is an aggregate, or whether your data is newly, you know, machine learning model generated, you know, fit for purpose kind of data, whatever it is, it has a life cycle. And right from that beginning of the life cycle, you're shifting left: quality, SLOs, metadata, all of that shifts left. And it's not once done and forgotten. It's actually...the one thing that data mesh really strives to solve is change. Change is constant. And if change is constant, and if everything is constantly changing, as in new data arrives, new data gets processed, the schema of data changes, like if everything's constantly changing, we need to have a long-standing process for managing that change and long-standing owners who understand the destination and the past of this data product to continuously evolve it and change it. So, shift left and also end-to-end manage the life cycle of the data.
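
One way to picture quality and SLOs "shifting left" is a check that the owning team runs before publishing each new batch, rather than a cleanup step downstream. The sketch below is an assumption-laden illustration: the function name, the freshness SLO, and the schema representation are all hypothetical, not a prescribed data mesh interface.

```python
from datetime import datetime, timedelta, timezone


def check_before_publish(rows, expected_schema, last_updated, max_staleness, now=None):
    """Return a list of problems; an empty list means the batch is OK to publish.

    expected_schema maps column name -> Python type (a simplification).
    max_staleness is the hypothetical freshness SLO for this data product.
    """
    now = now or datetime.now(timezone.utc)
    problems = []
    if now - last_updated > max_staleness:
        problems.append("freshness SLO missed")
    for i, row in enumerate(rows):
        if set(row) != set(expected_schema):
            # Schema drift: surface it at the source instead of downstream.
            problems.append(f"row {i}: columns differ from declared schema")
            continue
        for column, expected_type in expected_schema.items():
            if not isinstance(row[column], expected_type):
                problems.append(f"row {i}: {column} is not {expected_type.__name__}")
    return problems


# Made-up batch, checked against a fixed "now" so the result is reproducible.
now = datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc)
schema = {"order_id": str, "amount": float}
batch = [{"order_id": "a1", "amount": "9.5"}, {"order_id": "a2"}]

for problem in check_before_publish(
    batch, schema, last_updated=now - timedelta(days=2),
    max_staleness=timedelta(hours=24), now=now,
):
    print(problem)
```

Because the schema and SLO are declared alongside the data product, a schema change shows up as an explicit failure the owners have to handle, which fits the "change is constant" point above.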

The visual language of data mesh

Samia Rahman: I guess I have one fun question before we end. I love the visual language in your book, and I found it resonates with a lot of people, especially people who come from enterprise architecture or the software development world because of the hexagons. It's the icon I love. Can you share with the audience why that really helps even that visualization or the process by which people are developing things?

Zhamak Dehghani: I think the moral of the story is that you don't know what resonates with people until you experiment. So this...how it all started was that I gave this talk, presented data mesh first as a talk, and at the time even I didn't have a name for it. I called it "beyond the lake." We need something beyond the lake. And I just put this talk together in a hurry. I was coming from Australia and it was after the holidays, and so I didn't have really a lot of time to sit down and use visual tools to kind of put them in nice, neat boxes and lines together. So I just used this app called Paper on iPad with iPad Pencil and I drew them. And drawing helps me also try to clarify some of these really complex concepts, visually.

I'm a very visual person. I learn visually and I learn by listening better than kind of reading. I guess reading is last for me. It was also helping me to kind of clarify my own thinking. I put this hand-drawn diagram together and I went to the talk and it didn't matter. It wasn't so fancy. So it just was a hand-drawn diagram. And then after that, we thought about publishing this as written content. So I continued with this hand drawing. And then I told Martin "Can you give me some more time? I need to turn these diagrams to properly, you know, boxes and hexagons and so on, use...I dunno, a graphical tool?" And then he said, "It's actually fine. It's just fine. Just publish it as is."

Again, to my surprise, a lot of people came back and said they really like those human, flawed, squiggly, kind of imperfect, I suppose, lines. And it was great because it brought down the workload and the cost of generating those images. I would just draw them, then throw them into Preview, make them transparent, and add them to the book. Then with O'Reilly Publishing, they were kind enough to allow me to just publish those. It became kind of a Data Mesh brand, I guess, over time.

The learning was like, "Do the easiest, simplest thing, throw it out there, get feedback, and maybe that's good enough. And maybe you can just do some improvements." But I must apologize to people who will be buying the print version of the book. When I drew these diagrams, I wasn't thinking that they would be used in grayscale. I had assumed they would be used in color. So, colors have some meanings that get lost in the black and white print version. O'Reilly does black and white only. Maybe in the future, we will have a second version in color, but for now, it's black and white. I have, of course, compensated with text and labels wherever color had a meaning. So for example, I had used the color blue as a way of saying, "Okay, these are the standard parts of the platform you're using in your data products." But now, I've made sure that I label them, for people who maybe can't see the full spectrum of color or who are just using the print version in grayscale. So the information is not lost, but it's also not super pretty in grayscale because I just didn't know that was going to be the case when I first drew them.

Samia Rahman: Well, thank you so much. I had an amazing time talking to you. I hope the audience goes and gets this book, reads about it. I will strongly encourage all my friends. The recipe really brings success. I firmly believe in it. I'm doing it another round again, and I hope that we can get to that value that we need as quickly as possible. So thank you from the bottom of my heart, Zhamak, for writing the book. I know it's a lot of effort. It's been a long journey for you to get it out there. So, thanks a lot for that.

Zhamak Dehghani: Thank you, Samia Rahman. Thank you for this conversation and being a partner. And I wish you all the best and success in your new role. And my mother personally uses targeted therapy for cancer, so it's close to my heart and I'm really excited to see what magic you're gonna create.

About the speakers

Zhamak Dehghani

Director of Emerging Technologies at Thoughtworks & Author of "Data Mesh"