Unlock the power of data-oriented programming with this groundbreaking guide, “Data-Oriented Programming: Reduce software complexity”, which introduces a paradigm that revolutionizes software design by representing data through generic immutable structures. DOP simplifies state management, streamlines concurrency, and eradicates common issues in object-oriented code, all while offering language-agnostic flexibility. In this GOTO Book Club episode, author Yehonathan Sharvit spoke to James Lewis about how changing the way you look at programming can make your code clearer, put state-related bugs in the past, and make your applications more robust.
The appeal of Clojure
James Lewis: Hello, everyone, and welcome back to "GOTO Book Club". My name's James Lewis. And I'm here today with the author of "Data-Oriented Programming," Yehonathan Sharvit.
James Lewis: Welcome to "GOTO Book Club."
Yehonathan Sharvit: Happy to be here.
James Lewis: So, we're here to talk about your book. Can you just give us a brief introduction to the content?
Yehonathan Sharvit: First of all, it's my first tech book. Maybe it'll be the last one. I'm not sure. I did it with Manning, and I quite enjoyed it. It took around two years to write the book. Manning has a very interesting early access program, so I didn't have to wait until the book was completed to get feedback and see how many copies I sold. I'm a developer and I'm in the Agile world, so I need immediate feedback. I'm not sure I could have waited two years before getting any hints. So, after the first three chapters were published, the book was launched, and people bought it and wrote about what they liked and what they didn't like. I was even able to change the table of contents a little based on that feedback. So, I enjoyed the process.
James Lewis: It's almost like a Leanpub type process where you get very, very early feedback. Yeah. Awesome.
Yehonathan Sharvit: Yes.
James Lewis: Maybe you could just give us a brief introduction to yourself as well, for the folk out there who don’t know you.
James Lewis: What is it you think about the Clojure programming language in particular that gives you that kind of, that real buzz?
Yehonathan Sharvit: That was kind of difficult to nail down. I discussed it with the Clojure community after I attempted to write a book about Clojure with Manning, an attempt that didn't succeed. There was an early access launch, but only 100 people bought the book overall. So, Manning told me, "Look, we like you, but we cannot do business with you. Can you come up with another topic for the book?" And I said, okay, I cannot convince people to adopt Clojure, but I like it so much that there must be something in Clojure, like a universal paradigm, that I could share with the community of developers. So, I spent weeks and months with Manning folks and with Clojure developers to figure out if there is a way to formulate Clojure principles in a non-Clojure way. That's what "Data-Oriented Programming" is about. It is a way to get all the Clojure goodies without learning Clojure syntax.
Data-oriented Programming Goals
James Lewis: So, when we were talking before, you talked about the principles that you set out. Maybe you could talk through some of those principles, that would be...
Yehonathan Sharvit: Yes. So, just before the principles: what is the goal? Each programming paradigm has a goal. It's not that it's the best thing for everything. The goal of data-oriented programming is to reduce the complexity of information systems. And by complexity I don't mean computational complexity, but system complexity: the amount of suffering that you experience when you try to understand the system or to add new features to it. My claim is that if you write your code following data-oriented programming principles, you will reduce the level of complexity. Make sense?
James Lewis: No, it does make sense. When we talk about complexity, we often talk about the difference between accidental versus essential complexity, so are you talking about the same sort of complexity, the sort of stuff around the problem?
Yehonathan Sharvit: If you want to reduce the essential complexity, you have to move to another problem. Assuming that you are forced to solve the current problem, you cannot reduce the essential complexity, but you can reduce the amount of accidental complexity that you add through your work. So, yeah, we deal with accidental complexity.
James Lewis: I mean, I think some other people would argue that some programming languages are better at dealing with certain types of essential complexity as well. Do you find that with Clojure you get to solve problems in a slightly different way when you...
Yehonathan Sharvit: Right.
James Lewis: A full disclaimer here: I have written Clojure code. I'm not entirely ignorant of Clojure. But when you write Clojure, do you find that certain problems fit it better?
Yehonathan Sharvit: Yes. This kind of problem is what we call an information system. For example, backend software that takes data from the database, manipulates it a little, and passes it forward, or a front-end application with state that we need to manage. I don't think data-oriented programming is a good fit for writing databases, embedded systems, routers, or stuff like that, where the work is heavily computational and performance is key. But it fits when you manipulate data here and there. Also, one important aspect is when your program manipulates data that it doesn't own, when the owner of the data is another system. For example, the database is the owner of your data and your backend only has to pass data around. If you write a compiler, on the other hand, you own the data. You know exactly what the structure of your data is.
James Lewis: Yes.
Yehonathan Sharvit: There is no surprise. So, for that kind of problem, I'm not sure that data-oriented programming is a perfect fit, but for dealing with the real world and surprises of the real world, I think it's a good fit.
James Lewis: We have a sort of old joke at Thoughtworks, which is that maybe 90% of our job is taking data from A and moving it to B, and the other 10% is showing it at C, right? I mean, that's pretty much what most enterprise developers do.
James Lewis: So, I guess to recap a bit, the goal of the book is to take the way of thinking that Clojure developers use, if you like, when they're working in Clojure, and introduce it to a wider audience. It gives that audience the mental tools to think about problems, and to solve them, the way you might with functional programming.
Yehonathan Sharvit: I wrote the book as a story about a Java developer with 5 or 10 years of experience who struggles at work. He works for a consulting company like yours, but a different one. It's called Albatross in the book, I think. He has trouble meeting deadlines because of the complexity. Then he meets Joe, a very enthusiastic Clojure guy who wants to reveal all the mental models from Clojure, but formulated in a non-Clojure way. The book is the story of their meeting. The Java developer is not a yes-man, so he asks many questions, and the Clojure coach, the DOP coach, has to refine his insights and teachings. It allowed me to manage the two voices inside me and turn them into a dialogue.
James Lewis: That sounds like a really interesting approach. So, it's almost like what we would call a business fable in some senses.
Yehonathan Sharvit: Like "The Goal."
James Lewis: Like "The Goal," exactly, like that sort of structure where you have someone who's sort of taking the individual on a journey towards enlightenment.
Yehonathan Sharvit: Yes. And, you know, it goes back to Socrates.
James Lewis: Yes. The dialogues.
Yehonathan Sharvit: Like the dialogues.
Treat Data as Data
James Lewis: Super interesting. So, we've heard a bit about what the goal is, as it were. You also mentioned some of the tools, the mental and technical ones, and the principles. Maybe you could talk to us a little bit about those.
Yehonathan Sharvit: So, there are four principles that we could summarize as one meta-principle that can be expressed in four words, "Treat data as data." So, that's data-oriented programming. That's very simple. "Treat data as data."
James Lewis: I would say data. So, I would say, "Treat data as data." Though data is...I get shivers.
Yehonathan Sharvit: Or treat data as data.
James Lewis: So, treat data as data. Treat data as data. So, maybe you could unpick some of that for us.
Yehonathan Sharvit: Basically, what does it mean to treat data as data? It means that data is a first-class citizen. In object-oriented programming, objects or classes are first-class citizens. In functional programming, functions are first-class citizens. In business class, rich people are first class, you know. So, here data is the first-class citizen. What does that take? It takes four principles. Principle number one is to separate code from data. Unlike what OOP teaches, or has taught for years, we don't want to encapsulate data and functionality together in objects. We want data to have the right to live on its own.
And the cool thing is that the behavioral part of the system becomes stateless: stateless functions or stateless methods. As you probably know, state is the number one enemy. We cannot avoid state, but we want to tame it by localizing it, so only a few small pieces of our code deal with state and all the other pieces are stateless. It's a dream for unit tests, for debuggability, for code reuse. So, that's principle number one. We separate code and...
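To make principle number one concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the book. The author record is a plain dictionary, the functions that use it are stateless, and the only mutable state lives in one hypothetical holder class, AuthorStore, so the rest of the code stays pure.

```python
from typing import Any

# Data lives on its own as a plain dictionary (field names are hypothetical).
author: dict[str, Any] = {"first_name": "Jane", "last_name": "Doe", "book_count": 3}


# Code lives on its own as stateless functions that take data as an argument.
def full_name(author: dict[str, Any]) -> str:
    """Derive a display name without reading or writing any hidden state."""
    return f"{author['first_name']} {author['last_name']}"


def with_new_book(author: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with the count incremented; the input is left untouched."""
    return {**author, "book_count": author["book_count"] + 1}


class AuthorStore:
    """Hypothetical single place where mutable state is allowed to live."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self._current = initial

    def get(self) -> dict[str, Any]:
        return self._current

    def commit(self, new_version: dict[str, Any]) -> None:
        # The only mutation in this sketch: swap in a new version of the data.
        self._current = new_version


store = AuthorStore(author)
store.commit(with_new_book(store.get()))
```

Because full_name and with_new_book depend only on their arguments, a unit test for them needs nothing more than a literal dictionary.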
James Lewis: And treat data as a first-class, as the first class.
Yehonathan Sharvit: It's not enough. It's a necessary condition, but it's not sufficient. Once we have done that, principle number one, the question is: how are we going to represent data? Will we use structs like in C? Records like in modern Java? Dictionaries like in Python? The answer DOP gives is generic data structures: hash maps, dictionaries, arrays, lists, and so on. We don't need anything more to represent our data. Principle number four will deal with the limitations of that. But before the limitations, let's get to the advantages. First of all, you can create data as you go. There is no ceremony before you create data, right?
It's like with numbers. When you create the number 42 and you want to add one, you are not creating a class of even numbers, or even numbers bigger than 40, or even numbers bigger than 40 and less than 50, right? You just create the number. You have the right to create it, you can manipulate it, and you can pass it around. It gives you lots of dynamism and flexibility. Also, you can manipulate 42 with all the operations that are there for you: multiplication, addition, logarithm, exponentiation. Any complex math function is available to you, whether it's built into the language or provided by a library. In the same way, if you represent data with generic data structures, you can leverage any piece of functionality that anybody has written for those structures.
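The point about generic data structures can be sketched in a few lines of Python, using some sample book records. Because each record is an ordinary dictionary, standard-library tools such as min, sorted, and the json module work on it directly, with no adapter classes. This is an illustration of the idea, not an example taken from the book.

```python
import json

# Data created "as you go": no class definition needed first.
books = [
    {"title": "Watchmen", "year": 1987},
    {"title": "Dune", "year": 1965},
    {"title": "Neuromancer", "year": 1984},
]

# Generic functions written by someone else apply unchanged.
oldest = min(books, key=lambda book: book["year"])
by_title = sorted(books, key=lambda book: book["title"])
titles = [book["title"] for book in by_title]

# Serialization comes for free, and the round trip yields equal data.
payload = json.dumps(books)
restored = json.loads(payload)
```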
James Lewis: That's super interesting actually because, I mean, in my mind, I have been for a very, very long time, super interested in domain-driven design, right? I'm sure you've come across domain-driven design. You're essentially trying to solve similar problems with domain-driven design, but sort of trying to solve them in very different ways. So, for example, one could argue that tiny types would be a sort of almost the opposite approach, right? So, rather than just treat data as generic, it's a number, or it's a, I guess a string or whatever it is, right? With tiny types, you would try and you would create a type that encapsulates the...
Yehonathan Sharvit: The specifics.
James Lewis: Exactly. You're right. The specifics of the thing you're trying to describe. I mean, I remember years ago, back in 2005, you turn up at a bank, or even later you turn up at a big, big project and you'd see a big code base, and it was all hash maps and it was all dictionaries. And the problem was, it was really hard to reason about all these things in amongst all this, you know, all this sort of giant sort of object-oriented sort of, with these object-oriented structures. That's why, sort of, I think back in 2009, I did a talk with Daniel Terhorst-North on programming in the large, where we talked about, one of the things to do is to introduce a domain model, right? Because that allows you to reason about the things that you're working with. I mean, how do you solve that problem in data-oriented programming where you want to be able to reason about the businessy things that your code is doing as well?
Yehonathan Sharvit: Yes. That's a great question, the most challenging question. And that's principle number four. So, let's go to it, and then we'll come back to principle number three. Who cares about order? So, when you have a small code base, or when you do homework for university, that's fine. You have a hash map and you know what its fields are. You don't have surprises. But when the team grows, when the code base grows, you have some function that receives something called user. Maybe it's called user, maybe it's called U. And it's a hash map. So, how do I know what the fields in this hash map are? If you're lucky, you have a documentation string that says that in this user thing you have a user ID, an email address, blah, blah, blah. But documentation is sometimes missing and sometimes inaccurate: the code evolved and someone forgot to update the documentation.
And that's where object-oriented programming shines, because there is no guesswork. For each field, you can inspect the type, and you have autocompletion. So, that seems to be a problem, and we want data validation, right? Another problem is that if you don't validate data and you make mistakes, you will encounter errors downstream. Instead of getting an error in the function that receives the user, saying, oh, the user is invalid, you get an error in some other function further down that says X is not defined. Where the hell does that come from? I've seen that. That's the problem with dynamic languages.
James Lewis: Yes.
Yehonathan Sharvit: And quite recently, I would say over the last five years, the dynamic programming language community has developed an awareness that we need something. We don't want the limitations of static types, but we don't want the Wild West of dynamic types. We want something in the middle, or maybe we want both. And there are ways to have... How do you say it? To cut the cake?
James Lewis: Yes. Both, have your cake and eat it or something.
Yehonathan Sharvit: Yes. So, there is a way.
James Lewis: Yes.
Yehonathan Sharvit: And the key is to separate data representation from data validation. We want both, but we don't want them to be entangled, right? Someone commented on my book that every paradigm or good design principle is about separating things. So, here again, we want both data representation and data validation, but we want them separate. We, the developers, decide when we want to validate a piece of data. Sometimes we don't want to. Sometimes we have a utility function that can handle various kinds of data. For example, you could write a function that receives a hash map and a list of keys that need to be renamed or removed. It works with any hash map, not only with users or books.
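The generic utility described above might look like this in Python. This is a minimal sketch. Splitting it into two functions, rename_keys and remove_keys, is an illustrative choice, and the user and book maps are hypothetical. Neither function knows anything about users or books.

```python
from typing import Any, Iterable, Mapping


def rename_keys(m: Mapping[str, Any], renames: Mapping[str, str]) -> dict[str, Any]:
    """Return a new dict whose keys are renamed per `renames`; others are kept."""
    return {renames.get(key, key): value for key, value in m.items()}


def remove_keys(m: Mapping[str, Any], keys: Iterable[str]) -> dict[str, Any]:
    """Return a new dict without `keys`; keys that are absent are simply ignored."""
    to_drop = set(keys)
    return {key: value for key, value in m.items() if key not in to_drop}


# Two unrelated kinds of data, handled by the same generic functions.
user = {"user_id": 7, "email": "ada@example.com", "password_hash": "hash-placeholder"}
book = {"isbn": "0000000000", "title": "Some Title", "internal_notes": "draft"}

public_user = remove_keys(rename_keys(user, {"user_id": "id"}), ["password_hash"])
public_book = remove_keys(book, ["internal_notes"])
```

Both functions return new dictionaries, so the original maps are left as they were.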
So, you want this function to be generic. You don't want to limit it in any way. On the other hand, say you have a function called checkout that receives a cart. You don't want this function to work with a user. It has to be a cart. There are schema languages like JSON Schema that let you express the expected schema of your data as data. And some libraries let you say, "Here, I have a piece of data. I have a schema. Could you please validate that the data conforms to the schema? And if not, tell me exactly why. What are the missing fields? What are the invalid fields?" And we can use it in areas where it's the only way to go.
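Here is a minimal sketch of "schema as data" in Python. To stay self-contained, it uses a tiny hand-written validator that understands only three JSON Schema keywords (type, required, properties). A real project would use a full JSON Schema library instead. The cart_schema and checkout names are hypothetical.

```python
from typing import Any

# JSON Schema type names mapped to Python types (only the subset used here).
TYPE_MAP = {
    "object": dict, "array": list, "string": str,
    "integer": int, "number": (int, float), "boolean": bool,
}


def validate(data: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of readable errors; an empty list means the data conforms."""
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(data, TYPE_MAP[expected])
        # bool is a subclass of int in Python, but JSON Schema treats it separately.
        if expected in ("integer", "number") and isinstance(data, bool):
            wrong_type = True
        if wrong_type:
            return [f"{path}: expected {expected}, got {type(data).__name__}"]
    errors: list[str] = []
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required field '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(validate(data[key], sub_schema, f"{path}.{key}"))
    return errors


# The schema itself is plain data, kept separate from the data it describes.
cart_schema = {
    "type": "object",
    "required": ["cart_id", "items"],
    "properties": {"cart_id": {"type": "string"}, "items": {"type": "array"}},
}


def checkout(cart: dict[str, Any]) -> str:
    """Validate on entry, so a wrong map fails here rather than further downstream."""
    errors = validate(cart, cart_schema)
    if errors:
        raise ValueError("; ".join(errors))
    return f"checked out {len(cart['items'])} item(s)"
```

A generic helper like the key-renaming function above would simply skip the call to validate. That is the "we decide when to validate" choice.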
Let me give you an example. If you have an HTTP server that receives a request, you cannot force the user to send you a valid request. You have no compiler over the wire, right? So, how are you going to manage that? You have the payload schema, which could be expressed in JSON Schema or something else, and you have the user request, which is a string. First, you deserialize the JSON string. If you can parse it, you move forward. If you can't, you send the user an "invalid JSON" error. At that point you have a map that might or might not conform to your schema. So, you use one of these JSON Schema libraries to validate the data against the schema. If it's not valid, you can dynamically send back the error created by the library in the body of the error response. Like: here I expect a user. It should have a user ID, and it should be a number. Here is an email, and it's an invalid email.
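The request flow just described can be sketched with Python's standard json module. This is an illustration under assumptions. handle_request is a hypothetical, framework-agnostic handler that returns a status code and a JSON body. The hand-rolled USER_SCHEMA check stands in for a real JSON Schema library, and its "@" test is a deliberately naive email check.

```python
import json
from typing import Any

# Hypothetical payload schema expressed as data: field -> (expected type, required?).
USER_SCHEMA: dict[str, tuple[type, bool]] = {
    "user_id": (int, True),
    "email": (str, True),
    "nickname": (str, False),
}


def schema_errors(payload: dict[str, Any]) -> list[str]:
    """Collect every problem instead of stopping at the first one."""
    errors = []
    for field, (expected, required) in USER_SCHEMA.items():
        if field not in payload:
            if required:
                errors.append(f"missing required field '{field}'")
            continue
        value = payload[field]
        # Reject bools explicitly, because bool is a subclass of int in Python.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"'{field}' should be of type {expected.__name__}")
    email = payload.get("email")
    if isinstance(email, str) and "@" not in email:
        errors.append("'email' is not a valid email address")
    return errors


def handle_request(body: str) -> tuple[int, str]:
    """Return (HTTP status, JSON response body) for a raw request body."""
    try:
        payload = json.loads(body)  # step 1: deserialize
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "request body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    errors = schema_errors(payload)  # step 2: validate against the schema
    if errors:
        # The validator's own messages go straight back to the client.
        return 400, json.dumps({"errors": errors})
    return 200, json.dumps({"ok": True, "user_id": payload["user_id"]})
```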
James Lewis: That's super interesting because, I mean, so how does that work with... I mean, so, full disclosure, you know, I've done a lot of sort of integration work over the years. And, you know, what I'm thinking about here is Postel's law that's coming into my mind. Right.
Yehonathan Sharvit: What?
James Lewis: So, Postel's law, from TCP/IP.
Yehonathan Sharvit: Be generic in what you receive and be strict in what you send.
James Lewis: Yes. That's the paraphrasing.
Yehonathan Sharvit: Yes. I mention it in Chapter 12 of the book, I think.
James Lewis: That's it. Over the years, when I was involved with building systems that were parsing lots of XML, or JSON these days, we developed lots of techniques for binding only to the things you need in the response you're getting, right? So, say you get a user, and the user has a first name and last name, a full name, a first-line address, a second-line address, and a full address, right?
Yehonathan Sharvit: Right.
James Lewis: Depending on what you want, you either bind to the first name and last name, or you bind to the full name. But you don't necessarily expect everything to be there, because you don't necessarily need everything. What that allows you to do is use patterns like expand-contract for interfaces, where you say, okay, as long as you're only binding to the things you need, I can add extra fields without breaking my consumers, if you like.
Yehonathan Sharvit: And you don't have to create a new class.
James Lewis: No, exactly.
Yehonathan Sharvit: People probably know this from Java: you have a user for the database, a user for controller A, a user for controller B. You end up with thousands of classes that are just subsets of the fields.
James Lewis: Yes, exactly. And so, you have that problem, but then you also have this thing about how you evolve interfaces between systems. How are you able to add new fields? You should be able to add new fields to a response without breaking things.
Yehonathan Sharvit: Breaking, exactly.
James Lewis: But if you are using strict schema validation...
Yehonathan Sharvit: Yes.
James Lewis: That's gonna blow up, right?
Yehonathan Sharvit: Right.
James Lewis: Does this make sense?
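The trade-off James describes can be sketched in Python with hypothetical field names. A consumer that binds only to first_name and last_name keeps working after the provider adds fields. A strict check, in the spirit of JSON Schema's "additionalProperties": false, rejects the very same payload.

```python
from typing import Any

# The fields this particular consumer actually binds to.
NEEDED = ("first_name", "last_name")


def display_name(user: dict[str, Any]) -> str:
    """Tolerant reader: read only the needed fields and ignore everything else."""
    return f"{user['first_name']} {user['last_name']}"


def strict_errors(user: dict[str, Any]) -> list[str]:
    """Strict validation: any field the consumer does not know about is an error."""
    return [f"unexpected field '{key}'" for key in user if key not in NEEDED]


def lenient_errors(user: dict[str, Any]) -> list[str]:
    """Lenient validation: only the needed fields must be present."""
    return [f"missing field '{key}'" for key in NEEDED if key not in user]


v1 = {"first_name": "Ada", "last_name": "Lovelace"}
# "Expand" step: the provider adds fields without removing the old ones.
v2 = {**v1, "full_name": "Ada Lovelace", "address_line_1": "12 Example Street"}
```

With v2, display_name and lenient_errors behave exactly as they did with v1. Only strict_errors starts complaining, which is the "blow up" James mentions.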
Yehonathan Sharvit: And in JSON Schema, there is a native concept of required and non-required fields. I think that in Java nullability is still a problem, right? If a field may or may not be there, you have problems. In a hash map, either you have the field or you don't. It's not nullable. In the schema, a field can be required or non-required. So, it seems to me that this is the proper mental model for dealing with systems that communicate over the wire. I must admit that in terms of tools, we are not there yet. If you have a function that receives a user and you have the JSON Schema of the user, your IDE will not be able to fully auto-complete the fields based on the JSON Schema. But we are starting to get there.
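Here is a small Python sketch of the required versus non-required distinction, assuming a hypothetical schema-as-data dictionary. A required field must be present as a key. An optional field is either in the map or not, so the code checks for the key instead of testing for a null placeholder.

```python
from typing import Any

# Hypothetical schema as data: which keys must be present, which may be absent.
USER_FIELDS = {"user_id": "required", "email": "required", "nickname": "optional"}


def missing_required(user: dict[str, Any]) -> list[str]:
    """List required keys that are absent from the map."""
    return [key for key, rule in USER_FIELDS.items()
            if rule == "required" and key not in user]


def greeting(user: dict[str, Any]) -> str:
    # An optional field is either in the map or not; no placeholder null needed.
    if "nickname" in user:
        return f"Hi, {user['nickname']}!"
    return f"Hi, user {user['user_id']}!"
```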
James Lewis: I find it super interesting because we're obviously in the world of dynamic languages here, right? And there's a whole different set of functional programming languages that are statically typed: the Haskells of this world, and now TypeScript, I guess. They're all trying to solve similar problems, but in very, very different ways. What is it about data-oriented programming in particular that gives you something extra, something different from the other solutions?
Yehonathan Sharvit: I think it gives you freedom, because even in TypeScript, writing a generic function is difficult. You have to fight against TypeScript. I tried to use a library like Lodash in TypeScript, and for most of the functions, it works. But for some of them, the types are so dynamic that you have to say, "Okay, you know what? It's type any. I don't care." So, back to what you said about tiny types and big types: for low-level pieces of data that are generic, I think we want the ability to say, "I don't mind." I don't want the compiler to bother me, and I don't want to jump through hoops to convince the type system that it's okay here. I want to decide.
And for the big things, the high-level business entities where there should be no surprises, you want validation. But you also want the ability to deal with surprises or nullability, and I'm not sure TypeScript or Java are there yet. I think both communities have to learn from each other instead of fighting and saying, "Types are good. Types are bad. We are better. You are better." No, there is a sweet spot in the middle. I've been in touch with James Clark from Ballerina. Ballerina is an interesting language that leverages something called a flexible type system, which tries to bring the best of both worlds. It's not static. It's not dynamic. It's somewhere in the middle.
And James gave me this great analogy that types are not maps. They are glasses through which you look at reality. But the reality is untyped. And sometimes you could look at the same reality with pink glasses or with blue glasses. And it's the same reality. So, back to your example, you have a hash map, and from my perspective, it's a user with just the first name and the last name. So, that's my glasses. And from your perspective, it's a user with first name, last name, full name, and address. But the reality is the same. It is just you change glasses. And I think that traditionally, languages and static-type systems tend to confuse maps with reality. And you superimpose your mental thinking of the reality and you consider it as the reality, and that's the source of suffering.
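One way to try the "glasses" idea in Python is typing.TypedDict. A TypedDict is only a static description, and at runtime the value is still a plain dict. In this sketch, NameView and ContactView are two hypothetical lenses over the same record. Under the structural rules of PEP 589, a type checker should accept a ContactView where a NameView is expected, because a ContactView has every key a NameView needs.

```python
from typing import TypedDict


class NameView(TypedDict):
    """Glasses that see only a person's name."""
    first_name: str
    last_name: str


class ContactView(TypedDict):
    """Glasses that also see the full name and address."""
    first_name: str
    last_name: str
    full_name: str
    address: str


# One reality: a plain dict, as it might arrive over the wire.
record: ContactView = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "full_name": "Ada Lovelace",
    "address": "12 Example Street",
}


def initials(person: NameView) -> str:
    return person["first_name"][0] + person["last_name"][0]


def mailing_label(person: ContactView) -> str:
    return f"{person['full_name']}\n{person['address']}"
```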
James Lewis: Who is it? James?
Yehonathan Sharvit: James Clark.
James Lewis: James Clark. So, thank you, James. I'm gonna steal that. That's going into my toolkit for explaining stuff. Because that's brilliant.
Immutability in Data-Oriented Programming
James Lewis: So, we talked about the, well, the first two and then the fourth principle. We said we'd come back to the third principle.
James Lewis: Cool. Well, it sounds super interesting. My background is in distributed systems and that kind of thing, and one of the things that attracted me to distributed systems was that they were another way of managing complexity. When faced with a big problem, chop it up into lots of really small problems and solve the small problems, you know? Sam Newman, a good friend of mine and the author of "Building Microservices," says that quite a lot. That was the approach I settled on for big, complicated things with lots of accidental complexity: take them and break them up into smaller bits.
Data-oriented programming is a different approach to the same problem: the cognitive overload you get when you're trying to think about large code bases and work with them. There's one thing I always struggled with in Clojure, and maybe this is a bit mean, I don't know, but I think it's because I'm stupid. More than with any other language I've ever written, maybe barring Fortran 77 back in university, when I came back to Clojure code I had written, I found it the hardest to understand. Does that make sense? I'd write some Clojure and solve a problem in a really elegant, beautiful way, in a really small number of lines of code. Then I'd come back a week later and have no idea what it did.
Yehonathan Sharvit: Oh, I think, yeah.
James Lewis: Does that resonate?
Yehonathan Sharvit: Yes. I think I had the same kind of thing when I discovered functional programming and the power of anonymous functions. I would write a whole data pipeline as 10 lines of anonymous functions that made me feel very smart, but that nobody could read. It took me a while to discover that a better way is probably to give a name to each step, to each function, and to call them one after the other. It might be less elegant and a little more verbose, but I think it's much more readable for other people, or for other versions of myself.
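The contrast between a clever anonymous pipeline and named steps can be sketched in Python. The order records and step names are hypothetical. Both versions compute the same revenue, but the second one reads as a sequence of named steps, and each step can be tested on its own.

```python
orders = [
    {"id": 1, "total": 120.0, "status": "paid"},
    {"id": 2, "total": 35.5, "status": "cancelled"},
    {"id": 3, "total": 80.0, "status": "paid"},
]

# Terse: correct, but every step is anonymous.
terse_revenue = sum(
    map(lambda o: o["total"], filter(lambda o: o["status"] == "paid", orders))
)


# Named: each step says what it does.
def paid_orders(orders: list[dict]) -> list[dict]:
    return [order for order in orders if order["status"] == "paid"]


def order_totals(orders: list[dict]) -> list[float]:
    return [order["total"] for order in orders]


def revenue(orders: list[dict]) -> float:
    return sum(order_totals(paid_orders(orders)))
```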
James Lewis: Exactly. Future versions.
Yehonathan Sharvit: Future versions of myself, immutable versions of myself. But the interesting thing about data-oriented programming is that you don't have to leave your comfort zone in terms of your habits, the way you write code, the way you encapsulate...
James Lewis: Your favorite language.
Yehonathan Sharvit: ...modules, or your language. It's a different way to represent your data. If you take into account principle number four, about schemas and data validation, and don't go too far into the wild, I don't think you will have this problem. You will write code in a way that should feel familiar to you. Hopefully, you'll have fewer bugs. And when you want to add a new feature, it will be less of a headache to figure out what's going on. You won't have to understand the whole system to modify the functionality of a small part of it.
James Lewis: So, what I'm hearing is we should buy the book because it'll teach us how to write code that lets us add new features and modify existing code more easily, and hopefully we'll have fewer bugs. These are big promises. I'm certainly gonna take a look at your book. It sounds super, super interesting. Thank you very much. I think we'll call it a day here.
Yehonathan Sharvit: Thank you.
James Lewis: Thank you very much for coming to the "GOTO Book Club." It's been a pleasure.
Yehonathan Sharvit: Perfect.