Hanna Prinz, a consultant at INNOQ, and Eberhard Wolff, a Fellow at INNOQ, explain what a service mesh is and, more importantly, whether and when you should use one. They also delve into its main features and the reasons for deciding to use one. Hanna and Eberhard are the authors of "Service Mesh Primer."
Preben Thoro: Are we ready?
Hanna Prinz: Yes.
Eberhard Wolff: I was born ready.
Preben Thoro: Amazing. And we have that on the recording. Very good. Okay, well, in that case, Eberhard and Hanna, thanks a lot for joining us today. What we're doing here has been an amazing success. And we're proud of being able to continuously attract experts on a specific topic to what we're doing here. So thanks a lot for being part of this. Thanks for joining us.
Eberhard Wolff: Thanks a lot for having us.
Hanna Prinz: Yes.
Why did you write a book about Service Meshes?
Preben Thoro: Today, we're going to talk about your book, "Service Mesh Primer." Which brings me to the question: first of all, why did you decide to write a book about it? And what is it? What is a service mesh anyway?
Hanna Prinz: So when I was about to find a topic for my master's thesis, I was at INNOQ. At that time, there was a lot of attention around the topic of service mesh. There were a lot of conference talks and blog posts, and it was a very hot topic. We felt it was worth a deep dive, and that's why I wrote my master's thesis on the topic. Afterward, Eberhard and I decided to continue by writing a book about it, a small book, actually.
Eberhard Wolff: Yes, so I have actually been interested in microservices for quite a long time. I wrote a book, "Microservices," about the architectural approach, and I went on to write another book, "Pragmatic Microservices," about technologies for microservices. Service mesh caught my attention because it was a new technology, and I decided that I really wanted to understand it better. That is when I met Hanna, and we decided to write the "Service Mesh Primer." I also used that as material for the second edition of my pragmatic microservices book. So that's how I got interested in it.
What is Service Mesh?
Hanna Prinz: Maybe we should talk about what a service mesh actually is. I would describe it as an additional infrastructure layer that consists of many parts. So it's not just one application that you add to your stack; it's a distributed infrastructure layer. What do I mean by distributed? Each of the containers of your microservice application gets an additional application at its side. This is a proxy that intercepts all traffic coming in and going out. This is called the sidecar pattern, and it is something we have known for a long time.
What is really new is that a service mesh has an additional component that can control all these proxies deployed at the side of each container. With this kind of architecture, we can take the cross-cutting concerns of our application, such as security or monitoring, put them into the proxies, and control these features from the central part of the service mesh, which is the control plane. We can also gather metrics and use them for a lot of things.
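The relationship between sidecar proxies and the control plane can be illustrated with a minimal Python sketch. It is not how any real service mesh is implemented; the class names `SidecarProxy` and `ControlPlane`, their methods, and the `timeout_seconds` setting are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SidecarProxy:
    """Hypothetical proxy that sits next to one service instance."""
    service: str
    config: Dict[str, object] = field(default_factory=dict)
    request_count: int = 0

    def handle(self, request: str) -> str:
        # Every request passes through the proxy, so the proxy can count it
        # and apply whatever settings the control plane has pushed.
        self.request_count += 1
        timeout = self.config.get("timeout_seconds", "none")
        return f"{self.service} handled {request!r} (timeout={timeout})"


class ControlPlane:
    """Hypothetical central component that configures all proxies."""

    def __init__(self) -> None:
        self.proxies: List[SidecarProxy] = []

    def register(self, proxy: SidecarProxy) -> None:
        self.proxies.append(proxy)

    def push_config(self, **settings: object) -> None:
        # One change here reaches every registered sidecar.
        for proxy in self.proxies:
            proxy.config.update(settings)

    def collect_metrics(self) -> Dict[str, int]:
        # Metrics come from the proxies, not from the services themselves.
        return {p.service: p.request_count for p in self.proxies}
```

The point of the sketch is the division of labor: the services contain no cross-cutting code, the proxies do the work on every request, and a single central component changes their behavior and reads their metrics.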
Eberhard Wolff: Then the next question would be, what are the most relevant service mesh features? So Hanna do you want to say a few words about that?
Key takeaway no. 1
A Service Mesh is a dedicated infrastructure layer that adds features to a network between services.
What are the most relevant Service Mesh features?
Hanna Prinz: Yes, so with the metrics that are gathered by all the proxies deployed at the side of each container, we can build, for example, a dashboard where we can see how much traffic each workload gets, how many errors it produces, and how much latency it adds to a request.
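As a rough illustration of what such a dashboard computes, the following Python sketch aggregates per-request records into traffic, error rate, and mean latency per workload. The record layout (workload name, HTTP status, latency in milliseconds) and the rule that status codes of 500 and above count as errors are assumptions made for this example, not the format of any particular mesh.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Aggregate (workload, http_status, latency_ms) tuples per workload."""
    by_workload = defaultdict(list)
    for workload, status, latency_ms in records:
        by_workload[workload].append((status, latency_ms))

    summary = {}
    for workload, samples in by_workload.items():
        # Assumption for this sketch: 5xx responses count as errors.
        errors = sum(1 for status, _ in samples if status >= 500)
        summary[workload] = {
            "requests": len(samples),
            "error_rate": errors / len(samples),
            "mean_latency_ms": mean(latency for _, latency in samples),
        }
    return summary
```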
Eberhard Wolff: Then there are other features. For example, there is security. Once the proxies are in place, you can just distribute certificates to them. Then you can not only encrypt the communication between the different microservices, but you can also have mutual authentication. You can make sure that one microservice really talks to the other microservice and that there is no man-in-the-middle attack or other things, like microservices deployed by a third party. And you can even have service authorization: if one service calls another, you can check whether that is actually allowed, and if it's not, because the traffic goes through the proxy, it will just lead to a security exception, and the call will not be passed through. I think this feature is actually quite important, because without a service mesh, it's very hard to set up this infrastructure, to reliably and securely distribute those certificates, and to actually encrypt the communication.
Hanna Prinz: From what I've seen, this is the main reason for many people to introduce a service mesh, because this is something that they really don't want to do manually. It's also risky to handle these things on your own. In the field of security, I believe, it's always a good idea to automate.
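The service authorization check described here can be sketched in a few lines of Python. This is only an illustration of the idea: in a real mesh the caller identity would come from the certificate presented during mutual authentication, and the policy would be distributed by the control plane. The `ALLOWED_CALLS` table, the service names, and the `authorize` function are hypothetical.

```python
# Hypothetical policy: which caller may talk to which target service.
ALLOWED_CALLS = {
    ("frontend", "orders"),
    ("orders", "billing"),
}


def authorize(caller_identity: str, target_service: str) -> bool:
    """Reject the call unless the (caller, target) pair is explicitly allowed.

    Assumption: caller_identity has already been verified, for example
    from a client certificate, before this check runs.
    """
    if (caller_identity, target_service) not in ALLOWED_CALLS:
        raise PermissionError(
            f"{caller_identity} is not allowed to call {target_service}"
        )
    return True
```

Because the check lives in the proxy rather than in the service, a rejected call never reaches the target service at all.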
Eberhard Wolff: I've watched people trying to do that manually by themselves. And I think it's really not a good idea. So I would rather stay away from it.
Hanna Prinz: There are also some other things that you could use a service mesh for. I think monitoring and security are the most important ones, but there's also the topic of canary releasing and A/B testing. Without a service mesh, you would need a lot of configuration and some additional tools. With a service mesh, you can use the proxy infrastructure to apply routing rules. For example, you could distribute traffic on a percentage basis, routing 10 percent to the new version of your service and 90 percent to the old one. That way you can test new versions very easily. You could even do the routing based on headers.
The headers could contain user profile data, the kind of user making the request, or maybe a region, so you could roll out new versions to a specific group of people without risking that your whole application fails if the new version does. Of course, the service mesh can also provide additional load balancing features. And you could do A/B testing with it by routing 50 percent to one version and 50 percent to the other.
Eberhard Wolff: Yes, and then there is resilience. If you have a microservices infrastructure and one microservice fails, well, it will be called by other microservices. If that failed microservice makes those other microservices fail too, then you have a problem, because one microservice will eventually make the whole system fail. That is why you need some kind of resilience. And if you have proxies in place in the communication between the microservices, then you can have some features in those proxies that allow you to increase resilience. So for example, there is the circuit breaker.
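To make the routing rules concrete, here is a minimal Python sketch of a routing decision as a proxy might take it: a header rule is checked first, and otherwise traffic is split by weight. The header name `x-user-group`, the value `beta`, and the version labels `v1` and `v2` are hypothetical; a real mesh expresses such rules in its own configuration format rather than in application code.

```python
import random


def choose_version(headers, weights, rng=random):
    """Pick the service version that should receive a request.

    headers: dict of request headers.
    weights: dict mapping version name to its share of traffic,
             for example {"v1": 90, "v2": 10}.
    rng:     source of randomness, injectable for testing.
    """
    # Header-based rule: a specific group of users always gets the new version.
    if headers.get("x-user-group") == "beta":
        return "v2"

    # Percentage-based rule: split the remaining traffic by weight.
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]
```

A canary release would start with weights such as 90 and 10 and shift them step by step; an A/B test would use 50 and 50.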
The circuit breaker will cut the communication between two microservices if there is a problem. So instead of some communication locking up, you actually get an error and you don't wait forever, but instead, you get an error immediately. So that is good because that means the caller won't lock up instead it will get an error immediately. And also there are things like retries. So you can have a call be retried transparently for you because it's in the proxy, you can have a timeout again so that you don't lock up when you call a microservice that is actually blocked, and so on and so on. So, that is quite an interesting feature. And something that you actually have to take care of if you do synchronous microservices.
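The circuit breaker behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of any particular proxy: the class name, the parameters `max_failures` and `reset_after`, and the single trial call after the cool-down are all choices made for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised immediately while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of letting the caller wait on a broken service.
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: allow one trial call. A single further
            # failure reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

While the circuit is open, the wrapped function is not invoked at all, which is what protects both the caller and the struggling service.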
Hanna Prinz: One thing that is also related to resilience is the topic of chaos engineering that is also a very hot, or a very popular topic to talk about. And so chaos engineering means that you test the assumptions that you have around your application by doing experiments. So you think that if one service in your chain fails, the others can balance this error. But you never really try that. So with a service mesh, you could add errors and delays in your application and see what happens. So this is also something you could use a service mesh for.
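Fault injection of this kind can be imitated in plain Python to show the principle. The wrapper below adds a delay and fails a configurable fraction of calls; the function name `with_faults` and its parameters are hypothetical, and in a service mesh the same effect is achieved in the proxy without touching the service code.

```python
import random
import time


def with_faults(func, error_rate=0.0, delay_seconds=0.0,
                rng=random, sleep=time.sleep):
    """Wrap func so that calls are delayed and a fraction of them fail.

    error_rate:    probability between 0.0 and 1.0 that a call fails.
    delay_seconds: artificial delay added before every call.
    rng, sleep:    injectable for testing.
    """
    def wrapper(*args, **kwargs):
        if delay_seconds:
            sleep(delay_seconds)
        if rng.random() < error_rate:
            raise RuntimeError("injected fault")
        return func(*args, **kwargs)

    return wrapper
```

An experiment would wrap one dependency with, say, a 10 percent error rate and then check whether the calling services behave as assumed.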
Eberhard Wolff: And lastly, there's tracing. If you have a microservice calling another microservice, calling another microservice, and so on, then you can trace those calls and see which call causes which other calls. Service meshes support that because they already sit in the communication path, so they can provide information to a tracing service. However, you still need to make sure that, for example, HTTP headers are passed through so that it is possible to figure out that an outgoing call was caused by a particular incoming call. For that, your services need to pass some information along.
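The part that remains in the application is small: copy the trace-related headers from the incoming request onto every outgoing request. The Python sketch below shows one way to do that. The header list combines the B3 headers and the W3C `traceparent` header as an assumption for this example; which headers actually have to be forwarded depends on the mesh and the tracing backend in use, and the function name is hypothetical.

```python
# Assumption: the exact set of headers depends on the mesh and tracing backend.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "traceparent",
)


def propagate_trace_headers(incoming, outgoing=None):
    """Return outgoing headers extended with the incoming trace headers.

    Only trace headers are copied; everything else in the incoming
    request (cookies, authorization, and so on) is left out.
    """
    result = dict(outgoing or {})
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in incoming.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            result[name] = lowered[name]
    return result
```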
Key takeaway no. 2
Some relevant features of a Service Mesh are monitoring, security, canary releasing, resilience, chaos engineering and tracing.
When do you need a Service Mesh?
Eberhard Wolff: Okay. So with all these features and advantages, there is still the question of when you would need a service mesh because it's another complicated technology that you would add to your technology stack. So Hanna, what do you think about that?
Hanna Prinz: I think the first impression that most people have when they hear about service meshes is, "Awesome, we can fix a lot of the problems we have around our microservice architecture, and it makes the introduction of a microservice architecture so much easier." And it's true, because the ideal place for a service mesh is a microservice architecture with many services and many different languages. That is the point where it's really hard to put all these cross-cutting concerns into each of your services, because with different languages you can't rely on the same library, for example, and you really don't want to implement these features in every service. Also, a service mesh solves many problems specifically around microservices that communicate in a synchronous way, that is, where your services actually wait for the answer to the calls they make. If you have a deep call hierarchy, a service mesh is also really useful because you get information about where an error or a delay is coming from. If, on the other hand, you have an asynchronous communication pattern, you don't really need all of these features. Monitoring latency, for example, will not be much of a concern, because latency matters much less if you communicate asynchronously. And resilience is mostly a topic for microservices that communicate in synchronous ways.
Eberhard Wolff: So you could actually argue that what Hanna has just said is what you're aiming for in a microservices system. So one of the advantages is that you can actually have all these languages, so obviously, implementing these features in a service mesh is a good idea. You want to have maybe a lot of microservices because obviously, they are so great, so why not have a lot of them? And well, synchronous communication is easy to think about. So probably that's what you want to do. And actually, we are not that sure. So if you go through that list, first of all, there is this heterogeneous technology stack, many languages, many different frameworks. And I'm not sure whether you want to aim for that because it means that there is additional complexity. Because we have to understand all of these languages, frameworks, and so on. And what we are seeing out there in the projects that we are consulting is that actually, people limit themselves to a few choices, probably even just one. In some corners of the system, they use some specific language like, for example, Python for machine learning, which makes a lot of sense. But the main system will be implemented in Java, for example, with Spring Boot. And then the advantage to put that not in a library, but in the infrastructure like a service mesh is probably not that great, that big. Then there is the question of whether you want to have so many services.
Again, that adds extra complexity. Now, you could argue, "Well, we do want to have these services and we don't want to limit ourselves," but if you look at domain-driven design and bounded contexts, those are actually pretty coarse-grained things. They have their own domain model, and a domain model is only justifiable if you have quite a lot of functionality. That means there are coarse-grained services. And that is actually what I propose as a starting point to come up with some microservices: maybe not too many, but rather large ones.
In terms of communication, well, that is actually what made me write the practical microservices book: to give an overview of all the different communication paradigms that you can have. You can have asynchronous communication with messaging, you can have asynchronous communication with REST by polling information from other services, and you can have UI integration, where one part of the UI is delivered by one microservice and another part by another microservice, and the parts are merged together. And those things are maybe even beneficial.
So I think asynchronous communication is great because, well, if you do that, then you assume that communication will happen eventually. That means if some service isn't there now, you can just wait for it. Resilience is sort of built in, because you're not waiting for the result anyway; it will just happen at some point. UI integration is also something that leads to loose coupling, because all the functionality, including the UI, is in one microservice. So there is less need for services to communicate with one another, and less need to have changes distributed across many microservices. And that means the sweet spot that Hanna was talking about is probably not what you want to aim for in your microservices architecture.
Hanna Prinz: Yes, I remember in the Book Club, there was also a conversation between Sam Newman and Martin Fowler, and they were also talking about microservices. I agree with what they said that microservices should not be the first option to think about, so there are many others.
In our opinion, synchronous communication with microservices especially should not be the default. And I believe it's worth investing a lot of time and thought in your overall architecture, your communication patterns, and whether to use microservices at all, instead of throwing more tools, like a service mesh, on top. I think it's really hard to find the right choices for your environment and not just see, for example, a talk at a conference, want to try it out, and believe that it will help you a lot. It's really hard to tell before you actually experiment. So there is this tension: you don't want to change your system all the time or follow trends, but at the same time, you need to do experiments. That's why I think it's important to establish space in your team to try things out, and also the possibility to exchange experience with other teams and other companies. For example, you could go to meetups, you could do podcasting, or you could go to conferences and exchange the experiences you have.
I think most teams will gain a lot of value from this, rather than applying something they saw in a conference talk. Finally, I believe that architecture is something you really want to decide individually for your system and your business case, and there really is no default answer to it.
Eberhard Wolff: I think this is also very important. As an architect, you have to decide on your system, which architecture fits that specific system. You shouldn't rely on someone telling you, "Well, obviously, for all the cases, this is the architecture." This means that we end up in a situation where we have this great technology, but we probably shouldn't be aiming for using it, because then something is wrong in the architecture. I believe, Hanna, you've written a blog post about that, right?
Hanna Prinz: Yes, there's a blog post called "Happy without a service mesh," which is exactly about this topic: many people come to us and say, "We want a service mesh." And then I tend to say, "Well, I'm not sure you really do. Let's see what your problem really is. It might be on a different level than a service mesh."
Key takeaway no. 3
The perfect use case for a Service Mesh is a microservice-based application with many services and maybe even different languages.
What considerations do you recommend before adopting a Service Mesh?
Hanna Prinz: Let's say you have decided that a service mesh is a good idea for your system, then Eberhard, what considerations do you recommend before you adopt a service mesh? So what are the downsides, if it's useful to you?
Eberhard Wolff: Well, obviously, there is a performance impact, or I should rather say a latency impact, because every call now goes through two proxies, and that adds some latency. That might be quite important, in particular if you have a lot of distributed communication, which is the sweet spot, as we said. It matters because there are studies that say even slight additional latency makes you generate less revenue on an e-commerce site, for example. Then there are the resources that you spend on it: the proxies have to run somewhere. It's important to keep in mind, though, that you want the features we were talking about, like tracing or reporting all those metrics, in your system anyway, so you have to compare the cost to the other ways of implementing them. And last but not least, it adds complexity. A service mesh like Istio or Linkerd is another piece of your system. That means you have to install it, you have to configure it, and you have to make sure it runs. And they are actually quite complex pieces of software. I believe, Hanna, that Istio, for example, has about as many lines of code as Kubernetes, right?
Hanna Prinz: Yeah, I think it's a bit less than Kubernetes. But indeed, a service mesh consists of a huge amount of code. Yes.
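The latency argument is simple arithmetic, shown here as a small Python sketch. It assumes that the calls in a chain happen one after another and that each proxy adds the same fixed delay; the function name and the numbers in the usage note are hypothetical and not measurements of any real mesh.

```python
def added_latency_ms(calls_in_chain: int, per_proxy_ms: float) -> float:
    """Extra latency for a chain of sequential service-to-service calls.

    Each call passes two proxies: the caller's sidecar and the callee's.
    """
    proxies_per_call = 2
    return calls_in_chain * proxies_per_call * per_proxy_ms
```

With an assumed 1 ms per proxy, a request that triggers five sequential calls would pick up 10 ms of extra latency, which is why deep synchronous call hierarchies feel the overhead most.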
Eberhard Wolff: Obviously, that's not like the perfect indication of its complexity. But still, when I heard about it the first time, I was pretty surprised that these are such large pieces of infrastructure.
Key takeaway no. 4
Tailor the adoption of a Service Mesh to your own situation and consider implementation specs, usability and mental complexity
How do you choose the right Service Mesh for your team?
Hanna Prinz: There are many choices you have when you decide on a service mesh. Most people know Istio because it has been in the professional media for a long time. Obviously, they had a lot of marketing options. Another service mesh that many people know is Linkerd, which is as old as Istio, or maybe even a bit older. It is made by Buoyant, a smaller company. They took their experience with the first version of Linkerd and decided to rebuild it completely, because they saw many challenges in configuration, usability, and performance.
Linkerd is now at version two, where they decided to optimize for performance and usability. There are approaches like the zero-touch service mesh, while Istio is more of a highly configurable service mesh where you can configure pretty much anything you want. And there are other options too, other than Istio and Linkerd. We have collected all the information about these different service mesh implementations on a website that is licensed under Creative Commons: servicemesh.es. There we have gathered information about all the features, and we also link to the documentation where you can find out how to implement them.
Although I believe this table that we provide is very useful, I think it's also important to really look at the other implementation details such as the concepts, so for example, Istio, which is a project of Google and IBM, decided to introduce a new pattern for handling Ingress traffic. So if you run Istio on Kubernetes, which is right now, the only use-case, or the only possibility, you need to rethink the way that traffic goes into your application. You have to apply this new way of configuring that. So it's pretty invasive at this point.
Other than features and those details, I think that usability should also be an important topic for your team, so you should really experiment with the implementations. As I said, Istio is very configurable, and if this is important to you, then maybe something like Istio would be the better option. It's also worth thinking about whether you prefer installing it and leaving it as it is, or whether you want to put even more features into your service mesh, which you could do with Istio, for example. As I said, Kubernetes is the infrastructure for Istio and also for Linkerd, but many companies might not run all of their infrastructure on Kubernetes. In that case, you also have to think about which service mesh supports workloads running in VMs, for example. This is also something we have compared on servicemesh.es.
As Eberhard has just said, performance and resource consumption are also very important things to compare.
Although I agree that latency is an issue, there is also the mental complexity that a service mesh introduces on your team, and the team might already be very busy learning everything around Kubernetes, Docker, Prometheus, and all these things. So I think you have to think about many things. I would recommend that everyone think about the mental complexity and the mental capacity of the team first.
Eberhard Wolff: Even though some of those things are then implemented in the service mesh, you still have to take care of installing and operating it. So it's probably out of the hands of the developers, but it's in the hands of the people who care for the infrastructure, right? I think that's how you could sum it up.
Practical start with Service Mesh
Hanna Prinz: Yes, that's true. So finally, for those who want to do the experiments that we recommend, Eberhard, what are good places to start?
Eberhard Wolff: Well, you already talked about one, right? The website servicemesh.es is a place where you can take a look at all the different implementations. There's obviously the "Service Mesh Primer," which gives a nice introduction to the ideas. And then there are the tasks for Istio. I actually like them a lot. Those are small hands-on tutorials that walk you through the specific features that we talked about, and there is something very similar for Linkerd. So that's a good way to get some hands-on experience. Also, for the "Service Mesh Primer" and for other purposes, we have written some examples with extensive documentation about how you can get an application running on Kubernetes using service meshes. We have one for Istio and another one for Linkerd. So that is also something that you can just try out to get your hands dirty. Anything else we want to add?
Preben Thoro: This has been amazing. I like the way you got around things. And then it all tied up to, so how do we actually start here? Wonderful. Well done. Thanks a lot for joining us today.
Eberhard Wolff: Thank you.
Hanna Prinz: Thank you.
Key takeaway no. 5
The book and the Service Mesh site form the perfect start into Service Meshes.
About the authors
Eberhard Wolff has 15+ years of experience as an architect and consultant — often on the intersection of business and technology. He is a Fellow at INNOQ in Germany. As a speaker, he has given talks at international conferences, and as an author, he has written more than 100 articles and books about microservices, technologies for microservices and continuous delivery. His technological focus is on modern architectures — often involving Cloud, Continuous Delivery, DevOps or Microservices.
Hanna Prinz is a consultant at INNOQ, focusing on service mesh and infrastructure. Before that, she worked as a developer for backend, web and apps, and as a lecturer for programming. Ever since she experienced the challenges of ops, she has been most interested in the field of automation and DevOps like Kubernetes, CI/CD and Service Meshes.