Kubernetes: Up and Running

Brendan Burns | Gotopia Bookclub Episode • August 2021

Brendan Burns will take you on a journey through the life of Kubernetes. Where does it stand now, what is its history, and what’s waiting for us in the future that you might not expect?

Transcript

Intro

Matt Turner: My name is Matt Turner. I'm an SRE at Marshall Wace here in London. I've been working with Kubernetes for quite a while. I feel like I know a fair amount about it, but with me today is Brendan Burns, co-founder of Kubernetes and co-author of "Kubernetes: Up and Running: Dive into the Future of Infrastructure." Brendan, welcome. It's a pleasure to meet you.

Brendan Burns: Thanks. It's really great to be here and have a chance to chat.

Matt Turner: I'll dive straight into my first question, which is kind of going to ask you to introduce yourself. I was wondering if you could tell us briefly what your role is now. I understand what a normal vice president does, but you're a super, super technical guy. So what are you up to day-to-day?

Brendan Burns: I'm generally responsible for cloud-native open source on Azure, Microsoft's cloud, as well as sort of all of the APIs that drive accessing the cloud. My teams are responsible for the metadata layer that you interact with when you interact with Azure. It doesn't necessarily mean that we go and provision the VMs, but we deal with the templates and policy and access control. So generally cloud-native and DevOps for Azure. But as you say, I can't help myself from coding as well. So in addition to running a fairly large set of teams, I'm also doing things like maintaining the clients for Kubernetes. I maintain the Java, JavaScript and C# clients, and actually these days C as well. There's a nascent C client for Kubernetes being developed. It's a good reminder of how painful it is to program in C.

Matt Turner: I bet it's a nice difference from all the big company meetings that you might have to go to.

Brendan Burns: Yeah, for sure. I learned over time that, as you have other responsibilities, it's important to pull yourself out of the mainline but find those areas where you can have an impact and where you can contribute. If I have to go away for a week, deal with a customer or go and do a bunch of meetings, nobody's going to really care if the Java client is delayed a little bit. Whereas being in the mainline, reviewing PRs, and stuff like that, you're introducing a lot of latency for other people.

How involved are you with the Kubernetes movement these days?

Matt Turner: How closely do you follow upstream Kubernetes? It sounds like you're not hands-on as a maintainer. How involved are you with the original project these days?

Brendan Burns: I'm involved with it in the sense that a lot of what we do is helping deliver success for our customers using Kubernetes. I'm pretty involved with it in terms of understanding how it's evolving release to release, what's going to break release to release, what's gotten better release to release, and trying to anticipate potential problems at scale. We have thousands and thousands of customers, and it's a little bit different. You see something and you're like: "Oh, probably most people won't hit that." That's true. But once you have tens of thousands of customers, that "probably" becomes a "certainly." I’m trying to think about that and usability as well.

There has been a really interesting discussion around: Should we add confirmation of delete to kubectl? It is kind of an interesting question that seven years in we're only just starting to have that conversation. It is interesting to see the different perspectives from people who have all the scars from accidentally deleting stuff. And then also the people who are writing automation in scripts and who are like, "I don't want to add dash dash yes to every single line and every single script that I've ever written." One of the things I think about is that this should be a cluster administration capability where I can say, "Hey, these are the things that I care about. And regardless of whether you put dash dash yes or not, you're locked." You need to make an explicit gesture, like removing an annotation or similar, to delete something.
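The "locked unless you remove an annotation" idea can be sketched as a small admission-style check. This is only an illustration of the logic described above, not an existing Kubernetes feature: the annotation key `example.com/delete-protected` and the function name `allow_delete` are invented for the sketch.

```python
from typing import Tuple

# Hypothetical annotation key, invented for this sketch.
LOCK_ANNOTATION = "example.com/delete-protected"


def allow_delete(obj: dict) -> Tuple[bool, str]:
    """Decide whether a DELETE request for `obj` should be admitted.

    The decision ignores any client-side confirmation flag: a locked
    object stays locked until someone removes the annotation.
    """
    metadata = obj.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    if annotations.get(LOCK_ANNOTATION) == "true":
        name = metadata.get("name", "<unknown>")
        return False, f"{name} is locked; remove the {LOCK_ANNOTATION} annotation first"
    return True, ""


locked = {"metadata": {"name": "prod-db", "annotations": {LOCK_ANNOTATION: "true"}}}
unlocked = {"metadata": {"name": "scratch"}}

print(allow_delete(locked))
print(allow_delete(unlocked))
```

In a real cluster this kind of rule would more likely live in an admission webhook or a policy engine than in client code, so that it applies no matter which tool issues the delete.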

One of the other things I do a lot of these days is cross-Azure live site reviews, postmortems and things like that. And, when you do that, you get this breadth of perspective about how accidents happen. I have a friend who's a vocational counselor, who helps people rehabilitate after job site injuries. His perspective of how dangerous it is to work in particular industries is a little bit skewed by the fact that he sees all the bad stuff that happens. It is sort of similar when you go through a lot of postmortem reviews. You start saying, "Oh, okay, actually, we need to put some safety guards in here that you as a user may not love, but will prevent you from hurting yourself." That is one of the directions we want to follow.

Matt Turner: That's really interesting. I was a software engineer at first and then moved into the infrastructure direction. I remember very quickly getting the attitude of "What are the devs doing? This is crazy dangerous. Stop, stop, stop, stop!" But I also remember when delete became synchronous. It doesn't ask for confirmation, but it used to be instant. And then all of a sudden it took a while and we were all like, "What is this? What is this bug? What is this bug? I just want it to go fast? I just want to tell the cluster to make the best effort." I guess I still had some dev in me. It's interesting, especially in a smaller company, how you wear all those different hats. It sounds like you interact with people who wear all those different hats, and you have to try to make all of them happy.

Brendan Burns: It's really fascinating. We've been thinking about, how do you create a policy to reduce manual touches? It's great to aspire and tell people to reduce manual touches. But actually, I find that aspirations are no substitute for actually implementing policy. But then there's always somebody who has some workflow that you're going to break. And how do you negotiate them along and help them through maybe changing their workflow. You’re trying to understand where they're coming from.

New developments in the Kubernetes space

Matt Turner: Are there any features coming down the pipeline at you from upstream that you're particularly excited for? You've talked about the confirmation stuff, which is interesting. Anything you think is gonna make this better or is actually maybe a bad idea or that's super cool?

Brendan Burns: The most interesting thing in this area that I'm thinking about is the mainstreaming of the Gatekeeper project for policy. It's something that we developed a while ago. It runs as an admission controller. It's been out for a while but I think we're starting to see interest. For example, there used to be pod security policies as part of Kubernetes and it was a very specialized thing. They never got past beta and they're on a path to deprecation, to be replaced with general-purpose policy as implemented by Gatekeeper. I think that's a really good thing and a really good development for Kubernetes in general.

It's a funny trajectory. When we started out, we had basic authentication, no RBAC. When we started to add in RBAC, it felt like we're in the big league now. This is a sign that people are actually using this thing. Fast forward another four or five years and RBAC is great, but it doesn't tell you anything about the insides of the object. It'll say yes or no, it's like a stop sign. It doesn't say, is the thing you're doing a good idea or not. And so, in the levels of sophistication, the first stage is Do I know you or not? Do you have my password? The second stage is I'm going to give you some fine-grained access control, read and write or delete or whatever. And the third stage is I'm actually going to look at what you have in your thing, and try and make a judgment call: Is this actually something that I want to allow you to do?

I'm excited that we're starting to see that mainstream a little bit more. I'm excited that it's being discussed as a general capability that should be part of what every Kubernetes cluster looks like. I'm trying to get people to adopt more. I think it's adopted in a bunch of enterprises, but it's not necessarily being adopted more broadly. The reason I'm interested in it is that I'm deeply concerned about some of the software supply chain stuff. Even today, there's way too much pulling of random stuff off the internet, exposing ports on the internet. It's very scary out there.

Along those lines, one of the other interesting things that I've been thinking a lot about lately has been sidecars as dependency management.

For a long time, we thought about things like sidecars in terms of the capabilities they add. I think one of the other things they actually do is add encapsulation. You can make value judgments that say, "Hey, we're going to have a team that focuses on enabling clients for something like Redis or Cassandra or whatever. And we're going to put that in a sidecar."

And in fact the Dapr project that we're working on does exactly that. The only thing that your developers need to have in there is an HTTP client. They are not actually going to take a ton of dependencies. They're going to mostly keep their dependencies down to a minimum. And we're going to actually try and put the logic and the dependencies in a sidecar. There it's a little bit easier for a single team to own it, to reason about it, to centralize it. It's like having a library or a platform team in a company. Actually, I think it enables even more encapsulation of stuff like that. It becomes really essential for being able to understand, what is my software supply chain? If the software supply chain is encapsulated in hundreds of different images that I run, it's much harder to keep track of than if I put it all in this one thing that we share across everywhere. That's something I've been noodling on lately.

Matt Turner: Presumably with the intention of that becoming first-class right? It's a new idea to me. It sounds great and I'm sure somebody out there smarter than me has built that tooling. Are you looking at getting that first-class on the API surface?

Brendan Burns: Or just something that you could reason about in policy, maybe via annotations initially. There's a lot of opportunity to experiment first. I would want to experiment a lot before we go too far down that path. But I do think that we need to start thinking about provenance of images, not just in terms of, "Wait, did it come from somewhere that I trust?" but actually like, "What kind of baggage is it pulling in with it?" I think again it's very difficult if you have a cluster right now to know if there are vulnerable images running in it, for example.

Matt Turner: Yeah for sure.

Brendan Burns: There's no way to walk up to the API server, other than the image itself. The minute a developer builds that image, you lose track of where it came from.

Matt Turner: Things are meant to be easy. The temptation is definitely to use an upstream image, to pull it from a public registry, without even putting it through a write-through cache and a scanner. I've built tooling that tries to address this, but identifying dependencies with static analysis of container images is very standard. You're right in making that more first class. I've been on the OPA train. It's really cool to hear about that idea, using Gatekeeper to do exactly what you said. To just lint and say, "Hey, if you're trying to expose all ports to 0.0.0.0/0 we're going to deny that."
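As a rough illustration of the kind of rule Matt describes, the sketch below flags a Service that is open to every address. Real Gatekeeper policies are written in Rego, so this Python version only mirrors the logic. The function name `violations` is invented here, and treating an unset `loadBalancerSourceRanges` list as "unrestricted" is an assumption of the sketch, since the actual default can depend on the cloud provider.

```python
import ipaddress


def violations(service: dict) -> list:
    """Return human-readable policy violations for a Service manifest."""
    spec = service.get("spec", {})
    if spec.get("type") != "LoadBalancer":
        # Only externally exposed services are checked in this sketch.
        return []

    ranges = spec.get("loadBalancerSourceRanges") or []
    if not ranges:
        # Assumption: no explicit ranges means nothing restricts the source.
        return ["no loadBalancerSourceRanges set; treated as open to every address"]

    found = []
    for cidr in ranges:
        # A prefix length of 0 (0.0.0.0/0 or ::/0) matches every address.
        if ipaddress.ip_network(cidr, strict=False).prefixlen == 0:
            found.append(f"source range {cidr} matches every address")
    return found


wide_open = {"spec": {"type": "LoadBalancer", "loadBalancerSourceRanges": ["0.0.0.0/0"]}}
restricted = {"spec": {"type": "LoadBalancer", "loadBalancerSourceRanges": ["10.0.0.0/8"]}}

print(violations(wide_open))
print(violations(restricted))
```

A policy engine would deny the request whenever the returned list is non-empty. The same list could equally be surfaced as a warning rather than a hard denial.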

Brendan Burns: Yeah, for sure.

Matt Turner: Making that more sophisticated sounds like a really powerful tool.

Brendan Burns: And the other thing I've been thinking about a lot lately is how can you do this in a warning style too? One of the things I wish we'd built into Kubernetes at the beginning is warnings. Everything we have now is yes or no. There is nothing in the way of "You probably shouldn't." There's more opportunity for that kind of stuff. We have a little bit of these kinds of warnings coming in now, actually, in the API server.

Matt Turner: You mean the deprecation stuff? If I use an extensions/v1beta1 Deployment, I'm told that I have to stop doing that, but that's the only example I can think of.

Brendan Burns: I feel like there was something else that I saw recently that Clayton was working on. I can't remember the details, but HTTP does have a warning field, it has a warning header.

Matt Turner: Really? Wow!

Brendan Burns: In the spec is a header that is dedicated to this. We just never made use of it really.
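For readers who have not met it, the HTTP `Warning` header carries a three-digit code, an agent, and a quoted message. A minimal sketch of reading one on the client side follows. The helper name `parse_warnings` and the sample message are made up for illustration, and the optional trailing date that the header format allows is ignored here.

```python
import re

# warn-code SP warn-agent SP quoted warn-text (optional warn-date is ignored).
_WARNING = re.compile(r'(\d{3}) (\S+) "((?:[^"\\]|\\.)*)"')


def parse_warnings(header_value: str) -> list:
    """Return (code, agent, text) tuples from a Warning header value."""
    results = []
    for code, agent, text in _WARNING.findall(header_value):
        # Undo backslash escapes inside the quoted string.
        unescaped = re.sub(r"\\(.)", r"\1", text)
        results.append((int(code), agent, unescaped))
    return results


# Illustrative header value, not captured from a real API server.
sample = '299 - "this API version is deprecated"'
for code, agent, text in parse_warnings(sample):
    print(f"warning {code} from {agent}: {text}")
```

A client library could print these to stderr and carry on, which is the "you probably shouldn't" behaviour discussed above: the request still succeeds, but the user is told something.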

Matt Turner: It was interesting when you said that. You described authn, which is a yes or no. “Do I know who you are? Do your credentials check out?” And there’s authz, which is always going to be a binary decision. I guess you went from no auth to ABAC to RBAC. It got more sophisticated and very infrastructure-y. To me, this sounds like a linter, or more of a dev experience kind of thing than a sort of infrastructure tool.

Brendan Burns: A good example of this is something like resource limits. You can require that everybody supply resource limits and that is probably not a terrible idea. You will run into a couple of people who are like, "No, no, but my use case is...," and they may be right in that particular case. I think it is like that linter at some level where you think, "Well, mostly I want this rule to be true and enforced, but if you're this tall, I'm going to trust that you're sophisticated enough to know what's gonna happen if you don't." Having that ability is valuable and important.
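The "mostly enforced, with an escape hatch" rule can be sketched as a lint over a Deployment manifest. This is a hedged illustration rather than how any particular tool does it: the annotation key `example.com/resource-limits-exempt` and the function name `missing_limits` are hypothetical, and who is allowed to set that annotation would be a separate access-control question.

```python
# Hypothetical opt-out annotation, invented for this sketch.
EXEMPT_ANNOTATION = "example.com/resource-limits-exempt"


def missing_limits(deployment: dict) -> list:
    """List containers that lack resource requests or limits.

    Returns an empty list when the Deployment carries the exemption
    annotation, modelling an approved exception to the rule.
    """
    annotations = deployment.get("metadata", {}).get("annotations") or {}
    if annotations.get(EXEMPT_ANNOTATION) == "true":
        return []

    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for field in ("requests", "limits"):
            if not resources.get(field):
                name = container.get("name", "<unnamed>")
                problems.append(f"{name}: no resources.{field}")
    return problems


deployment = {
    "metadata": {"name": "web"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "100m"}}},
    ]}}},
}

print(missing_limits(deployment))
```

The same function could back a CI check that fails the build, an editor hint, or an admission rule. The difference is only where it runs and whether a non-empty result blocks or warns.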

I think about a lot of this in terms of security as well: should you block an image that has Bash in the container? It's not insecure. The newest version of Bash doesn't have any CVEs on it and it's a secure binary, but it's probably a bad idea. Especially with debug containers, which is a cool thing that's been added relatively recently, there's no good reason for anybody to have a shell sitting in their image. It just makes it easier for someone who breaks out to do stuff. It doesn't make it any more secure to remove it, or any less secure to have it there at some level. I'm into this idea of friction for security. So there's like security as a wall and then there's security as the spiky wood things that you put between the walls. That is more like, “I'm actually just going to try and make it a little harder for you. I'm not actually going to be able to tell you that it's more secure to a dedicated attacker, but at least I'm going to get more time to find you.”

“Best practices” vs “leading the way” in Kubernetes

Matt Turner: There's always been this sort of "best practice" of containers built on "distroless" or scratch, and then host images, a CoreOS or a COS or Atomic, or something like that. Why do you need a shell? You need a containerd, you need a C library, you need a Go runtime. What else are you putting in there? And there have always been ways to address this. You have static analysis mostly, or just curation by a platform team or an infrastructure team. It's really interesting to me that you're talking about Kubernetes starting to have opinions about this almost and maybe not leave it up to the operators, anecdotally.

Brendan Burns: Well, I would say enable Kubernetes to have the ability to have opinions.

Matt Turner: Right.

Brendan Burns: I'm not ever sure that, as a community, we would say, "This is the way." I think we would instead say, "Look, you're going to define the way, but we're going to give you capabilities to do it."

Matt Turner: So this is a bit meta. This is about making that easier and more accessible. Sort of paving the road for cluster operators to have an opinion and to not have to use a load of tooling to enforce it.

Brendan Burns: I think that that's part of it. There's a big difference between asking someone the question versus expecting them to find the documentation somewhere, implement it and get it to work. A good example of this is something like a modern computer. A modern computer effectively forces you to turn on password auth, like my iPad. When I reinstall my iPad, and actually, I hate it, particularly with the family one that all the kids share. It really tries to force you into having a pin. In fact, you have to put a pin on it and then go into the settings and say, "No, I don't want a pin on it. I don't put anything secure on this thing." I think modern computing has decided, we're no longer going to say, "Hey, you can figure out how to password encode your thing. We’ll no longer let you figure out how to encrypt it. We're going to actually shove it into your face and force you to turn it off." And I think that's actually the right thing to do.

Matt Turner: There’s that idea of a paved road, a set of sensible defaults. Anecdotally, my approach with developers has always been pretty similar. “This is what we want you to do. The CI system will fail your build if your deployment.yaml doesn't have a resource request and a limit in it.” Because they're just good practice.

Brendan Burns: Actually, we have a VS Code extension for Kubernetes, and it puts a little red squiggle under your YAML if you don't put resources in there.

Matt Turner: Especially in the days of DevOps, somebody may be primarily a Dev, and it's not their job to have this kind of expertise. My approach has always been, they come to me and argue their case. For example "I don't want to limit myself because this is an unbounded job. All it's doing is some massive computation.” It's kind of a best-effort thing. If we get a result, great. If there are clusters doing other stuff and it wants to OOM kill me, then fine. If you can make a case for an exception, then as you say, you add a policy exception or you stick an annotation on. Somebody with authority does a privileged operation and then the cluster lets that kind of thing slide. That's interesting. I was wondering, do you see the kubelet scanning the host that it's on and refusing to run on a system that's got a shell on it or a Python interpreter or something? I wonder how far this goes.

Brendan Burns: I do think that kubeadm already starts down this path, in the way that people install. And certainly we, in AKS, are seeing increasing demand and desire to pre-configure things for people. In the beginning, it was a lot of, "But I want to run this, and this shell script and I want to install this thing." And now more and more customers are coming and saying, "Look, I just want to run containers. Just take care of it all. I don't even want to see the OS," at some level. From a provider perspective, that's absolutely where we're headed. And in the community, it's going to head there too.

Matt Turner: I think so. I think a lot of people are building this themselves. You've preempted my question. I was there when RBAC came in and I remember the pain it was. RBAC was an absolutely great addition, a great move, but it was a pain to retrofit it to clusters and to all of my controllers. Back in those days, everybody had a hand-rolled Kubernetes install and suddenly every controller needed a service account and that needed the correct permissions. I do remember it was kind of a tedious week.

Brendan Burns: For sure.

Matt Turner: But it had to be done. I was going to ask about the migration plan, but it sounds like you're building a sort of opinion of opinions. It's more of an opt-in mechanism.

Brendan Burns: Yes.

The history of Kubernetes + to document or not to document

Matt Turner: Earlier, I did want to ask something. You're not coding all day on the project now. It sounds like you're still involved at a strategic level. You're an engineer and you're a very good one because you worked on this massively successful project and it's given you great success, a VP. Was there a sense of hanging up your brackets or whatever the XKCD is?

Brendan Burns: Yes, I think that's natural and it was really explicit from the beginning. We were never intending to build a benevolent dictator for life open-source project. There are a lot of projects running around a single person oftentimes in these open-source communities. That's not my notion of a good open-source community. We were very explicit from the very beginning to give away power. To create SIGs and establish leaders in the SIGs who have pretty exclusive power to generate.

It took a while to get the governance in place but we got the governance in place. There are elections to the steering committee so that people understand how the various things shift. For me personally, it's a realization that the times in the Kubernetes community that I was most happy with were the first 80%. I laid down a ton of the early code and I think the last major thing that I pushed in was called Third-Party Resources at the time and then became Custom Resource Definitions. It was a pretty big fight actually at the time, like a really big fight.

Matt Turner: Oh wow.

Brendan Burns: I came out of it thinking it's time to move on. The project had gotten big enough that it was time to let a lot of other voices speak. Also practically speaking, I really enjoy delivering stuff for the customers, and for the end-users and being able to build those managed services and have an impact at cloud scale. That requires building and running and managing a big team, which ultimately takes a lot of time. As I said earlier, the worst thing you can do is keep yourself in the mainline when you're a blocker. But at the same time, it was still important for me to stay at least a little bit involved. I had that itch that I needed to scratch.

I found various things in a similar way. I wrote most of the VS Code extension that we built for Kubernetes. There’s a bunch of people out there working on it. Eventually, I handed that off when it became clear that it wasn't an experiment anymore. It has millions of people downloading it and it needs more persistent work. And that's a pattern that persists. I've hung onto the Kubernetes clients. First of all, they're relatively low traffic. And then also it's really hard to find people to work on them actually. Oddly enough there's a lot of users but finding people who want to come work on the Kubernetes clients has proven to be a challenge. I've found one or two other people who are passionate and interested and who come and do great work.

Matt Turner: I mean, hats off to you. There's an award they give out at KubeCon called Chop Wood Carry Water. I feel like the clients are one of those really high-value, really low sexy factor things. It needs to be done, but you don't get to tackle bigger algorithms or write shiny features in the release notes.

Brendan Burns: I know. I find it really interesting for me personally, at a couple of different levels. One is because it really gives you a sense of how people are building on top. Anybody who's writing code is building on top. They're platform building at some level, they're building an operator, they're building a UI, they're building something. If you're not, you're going to use helm or kubectl or whatever. You're not going to use any of the programming tools. There you get a real sense of how people are building and what people are building. And two, I found personally it was really fun and gratifying to implement the same thing over and over and over again in different programming languages. It was a really interesting exercise in comparing and contrasting. I've used different programming languages in different parts of my career, but generally, I haven't, in the span of a day, implemented the same feature in multiple places. Then you start to go, "Oh, wow. This is interesting. I see that it’s easy here. It's hard here." It's more complicated. There were some features that I discovered where I was like, "Oh wow, this is a really good idea. I wish I had this over here."

Matt Turner: It's interesting. I was going to talk about it later when we get into the controversial section of the talk about Golang and about the sort of co-evolution of those projects. Golang is a language with opinions and that suits certain things. If you have the Kubernetes API, it has a certain structure, it has a certain kind of ergonomics to it. Some languages make it easy to express that, some languages make it a little bit more difficult.

Brendan Burns: And the other interesting thing is you find all of the places where things weren't written down. There's this general expectation, and I think it's the right expectation, that if kubectl works in a particular environment, the client library should work in the same environment. That makes sense. But then you go, "Oh, wow. Actually there's a whole bunch of stuff that was shoved into client-go that was never documented anywhere."

Matt Turner: You keep finding things.

Brendan Burns: Literally, we just kept finding things. I can't remember the specific one. Someone just came in and I think there's some environment variable that it uses to pick up the namespace that you want to be in, and that will override the namespace in the context. And it's used only inside of pods.

Matt Turner: Wow okay.

Brendan Burns: If you're in a pod, the namespace is supplied as an environment variable. It's also, by the way, supplied as a file in the pod and that is documented. But there's an environment variable that client-go looks for. It probably was pushed in before the file existed. It might be some convenience thing, I don't know. No one had ever hit this or complained about the fact that we didn't do this. Then somebody showed up, I think it was in the TypeScript library, and said, "This code doesn't work." We were like, "Okay, we can dig into client-go, but if it's not documented anywhere and it's not a standard somewhere, it's pretty hard for client library authors to figure out what the right thing to do is." Is client-go the thing that we should just go and emulate? Or is client-go a specific implementation, and there's a general Kubernetes spec for what config loading looks like?
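To make the lookup concrete, here is a minimal sketch of how a client library might resolve the namespace inside a pod. The file path is the standard service account mount. The environment variable name `POD_NAMESPACE` and the "environment variable first, then file" order are assumptions made for this sketch, since the conversation does not name the variable. The function name `in_pod_namespace` is also invented.

```python
import os
from pathlib import Path

# Standard service account mount inside a pod.
NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")
# Assumed variable name; the interview does not say which one is used.
NAMESPACE_ENV = "POD_NAMESPACE"


def in_pod_namespace(env=None, namespace_file=NAMESPACE_FILE, default="default"):
    """Resolve the current namespace: env var, then mounted file, then default."""
    env = os.environ if env is None else env

    value = env.get(NAMESPACE_ENV, "").strip()
    if value:
        return value

    try:
        value = Path(namespace_file).read_text().strip()
    except OSError:
        # Not running in a pod, or the token volume is not mounted.
        value = ""
    return value or default


print(in_pod_namespace(env={NAMESPACE_ENV: "team-a"}))
```

Passing `env` and `namespace_file` as parameters keeps the function testable outside a cluster. Whether other client libraries should copy this precedence is exactly the open question raised above.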

Matt Turner: There was a project I was dealing with the other day. It was the CUE language. There's this lovely statement that says there's not actually a spec for CUE as a language. There's just a reference implementation. That's your spec, but it's imperative, not declarative. It's a spec that executes, that could be understood by machines. I feel it’s always the same: should you have documented all of the arguments and environments and runtime configuration of kubectl? Should you have documented the API? There's all the proto-driven stuff. Kubernetes is fairly IDL, API first. There's always that tradeoff between, do we sit down and design something upfront, document it and say, "We have a spec. We have an API, implementation is an implementation detail. Literally, it's an exercise for the reader." Or do you deliver code? Do you make something work and then try to pick up these pieces five years later?

Brendan Burns: When we implemented exec in a pod, the WebSockets implementation of that, including the error codes and all the numbers for the streams, we didn’t document it. The only way you can figure it out is by going and reading the Go source code that sits in the API server. And to make matters worse, there's also a SPDY implementation. And the SPDY implementation is the one that kubectl actually uses. So we have this undocumented WebSockets API that most people don't use. It turns out that in most programming languages, it's way easier to do WebSockets than it is to do SPDY because SPDY is deprecated. All the client libraries ended up using WebSockets. Ensuing hilarity of poor documentation and nobody testing it.

Matt Turner: It can't be proxied and doesn't really support OAuth properly.

Brendan Burns: Yes, there's that too. There are some errors that you hit where people tried to do WebSockets through their HTTP proxy and they didn't know that it was with Nginx or whatever the IT person set up that doesn't handle WebSockets properly. Lots of fun bugs. It is one of the things that are really valuable, it keeps you living the experience of both your customers and your developers on your teams. One of the perils you have as you run bigger and bigger teams, is you get abstracted away from the friction, things that make their lives unpleasant.

Matt Turner: Somebody who's looking for promotion goes and solves that problem for you, and you never even see it.

Brendan Burns: Yes, it's invaluable to sit in and live the experience of people. Be on the front lines of helping somebody with your issue queue and stuff like that.

Matt Turner: For sure. It does sound like a really interesting place to be. I'm thinking of the way you can construct a client-go context in a thousand different ways. Presumably, each one was just PR-ed by one person who’s maybe in a cluster or in a test rig, or local. And maybe the API server is mocked, and they are not using OAuth, because it's on loopback and there's a function for that. You just call it and you work it out. And none of the other client libraries obviously have that, because the Go side gets the most attention. But you can do this sort of archeology, can't you? "Somebody had this requirement one day, but I wonder how many other people's production systems now rely on that."

Brendan Burns: Yes, and another one that's like that is multi-kubeconfig loading.

Matt Turner: I rely on that so much.

Brendan Burns: There's some controversy about whether it was ever a good idea to do it, but it got in and people use it. You can have multiple kubeconfig files in the same kubeconfig path. None of the client libraries support it.

Matt Turner: Nope.

Brendan Burns: Except for client-go. It's because it's totally un-spec'd. The implementation in Go is incredibly long and it’s hard to know exactly what it does. Every once in a while somebody will pop up and say it doesn't work. We then need to say "Yes, sorry, that's just not happening." It's like 5% of users and it would be wicked complicated to figure it out.
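For context, the feature under discussion lets the kubeconfig path list several files that are then merged. The sketch below is a deliberately simplified reading of the "first file to set a value wins" rule, operating on already-parsed dictionaries. It is not a port of the Go implementation, which handles many more cases, and the function names are invented for illustration.

```python
import os


def split_kubeconfig_path(value: str) -> list:
    """Split a path list on the platform separator, ignoring empty entries."""
    return [part for part in value.split(os.pathsep) if part]


def merge_kubeconfigs(configs: list) -> dict:
    """Merge parsed kubeconfig dicts; earlier files win on name clashes."""
    merged = {"clusters": [], "users": [], "contexts": [], "current-context": ""}
    for config in configs:
        for section in ("clusters", "users", "contexts"):
            seen = {entry["name"] for entry in merged[section]}
            for entry in config.get(section) or []:
                # Keep the first definition of each name, skip later ones.
                if entry["name"] not in seen:
                    merged[section].append(entry)
                    seen.add(entry["name"])
        # The first file that sets current-context decides it.
        if not merged["current-context"]:
            merged["current-context"] = config.get("current-context", "")
    return merged


first = {
    "clusters": [{"name": "dev", "cluster": {"server": "https://dev.example"}}],
    "current-context": "dev",
}
second = {
    "clusters": [
        {"name": "dev", "cluster": {"server": "https://other.example"}},
        {"name": "prod", "cluster": {"server": "https://prod.example"}},
    ],
    "current-context": "prod",
}

merged = merge_kubeconfigs([first, second])
print([cluster["name"] for cluster in merged["clusters"]])
print(merged["current-context"])
```

Even this toy version has to make choices (what to do with duplicate names, which `current-context` wins) that a written spec would normally settle, which illustrates why an un-spec'd behaviour is hard for other client libraries to reproduce.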

Someone else showed up recently and was like, "I'm constructing a config from scratch in the client. Why do I have to specify a context, a user and a cluster config? Why can't I just give you the cluster config, and you can run through it blindly and there's no OAuth. I don't want you to do OAuth and you're making me specify this user with empty OAuth." We decided to do that. But at what level is this? How hard is it for you to put in an empty user? There's all the little details of people's use cases.

Where is Kubernetes heading?

Matt Turner: I had exactly that on my bashrc builds, a kubeconfig colon-separated path. That's the thing that's undocumented but works really well for my use case. I used K9S or something, written in Python, but it just didn't work. To bring us slightly back on track, now Kubernetes is a mature stable project, I don't want to use the word enterprise-grade, because it sounds derogatory, that's trying to address a wide audience. You have a lot of strategic influence from the work you've done in the past and the position you have now, where does Kubernetes go? Is there a big focus on developer ergonomics and trying to bring everybody into the fold? Whatever operating system you're on, whatever language you want to use, addressing this particularly to somebody who works with Microsoft, do you have a good time with it? Or are we kind of saying, "Well look, actually only 1% of the people are doing this work and everybody else is realistically going to be interacting with Kubernetes at a completely different level of abstraction.

"The people who are doing this, there are good Go bindings, there are good Python bindings, use one of those libraries and get over it. Deliver some value and move on." It feels like it was a very grassroots sort of road to success for Kubernetes. It was really cool, and it was componentized and ran on Linux. It was very accessible, and as a senior engineer at the time, it was my Friday afternoon R&D project that I spent a bit of time on and really liked. When I showed my boss, it went from there. This wasn't a product that came shrink-wrapped and with Super Bowl adverts. Correct me if I'm wrong, but I feel like a lot of Kubernetes' success came from a grassroots, developer-first, you know, sophisticated-user-first approach. Is that something you all are still concentrating on, or is that now the engine room? And are you thinking everybody else is going to have a completely different experience with it?

Brendan Burns: I would answer it in a different way: I think both are very important. One of the things we did very early on is to sort out what we're not. We're not intending to be a PaaS. We're not going to go be Heroku. We draw a line. We're the kernel, if you will. I do think that a lot of people are going to, and should, interact with it at a higher level. We, as in Azure, have no-code solutions now, and they run on top of Kubernetes. We have functions-as-a-service. Many people have built functions-as-a-service that run on top of Kubernetes.

And I think that's right, what's valuable is actually that ability to move up and down and inter-operate. It used to be that if you're in a functions-as-a-service environment and you want to do service discovery into something running in a raw container or on a raw VM, like good luck, right? You're putting it on a public internet and you're dealing with that whole thing.

The beauty of being in Kubernetes is that a lot of the primitives are the same. You may be using a functions-as-a-service, but it's still a container image. All that vulnerability scanning that you were doing still works the same as if it were a vanilla image. The service discovery, the secrets management, it all works the same.

I don't think it's going to be an either-or. It's not like people are going to be up in the Heroku layer, or they're going to be down in the Kubernetes layer. I think that you're going to see it being more like a traditional operating system where sometimes you're going to write something in Bash and sometimes you're going to write something in C, and sometimes you're going to write something in Python. They're all going to work together through the same signals and the same threads and the same infrastructure. I think that's the real value. I mean even sophisticated developers want to use unsophisticated, easy-to-use tools for certain things. The important thing is the ability to move between them and find the right tool for the right job. That's our focus, and I think the great thing about the Kubernetes design is that we can do all that independent iteration on top, and other people can do it too. We can build a really good ecosystem where the tools that are useful to most people succeed, and the other tools fall away.

When you mentioned the custom resources at the beginning, this is why I pushed so hard for them to start with. It was clear that stuff was going to get built on top of the system. And if everything had to be built into the system, we were going to hit stasis so fast, where we just couldn't make any more decisions and people are saying, "I need this!" We were already seeing it with all the storage providers and all the network providers who wanted to come in. It was horrible. Getting all these extensibility points in the right places meant you could let go and say, "Okay, ecosystem, go nuts. Let us know when something wins, and we'll start bringing it back into the core."
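As an illustration of the extensibility point Brendan means, a custom resource is introduced by submitting a CustomResourceDefinition to the API server. The sketch below only assembles such a manifest as a dictionary, using the `apiextensions.k8s.io/v1` field layout. The `Backup` kind, the `example.com` group, and the `schedule` field are invented for the example.

```python
import json
from typing import Any, Dict


def crd_manifest(group: str, kind: str, plural: str, singular: str) -> Dict[str, Any]:
    """Assemble a CustomResourceDefinition manifest for one served version."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # A CRD's name must be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": singular},
            "versions": [
                {
                    "name": "v1",
                    "served": True,
                    "storage": True,
                    "schema": {
                        "openAPIV3Schema": {
                            "type": "object",
                            "properties": {
                                "spec": {
                                    "type": "object",
                                    # Hypothetical field on the custom resource.
                                    "properties": {"schedule": {"type": "string"}},
                                }
                            },
                        }
                    },
                }
            ],
        },
    }


crd = crd_manifest("example.com", "Backup", "backups", "backup")
print(json.dumps(crd, indent=2))
```

Nothing in the core has to change for the `Backup` type to exist, which is the "ecosystem, go nuts" property described above.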

Matt Turner: There's been a lot of that. With the CNI, the CSI and also the work that was done with cert-manager. The first time somebody said, "Hey, I want to be an API server too. So can we separate the business logic that's baked in, the first-class knowledge of the built-in resources from the library code that actually serves APIs?" I think it was cert-manager, right?

Brendan Burns: There's a few of those. That initiative was going on for a long time. It was very hard to rip it out. I don't know who the first one to actually use it was. It definitely was a lot of work to isolate the API server code. That happened for a long, long time. That's a good example of engineering we should have done up front. If we'd modularized it right up front we would have done it right.

The kernel distribution model

Matt Turner: Obviously I have a question: is there anything you'd do differently?

If I look at the service meshes, at Istio and Linkerd, it's pretty obvious that one of them wants to be a product that's supported, that's easy to use, that has a nice, clean, ergonomic interface, and one wants to be a platform: a Swiss Army knife of tooling that I can build on top of. Was Kubernetes ever consciously aiming for one or the other? Tim Hockin has talked quite a lot about that, the kernel distribution model. Do you want to expand on that a little bit? What's your view on it?

Brendan Burns: We wanted it to be usable, in the sense that we were never going to get adoption if it wasn't easy to use. I also think that we never intended it to be a PaaS. It was always intended to be infrastructure. There are two concepts that have always sort of rung true for us. The first is that it is application-oriented infrastructure. Previous cloud APIs, I would say, were physical-infrastructure oriented: it's a machine, or it's a disk, or it's a network. That's not an application concept, it's a physical concept. The Kubernetes APIs are really oriented around concepts that you care about as an application developer or maintainer: services and replicas and deployments and things like that.
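A small sketch may help show what "application-oriented" looks like in practice. The manifests below, built as plain dictionaries in the `apps/v1` Deployment and `v1` Service layouts, talk about an image, a replica count, and a port. No machine, disk, or network is named. The application name and image reference are hypothetical.

```python
import json
from typing import Any, Dict


def deployment(name: str, image: str, replicas: int) -> Dict[str, Any]:
    """Describe an application as 'N replicas of this container image'."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def service(name: str, port: int) -> Dict[str, Any]:
    """Give the replicas labelled app=<name> one stable name and port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


# Hypothetical application name and image.
web = deployment("web", "registry.example.invalid/web:1.0", replicas=3)
web_service = service("web", 8080)
print(json.dumps([web, web_service], indent=2))
```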

It was definitely an infrastructure API, it's just a different kind of IaaS API. The other thing we all said was that it was POSIX for the cloud. That is the ideal that you're shooting for. And when you think about what POSIX is, it is intended for end-users at some level, but most of the people who interact with it are going to interact with it through higher-level constructs.

Ultimately we thought that people would build on top. That is why we added a lot of extensibility.

I think we never want it to be the whole vertical because if you look at functions-as-a-service or other kinds of verticalized PaaS solutions, they have really horrible models where if you hit the limits, you have to eject down to something very, very low level.

They don't have this notion of gradual degradation or even service interconnection between the complicated bits and the easy bits. And that was clear to us even then. We wanted to build for modularity, so that there's a nice, easy steppingstone function downward or upward from infrastructure to platform as a service. I don't think we fully achieved that, frankly. I think there's a lot in the middle between Kubernetes and a FaaS service that doesn't exist today. We're trying to build them.

Matt Turner: But people have definitely built it. And then there are distributions. I use that loosely, I guess, to take it further. I'm thinking about something like Anthos specifically, which is, Kubernetes plus Istio plus functions, given that it's a Knative server.

Brendan Burns: It's fairly monolithic, I guess, is what I would say. You wouldn't look at that like it's Java or .NET, where it's modular in the middle and you can do with it what you want. It doesn't necessarily have opinions, but it's clearly a little bit more abstract than being down in C land. It's not as abstract as maybe being up in Node.js or Python or Visual Basic.

Matt Turner: But what do a lot of Python modules do? They're just a wrapper and they actually link against a C library.

Brendan Burns: Yes, exactly. I think that's sort of the piece that's in some ways missing in Kubernetes: that linking layer that says there's all this complicated view underneath you, and here's the easy-to-use layer. There are these individual examples. I think cert-manager, which you mentioned, is a great individual example of something like that, but it's not a standard library-ish kind of thing. You don't see the pattern.

Matt Turner: No. What about WASM?

Brendan Burns: Oh, the WASM stuff is super cool. We can do a whole other thing on that someday.

Matt Turner: If I want to alter the Scheduler, I have to fork it, rebuild it. The code is modular, but I can't plug it at run time. I have to fork and rebuild. When am I gonna be able to sling WASM?

Brendan Burns: That's funny. You talked about Go earlier, it's because Go never had dynamic loading.

Matt Turner: Right, and hence the CNI, it's gRPC over loopback, basically, right?

Brendan Burns: Exactly. So some of this stuff would have happened more easily if Go had some capabilities.

Matt Turner: I was going to ask about your relationship with Rob Pike and his team, and whether you were giving them requirements or they were giving you requirements. There's so much we could talk about. But I think we need to end it there. Brendan, this has been great fun. And I have a whole list of questions I didn't get to. I might just email you or something.

Brendan Burns: Fantastic. I'm happy to answer them on Twitter.

Matt Turner: No, this has been super cool. Thank you. Thank you so much.

Brendan Burns: Take care, folks. Bye.

Matt Turner: Thank you. Bye now.

About the speakers

Brendan Burns is a distinguished engineer at Microsoft Azure and co-founder of the Kubernetes project. Before Kubernetes he worked on search infrastructure at Google. Before Google he was a professor at Union College in Schenectady, NY. He received his PhD in Computer Science from the University of Massachusetts Amherst and his BA in Computer Science and Studio Art from Williams College. He lives in Seattle. You can follow Brendan via @brendandburns on Twitter.

Matt Turner is a Platform Engineer at Marshall Wace, a London asset manager. Matt's team is responsible for infrastructure, on-prem and public-cloud, security, and developer experience. Marshall Wace's platform has it all - old and new, Kubernetes and Kerberos.

Matt has done Dev, sometimes with added Ops, for over a decade, working at Jetstack, Skyscanner, Cisco, Tetrate, and others. His idea of "full-stack" is Linux, Kubernetes, and now Istio too. He's given lots of talks and workshops on Kubernetes and friends, and is co-organiser of the Istio London meetup. He tweets @mt165 and blogs at mt165.co.uk.