Next Generation Cloud System

We're kicking off 2021 with a new interview series: GOTO Unscripted, with our first round of interviews recorded back when we could still meet in person. GOTO Unscripted takes our conference speakers off the big stage and brings them behind the scenes for an intimate conversation on topics they know best.

While the industry is leaning towards distributed systems, implementing the technological patterns is not enough unless you also enable distributed teams to handle those systems. Jørn Larsen reviews best practices for building those teams with Bridget Kromhout, cloud advocate at Microsoft, and Casey Rosenthal, CEO and co-founder of Verica. They also emphasize the importance of aligning those teams with business values and elaborate on the “vision of done” in the tech space.

Intro

Jørn Larsen: We are sitting here at GOTO Chicago 2019, and we have Bridget Kromhout and Casey Rosenthal here, and maybe you would like to just introduce yourselves to the audience. Bridget Kromhout?

Bridget Kromhout: Sure. I'm Bridget Kromhout, and I am on the Cloud Advocacy team at Microsoft, which means they hired me, issued me a Mac, and told me to get people to use Linux on Azure. From which I think we can conclude that we may be living in the darkest timeline, but it's pretty interesting, too.

Casey Rosenthal: My name is Casey Rosenthal, I'm the CEO and co-founder of a startup called Verica, which develops a continuous verification platform. My background is mostly in reliability and availability, in particular for complex systems.

Jørn Larsen: So why Verica? I think we all know the company Microsoft, but Verica, what does it stand for?

Casey Rosenthal: The name itself is a shortened "verification," and we built a continuous verification platform. We bet that, just as CI and CD are pretty well accepted by the industry now, three years from now CV, continuous verification, will be just as accepted, and nobody will put serious code into production without having a continuous verification stage in the pipeline.

The future of building and distributing systems in the cloud

Teams for distributed systems

Jørn Larsen: We're here to talk about the future or the best practices of building and distributing systems in the cloud. Do you have any recommendations for how you build teams that are ready to do that in the best possible way? Bridget?

Bridget Kromhout: I think that Casey has a lot of insights into this, so I'll start by saying that I think distributed systems do require distributed teams because you are going to be building world-spanning systems that require availability. Humans are not available 24x7x365, by design. So if we're going to have systems that are going to be working and up, and not waiting for somebody to wake up and make a decision, then we need to distribute that decision-making in the teams.

Casey Rosenthal: Yes. You said something earlier that resonates with me, about pushing the decision-making to the edge. So you know, geographically distributing teams is great, but you also really need to decentralize the decision-making process. Because with a distributed system, likely you're going to have a complex system, and in a complex system you want knowledge workers who know how to improvise at the edge, that's what's going to give your business the properties that ultimately it's going to want to optimize for, feature velocity, availability, fault tolerance, things like that.

Bridget Kromhout: Yes, I think that's true. That does feel a little risky to decision-makers who suddenly realize they might not be in the loop on every decision. If they are trying to push the decision-making to the edge in their organizations, what does that mean? I think that's something that we need to explore on the human side of tech.

Jørn Larsen: So when decisions are being made in the heads of the organizations and small groups, do you have any recommendations for, like, team sizes that you have been working with that work well? And also, the number of teams that can effectively work together in a distributed system?

Casey Rosenthal: Do you mind a pizza model? So from a span-of-control point of view, I think for a manager, you're looking at five to seven people, four to seven people. Anything over seven people and, as a direct manager, you start not being able to give the appropriate amount of attention to your reports. But that model can expand pretty well.

Bridget Kromhout: Also, I think if you have people who are working across the organization in virtual teams on certain projects where they do need the support of their office manager but they aren't necessarily looped in running decisions by them or anything, you can experiment with different kinds of organization. I guess I would bring that one back a little bit, and say it's not even a question of specific sizes, because when you go to the two-pizza team question, it's how large are those pizzas, how hungry are those people? Are there vegans in the crowd who are not going to like that pizza, but might eat that one? There are a lot of discussions...

Casey Rosenthal: I will eat an entire pizza myself, so that's very debatable.

Bridget Kromhout: Yes, hungry vegan, picky vegetarian. I feel like all the models are wrong, some models are useful. Is that how the saying goes?

Casey Rosenthal: Yes.

Best practices for adding context to decision making in teams

Jørn Larsen: Ok, so we agreed that the decisions are best made in the teams, so we need some kind of context for the teams to make the right kind of decisions. We could call it culture, or understanding of the company... so, what are your views on that?

Bridget Kromhout: This is interesting because of course you're founding a startup, so how do you go about making sure that the ideas you have about the organization are going to be the same ideas and ideals that are promulgated throughout the organization? That's an interesting question.

Casey Rosenthal: Yes, so actually, I don't. And this is the thing that's uncomfortable for most of the software industry right now: the model that most software companies, most companies, operate under was just inherited from Taylorism, where somebody knows best, and then they have other people who know a little bit less, and so you're trying to force your idea down. That works great for making widgets. We're knowledge workers, we're not in the widget-making space. So, what works better from a cultural perspective is flipping that, where management's job is to make sure that the people doing the work have all of the information that they need to do it. Managers are not making decisions, they're just keeping people informed. It's up to the people on the frontlines, the ones building the product, to decide what to work on, and how to do it. I know that sounds terrifying to people in traditional organizations.

Bridget Kromhout: It makes perfect sense when you think about it. Again, back to the distributed team: you're not going to be able to watch everyone, every second, and tell them what to do, but you can guide them by setting incentives.

I saw a case once where there was a great deal of excitement and celebration when a team set up a centralized Jenkins server that everyone on the team could use, and they had an end-of-the-month party where the team got celebrated. It turns out the next month, you can imagine what happened: instead of everyone else moving their workloads onto that Jenkins server, several other teams had set up Jenkins servers. Because you didn't incentivize the behavior you wanted, which was the collaboration and the sharing of resources, you were celebrating the... "you did a thing," and you will get more to celebrate...

Casey Rosenthal: Parties for everyone.

How important are the business mission and customers for tech teams?

Jørn Larsen: We touched a little bit on understanding the business and the mission, or listening to the customers. Can you talk a little bit about that? How important that is for tech teams?

Casey Rosenthal: Occasionally, I have managers come up to me afterward to tell me that the model I'm describing, where the engineers make the decisions, sounds like anarchy. How can I trust them to not chase the shiniest tech and things like that?

I actually have a question that I love, that I think captures how well managers do it. If you can take a manager's reports, pull them out of work at any point in time, and ask them... explain to me why what you're working on right now is the most important thing that you can be working on for the business.

Jørn Larsen: Exactly.

Casey Rosenthal: ... if they have that answer, then as a manager, you're done. You're great. You're doing great. If they don't, then it doesn't matter how many architects there are, and project managers and program managers trying to help organize their work, they're going to make less good decisions about the actual code that they're writing, the architecture that they're designing, than if they can answer that question.

Jørn Larsen: Teams need to have a purpose. And you were, in your talk earlier today, you were talking about a young guy you met. Can you maybe just tell that story?

Bridget Kromhout: I think that's very illustrative of the reasoning behind the thing that you're doing. Because I was in a meeting where a young developer suggested that the next thing he should work on was writing a custom Erlang message bus, and I said, okay, what problem are we trying to solve? 

I think that's the question you have to ask. In this case, he quite honestly said, "The problem of me wanting to write a custom Erlang message bus." I said, okay but it's like, if the problem... if they don't understand, if no one understands what problem they're trying to solve, then they are not necessarily going to be equipped by their management to do the right thing.

Casey Rosenthal: They're going to do things like that, which you know, maybe that is the best thing for... but probably not.

Bridget Kromhout: That might not have been the problem that the organization wanted to be solved.

Casey Rosenthal: Right.

Bridget Kromhout: I would say that when you brought up anarchy, or you said people are worried about anarchy, I was thinking I've seen it done well, where there is the happy path, the blessed path of, well, if you choose to use these systems and these tools, this is fully supported, we have a very well built up pipeline that works with this. If you want to try this other thing, well, you might be on your own, you might not have as much help... these other things are way out of scope because we have contractual obligations, or we don't want to give data to these companies, or whatever. But giving people the "No, you can't use this," you certainly... we recommend you use this, you can learn in this space, and maybe we'll get so much value out of it that that'll become the very encouraging thing.

Casey Rosenthal: But right now it's off the paved path.

Bridget Kromhout: Right now it's off the paved path, you might have more work to do. You might be on-call for that solution in a smaller scoped team for quite a while if you choose to use something that we don't have a lot of other people with expertise for. 

For example, how do you handle that kind of discussion, of like, but I want to do... I want to write that in Haskell, and you're like, oh, boy, I'm going to have very important production things, with one person who understands them? How do you deal with that?

Casey Rosenthal: So those are some of the discussions that should be had, that management should be able to help inform somebody, look, these are the priorities of the company, what's the best way to get there? Is it to go off and build the infrastructure to support Erlang in the Java shop, or is it the best thing to follow the paved path? So, that's the conversation, that's the conversation that should be had. Then you can be thoughtful about that, and hopefully, at the end of the day, you get more buy-in from the engineer when they've gone through that thought process, than if they're like, forced to work in Java, and thinking, like, I want to do it in Erlang, but my boss won't let me.

Bridget Kromhout: I've worked in a shop that started life as a Python shop, and ended up with a lot more Go tooling. That's fine, as long as... I do worry about the single points of failure.

Jørn Larsen: You need to have increased that mass of knowledge in an organization to maintain it.

Bridget Kromhout: Well, I mean, I don't want any human being to feel like they're the only person who can do X, and they can't ever sleep or go on vacation or ride on a plane. I've been that person who was fixing Hadoop very slowly from the in-flight wifi.

Casey Rosenthal: That's something else, then, that a manager should be able to explain to the IC: look, we don't want to end up in a position where somebody is a single point of failure. You know, that's a consideration. If you're the one person who understands the JVM in an Erlang shop, that's a problem.

Disruption in tech 

Jørn Larsen: By the way, Casey Rosenthal, what happened to your hand? It doesn't look so healthy. Are you growing an artificial arm, or what is going on?

Casey Rosenthal: This is... no, this is in case the systems just snap out of existence.

Jørn Larsen: Okay. That's good.

Bridget Kromhout: I'm not sure if it's an implicit threat, or if we are very safe because you are Iron Man.

Jørn Larsen: Yes.

Casey Rosenthal: Well, no spoilers... But no, actually, I wore this as a reflection of... American culture at this point flirts with this notion of, well, we should tear everything down, and that's the way to rebuild things. That's held up in our media and might appear in our politics every once in a while. And as an industry, software is obsessed with this notion of disruption. I think that's a symptom of the fact that most of the software industry is constructed in this bureaucratic, hierarchical manner, and we kind of sense that that's not how the system should work. So we always flirt with this notion of, like, oh, we should tear the system down, and start over again.

Bridget Kromhout: That's a really interesting question. Like, if the argument that you're having in an organization is everything will be better on the next greenfield project... You're building a new company from scratch, and I'm guessing that by now, you already have a few lines of code, and... does anything ever really feel like greenfield after, like, a few commits?

Casey Rosenthal: No, it doesn't. But...

Bridget Kromhout: It's decisions, you layer decisions on decisions on decisions...

Casey Rosenthal: Yes.

Bridget Kromhout: Nothing is ever, like, instantiated like that.

Casey Rosenthal: Or torn down like that.

“The vision of being done”

Jørn Larsen: Well, maybe then that leads to talking a little bit about being done, the vision of being "done." Because at least we're now talking about we are getting started with something, but when are we... when is something done? When can you jump on to the next feature... when can you safely do that?

Casey Rosenthal: I would say, we haven't even talked about when to get started. We're beyond the point where you can start anything. There are so many layers of abstraction just to get anything... so many decisions have been made that we have no influence over at this point.

Bridget Kromhout: Even in a brand new startup?

Casey Rosenthal: Even in a brand new startup.

Bridget Kromhout: Can you give us an example of what one of those decisions might be?

Casey Rosenthal: Language choice. So, our product has to integrate with pieces of infrastructure that already exist.

Jørn Larsen: So environment, you're talking about the environment is hard to change because you will deal with the real world.

Casey Rosenthal: Yes.

Bridget Kromhout: You're talking about your customers have a technology, if you would like them to buy your product, you have to work with the technology.

Casey Rosenthal: It has to work with that technology, yes. And there's a whole community that's already developed. In Python, for example, there's so much work that has already been done to make tooling nice for certain aspects of things and it would be foolish for us to choose a language that doesn’t have those niceties, doesn’t have the community there. Each language has its pros and cons, I'm not holding Python up as the one, but you know, those decisions are already beyond us. 

Bridget Kromhout: Because you're building for an ecosystem.

Casey Rosenthal: Yes.

Bridget Kromhout: The ecosystem already exists.

Casey Rosenthal: Already exists, yes. We're just trying to play a part in it.

Jørn Larsen: Yes, you're making keys that need to fit in a certain puzzle. 

Casey Rosenthal: Yes.

Bridget Kromhout: I think that's in the stuff that I talked about. When I was referring to tooling for the Kubernetes ecosystem, some folks came up to talk to me afterward about, what if you're not going to use Kubernetes? And I'm like, if Kubernetes does not solve actual problems you have, you don't want to introduce a whole bunch more complexity to your systems for no particular reason. But going back to the "done," well, you probably are done with a system when you've decommissioned it. But other than that, you're always going to iterate on it...

Casey Rosenthal: Hopefully, yes.

Bridget Kromhout: You're always iterating on it because as long as you're continuing to get new value out of it, you're always going to add the new helper application or the new way to integrate with the next kind of partner that comes along, or the new way to produce a report, or to consume new data, that perhaps didn't exist when you first started... but then it comes along, and you want to keep iterating.

Casey Rosenthal: Yes, Netflix has this notion that nothing goes into maintenance mode, so if there's still business value to be had in something, then there's still reason to innovate there.

Jørn Larsen: Yes. It's also, I think, probably a bad word to use, "maintenance," because it does scare a lot of people. If you just think continuous development, it's like, oh, we still have something exciting to do, it's not just fixing other people's bugs.

Bridget Kromhout: I mean, but there can be very interesting bugs.

Jørn Larsen: There can be.

Bridget Kromhout: I think our industry does like to chase the shiny, and likes to instantiate, and likes to be the pioneer, and maybe not the settler and town planner, in the Wardley model. But like, I like team settlers, right? Not just because I love settling a town, but like, I like team settlers because the idea is, hey, you instantiated something, that's awesome. Now do it a thousand more times at scale for the next two years. I'm a lot more interested in that than the spark of something new.

Jørn Larsen: Yeah, it's becoming good at what you do.

Casey Rosenthal: I'm team town planners, because that's where the money is.

Bridget Kromhout: Yes, team town planners, there's money. But there's also, I think, that idea of if we label something legacy. I like to call it legacy, the place where all the customers and the money are. It's like... if you want to call something legacy in this negative idea, like, how do you feel about turning it off? And if you feel terrified, then you probably care about it, and maybe you should call it your heritage systems, or maybe you should just call it our business systems.

Casey Rosenthal: Contemporary?

Bridget Kromhout: Yes.

Jørn Larsen: That sounds like a good term.

Bridget Kromhout: Well-maintained?

Casey Rosenthal: Well-maintained, yes.

Jørn Larsen: Thank you. That sounds like a really good ending here. So, thank you so much for coming. I hope you will enjoy the rest of the conference, and we appreciate it so much that you're here.

Casey Rosenthal: Thank you.

Bridget Kromhout: Yes, thank you.

Recommended talks

GOTO 2016 • Distributed - of Systems and Teams • Bridget Kromhout

GOTO 2019 • Deprecating Simplicity 3.0 • Casey Rosenthal

GOTO 2020 • Why Are Distributed Systems so Hard? • Denise Yu

GOTO 2020 • Getting Started with Chaos Engineering • Nora Jones, Casey Rosenthal & James Wickett

GOTO 2019 • Kubernetes Operability Tooling • Bridget Kromhout