Check out this excerpt from the upcoming Book Club episode with Casey Rosenthal, Co-Founder and CEO at Verica, and Nora Jones, Co-Founder and CEO at Jeli. Chaos engineering is much more than just hype. Get the map and compass that you need to navigate the stormy waters of distributed systems while optimizing to meet business goals. Casey Rosenthal and Nora Jones, authors of "Chaos Engineering," highlight some of the best practices that well-known companies like Netflix and Capital One use to break (or not break) their systems in production, so that you can get a taste of it.
Casey Rosenthal: In the book, you talk about using a facilitator to investigate an incident, much like it's done in other high-stakes industries: medicine, aviation, and maritime. Can you explain how that relates to "Chaos Engineering" and, you know, some of your work in the book?
Nora Jones: Yes, absolutely. So, at Netflix, we had built a really great system and we could do a lot of things, like inject failure safely in production. But we had this giant form for folks to fill out if they wanted to do it. It was a pretty daunting form.
It's not like they were thinking about doing chaos experiments all the time, unless they were one of the four of us on this team. We were thinking about it all the time. They were thinking about it maybe once every few months. So seeing a giant form asking what business metric you want to impact can be kind of scary for folks, apparently.
Doing some facilitation techniques at Netflix was really helpful, like talking to teams and asking what kept them up at night or, if you actually had time, talking to people individually and seeing how mental models differ between folks on the same team: between different tenures, different roles, and different pieces of the system. A really good facilitator can extract that out of people and find those gaps.
When I went to Slack, we had been rolling out Envoy, and people wanted to do some chaos experiments on it. I started asking the questions I had put in the Chaos book, right? I really grouped up with people on the team, and we did a lot of pre-work to be able to run the chaos experiment. By the time we got about 75% through facilitating the experiment and preparing for it, we realized we weren't actually ready to roll it out. And we didn't even need to run the chaos experiment to know that. It was just asking those questions.
The Chaos Community Broadcast highlights practitioners in the discipline of Chaos Engineering every week in a dry-humor style that often leaves our audience laughing and guessing what will happen next. Guests include Nora Jones, John Allspaw, and many, many more. Subscribe to their channel on YouTube to get notified when new episodes are released.