Chaos Engineering

ARTICLE

AWS Fault Injection Simulator: New Chaos Engineering as a Service Offering

AWS recently introduced its chaos engineering as a service offering, AWS Fault Injection Simulator, built to simplify running chaos experiments in the cloud. We asked Adrian Cockcroft, AWS VP of cloud architecture strategy, about chaos engineering, the challenges that put people off adopting it, and how AWS’s new tool fits in the space.

February 3, 2021
BOOK EPISODE

GOTO Book Club Yule Special

2020 has indeed been a different year. Many good things have happened, but the year has also shown that the world needs us more than ever. Software has a huge global impact, and power and responsibility should go hand in hand, which unfortunately is not always the case. We've chatted with Martin Fowler, Richard Feldman, Mike Amundsen, Dave Thomas, Erik Schön, Saša Jurić, Casey Rosenthal, and Aino Vonge Corry about what was good in 2020 and how we can make 2021 better.

December 23, 2020
BOOK EPISODE

Getting Started with Chaos Engineering

Casey Rosenthal and Nora Jones, authors of “Chaos Engineering,” highlight some of the best practices that famous companies like Netflix and Capital One use to break (or not break) their systems in production, so that you can get a taste of it.

October 29, 2020
BOOK EPISODE

Security Chaos Engineering

What’s the state of the art in modern security practices? The authors of the book Security Chaos Engineering, Aaron Rinehart and Kelly Shortridge, talk to Mark Miller about the shift in mental model that one has to undertake to reap its benefits. Their approach allows security engineers to uncover bugs in complex systems through chaos experiments before an actual attack occurs.

May 26, 2022
SESSION

Developing a Chaos Architecture Mindset

We’ve seen cloud usage patterns begin with a faster data center and greenfield applications, move to cloud-native migrations, and end up with complete data center replacement strategies. These patterns are driving even more business-critical backend workloads to the cloud, and new patterns are emerging for highly automated, available, and durable cloud-based architectures. However, a recurring problem with highly available architectures is that they don’t get enough exercise to ensure they will work correctly under turbulent conditions, and the weakest link is often the people operating the systems. Most enterprises have a backup data center, but in many cases disaster recovery failover and incident response aren’t practiced regularly. Chaos engineering leverages carefully designed failure injection tests and the distributed automation inherent in cloud deployments to prove that there is enough margin to absorb failures in production. Adrian Cockcroft outlines the overall architectural principles of chaos engineering and shares methods engineers can use to exercise failure modes in safety-critical and business-critical systems.

SESSION

Deprecating Simplicity

When engineering teams take on a new project, they often optimize for performance, availability, or fault tolerance. More experienced teams can optimize for these properties simultaneously. Now add an additional property: feature velocity. Organizations often try to optimize for feature velocity through process improvements and engineering hierarchy, but some optimize for feature velocity through explicit architectural decisions. These decisions increase the complexity of the system. This sounds like a trade-off: you get feature velocity, but for the price of increased complexity. Mental models of architecture can help us understand the tension between these engineering properties. For example, understanding the distinction between accidental complexity and essential complexity can help you decide whether to invest engineering effort into simplifying your stack or expanding the surface area of functional output. Spoiler alert: most businesses prioritize feature velocity over simplification. Chaos Engineering was born within this conflict between feature velocity and increasing complexity. Rather than simplify, Chaos Engineering provides a mechanism for us to embrace the complexity and ride it like a familiar wave, maintaining our business priorities while dialing up feature velocity.

SESSION

Breaking Things on Purpose

Failure Testing prepares us, both socially and technically, for how our systems will behave in the face of failure. By proactively testing, we can find and fix problems before they become crises. Practice makes perfect, yet a real calamity is not a good time for training. Knowing how our systems fail is paramount to building a resilient service. At Netflix and Amazon, we ran failure exercises on a regular basis to ensure we were prepared. These experiments helped us find problems and saved us from future incidents. Come and learn how to run an effective “Game Day” and safely test in production. Then sleep peacefully knowing you are ready!

SESSION

GameDays: Practice Thoughtful Chaos Engineering

A GameDay is a dedicated time to intentionally create failure scenarios in a safe environment. Regularly running GameDays is an effective Chaos Engineering practice to test the resiliency of your services, validate the technical intricacies, and surface conversations around observability and incident management. GameDays can also expose blind spots that appear when systems are operating under suboptimal conditions. In this talk, Ho Ming will share what it takes to run successful GameDays.

SESSION

Deprecating Simplicity 4.0

When engineering teams take on a new project, they often optimize for performance, availability, or fault tolerance. More experienced teams can optimize for these properties simultaneously. Now add an additional property: feature velocity. Organizations often try to optimize for feature velocity through process improvements and engineering hierarchy, but some optimize for feature velocity through explicit architectural decisions. These decisions increase the complexity of the system. This sounds like a trade-off: you get feature velocity, but for the price of increased complexity. Mental models of architecture can help us understand the tension between these engineering properties. For example, understanding the distinction between accidental complexity and essential complexity can help you decide whether to invest engineering effort into simplifying your stack or expanding the surface area of functional output. Spoiler alert: most businesses prioritize feature velocity over simplification. Chaos Engineering was born within this conflict between feature velocity and increasing complexity. Rather than simplify, Chaos Engineering provides a mechanism for us to embrace the complexity and ride it like a familiar wave, maintaining our business priorities while dialing up feature velocity.

SESSION

Deprecating Simplicity 3.0

*Casey regularly rewrites major components of this talk. The Program Committee watched him present again at GOTO CPH 2018 and enthusiastically confirms that it is definitely a completely different talk well worth watching!* When engineering teams take on a new project, they often optimize for performance, availability, or fault tolerance. More experienced teams can optimize for these properties simultaneously. Now add an additional property: feature velocity. Organizations often try to optimize for feature velocity through process improvements and engineering hierarchy, but some optimize for feature velocity through explicit architectural decisions. These decisions increase the complexity of the system. This sounds like a trade-off: you get feature velocity, but for the price of increased complexity. Mental models of architecture can help us understand the tension between these engineering properties. For example, understanding the distinction between accidental complexity and essential complexity can help you decide whether to invest engineering effort into simplifying your stack or expanding the surface area of functional output. Spoiler alert: most businesses prioritize feature velocity over simplification. Chaos Engineering was born within this conflict between feature velocity and increasing complexity. Rather than simplify, Chaos Engineering provides a mechanism for us to embrace the complexity and ride it like a familiar wave, maintaining our business priorities while dialing up feature velocity.

SESSION

Security & Chaos Engineering: A Novel Approach to Crafting Secure and Resilient Distributed Systems

People operate differently when they expect things to fail. Additionally, teams are more likely to keep an open mind about what is actually causing those things to fail when they are not fighting fires. During an incident, mental focus and operational momentum shift toward putting the fire out rather than thoroughly examining what caused the incident to begin with. As far as we know, Chaos Engineering is the only proactive mechanism for detecting availability and security incidents before they happen. Security Chaos Engineering allows teams to proactively and safely discover system weaknesses before they disrupt business outcomes. In this session we will introduce a new concept known as Security Chaos Engineering and show how it can be applied to create highly secure, performant, and resilient distributed systems.

SESSION

Practical Magic: The Resilience Potion and Security Chaos Engineering

Organizations want to sustain resilience in their software and systems – the ability to gracefully respond to failure and adapt to evolving conditions – but don’t know how to get there. How do we cut through the buzzwords and baloney? What principles, practices, and tools should we adopt? This talk introduces the Resilience Potion Recipe, a five ingredient elixir we can brew to foster resilience in our systems and guide our security chaos engineering transformation. We will start with an overview of what systems resilience means and imbibe our recipe for resilience. From there, we will explore the garden of practical opportunities we can harvest for each resilience potion ingredient when developing and delivering software. By the end of the talk, you’ll gather bountiful inspiration for how to nourish a transformation towards resilience in your own organization.

SESSION

Navigating Complexity and Reinforcing System Security

As engineers, we all know that delivering secure and reliable software is not an easy feat. Today’s systems are complex, dynamic, non-linear, and unpredictable, so it's no surprise that keeping our systems secure against threats in this challenging ecosystem has become exponentially more difficult. But what if there was a way to approach this problem differently? In this talk we will explore Security Chaos Engineering (SCE), a proactive approach to building confidence in the security of complex systems through continuous security experimentation. By experimenting with different scenarios, organizations can gain a more realistic understanding of their security practices and reduce the likelihood of security blind spots that could result in an erosion of trust and system safety. In this session, Aaron Rinehart, the pioneer of SCE and co-author of the O'Reilly books on the topic, will share his expertise and guide you through the process of getting started in applying the practice. With SCE, you can proactively identify and navigate security unknowns, giving you the peace of mind that your software is secure and reliable. Join us to learn how to build confidence in your system's security and navigate the complex, ever-evolving world of software engineering.