Failure Testing prepares us, both socially and technically, for how our systems will behave in the face of failure. By proactively testing, we can find and fix problems before they become crises. Practice makes perfect, yet a real calamity is not a good time for training. Knowing how our systems fail is paramount to building a resilient service.
At Netflix and Amazon, we ran failure exercises on a regular basis to ensure we were prepared. These experiments helped us find problems and saved us from future incidents. Come and learn how to run an effective “Game Day” and safely test in production. Then sleep peacefully knowing you are ready!