Chaos engineering is the practice of injecting failures into your production systems in a controlled manner to identify weaknesses in your applications. To build, run, and report on chaos experiments, Sylvain Hellegouarch created the Chaos Toolkit. In this episode he explains his motivation for creating the toolkit, how to use it to improve the resiliency of your systems, and his plans for the future. He also discusses best practices for building, running, and learning from your own experiments.
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or by email at email@example.com.
- To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat.
- Your host as usual is Tobias Macey and today I’m interviewing Sylvain Hellegouarch about Chaos Toolkit, a framework for building and automating chaos engineering experiments.
- How did you get introduced to Python?
- Can you start by explaining what Chaos Engineering is?
- What is the Chaos Toolkit and what motivated you to create it?
- How does it compare to the Gremlin platform?
- What is the workflow for using Chaos Toolkit to build and run an experiment?
- What are the best practices for building a useful experiment?
- Once you have an experiment created, how often should it be executed?
- When running an experiment, what are some strategies for identifying points of failure, particularly if they are unexpected?
- What kinds of reporting and statistics are captured during a test run?
- Can you describe how Chaos Toolkit is implemented and how it has evolved since you began working on it?
- What are some of the most challenging aspects of ensuring that the experiments run via the Chaos Toolkit are safe and have a reliable rollback available?
- What have been some of the most interesting/useful/unexpected lessons that you have learned in the process of building and maintaining the Chaos Toolkit project and community?
- What do you have planned for the future of the project?
Links
- Chaos Toolkit
- Chaos IQ
- Gremlin chaos engineering service
- Russ Miles Chaos IQ co-founder
- CherryPy minimalist Python web framework
- CherryPy Essentials book
- Chaos Engineering
- Chaos Engineering Book
- SRE (Site Reliability Engineering)
- Dark Debt
- Netflix Simian Army
- Chaos Monkey
- Istio service mesh
- Chaos Platform
- Composition vs Inheritance
- Open Chaos Initiative