Machine learning and deep learning techniques are powerful tools for a large and growing number of applications. Unfortunately, it is difficult or impossible to understand the reasons for the answers that they give to the questions they are asked. To help shed light on what information your machine learning models use to produce their outputs, Scott Lundberg created the SHAP project. In this episode he explains how it can be used to provide insight into which features have the most impact on a given output, and how that insight can be applied to make more useful and informed design choices. This is a fascinating and important subject, and this episode is an excellent exploration of how to start addressing the challenge of explainability.
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey, and today I’m interviewing Scott Lundberg about SHAP, a library that implements a game-theoretic approach to explain the output of any machine learning model.
- How did you get introduced to Python?
- Can you describe what SHAP is and the story behind it?
- What are some of the contexts that create the need to explain the reasoning behind the outputs of an ML model?
- How do different types of models (deep learning, CNN/RNN, Bayesian vs. frequentist, etc.) and different categories of ML (e.g. NLP, computer vision) influence the challenge of understanding the meaningful signals in their reasoning?
- Taking a step back, how do you define "explainability" when discussing inferences produced by ML models?
- What are the degrees of specificity/accuracy when seeking to understand the decision processes involved?
- Can you describe how SHAP is implemented?
- What are the signals that you are tracking to understand what features are being used to determine a given output?
- What are the assumptions that you had as you started this project that have been challenged or updated as you explored the problem in greater depth?
- Can you describe the workflow for someone using SHAP?
- What are the challenges faced by practitioners in interpreting the visualizations generated from SHAP?
- How much domain knowledge and context is necessary to use SHAP effectively?
- What are the ongoing areas of research around tracking of ML decision processes?
- How are you using SHAP in your own work?
- What are the most interesting, innovative, or unexpected ways that you have seen SHAP used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on SHAP?
- When is SHAP the wrong choice?
- What do you have planned for the future of SHAP?
Keep In Touch
- Microsoft Research
- Game Theory
- Computational Biology
- Shapley Values
- Julia Language
- CNN == Convolutional Neural Network
- RNN == Recurrent Neural Network
- A* Algorithm
- CFPB == Consumer Financial Protection Bureau
- NP-Hard
- Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
- Log Odds
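
For readers who want a concrete feel for the Shapley values that SHAP builds on, here is a minimal sketch in plain Python. It is not how the SHAP library is implemented: it enumerates every coalition of features, which is only feasible for a handful of features. The model `toy_model`, the `instance` and `baseline` inputs, and the helper names are all hypothetical and chosen purely for illustration. "Removing" a feature is simulated here by replacing it with its baseline value, which is one common simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (cost grows as 2^n)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: two additive terms plus one interaction term.
def toy_model(x):
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


instance = [1.0, 2.0, 4.0]  # the input being explained (made up)
baseline = [0.0, 0.0, 0.0]  # reference input standing in for "feature absent"


def value_fn(coalition):
    # Features in the coalition keep their real value; the rest use the baseline.
    mixed = [
        instance[j] if j in coalition else baseline[j]
        for j in range(len(instance))
    ]
    return toy_model(mixed)


phi = shapley_values(value_fn, len(instance))
for index, contribution in enumerate(phi):
    print(f"feature {index}: {contribution:.2f}")
```

In this toy case the contributions come out to 4, 6, and 2, and they sum to the difference between the model output for `instance` (12) and for `baseline` (0). The interaction term `x[0] * x[2]` is split evenly between the two features involved.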