Deep learning has largely taken over the research and applications of artificial intelligence, with some truly impressive results. The challenge is that it requires specialized hardware, generally a dedicated GPU (Graphics Processing Unit), to reach reasonable speed and performance. This raises the cost of the infrastructure, adds deployment complexity, and drastically increases the energy required to train and serve models. To address these challenges, Nir Shavit combined his experience in multi-core computing and brain science to co-found Neural Magic, where he is leading the effort to build a set of tools that prune dense neural networks so that they can execute on commodity CPU hardware. In this episode he explains how sparsification of deep learning models works, the potential that it unlocks for making machine learning and specialized AI more accessible, and how you can start using it today.
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host, as usual, is Tobias Macey, and today I’m interviewing Nir Shavit about Neural Magic and the benefits of using sparsification techniques for deep learning models.
- How did you get introduced to Python?
- Can you describe what Neural Magic is and the story behind it?
- What are the attributes of deep learning architectures that influence the bias toward GPU hardware for training them?
- What are the mathematical aspects of neural networks that have biased the current generation of software tools toward that architectural style?
- How does sparsifying a network architecture allow for improved performance on commodity CPU architectures?
- What is involved in converting a dense neural network into a sparse network?
- Can you describe the components of the Neural Magic architecture and how they are used together to reduce the footprint of deep learning architectures and accelerate their performance on CPUs?
- What are some of the goals or design approaches that have changed or evolved since you first began working on the Neural Magic platform?
- For someone who has an existing model defined, what is the process to convert it to run with the DeepSparse engine?
- What are some of the options for applications of deep learning that are unlocked by enabling the models to train and run without GPU or other specialized hardware?
- The current set of components for Neural Magic is either open source or free to use. What is your long-term business model, and how are you approaching governance of the open source projects?
- What are the most interesting, innovative, or unexpected ways that you have seen Neural Magic and model sparsification used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Neural Magic?
- When is Neural Magic or sparse networks the wrong choice?
- What do you have planned for the future of Neural Magic?
Keep In Touch
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email email@example.com with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Neural Magic
- Computational Neurobiology
- 6.006 MIT Course
- FLOPS == FLoating point OPerations per Second
- Convolutional Neural Network
- Quantization of ML
- YOLO ML Model
- Federated Learning
- Reinforcement Learning
- Transfer Learning
- Tensor Columns
- Neural Magic DeepSparse Engine
- Sparse Zoo