Netflix uses machine learning to power every aspect of their business. To do this effectively they have had to build extensive expertise and tooling to support their engineers. In this episode Savin Goyal discusses the work that he and his team are doing on the open source machine learning operations platform Metaflow. He shares the inspiration for building an opinionated framework for the full lifecycle of machine learning projects, how it is implemented, and how they have designed it to be extensible to allow for easy adoption by users inside and outside of Netflix. This was a great conversation about the challenges of building machine learning projects and the work being done to make it more achievable.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
Their customizable dashboards and interactive graphs make finding and fixing performance issues fast and easy, and their machine-learning driven alerting ensures that you always know what is happening in your systems.
If you need even more detail about how your application is functioning you can track custom metrics, and their Application Performance Monitoring (APM) tools let you track the flow of requests through your stack.
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This portion of Python Podcast is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Savin Goyal about Netflix’s infrastructure for machine learning
- How did you get introduced to Python?
- Can you start by describing the work you are doing at Netflix to support their machine learning workloads?
- How are you addressing the impedance mismatch of machine learning/data science work between local experimentation and production deployment?
- What was the motivation for building Metaflow?
- How does Metaflow compare to other tools in the ecosystem such as MLFlow?
- What was missing in the other available tools that made Metaflow necessary?
- workflow for someone using Metaflow
- How do you approach the design of the developer interface to make it approachable to machine learning engineers?
- level of coupling with overall Netflix data stack
- How is Metaflow implemented?
- How has the architecture and design of the system evolved since you first began working on it?
- supporting infrastructure/integration points
- motivation/benefits of releasing it as open source
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while building infrastructure and tooling for machine learning?
- When is Metaflow the wrong choice?
- What do you have planned for the future of Metaflow and
Keep In Touch
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org) with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Data Lake
- Netflix Data Stack
- Chaos Engineering
- Chaos Monkey
- Netflix Simian Army
- Netflix Titus
- AWS Batch
- Netflix Meson
- Dataflow Programming
- DAG == Directed Acyclic Graph
- DVC (Data Version Control)
- CML (Continuous Machine Learning)
- AWS Step Functions