Machine Learning

Go From Notebook To Pipeline For Your Data Science Projects With Orchest - Episode 304

Jupyter notebooks are a dominant tool for data scientists, but they lack a number of conveniences for building reusable and maintainable systems. Machine learning projects in particular need a way to pivot from exploring a dataset or problem to integrating that solution into a larger workflow. Rick Lamers and Yannick Perrenet were tired of struggling with one-off solutions when they created the Orchest platform. In this episode they explain how Orchest allows you to turn your notebooks into executable components that are integrated into a graph of execution for running end-to-end machine learning workflows.

Read More

Giving Your Data Science Projects And Teams A Home At DagsHub - Episode 301

Collaborating on software projects is largely a solved problem, with a variety of hosted or self-managed platforms to choose from. For data science projects, collaboration is still an open question. There are a number of projects that aim to bring collaboration to data science, but they each address a different aspect of the problem. Dean Pleban and Guy Smoilovsky created DagsHub to give individuals and teams a place to store and version their code, data, and models. In this episode they explain how DagsHub is designed to make it easier to create and track machine learning experiments, and to serve as a way to promote collaboration on open source data science projects.

Read More

Add Anomaly Detection To Your Time Series Data With Luminaire - Episode 293

When working with data it’s important to understand whether it is correct. If there is a time dimension, then it can be difficult to know when variation is normal. Anomaly detection is a useful tool to address these challenges, but a difficult one to do well. In this episode Smit Shah and Sayan Chakraborty share the work they have done on Luminaire to make anomaly detection easier to work with. They explain the complexities inherent to working with time series data, the strategies that they have incorporated into Luminaire, and how they are using it in their data pipelines to identify errors early. If you are working with any kind of time series then it’s worth giving Luminaire a look.

Read More

Scale Your Data Science Teams With Machine Learning Operations Principles - Episode 289

Building a machine learning model is a process that requires well-curated, clean data and a lot of experimentation. Doing it repeatably and at scale with a team requires a way to share your discoveries with your teammates. This has led to a new set of operational ML platforms. In this episode Michael Del Balso shares the lessons that he learned from building the platform at Uber for putting machine learning into production. He also explains how the feature store is becoming the core abstraction for data teams to collaborate on building machine learning models. If you are struggling to get your models into production, or to scale your data science throughput, then this interview is worth a listen.

Read More

Supporting The Full Lifecycle Of Machine Learning Projects With Metaflow - Episode 274

Netflix uses machine learning to power every aspect of their business. To do this effectively they have had to build extensive expertise and tooling to support their engineers. In this episode Savin Goyal discusses the work that he and his team are doing on the open source machine learning operations platform Metaflow. He shares the inspiration for building an opinionated framework for the full lifecycle of machine learning projects, how it is implemented, and how they have designed it to be extensible to allow for easy adoption by users inside and outside of Netflix. This was a great conversation about the challenges of building machine learning projects and the work being done to make it more achievable.
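
To give a sense of the framework's flavor, here is a minimal sketch of a Metaflow flow; the flow name, steps, and data are made up for illustration and are not taken from the episode.

```python
# A minimal sketch of a Metaflow flow (run with `python train_flow.py run`);
# the flow and step names here are illustrative only.
from metaflow import FlowSpec, step


class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Values assigned to self become versioned artifacts passed between steps.
        self.data = list(range(10))
        self.next(self.train)

    @step
    def train(self):
        self.total = sum(self.data)
        self.next(self.end)

    @step
    def end(self):
        print("total:", self.total)


if __name__ == "__main__":
    TrainFlow()
```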

Read More

Teaching Python Machine Learning - Episode 260

Python has become a major player in the machine learning industry, with a variety of widely used frameworks. In addition to the technical resources that make it easy to build powerful models, there is also a sizable library of educational resources to help you get up to speed. Sebastian Raschka’s contribution of the Python Machine Learning book has come to be widely regarded as one of the best references for newcomers to the field. In this episode he shares his experiences as an author, his views on why Python is the right language for building machine learning applications, and the insights that he has gained from teaching and contributing to the field.

Read More

Distributed Computing In Python Made Easy With Ray - Episode 258

Distributed computing is a powerful tool for increasing the speed and performance of your applications, but it is also a complex and difficult undertaking. While performing research for his PhD, Robert Nishihara ran up against this reality. Rather than cobbling together another single-purpose system, he built what ultimately became Ray to make scaling Python projects to multiple cores and across machines easy. In this episode he explains how Ray allows you to scale your code easily, how to use it in your own projects, and his ambitions to power the next wave of distributed systems at Anyscale. If you are running into scaling limitations in your Python projects for machine learning, scientific computing, or anything else, then give this a listen and then try it out!
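
For a taste of what that looks like in practice, here is a minimal sketch of Ray's remote task API; the function and values are arbitrary examples, not from the episode.

```python
# A minimal sketch of parallelizing a function with Ray (assumes `ray` is installed).
import ray

ray.init()


@ray.remote
def square(x):
    return x * x


# Each .remote() call schedules a task across the available cores (or cluster nodes)
# and immediately returns a future; ray.get() gathers the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```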

Read More

An Open Source Toolchain For Natural Language Processing From Explosion AI - Episode 256

The state of the art in natural language processing is a constantly moving target. With the rise of deep learning, previously cutting edge techniques have given way to robust language models. Through it all the team at Explosion AI have built a strong presence with the trifecta of spaCy, Thinc, and Prodigy, supporting fast and flexible data labeling to feed deep learning models alongside performant and scalable text processing. In this episode founder and open source author Matthew Honnibal shares his experience growing a business around cutting edge open source libraries for the machine learning development process.
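
As a small illustration of the spaCy piece of that toolchain, here is a minimal sketch of loading a pretrained pipeline and inspecting its annotations; it assumes the en_core_web_sm model has been downloaded, and the sample sentence is made up for illustration.

```python
# A minimal sketch of a spaCy pipeline
# (install the model first with `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Explosion AI builds open source tools for natural language processing.")

# Each token carries part-of-speech and dependency annotations.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities detected by the statistical model.
for ent in doc.ents:
    print(ent.text, ent.label_)
```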

Read More

Open Source Machine Learning On Quantum Computers With Xanadu AI - Episode 253

Quantum computers promise the ability to execute calculations at speeds several orders of magnitude faster than what we are used to. Machine learning and artificial intelligence algorithms require fast computation to churn through complex data sets. At Xanadu AI they are building libraries to bring these two worlds together. In this episode Josh Izaac shares his work on the Strawberry Fields and PennyLane projects that provide both high and low level interfaces to quantum hardware for machine learning and deep neural networks. If you are itching to get your hands on the coolest combination of technologies, then listen now and then try it out for yourself.
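
To give a flavor of the PennyLane interface, here is a minimal sketch of a differentiable quantum circuit running on the simulator that ships with the library; the gates and parameter value are arbitrary examples, not from the episode.

```python
# A minimal sketch of a PennyLane QNode on the bundled default.qubit simulator.
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)


@qml.qnode(dev)
def circuit(theta):
    qml.RX(theta, wires=0)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))


# The circuit is differentiable, so its gradient can feed a training loop.
theta = np.array(0.3, requires_grad=True)
print(circuit(theta))
print(qml.grad(circuit)(theta))
```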

Read More

From Simple Script To Beautiful Web Application With Streamlit - Episode 238

Building well designed and easy to use web applications requires a significant amount of knowledge and experience across a range of domains. This can act as an impediment to engineers who primarily work in so-called back-end technologies such as machine learning and systems administration. In this episode Adrien Treuille describes how the Streamlit framework empowers anyone who is comfortable writing Python scripts to create beautiful applications to share their work and make it accessible to their colleagues and customers. If you have ever struggled with hacking together a simple web application to make a useful script self-service then give this episode a listen and then go experiment with how Streamlit can level up your work.
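
As a small taste of how little code a Streamlit app requires, here is a minimal sketch (saved as app.py and launched with `streamlit run app.py`); the chart and data are made up for illustration.

```python
# A minimal sketch of a Streamlit app: widgets are function calls, and the
# script reruns from the top whenever the user interacts with them.
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk explorer")

steps = st.slider("Number of steps", min_value=10, max_value=1000, value=100)

walk = pd.DataFrame({"position": np.random.randn(steps).cumsum()})
st.line_chart(walk)
```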

Read More