Data Science

Scale Your Data Science Teams With Machine Learning Operations Principles - Episode 289

Building a machine learning model is a process that requires well-curated and cleaned data and a lot of experimentation. Doing it repeatably and at scale with a team requires a way to share your discoveries with your teammates. This has led to a new set of operational ML platforms. In this episode Michael Del Balso shares the lessons that he learned from building the platform at Uber for putting machine learning into production. He also explains how the feature store is becoming the core abstraction for data teams to collaborate on building machine learning models. If you are struggling to get your models into production, or to scale your data science throughput, then this interview is worth a listen.
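
To make the feature store abstraction concrete, here is a minimal, hypothetical sketch: feature definitions are registered once and reused for both training and serving. The class, feature names, and trip data are invented for illustration and are not the API of Tecton or any other product discussed in the episode.

```python
# A hypothetical sketch of the feature store abstraction: register feature
# definitions once, then reuse them for offline training and online serving.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureStore:
    """Registry mapping feature names to the functions that compute them."""
    _features: Dict[str, Callable[[dict], float]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        self._features[name] = fn

    def get_features(self, entity: dict, names: List[str]) -> dict:
        # The same definitions back both training and inference, which is what
        # keeps the two code paths from drifting apart.
        return {name: self._features[name](entity) for name in names}


store = FeatureStore()
store.register("trip_distance_km", lambda trip: trip["distance_m"] / 1000)
store.register("is_rush_hour", lambda trip: float(7 <= trip["hour"] <= 9))

features = store.get_features({"distance_m": 5200, "hour": 8},
                              ["trip_distance_km", "is_rush_hour"])
print(features)  # {'trip_distance_km': 5.2, 'is_rush_hour': 1.0}
```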

Read More

Bringing Artificial Intelligence Projects From Idea To Production - Episode 287

Artificial intelligence applications can provide dramatic benefits to a business, but only if you can bring them from idea to production. Henrik Landgren was behind the original efforts at Spotify to leverage data for new product features, and in his current role he works on an AI system to evaluate new businesses to invest in. In this episode he shares advice on how to identify opportunities for leveraging AI to improve your business, the capabilities necessary to enable a successful project, and some of the pitfalls to watch out for. If you are curious about how to get started with AI, or what to consider as you build a project, then this is definitely worth a listen.

Read More

Growing Dask To Make Scaling Python Data Science Easier At Coiled - Episode 275

Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask lifts that limitation on scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to make the use and maintenance of Dask in production easier. They share the goals for the business, their approach to building a profitable company based on open source, and the difficulties they face while growing a new team during a global pandemic.
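
As a small sketch of the kind of workload Dask parallelizes, the example below spins up a local cluster and aggregates across many CSV files; the file pattern and column names are placeholders, and the managed cluster provisioning that Coiled provides is not shown here.

```python
# Minimal Dask sketch: a local scheduler plus workers, and a lazy dataframe
# that spans many files. Swap LocalCluster for a remote cluster to scale out.
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=4)
    client = Client(cluster)

    # Read many CSVs as one logical dataframe; nothing is computed yet.
    df = dd.read_csv("trips-*.csv")

    # The groupby/mean is executed in parallel across workers on .compute().
    result = df.groupby("city")["fare"].mean().compute()
    print(result)
```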

Read More

Building The Seq Language For Bioinformatics - Episode 257

Bioinformatics is a complex and computationally demanding domain. The intuitive syntax of Python and extensive set of libraries make it a great language for bioinformatics projects, but it is hampered by the need for computational efficiency. Ariya Shajii created the Seq language to bridge the divide between the performance of languages like C and C++ and the ecosystem of Python with built-in support for commonly used genomics algorithms. In this episode he describes his motivation for creating a new language, how it is implemented, and how it is being used in the life sciences. If you are interested in experimenting with sequencing data then give this a listen and then give Seq a try!

Read More

Open Source Machine Learning On Quantum Computers With Xanadu AI - Episode 253

Quantum computers promise the ability to execute calculations at speeds several orders of magnitude faster than what we are used to. Machine learning and artificial intelligence algorithms require fast computation to churn through complex data sets. At Xanadu AI they are building libraries to bring these two worlds together. In this episode Josh Izaac shares his work on the Strawberry Fields and PennyLane projects that provide both high- and low-level interfaces to quantum hardware for machine learning and deep neural networks. If you are itching to get your hands on the coolest combination of technologies, then listen now and then try it out for yourself.
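
For a taste of what that looks like in code, here is a minimal PennyLane sketch (assuming `pip install pennylane`): a tiny two-qubit variational circuit evaluated and differentiated on the bundled simulator. Running on photonic hardware or through Strawberry Fields requires a different device plugin, which is not shown.

```python
# A small PennyLane example: a parameterized quantum circuit whose output can
# be differentiated and trained, much like a layer in a neural network.
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

params = np.array([0.1, 0.2], requires_grad=True)
print(circuit(params))            # expectation value of the circuit
print(qml.grad(circuit)(params))  # gradient with respect to the parameters
```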

Read More

From Simple Script To Beautiful Web Application With Streamlit - Episode 238

Building well designed and easy to use web applications requires a significant amount of knowledge and experience across a range of domains. This can act as an impediment to engineers who primarily work in so-called back-end technologies such as machine learning and systems administration. In this episode Adrien Treuille describes how the Streamlit framework empowers anyone who is comfortable writing Python scripts to create beautiful applications to share their work and make it accessible to their colleagues and customers. If you have ever struggled with hacking together a simple web application to make a useful script self-service then give this episode a listen and then go experiment with how Streamlit can level up your work.
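
To show how little is needed to go from script to app, here is a minimal Streamlit sketch (run with `streamlit run app.py`); the column names and random data are placeholders for whatever your own script produces.

```python
# A plain Python script that Streamlit turns into an interactive web app:
# widgets return ordinary values, and charts render straight from dataframes.
import numpy as np
import pandas as pd
import streamlit as st

st.title("Exploration dashboard")

# A slider widget; its value is just an int the rest of the script can use.
n_points = st.slider("Number of points", min_value=10, max_value=1000, value=100)

df = pd.DataFrame(
    np.random.randn(n_points, 2), columns=["signal_a", "signal_b"]
)

st.line_chart(df)            # interactive chart from a dataframe
st.dataframe(df.describe())  # summary statistics table
```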

Read More

Illustrating The Landscape And Applications Of Deep Learning - Episode 234

Deep learning is a phrase that is used more often as it continues to transform the standard approach to artificial intelligence and machine learning projects. Despite its ubiquity, it is often difficult to get a firm understanding of how it works and how it can be applied to a particular problem. In this episode Jon Krohn, author of Deep Learning Illustrated, shares the general concepts and useful applications of this technique, as well as sharing some of his practical experience in using it for his work. This is definitely a helpful episode for getting a better comprehension of the field of deep learning and when to reach for it in your own projects.

Read More

Exploratory Data Analysis Made Easy At The Command Line - Episode 230

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. VisiData is a command-line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use VisiData for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then VisiData is the perfect addition to your toolbox.

Read More

Combining Python And SQL To Build A PyData Warehouse - Episode 227

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.
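
As a generic sketch of the pairing described here, the example below pushes the heavy aggregation into the warehouse with SQL and pulls only the much smaller result into pandas; the connection string, table, and columns are assumptions to substitute with your own warehouse's SQLAlchemy dialect.

```python
# Pairing PyData tools with a data warehouse: let the warehouse scan and
# aggregate the large table, and hand pandas only the summarized result.
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- replace with your warehouse's dialect and host.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

query = """
    SELECT city, date_trunc('day', created_at) AS day, count(*) AS orders
    FROM orders
    GROUP BY city, day
"""

daily_orders = pd.read_sql(query, engine)
print(daily_orders.head())
```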

Read More

Build Your Own Knowledge Graph With Zincbase - Episode 223

Computers are excellent at following detailed instructions, but they have no capacity for understanding the information that they work with. Knowledge graphs are a way to approximate that capability by building connections between elements of data, allowing us to discover relationships among disparate information sources that were previously unknown. In our day-to-day work we encounter many instances of knowledge graphs, but building them has long been a difficult endeavor. In order to make this technology more accessible Tom Grek built Zincbase. In this episode he explains his motivations for starting the project, how he uses it in his daily work, and how you can use it to create your own knowledge engine and begin discovering new insights of your own.
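
To illustrate the knowledge-graph idea itself (using networkx rather than the Zincbase API, whose interface is not shown here), the sketch below stores facts as labeled edges and surfaces a connection that no single fact stated directly; the entities and relations are invented for the example.

```python
# Facts stored as labeled edges (subject -> object with a relation attribute);
# new connections emerge by traversing paths between previously unrelated nodes.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Tom", "Zincbase", relation="created")
kg.add_edge("Zincbase", "knowledge graphs", relation="builds")
kg.add_edge("knowledge graphs", "triples", relation="store_facts_as")

# A path from "Tom" to "triples" is a relationship no single fact asserted.
path = nx.shortest_path(kg, "Tom", "triples")
print(path)  # ['Tom', 'Zincbase', 'knowledge graphs', 'triples']

for a, b in zip(path, path[1:]):
    print(a, kg.edges[a, b]["relation"], b)
```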

Read More