Latest Episodes

Simplified Data Extraction And Analysis For Current Events With Newspaper - Episode 280

News media is an important source of information for understanding the context of the world. To make it easier to access and process the contents of news sites, Lucas Ou-Yang built the Newspaper library, which automates the retrieval of articles and prepares them for analysis. In this episode he shares how the project got started, how it is implemented, and how you can get started with it today. He also discusses how recent improvements in the utility and ease of use of deep learning libraries open new possibilities...

Digging Into Dagster: An Opinionated Open Source Framework For Data Orchestration - Episode 279

Data applications are complex and continually evolving, often requiring collaboration across multiple teams. To keep everyone on the same page, a high-level abstraction is needed to facilitate a cross-cutting view of the data orchestration across integration, transformation, analytics, and machine learning. Dagster is an innovative new framework that leans on the power and flexibility of Python to provide an extensible interface to the complete lifecycle of data projects. In this episode Nick Schrock explains how he designed the Dagster project to allow for integration with the...

When, Why, and How To Use Web Scraping In A Nutshell - Episode 278

The internet is a rich source of information, but a majority of it isn't accessible programmatically through APIs or databases. To address that shortcoming, there are a variety of web scraping frameworks that aid in extracting structured data from web pages. In this episode Attila Tóth shares the challenges of web data extraction, the ways that you can use it, and how Scrapy and ScrapingHub can help you with your projects.

Working In The Code Mines: Mining Software Repositories With PyDriller - Episode 277

A large portion of the software industry has standardized on Git as the version control system of choice. But have you thought about all of the information that you are generating with your branches, commits, and code changes? Davide Spadini created the PyDriller framework to simplify the work of mining software repositories to perform research on the technical and social aspects of software engineering. In this episode he shares some of the insights that you can gain by exploring the history of your code, the complexities of building a...

Building The Open Data Ecosystem For Music And More At Metabrainz - Episode 276

The Musicbrainz project was an early entry in the movement to build an open data ecosystem. In recent years, the Metabrainz Foundation has fostered a growing ecosystem of projects to support the contribution of, and access to, metadata, listening habits, and reviews of music. The majority of those projects are written in Python, and in this episode Param Singh explains how they are built, how they fit together, and how they support the goals of the Metabrainz Foundation. This was an interesting exploration of the work involved in building...

Growing Dask To Make Scaling Python Data Science Easier At Coiled - Episode 275

Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask lifts a limitation for scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to...

Supporting The Full Lifecycle Of Machine Learning Projects With Metaflow - Episode 274

Netflix uses machine learning to power every aspect of their business. To do this effectively they have had to build extensive expertise and tooling to support their engineers. In this episode Savin Goyal discusses the work that he and his team are doing on the open source machine learning operations platform Metaflow. He shares the inspiration for building an opinionated framework for the full lifecycle of machine learning projects, how it is implemented, and how they have designed it to be extensible to allow for easy adoption by users...

Learning To Program By Building Tiny Python Projects - Episode 273

One of the best methods for learning programming is to just build a project and see how things work first-hand. With that in mind, Ken Youens-Clark wrote a whole book of Tiny Python Projects that you can use to get started on your journey. In this episode he shares his inspiration for the book, his thoughts on the benefits of teaching testing principles and the use of linting and formatting tools, as well as the benefits of trying variations on a working program to see how it behaves. This...

Idiomatic Functional Programming With DRY Python - Episode 272

Python is an intuitive and flexible language, but that versatility can also lead to problematic designs if you're not careful. Nikita Sobolev is the CTO of Wemake Services where he works on open source projects that encourage clean coding practices and maintainable architectures. In this episode he discusses his work on the DRY Python set of libraries and how they provide an accessible interface to functional programming patterns while maintaining an idiomatic Python interface. He also shares the story behind the wemake Python styleguide plugin for Flake8 and the...

The Past, Present, And Future Of The FLUFL: Barry Warsaw Shares His History With Python - Episode 271

Barry Warsaw has been a member of the Python community since the very beginning. His contributions to the growth of the language and its ecosystem are innumerable and diverse, earning him the title of Friendly Language Uncle For Life. In this episode he reminisces on his experiences as a core developer, a member of the Python Steering Council, and his roles at Canonical and LinkedIn supporting the use of Python at those companies. In order to know where you are going, it is always important to understand where you...
