Open Source

Solving Python Package Creation For End User Applications With PyOxidizer - Episode 282

Python is a powerful and expressive programming language with a vast ecosystem of incredible applications. Unfortunately, it has always been challenging to share those applications with non-technical end users. Gregory Szorc set out to solve the problem of how to put your code on someone else’s computer and have it run without having to rely on extra systems such as virtualenvs or Docker. In this episode he shares his work on PyOxidizer and how it allows you to build a self-contained Python runtime along with statically linked dependencies and the software that you want to run. He also digs into some of the edge cases in the Python language and its ecosystem that make this a challenging problem to solve, and some of the lessons that he has learned in the process. PyOxidizer is an exciting step forward in the evolution of packaging and distribution for the Python language and community.

Read More

Flexible Network Security Detection And Response With Grapl - Episode 281

Servers and services that have any exposure to the public internet are under a constant barrage of attacks. Network security engineers are tasked with discovering and addressing any potential breaches to their systems, which is a never-ending task as attackers continually evolve their tactics. In order to gain better visibility into complex exploits Colin O’Brien built the Grapl platform, using graph database technology to more easily discover relationships between activities within and across servers. In this episode he shares his motivations for creating a new system to discover potential security breaches, how its design simplifies the work of identifying complex attacks without relying on brittle rules, and how you can start using it to monitor your own systems today.

Read More

Simplified Data Extraction And Analysis For Current Events With Newspaper - Episode 280

News media is an important source of information for understanding the context of the world. To make it easier to access and process the contents of news sites Lucas Ou-Yang built the Newspaper library that aids in automatic retrieval of articles and prepare it for analysis. In this episode he shares how the project got started, how it is implemented, and how you can get started with it today. He also discusses how recent improvements in the utility and ease of use of deep learning libraries open new possibilities for future iterations of the project.

Read More

Digging Into Dagster: An Opinionated Open Source Framework For Data Orchestration - Episode 279

Data applications are complex and continually evolving, often requiring collaboration across multiple teams. In order to keep everyone on the same page a high level abstraction is needed to facilitate a cross-cutting view of the data orchestration across integration, transformation, analytics, and machine learning. Dagster is an innovative new framework that leans on the power and flexibility of Python to provide an extensible interface to the complete lifecycle of data projects. In this episode Nick Schrock explains how he designed the Dagster project to allow for integration with the entire data ecosystem while providing an opinionated structure for connecting the different stages of computation. He also discusses how he is working to grow an open ecosystem around the Dagster project, and his thoughts on building a sustainable business on top of it without compromising the integrity of the community. This was a great conversation about playing the long game when building a business while providing a valuable utility to a complex problem domain.

Read More

Working In The Code Mines: Mining Software Repositories With PyDriller - Episode 277

A large portion of the software industry has standardized on Git as the version control sytem of choice. But have you thought about all of the information that you are generating with your branches, commits, and code changes? Davide Spadini created the PyDriller framework to simplify the work of mining software repositories to perform research on the technical and social aspects of software engineering. In this episode he shares some of the insights that you can gain by exploring the history of your code, the complexities of building a framework to interact with Git, and some of the interesting ways that PyDriller can be used to inform your own development practices.

Read More

Building The Open Data Ecosystem For Music And More At Metabrainz - Episode 276

The Musicbrainz project was an early entry in the movement to build an open data ecosystem. In recent years, the Metabrainz Foundation has fostered a growing ecosystem of projects to support the contribution of, and access to, metadata, listening habits, and review of music. The majority of those projects are written in Python, and in this episode Param Singh explains how they are built, how they fit together, and how they support the goals of the Metabrains Foundation. This was an interesting exporation of the work involved in building an ecosystem of open data, the challenges of making it sustainable, and the benefits of building for the long term rather than trying to achieve a quick win.

Read More

Growing Dask To Make Scaling Python Data Science Easier At Coiled - Episode 275

Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask lifts a limitation for scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to make use and maintenance of Dask in production. The share the goals for the business, their approach to building a profitable company based on open source, and the difficulties they face while growing a new team during a global pandemic.

Read More

Supporting The Full Lifecycle Of Machine Learning Projects With Metaflow - Episode 274

Netflix uses machine learning to power every aspect of their business. To do this effectively they have had to build extensive expertise and tooling to support their engineers. In this episode Savin Goyal discusses the work that he and his team are doing on the open source machine learning operations platform Metaflow. He shares the inspiration for building an opinionated framework for the full lifecycle of machine learning projects, how it is implemented, and how they have designed it to be extensible to allow for easy adoption by users inside and outside of Netflix. This was a great conversation about the challenges of building machine learning projects and the work being done to make it more achievable.

Read More

Idiomatic Functional Programming With DRY Python - Episode 272

Python is an intuitive and flexible language, but that versatility can also lead to problematic designs if you’re not careful. Nikita Sobolev is the CTO of Wemake Services where he works on open source projects that encourage clean coding practices and maintainable architectures. In this episode he discusses his work on the DRY Python set of libraries and how they provide an accessible interface to functional programming patterns while maintaining an idiomatic Python interface. He also shares the story behind the wemake Python styleguide plugin for Flake8 and the benefits of strict linting rules to engender good development habits. This was a great conversation about useful practices to build software that will be easy and fun to work on.

Read More

Pure Python Configuration Management With PyInfra - Episode 270

Building and managing servers is a challenging task. Configuration management tools provide a framework for handling the various tasks involved, but many of them require learning a specific syntax and toolchain. PyInfra is a configuration management framework that embraces the familiarity of Pure Python, allowing you to build your own integrations easily and package it all up using the same tools that you rely on for your applications. In this episode Nick Barrett explains why he built it, how it is implemented, and the ways that you can start using it today. He also shares his vision for the future of the project and you can get involved. If you are tired of writing mountains of YAML to set up your servers then give PyInfra a try today.

Read More