Data Analysis

Simplified Data Extraction And Analysis For Current Events With Newspaper - Episode 280

News media is an important source of information for understanding the context of the world. To make it easier to access and process the contents of news sites Lucas Ou-Yang built the Newspaper library that aids in automatic retrieval of articles and prepare it for analysis. In this episode he shares how the project got started, how it is implemented, and how you can get started with it today. He also discusses how recent improvements in the utility and ease of use of deep learning libraries open new possibilities for future iterations of the project.

Read More

When, Why, and How To Use Web Scraping In A Nutshell - Episode 278

The internet is a rich source of information, but a majority of it isn’t accessible programmatically through APIs or databases. To address that shortcoming there are a variety of web scraping frameworks that aid in extracting structured data from web pages. In this episode Attila Tóth shares the challenges of web data extraction, the ways that you can use it, and how Scrapy and ScrapingHub can help you with your projects.

Read More

Open Source Product Analytics With PostHog - Episode 266

You spend a lot of time and energy on building a great application, but do you know how it’s actually being used? Using a product analytics tool lets you gain visibility into what your users find helpful so that you can prioritize feature development and optimize customer experience. In this episode PostHog CTO Tim Glaser shares his experience building an open source product analytics platform to make it easier and more accessible to understand your product. He shares the story of how and why PostHog was created, how to incorporate it into your projects, the benefits of providing it as open source, and how it is implemented. If you are tired of fighting with your user analytics tools, or unwilling to entrust your data to a third party, then have a listen and then test out PostHog for yourself.

Read More

Build Your Own Personal Data Repository With Nostalgia - Episode 248

The companies that we entrust our personal data to are using that information to gain extensive insights into our lives and habits while not always making those findings accessible to us. Pascal van Kooten decided that he wanted to have the same capabilities to mine his personal data, so he created the Nostalgia project to integrate his various data sources and query across them. In this episode he shares his motivation for creating the project, how he is using it in his day-to-day, and how he is planning to evolve it in the future. If you’re interested in learning more about yourself and your habits using the personal data that you share with the various services you use then listen now to learn more.

Read More

Building A Business On Building Data Driven Businesses - Episode 246

In order for an organization to be data driven they need easy access to their data and a simple way of sharing it. Arik Fraimovich built Redash as a way to address that need by connecting to any data source and building attractive dashboards on top of them. In this episode he shares the origin story of the project, his experiences running a business based on open source, and the challenges of working with data effectively.

Read More

Exploratory Data Analysis Made Easy At The Command Line - Episode 230

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. Visidata is a command line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use Visidata for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then Visidata is the perfect addition to your toolbox.

Read More

Combining Python And SQL To Build A PyData Warehouse - Episode 227

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.

Read More

Algorithmic Trading In Python Using Open Tools And Open Data - Episode 216

Algorithmic trading is a field that has grown in recent years due to the availability of cheap computing and platforms that grant access to historical financial data. QuantConnect is a business that has focused on community engagement and open data access to grant opportunities for learning and growth to their users. In this episode CEO Jared Broad and senior engineer Alex Catarino explain how they have built an open source engine for testing and running algorithmic trading strategies in multiple languages, the challenges of collecting and serving currrent and historical financial data, and how they provide training and opportunity to their community members. If you are curious about the financial industry and want to try it out for yourself then be sure to listen to this episode and experiment with the QuantConnect platform for free.

Read More

Wes McKinney's Career In Python For Data Analysis - Episode 203

Python has become one of the dominant languages for data science and data analysis. Wes McKinney has been working for a decade to make tools that are easy and powerful, starting with the creation of Pandas, and eventually leading to his current work on Apache Arrow. In this episode he discusses his motivation for this work, what he sees as the current challenges to be overcome, and his hopes for the future of the industry.

Read More

Teaching Digital Archaeology With Jupyter Notebooks - Episode 194

Computers have found their way into virtually every area of human endeavor, and archaeology is no exception. To aid his students in their exploration of digital archaeology Shawn Graham helped to create an online, digital textbook with accompanying interactive notebooks. In this episode he explains how computational practices are being applied to archaeological research, how the Online Digital Archaeology Textbook was created, and how you can use it to get involved in this fascinating area of research.

Read More