Data Analysis

Build Your Own Personal Data Repository With Nostalgia - Episode 248

The companies that we entrust our personal data to are using that information to gain extensive insights into our lives and habits while not always making those findings accessible to us. Pascal van Kooten decided that he wanted to have the same capabilities to mine his personal data, so he created the Nostalgia project to integrate his various data sources and query across them. In this episode he shares his motivation for creating the project, how he is using it in his day-to-day, and how he is planning to evolve it in the future. If you’re interested in learning more about yourself and your habits using the personal data that you share with the various services you use then listen now to learn more.

Read More

Building A Business On Building Data Driven Businesses - Episode 246

In order for an organization to be data driven they need easy access to their data and a simple way of sharing it. Arik Fraimovich built Redash as a way to address that need by connecting to any data source and building attractive dashboards on top of them. In this episode he shares the origin story of the project, his experiences running a business based on open source, and the challenges of working with data effectively.

Read More

Exploratory Data Analysis Made Easy At The Command Line - Episode 230

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. Visidata is a command line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use Visidata for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then Visidata is the perfect addition to your toolbox.

Read More

Combining Python And SQL To Build A PyData Warehouse - Episode 227

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.

Read More

Algorithmic Trading In Python Using Open Tools And Open Data - Episode 216

Algorithmic trading is a field that has grown in recent years due to the availability of cheap computing and platforms that grant access to historical financial data. QuantConnect is a business that has focused on community engagement and open data access to grant opportunities for learning and growth to their users. In this episode CEO Jared Broad and senior engineer Alex Catarino explain how they have built an open source engine for testing and running algorithmic trading strategies in multiple languages, the challenges of collecting and serving currrent and historical financial data, and how they provide training and opportunity to their community members. If you are curious about the financial industry and want to try it out for yourself then be sure to listen to this episode and experiment with the QuantConnect platform for free.

Read More

Wes McKinney's Career In Python For Data Analysis - Episode 203

Python has become one of the dominant languages for data science and data analysis. Wes McKinney has been working for a decade to make tools that are easy and powerful, starting with the creation of Pandas, and eventually leading to his current work on Apache Arrow. In this episode he discusses his motivation for this work, what he sees as the current challenges to be overcome, and his hopes for the future of the industry.

Read More

Teaching Digital Archaeology With Jupyter Notebooks - Episode 194

Computers have found their way into virtually every area of human endeavor, and archaeology is no exception. To aid his students in their exploration of digital archaeology Shawn Graham helped to create an online, digital textbook with accompanying interactive notebooks. In this episode he explains how computational practices are being applied to archaeological research, how the Online Digital Archaeology Textbook was created, and how you can use it to get involved in this fascinating area of research.

Read More

Analyzing Satellite Image Data Using PyTroll - Episode 193

Every day there are satellites collecting sensor readings and imagery of our Earth. To help make sense of that information, developers at the meterological institutes of Sweden and Denmark worked together to build a collection of Python packages that simplify the work of downloading and processing the data gathered by satellites. In this episode one of the core developers of PyTroll explains how the project got started, how that data is being used by the scientific community, and how citizen scientists like you are getting involved.

Read More

Polyglot: Multi-Lingual Natural Language Processing with Rami Al-Rfou - Episode 190

Using computers to analyze text can produce useful and inspirational insights. However, when working with multiple languages the capabilities of existing models are severely limited. In order to help overcome this limitation Rami Al-Rfou built Polyglot. In this episode he explains his motivation for creating a natural language processing library with support for a vast array of languages, how it works, and how you can start using it for your own projects. He also discusses current research on multi-lingual text analytics, how he plans to improve Polyglot in the future, and how it fits in the Python ecosystem.

Read More

Entity Extraction, Document Processing, And Knowledge Graphs For Investigative Journalists with Friedrich Lindenberg - Episode 186

Investigative reporters have a challenging task of identifying complex networks of people, places, and events gleaned from a mixed collection of sources. Turning those various documents, electronic records, and research into a searchable and actionable collection of facts is an interesting and difficult technical challenge. Friedrich Lindenberg created the Aleph project to address this issue and in this episode he explains how it works, why he built it, and how it is being used. He also discusses his hopes for the future of the project and other ways that the system could be used.

Read More