Data Science

Growing Dask To Make Scaling Python Data Science Easier At Coiled - Episode 275

Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask removes one limitation on scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to make the use and maintenance of Dask in production easier. They share their goals for the business, their approach to building a profitable company based on open source, and the difficulties they face while growing a new team during a global pandemic.
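
As a rough illustration of the kind of workflow Dask enables (a minimal sketch, not drawn from the episode; the file pattern and column names are hypothetical), the snippet below uses the dask.dataframe API to run a pandas-style aggregation in parallel:

```python
import dask.dataframe as dd
from dask.distributed import Client  # optional; requires the distributed package

# Start a local scheduler and workers; the same code can point at a remote cluster
client = Client()

# Hypothetical file pattern; each CSV becomes one or more partitions of a single logical dataframe
df = dd.read_csv("events-*.csv")

# Familiar pandas-style operations build a task graph lazily...
daily_totals = df.groupby("date")["amount"].sum()

# ...and .compute() executes that graph in parallel across the workers
print(daily_totals.compute())
```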

Building The Seq Language For Bioinformatics - Episode 257

Bioinformatics is a complex and computationally demanding domain. The intuitive syntax of Python and its extensive set of libraries make it a great language for bioinformatics projects, but it is hampered by the need for computational efficiency. Ariya Shajii created the Seq language to bridge the divide between the performance of languages like C and C++ and the ecosystem of Python, with built-in support for commonly used genomics algorithms. In this episode he describes his motivation for creating a new language, how it is implemented, and how it is being used in the life sciences. If you are interested in experimenting with sequencing data then give this a listen and then give Seq a try!

Open Source Machine Learning On Quantum Computers With Xanadu AI - Episode 253

Quantum computers promise the ability to execute calculations at speeds several orders of magnitude faster than what we are used to. Machine learning and artificial intelligence algorithms require fast computation to churn through complex data sets. At Xanadu AI they are building libraries to bring these two worlds together. In this episode Josh Izaac shares his work on the Strawberry Fields and PennyLane projects, which provide both high and low level interfaces to quantum hardware for machine learning and deep neural networks. If you are itching to get your hands on the coolest combination of technologies, then listen now and then try it out for yourself.
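
For a taste of the kind of interface PennyLane exposes (a minimal sketch, not taken from the episode), the example below defines a small variational circuit on the bundled simulator and optimizes its parameters with gradient descent, much like training a neural network layer:

```python
import pennylane as qml
from pennylane import numpy as np

# Simulated device shipped with PennyLane; hardware backends plug in the same way
dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(params):
    # A two-qubit variational circuit whose expectation value we will optimize
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

params = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)

# Gradients of the quantum circuit are computed automatically
for _ in range(50):
    params = opt.step(circuit, params)

print(circuit(params))
```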

From Simple Script To Beautiful Web Application With Streamlit - Episode 238

Building well designed and easy to use web applications requires a significant amount of knowledge and experience across a range of domains. This can act as an impediment to engineers who primarily work in so-called back-end technologies such as machine learning and systems administration. In this episode Adrien Treuille describes how the Streamlit framework empowers anyone who is comfortable writing Python scripts to create beautiful applications to share their work and make it accessible to their colleagues and customers. If you have ever struggled with hacking together a simple web application to make a useful script self-service then give this episode a listen and then go experiment with how Streamlit can level up your work.
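
To give a sense of how little code a Streamlit app requires (a hypothetical example, not from the episode), the script below turns an ordinary Python script into an interactive chart; save it as app.py and launch it with `streamlit run app.py`:

```python
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk explorer")

# Widgets return plain Python values, so the script reads top to bottom
n_points = st.slider("Number of points", min_value=10, max_value=1000, value=200)

df = pd.DataFrame({
    "step": np.arange(n_points),
    "value": np.random.randn(n_points).cumsum(),
})

# Streamlit re-runs the script on every interaction and redraws the outputs
st.line_chart(df.set_index("step"))
st.write(df.describe())
```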

Illustrating The Landscape And Applications Of Deep Learning - Episode 234

Deep learning is a phrase that is used more often as it continues to transform the standard approach to artificial intelligence and machine learning projects. Despite its ubiquity, it is often difficult to get a firm understanding of how it works and how it can be applied to a particular problem. In this episode Jon Krohn, author of Deep Learning Illustrated, shares the general concepts and useful applications of this technique, along with some of his practical experience in using it for his work. This is definitely a helpful episode for getting a better comprehension of the field of deep learning and when to reach for it in your own projects.

Exploratory Data Analysis Made Easy At The Command Line - Episode 230

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. VisiData is a command line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use VisiData for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then VisiData is the perfect addition to your toolbox.

Combining Python And SQL To Build A PyData Warehouse - Episode 227

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on their own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.
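
One common pattern from this pairing (sketched below under assumed table names and a hypothetical connection string, not taken from the episode) is to push the heavy aggregation into the warehouse with SQL and pull only the reduced result into pandas:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; any SQLAlchemy-compatible warehouse works the same way
engine = create_engine("postgresql://analyst:secret@warehouse.example.com/sales")

# The warehouse scans and aggregates the raw rows...
query = """
    SELECT region,
           date_trunc('month', sold_at) AS month,
           sum(amount)                  AS revenue
    FROM orders
    GROUP BY region, month
"""

# ...and pandas receives only the compact summary for further analysis or plotting
monthly = pd.read_sql(query, engine)
print(monthly.pivot(index="month", columns="region", values="revenue"))
```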

Build Your Own Knowledge Graph With Zincbase - Episode 223

Computers are excellent at following detailed instructions, but they have no capacity for understanding the information that they work with. Knowledge graphs are a way to approximate that capability by building connections between elements of data, allowing us to discover relationships among disparate information sources that were previously unknown. In our day-to-day work we encounter many instances of knowledge graphs, but building them has long been a difficult endeavor. In order to make this technology more accessible Tom Grek built Zincbase. In this episode he explains his motivations for starting the project, how he uses it in his daily work, and how you can use it to create your own knowledge engine and begin discovering new insights of your own.

Open Source Automated Machine Learning With MindsDB - Episode 218

Machine learning is growing in popularity and capability, but for the majority of people it is still a black box that they don’t fully understand. The team at MindsDB is working to change this state of affairs by creating an open source tool that is easy to use without a background in data science. By simplifying the training and use of neural networks, and making their logic explainable, they hope to bring AI capabilities to more people and organizations. In this interview George Hosu and Jorge Torres explain how MindsDB is built, how to use it for your own purposes, and how they view the current landscape of AI technologies. This is a great episode for anyone who is interested in experimenting with machine learning and artificial intelligence. Give it a listen and then try MindsDB for yourself.
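
For a flavor of the workflow, here is a minimal sketch based on the Predictor-style Python interface MindsDB offered around the time of this episode; the dataset, column names, and result format are assumptions, and current MindsDB releases expose a different interface:

```python
from mindsdb import Predictor

# Train a model directly from a CSV file (path and columns are hypothetical)
predictor = Predictor(name="home_rentals")
predictor.learn(from_data="home_rentals.csv", to_predict="rental_price")

# Ask for a prediction for a new row of inputs; MindsDB also reports
# confidence and explanatory information alongside the predicted value
result = predictor.predict(when={"number_of_rooms": 2, "sqft": 700})
print(result[0])
```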

Algorithmic Trading In Python Using Open Tools And Open Data - Episode 216

Algorithmic trading is a field that has grown in recent years due to the availability of cheap computing and platforms that grant access to historical financial data. QuantConnect is a business that has focused on community engagement and open data access to grant opportunities for learning and growth to their users. In this episode CEO Jared Broad and senior engineer Alex Catarino explain how they have built an open source engine for testing and running algorithmic trading strategies in multiple languages, the challenges of collecting and serving current and historical financial data, and how they provide training and opportunity to their community members. If you are curious about the financial industry and want to try it out for yourself then be sure to listen to this episode and experiment with the QuantConnect platform for free.
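
As a rough sketch of what a strategy looks like on the platform (assuming the LEAN engine supplies the QCAlgorithm base class and related types; exact imports vary by LEAN version), a simple buy-and-hold algorithm can be expressed in a few lines of Python:

```python
# On QuantConnect the LEAN engine provides these classes; in recent versions
# they are pulled in with this wildcard import.
from AlgorithmImports import *

class BuyAndHold(QCAlgorithm):
    def Initialize(self):
        # Backtest window, starting cash, and the single asset we trade
        self.SetStartDate(2018, 1, 1)
        self.SetEndDate(2019, 1, 1)
        self.SetCash(100000)
        self.AddEquity("SPY", Resolution.Daily)

    def OnData(self, data):
        # Put the whole portfolio into SPY on the first bar and hold it
        if not self.Portfolio.Invested:
            self.SetHoldings("SPY", 1.0)
```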
