Growing Dask To Make Scaling Python Data Science Easier At Coiled - Episode 275

Summary

Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask lifts a limitation for scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to make use and maintenance of Dask in production. The share the goals for the business, their approach to building a profitable company based on open source, and the difficulties they face while growing a new team during a global pandemic.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!


Datadog is a powerful, easy to use service for gaining comprehensive visibility into the state of your applications.

The easy to install Python agent lets you collect system metrics and log data, supports integrations with all of your services, and distributed tracing.

Their customizable dashboards and interactive graphs make finding and fixing performance issues fast and easy, and their machine-learning driven alerting ensures that you always know what is happening in your systems.

If you need even more detail about how your application is functioning you can track custom metrics, and their Application Performance Monitoring (APM) tools let you track the flow of requests through your stack.

Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.

 



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • This portion of Python Podcast is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host as usual is Tobias Macey and today I’m interviewing Matthew Rocklin and Hugo Bowne-Anderson about their work building a business around the Dask ecosystem at Coiled

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you give a quick overview of what Dask is and your motivations for creating it?
    • How has Dask changed or evolved in the past 3 1/2 years since we last talked about it?
  • How has the rest of the ecosystem changed in that time?
  • After working on Dask for the past few years, what led you to the decision to build a business around it?
  • What are the sharp edges of programming for Dask that users are looking for help on solving?
  • What are the difficulties that users face in deploying and maintaining a production installation of Dask?
  • What are the limitations of Dask when scaling both up and down?
  • What are you building at Coiled to improve the user experience for users of Python and Dask?
    • What are your thoughts on the pros and cons of orienting your messaging around the scalability of Python, as opposed to focusing on a specific industry or problem domain?
  • What are the challenges that you are facing in managing the tensions between the open source and proprietary work that you are doing?
  • How are you handling the ongoing governance of the Dask project?
  • What are some of the most interesting, unexpected, or challenging lessons that you have learned while building and launching a company based on an open source project?
  • What do you have planned for the future of both Coiled and Dask?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Liked it? Take a second to support Podcast.__init__ on Patreon!