Python has become one of the dominant languages for data science and data analysis. Wes McKinney has been working for a decade to make tools that are easy and powerful, starting with the creation of Pandas, and eventually leading to his current work on Apache Arrow. In this episode he discusses his motivation for this work, what he sees as the current challenges to be overcome, and his hopes for the future of the industry.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2020 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. And for tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or by email.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Check out the Practical AI podcast from our friends at Changelog Media to learn and stay up to date with what’s happening in AI
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with O’Reilly Media for the Strata conference in San Francisco on March 25th and the Artificial Intelligence conference in NYC on April 15th. Here in Boston, you still have time to grab a ticket to Enterprise Data World, starting on May 17th, and the Open Data Science Conference runs from April 30th to May 3rd. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Your host as usual is Tobias Macey, and today I’m interviewing Wes McKinney about his contributions to the Python community and his current projects to make data analytics easier for everyone.
- How did you get introduced to Python?
- You have spent a large portion of your career on building tools for data science and analytics in the Python ecosystem. What is your motivation for focusing on this problem domain?
- Having been an open source author and contributor for many years now, what are your current thoughts on paths to sustainability?
- What are some of the common challenges pertaining to data analysis that you have experienced in the various work environments and software projects that you have been involved in?
- What area(s) of data science and analytics do you find are not receiving the attention that they deserve?
- Recently there has been a lot of focus and excitement around the capabilities of neural networks and deep learning. In your experience, what are some of the shortcomings or blind spots to that class of approach that would be better served by other classes of solution?
- Your most recent work is focused on the Arrow project for improving interoperability across languages. What are some of the cases where a Python developer would want to incorporate capabilities from other runtimes?
- Do you think that we should be working to replicate some of those capabilities into the Python language and ecosystem, or is that wasted effort that would be better spent elsewhere?
- Now that Pandas has been in active use for over a decade and you have had the opportunity to get some space from it, what are your thoughts on its success?
- With the perspective that you have gained in that time, what would you do differently if you were starting over today?
- You are best known for being the creator of Pandas, but can you list some of the other achievements that you are most proud of?
- What projects are you most excited to be working on in the near to medium future?
- What are your grand ambitions for the future of the data science community, both in and outside of the Python ecosystem?
- Do you have any parting advice for active or aspiring data scientists, or resources that you would like to recommend?
Keep In Touch
- Ursa Labs
- AQR Capital Management
- Distributed Computing
- Duke University
- Chang She
- Open Source Governance
- Apache Software Foundation
- Paul Graham
- Schlep Blindness
- Big Data File Formats
- Apache Arrow
- Apache Impala
- R Language
- Pandas 2.0 Design Docs
- Apache Arrow and the 10 Things I Hate About Pandas
- Python For Data Analysis by Wes McKinney
- Two Sigma
- RStudio