As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
One of the frustrating things about being a developer is dealing with errors… (sigh)
- Relying on users to report errors
- Digging through log files trying to debug issues
- A million alerts flooding your inbox and ruining your day…
We have a special offer for Podcast.__init__ listeners. Go to rollbar.com/podcastinit, sign up, and get the Bootstrap Plan free for 90 days. That’s 300,000 errors tracked for free.
Loved by developers at awesome companies like Heroku, Twilio, Kayak, Instacart, Zendesk, Twitch and more. Help support Podcast.__init__ and give Rollbar a try today. Go to rollbar.com/podcastinit
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- You’ll want to make sure that your users don’t have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- To help other people find the show, you can leave a review on iTunes or Google Play Music and tell your friends and co-workers.
- Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
- Your host, as usual, is Tobias Macey, and today I’m interviewing Matthew Honnibal about SpaCy and Explosion.AI.
Interview with Matthew Honnibal
- How did you get introduced to Python?
- Can you start by sharing what SpaCy is and what problem you were trying to solve when you created it?
- Another project for natural language processing that has been part of the Python ecosystem for a number of years is the Natural Language Toolkit (NLTK). How does SpaCy differ from NLTK, and are there any cases where NLTK would be the better choice?
- How much knowledge of NLP and computational linguistics is necessary to be able to use SpaCy?
- What does the internal design and architecture of SpaCy look like and what are the biggest challenges associated with its development to date and into the future?
- One of the projects that you have built around SpaCy, which I think is really cool and which caught my attention when I first found your work, is the displaCy visualization tool. Can you explain what that is and why you think it is important?
- What are some kinds of applications where SpaCy would be useful which might not be obvious candidates for it?
- Why is speed such an important focus for an NLP library?
- One of the ways that you have been able to gain a speed boost is through releasing the GIL and allowing for true parallelism via Cython. How have you managed to ensure that this doesn’t lead to data races and program failures?
- Building on the success of SpaCy, you founded a company called Explosion AI. Can you explain what your goals are for this endeavor and the kinds of services that you are offering?
- What are some of the most interesting uses of SpaCy that you have seen?
- What do you have planned for the future of SpaCy?