Using computers to analyze text can produce useful and inspirational insights. However, when working with multiple languages, the capabilities of existing models are severely limited. To help overcome this limitation, Rami Al-Rfou built Polyglot. In this episode he explains his motivation for creating a natural language processing library with support for a vast array of languages, how it works, and how you can start using it for your own projects. He also discusses current research on multilingual text analytics, how he plans to improve Polyglot in the future, and how it fits into the Python ecosystem.
This episode of Podcast.__init__ is brought to you by Clubhouse, the first project management platform for software development that brings everyone together so that teams can focus on what matters – creating products their customers love. Clubhouse provides the perfect balance of simplicity and structure for better cross-functional collaboration. Its fast, intuitive interface makes it easy for people on any team to focus in on a specific task or project, while also being able to “zoom out” to see how that work is contributing towards the bigger picture. With a simple API and a robust set of integrations, Clubhouse also seamlessly integrates with the tools you use every day, getting out of your way so that you can deliver quality software on time.
Listeners of Podcast.__init__ can sign up for two free months of Clubhouse by visiting pythonpodcast.com/clubhouse.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email email@example.com
- To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host, as usual, is Tobias Macey, and today I’m interviewing Rami Al-Rfou about Polyglot, a natural language pipeline with support for an impressive number of languages.
- How did you get introduced to Python?
- Can you start by describing what Polyglot is and your reasons for starting the project?
- What types of use cases does Polyglot enable that would be impractical with something such as NLTK or spaCy?
- A majority of NLP libraries have a limited set of languages that they support. What is involved in adding support for a given language to a natural language tool?
- What is involved in adding a new language to Polyglot?
- Which families of languages are the most challenging to support?
- What types of operations are supported and how consistently are they supported across languages?
- How is Polyglot implemented?
- Is there any capacity for integrating Polyglot with other tools such as spaCy or Gensim?
- How much domain knowledge is required to be able to effectively use Polyglot within an application?
- What are some of the most interesting or unique uses of Polyglot that you have seen?
- What have been some of the most complex or challenging aspects of building Polyglot?
- What do you have planned for the future of Polyglot?
- What are some areas of NLP research that you are excited for?
Keep In Touch
- The Wizard and the Prophet: Two Remarkable Scientists and Their Dueling Visions to Shape Tomorrow’s World by Charles C. Mann
- NLP (Natural Language Processing)
- Stony Brook University
- Sentiment Analysis
- Assembly Language
- Stack Overflow
- Deep Learning
- Word Embedding
- NLTK (Python Natural Language Toolkit)
- Transfer Learning
- Read The Docs
- BERT (Bidirectional Encoder Representations from Transformers)
- Quilt package management for data