With the rising availability of computation in everyday devices, there has been a corresponding increase in the appetite for voice as the primary interface. To accommodate this desire we need high quality libraries that can process and generate audio data and make sense of human speech. To facilitate research and industry applications for speech data, Mirco Ravanelli and Peter Plantinga are building SpeechBrain. In this episode they explain how it works under the hood, the projects that they are using it for, and how you can get started with it today.
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Mirco Ravanelli and Peter Plantinga about SpeechBrain, an open-source and all-in-one speech toolkit powered by PyTorch.
- How did you get introduced to Python?
- Can you describe what SpeechBrain is and the story behind it?
- What are the goals and target use cases of the SpeechBrain project?
- What are some of the ways that processing audio with a focus on speech differs from more general audio processing?
- What are some of the other libraries/frameworks/services that are available to work with speech data and what are the unique capabilities that SpeechBrain offers?
- How is SpeechBrain implemented?
- What was your decision process for determining which framework to build on top of?
- What are some of the original ideas and assumptions that you had for SpeechBrain which have been changed or invalidated as you worked through implementing it?
- Can you talk through the workflow of using SpeechBrain?
- What would be involved in developing a system to automate transcription with speaker recognition and diarization?
- In the documentation it mentions that SpeechBrain is built to be used for research purposes. What are some of the kinds of research that it is being used for?
- What are some of the features or capabilities of SpeechBrain which might be non-obvious that you would like to highlight?
- What are the most interesting, innovative, or unexpected ways that you have seen SpeechBrain used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on SpeechBrain?
- When is SpeechBrain the wrong choice?
- What do you have planned for the future of SpeechBrain?
Keep In Touch
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Speech Processing
- Speech Enhancement
- Speech Recognition
- Sequence to Sequence (Seq2Seq)
- PyTorch Lightning
- Generative Adversarial Network