Delivering Deep Learning Powered Speech Recognition As A Service For Developers At AssemblyAI


August 3rd, 2021

52 mins 20 secs

Your Hosts

About this Episode


Building a software-as-a-service (SaaS) business is a fairly well-understood pattern at this point. When the core of the service is a set of machine learning products, however, it introduces a whole new set of challenges. In this episode Dylan Fox shares his experience building AssemblyAI as a reliable and affordable option for automatic speech recognition that caters to a developer audience. He discusses the machine learning development and deployment processes that his team relies on, the scalability and performance considerations that deep learning models introduce, and the user experience design that goes into building for a developer audience. This is a fascinating conversation about a unique cross-section of considerations and how Dylan and his team are building an impressive and useful service.


  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host, as usual, is Tobias Macey, and today I’m interviewing Dylan Fox about AssemblyAI, a powerful and easy-to-use speech recognition API designed for developers.


  • Introductions
  • How did you get introduced to Python?
  • Can you describe what AssemblyAI is and the story behind it?
  • Speech recognition is a service that is being added to every cloud platform, video service, and podcast product. What do you see as the motivating factors for the current growth in this industry?
    • How would you characterize your overall position in the market?
  • What are the core goals that you are focused on with AssemblyAI?
  • Can you describe the different ways that you are using Python across the company?
  • How is the AssemblyAI platform architected?
    • What are the complexities that you have to work around to maintain high uptime for an API powered by a deep learning model?
    • What are the scaling challenges that crop up, whether in training or in serving?
  • What are the axes for improvement for a speech recognition model?
    • How do you balance tradeoffs of speed and accuracy as you iterate on the model?
  • What is your process for managing the deep learning workflow?
  • How do you manage CI/CD for your deep learning models?
  • What are the open areas of research in speech recognition?
  • What are the most interesting, innovative, or unexpected ways that you have seen AssemblyAI used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on AssemblyAI?
  • When is AssemblyAI the wrong choice?
  • What do you have planned for the future of AssemblyAI?

Keep In Touch


Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at


The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA