Fast Stream Processing In Python Using Faust with Ask Solem

August 26th, 2018

28 mins 45 secs

About this Episode

Summary

The need to process unbounded and continually streaming sources of data has become increasingly common. One of the popular platforms for implementing this is Kafka along with its streams API. Unfortunately, the streams API requires all of your processing or microservice logic to be implemented in Java, so what’s a poor Python developer to do? If that developer is Ask Solem of Celery fame, then the answer is to help re-implement the streams API in Python. In this episode Ask describes how Faust got started, how it works under the covers, and how you can start using it today to process your fast-moving data in easy-to-understand Python code. He also discusses ways in which Faust might be able to replace your Celery workers, and all of the pieces that you can replace with your own plugins.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
  • To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Join the community in the new Zulip chat workspace at podcastinit.com/chat
  • Your host as usual is Tobias Macey, and today I’m interviewing Ask Solem about Faust, a library for building high performance, high throughput streaming systems in Python.

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Faust and what was your motivation for building it?
    • What were the initial project requirements that led you to use Kafka as the primary infrastructure component for Faust?
  • Can you describe the architecture for Faust and how it has changed from when you first started writing it?
    • What mechanism does Faust use for managing consensus and failover among instances that are working on the same stream partition?
  • What are some of the lessons that you learned while building Celery that were most useful to you when designing Faust?
  • What have you found to be the most common areas of confusion for people who are just starting to build an application on top of Faust?
  • What have been the most interesting/unexpected/difficult aspects of building and maintaining Faust?
  • What have you found to be the most challenging aspects of building streaming applications?
  • What was the reason for releasing Faust as an open source project rather than keeping it internal to Robinhood?
  • What would be involved in adding support for alternate queue or stream implementations?
  • What do you have planned for the future of Faust?

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA