A majority of the work that we do as programmers involves data manipulation in some manner. This can range from large scale collection, aggregation, and statistical analysis across distrbuted systems, or it can be as simple as making a graph in a spreadsheet. In the middle of that range is the general task of ETL (Extract, Transform, and Load) which has its own range of scale. In this episode Romain Dorgueil discusses his experiences building ETL systems and the problems that he routinely encountered that led him to creating Bonobo, a lightweight, easy to use toolkit for data processing in Python 3. He also explains how the system works under the hood, how you can use it for your projects, and what he has planned for the future.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $60 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
With GoCD’s comprehensive pipeline modeling, you can model complex workflows for multiple teams with ease. And GoCD’s Value Stream Map lets you track a change from commit to deploy at a glance.
GoCD’s real power is in the visibility it provides over your end-to-end workflow. So you get complete control of and visibility into your deployments, across multiple teams.
Say goodbye to deployment panic and hello to consistent, predictable deliveries.
To learn more about GoCD, visit gocd.org for a free download. Professional Support and enterprise add-ons, including disaster recovery, are available.
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
- If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email firstname.lastname@example.org)
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Romain Dorgueil about Bonobo, a data processing toolkit for modern Python
- How did you get introduced to Python?
- What is Bonobo and what was your motivation for creating it?
- What is the story behind the name?
- How does Bonobo differ from projects such as Luigi or Airflow?
[RD] After I explain why that’s totally different things, maybe a good follow up would be to ask about differences from other data streaming solutions, like Apache Beam or Spark.
- How is Bonobo implemented and how has its architecture evolved since you began working on it?
- What have been some of the most challenging aspects of building and maintaining Bonobo?
- What are some extensions that you would like to have but don’t have the time to implement?
- What are some of the most interesting or creative uses of Bonobo that you are aware of?
- What do you have planned for the future of Bonobo?
Keep In Touch
- Anaconda Installer
- DAG (Directed Acyclic Graph)
- Data Engineering Podcast
- Dask Interview
- IFTTT (If This Then That)