Gnocchi: A Scalable Time Series Database For Your Metrics with Julien Danjou - Episode 189

Summary

Do you know what your servers are doing? If you have a metrics system in place then the answer should be “yes”. One critical aspect of that platform is the time series database that allows you to store, aggregate, analyze, and query the various signals generated by your software and hardware. As the size and complexity of your systems scale, so does the volume of data that you need to manage, which can put a strain on your metrics stack. Julien Danjou built Gnocchi during his time on the OpenStack project to provide a time-oriented data store that would scale horizontally and still provide fast queries. In this episode he explains how the project got started, how it works, how it compares to the other options on the market, and how you can start using it today to get better visibility into your operations.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2019 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


This episode of Podcast.__init__ is brought to you by Clubhouse, the first project management platform for software development that brings everyone together so that teams can focus on what matters – creating products their customers love. Clubhouse provides the perfect balance of simplicity and structure for better cross-functional collaboration. Its fast, intuitive interface makes it easy for people on any team to focus in on a specific task or project, while also being able to “zoom out” to see how that work is contributing towards the bigger picture. With a simple API and robust set of integrations, Clubhouse also seamlessly integrates with the tools you use every day, getting out of your way so that you can deliver quality software on time.

Listeners of Podcast.__init__ can sign up for two free months of Clubhouse by visiting pythonpodcast.com/clubhouse.



Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Julien Danjou about Gnocchi, an open source time series database built to handle large volumes of system metrics

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Gnocchi is and how the project got started?
    • What was the motivation for moving Gnocchi out of the OpenStack organization and into its own top level project?
  • The time series database and metrics-as-a-service spaces are both fairly crowded. What are the unique features of Gnocchi that would lead someone to deploy it in place of other options?
    • What are some of the tools and platforms that are popular today which hadn’t yet gained visibility when you first began working on Gnocchi?
  • How is Gnocchi architected?
    • How has the design changed since you first started working on it?
    • What was the motivation for implementing it in Python and would you make the same choice today?
  • One of the interesting features of Gnocchi is its support of resource history. Can you describe how that operates and the types of use cases that it enables?
    • Does that factor into the multi-tenant architecture?
  • What are some of the drawbacks of pre-aggregating metrics as they are being written into the storage layer (e.g. loss of fidelity)?
    • Is it possible to maintain the raw measures after they are processed into aggregates?
  • One of the challenging aspects of building a scalable metrics platform is support for high-cardinality data. What sort of labelling and tagging of metrics and measures is available in Gnocchi?
  • For someone who wants to implement Gnocchi for their system metrics, what is involved in deploying, maintaining, and upgrading it?
    • What are the available integration points for extending and customizing Gnocchi?
  • Once metrics have been stored, aggregated, and indexed, what are the options for querying and analyzing the collected data?
  • When is Gnocchi the wrong choice?
  • What do you have planned for the future of Gnocchi?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA