Adding Observability To Your Python Applications With OpenTelemetry - Episode 268

Summary

Once you release an application into production it can be difficult to understand all of the ways that it is interacting with the systems that it integrates with. The OpenTelemetry project and its accompanying ecosystem of technologies aim to make observability of your systems more accessible. In this episode Austin Parker and Alex Boten explain how the correlation of tracing and metrics collection improves visibility of how your software is behaving, how you can use the Python SDK to automatically instrument your applications, and their vision for the future of observability as the OpenTelemetry standard gains broader adoption.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • Your host as usual is Tobias Macey and today I’m interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what OpenTelemetry is and some of the story behind it?
  • How do you define observability and in what ways is it separate from the "traditional" approach to monitoring?
  • What are the goals of the OpenTelemetry project?
  • For someone who wants to begin using OpenTelemetry clients in their Python application, what is the process of integrating it into their application?
  • How does the definition and adoption of a cross-language standard for telemetry data benefit the broader software community?
    • How do you avoid the trap of limiting the whole ecosystem to the lowest common denominator?
  • What types of information are you focused on collecting and analyzing to gain insights into the behavior of applications and systems?
    • What are some of the challenges that are commonly faced in interpreting the collected data?
  • With so many implementations of the specification, how are you addressing issues of feature parity?
  • For the Python SDK, how is it implemented?
    • What are some of the initial designs or assumptions that have had to be revised or reconsidered as it gains adoption?
  • What is your approach to integration with the broader ecosystem of tools and frameworks in the Python community?
  • What are some of the interesting or unexpected challenges that you have faced or lessons that you have learned while working on instrumentation of Python projects?
  • Once an application is instrumented, what are the options for delivering and storing the collected data?
  • What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with the OpenTelemetry ecosystem?
  • What are some of the most interesting, innovative, or unexpected ways that you have seen components in the OpenTelemetry ecosystem used?
  • When is OpenTelemetry the wrong choice?
  • What is in store for the future of the OpenTelemetry project?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Transcript
Tobias Macey
0:00:13
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, S3 compatible object storage, and worldwide data centers. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey, and today I'm interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications. So Austin, can you start by introducing yourself?
Austin Parker
0:01:13
Sure thing. Hi, I'm Austin Parker. I'm a principal developer advocate at Lightstep, and I'm a maintainer on the OpenTelemetry project.
Tobias Macey
0:01:21
And Alex, how about you?
Alex Boten
0:01:22
Yeah, sure. Hi, I'm Alex Boten. I'm an open source software engineer at Lightstep, and I'm a contributor to the OpenTelemetry project, as well as one of the maintainers on the OpenTelemetry Python project.
Tobias Macey
0:01:33
And so going back to you, Austin, do you remember how you first got introduced to Python?
Austin Parker
0:01:37
Oh,
0:01:38
actually, I think probably the first time I ever used Python was way back in college. I want to say it was, you know, an intro to programming type of thing. Before that I'd done a lot with just general scripting things, like AppleScript on Mac, and other forms of bash scripting and shell scripting. Python was kind of new to me, and I really enjoyed it. Actually, I still love to work in Python.
Tobias Macey
0:02:05
And Alex, how about you?
Alex Boten
0:02:06
Yeah. So I guess I remember the first time I heard about Python was people from the Java world complaining about the whitespace that Python required. But, you know, at the time I hadn't really used it. I was really introduced to it about six years ago, when I joined a team that was building a platform as a service, and the API and the CLI were all written in Python on top of Docker. And so basically I jumped in and learned what I needed to learn to help the team move forward, using Python 2.7 at the time, I think.
Tobias Macey
0:02:36
And in terms of the OpenTelemetry project, can you each give a bit of background as to how you each got involved with it, and maybe describe a bit about what the project is and some of the story behind it?
Austin Parker
0:02:47
Sure, I guess I'll start off. I actually came into OpenTelemetry from one of its predecessors, OpenTracing. The short version of the story is that, I want to say around 2016 or 2017, OpenTracing came out as an open source project, and the goal was really to provide a standard, unified API for distributed tracing across multiple different languages, so everything from Java to Python to C# and so forth. The underlying idea here was that distributed tracing was very useful, but often people would apply it in sort of a polyglot environment where you had some services running in Node and some in Python and some in Java, and for distributed tracing specifically to be useful, you really want it to be able to kind of go through the entire request, you know, nose to tail as it were. So you need some standard idioms, you need some standard — think about it as adjectives and verbs, right? You need these standard things between your different languages. So OpenTracing was designed to fill that hole in the ecosystem, and it was an API that had to be re-implemented by different vendors. So you'd have a Jaeger implementation or a Zipkin implementation, but your actual tracing code was independent, so I could use the same tracing code with any number of vendors. And it was great. Now in reality, maybe it didn't work out quite as well as we had hoped. Then 2018 or so rolls around, and I want to say that's when you saw OpenCensus kind of arrive as a competitor to OpenTracing in a lot of ways. OpenCensus was a Google project — Microsoft also joined in — and it had a similar sort of concept: hey, we want tracing, we want to be polyglot, but we want to provide both the API and the SDK, and then let you have, you know, a pluggable sort of export model. These two projects coexisted for a while, but it was causing confusion in the broader open source community. People weren't sure which to use. You know, I had open source library authors come and say, well, I have customers or I have users that want to use tracing, but which one of these two things should I pick? And ultimately, I think what happened really is everyone sort of looked at the situation and said, hey, this is definitely a time where we can be better together, right? We don't need these two competing standards. It's driving down adoption rather than driving it up. So let's kind of join forces, let's do a best-of-both-worlds approach. And out of that was born OpenTelemetry.
Tobias Macey
0:05:30
And Alex, how did you get involved in the OpenTelemetry project?
Alex Boten
0:05:34
Yeah, so I actually came at it from the other side. I was a user of both OpenCensus and OpenTracing, so I was right in the middle of all that confusion, maybe a year and a half or two years ago now. I was introduced to the OpenTelemetry project much like everybody else in the user community, around the announcement that was made about a year and a few months ago now, and I really joined the project as a contributor when I joined Lightstep about seven or eight months ago. That was my first real introduction to the project.
Tobias Macey
0:06:08
And so the tagline of OpenTelemetry is that it's trying to help make observability data easier to collect and access. And before we get too much further into the specifics of how it does that, I'm wondering if you can give a bit of a definition as to how you think about observability, and in what ways it's separate from the quote-unquote traditional approach to monitoring, where you're just collecting different metrics and shipping them off to some host to be aggregated there.
Austin Parker
0:06:36
Yeah, that's such a great question. I think if you kind of look around, there's a few different definitions of observability, but I like to talk about it and think about it in a pretty simple way, which is that observability is about understanding. It's about having the ability to understand your system and your system's dependencies. It's about being able to understand not only sort of the aggregate behavior of a system — you know, in production, not isolated and not in some sort of synthetic way — but actually to look at traffic as it goes through your system, to look down at the individual request level if you need to, to figure out, you know, what is wrong with this one request, but also to be able to pull back to that 30,000 foot view and understand the entire thing, how all the pieces fit together. And having both that very coarse and fine visibility into what's going on means that you can do a lot of interesting things that you can't do with traditional monitoring. There's the sort of common, you know, unknown unknowns theory, right? Where, when you're building a system, or even when you're running one, it's not the things that you know about, it's not the metrics that maybe you're collecting and that you think you care about at the beginning, and it's not the things that you've started to discover. The things that are going to bite you are the things that you don't even know about, that you should know about, right? So the whole principle of observability is that it kind of gets down to this instrumentation level, which is where OpenTelemetry really plays: you need to have, you know, an SDK that can help you effortlessly collect a lot of different data points and has a pretty deep integration into your underlying frameworks and tools, and then ship that off somewhere, to a system that is capable of asking these sort of arbitrary questions.
Alex Boten
0:08:22
I think, Austin, you hit the nail on the head there. You know, observability is really about understanding. When I think of traditional monitoring, it's always been about putting graphs on a dashboard and kind of watching for, you know, something to change on those graphs. And although those graphs and dashboards definitely play a role in observability, they tend to only really give you part of that picture. And too often they're kind of a result or an afterthought — you know, like, we had an outage with a metric that wasn't on our dashboard, well, we'd better, you know, go ahead and add it. But when I think of observability, I think of being able to really dig to the bottom of the behavior of a system while code is actually running that system, and being able to answer questions like, you know, is the code doing what we expect it to? If users are hitting an issue, will we be able to detect it? If an anomaly occurs, is it possible for me to dig into the cause of that anomaly right then and there, or do I just have to wait for that anomaly to occur again? And I think observability is changing how software is being built, by thinking about observing a system up front rather than treating it as an afterthought.
Tobias Macey
0:09:25
And as you mentioned, the need to retroactively go in and add new metrics collection points to try and understand what happened in an anomalous situation, rather than being able to automatically capture the necessary relevant context information, is the real game changer there. And I'm wondering what you see as being some of the biggest challenges in enabling people to capture that necessary information, and to do it in a cross-cutting way where it goes beyond just the specifics of a single log line, or a particular counter that you're incrementing when a particular function happens, or certain timers that are being collected. And also, what are some of the overall goals of the OpenTelemetry project in how it's going to help people achieve that sort of holy grail of observability?
Austin Parker
0:10:12
Sure, you know, I think you raise an interesting point there, and I'd kind of turn it around: why do people not do this already? Right? There's a pretty pervasive view, I think, in the developer community that things like distributed tracing, things like observability, are only for, like, big companies — you know, your Googles and your Facebooks and your Microsofts, where you have just some eye-popping amount of services and dependent services and different line-of-business things and, you know, unmanageable complexity, right? And I think that that is really an opinion that is kind of brought forth by an unwillingness to grapple with the goals of our existing tools, because far too often, no matter how many fancy dashboards you make, no matter how many, you know, cool metric data points you put together — I think Alex had the right of it — a lot of times, it's just like, we're making this dashboard to make this dashboard, because this dashboard is sort of proof of life. This dashboard is the thing we can point to when someone says, well, is it up? So I think one of the goals of OpenTelemetry is to really help change this narrative, and I think we do that by using the fact that OpenTelemetry is so widely supported. It has extremely broad support from many, many organizations — people you've heard of, like Microsoft and Google and Amazon, a huge variety of, you know, monitoring and observability tool vendors, and also open source maintainers, people that are creating things like Jaeger or Prometheus, right? So because it has this broad support, that means that we can sort of push the point of integration away from the individual dev and maybe go down a step or two. You know, I think it's a pretty common thought that when you're trying to monitor an abstraction, or you're trying to observe an abstraction, you really want to go one step below it. So if I'm trying to understand the behavior of my application, you know, a good way to monitor that is to go to the application container, right, to go to the runtime and look at certain metrics and measurements there. With OpenTelemetry, you can have a similar sort of behavior, where I would say a goal is to integrate it into things like Kubernetes, things like Spring on the Java side, or Flask on the Python side. So you're actually getting a more holistic view of what's going on very, very easily, without having to spend a bunch of time writing manual instrumentation code.
Tobias Macey
0:12:42
And in terms of the actual process of instrumentation, I know that one of the use cases for the SDK is to automate some of that setup and to be able, out of the box, to start collecting useful information without requiring a lot of development effort up front, while still providing the option of adding additional collection points for different metrics and traces. And I'm wondering what the developer workflow looks like for actually getting set up with collecting those metrics and then being able to perform some useful analysis on them.
Alex Boten
0:13:16
Yeah, so I guess there's kind of two ways of thinking about instrumenting an application, at least on the Python side, and I know it's true in a lot of the other languages as well. There's what we call the auto-instrumentation portion of instrumenting, which in the Python world means that we're basically looking at what libraries are going to be utilized by a certain application. And so if you go ahead and install the auto-instrumentation package, as well as some of the other instrumentation libraries that we already support, what ends up happening is you run a separate script to wrap your Python executable, which is just called opentelemetry-instrument, and it will basically go ahead and instrument all the libraries that you're using that have support for OpenTelemetry — with spans and metrics for those libraries — for you. So that's the auto-instrumentation piece. The manual instrumentation piece would require application developers to go ahead and set up their SDKs and start spans or start collecting metrics in the different portions of their application that they're interested in. But a big piece of what we're trying to do here is make sure that the correlation between the manual and the auto-instrumentation all works, and that all of the context flows through your application and also flows between the services along the wire.
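As a rough sketch of those two workflows Alex describes — the exact package names and CLI flags have shifted between releases, so treat this as illustrative rather than canonical:

    # Auto-instrumentation: no code changes, just wrap the normal entry point.
    # (Shell commands shown as comments; package names are assumptions.)
    #
    #   pip install opentelemetry-distro opentelemetry-instrumentation-flask
    #   opentelemetry-instrument python app.py
    #
    # Manual instrumentation: acquire a tracer and create spans explicitly.
    from opentelemetry import trace

    tracer = trace.get_tracer(__name__)

    def handle_order(order_id):
        # start_as_current_span makes this span the active one, so spans created
        # by auto-instrumented libraries (HTTP clients, DB drivers, etc.) become
        # its children and share the same trace context.
        with tracer.start_as_current_span("handle_order") as span:
            span.set_attribute("order.id", order_id)
            # ... business logic ...

The correlation Alex mentions comes from both styles writing into the same context: hand-written spans and library-generated spans end up stitched into one trace.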
Tobias Macey
0:14:36
And I think that that context aspect is the real differentiating factor between just raw metrics — here is an integer that represents something, but you need to have enough internal context or awareness of the code base to understand what it means — versus being able to have that context propagate along with the metric, so that it's a little bit easier for somebody who doesn't know the system to understand what's actually happening.
Alex Boten
0:15:03
Yeah, absolutely. I think context is absolutely critical here. Otherwise, all of that data is really just data that's floating in the air.
Tobias Macey
0:15:12
And stepping back up to the level of the broader OpenTelemetry project and its mission: because it is focused on support across a number of different language communities and runtimes, and the specifics of libraries and frameworks, I'm wondering how that benefits the overall broader software community, and what some of the challenges are that you face in avoiding the trap of limiting the entire capability of the ecosystem to the lowest common denominator that exists across those different runtimes.
Austin Parker
0:15:44
I can speak to this a little bit. I think one of the things that, as an OpenTracing maintainer, we certainly heard from the community was that, for lack of a better word, people were, let's say, displeased with the quote-unquote Java-ness of the OpenTracing API, right? And from the jump with OpenTelemetry, there's been a pretty concentrated push to make the look and feel of OpenTelemetry very native in each language. So that really influenced everything from the spec writing — there's been a pretty lengthy project, a kind of cross-language compatibility SIG, that is mostly looking at, like, what are the words we're using in the specification? Are we making sure that we have all the words we need in the spec to allow individual languages to implement the spec in a way that feels native for that language? There's also a bit of hewing to whatever the dominant patterns are, right? So I think if you look outside of the real, raw, primitive things like starting and ending a span, adding attributes or events to a span, a lot of the stuff around metrics — you know, a lot of those up-down counters and things like that — outside of those very primitive sorts of actions, there's a lot more flexibility in how that gets integrated into the native workflow. A good example I would probably point to is what's going on in the C# SIG. And I know we've used that word a lot; for people that don't know, SIG means special interest group. That's basically the primary way we're organizing the project, around these languages or topics as SIGs. So in the C# SIG, Microsoft is very heavily involved, and there's actually a push to integrate OpenTelemetry into the .NET runtime itself and bridge it with sort of the existing, you know, diagnostic information that is already available in .NET. And so you're seeing kind of this two-pronged approach where you can use existing primitives on the .NET side of the shop that you might already be using or already familiar with as a .NET developer, and then with a simple bit of configuration change, those will hook into OpenTelemetry and start emitting OpenTelemetry spans and OpenTelemetry metrics that can then be forwarded to some other part of the OpenTelemetry ecosystem, like the collector component, in order to be exported elsewhere. I think with Python you see something somewhat similar, where there are some idioms that, you know, they tried to make pretty Pythonic; I would say Go is very much like Go. A lot of it is making sure that each SIG has the ability to change things as they need to, along with a pretty strong top-level community that can make sure that, hey, we're all on the same page, no one's drifted too far from spec or is reinventing the wheel over here in a way that they shouldn't.
Alex Boten
0:18:41
Yeah, and if I can just add a little bit to that: I think the specification is written in a way that's specific enough around the intent of a particular definition, while leaving enough room for languages to interpret it in a way that makes sense in that language. And another thing we've seen that has been pretty successful is around implementation of proposals, what we call OTEPs, which are OpenTelemetry Enhancement Proposals. That's basically just the process that we go through before making changes to the spec, where you're able to propose a change that you want. And one thing that we've seen work well is to actually have different SIGs implement an OTEP as a prototype, just to get an idea of whether or not a particular concept works in each of the languages that we care about. And so that's kind of implemented at the SIG level, by folks that are working in the language itself.
Austin Parker
0:19:35
Yeah, I think the OTEP process has been very helpful for the project, and just sort of the requirement to show your work is something that maybe a lot of open source projects could look at. You know, I'm not going to say we came up with it ourselves — Isobel Redelmeier was actually a very strong proponent of the OTEP process and helped kind of codify it originally, and I believe she talked to people that were very involved with creating the KEPs, right, the Kubernetes enhancement protocols or proposals — I don't know what the P stands for there. But that's where a lot of the inspiration was drawn from, I want to say.
Tobias Macey
0:20:15
Yeah, there are a number of different communities that have gone down that path. Python has its PEPs, I know Django has their own process for that, and there are a number of other open source communities following along with that. So it's definitely a good way to bring everybody into the conversation and ensure that you have as diverse a set of inputs as possible, to make sure that you're not just getting tunnel vision on the way one thing should be implemented, and that you're bringing in the voices that are necessary to make sure that it works for the broader community. And then as far as the actual data that you're collecting for being able to gain some visibility into the systems: we've talked about metrics, we've talked about spans, and I know that there's initial support for log collection and some of the ways that those should be formatted. So I'm curious if we can dig a bit more into the specifics of the information that's collected, and then, maybe in terms of the Python SDK, some of the available hooks into the runtime to be able to pull that information out and propagate it appropriately.
Alex Boten
0:21:16
So basically, the types of information we're looking at collecting with the initial release of the project are really around the implementation and collection of distributed traces and metrics. That's kind of the initial goal. And basically, I mean, metrics could be anything from, you know, your memory consumption, CPU, or request timing, or whatever it is that you care about in your application. And then on the distributed traces side, what we're worried about there is collecting traces and spans that are distributed across your different services.
Austin Parker
0:21:55
One of the things I think is really helpful when you think about distributed traces is that in a lot of ways, they're really just, semantically, structured logs, right? They're structured logs with context. And one of the things that we've done in OpenTelemetry is we've tried to sort of extend the span model a little bit and add a lot more semantic meaning to different fields, which is not something that's going to revolutionize your life tomorrow. But a couple of years from now — as a small example, the status field on a span in OpenTelemetry supports the gRPC status codes, right? So there's a lot of ways to actually classify the work that happened under a span in a semantically meaningful way: the difference between, for example, this failed due to a timeout versus this failed, you know, the difference between a 400 or a 404 or a 405, but also things like, you know, the context for this request was canceled. So as analysis systems catch up to OpenTelemetry and start to implement the ability to use this data to derive interesting statistical information about your system, I think you'll see a lot of interesting innovation happen in terms of building tools that can really understand what's going on. Because one of the sort of last-mile problems exists for OpenTelemetry as it does with most monitoring and observability things: you have all this data, great, but you still have to have someone that actually understands what the data represents in order to make heads or tails of it. But by focusing a lot on having these semantically accurate spans, I think we'll be able to build better analysis tools in the future.
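A hedged sketch of what that semantic status looks like in code, assuming a recent opentelemetry-api (the status classes have changed since the beta discussed here, which leaned on gRPC-style canonical codes):

    from opentelemetry import trace
    from opentelemetry.trace.status import Status, StatusCode

    tracer = trace.get_tracer(__name__)

    def call_downstream(client, url):
        with tracer.start_as_current_span("call_downstream") as span:
            span.set_attribute("http.url", url)
            try:
                return client.get(url)
            except TimeoutError as exc:
                # Record *why* it failed, not just that it failed: the exception
                # becomes a span event and the status carries a description, so
                # an analysis backend can separate timeouts from, say, 404s.
                span.record_exception(exc)
                span.set_status(Status(StatusCode.ERROR, "downstream timeout"))
                raise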
Alex Boten
0:23:47
To get back to the question specifically around Python: for the OpenTelemetry Python project, what we're really focusing on is just wrapping up the work around implementing the spec around metrics, and today the implementation for tracing is already in beta, and we're already, I think, pretty close to being done. So between the tracing implementation and the metrics implementation, those are kind of the interfaces that you'll want to use from Python.
Tobias Macey
0:24:16
And I know, too, that there is some initial work on being able to standardize the formatting of log data so that it can be correlated with the spans and metrics that are being collected. I'm wondering how you are approaching that in the Python ecosystem — whether you're just working on using the built-in logging capabilities, or leaning on something like the structlog project to provide that structure out of the box and just define the specifics of it for the tracing capabilities, or what your thoughts are in that regard.
Alex Boten
0:24:48
Yeah, so the logging capability is fairly new in the conversation, and we haven't, as a project, started looking at how we're going to implement it in the OpenTelemetry Python project yet. But I would suspect that we would want to lean on, you know, as much of the standard library as we can, wherever we can.
Austin Parker
0:25:11
To sort of add some more color on the logging question: I know logging is incubating, it's still in a very fluid process right now, but I think it's accurate to say that there's really no appetite in the project to create a new logging API, right? There's a plethora of battle-hardened, tested, and very good logging libraries out there, and I don't really think we're interested in competing with them. At a really high level, I think what you said earlier was maybe the best way to describe it: OpenTelemetry has, you know, the semantic concept of linking different forms of telemetry together. So you could link a measurement from a metric to the span that was active while that measurement was being collected, for example, and I feel like some sort of logging adapter that allows those logs to be context sensitive would be one potential implementation result here. I think another might be — you could even imagine something where, if you have, you know, logs that are already in a file format and they have some sort of correlation information, or they're getting correlation information through OpenTelemetry auto-instrumentation that's going in and editing your logs, adding in the correlation identifiers — then a Filebeat-style processor could scrape those and send them off somewhere, adding in the link through, like, the OpenTelemetry Collector, for example. But this is where the thinking is, more so than the reality. I think the logging stuff is probably — you know, I don't know if it's going to be beta by the time the rest of the project is in GA; I would maybe expect it to be more of an alpha.
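Since no official logging API existed at the time, here's one possible do-it-yourself sketch of the correlation idea Austin describes: stamping stdlib log records with the active trace and span IDs so a log backend can join them to traces. The accessor names assume a recent opentelemetry-api and may differ in older releases.

    import logging
    from opentelemetry import trace

    class TraceContextFilter(logging.Filter):
        """Attach the current trace/span IDs to every log record."""
        def filter(self, record):
            ctx = trace.get_current_span().get_span_context()
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
            return True

    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s trace=%(trace_id)s span=%(span_id)s %(message)s"))

    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.addFilter(TraceContextFilter())
    logger.setLevel(logging.INFO)

    logger.info("order accepted")  # carries trace/span IDs when a span is active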
Tobias Macey
0:27:21
And then, to be able to ensure that you can use OpenTelemetry across different languages and across different service boundaries, how are you addressing the question of feature parity between the different SDKs, and how do you signal the feature completeness of a given implementation so that people can effectively evaluate them as they're building out and designing their systems and trying to gain useful observability metrics?
Austin Parker
0:27:48
Yeah, that's a good question. I can talk about it at a high level. At a high level, you can think of there being three big moving parts: there's the API, there's the SDK, and then there's OTLP, which is the wire format for actually representing the trace and metric data. So for the API, once we have an API that's at, like, a 1.0 — and I would even suggest that right now the API is pretty close to 1.0 — then that API will most likely change very slowly. The SDK, you would expect, can move a little more quickly under that, but I would also suggest that OTLP, the wire protocol, will probably change slowly as well. One of the goals of OpenTelemetry was to make sure the API and SDK were decoupled, so as long as you're using the same API level on both sides, you could actually swap out individual components if you need to, assuming that you aren't breaking the API. And since you can independently upgrade the SDK components, with OTLP as sort of the fallback, the default way to export data, you can kind of do whatever you want in the middle, and the middle can move very quickly. But it needs to start out in a pretty slow-moving way and it needs to end in a pretty slow-moving way. So you could definitely see extensions or little side projects popping up that add some valuable feature, but that would need to be marshaled into working through the existing API and then being exported through the existing OTLP format. To get back to the issue around feature parity, I think the process we've taken to address
Alex Boten
0:29:42
feature parity across different languages is to basically create tracking issues for any spec changes and address them as they come in on a SIG-by-SIG basis, so for each language. For example, before the beta was released, there was a list put together by the technical committee for OpenTelemetry of all the features that we knew we needed to implement for the beta to be complete. And so basically it was up to each SIG to then go off and ensure that the list was implemented within their own language, and anything that wasn't implemented, we would just go ahead and create an issue for and track it that way.
Austin Parker
0:30:17
For the most part, it's actually maybe less of a huge concern, because the real basic primitives, things like context propagation — we're defaulting to the W3C Trace Context specification, and I know there's a bunch of other stuff sort of upcoming from Trace Context — the primitives are really well defined at this point. And as long as you're using the same kind of propagation everywhere, then for the individual hops — let's say I had five different services — as long as they're all using the same context propagation, I'll get an unbroken trace, right? I'll get an unbroken set of tags in my metrics, because that's all being sent around in a way everyone understands. And there should be, you know, reasonable fallbacks. So if something new comes in, it goes into an additional field, and if service B has been updated and understands it and the rest of them haven't, then you would obviously see, oh, maybe something new is happening in service B that isn't happening everywhere else. But the basic functionality of tracing hasn't broken for you just because you updated.
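To make that concrete, a small sketch of what propagation puts on the wire, assuming the default W3C Trace Context propagator in a recent Python SDK (the propagation module has moved between releases, so the import path is an assumption):

    from opentelemetry import trace
    from opentelemetry.propagate import inject

    tracer = trace.get_tracer(__name__)

    with tracer.start_as_current_span("call-service-b"):
        headers = {}
        # Injects something like:
        #   traceparent: 00-<32-hex trace id>-<16-hex span id>-01
        # Any downstream service using the same propagator continues the trace.
        inject(headers)
        # requests.get("http://service-b.internal/api", headers=headers)
        print(headers)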
Tobias Macey
0:31:30
And digging more into the specifics of Python and the SDK implementation of it, can you discuss a bit about how that is actually implemented, and some of the initial design decisions or assumptions that were made early on that had to be revised or reconsidered as you got further along in the implementation of it and the overall adoption of its use?
Alex Boten
0:31:53
Yeah, I guess I can't think of too many assumptions that have changed since we started seeing adoption. I think we might still be a little bit too early in the project, or any assumptions that were made and changed might predate my time on the project. But I guess one of the more recent changes that occurred was around the propagation of context, where initially we didn't have what we call the context API in OpenTelemetry, and so we did have to go back and find a way to separate out that context API from the implementation that we were using, and that was a fair amount of work. But I can't think of other ways that we really had to rework a bunch of the assumptions that were made originally.
Austin Parker
0:32:47
Actually, I can't either. Admittedly, most of my work is at the community level broadly, so I don't necessarily know about all the individual SIGs, but Python has always felt like it's pretty much on the rails. I think some of that is actually just a credit to the people in the SIG, obviously, but also, at a pretty basic level, Python just has all the features you really need. Here's a good example: in-process context propagation. So if I have a process, and then I have a span, and then I have a function that I want to trace independently, or I want to create a span for independently, and I'm using multi-threading of some sort, then I need something to sort of marshal, okay, what's the active span at any given point? And Python really just has all the tools you need to handle that out of the box. You don't have to deal with manually passing stuff around, and you don't have the sort of wonky support story you have around that in, like, JavaScript, where since there are no threads, there are different ways to kind of do this. So for the most part, Python has just chugged along very merrily while other SIGs were trying to glom everything together. And I think it's also partially just that Python's pretty popular, you know. OpenTracing and OpenCensus both had very good Python support, and so there hasn't really been a lot of "we need to invent this thing." It's more like, well, we've done this in the past, so we'll do it this way, but we'll make it more performant or whatever.
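The standard-library machinery Austin is alluding to is essentially contextvars. This is an illustrative sketch of the pattern, not the OpenTelemetry implementation itself: a context variable carries the "current span" so deeply nested code never has to take it as an argument.

    import contextvars

    _current_span = contextvars.ContextVar("current_span", default=None)

    class Span:
        """Toy span that registers itself as the active span while it is open."""
        def __init__(self, name):
            self.name = name

        def __enter__(self):
            self._token = _current_span.set(self)
            return self

        def __exit__(self, *exc):
            _current_span.reset(self._token)

    def deep_in_the_call_stack():
        # Nothing was passed in explicitly; the context variable carries the span.
        active = _current_span.get()
        print("active span:", active.name if active else None)

    with Span("handle-request"):
        deep_in_the_call_stack()  # prints: active span: handle-request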
Tobias Macey
0:34:25
And then, for being able to gain compatibility and visibility into the broader ecosystem, I know that there are specific libraries for things like Django or Flask to auto-instrument the peculiarities of those frameworks. But what is the overall strategy or approach for being able to gain broader adoption within the overall ecosystem of web or data or, you know, just networked systems, things like that?
Alex Boten
0:34:54
That's a really good question. So, you know, we started out by providing instrumentation for libraries that we know are fairly popular frameworks — like you mentioned, Django, Flask, requests. And what we started seeing is that there's interest from contributors and maintainers of other frameworks to build instrumentation libraries that are compatible with OpenTelemetry for their own projects and add to the list, which is actually really exciting, because those folks know the tools and libraries inside and out. And as far as a scaling strategy goes, it's a lot more scalable to have folks that are involved in those projects directly contributing instrumentation than to have someone from the OpenTelemetry project go out and, you know, learn everything there is to know about any particular library. So I think, hopefully, that's where we're going to be driving as OpenTelemetry gains popularity. And, you know, one thing that we've seen, for example in the .NET world, which I think Austin alluded to earlier, is that there are plans to adopt OpenTelemetry in the runtime itself, which, you know, is kind of the best possible solution for adoption of any particular standard.
Austin Parker
0:36:08
Yeah, I think as a strategy, really our goal as a project is to provide, you know, stable interfaces, good quality SDK design, and a support plan, right? Like, we need to be good shepherds and stewards of what we've built and give people the confidence that they can go and do these integrations, and that we will be able to support them through that by, you know, not boiling the sea every two months. I think beyond that, into this broader ecosystem like you said, it's approaching and thinking about how we integrate this into more services — you know, things like managed resources, things like cloud providers and their SDKs — or making this available through APIs. You know, let's say you use a database API or a GraphQL library to query something; maybe the person that serves that API could also have OpenTelemetry traces and then be able to independently return those to you, so that you could actually inspect the performance of your request as it goes across the wire into some other system. And then the same thing with, like, maybe you're running your own database, right? Is there a way that we can get OpenTelemetry into MySQL or into Postgres or into Mongo? But I think that goes back to — you know, the way that will happen is by running the project well and having it be stable, having good release management and, you know, making it something other people can build on.
Alex Boten
0:37:57
And just one thing to throw out there: I think it's also important to make it as easy as possible to use the libraries and the APIs for anyone who wants to provide that instrumentation, so that they don't have to spend, you know, a tremendous amount of time just learning how to use a particular tool to be able to provide the benefits of that tool to their users. And so I think one of the things that we're spending some time on in the Python SIG is providing, like, an interface for anyone who wants to provide auto-instrumentation for their library, through, like, a base instrumentor class, and also providing, you know, examples of how some of the other instrumentations have gone through and actually been implemented.
Tobias Macey
0:38:42
And then once you have instrumented your application, you're generating all of these data objects and context for being able to understand what's happening in your system, you still need to have somewhere to send it to and perform analysis on it. So what are some of the existing options for that? And what are some of the ones that are upcoming that you are keeping an eye on?
Austin Parker
0:39:04
Yeah, that's a great question. So one of the things that I think OpenTelemetry really dramatically simplifies is the deployment of telemetry, through tools like the OpenTelemetry Collector and through, you know, kind of a native format like OTLP. So what we're seeing is more organizations starting to adopt the OpenTelemetry protocol as kind of their native ingest format. I know at Lightstep — I actually don't know if we've publicly announced it yet, but it's there, so I'll talk about it — we'll be accepting OTLP-formatted data to our SaaS backend very soon. The work is done, and I think it's actually in public right now, but we haven't updated docs yet. But in the bigger picture, I think having the OpenTelemetry Collector be this vendor-neutral way to, you know, aggregate trace and metric data from multiple different sources and then export them to a variety of backend systems — some of which will be open source; Jaeger has already come out and kind of pledged adoption, or said, hey, we're going to switch over to using the OpenTelemetry Collector instead of our own — I would expect other open source projects and analysis tools to start following. You also have the ability to write your own exporter, and we've seen a lot of adoption in that. So companies like Google, Microsoft, Splunk, Honeycomb — there are probably a few I'm forgetting — Datadog, I don't know, have they done an exporter yet? Yep, there's an exporter. But either way, the collector makes it very easy for you to really have a separation of concerns between the people that are maybe integrating OpenTelemetry into their code base and the people that are responsible for collecting all that data and sending it somewhere. You know, instead of having to redeploy your application for a config change, you can redeploy your collector and say, okay, I want to send this to some other place now. So that's very cool. And then because the collector is also completely open source, you can transform that data however you like. So one of the things that I'm kind of keeping an eye on and have been talking about is, hey, you could convert your traces to analytics events, right? And then you could send them to an analytics provider, or you can turn them into JSON and do whatever with them. You can put them in a big data thing and do all sorts of fun queries. I think it's going to enable a lot of interesting tools that we haven't seen yet, things like using traces as part of, you know, automated testing to validate application behavior or logic flows. You can sort of imagine a situation where maybe in test I have a ton of instrumentation about what's going on in my application, and maybe I turn some of that off in production or whatever, but being able to take all that out as a dump and then just do a diff against the last time you ran it to see what's changed — like, am I calling some new external API? Did I pick up a new service dependency or whatnot? That sort of stuff, I think, is really cool.
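A hedged sketch of the application side of that separation of concerns: the SDK exports OTLP to a local collector, and everything about where the data ultimately lands stays in the collector's configuration rather than in application code. The exporter package, module path, and endpoint here are assumptions based on a recent opentelemetry-python release, not the beta discussed in the episode.

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

    # Ship spans over OTLP/gRPC to a collector running alongside the app.
    provider = TracerProvider()
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("startup-check"):
        pass  # routing to Jaeger, a SaaS backend, etc. is a collector config concern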
Tobias Macey
0:42:28
And then, in terms of your experience of working with the OpenTelemetry community and working on the specification and some of the SDK implementations, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
Alex Boten
0:42:42
I think, for me, on the challenging side, it's just how hard it is to get collecting telemetry right, in a way that's both easy to use and useful to people. I think that's been really challenging. And I think the most pleasant surprise by far has been just watching so many different people from so many different organizations working together and trying to solve these really hard problems. It's been a really, really great experience. So if anybody's out there looking for an open source project to join, I think this is a great place to start.
Austin Parker
0:43:13
Yeah, I'll echo most of what Alex said there — all of it, really. But I think there's maybe a perception issue, I guess, in the open source community more broadly, that, you know, quote-unquote vendor projects aren't good projects, or that they're not reflective of actual community values. And I think in the case of OpenTelemetry, I would suggest this is maybe the counterexample, where, yes, this is a very, you know, vendor-y project. Most people working on it are working on it out of, I mean, their passion for the subject, but they're also working for companies that are invested in it, right? That said, for any kind of project that is so impactful to the bottom lines of these companies, it's been fascinating to see how much people are just coming together for the good of the project. There hasn't really been a lot of bad blood. There's not a lot of side-taking, like, oh, well, we think this way because our employer thinks this way. You know, people are genuinely engaging in really good faith, and I think it's brought what is normally a pretty interesting industry together. It's cool to see a bunch of people say, hey, we work for different places, and our companies are competitors, but, you know, in here we're all cool, we're all friends, we're all trying to solve this very tricky problem in a really good way.
Tobias Macey
0:44:39
Yeah. And I really like the general industry trend that I've been seeing in a lot of different areas of technology of trying to standardize on the interfaces that are used for interoperability, so that the different specific implementations of open source or vendor technologies can innovate on the specifics of how they operate and the capabilities that they provide, rather than trying to lock people into their solution because they are not compatible with the import or export options of the other systems that you might want to compose together with them.
Alex Boten
0:45:15
100%.
0:45:16
Now, there's nothing worse than having to learn a whole new set of tooling and languages just to, like, make something work with a particular vendor.
Tobias Macey
0:45:23
And then, in terms of the OpenTelemetry project itself, what are the cases where it's the wrong choice, and somebody might be better suited just using a standard approach to metrics and logging for being able to gain understanding into their system?
Alex Boten
0:45:39
That's a tough one. Is "never" an option?
Tobias Macey
0:45:42
That's definitely an option.
Austin Parker
0:45:44
I mean, I think it's an interesting question, right? Because, yeah, there's obviously times it's the wrong choice, but that's less of a technical consideration than it is sort of a cultural one. I think it's broadly applicable, and it's generally worthwhile to pursue OpenTelemetry, especially if you are in an organization that can kind of handle it and adapt to it. But I think there are wide classes of applications that maybe don't benefit from it. Like, for example, tracing adds overhead — not a ton of overhead, but it does — and tracing on the front end especially, where we are right now, will increase the amount of JavaScript your browser has to download. That can be a problem for people, and if it's a disqualifying problem for you, then okay, use something else, right? We'll try to fix it, we'll try to make it better, but I'm not out here saying, oh, it's perfect, and if you don't use it, you're an idiot. I think for a lot of maybe embedded devices or, you know, things like that, you're probably not going to get the mileage. If you don't have a distributed system, it may be a little less useful for you. That said, 80% of the time it's probably, if not the right answer, at least a good answer. And the one thing I would say, if you're listening to this, is that the time it's always the right answer is if you're thinking about trying to do it yourself, right? If you've looked at your system and you've said, I need distributed tracing, I need the sort of things OpenTelemetry can do, but I want to build it myself — then it's 100% the right answer to use OpenTelemetry, because I can guarantee you that collectively, we have probably put more thought and time into building OpenTelemetry than just one person can. Now, you don't have to use OpenTelemetry with Lightstep, you don't have to use it with Datadog, you don't have to use it with any specific vendor, right? I'm not saying, oh, you have to pay us money. I am saying, you know, there's a cost for everything, and going your own way is going to hurt more in the long run than anything it saves up front in terms of either time or money.
Alex Boten
0:47:54
Yeah, so I'll echo what Austin said there. But also, I think it's important, if you're looking at OpenTelemetry and you think, oh, it's not doing the thing that I want for my special case, or whatever it is, to get involved with the community and actually bring that case up, because it's likely that other people have also run into the same problems or limitations or whatever it is that's preventing adoption of OpenTelemetry as it is. And so I think it's good to have that conversation and at least try to understand all those use cases. Even if it's not supported today, it doesn't mean that it won't be supported tomorrow, and I think that's something that we're always looking forward to.
Tobias Macey
0:48:29
And so, as you look to the next steps in the near-to-medium-term future of OpenTelemetry, what do you have in store for it as far as plans and feature capabilities, and what are the areas of contribution that are most vital as you continue down the path of completing the general availability roadmap and moving beyond that?
Austin Parker
0:48:52
I can speak from my side of things. At a big-picture level, I think more integrations, more auto-instrumentation, really trying to make it easy and fast for people to onboard and start using it and getting value from it. That's what I see for this year, right? Like, we've got kind of the basics done, we're implementing metrics everywhere, and now let's make that useful. In terms of what I would love to see people coming in and doing, if you're listening to this and you want to get involved, the two things we really need are, one, user feedback — so people to actually try it out and use it and tell us what doesn't work, and also tell us what works. But also, you know, if you want to go teach people about this, if you want to educate people about this, we would love to have more people involved in the community, helping out writing documentation, doing examples. You know, it doesn't have to be on our website, on the OpenTelemetry site; it can be anywhere — do it on dev.to or put it on YouTube or whatever. But if you are interested in that, we actually have resources at our website, opentelemetry.io. Under the documentation there's a learning resources section where — I know no one's doing in-person meetups right now, but we actually have a workshop there that you can adapt and use. You know, maybe you want to do a lunch and learn at your company: you can pick up our slides and use those to teach other people OpenTelemetry, which I think is a great way to get involved, even if you don't want to really get down in the weeds in GitHub.
Alex Boten
0:50:17
And I think, specifically around Python, as you mentioned, Austin: reading the docs, trying out the examples — we have the OpenTelemetry Python Read the Docs website that, you know, is up there, and we would love to get some feedback on that. And from a contributions standpoint, we would love to see as many folks as are interested in implementing instrumentation for OpenTelemetry come in the door as soon as possible and, you know, attend our SIG meetings or join us on Twitter — we're pretty accessible.
Tobias Macey
0:50:46
Well, for anybody who wants to get in touch with either of you or follow along with the work that you're doing or contribute to OpenTelemetry, I'll have you add your preferred contact information to the show notes. And so with that, I'll move into the picks, and this week I'm going to choose the Pulumi project. I've started adopting that for managing my own infrastructure, and I've been enjoying being able to just write some Python and take advantage of all the ecosystem tooling around that for being able to build out my cloud resources. So I definitely recommend taking a look at that. Even if you're running existing infrastructure and you want to start converting over, it has great options for adopting existing resources, so definitely worth a look. And with that, I'll pass it to you, Austin. Do you have any picks this week?
Austin Parker
0:51:26
I've been using a lot of Helm 3 recently; I've been getting more back into Kubernetes. And, wow — you know, I tried Helm years ago, I guess when it first came out, and the headaches of configuration and setup and Tiller and all that were extremely frustrating and, I felt, made it a not-great experience. But with Helm 3 they really came around, and it does exactly what you want it to do. It templates things, it just works, it's one binary, and you don't have to install or configure stuff. It's great. I love it. If you've seen Helm before and been burned by Tiller, by RBAC, or anything else, I definitely recommend giving Helm 3 a shot.
Tobias Macey
0:52:10
And Alex, how about you?
Alex Boten
0:52:12
Yeah, so I guess my pick of the week is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. It's a great book about how algorithms can be used in everyday life, applying, you know, computer algorithms to how we make decisions as humans. It's actually a really great and entertaining read.
Tobias Macey
0:52:35
Yeah, I enjoyed reading that one myself, so I'll second that pick. So I'd like to thank the both of you for taking the time today to join me and discuss the work that you're doing with OpenTelemetry. It's definitely a very interesting project and one that I plan to start adopting for my own systems, so I appreciate all of the effort that you and everyone else involved have put into it, and I hope you enjoy the rest of your day.
Austin Parker
0:52:56
Great, thanks for having us. And looking forward to your feedback.
Tobias Macey
0:53:02
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it: email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
Liked it? Take a second to support Podcast.__init__ on Patreon!