Pure Python Configuration Management With PyInfra - Episode 270

Summary

Building and managing servers is a challenging task. Configuration management tools provide a framework for handling the various tasks involved, but many of them require learning a specific syntax and toolchain. PyInfra is a configuration management framework that embraces the familiarity of pure Python, allowing you to build your own integrations easily and package it all up using the same tools that you rely on for your applications. In this episode Nick Barrett explains why he built it, how it is implemented, and the ways that you can start using it today. He also shares his vision for the future of the project and how you can get involved. If you are tired of writing mountains of YAML to set up your servers then give PyInfra a try today.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!


Datadog is a powerful, easy to use service for gaining comprehensive visibility into the state of your applications.

The easy to install Python agent lets you collect system metrics and log data, supports integrations with all of your services, and distributed tracing.

Their customizable dashboards and interactive graphs make finding and fixing performance issues fast and easy, and their machine-learning driven alerting ensures that you always know what is happening in your systems.

If you need even more detail about how your application is functioning you can track custom metrics, and their Application Performance Monitoring (APM) tools let you track the flow of requests through your stack.

Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.

 



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • This portion of Podcast.__init__ is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
  • Your host as usual is Tobias Macey and today I’m interviewing Nick Barrett about PyInfra, a pure Python framework for agentless configuration management

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what PyInfra is and its origin story?
  • There are a number of options for configuration management of various levels of complexity and language options. What are the features of PyInfra that might lead someone to choose it over other systems?
  • What do you see as the major pain points in dealing with infrastructure today?
  • For someone who is using PyInfra to manage their servers, what is the workflow for building and testing deployments?
  • How do you handle enforcement of idempotency in the operations being performed?
  • Can you describe how PyInfra is implemented?
    • How has its design or focus evolved since you first began working on it?
    • What are some of the initial assumptions that you had at the outset which have been challenged or updated as it has grown?
  • The library of available operations seems to have a good baseline for deploying and managing services. What is involved in extending or adding operations to PyInfra?
  • With the focus of the project being on its use of pure Python and the easy integration of external libraries, how do you handle execution of Python functions on remote hosts that require external dependencies?
  • What are some of the other options for interfacing with or extending PyInfra?
  • What are some of the edge cases or points of confusion that users of PyInfra should be aware of?
  • What has been the community response from developers who first encounter and trial PyInfra?
  • What have you found to be the most interesting, unexpected, or challenging aspects of building and maintaining PyInfra?
  • When is PyInfra the wrong choice for managing infrastructure?
  • What do you have planned for the future of the project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Raw transcript:
Tobias Macey
0:00:13
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today (that's L-I-N-O-D-E) and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey, and today I'm interviewing Nick Barrett about PyInfra, a pure Python framework for agentless configuration management. So Nick, can you start by introducing yourself?
Nick Barrett
0:01:04
Hi, yeah. Thanks for having me on the show. I'm Nick Barrett, a software engineer working at a company in the retail analytics space, where I focus on infrastructure-heavy projects. I also run a couple of side projects under the Oxygem brand, and spend way too many hours on open source projects in addition.
Tobias Macey
0:01:24
And do you remember how you first got introduced to Python?
Nick Barrett
0:01:26
Yeah, so I actually started Python quite late. I started maybe 14 years ago with the classic PHP, HTML, WordPress stack, fiddling with my own blog in a little corner of the internet, and then moved on to Lua in Garry's Mod, if you've ever played the game Garry's Mod, of all places. This was all pre-university, and then in university we did, you know, Java and the big, slow languages. Eventually, after university, when I started working professionally, I moved into picking up Python, and I instantly fell in love with the language. Ever since, as I say, the rest is history.
Tobias Macey
0:02:06
And so that has ultimately led you down the road of building your own configuration management framework. Can you give a bit more background about what PyInfra is, some of the origin story, and what led you to build it in the first place?
Nick Barrett
0:02:18
Absolutely. So PyInfra is a tool for managing infrastructure. It's designed to support both ad hoc command execution and full state, or configuration, management. It started back when I first began working professionally: we used a lot of Ansible and a bit of Fabric, both of which I love, both also written in Python, and they were kind of the basis of the inspiration. The thing that gave me the kick to start building PyInfra was that our infrastructure was growing rapidly, and as we grew, Ansible became quite slow and frustrating to use over large numbers of servers. To do an infrastructure-wide deploy we had to set up, I believe it was Sensu at the time, which involved uploading a whole bunch of files onto each server. As the infrastructure grew, this became exponentially slower, to the point where it would take twenty or thirty minutes just to run one rollout of an updated set of scripts. At the time we were also using, as I mentioned, Fabric, which was still super fast, and we would occasionally mix a bit of Ansible and Fabric: Fabric to do the bits that Ansible was slow at, and Ansible to do the initial install and configuration management side, which it is so good at. So essentially I took my favorite bits of Fabric, its speed and its pure Python configuration, and combined them with the state management concepts and ideas from Ansible.
Tobias Macey
0:03:56
And that's how I hacked together PyInfra 0.1. You mentioned Ansible, which pre-existed it, and you highlighted some of the challenges and pain points there. There are a number of other options, both in Python and in other languages, for configuration management that have different layers of complexity and various options in terms of the ways that you approach it. I'm wondering what the core goals of PyInfra are, and some of the capabilities it offers that might lead someone to choose it over other options such as Ansible, SaltStack, Chef, or Puppet.
Nick Barrett
0:04:30
Absolutely. So I think for me, part of the reason for building PyInfra, and its greatest benefit, is its debugging capability, particularly due to the way PyInfra works. It roughly executes shell commands in the same way that you would if you were setting up a server by hand, and this leads nicely into what happens when something does go wrong: instead of getting a bespoke error traceback specific to the software, you just get the standard out and standard error from the command that failed during the deploy. So you get instant debugging feedback, which means you can rapidly iterate through building out a set of deploys. The other big win is the pure Python configuration of PyInfra, which enables almost infinite possibilities. As everything is configured in Python, this allows you to integrate with basically any Python package that already exists. So if you need to pull in a bunch of EC2 hosts from somewhere, you can just use boto3 and integrate that, or if you need to pull in secrets from HashiCorp Vault, you can just use the standard Python libraries for that. None of that needs to be built into PyInfra itself; out of the box it is compatible with more or less everything Python is compatible with.
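As a concrete illustration of that point, here is a sketch of an inventory file that pulls EC2 hosts with boto3. This is a hypothetical example, not code from the episode: the tag filter and the group name are made up, it assumes your AWS credentials are already configured, and it relies on pyinfra's convention that module-level lists in an inventory file become host groups.

```python
# inventory.py -- hypothetical sketch; run with `pyinfra inventory.py deploy.py`
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Illustrative filter: all instances tagged role=web (the tag is made up).
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:role", "Values": ["web"]}],
)["Reservations"]

# A module-level list of hosts becomes a pyinfra group named after the
# variable, so this can be targeted with `--limit web_servers`.
web_servers = [
    instance["PublicIpAddress"]
    for reservation in reservations
    for instance in reservation["Instances"]
    if "PublicIpAddress" in instance
]
```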
Tobias Macey
0:05:56
So for somebody who is trying to build up infrastructure, there are a number of different ways that you can go about it, with different levels of complexity in terms of what you're trying to deal with. I'm wondering what your major pain points are in dealing with the infrastructure that you're working on, some of the shortcomings that you see in the overall approach to tooling or available systems architectures, and how you're trying to tackle that with PyInfra and your own work.
Nick Barrett
0:06:25
So I think I previously mentioned this when we talked about other tools. Certainly one of the pain points I've experienced is the abstraction that tools apply over the top of what's going on under the hood. They can hide away the underlying commands or changes being made to the server, which is great when they work fine and you get the end result of the state as defined in the playbook, role, or whatever. But when it doesn't work, it's much harder to pick out exactly how and why it has failed, which can lead to unknowns in the infrastructure, as it were, if that makes sense. I think another pain point, which is kind of unrelated to PyInfra to be honest, is resource creep. That's an issue I've experienced where, especially if you've got a bunch of cloud providers, and maybe some dedicated servers thrown in there, spread across a whole bunch of data centers, it's really easy to essentially lose servers. You occasionally might find an old box just lying around, which is not only a waste of money, but obviously becomes a security hole after a certain period of time,
Tobias Macey
0:07:37
especially if you don't know about it and it's still attached to your networking infrastructure. Yeah, there's definitely a lot of different complications in trying to approach infrastructure management, and just all of the interconnectedness of the systems after you get them up and running makes it very difficult to think about them from the initial starting point, and how to go from zero to a fully working system in a relatively straightforward progression. Yeah, absolutely. And for somebody who is using PyInfra to be able to configure their servers, can you just talk through the overall workflow for actually creating the deployment logic, working with PyInfra, and figuring out how to go from local development to the production environment? Yeah,
Nick Barrett
0:08:24
absolutely. So when I first built it, the workflow was essentially to use Vagrant machines: you bring up a local Vagrant VM, write the deploy logic, execute it against the machine, effectively manually verify that the deployment has completed, and then, once it's suitable, gradually roll that out to production. As of 0.9.8, PyInfra integrates directly with Docker, so you can avoid all the overhead of the Vagrant VM. It becomes extremely easy to rapidly iterate by basically building a Docker container over and over again using PyInfra, which has the nice benefit that you can quite easily write some quick tests to verify that the contents of the container match the state defined in your deploy logic. I'm also keen to expand this: there's a cool project called Testinfra that I became aware of recently, which allows you to write unit tests for tools like PyInfra, Ansible, Salt, and so on. I'm looking forward to integrating with that, because I think it would be really awesome to be able to write unit tests for this.
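A rough sketch of what that workflow looks like at the command line, assuming the pyinfra ~1.x CLI; the image name and file names here are illustrative:

```shell
# Iterate against a throwaway Docker container instead of a Vagrant VM:
# pyinfra starts a container from the image, runs the deploy inside it,
# and commits the result as a new image you can inspect or test.
pyinfra @docker/ubuntu:18.04 deploy.py

# Preview what would change on real hosts without executing anything:
pyinfra inventory.py deploy.py --dry

# Ad hoc command execution across an inventory:
pyinfra inventory.py exec -- uptime
```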
Tobias Macey
0:09:37
Yeah, I've used Testinfra for my own work and definitely appreciate being able to use it. I've actually written a custom integration with SaltStack, so that you can write the Testinfra tests using the SaltStack YAML syntax and it will do the translation for you behind the scenes, which is ugly to look at, but useful when you're actually using it.
Nick Barrett
0:09:55
Yeah, that's awesome. It's a really cool project and I'm looking forward to getting into it.
Tobias Macey
0:10:03
And I also appreciate your efforts to use PyInfra for being able to build Docker containers, so that you can free yourself from the Dockerfile, which I personally don't like the approach of, but it has gotten us this far, so I won't denigrate it too much.
Nick Barrett
0:10:20
I agree. Yeah, I really struggle with the Dockerfile. Some days I kind of like the lack of being able to have complex logic in there, but some days it's very frustrating, with the hacks that have to be used to get past it. That was part of the inspiration for integrating Docker with PyInfra. It's kind of a strange concept, building a Docker container into an image from a configuration management tool rather than a Dockerfile, but it works pretty well.
Tobias Macey
0:10:50
So the other thing that you mentioned is that PyInfra has support for handling idempotency of the deployments, so that you don't have to worry about running them multiple times. I'm wondering what your approach is for enforcing that in the different operations being performed, some of the things that you have to address in the built-in operations to ensure that idempotency support, and edge cases that people who are customizing PyInfra need to look out for in terms of handling idempotency.
Nick Barrett
0:11:21
Yeah, so it's a really interesting subject. Perhaps I'll explain the approach first: essentially, PyInfra achieves this by executing deploys in two phases. During the first phase, which is read-only, PyInfra will read state straight from the server: where is this file, what packages are installed, and so on. This is then compared, in the operations, with the state defined by the user in the deploy logic, and finally PyInfra will spit out the commands required to alter the current state to the one defined by the user. By doing this, when you run it the first time, it executes the commands as normal, and the second time you run it, nothing will happen, because there's nothing to change. There are some operations that don't follow this; for example, there's a server.shell operation which literally just executes any shell commands you give
Tobias Macey
0:12:18
it. Obviously there's no state to compare there, so that command would execute every time. In pretty much every configuration tool there is the option to just run a bash script, and then you're on your own as far as ensuring that it's not going to break every time you run it.
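The two-phase model described above can be sketched in plain Python. This is a toy illustration, not pyinfra's actual implementation: phase one diffs a fact (the set of installed packages) against the desired state and emits shell commands, and running the same plan against the reconciled state emits nothing, which is what makes re-runs idempotent.

```python
# Toy sketch of the two-phase model (NOT pyinfra's real code): phase one is
# read-only -- it compares a "fact" (currently installed packages) with the
# desired state and returns the shell commands needed to reconcile them.
def plan_packages(installed, desired):
    missing = [pkg for pkg in desired if pkg not in installed]
    if missing:
        return ["apt-get install -y " + " ".join(missing)]
    return []  # state already matches: no commands, so re-runs are no-ops

# First run: nginx is absent, so a command is planned (phase two would run it).
print(plan_packages({"curl"}, ["nginx"]))           # ['apt-get install -y nginx']
# Second run: the fact now includes nginx, so nothing is planned.
print(plan_packages({"curl", "nginx"}, ["nginx"]))  # []
```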
Nick Barrett
0:12:33
Yes, yeah, it's an interesting one. There's an additional edge case with this, because of the two phases. This is where I think PyInfra differs from Ansible and similar tools: they do the check of whether the state is right on the machine during the execution phase, whereas PyInfra does it in two phases, which allows you to essentially dry-run a deploy to see what changes would be made. It does mean that you almost have to consider operations independently of each other for them to be fully idempotent, which is obviously a bit of a simplification realistically. One good example is installing nginx using apt, and then wanting to remove the symlink to the default site configuration. If you do that in PyInfra by just calling the two operations, apt packages installing nginx and files.link with present equals false, it will install nginx as expected, but when it gets to the files.link operation, it won't do anything: at the time of the first phase, that file never existed, so PyInfra doesn't know it's there, and therefore doesn't know to delete it. So I've included a number of hint options in certain PyInfra operations. You can get the output of the call to apt.packages installing nginx, which will tell you whether something changed or not. If something has changed, i.e. it installed nginx, you can then pass the assume_present argument to the files.link operation, which will skip checking whether the link exists and issue the removal based on the nginx change. Digging deeper into PyInfra itself, can you talk through how it's implemented, and some of the overall design and implementation changes that it's gone through since you first began working on it,
and as you started to use it yourself and expose it for use by other people? Yeah, absolutely. I think I've just mentioned the two-phase deploy; that goes way back to the very beginning and is core to the way PyInfra works. It's implemented essentially by the core API that links everything together, and then the main two kinds of objects, if you like, are facts and operations. Operations are used by users to describe the desired state: ensure this file is here, install this package. Facts describe the current state on the remote side: this file doesn't exist, or these apt packages are installed. Operations use these facts to figure out what needs to change to match the state defined in the operation, and that is essentially the first phase. At the end of it, you end up with a bunch of commands to run per individual target host. When I say command, I mean either a standard shell command, or something like a file upload, a file download, or a Python callback. Once phase one is complete and you've got this list, PyInfra essentially just executes it, by default operation by operation, going through each operation and executing each host's commands as it goes. That's the execution phase. Another key part of PyInfra, and this is a much more recent addition, is the idea of connectors. PyInfra was built for POSIX servers only at first, and SSH was basically the only thing it connected to for a long time. That's changed recently with the addition of things like the Docker connector and the Vagrant connector. These essentially define how commands are executed, and it makes for a very pluggable system. For example, there's also a WinRM (Windows Remote Management) connector in PyInfra now. It's at a very early stage, but that's the kind of flexibility this system allows.
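The apt-plus-files.link hint pattern Nick walked through a moment ago might look like this in a deploy file. This is a sketch assuming the pyinfra 1.x operations API (the name keyword, the .changed attribute on an operation's return value, and the assume_present argument), not code taken from the episode:

```python
from pyinfra.operations import apt, files

install = apt.packages(
    name="Install nginx",
    packages=["nginx"],
    update=True,
)

# During the read phase the default-site symlink does not exist yet, so
# files.link alone would plan nothing. assume_present hints that the link
# will be there whenever the install operation made changes, so the
# removal command is issued anyway.
files.link(
    name="Remove the default nginx site",
    path="/etc/nginx/sites-enabled/default",
    present=False,
    assume_present=install.changed,
)
```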
Tobias Macey
0:16:38
This portion of Podcast.__init__ is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place, sometimes fast and sometimes slow? Do you know why? With Datadog, you will. You can troubleshoot your app's performance with Datadog's end-to-end tracing, and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt to keep you comfortable while you keep track of your apps. And as you've been building out PyInfra, what are some of the libraries that have been essential to being able to make it happen, and some of the prior art that you've looked at as either positive or negative examples to learn from?
Nick Barrett
0:17:33
From the library perspective, gevent is absolutely at the core of PyInfra, and I'm an absolute raving mad fan of gevent as a library. Obviously Python 3 has somewhat reduced the need for it with native async, but not only is gevent's API, I think, better than asyncio's, it's also definitely quicker. I think it's a truly wonderful library. The only downside, of course, is the monkey patching part, which is kind of a bit hacky, but I like it. I mean, I've been using PyInfra in production workloads for about four years now, all of which is on gevent, so I'm not concerned from that perspective, if you see what I mean. The other two major libraries it uses would be Jinja2 for templating, which is the same as Ansible and Salt, I believe both use it, and also Click for the CLI, which I think is another absolutely fantastic
Tobias Macey
0:18:36
bit of kit. Yeah, those are definitely useful and ubiquitous libraries, and both of them from the mind of Armin Ronacher, who has also brought us Flask and many other good things. Indeed, yeah. And so the other interesting piece to dig into is the set of operations that you've built into PyInfra for being able to provide the basis of idempotency, and to define the available functionality for interfacing with the different system aspects of the servers that you're trying to build out. I'm wondering how you have determined which pieces need to be built, and just the overall interface for being able to define new operations and add them into the runtime of PyInfra. Yes. So
Nick Barrett
0:19:22
the core of it is, yeah, I still remember when I was first trying to figure out what PyInfra would be. The baseline was: what's the minimum you need to configure a server in a reasonable way? It comes down to the file system stuff, managing files, directories, lines in files, and links, and uploading and downloading files. That's essential, I think, to any configuration management system, or really any kind of infrastructure management system. Then there's the server-side stuff, or as it's called, the server module; I'm not sure that's the right terminology, but it covers user management, group management, and little things like managing the hostname, managing sysctl entries, managing kernel modules, and so on. Those form the real core set of operations; I would say they are used the most by any deploy that I've seen. On top of that, there are the more tool-specific ones, like apt for managing apt repositories and installing apt packages, and then yum for the same thing. Those were essentially the only two in the beginning, because they were the only ones I needed, but it has now expanded. I think there's dnf, brew, apk, a whole bunch of them, which is cool, and a lot of them were implemented by contributions, which is absolutely fantastic.
As far as developing additional operations, the documentation contains a page on this, and it's reasonably simple. It's basically just a matter of creating a Python function, and the decorator flags it as an operation. It takes a state and a host object, and then whatever keyword arguments you want. Normally these operations turn out to be really quite simple functions, because they just read some state from the host by way of facts, so they'll call host.fact.something, and then, depending on what that state looks like, they basically yield some shell commands. That's generally how it looks. It makes for quite a simple implementation from an operations perspective, so they should be quite easy to write, which I think is a really good thing.
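The operation shape Nick describes, a decorated function that reads facts and yields shell commands, can be mimicked in standalone Python. This is toy code, not the pyinfra API: the @operation decorator and the host object are replaced here by a plain function and a dict of facts, purely to show the read-then-yield pattern.

```python
# Standalone sketch of pyinfra's operation shape (toy code, no pyinfra
# import; the real thing decorates the function with @operation and reads
# facts from a host object). An operation inspects the current state and
# yields only the shell commands needed to reach the desired state.
def user(host_facts, name, shell="/bin/bash"):
    users = host_facts.get("users", {})  # fact: existing users -> login shell
    if name not in users:
        yield f"useradd -s {shell} {name}"  # user missing: create it
    elif users[name] != shell:
        yield f"usermod -s {shell} {name}"  # wrong shell: fix it
    # otherwise the state already matches and nothing is yielded

# A host where "deploy" does not exist yet:
print(list(user({"users": {"root": "/bin/bash"}}, "deploy")))
# ['useradd -s /bin/bash deploy']
```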
Tobias Macey
0:21:50
Yeah, reducing the amount of information and understanding that you need to have to be able to plug in something new is definitely beneficial, particularly for a new framework that's trying to gain adoption. I know that, for instance, with things like Salt, there's a lot of power there, but the amount of understanding that you need to have about how the system works to be able to contribute a new module is fairly substantial, so there's a fairly high barrier to entry before you can really even get started adding new code to it.
Nick Barrett
0:22:20
Yeah, absolutely. I've looked at Ansible modules as well, and it's kind of the same thing: they're much more complex, partly just because of the way they're executed. I think it's a real advantage of PyInfra that it's accessible, should we say, without a huge effort expended in learning it.
Tobias Macey
0:22:40
And then another interesting thing that I saw as I was looking through the documentation is that you have the operation modules that allow you to execute these various shell commands, but you also mentioned the ability to execute Python code via PyInfra, using the existing ecosystem of libraries. I'm wondering how you handle the execution of those functions on the remote hosts when you're not shipping the external dependencies and the Python runtime to that server, and just what the overall flow looks like at the code level.
Nick Barrett
0:23:16
So the key here is that they're actually not executed on the remote host; they are executed on the local host, the user's computer, so that bypasses the problem in a sense. The actual execution of Python functions during the deploy happens locally, which means your local virtual environment can have all your requirements. PyInfra itself will not run Python on the remote side; in fact, it won't run anything bar the shell. The default shell is the only requirement on the other side, and it could be any shell. Because they run locally, you can run these functions within your inventory file as well, but the real use of the callback operations is that within the Python callback, even though it's running locally, you can still execute commands on, and receive output from, the remote host. For example, there's a virtual network software called ZeroTier. You install it via an apt repository, it brings itself up, and it puts a unique identifier on the file system. You need this identifier to authorize the machine within the ZeroTier UI to join your network. One way of achieving that is to have a callback function within PyInfra: after you do the install, within that function you basically cat that file out, collect it back in your function, and then just call the ZeroTier API. So you're obviously not executing any Python on the remote server, but you can dynamically speak to and collect output from the remote server mid-deploy, within the context of a function.
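A sketch of the ZeroTier-style flow Nick describes, assuming pyinfra's python.call operation. The identity file path matches ZeroTier's defaults, but the callback signature and the run_shell_command helper are assumptions based on pyinfra's callback API, and the actual HTTP call to the ZeroTier API is left elided:

```python
from pyinfra.operations import apt, python

apt.packages(
    name="Install ZeroTier",
    packages=["zerotier-one"],
)

def authorize_member(state, host):
    # This function runs on the LOCAL machine, so any locally installed
    # library is importable here, yet it can still talk to the remote host.
    status, stdout, stderr = host.run_shell_command(
        "cat /var/lib/zerotier-one/identity.public"
    )
    node_id = stdout[0].split(":")[0]
    # ...use any Python HTTP library here to call the ZeroTier API with
    # node_id and authorize this host on the network...

python.call(
    name="Authorize this host on the ZeroTier network",
    function=authorize_member,
)
```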
Tobias Macey
0:25:01
And beyond the operations, you also mentioned the different execution contexts for interfacing with Docker versus POSIX. What are some of the other extension points that exist for being able to plug into PyInfra and add different capabilities or functionality?
Nick Barrett
0:25:19
I mean, there are kind of three areas currently. Essentially there's the connectors, as you just mentioned; Terraform is an example of one that could be built, and there is actually an issue open for it. You would essentially read the TF state file, turn that into an inventory, and then execute against that inventory; that would be one example. Then there are facts: you can write custom facts quite easily. They're just Python classes with a command and a process method, and these can be written and then used without actually adding them into PyInfra itself. So you could build a bespoke set of facts, and operations as well, and these could all be stored in, you know, a bespoke package or whatever you wanted, really, and then called within PyInfra. That's kind of extending PyInfra as it exists today. And then the other area where there's a lot more potential, I think, is the PyInfra API, which is fully fledged and in existence, but not currently stable. Well, it's fairly stable, but there are no guarantees about that stability yet. This is something I'm targeting for version 1.1: to have a stable API with semantic versioning guarantees. I think the API could offer some really interesting integrations that would allow you to execute PyInfra deploys from within almost any context, rather than just the CLI, which could open up some really interesting work in the future. And then, in the back of my mind, one thing I really want to do over the coming months is build a really lightweight PyInfra agent, which would allow PyInfra to run in a similar manner to a Salt agent or Chef agent, that kind of thing, which would enable running PyInfra in an agent-based manner as well.
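For instance, a custom fact is just a class pairing a shell command with a parser. Here's a sketch (a stand-in base class is defined so the example is self-contained; in real pyinfra you would subclass its fact base class instead, and the fact name and command are illustrative):

```python
class FactBase:
    """Stand-in for pyinfra's fact base class."""

class KernelVersion(FactBase):
    # The shell command executed on the remote host to gather the fact...
    command = "uname -r"

    # ...and the method that turns the raw output lines into a value.
    def process(self, output):
        return output[0].strip()

print(KernelVersion().process(["5.15.0-91-generic\n"]))  # → 5.15.0-91-generic
```

Because facts are plain classes like this, they can live in any importable package alongside bespoke operations, without being added to PyInfra itself.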
Because it's kind of natively agentless. For people who are adopting PyInfra, what are some of the edge cases or points of confusion that they should be aware of, or that you see them commonly experience? This is really interesting. As of today, there's one major gotcha I see, which is the idempotency issue that we talked about earlier, where operations rely on each other. This can be a bit of a gotcha because it requires thinking of each operation as always individual, and then any dependencies between operations have to be manually reconciled, essentially by receiving the output of one operation and then using that to infer the arguments passed into the next operation. Unfortunately, this is somewhat the nature of this kind of deploy tool, and I do want to write more documentation around this, with examples of it, because it is quite rare in my experience, but it's the kind of thing that bites you because you don't realize it, so that's kind of annoying. Historically, there's been a whole bunch of gotchas; the ordering of the operations, I think, was the biggest gotcha ever. That's
Tobias Macey
0:28:10
somewhat resolved; well, it's resolved now. And then, as far as the community and the overall uptake of PyInfra, you mentioned that there has been some use of it and contributions from other users. I'm wondering what the overall response has been from developers who first encounter and test out PyInfra, and your overall approach to trying to promote it and grow the community.
Nick Barrett
0:28:35
Absolutely. Yeah, the responses have generally been pretty good on the whole. I've been really, really pleased with people's feedback, and kind of taken aback, actually. I had been building PyInfra for years and using it, and then it appeared on Hacker News a couple of months ago or something, and it wasn't submitted by me, which was really interesting. It was really nice to see, with really great feedback, and I'm kind of glad to have it out there. I'm working on plans to promote it more. I don't think I've really done it justice promotion-wise, to be honest; I posted it on Hacker News a year or two ago and then kind of sat on it. To be honest, I think that's one area where I lack when it comes to my open source projects: promoting them, I guess. And I think part of that probably comes from using it professionally myself, so I haven't thought about, you know, where else it could be used, if that makes sense.
Tobias Macey
0:29:40
Yeah, it's definitely easy to be focused on your own use case and overlook some of the potential other applications, just because it's not something that you have had to deal with or a challenge that you're facing. And so that's definitely one of the great benefits of open source and making a tool available to other users: it allows for those new use cases to be discovered and factored into the tool. Although it also ends up bringing in the challenge of having to know when to decline a contribution, because, as one of my favorite adages goes, open source is free as in puppy.
Nick Barrett
0:30:19
That is very true. Yeah, there have been some really interesting contributions recently from BSD users, including OpenBSD support and the Vagrant test machine setup. But I've never really used BSD firsthand, and certainly not as a daily driver, so I wasn't really targeting it, if you like, which makes this really interesting. I've learned a lot from other people contributing operations, or improving operations, within PyInfra. It's really exciting
Tobias Macey
0:30:48
to see that. What are some of the other interesting or unexpected or challenging lessons that you've learned in the process of building PyInfra, or some of the complexities that you've had to deal with in its development?
Nick Barrett
0:31:00
So I think, for me, the most interesting thing has been learning about all the different operating systems. I'm definitely an infrastructure nerd; I very much love infrastructure. It's been really interesting to properly deep dive into that and play around with the various Linux distributions, and the BSDs, which I would never have used otherwise, are another example. I found that really interesting, especially in the beginning, when I was figuring out what PyInfra would look like and what systems it would integrate with. Originally, the target was just POSIX-based systems, but now, with the WinRM connector, I think the next interesting area is going to be learning about Windows Remote Management, because I haven't used Windows in years, so I'm really looking forward to figuring that out. I think the really interesting part of developing PyInfra was the deploy file itself. You store your operations in a Python file, whatever it's called, and this is what's used to execute against the inventory. The way it works is that PyInfra will take this file and, for every single host in your inventory, it will execute the file. This means the file is executed a whole bunch of times, once per host, to generate that individual host's operations. Which is all fine and well if it's just a simple series of operation calls; where it became complex was when you have conditional statements or for loops or anything like that. Suddenly, depending on the order of the hosts in the inventory, you might end up with a different operation order.
Because of these if statements, an operation at the top of the file might not get seen, as it were, by PyInfra until the second host, so it would get put afterwards. The original implementation was essentially append-only: every time it saw a new operation, it appended it, which is obviously less than favorable. As I said, this was PyInfra's biggest issue over the years, and one that I solved late last year, I think. My second attempt at fixing it was to implement custom control statements. So instead of writing an `if`, you'd write something like `with state.when(...)`, using context managers, and then put the contents of your if statement within that, and the same for loops. It was hacky, but it worked; it did the job, but it meant the Python code in the deploy file wasn't just Python anymore. It was Python, but with some weird control statements. So the next approach, a couple of years ago, was to essentially compile the deploy files, using the ast module, and then later I think I used RedBaron for a bit, which is also fantastic. Using those, you take the user's deploy code, rewrite it to swap the if statements for the context managers, and the same with loops, and then execute the file. This actually worked really well for a long time. But it was relatively slow to do this compilation, and it did lead to some edge cases over the years, because you're fiddling with other people's code, essentially, and I think that just opens you up to an infinite number of potential edge cases, which resulted in some painful debugging sessions. So, to fix all of this, here's how PyInfra now works.
It actually uses line numbers to order the operations. As the deploy file executes, every time an operation is called, PyInfra tracks the line number of the call. This happens in a nested manner, so if you include another file, it tracks the line of the include in the outer file and then the line within the included file, and it does exactly the same thing when you include other people's packaged deploys. That is how PyInfra does its ordering today: it takes the list of operations and their line numbers, and basically sorts them to come out with the final order. And it has the nice effect of being human-understandable, if you like: if you run it with debug, you can literally see the line numbers printed out. Basically, the way you would read the file is the way the operations now execute, which was always the ultimate goal for operation ordering. And it took the best part of three years to get it nailed down to where it is now.
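A toy model of that line-number ordering (greatly simplified, and not pyinfra's actual implementation): every operation call records its call-site line number, and sorting by those numbers recovers reading order even when a conditional means some hosts never "see" an operation.

```python
import inspect

ops = []  # (line_number, op_name) pairs collected across all hosts

def operation(name):
    # Record the line number of the call site; duplicates from later
    # hosts are ignored, since only the position matters.
    lineno = inspect.stack()[1].lineno
    if (lineno, name) not in ops:
        ops.append((lineno, name))

def deploy(host):
    operation("install nginx")         # seen by every host
    if host == "db-1":
        operation("install postgres")  # only seen once db-1 executes
    operation("start services")

for host in ["web-1", "db-1"]:         # host order no longer matters
    deploy(host)

print([name for _, name in sorted(ops)])
# → ['install nginx', 'install postgres', 'start services']
```

Even though "install postgres" is only appended when the second host runs, sorting by line number puts it between the two operations that surround it in the source, matching how a human reads the file.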
Tobias Macey
0:35:50
Yeah, it's funny how infrastructure kind of attracts a special kind of masochist who wants to actually deal with and handle all of these little edge cases that show up because of all of the slight variances in the systems that you're using, and the different distributions of Linux, and the package naming, and how the file structures are laid out for people who want to deploy their code in different manners. As somebody who works in the space, I can commiserate with that. For somebody who is considering what tool to use for building and managing their infrastructure, what are the cases where PyInfra is the wrong choice, and they might want to just go with a Dockerfile, or use one of the more full-blown configuration management frameworks like Salt or Ansible, or something like that?
Nick Barrett
0:36:35
Honestly, there are definitely going to be cases where PyInfra doesn't have the necessary operations to fulfill, you know, the desired state, though I think on the whole this is probably quite rare; obviously, ideally, PyInfra is perfect in all of these use cases. One of its limitations is certainly going to be the performance side: as you reach thousands or tens of thousands of targets, executing from a single host, you'll run out of CPU and network throughput at some point. So at that scale, I don't think PyInfra is the right tool, though perhaps a PyInfra agent might be, in some way, in the future. And then I think the other one is continuous ensuring of state, which I believe Salt does very nicely. Obviously, because PyInfra is agentless, it's not running all the time; it only runs whenever you run it. You can hack it together by running a cron job, or Jenkins, or whatever, but that's not the same as having a daemon on the machine making sure that things are constantly in the right state. And Windows is the other one, because PyInfra's WinRM support is, you know, in really early days; for anything at all on Windows, I would not recommend using PyInfra currently.
Tobias Macey
0:37:46
For the future of the project, you mentioned a few different goals that you have, but what is your overall vision for where it's going to end up, or anything that you are looking for help with, contributions or feedback, or anything as you continue to grow and build on the project?
Nick Barrett
0:38:03
Yeah. So the initial big thing I want to get over the line is version one. It's no real major change from an end user perspective, but it removes a lot of the old cruft that built up over those years of iteration, which will give a much cleaner slate for building on top of, which is really important. I'm really keen on hearing feedback from the community on PyInfra's APIs, and how PyInfra is used in different contexts, because ideally it's a super easy to use, very accessible tool; that's the aim. I think the fact that it's configured in Python probably raises the bar for entry a little bit compared to something like a YAML-based syntax, but I'm hoping the examples help with that. I absolutely welcome and love people contributing more examples, or just feedback on how it works, anything like that; I'm super keen to improve the user-facing APIs. And then operation coverage is the other area for expansion. I think there are still plenty of things that PyInfra can't do, or can't natively do, and for which you may use shell commands or a bash script, and it would be nice to cover all of that. And the other major thing is, with the upcoming version 1.1 and hopefully the stable API, I'm extremely keen to see how people use that. I have no idea what that looks like, but I'm very interested to see if and how people use it.
Tobias Macey
0:39:35
Are there any other aspects of your work on pi infra or the ways that you're using it for infrastructure management or just the overall space of configuration and infrastructure that we didn't discuss that you'd like to cover before we close out the show?
Nick Barrett
0:39:49
No, I think you've got everything.
Tobias Macey
0:39:52
The only other thing that I wanted to call out is the fact that I think your approach to having everything be pure Python and the fact that that enables you to distribute deployment logic as a Python package and being able to compose things together and manage dependencies in that way, is an excellent contribution to the ecosystem. So I appreciate that aspect of it. Thank you very much. I
Nick Barrett
0:40:17
wish I'd mentioned that myself, but yeah, you make a really good point. I think leaning on the built-in package management that Python offers, with PyPI and setuptools out of the box, enables, yeah, that dependency management. And bundling up a deploy as a Python package is a really nice feature, if you like; it kind of sidesteps the need for, you know, a custom implementation. Ansible Galaxy comes to mind, but, you know, there are others.
Tobias Macey
0:40:41
Absolutely. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, or contribute to PyInfra or your other projects, I'll have you add your preferred contact information to the show notes. And so, with that, I'll move into the picks. This week I'm going to choose a movie that I watched last night with the family called My Spy. It just became available on Amazon, and it was pretty hilarious: a great movie about a bumbling CIA agent who ends up embroiled with the targets he's supposed to be surveilling. I'm not going to give any more of that away, but it's definitely a lot of fun. So, for anybody who's looking for something to watch, I recommend it. And with that, I'll pass it to you, Nick. Do you have any picks this week?
Nick Barrett
0:41:19
Sounds great. Yes, I picked out two things. One is tech, or tech-ish: my Das Keyboard Ultimate, which I would highly recommend to anyone. The combination of mechanical keys and blank key caps has dramatically improved my typing over the years, and PyInfra was written on this very keyboard, or a large majority of it. And my other pick is completely unrelated; it's actually a food pick. There's a recipe, which I could provide the link for, that we made a week or so ago and which was extremely delicious: slow-cooked Korean short ribs with kimchi fried rice. I
Tobias Macey
0:42:00
highly recommend it. All right, that definitely sounds like an interesting and enjoyable meal, so I'll have to take a look at that. Thank you again for taking the time today to join me and discuss the work that you've been doing with PyInfra. It's definitely an interesting tool, and one that I plan to take a closer look at and possibly use for some of my personal infrastructure. I appreciate all the work that you've done on it, and I hope you enjoy the rest of your day.
Nick Barrett
0:42:21
Excellent. Thank you very much. And thank you for listening.
Tobias Macey
0:42:26
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it; email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.