Most long-running programs have a need for executing periodic tasks. APScheduler is a mature and open source library that provides all of the features that you need in a task scheduler. In this episode the author, Alex Grönholm, explains how it works, why he created it, and how you can use it in your own applications. He also digs into his plans for the next major release and the forces that are shaping the improved feature set. Spare yourself the pain of triggering events at just the right time and let APScheduler do it for you.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2020 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.
Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, and a brand new managed Kubernetes platform, all controlled by a convenient API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
Your host as usual is Tobias Macey and today I’m interviewing Alex Grönholm about APScheduler, a library for scheduling tasks in your Python projects.
How did you get introduced to Python?
Can you start by describing what APScheduler is and the main use cases that APScheduler is designed for?
What was your motivation for creating it?
What is the workflow for integrating APScheduler into an application?
In the documentation it says not to run more than one instance of the scheduler, what are some strategies for scaling schedulers?
What are some common architectures for applications that take advantage of APScheduler?
What are some potential pitfalls that developers should be aware of?
Can you describe how APScheduler is implemented and how its design has evolved since you first began working on it?
What have you found to be the most complex or challenging aspects of building or using a scheduling framework?
What are some of the most interesting/innovative/unexpected ways that you have seen APScheduler used?
What are some of the features or capabilities that you have consciously left out?
What design strategies or features of APScheduler are often overlooked or underappreciated?
What are some of the most useful or interesting lessons that you have learned while building and maintaining APScheduler?
When is APScheduler the wrong choice for managing task execution?
What do you have planned for the future of the project?
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. For your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. They also have a new object storage service to make storing data for your apps even easier. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute, and don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference, the Strata Data Conference, and PyCon US. Go to pythonpodcast.com/conferences to learn more about these and other events and take advantage of our partner discounts to save money when you register today. Your host, as usual, is Tobias Macey, and today I'm interviewing Alex Grönholm about APScheduler, a library for scheduling tasks in your Python projects. So Alex, can you start by introducing yourself?
Yeah, sure. I'm a longtime developer, doing hobby projects and commercial projects for many years. I think my professional career started around 2011? No, 2005, but it didn't start with Python. I actually started with PHP and then went on to Java, and then I think Python was the next language.
I remember one of my colleagues mentioning it in passing, and that sparked my interest. It was only after a few years that I was reintroduced to Python. I don't remember when that was exactly, but I do remember I started using it in my work projects around 2007. I think I needed to find a new tool besides Java, and then I remembered Python, maybe because it was already gaining a lot of popularity. I had heard a lot of good things about Python and I wanted to try it out. I made my first practical application a couple of hours after starting to learn it; it was that intuitive. So I fell in love with Python, and I still am in love with Python, I would say.
And given that you have a background in a number of other languages, do you find that you still use those other languages for some of your different projects? Or are you primarily just using Python these days?
I'm primarily using Python these days. Pretty much the only other language I use is ECMAScript, for practical reasons, because it's the only one you can use on the web right now.
And so in some of the work that you were doing with Python, apparently you found the need to be able to schedule different tasks. So I'm wondering if you can give a bit of background about what the APScheduler project is and some of the primary use cases that it was designed for.
Yeah, the primary reason why I chose to write it is that there weren't very many options for scheduling tasks in Python at the time. There was the Celery project, but that was quite a bit of overkill for what I had in mind. I was running this commercial project, a custom ERP software, that needed to perform some routine tasks on a schedule. And I didn't want to use the cron daemon because it was not really the right tool for the job. So I figured this shouldn't be that hard. I wrote the cron trigger and a couple of others and published it as a public project after that, and it turns out other people were also looking for a lighter scheduler, because the problem I usually get told about Celery is that it's very hard to set up and very heavyweight. So people come to APScheduler because it runs within your application and not as a standalone service.
Yeah, there were definitely a number of cases where, as you said, Celery is overkill for doing some lightweight projects. And I actually came across APScheduler a number of years ago when I was first starting to use Python in my professional work, and I was pleased with the simplicity of getting it to run, because I just needed to write a simple daemon that would execute periodic tasks and didn't need the distributed capabilities that Celery brought along. So I was glad to find the work that you did, so I didn't have to do it on my own. So for somebody who is going to build something on top of APScheduler, can you just talk through some of the workflow for integrating it into an application and some of the design considerations that they should be aware of as they're planning out the overall structure of their project?
Yeah, this is actually one of the pain points of APScheduler, because the most common type of application out there is a web application. And usually, in a web application, you have several processes running the web application as multiple workers to distribute the load. And APScheduler was really designed to run in a single process. So if you want to use a scheduler in a web app, then you need to be aware of the fact that it currently doesn't have any mechanism to share job stores. It's the first entry in the Frequently Asked Questions section of the documentation. But it is a very common problem. People want to use it in a web app, and there are some ways to do that. The first and foremost is to not use a persistent job store. But that may not work for everybody. If a persistent job store is required, then the only real way to do that with the current APScheduler 3 release is to run it in a separate process and use some kind of RPC mechanism, or maybe a lightweight HTTP server, to schedule the tasks. So, to build some kind of remote control interface for the web app. I recall there are some libraries that actually do that. I can't remember the names offhand, but they do exist.
Yeah.
And in the documentation you have a reference implementation using RPyC for a simple command and control mechanism. And so it's definitely interesting where, as you said, a number of people who are thinking about task oriented workflows are probably coming from the web, where they're familiar with Celery, versus people who might be building some sort of system daemon that just needs to be able to run periodic tasks on a server somewhere. And so there are different constraints and different considerations in those workloads, where for web applications, as you said, it's common for people to scale horizontally across a number of instances, and so you need some sort of coordination mechanism, versus somebody who's running a local daemon where all it has to care about is the local system resources, in which case APScheduler is a perfect solution.
In fact, the first use case that I had on my own just involved one process, so it wasn't a problem in that setting. So regretfully, APScheduler was never really designed for scalability, but it's something that I'm already working on, which we can talk about.
Yeah, when I was looking through the documentation, I saw there are a lot of pluggable capabilities within the library. So it definitely seems that you designed in room for being able to add those capabilities without locking yourself to that at the outset. And so I think some of the design of how you structured the library definitely allowed for the eventual progression towards something more sophisticated, while still being simple enough for somebody to get something up and running in a single process and solve their needs in that use case.
Yeah, people use a lot of backing data stores and I received quite a few PRs for adding new data stores. I think at least ZooKeeper and RethinkDB were completely contributed by other people. So that was a plus; it was really nice to see people interested enough in the project to contribute ready-made modules for APScheduler.
Yeah, the project... so, I don't remember the design alterations in the beginning, particularly between 1.x and 2.x. But at least one aspect that I remember from APScheduler 3 is that before that, APScheduler was not really designed for a lot of jobs in a back end data store. So APScheduler 3 contains quite a few optimizations to make it work with thousands of jobs. Beyond that I really can't recall. I've been trying to make it more modular and more adaptable along the way.
But I still have ways to go.
And in terms of the scheduling algorithms that you've built in, what are some of the challenges that you faced in terms of being able to support the different styles of date based versus cron based versus interval, and then have combinations of those, and just some of the learning curve that you've gone through while you have been working on building and evolving the project?
I would say that the cron triggers have given me the most trouble. Sometimes I would just stare at the code for half an hour trying to wrap my head around the logic. There have been some edge cases, particularly around daylight saving time. I'll just have to say that DST can go die in a fire. But it's a fact of life that we still have it. So some people have pointed out problems with the DST handling, and I think I've fixed at least the majority of the issues with that, hopefully. Nobody has complained about it recently, at least. Then there are some cases where the user can enter a specification that will actually never trigger. And that can cause an infinite loop, causing the scheduler to pretty much freeze or go into a busy loop that never ends. And I'm not even sure what the proper solution to that is; maybe limit it to a certain amount of iterations or whatnot. I'm not really sure, but it's a design challenge. Also, the OR trigger is fairly nice and doesn't usually give you any trouble, but the AND trigger is very problematic, particularly when combined with the interval trigger. You see, the biggest issue with combining these two is that the interval trigger starts on a certain second or millisecond, and then the AND trigger compares the produced next fire times exactly. So the trouble is that usually the interval trigger and the cron trigger never agree on a common fire time. So this also causes the scheduler to freeze, because it keeps looking for the next available fire time, which is never found. There's a fix coming for that in the next major release, but it has given me a lot of trouble. I've answered several Stack Overflow questions regarding this very problem, but people keep encountering it. It's regrettable, but this is one of the biggest design challenges that I've faced with APScheduler.
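The exact-match problem described above can be sketched with the standard library alone. This is not APScheduler's trigger code: `interval_fire_times` and `first_common_fire_time` are hypothetical helpers, and the `tolerance` and `max_steps` parameters are assumptions that stand in for the threshold and iteration-limit ideas raised in the conversation.

```python
from datetime import datetime, timedelta


def interval_fire_times(start, step):
    """Yield an endless stream of fire times: start, start + step, ..."""
    current = start
    while True:
        yield current
        current += step


def first_common_fire_time(a, b, tolerance=timedelta(0), max_steps=10_000):
    """Advance whichever stream is behind until both agree within tolerance.

    Returns None after max_steps so a pair of streams that never agree
    cannot spin forever.
    """
    time_a, time_b = next(a), next(b)
    for _ in range(max_steps):
        if abs(time_a - time_b) <= tolerance:
            return max(time_a, time_b)
        if time_a < time_b:
            time_a = next(a)
        else:
            time_b = next(b)
    return None


midnight = datetime(2020, 1, 1)
# An "interval" stream that happened to start a quarter second past the
# second, and a cron-like stream that fires on whole minutes.
offset_start = midnight + timedelta(milliseconds=250)

exact = first_common_fire_time(
    interval_fire_times(offset_start, timedelta(seconds=10)),
    interval_fire_times(midnight, timedelta(minutes=1)),
    max_steps=1_000,
)
tolerant = first_common_fire_time(
    interval_fire_times(offset_start, timedelta(seconds=10)),
    interval_fire_times(midnight, timedelta(minutes=1)),
    tolerance=timedelta(seconds=1),
)
print(exact)     # None: the two streams never coincide exactly
print(tolerant)  # 2020-01-01 00:00:00.250000
```

Without the step limit, the exact comparison would loop indefinitely, which is the freeze described in the answer.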
Yeah, I don't envy you being so involved in a project that is so closely coupled to time and the way that computers deal with it, because as anyone who has ever had to deal with it before knows, time is one of the more challenging aspects of working with computers and getting anything right. And that's why there are so many different conflicting truths about how to handle it properly.
Also, there's this issue of time zones. APScheduler currently requires pytz time zones, because, particularly with the cron trigger, you have to have pytz time zones for time zones that have DST. So even if you use something like a +02:00 fixed-offset time zone that matches what you're currently using, it came as a surprise to a lot of people that the schedule can be one hour off, because they haven't taken the DST issue into consideration. I remember this one person who complained that their schedule was one hour off, and when I inquired further, they told me that they were using Eastern Standard Time as the time zone, because they were actually somewhere on the eastern seaboard. They should have used the America/New_York time zone, because that will take care of DST when you reach the cutoff date, but if you use Eastern Standard Time, then it keeps the standard fixed offset, which is of course wrong at some point. Time handling is really a big can of worms.
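The one-hour surprise in that anecdote can be reproduced with a few lines. This sketch uses the standard library's `zoneinfo` module (Python 3.9+, with time zone data available on the system) rather than pytz, purely so it has no third-party dependency; that substitution and the helper name `utc_offset_hours` are assumptions, not what APScheduler itself uses.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# A fixed offset labelled "EST" never changes, while the named zone
# follows the daylight saving rules for the US east coast.
FIXED_EST = timezone(timedelta(hours=-5), "EST")
NEW_YORK = ZoneInfo("America/New_York")


def utc_offset_hours(tz, year, month, day):
    """UTC offset, in hours, at local noon on the given date."""
    local_noon = datetime(year, month, day, 12, 0, tzinfo=tz)
    return local_noon.utcoffset() / timedelta(hours=1)


winter = (utc_offset_hours(FIXED_EST, 2020, 1, 15),
          utc_offset_hours(NEW_YORK, 2020, 1, 15))
summer = (utc_offset_hours(FIXED_EST, 2020, 7, 15),
          utc_offset_hours(NEW_YORK, 2020, 7, 15))

print(winter)  # (-5.0, -5.0): both agree in January
print(summer)  # (-5.0, -4.0): the fixed offset is an hour off in July
```

A job scheduled for 09:00 against the fixed offset would therefore fire at 10:00 wall-clock time in New York during the summer.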
Absolutely. And so in terms of the ways that you've used APScheduler, you said that when you were first building it, it was for an ERP system. I'm assuming at this point that you've moved on to other projects. Yeah. And I'm curious how much of a role APScheduler has played in some of the other projects that you've been involved with since then?
Well, I have to say that I don't really use it in production myself. But I anticipate that I will be using the next major release in production, because I do have a need for a scheduler, at least for rescheduling of projects. But for simple use cases just a sleep loop is enough. In fact, a lot of people have asked about APScheduler for their use cases, and when I asked, they just needed to do something on a fixed period. And I've told them that if they don't need any sophisticated event listening mechanism or error handling or whatnot, they can just get by with a sleep loop.
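For reference, a sleep loop of the kind mentioned here might look like the following minimal sketch. The function name `run_every` and its parameters are hypothetical; the `max_runs` argument exists only so the example terminates.

```python
import time


def run_every(interval_seconds, task, max_runs=None):
    """Call task() on a fixed period using nothing but a sleep loop.

    The next deadline is computed from the previous deadline rather than
    from "now", so time spent inside task() does not accumulate as drift.
    """
    next_run = time.monotonic()
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        next_run += interval_seconds
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return runs


calls = []
completed = run_every(0.01, lambda: calls.append(time.monotonic()), max_runs=3)
print(completed)  # 3
```

This has no error handling, persistence, or missed-run logic, which is roughly the point at which a scheduler library starts to pay for itself.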
And has the introduction of asyncio as a core primitive in the Python runtime brought about more interest in APScheduler, because of the fact that there is a greater possibility of actually having the asynchronous offline tasks co-located with the primary runtime? And has that brought people into using APScheduler within those types of projects?
I wouldn't say that asyncio has had much effect on that, but asyncio support has been a weak point of APScheduler, one that is going to be rectified with the next major release. Async support was pretty much an afterthought in APScheduler's last design, and that is going to totally change with the next major release. I'll talk about that later. But so far, the asyncio support is not nearly as good as I would want it to be. So I've actually refrained from using it myself.
And so
Well, I don't want to leave people hanging, and it has been a great learning experience for me. I've learned a lot about modularity. But it's a very popular project. I think it's somewhere in maybe the top 50, or at least the top 100, of PyPI's most downloaded projects, so it would be a shame to just leave it without a maintainer.
Pretty much. Nobody has volunteered to take over, so I'm just continuing with it whenever I have time. Also, I should mention that Omer Katz of the Celery project has expressed interest in incorporating APScheduler into Celery. It's interesting: Celery, from my viewpoint, is pretty much a task queue with some minimal scheduling capabilities, while APScheduler is primarily a scheduler with some minimal task queuing capabilities. And in fact, I approached them years ago, trying to get them interested in incorporating APScheduler into Celery, but they didn't express interest at that point.
Yeah, I've used Celery for a while, and while I was preparing for this conversation, I went and looked through their requirements list to see if APScheduler was used at all, because of the fact that they do have some scheduling capacity with Celery beat. And I was a little surprised to see that they didn't actually leverage that. So I think that could be a good union of projects to improve some of the capabilities of Celery, because I know that beat is one of their weak points.
So I've learned that Celery 5 will also come with some major internal changes, such as better high availability. And to accommodate that they have requested some changes to APScheduler, which I am happy to provide. We can also talk about that at the end.
And so in terms of some of the other uses of the project and some of the community that's built up around it, what are some of the most interesting or innovative or unexpected ways that you've seen it used, and how has the overall community reception and growth been for you as the maintainer of the project?
Well, it has been very encouraging to hear about all of these different use cases. People don't usually give me the details of their projects, because they are often closed source commercial projects. But I learn bits and pieces here and there. I think the first interesting point that was brought up was when somebody told me they were scheduling thousands upon thousands of jobs at a time. This prompted the optimizations for APScheduler 3. Also, I remember this one use case where they were trying to set up a system that would allow different users to dynamically schedule jobs for themselves, but keep different users separate. I'm not sure if that project went anywhere, because APScheduler doesn't really accommodate that use case. I'd really like to support that, but it's a very difficult design issue. Apart from that, I don't really recall any specific projects or use cases that might have been notable.
And as you have grown and maintained the project, and as people have used it for their own particular projects, what are some of the features or capabilities that have been requested, which you've consciously left out or pushed into other projects that wrap or take advantage of APScheduler or integrate with it?
The fact is, I don't remember. There have been a few requests I've had to turn down, but I can't recall the specifics, and I couldn't easily find them on the bug tracker. I try to be as accommodating as possible, but some of these features were very specific to their use cases. I wanted to keep this maintainable, and with so many vastly different back ends it would be very difficult to support these use cases. I'm sorry, I can't be any more specific than that.
That's fine. For such a long running project, it's entirely expected that different details will fade into the background. And it's impressive that you've kept it going as long as you have. And in terms of some of the design strategies or features that might often be overlooked or underappreciated, what are some of the things that you think users should be more aware of or would be able to benefit from digging deeper into?
I think that might be the use of setuptools entry points. You see, APScheduler uses setuptools entry points declared for the different triggers, job stores, and executors. That makes APScheduler extensible in such a manner that you can declare these entry points in your own project, and you can then use add_job, add_jobstore and so forth with the names of your custom triggers or job stores directly. APScheduler will just look up the entry point dynamically, and then it will just work. Also, a lot of people seem to have very specific requirements for triggers. Most of these use cases can be handled with the new combining triggers, but in other cases I have told people to write their own trigger classes. And for some very specific use cases, I still recommend that.
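The lookup side of that mechanism can be sketched with the standard library's `importlib.metadata`. This is only an illustration of how named entry points are discovered, not APScheduler's own lookup code; `find_plugins` is a hypothetical helper, and the group name `apscheduler.triggers` is assumed to be the one APScheduler 3.x declares for its triggers.

```python
from importlib.metadata import entry_points


def find_plugins(group):
    """Map entry point names to entry point objects for one group.

    Newer Python versions expose select(); older ones return a plain
    dict of group name to entry points, so both shapes are handled.
    """
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        selected = all_entry_points.select(group=group)
    else:
        selected = all_entry_points.get(group, [])
    return {ep.name: ep for ep in selected}


# Assumed group name; the result is empty if APScheduler is not installed.
triggers = find_plugins("apscheduler.triggers")
for name, entry_point in sorted(triggers.items()):
    print(name, "->", entry_point.value)

# Calling entry_point.load() would import and return the class itself.
```

A package that declares its own entry in the same group would show up in this mapping after installation, which is what lets a custom trigger be referred to by its short name.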
In such cases, you don't really use triggers, because triggers are solely used for calculating the next fire time. So if you want to build a reactive application that will schedule tasks for execution right now, then you don't really want to use triggers for that. You can just use APScheduler as a task queue. But you have to build that reacting mechanism some other way.
And in terms of your experience of building and growing the project and the community around it, what have you found to be some of the most useful or interesting lessons that you've learned in the process?
I think mostly modularity and API design. API design is a very complex issue, and very difficult, because you have to anticipate a lot of the use cases. That's the most difficult part in this. Also, I'd like to mention this one issue where a particularly angry user came to me complaining that all of their jobs had disappeared. This was partly because of misuse by the user, but also a design problem with APScheduler. They had scheduled a number of jobs, and then they were using APScheduler from a totally different process that didn't have the same code base, and were trying to get the list of jobs. Now, you see, when APScheduler is trying to deserialize the jobs from a persistent data store, it tries to look up the job function. And if the job function isn't found, the job is discarded. It was built this way so as to get rid of the jobs that are no longer relevant. But in this case, well, it was quite undesirable and this user was very cross with me. Well, I don't blame them that much. Of course, they were misusing the scheduler, but I don't blame them for not knowing what would happen. It's regrettable, and it's something I plan to fix in the next major release.
The number one feature for APScheduler 4 will be high availability: finally, the ability to share job stores between different schedulers. So you can have this web app and a scheduler running in it, and even when using multiple workers the schedulers will know how to coordinate. Basically it means that when the scheduler looks for the next schedules to run, it will obtain a lock on the schedules, and then other schedulers will know not to try to process them. So it's a partial redesign; a major redesign, I would say, that considers high availability from the ground up. That's the biggest feature. As for the rest, I intend to fix some of these issues with the triggers. Right now the plan is to make all the triggers stateful. One of the biggest reasons for that is the combining triggers, because the way they work right now is a bit difficult: when you have multiple triggers inside an AND trigger, it doesn't keep track of what kind of run times they have produced. So if it can
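One way the "obtain a lock on the schedules" idea could work is a claim column in a shared store. The sketch below uses an in-memory SQLite database from the standard library; it is an illustration under stated assumptions, not how APScheduler implements coordination, and the table, column, and function names are all hypothetical.

```python
import sqlite3


def make_store():
    """Create a shared schedule table (in memory here, for the example)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedules ("
        "id TEXT PRIMARY KEY, next_fire REAL, claimed_by TEXT)"
    )
    return conn


def claim_due_schedules(conn, scheduler_id, now):
    """Atomically claim every due, unclaimed schedule for one scheduler.

    The UPDATE only touches rows nobody has claimed yet, so a second
    scheduler running the same statement afterwards gets none of them.
    Returns the ids of all schedules currently held by scheduler_id.
    """
    with conn:  # one transaction; committed on success
        conn.execute(
            "UPDATE schedules SET claimed_by = ? "
            "WHERE claimed_by IS NULL AND next_fire <= ?",
            (scheduler_id, now),
        )
    rows = conn.execute(
        "SELECT id FROM schedules WHERE claimed_by = ? ORDER BY id",
        (scheduler_id,),
    ).fetchall()
    return [row[0] for row in rows]


store = make_store()
store.executemany(
    "INSERT INTO schedules VALUES (?, ?, NULL)",
    [("a", 10.0), ("b", 20.0), ("c", 99.0)],
)
first = claim_due_schedules(store, "scheduler-1", now=50.0)
second = claim_due_schedules(store, "scheduler-2", now=50.0)
print(first)   # ['a', 'b']
print(second)  # []: already claimed by scheduler-1
```

A real implementation would also need to release or expire claims when a scheduler dies, which this sketch leaves out.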
and that's something I intend to fix for the next release, although there will be a threshold within which the fire times have to fit. So they don't need to be exact matches anymore. There was also one design issue with the cron trigger: I think the weekday numbers didn't really match the specification, but I couldn't fix that in 3.x because it would have caused unexpected behavior for people who have already worked around this problem. And there will be better async support across the board. All the job stores will be asynchronous, and a blocking scheduler will be provided, but behind the scenes it will use async.
And in terms of the async capabilities, you mentioned that there is some support for it, but it's not as fleshed out as you would like. And I know that in the past conversation that we had about the Asphalt framework, I think almost three years ago at this point, you were using APScheduler, and Asphalt is primarily based around asynchronous networking. And so I'm curious how much of the design and integration of APScheduler with the async runtime was driven by your use case of building the Asphalt framework and using APScheduler within it.
Not much, really. These projects are very separate. I want to have some integration at some point, but as far as async support goes in APScheduler, I didn't actually plan to go that deeply into async until I was requested to do so by Omer Katz of Celery, because they want to jump with both feet into async. So I figured I could do that too, and just provide a synchronous interface for people who don't want to use async directly. This new async support will be centered around my other project called AnyIO, which is a common API for doing async I/O with asyncio, Curio, and Trio; you can use any one of these three as a back end. AnyIO still doesn't support Twisted because of some architectural issues with Twisted, but I'm hoping that is an issue we can overcome at some point with the Twisted people. So I don't know how to handle Twisted support. I think that version 4 will need to have a lot of pre-releases to sort out the design issues, because it's going to break quite a lot of the interface, and that is quite necessary. Some of the issues with the design are such that you can't really fix them without breaking stuff.
Oh, I can't think of anything. Oh, yeah, there was one thing. There's a long-standing issue with using entry points: projects like py2exe and PyInstaller have a problem with entry points, because they don't package the library metadata, the dist-info directories, which breaks the entry points. And that has been a problem. I'm looking for an alternate solution, maybe giving up entry points completely. But it's been an issue that a lot of people have encountered.
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move into the picks. This week, I'm going to choose the Data Exchange podcast, which I found recently and which is hosted by Ben Lorica, who was formerly of O'Reilly. So he's moving the work that he had been doing with the O'Reilly Data podcast to his own independent one. So far it's been enjoyable, so if you're looking for a new podcast to listen to, I recommend taking a look at that one. And so with that, I'll pass it to you, Alex. Do you have any picks this week?
All right, I'll definitely take a look at that one. So thank you for taking the time today to join me and share your experience of building the APScheduler project. It's one that I've taken advantage of in the past, and so I appreciate all of your time and effort on that. And I hope you enjoy the rest of your day.
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.