Extending The Life Of Python 2 Projects With Tauthon - Episode 265

Summary

The divide between Python 2 and 3 lasted a long time, and in recent years all of the new features were added to version 3. To help bridge the gap and extend the viability of version 2, Naftali Harris created Tauthon, a fork of Python 2 that backports features from Python 3. In this episode he explains his motivation for creating it, the process of maintaining it and backporting features, and the ways that it is being used by developers who are unable to make the leap. This was an interesting look at how things might have been if the elusive Python 2.8 had been created as a more gentle transition.

Machine learning is finding its way into every aspect of software engineering, making understanding it critical to future success. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype.

Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.


Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $60 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • You listen to this show because you love Python and want to keep your skills up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
  • Your host as usual is Tobias Macey, and today I’m interviewing Naftali Harris about his work on Tauthon, a fork of Python 2 that backports features from Python 3.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Tauthon is and your motivations for creating it?
    • What’s the story behind the name?
  • What types of applications and environments are you using Tauthon in?
  • How much adoption of Tauthon have you seen?
    • What are some of the different ways that your users are employing it?
  • Is this the missing "2.8" release? In other words, is this intended to be a bridge for simplifying the migration of existing Python 2 code to Python 3, or as an extended support window for Python 2?
  • What features have you backported from Python 3?
    • What is your process for identifying and prioritizing features to bring into Tauthon?
  • What is your workflow for implementing the backported functionality in Tauthon?
  • What are some of the cases where you have had to compromise on the functionality or syntax of a feature that you have backported in order to fit into Python 2?
    • What is your governing philosophy for how to manage syntax or behavior differences between Python 2 and 3?
    • What have been the most challenging features to backport and maintain?
    • What are some of the ways that Tauthon might break existing Python 2 code?
  • What is the story for compatibility with libraries that are Python 3 only?
  • What have you seen in terms of adoption of Tauthon?
    • Do you have any sense of the commonalities among those users?
  • What are some of the ecosystem challenges that face users of Tauthon? (e.g. Pip support, package compatibility, etc.)
  • What are some of the most interesting, unexpected, or challenging lessons that you have learned in the process of creating and maintaining Tauthon?
  • What are your long-term plans for Tauthon, and how have they changed since you first started working on it?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Raw transcript
Tobias Macey
0:00:12
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 gigabit private networking, node balancers, a 40 gigabit public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models or running your CI and CD pipelines, they've got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. You listen to this show because you love Python and want to keep yourself up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a machine learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You'll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don't have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there's no obligation. Go to pythonpodcast.com/springboard and apply today, and make sure to use the code AISPRINGBOARD when you enroll. Your host as usual is Tobias Macey, and today I'm interviewing Naftali Harris about his work on Tauthon, a fork of Python 2 that backports features from Python 3. So Naftali, can you start by introducing yourself?
Naftali Harris
0:02:07
Hi, everybody. I'm Naftali Harris, super excited to be on the show today. I've been writing code in Python for 10 or 12 years, and I'm super excited to tell you about Tauthon.
Tobias Macey
And do you remember how you first got introduced to Python?
Naftali Harris
I do, actually. It was the first real programming language that I learned, after getting my start programming TI calculators. I learned it in high school, in ninth or tenth grade. The real motivation was that I wanted to write a chess engine, so I heard about this new Python programming language and picked it up.
Tobias Macey
0:02:39
And so now you have been using it for a while, and you ended up creating this fork of Python 2 in the form of Tauthon. I'm wondering if you can start by giving a bit of a description of what it is and what your motivation was for creating it.
Naftali Harris
0:02:53
Yes, so Tauthon is a fork of Python 2.7 that is totally backwards compatible with Python 2, but nonetheless includes a lot of the exciting new features that people move to Python 3 for. Essentially, you can take your Python 2 code, run it exactly as it is, and start using some of the exciting new features from Python 3, such as function annotations, the matrix multiplication operator, argument-less super, async/await, stuff like that.
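As a rough illustration of the features listed in this answer, the sketch below shows function annotations, the matrix multiplication operator, argument-less super() and async/await as they are written in Python 3. The class and function names are made up for this example, and it has only been checked against Python 3 syntax, not run under Tauthon itself.

```python
import asyncio


class Vec:
    """Tiny 2-D vector, a hypothetical class used only to show the @ operator."""

    def __init__(self, x: float, y: float) -> None:  # function annotations
        self.x = x
        self.y = y

    def __matmul__(self, other: "Vec") -> float:
        # a @ b calls a.__matmul__(b); here it computes a dot product
        return self.x * other.x + self.y * other.y


class UnitVec(Vec):
    def __init__(self) -> None:
        super().__init__(1.0, 0.0)  # argument-less super()


async def double(n: int) -> int:
    await asyncio.sleep(0)  # hand control back to the event loop once
    return 2 * n


def run_demo():
    dot = Vec(3.0, 4.0) @ UnitVec()
    doubled = asyncio.run(double(21))
    return dot, doubled
```

Calling run_demo() exercises all four features in one go; the annotations are plain metadata and do not change how the functions run.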
Tobias Macey
0:03:20
And I'm assuming that the name is a bit of a joke about tau being twice pi, but I'm wondering if you can give a bit more of the story behind how you selected it.
Naftali Harris
0:03:29
Yeah, well, I actually have to give credit to Nick Coghlan, who, if I recall correctly, is the one that suggested it as the name. But that's exactly right. The name comes from tau, which is two times pi. And I think there's a lot of interesting stuff in there. If you look at the actual digits of tau, which equals two pi, you have 6.28 in there. And so the 2.8 is maybe a little bit of a tongue-in-cheek joke: if there were to be a Python version 2.8, this would probably be the one.
Tobias Macey
0:03:58
And so in terms of the use cases and the ways that you're using Tauthon yourself or seeing it used by others, what are some of the main environments or types of applications that it's being employed with?
Naftali Harris
0:04:11
Well, I run it on my personal laptop. Over the years I've written a lot of different code, mostly targeting 2.7, and so I use it on my laptop to run all my old scripts without having to upgrade them all.
Tobias Macey
0:04:23
And obviously, the main appeal of it is that you can keep using your Python 2 code past the end of life and, as you said, take advantage of some of the new features. But what is some of the feedback you've gotten from other people who are adopting Tauthon?
Naftali Harris
0:04:39
I think people think it's pretty cool. I think the concept is appealing to a lot of different people that don't necessarily feel the need to upgrade their code to Python 3. The other thing is that it's sort of a proof of concept of an alternative future that we could have lived in. We don't right now, but I think it's a pretty convincing demonstration that, had we wanted to, we could have shipped a lot of the new features that people are excited about in Python 3 on top of Python 2 in a backwards compatible way. The main exception, of course, being the Unicode defaults in Python 3.
Tobias Macey
0:05:10
And when Python 3 was still in the early stages, and Python 2 still had a decent amount of life left in it, there was a lot of discussion and furor over the idea of a Python 2.8 that would be the missing bridge between Python 2 and Python 3. Instead, Python 2.7 was the end of the line, and that caused a lot of people to go through fairly painful upgrade cycles to make their code either work in both Python 2 and 3 or jump directly to 3. So is Tauthon in some ways the spiritual 2.8 release that never happened?
Naftali Harris
0:05:46
Well, I certainly cannot call it Python 2.8. But I wouldn't even view it as a bridge; I would view it as really an alternative. If you think about the Python community over the last 10 years, the main push, where all of the political capital of Python has been poured, has been moving from Python 2 to Python 3. And, to be honest, I think that's largely been successful. For the last year or two, almost everything new is on Python 3, and finally some of the legacy projects are migrating. But I don't think that was the best use of everyone's time and effort. Instead of making everybody upgrade their code to Python 3 and spending years and years doing that, I think we might have been better served by pursuing a different path, which was keeping the interpreter backwards compatible but nonetheless giving everybody new features that they could use. Really, it's a different way that the community could have evolved. I don't think we saw that happen, in all candor, but I think Tauthon would have been a different way that we could have gone.
Tobias Macey
0:06:56
And so now that the Python 2 support window is over and it's at end of life, there are people who are discussing what that means for those who still have Python 2 code bases and don't have the time or intent to port them to Python 3. There are discussions of commercial companies providing long-term support releases and security patches for the Python 2 interpreters that people are still running. I'm wondering what your thoughts are on the viability of Tauthon as an option for people who want to keep using Python 2 and keep getting security fixes without going through the effort of porting.
Naftali Harris
0:07:35
Yeah, I mean, I would say that if somebody wants to take Tauthon and run with it that way, I'd be super supportive. I'm not working on it full time right now. I'm co-founder and CEO of a startup, which takes up the vast majority of my time, as opposed to maintaining Tauthon. There is a group of maintainers that have stepped up to the plate to work on Tauthon, which I super appreciate. But I do think that for organizations that want to continue running their Python 2 code and have that continue to work, Tauthon could be a viable option.
Tobias Macey
0:08:07
And in terms of the overall feature set of Tauthon as it compares to Python 2.7, what are some of the main capabilities that you've backported from Python 3? And what is your process for determining which features to bring back and prioritizing the ordering of them, given that you don't have a full-time investment?
Naftali Harris
0:08:26
Honestly, the things that we prioritize backporting are the ones that I think are the coolest features in Python 3. Some of the things that are included are function annotations, which I think is a really exciting idea, particularly for organizations that started by writing small code bases and then, as they grew bigger, discovered that dynamic typing can lead to type errors in production. So I think function annotations, and mypy, which is associated with them, were a really exciting addition to Python 3 and something that we've backported into Tauthon. Keyword-only arguments is another exciting one. I'm really excited about async and await, which I think is one of the coolest new things you can do in Python 3, and which we've backported to Tauthon as well. There are also some new convenience things, such as argument-less super or the new metaclass syntax that's available in Python 3. Personally, I think my pet favorite is actually the underscores in numeric literals, which is really nice. From a writing perspective, if you write 10 million, it's hard to tell if that's 1 million or 100 million unless there are underscores, and so I find that really helpful. There are a lot of different quality-of-life improvements: the matrix multiplication operator, yield from, the nonlocal keyword, a whole bunch of stuff like that.

Essentially, the idea is this: if you read the articles about why to upgrade to Python 3, typically they'll say two things. Number one, there are a lot of cool new features in Python 3, and number two, we cleaned up a lot of the mistakes we made in Python 2. Tauthon tries to do everything in category one and, of course, can't do any of the things in category two. So really, in terms of prioritizing which things to backport, it has been the coolest features in Python 3, the ones that people are most excited about: let's backport them into Tauthon.
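A few of the smaller quality-of-life features from this answer can be shown in a handful of lines. This is a sketch in Python 3 syntax with invented names (`transfer`, `inner`, `outer`, `TEN_MILLION`); it illustrates the language features themselves and makes no claim about how Tauthon implements them.

```python
def transfer(amount, *, currency="USD"):
    """Everything after the bare * is keyword-only and must be passed by name."""
    return "{:,} {}".format(amount, currency)


def inner():
    yield 1
    yield 2


def outer():
    yield 0
    yield from inner()  # delegate to another generator
    yield 3


# Underscores in numeric literals are purely visual separators
TEN_MILLION = 10_000_000
```

With the underscores in place it is easy to see at a glance that the constant is ten million rather than one or one hundred million, which is the readability point made above.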
Tobias Macey
0:10:08
In terms of the actual effort of backporting those capabilities, particularly as Python 3 marches forward and continues to evolve, what have been some of the most challenging aspects of keeping that functionality running in Python 2.7? And how much has that difficulty changed as Python 3 continues to add new capabilities, particularly with the upcoming 3.9, where they're introducing an entirely new parser?
Naftali Harris
0:10:37
Yeah, I mean, I would say that, historically, it's actually been surprisingly easy. The way that I did this was by looking for the pull requests in Python 3 that added the new functionality, looking at each of the different commits that did that, and then very carefully, by hand, applying those commits onto the Python 2 codebase. So I really can't take very much credit at all for the code, because I'm genuinely just taking work that the core Python developer group has done and carefully applying the changes they made to Python 2. It's not as easy as literally doing git diff and piping it to git apply; you have to actually do it by hand. And so I wrote all the code by hand, but I had a really good starting point in the changes that the core developers had already made. I will say that the core developers are an incredibly talented group. I look up to them all. If any of you are listening to this, thank you very much for the work that you do. It's really incredible.
Tobias Macey
0:11:32
And did you have much of a background, prior to working on Tauthon, in digging into the CPython code base and the interpreter, or any other related work?
Naftali Harris
0:11:42
I've done some different C extensions for Python before. I actually really love the C programming language; I think it's pretty incredible. Probably the most relevant thing that I've done is a C extension called lazysorted, which works just like the sorted function in Python, except that instead of actually sorting the list, it returns an object that is logically, but not physically, sorted. So it will sort the list lazily. For example, if you just wanted the median of a list, you could sort the whole list, which takes n log n time, and then pick off the middle element. But there are algorithms that will give you the median in linear time, as opposed to n log n time. So when you call lazysorted on something, it just takes the list, copies it, and tells you that it's sorted. It's not actually sorted, but when you request the median element, it will sort the list just enough to figure out what that is, and do that in linear time. That was probably the most relevant thing I'd done prior, where I wasn't hacking on the Python interpreter itself, but I was working with the C code.
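The linear-time idea described here can be sketched with quickselect, which partitions around a pivot and then only keeps the side that contains the wanted position. This is a plain-Python illustration of the technique, not the lazysorted extension's code or API; the function names `select` and `median_low` are invented for this sketch, and quickselect is linear on average rather than in the worst case.

```python
import random


def select(items, k):
    """Return the element that would sit at index k if items were sorted."""
    pool = list(items)  # work on a copy; the caller's list is left untouched
    if not 0 <= k < len(pool):
        raise IndexError("k is out of range")
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k < len(smaller):
            pool = smaller  # the answer is among the smaller elements
        elif k < len(smaller) + equal_count:
            return pivot  # position k falls inside the run of pivots
        else:
            k -= len(smaller) + equal_count
            pool = [x for x in pool if x > pivot]


def median_low(items):
    """Lower median, found without fully sorting the input."""
    return select(items, (len(items) - 1) // 2)
```

Each pass discards at least the pivot, so the loop always terminates, and on average the pool shrinks by a constant fraction per pass, which is where the expected linear running time comes from.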
Tobias Macey
0:12:55
And as far as bringing in these new capabilities, what are some of the things that you've had to research more, or gain a more foundational understanding of, before you could comfortably and carefully bring in those capabilities? And what are some of the things that you've learned in the process of digging through the interpreter and bringing in that functionality?
Naftali Harris
0:13:16
Well, I started really basic. If I recall correctly, I started with the underscores in numeric literals, or maybe with matrix multiplication. Those are things that are a little bit walled off from some of the more complicated changes like async and await, and I learned the process of adding new things to the interpreter by starting a little bit smaller. Since working on this project, I've learned a lot more about how the interpreter works overall. As I mentioned, I already had a lot of respect for the core developer team to start with, but I've gained even more; they are really doing some incredible work.
Tobias Macey
0:13:53
In terms of the overall capabilities of the Python 3 functionality, there are some instances where it's going to conflict with Python 2, either because of clashes in potential keyword usage or because the underlying functionality of Python 2 isn't exactly what the feature in Python 3 was built upon. So what are some of the cases where you've had to compromise on either the syntax or the feature set of one of the Python 3 capabilities that you're bringing back into Python 2?
Naftali Harris
0:14:25
Yeah, I can give you a good example here. One of the cool things from Python 3 is finer-grained operating system errors. For example, in Python 2, if you try to open a file that doesn't exist, you'll get an IOError, and you have to check the errno on it to figure out that the actual operating system error is that the file doesn't exist. In Python 3, it just throws a FileNotFoundError, which is a lot easier to work with, a lot more convenient, and a lot more semantically correct, I would argue. In Tauthon, we want to be able to use the same fine-grained OSErrors, but in a way that's non-breaking with Python 2. So what we did is introduce a new class of errors that you can catch, without actually throwing them the same way that they're thrown in Python 3. For example, in Python 3, if you open a file that doesn't exist, Python will throw a FileNotFoundError, and you can then catch the FileNotFoundError. In Tauthon, if you try to open a file that doesn't exist, it'll throw an IOError, but you can actually catch it with FileNotFoundError. So there are compromises like that, which maintain some of the old functionality of Python 2 but allow you to use some of the new features from 3.
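The two styles contrasted in this answer look roughly like this when written for Python 3. The helper names and the file path are hypothetical. The Tauthon-specific behavior (an IOError that can also be caught as FileNotFoundError) is taken from the interview and is not exercised by this sketch.

```python
import errno


def read_or_none_errno_style(path):
    """Python 2 idiom: catch the broad error, then inspect errno."""
    try:
        with open(path) as f:
            return f.read()
    except IOError as exc:  # in Python 3, IOError is an alias of OSError
        if exc.errno == errno.ENOENT:  # "No such file or directory"
            return None
        raise  # some other I/O problem; let the caller see it


def read_or_none_subclass_style(path):
    """Python 3 idiom: catch the specific exception class directly."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None


# A path assumed not to exist, used only for demonstration
MISSING = "no-such-dir-tauthon-demo/missing.txt"
```

Both helpers return None for MISSING; the second one simply says what it means without the errno lookup.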
Tobias Macey
0:15:35
And in those cases where there is a potential conflict with how Python 2 operates, what has been your governing philosophy for managing changes in syntax or behavior between how a feature is represented in Python 3 and how it is brought into Tauthon?
Naftali Harris
0:15:51
The core idea is to keep the code backwards compatible, so everything that's done is done with backwards compatibility in mind. The exceptions to backwards compatibility in Tauthon are incredibly pedantic. For example, if you literally check sys.version and you depend on it being literally 2.7, then obviously your code is going to break, because we changed the system version. Or if you depend on the abstract syntax tree, well, we changed that, so obviously that's not going to work either. Or if you depend on not being able to use async and await, and you expect an error to be thrown when you try using code like that, then, since we introduced those new keywords, it's technically not backwards compatible, but in the most pedantic way possible; you can't literally add features without making changes like that. But everything else will work just as it is: the entire 2.7 test suite will pass, and the only places where it wouldn't are, again, where the syntax tree has changed or stuff of that sort.
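To make the sys.version caveat concrete, here is a sketch of the kind of check that is fragile under any interpreter reporting a different version string, next to two sturdier alternatives. The function names are invented for illustration, and the exact version string Tauthon reports is not stated in the interview, so none is assumed.

```python
import sys


def is_exactly_27_fragile():
    # Breaks as soon as the interpreter reports anything other than "2.7..."
    return sys.version.startswith("2.7")


def is_at_least(major, minor):
    # Tuple comparison keeps working when the reported version moves on
    return sys.version_info[:2] >= (major, minor)


def supports_numeric_underscores():
    # Feature detection: ask the interpreter rather than infer from a version
    try:
        compile("1_000", "<feature-check>", "eval")
    except SyntaxError:
        return False
    return True
```

Feature detection is the most forgiving of the three, since it does not care which interpreter or version supplies the feature.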
Tobias Macey
0:16:54
In bringing the features from Python 3 into Tauthon, are you also backporting the test cases, so that you can continue to have comfort in the forward compatibility of Tauthon as you bring in more changes and make sure that everything's running as expected?
Naftali Harris
0:17:08
Yeah, of course. If you look at the test suite for Tauthon, the bulk of it is stuff from 2.7. Then there's another class of tests that come from Python 3, to test that the functionality actually works as it's supposed to. And then there's a third class of tests, which test the parts that are specific to Tauthon. For example, the nonlocal keyword, which is present in Python 3, can also be used in Tauthon, but it's not technically a keyword. The reason is that we want to be backwards compatible with people that use nonlocal as a variable name. So in Tauthon, you can use nonlocal both as a variable name and to designate that a particular variable is nonlocal. The tests therefore include both the new tests about using nonlocal that are present in Python 3 and tests showing that you can still use nonlocal as an actual variable.
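In Python 3 syntax, the statement form of nonlocal looks like the sketch below (`make_counter` is a made-up name). In Python 3 nonlocal is a reserved word, so the variable-name usage mentioned for Tauthon would be a syntax error there and is not shown.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable in the enclosing function
        count += 1
        return count

    return increment
```

Without the nonlocal line, the assignment inside increment would create a new local variable and the `count += 1` would fail with UnboundLocalError.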
Tobias Macey
0:17:59
Are there any cases where you have either been tempted to implement, or have actually implemented, new functionality that's unique to Tauthon because of its continued support for the Python 2 ecosystem? Or is that something that you have consciously decided not to accept, either as a new feature set or as requests from other people?
Naftali Harris
0:18:21
No, we've been very clear. The mission of Tauthon is backporting stuff from Python 3 while maintaining backwards compatibility with Python 2. So there's no functionality in it that you wouldn't find in Python 3. We're not doing anything like removing the GIL. That would be on my list personally, but I realize that's incredibly challenging. But no, we're very focused.
Tobias Macey
0:18:48
And as far as the maintenance of the features that you're backporting and the existing features, have there been any cases where bug fixes have been difficult to bring back, or where new errors were reported because of conflicts between the Python 3 functionality and how it manifests in the Python 2 code base?
Naftali Harris
0:19:09
Bringing back the bug fixes has been relatively straightforward, especially for some of the older features that were implemented earlier on. If you look at the commit history for any of that new functionality in Python 3, you'll typically find that a new feature is released, then people discover bugs as it goes into the wild, and then bug fixes happen afterwards. Again, credit to the core developer team: often these bugs are incredibly pedantic, but they're fixed nonetheless, to make it as close to perfect as possible. So in the process of backporting stuff to Tauthon, I would backport both the initial feature and those bug fixes over time. In some cases you have to be a little bit careful in terms of, again, maintaining that backwards compatibility, but I was able to do that without too much difficulty.
Tobias Macey
0:20:02
And have most of the features you brought back been core to the actual language and the interpreter, or have you also been doing a fair bit of copying from the standard library to bring in some of the new capabilities, like the IPv6 support and things like that?
Naftali Harris
0:20:18
The approach that I took was to start with the core language itself and then, after that, do the libraries. The main reason for that, frankly, is that the libraries are written in Python 3. If you start by backporting the libraries, you have to take a lot of code that is oftentimes using new functionality from Python 3 and remove that functionality to make it work in Tauthon. I didn't want to do that. So instead, we first worked on the core language, and after that started backporting some of the different standard library features.
Tobias Macey
0:20:51
And in terms of compatibility with libraries that rely on Python 3 functionality, what is the story in Tauthon for being able to use libraries that might be Python 3 only, particularly things that have recently moved to being Python 3 only, such as Django or NumPy?
Naftali Harris
0:21:08
The support is actually relatively solid, I would say. Python 3 has some new things that Python 2 doesn't, obviously, and Tauthon was designed to run Python 2 code as opposed to Python 3 code. Nonetheless, most code that's written for Python 3, depending on which Python 3 it is, will run in Tauthon unless you're doing something that's pretty Python 3 specific, and if you are, it'll run with a couple of reasonable changes. But again, the focus is on the legacy code that's in Python 2.
Tobias Macey
0:21:44
And are there any other elements of the surrounding Python ecosystem that have been challenging to make work with Tauthon, maybe things like pip, or some of the test capabilities or CI services that people rely on to verify their own code?
Naftali Harris
0:22:03
Well, probably one of the biggest challenges has been the distribution of Tauthon. Right now, you basically have to clone it from GitHub, build it yourself, and install it. That's a challenge, as opposed to installing it on Debian, or with Homebrew, or with your package manager of choice. So I would say that's been a challenge for sure.
Tobias Macey
0:22:26
And as far as the overall adoption, you mentioned that you have had some people who have stepped in to help with maintenance, and you've got a decent body of people who are using it for their own work. What are some of the commonalities that you've seen among the people who have adopted it, whether it's shared industries, or commonalities in their backgrounds or regions, or anything like that?
Naftali Harris
0:22:50
I would say it's a different kind of personality type, perhaps, or a different sort of focus. For example, my company works in financial services, where in general we're very careful and try hard to keep things working. On the other side, there's the "move fast and break things" model. I think developers oftentimes fall somewhere on that spectrum, where on one side it's "we'll move slowly, deliberately, and carefully," and on the other side it's "we will change things rapidly and always be living in the future," and that's where you have a different JavaScript framework every month. I think some of the folks that are excited about Tauthon are a little bit closer to that first side of the spectrum.
Tobias Macey
0:23:32
And in terms of your vision for Tauthon and your plans for it going into the future, especially now that Python 2 is officially unsupported, what are your thoughts on its long-term viability, or the overall time horizon over which you plan to keep working on it and keep bringing back features from Python 3?
Naftali Harris
0:23:50
Yeah, so I personally started this project but unfortunately don't really have the time to be supporting it full time. That's why there's a group of contributors who have stepped up and are maintaining Tauthon, whom I really appreciate, including Stefan, whom I super appreciate for stepping up on this. In terms of the future, for folks who are interested in continuing to run their Python 2 code, and I think there's still a reasonably sized class of them, I would encourage them to either move to Tauthon or move to a distribution that will include security fixes for their code. I do think that the Python 3 migration has largely been successful. My company uses 3.7, I believe, and is running it well, and I do believe that Python 3 is actually a better language than Python 2 or Tauthon in isolation. I just don't agree with the idea that we should spend 10 years getting everyone to migrate to the new thing. I think we could have used those 10 years in a different way. But now that that's largely happened, I think that in the future we can spend some time thinking about things that are hopefully not migration.
Tobias Macey
0:25:02
So a decent amount of the work that you have done with Tauthon has been focused on the visible capabilities of Python 3, in terms of new syntax or things like async, where it offers some new runtime capabilities. How much of your time or effort has been spent on bringing in some of the performance improvements that Python 3 adds, such as better memory usage and improvements in the garbage collector? A lot of people are also pretty excited about the dictionaries that are ordered by default. I'm wondering what your thoughts are on the mostly invisible aspects of Python 3, and on bringing those into the Python 2 code base.
Naftali Harris
0:25:43
Honestly, we've mostly focused on backporting some of the more visible features, because I think those are oftentimes the ones that, rightly or not, get more excitement. So we haven't done as much on the performance improvements and things like that, but in general I'm very excited about those sorts of things. To the extent that some of the code can be sped up or can otherwise use fewer resources overall, that would be a very good thing.
Tobias Macey
0:26:09
And in terms of your experience of starting this project, bringing in these new features, and keeping it up to date with some of the capabilities of Python 3 as it evolves, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
Naftali Harris
0:26:25
Memory management was super hard. I remember at some point there was a memory leak and I couldn't find it. I forget which feature I was backporting, but I literally spent a week digging around for this thing. Memory was leaking, I'd run it in Valgrind, and, you know, there are so many... Maybe one of the core developers would have been able to spot this in an hour or something, but I spent literally a whole week and I thought I was going nuts. Eventually I found it. I could find the reference count increment, and then I saw where it wasn't getting decremented, and that was my leak. But that was incredibly challenging to find. So one of the lessons I learned was, I thought I was being extremely careful, and I think I was being extremely careful when writing it, but after that I was even more careful, especially with the reference counting, just to be extra sure that I wasn't creating some bug that would then take another week for me to actually find and squash. That was actually one of the hardest bugs I've ever dealt with in my life.
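The class of bug described here, a reference count increment with no matching decrement, can be reproduced in miniature from Python itself. The sketch below is not the Tauthon bug (the feature involved is not named); it is an illustration that assumes a CPython-derived interpreter. It calls the C API functions `Py_IncRef` and `Py_DecRef` through `ctypes.pythonapi` and uses a weak reference to observe whether the object is ever freed. `Payload` and `is_freed_after_release` are hypothetical names.

```python
import ctypes
import weakref

# Function forms of the Py_INCREF / Py_DECREF macros from CPython's C API.
incref = ctypes.pythonapi.Py_IncRef
decref = ctypes.pythonapi.Py_DecRef
incref.argtypes = [ctypes.py_object]
decref.argtypes = [ctypes.py_object]
incref.restype = None
decref.restype = None


class Payload(object):
    """Throwaway object whose lifetime we observe."""


def is_freed_after_release(leak):
    """Take an extra reference the way C code would; optionally forget to release it."""
    obj = Payload()
    probe = weakref.ref(obj)   # lets us check liveness without owning a reference
    incref(obj)                # the increment
    if not leak:
        decref(obj)            # the matching decrement
    del obj                    # drop the only Python-level reference
    freed = probe() is None
    if not freed:
        # Clean up so the demonstration itself does not leak.
        survivor = probe()
        decref(survivor)
        del survivor
    return freed


balanced_case = is_freed_after_release(leak=False)
leaky_case = is_freed_after_release(leak=True)
```

With the decrement in place the object is freed as soon as the last name goes away; without it the object stays alive with no Python-level owner, which is why such leaks are hard to spot from the outside.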
Tobias Macey
0:27:30
And in terms of your overall work on Tauthon, and your experience of helping more people continue to use their Python 2 code bases and possibly have a viable bridge that makes it easier for them to incrementally add Python 3 support, what are some of the other things that you have enjoyed from that, or unexpected outcomes that you didn't anticipate at the beginning?
Naftali Harris
0:27:56
I've just really enjoyed the process of hacking on the interpreter, to be honest. I think that the code is incredibly beautiful. I think Python, all of the languages, if you will, are incredibly beautiful. I think there's a ton of really, really deep thought that's gone into every facet of the language, and learning more about that has been, for me, incredibly fulfilling. You can see that every aspect of it was thought over very, very carefully and argued over. Ultimately, I think the decisions that have been made have been pretty solid overall in terms of language design. And just digging deep into how things actually work and how they're implemented has been, for me, incredibly interesting.
Tobias Macey
0:28:37
Are there any new capabilities in recent releases of Python 3, or features that you haven't yet backported, that you're excited to be able to bring into Tauthon, or anything that you are looking forward to in some of the new releases of Python?
Naftali Harris
0:28:53
To be honest, not really. I've been pretty focused in the last couple of months just on my startup, so actually, probably for one of the first times in my life, I haven't been following the most recent releases super closely.
Tobias Macey
0:29:04
Alright, well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move into the picks. This week I'm going to choose, first off, the PyCon 2020 online content. Because we were all forced to stay inside our homes and not travel to PyCon US this year, they have fortunately found a way to at least bring some of PyCon to you, with people recording their talks and putting them up on YouTube, along with poster sessions and some of the elements of the language summit. They've put together a nice website where new content is added every week, for being able to at least get some sense of what's going on in the Python community and learn some new stuff there. So we'll add the link to the show notes. My other pick is a framework called Dagster that I've been using for a lot of ETL work recently, and I've just been enjoying it. It's really well designed and has a lot of great elements for being able to abstract out different portions of the pipeline and the execution context, to make it easy to test the logic in isolation and then get useful metadata out of the pipeline as it's executing. So if you're looking for a way to have some ordered execution of steps and workflow management for your data, it's definitely a great framework for that, and I recommend it. And so with that, I'll pass it to you, Naftali. Do you have any picks this week?
Naftali Harris
0:30:23
Yeah, I would say I'll do two. One, as a startup founder, I don't think I would be able to live with myself if I didn't toot my own horn just a little bit for our startup. It's called SentiLink. We detect a new form of fraud for banks and lenders, and we're hiring engineers. So if you are looking for a job, are super talented, and are interested in fraud and identity, we would love to hire you. I will put the link in the notes for the show. The other pick, I would say, is one piece of software that is just really incredible. It's part of Python, of course, but Timsort, if you haven't already learned about it, is really incredible. It's made its way into other languages as well, such as Java. Wikipedia has a really great description, but even better is the one that comes from Tim Peters himself, which is in the Python code base. So if you're interested in sorting, this is really something special. I mean, it's not necessarily better, but there's a lot of thought that went into it. Compared with the stuff you learned in school about quicksort or merge sort or bubble sort, this is a whole new level. So I would really encourage you to take a look at that. It's really, really impressive.
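One way to get a feel for the thought that went into Timsort is to count comparisons. The sketch below assumes CPython's built-in `sorted`, whose algorithm is Timsort or a close descendant of it, and relies on one property of that design: it detects runs that are already ascending or strictly descending and avoids re-sorting them. `Counted` and `comparisons_to_sort` are hypothetical helper names, and no specific counts are claimed.

```python
import random


class Counted(object):
    """Wraps a value and counts every `<` comparison made while sorting."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def comparisons_to_sort(values):
    """Sort wrapped copies of `values` and report how many comparisons it took."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
ascending = list(range(n))
descending = ascending[::-1]
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

ascending_cost = comparisons_to_sort(ascending)
descending_cost = comparisons_to_sort(descending)
shuffled_cost = comparisons_to_sort(shuffled)
```

On input that is already ordered or exactly reversed, the comparison count should come out far below the shuffled case, which is the behaviour that sets this approach apart from a textbook merge sort.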
Tobias Macey
0:31:31
Yeah, I've definitely been impressed with it as well. And I've heard a lot of references to other communities and academic work being based on the work that's gone into that. So it always makes me proud as a Python user to have that be built into the code base, and to have it be the result of somebody who was early in the community making their contribution to it. All right. Well, thank you very much for taking the time today to join me and discuss your work on Tauthon. It's a very interesting project, and one that definitely gives a lifeline to people who still have a lot of Python 2 code that they want to keep running, either because it works just fine and they don't want to have to move it to Python 3 and potentially add bugs, or because they don't have the time or capability to upgrade it to Python 3. So it's definitely a good thing for the community. I appreciate all of your time and effort on that front, and I hope you enjoy the rest of the day.
Naftali Harris
0:32:20
Yeah, you too. Bye. It's great to chat today.
Tobias Macey
0:32:25
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.