Community

Cultivating The Python Community In Argentina - Episode 229

Summary

The Python community in Argentina is large and active, thanks largely to the motivated individuals who manage and organize it. In this episode Facundo Batista explains how he helped to found the Python user group for Argentina and the work that he does to make it accessible and welcoming. He discusses the challenges of encompassing such a large and distributed group, the types of events, resources, and projects that they build, and his own efforts to make information free and available. He is an impressive individual with a substantial list of accomplishments, as well as exhibiting the best of what the global Python community has to offer.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Facundo Batista about his experiences founding and fostering the Argentinian Python community, working as a core developer, and his career in Python.

Interview

  • Introductions
  • How did you get introduced to Python?
  • What was your motivation for organizing a Python user group in Argentina?
  • How does the geography and culture of Argentina influence the focus of the community?
  • Argentina is a fairly large country. What is the reasoning for having the user group encompass the whole nation and how is it organized to provide access to everyone?
  • What are some notable projects that have been built by or for members of PyAr?
    • What are some of the challenges that you faced while building CDPedia and what aspects of it are you most proud of?
  • How did you get started as a core developer?
    • What areas of the language and runtime have you been most involved with?
  • As a core developer, what are some of the most interesting/unexpected/challenging lessons that you have learned?
  • What other languages do you currently use and what is it about Python that has motivated you to spend so much of your attention on it?
  • What are some of the shortcomings in Python that you would like to see addressed in the future?
  • Outside of CPython, what are some of the projects that you are most proud of?
  • How has your involvement with core development and PyAr influenced your life and career?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email us with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Tobias Macey
0:00:12
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you get everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models and running your continuous integration, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute, and don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O'Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences today to learn more about these and other events and take advantage of our partner discounts when you register. Your host, as usual, is Tobias Macey, and today I'm interviewing Facundo Batista about his experiences founding and fostering the Argentinian Python community, working as a core developer, and his overall career in Python. So Facundo, could you start by introducing yourself?
Facundo Batista
0:01:47
Hello. Thanks for having me. Yes, I'm Facundo. I'm an electronic engineer. I started programming for fun when I was a kid, in a lot of different languages, until, when working as an engineer, I found Python and fell in love with it, like 20 years ago, or 18 years ago.
Tobias Macey
0:02:18
And do you remember how you first got introduced to Python?
Facundo Batista
0:02:21
I used to work in a telecommunications company where we had to process a lot of information server side. At that point, the language that I was most comfortable with was C, which I had worked with a lot in university, but as you may know, processing text server side with C is not fun at all. So I started to find out what I could do. I found Perl, and I did some developments with Perl, but I was all day protesting because of the syntax and everything. So a work companion asked me, "Have you heard about Python?" "No, I haven't." "You should read this tutorial." So he gave me the official tutorial for Python, I think it was 2.0 or 2.1 at that time. And when I read the tutorial, my first impression was: this looks nice, but it's too simple. I don't know if this will be powerful enough for the processing I want to do. So my first test with it was doing a recursive analysis of the networks to try to find potential loops, or something similar that was kind of complex and a lot of processing. And Python worked just fine. So I said, oh, I really like this language.
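As an illustration of the kind of recursive loop search mentioned in this answer, here is a minimal sketch. It is not the code from the interview: the `Network` shape (a dict mapping each node name to the nodes it links to, treated as directed links) and the `find_loop` function are assumptions made up for this example.

```python
from typing import Dict, List, Optional

# Hypothetical topology format: node name -> names of the nodes it links to.
Network = Dict[str, List[str]]


def find_loop(network: Network) -> Optional[List[str]]:
    """Return one loop as a list of nodes, or None if no loop exists."""
    visiting: List[str] = []  # nodes on the current recursion path
    done = set()              # nodes whose links were fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in visiting:
            # Back at a node already on the current path: that is a loop.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for neighbour in network.get(node, []):
            loop = visit(neighbour)
            if loop:
                return loop
        visiting.pop()
        done.add(node)
        return None

    for start in network:
        loop = visit(start)
        if loop:
            return loop
    return None


if __name__ == "__main__":
    example = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print(find_loop(example))  # ['a', 'b', 'c', 'a']
```

For very large networks the recursion depth could exceed Python's default limit, so an iterative version with an explicit stack may be the safer choice.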
Tobias Macey
0:04:00
So after you discovered Python and started using it, you ended up helping to found the Python Argentina user group. And I'm wondering what your overall motivation was for getting involved with that, and some of the story behind your founding of the group.
Facundo Batista
0:04:16
The moment I started to work in Python, I started doing a lot of things with Python, and a couple of work companions also used Python with me, but nobody else knew about Python. None of my friends knew Python at that time. So I said, I cannot be the only person in Argentina who does Python. I mean, I knew the international community and everything, but there should be something in Argentina. So I called for a meetup. We were three people in that original meeting, and the three of us were working in Python, but at the same time we knew that somebody else should be working in Python. So we decided at some point to start a mailing list about it, and probably a web page. And that is the origin of it. I mean, the need to talk with somebody else who also used that technology. That was it.
Tobias Macey
0:05:23
Right. And I've actually heard a number of references to people coming from Argentina who are involved in Python, both in the local community there and in the international community. And I'm curious how large the Python Argentina user group has gotten to be over the years.
Facundo Batista
0:05:38
It's difficult to measure, because we don't have a formal process for you to join the community. So it depends on which numbers you take. For example, we have a mailing list, and on the mailing list there are something like [unclear] 300 people. But we know that a lot of young people are not on the mailing list, because they tend not to use mail. We created a Telegram group for Python Argentina a couple of years ago, and it already has more than 1,000 people. So it's difficult to know, because we don't know how much of one group is in the other. And at the last PyCon Argentina there were more than 1,000 people attending. So it's a large group.
Tobias Macey
0:06:36
And Argentina itself is a fairly large country, and the group that you have put together services the entirety of the nation. And I'm wondering how the overall geography and culture of your country influences the focus of the community, and any of the challenges that you face in terms of trying to facilitate interaction for such a widely distributed group of people.
Facundo Batista
0:07:01
It's a problem, because it's not only that our country is large. As I tend to say to people visiting the country, when they come to Buenos Aires, I say to them that if you want to go to the south, you have to travel 2,000 kilometers, and if you want to go to the north, you have to travel another 2,000 kilometers. It's a large country. But the problem is deeper than that. Argentina is very centralist, I don't know if that's an English word or not. Everything tends to happen in Buenos Aires, with the exception of a couple of other big cities like Córdoba, or Rosario, or Mendoza. Most of the technology scene happens in Buenos Aires. So when we wanted to found the Python group, we didn't want to found it just for Buenos Aires, because we knew that we would be excluding a lot of people. So from the very beginning we decided that we would be addressing the whole of Argentina, and we decided to call it Python Argentina. At the same time, we started purely virtual, actually, at the beginning. So that part was easy, because the mailing list you can join from anywhere. But the meetings, of course, were local, so there were a lot of meetings in Buenos Aires. When we started to know people from other provinces or cities, we started to encourage them: do meetings in your cities, talk with people locally. We all work in Python, but we have different problems, or even the same problems, for example the quantity of companies working with Python, or job offers, et cetera. Maybe even with the same problem, the solutions are different. So let's have this group address the whole of Argentina, but let's not be Buenos Aires-centric, and try to make it as federal as possible.
Tobias Macey
0:09:29
And from your overall experience of being a technologist, living and working in Buenos Aires, and interacting with people in the broader community that you work with, what has been your sense of the level of popularity of Python as compared to other languages or technologies that are being used in Argentina?
Facundo Batista
0:09:49
I think that in that regard it is no different from other countries or areas. We have a lot of people working in other languages, like commercial languages with a good basis in universities, like Java or PHP or C++. And at the same time we have a lot of languages that are not widely used but have a good community here, especially in universities, like Lisp or Haskell. But again, similar to what happened in a lot of other places, Python had steady growth, but was not really growing a lot until ten or seven years ago, when at some point a lot of people started to use Python. And five years ago or so it literally exploded with the quantity of people trying to learn Python from the science world. So I don't have specific data comparing Argentina and other countries, but from what I've heard, and in my experience, it is similar to what happened in the US or Europe.
Tobias Macey
0:11:22
And what are some of the ways that you facilitate the growth and interaction of the community, and some of the types of resources and events that you help to provide?
Facundo Batista
0:11:33
Our focus is pretty much on the community. I mean, we do Python, we are a group of people doing Python. So our focus is to make people talk together and get together around Python, from the mailing list or the Telegram groups, where we provide assistance so anybody can learn Python or find answers for problems around Python, to meetings. We have several kinds of meetings or events. The idea is always to make people get together around the language. One of the basic rules that we have for events in Python Argentina is that we want the events to be free. We don't want to charge you to be able to talk about Python with somebody else. So PyCon Argentina, for example, was always free, which is kind of unusual compared to what happens in the rest of the world.
Tobias Macey
0:12:49
Yeah, it's definitely much different than typical technology conferences that I've had experience with. And I know that in general conference organization and management can be both time consuming and expensive. So I'm wondering how you've approached that in order to be able to provide it as a free resource for people?
Facundo Batista
0:13:07
Well, we have sponsors. I mean, companies sponsor the events, so we get that money and pay for the expenses. We are somehow limited in the sense that, for example, we don't provide lunches or T-shirts for everybody, or that kind of generic stuff that you get when you go to a paid event. Because, I mean, you're not paying for anything, so we cannot give you lunch. But our focus is for you to be able to access the information. The information should be free, whether you have money or not. That's the focus.
Tobias Macey
0:13:55
And in addition to PyCon Argentina, you have also been working on this PyCamp event. And I'm wondering if you can describe a bit about what that is, and how that got started?
Facundo Batista
0:14:07
Well, PyCamp for me is one of the events that I like most every year in Argentina. It's a small event. I mean, it's not for a wide attendance. We get together every year, like 40 or 50 people, in a place that provides the basics for us to survive, like electricity, internet, bathrooms, food, and that kind of stuff. And we spend four days coding and hacking and playing board games and doing fun activities like learning how to fight with swords, and that kind of stuff. It's a very nice event, where you just go to do Python for four days. It's very nice. We have a lot of good pictures of it that I show a lot. We have to reproduce this in other countries for people to have fun.
Tobias Macey
0:15:10
Yeah, that definitely sounds like a lot of fun. And I'm curious if the sword fighting expertise came from within the group or if that's something that you brought somebody in from the outside for?
Facundo Batista
0:15:19
No, somebody in the group is a specialist in that, so every PyCamp he brings some swords and teaches a little. But we normally also do sports, like playing football or basketball. Or, for example, at the last PyCamp we had a talk from a specialist about astronomy. We were in the mountains in a really dark place, so he talked to us about stars for an hour. And it was very, very good.
Tobias Macey
0:15:58
Yeah, I'll definitely add links to pictures of that for anybody who wants to take a look. And I'll definitely advocate for anybody else to replicate that, because it sounds like a good time and something that would be worthwhile to help grow some community engagement and just be an excuse to get out and do something different.
Facundo Batista
0:16:15
Yes.
Tobias Macey
0:16:17
So in terms of the overall community, I'm wondering what have been some of the main points of focus in terms of just general themes of events and talks, and some of the notable projects that have been built by or for members of Python Argentina?
Facundo Batista
0:16:33
Yes, well, the focus is mostly the people, like getting everybody together to talk about Python, but with some specifics, like information should be free to anybody, as I said before, but also diversity. We have been heavily focused on diversity since, I don't know, 10 years ago, similar to what the PSF was doing, also 10 years ago, before diversity was really on the agenda for everybody. We were like pioneers with the PSF around that. So it's mostly the people. But sometimes, as a group, we want to attack some different projects. For example, one of the longest running that we have, and the one I'm most proud of, is CDPedia. CDPedia is a project where we package the whole Wikipedia on a CD. I mean, originally it was a CD, then we started the DVD version, and then we also started a pen drive version. But the idea is always the same. You go with a CD or a DVD or a pen drive to a computer with no internet at all, and you have the whole Wikipedia content. Of course, we are addressing the Spanish part of Wikipedia, even though we have the idea to make it multi-language at some point. But the idea is for you to go with a CD or DVD, for example, to a school that is distant from any city, where you have computers but you don't have internet, which is quite common in Argentina because we have so many rural areas. So the idea is that you have a computer, you don't have internet, but having CDPedia you can get all the information from Wikipedia, which is a very, very good project. It has been there for like 13 years or something.
Tobias Macey
0:18:56
That definitely is great to be able to provide that information access. And I'm curious, what are some of the challenges and strategies that you're faced with to make it possible to have all of that information available offline and internally linked, so that it doesn't require any outbound network access, and any potential applications that could be made from that project to things like maybe packaging up sections of the Internet Archive for similar purposes?
Facundo Batista
0:19:29
I think that it's very difficult to make it generic, because the processing of the Wikipedia pages is so specific to Wikipedia pages, because of the need to compress them to the maximum. So it's very difficult to make it generic. The challenges around the project are mostly about the compression of pages and images on one side, but also the index is very difficult to achieve. Remember that our original aim was to make a CD, and CDs are slow. So the moment you want to search for something and open that specific page, you cannot really be reading 100 megabytes to uncompress something in memory. You should have access in small blocks.
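The "access in small blocks" idea can be sketched as follows. This is only an illustration of the general technique, not CDPedia's actual storage format: the 64 KiB `BLOCK_SIZE`, the choice of `bz2`, and the `build_archive` and `read_page` names are all assumptions for this example. Pages are grouped into blocks that are compressed independently, and an index records where each page lives, so opening one page means decompressing a single small block.

```python
import bz2
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024  # assumed target size of an uncompressed block

# title -> (block number, offset inside the block, length in bytes)
Index = Dict[str, Tuple[int, int, int]]


def build_archive(pages: List[Tuple[str, bytes]]) -> Tuple[List[bytes], Index]:
    """Pack pages into independently compressed blocks plus a title index."""
    blocks: List[bytes] = []
    index: Index = {}
    current = bytearray()
    for title, body in pages:
        # Close the current block if this page would push it over the limit.
        if current and len(current) + len(body) > BLOCK_SIZE:
            blocks.append(bz2.compress(bytes(current)))
            current = bytearray()
        index[title] = (len(blocks), len(current), len(body))
        current.extend(body)
    if current:
        blocks.append(bz2.compress(bytes(current)))
    return blocks, index


def read_page(blocks: List[bytes], index: Index, title: str) -> bytes:
    """Decompress only the block that holds the requested page."""
    block_no, offset, length = index[title]
    data = bz2.decompress(blocks[block_no])
    return data[offset:offset + length]
```

Smaller blocks make each lookup cheaper but compress worse, which is the trade-off between disc space and read speed that the answer describes.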
0:20:38
Another big challenge is how you
0:20:45
determine which pages you will include, and which images from those pages you will include, in CDPedia, because if you make it fit on a CD you have 600 megabytes, but if you aim for a DVD you have almost five gigabytes. And at the same time we have a version with all images and all pages that is meant for pen drives. That was around 13 gigabytes the last time we compiled it. But the process of selecting which pages is quite difficult. But that's only the technical challenge of a project like this, because you also have the social challenge. The moment you have a CD or a DVD with the whole Wikipedia, how do you distribute it? Because, well, we have it for download, but if you have the problem that you don't have good internet in the school, how do you download it in the first place? So we had success regarding that. Jimmy Wales, who is the founder of Wikipedia, as a gift to a person... no, sorry, it was the other way around. A person that has a company working with the education ministry in Argentina made, as a gift for Jimmy Wales, the possibility to distribute CDPedia all around Argentina. So we had the disc in all schools in Argentina, I think around 2011 or something, which was a very good thing.
Tobias Macey
0:22:53
Another aspect of the project, too, is that because Wikipedia is a continually evolving body of information, there's the issue of staleness of information, where some pages, for instance, are going to be unmodified, because they're historical records that don't necessarily have a lot of flux. But for any sort of scientific information that might have been updated since the last time the information was compiled, there's the challenge of being able to redistribute those updates. And I'm curious if you have any thoughts on that problem, or any ways of maybe sending incremental updates for people who already have an existing copy, or, because of the fact that it's entirely self referential, if that's even viable.
Facundo Batista
0:23:35
We analyzed that a couple of times. It was very difficult to produce incrementals, because at some point we gathered some stats, and there are so many changes. And as a lot of pages are referenced by a lot of other pages, I think the number was around 65% of the pages that needed to be modified. So at some point you just get a new snapshot and deliver the new snapshot. Incremental wasn't worth the effort. Yes, the problem of pages going stale is a problem of all snapshots. The moment you get a snapshot, you are doomed with that. But there is a similar challenge around that, which is what you can do to prevent, or mostly avoid, people doing bad things to the Wikipedia pages and you distributing them as the truth. I prefer to have a page about, I don't know, some scientific thing that is two months old but is true, than one that is two days old but is a lie, or a hack or something. So we have a lot of algorithms for deciding, when we include a page in the snapshot, which version of that page we choose. In a lot of situations we don't choose the latest version. So it's complicated.
Tobias Macey
0:25:29
Yeah, it's definitely a complex challenge. And as you said, it's not just the technical, it's also the social aspects of it. And because a lot of the people who are using it don't have internet access, it's not necessarily viable to just ship those increments over the internet. You would have to have another physical medium for sending them along, and then have a way of merging the information on a hard drive or something like that. So best of luck in that overall effort. And then beyond your involvement in Python Argentina and working on projects such as CDPedia, you have also been working as a core developer for CPython. And I'm wondering how you got started on that path, and what specific areas of the language and runtime you've been most involved with and most focused on.
Facundo Batista
0:26:13
I had this problem around 2002, when I started a personal project for managing my own money, my own finances. And quickly I found out that float was not a good fit for handling money. So trying to see how you can handle money in Python, I found out that there was this idea of creating a decimal data type, which is the best fit for handling money. But it was not really there. So in my original mails, someone suggested that the decimal data type was what I needed, and I decided to make it happen. There was code around there, and there is this spec from IBM, which specifies exactly how the decimal data type works. So I started to work on the decimal module. I received a lot of help from people that knew a lot about numbers, and I got to work with people like Tim Peters, or Eric Snow, or, well, there were a lot of people involved. But my main success there was to start and finish a very complicated PEP, which was the decimal one, and then implementing the module. At that point I became a core developer, because I was committing a lot of code, committing a lot of tests, basically working on the decimal module. Beyond the decimal module itself, I like to participate in Python bug days a lot. And I started to create small events in Argentina for people to grab bugs of CPython and work on them. And I normally tend to work on stuff like that in Python sprints and everything. But even though I'm a core developer, I really don't spend a lot of time with the source code.
In the last 15 years or something, or mostly the last 10 years, I was heavily focused on the community part of Python and not so much on the code. I tend to do a commit every, I don't know, several months, because of helping somebody with patches or bugs from others, but it's mostly an effort of creating a community of people helping with the code rather than helping with the code myself. For example, I participated in several Google Summer of Code editions for people who wanted to write code in Python, and that kind of stuff.
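The float problem described in this answer is easy to reproduce with the standard library's `decimal` module (specified in PEP 327). The following is a small sketch; the price and the 21% tax rate are made-up values for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)                # False

# Decimal keeps the digits exactly as written.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True

# Hypothetical money calculation: round the tax to whole cents.
price = Decimal("19.99")
tax = (price * Decimal("0.21")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(tax)                               # 4.20
```

Building `Decimal` values from strings rather than floats matters here: `Decimal(0.1)` would carry over the float's binary representation error.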
Tobias Macey
0:29:45
And I'm curious what it is about the Python language and community that has caused you to spend so much of your time and attention on it, as opposed to other endeavors that you might spend your time on or other languages that you might be using professionally or personally?
Facundo Batista
0:30:01
On one side, the Python language itself is something very nice and fun to work with. It's something which works well enough in most of the contexts where people use the language, or, in my particular case, in all the contexts where I use a language. So I really don't have the need to use another language. When I do projects in my free time, for fun or for learning technologies, I do them in Python because I like it. And then I ended up working as a Python software engineer in a couple of companies. I have been working at Canonical for more than 10 years, doing Python. So I use Python everywhere. On the other hand, the community is one of the healthiest communities that I have found in the software world. There was always this good attitude of people around the language. The community was always very welcoming, always very respectful, and it's a good place to be and to encourage people to be. It happened to me a lot, getting people from other languages into Python Argentina, that one of the things they said was: "I really like this mailing list, because I can ask a silly question and nobody will hit me on the head with something." Or, specifically speaking about diversity, there are a lot of people who are not male, white, and in a good socioeconomic position who are really happy with the community. And I think this represents the state of the Python community around the globe, and this is very good. It's very good, but at the same time, I don't know if it's an anomaly, but it's not usual for communities to be so well behaved.
Tobias Macey
0:32:38
Yeah, it's definitely remarkable, the amount of effort that has been put in by members of the community globally to help foster that overall sense of welcome to new people of all skill levels. And just the fact that it has been able to be maintained and sustained as the community has grown beyond its original roots is pretty remarkable. And I think the fact that there is an organization in the form of the PSF at the core of it, to help drive a lot of those efforts and set standards for the community, has helped to allow it to scale to the point that it has.
Facundo Batista
0:33:16
Yes, yes.
0:33:19
For example, the PSF always put a focus on diversity. For example, every year in these PyCamps... PyCamp is the only event in Python Argentina that is not free, because, I mean, you have to pay for the hotel and everything, but we normally give money to people to be able to attend. And we put a focus on diversity there, with PSF sponsorship specifically for that, which makes the community more diverse. And it's a virtuous circle: making the community more diverse will attract more diversity itself. And at some point we can start being equals in the community.
Tobias Macey
0:34:16
And as a user of the Python language and committer to the runtime for such a long period, I'm sure that there are aspects of the language that you've run into that you would like to see improved or modified. And I'm wondering if there's anything notable that you would like to see addressed in the near to medium future?
Facundo Batista
0:34:35
In general, I am very happy with the language. For example, other people say it is slow in some situations, but that's not really a problem for me. What I would really want to see improved in the midterm is the startup time for the Python process, the time between when you type python3 in the terminal and when the script really starts executing. Improving that time, I think, would really help in a lot of different areas where Python could be more widely spread. The problem is that you cannot really execute something in Python in a millisecond. I'm exaggerating, but that's the idea.
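The startup cost discussed in this answer can be measured directly. The sketch below times how long the current interpreter takes to start and run an empty program; the `startup_seconds` helper and the number of runs are assumptions for this example, and the figure it reports depends heavily on the machine and Python version.

```python
import subprocess
import sys
import time


def startup_seconds(runs: int = 5) -> float:
    """Best wall-clock time to launch the interpreter and run an empty program."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs a program that does nothing, so only startup is timed.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"interpreter startup: {startup_seconds() * 1000:.1f} ms")
```

To see where the time goes, `python -X importtime -c pass` (available since Python 3.7) prints how long each module imported at startup takes.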
Tobias Macey
0:35:42
And outside of your work on Python Argentina and the CPython runtime and some of the other open source projects that you've mentioned, what are some of the other areas that you spend your time on, and projects that you're most proud of?
Facundo Batista
0:35:56
I really use a lot of the time of my life to make my kids happy, help them grow, and be with them, enjoying them while they are growing. They are still small, but time goes by so fast. I play tennis; I love playing tennis. And a lot of my free time I put into computers and software projects and community. One of the projects that I spend a lot of time on is one called fades, which is an automatic virtualenv wrapper for your projects. It is more than a plain virtualenv wrapper in the sense that you really don't need to know that you are using a virtualenv in your project or in your script. You only specify the dependencies, and the process or your interactive interpreter or whatever executes inside the virtualenv. But you don't really need to know that the virtualenv is under there, or how to create it, or how to activate it, or anything, which makes it very, very good for people starting in Python, because they don't need to install dependencies or anything. If they use fades, they just specify the dependencies that they want, and the script will run in a virtualenv with only those dependencies, automatically.
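To make concrete what a tool of this kind automates, here is a minimal sketch that uses only the standard library's `venv` module to create a throwaway environment and run code inside it. This is not how fades itself is implemented and does not use its API; the `run_in_fresh_venv` helper is hypothetical, and it skips dependency installation entirely.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def run_in_fresh_venv(code: str) -> str:
    """Create a temporary virtualenv, run `code` inside it, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        # pip is left out to keep the sketch fast; a real tool would
        # install the requested dependencies at this point.
        venv.create(env_dir, with_pip=False)
        bin_dir = "Scripts" if sys.platform == "win32" else "bin"
        exe = "python.exe" if sys.platform == "win32" else "python"
        result = subprocess.run(
            [str(env_dir / bin_dir / exe), "-c", code],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    # Inside a virtualenv, sys.prefix differs from sys.base_prefix.
    print(run_in_fresh_venv("import sys; print(sys.prefix != sys.base_prefix)"))
```

The manual steps shown here (create the environment, find its interpreter, run the script with it) are the ones the answer says the user should not need to know about.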
Tobias Macey
0:37:43
And how would you characterize the overall influence that your involvement as a core developer and with the Python Argentina group has had on your life and career?
Facundo Batista
0:37:57
I don't know if being a core developer in itself influenced a lot of what I do in Python Argentina. What really affected what I do in Python Argentina was, in that sense, being part of the Python Software Foundation, being part of the group of people involved in making the language better, and then translating a lot of those attitudes and good things from the overall PSF to Python Argentina. Specifically regarding my career, well, I'm an electronic engineer. I started working at a telecommunications company 20 years ago as an engineer. But then, when I started being more and more involved with Python, I was head of the developers in a company around 2006, then went back to work as an electronic engineer in another telecommunications company, but then I moved to Canonical, and I have been doing Python there for almost 11 years now. So it heavily influenced my career, because I really work as a developer, even though I didn't study that at university.
Tobias Macey
0:39:29
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move into the picks. This week, I'm going to choose a book that I picked up from the library recently that's been a lot of fun, called The Dictionary of Difficult Words. It's just a bunch of different words that you wouldn't typically use in everyday language that are interesting to say or hear, and they've got useful and complex definitions. So it's just great for exploring language, and it's fun and entertaining. There are a lot of funny illustrations to accompany the words, so it's great to sit down and look at it with your kids. I've been having fun with that. And with that, I'll pass it to you. Do you have any picks this week?
Facundo Batista
0:40:10
Well, I would encourage anybody working with virtualenvs to take a look at fades and start using it. At the beginning, you don't really see the value of it. I mean, you say, oh, another wrapper, but I'm already using one, what is the benefit of it? But the moment you really start using it, you will not stop using it. It's very, very helpful in everyday Python usage.
Tobias Macey
0:40:45
All right. I'll have to take a look at that. Well, thank you very much for taking the time today to join me and discuss your experience working with Python and helping to contribute to the growth of the community. I appreciate all your efforts on that front and I hope you enjoy the rest of your day.
Facundo Batista
0:40:58
Okay, thank you. Thank you for having me. Bye bye.
Tobias Macey
0:41:04
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Security, UX, and Sustainability For The Python Package Index - Episode 225

Summary

PyPI is a core component of the Python ecosystem that most developers have interacted with as either a producer or a consumer. But have you ever thought deeply about how it is implemented, who designs those interactions, and how it is secured? In this episode Nicole Harris and William Woodruff discuss their recent work to add new security capabilities and improve the overall accessibility and user experience. It is a worthwhile exercise to consider how much effort goes into making sure that we don’t have to think much about this piece of infrastructure that we all rely on.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Nicole Harris and William Woodruff about the work they are doing on the PyPI service to improve the security and utility of the package repository that we all rely on

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by sharing how you each got involved in working on PyPI?
    • What was the state of the system at the time that you first began working on it?
  • Once you committed to working on PyPI how did you each approach the process of identifying and prioritizing the work that needed to be done?
    • What were the most significant issues that you were faced with at the outset?
  • How often have the issues that you each focused on overlapped at the cross section of UX and security?
    • How do you balance the tradeoffs that exist at that boundary?
  • What is the surface area of the domains that you are each working in? (e.g. web UI, system API, data integrity, platform support, etc.)
    • What are some of the pain points or areas of confusion from a user perspective that you have dealt with in the process of improving the platform?
  • What have been the most notable features or improvements that you have each introduced to PyPI?
    • What were the biggest challenges with implementing or integrating those changes?
  • How do you approach introducing changes to PyPI given the volume of traffic that it needs to support and the level of importance that it serves in the community?
  • What are some examples of attack vectors that exist as a result of the nature of the PyPI platform and what are you most concerned by?
  • How does poor accessibility or user experience impact the utility of PyPI and the community members who interact with it?
  • What have you found to be the most interesting/challenging/unexpected aspects of working on Warehouse?
    • What are some of the most useful lessons that you have learned in the process?
  • What do you have planned for future improvements to the platform?
    • How can the listeners get involved and help out?
  • How was this work funded?

Keep In Touch

  • Nicole
    • @nlhkabu on Twitter
    • Website
    • If you’re using CI to upload to PyPI and would like to speak with Nicole please book a time here
    • If you’re using assistive technology and would like to speak with Nicole please book a time here
  • William
    • @8x5clPW2
    • Website
    • Email
    • Please get in touch if you’d like to work with Trail of Bits on your next security project!

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Tobias Macey
0:00:13
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models and running your CI/CD pipelines, they just launched dedicated CPU instances. They've also got worldwide data centers, including a new one in Toronto and one opening in Mumbai at the end of the year. So go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference, with upcoming events including the O'Reilly AI conference, the Strata Data conference, and the combined events of the Data Architecture Summit and Graphorum. Go to pythonpodcast.com/conferences to learn more and to take advantage of our partner discounts when you register. Your host, as usual, is Tobias Macey, and today I'm interviewing Nicole Harris and William Woodruff about the work they are doing on the PyPI service to improve the security and utility of the package repository that we all rely on. So Nicole, can you start by introducing yourself?
Nicole Harris
0:01:38
Yeah. Hi, my name is Nicole Harris. I've been working on PyPI, or the Warehouse project, which is the code base that powers PyPI, for about three or four years now. In my day job, I manage a UX/UI team at a company called PeopleDoc. But in my spare time I work on PyPI.
Tobias Macey
0:02:00
William, can you introduce yourself?
William Woodruff
0:02:02
Sure. So my name is William Woodruff. I'm a security engineer with a small security consultancy called Trail of Bits. I've actually been working on Warehouse for only about five or six months now; we started the work back in March. During my day job, I sort of split my time between engineering and research. On the research side, I do program analysis research, mostly government funded. On the engineering side, I work on mostly open source projects, like Warehouse, and osquery, and things like that.
Tobias Macey
0:02:30
And going back to you, Nicole, do you remember how you first got introduced to Python?
Nicole Harris
0:02:33
So my background is in HTML, CSS, design, and user interface. So Python wasn't the first technology that I was exposed to in terms of the web. But my husband is actually a Python developer; he started teaching himself programming by learning Django. So through him, basically, I got introduced to Python, and also learned, you know, enough Python to be useful alongside my front-end skills.
Tobias Macey
0:03:06
And William, do you remember how you first got introduced to Python?
William Woodruff
0:03:09
I think I used Python in a few university courses, but I didn't actually really start programming in it in earnest until I took this job. Before that, I actually did C and Ruby. So this has been sort of a nice turn for me.
Tobias Macey
0:03:26
And given the fact that you haven't been using it for your day to day, I'm curious how much effort it's been to get up to speed with the code base, and be able to understand it and be effective with it. And how much of your experience with Ruby in particular was able to easily translate?
William Woodruff
0:03:43
Oh, so I think, fortunately, the Warehouse code base is, I'd like to say, probably one of the nicest Python code bases I've worked on. It has something like 100% unit test coverage, and the idioms of the frameworks that it uses are actually well preserved across the code base. So it was actually relatively easy to get up to speed. And thankfully, I had both Nicole and everybody over on the PSF side, as well as Sumana at Changeset, to answer my questions as they came up.
Tobias Macey
0:04:11
And so for both of you, I'm wondering if you can just start by sharing a bit about how you each got involved in working on the PyPI project and the main responsibilities that you have.
Nicole Harris
0:04:24
Yeah, so I can maybe start there. I think it must have been in 2015. Donald Stufft, who is the lead developer on Warehouse, which is the project powering PyPI, sent out, I think he actually opened, a GitHub ticket that said: help, I need a designer, this is not something that I'm good at, I'm rebuilding this thing, and this is completely outside of my skill set, so please retweet. And it was through one of my friends, whom I had actually met at a Python conference, that I kind of put my hand up and said, hi, I'm Nicole, this is what I do, and I think I can help you. So that's how I got involved, and my involvement has kind of extended from there. In terms of my responsibilities, I'm responsible for the UX and the UI, so the user experience and user interface, as well as the HTML and CSS code base for the Warehouse project. So a bit of coding and a bit of designing.
Tobias Macey
0:05:36
And William, How about yourself?
William Woodruff
0:05:37
Yeah. So on my side, I got involved through the current contract that I'm working on, which is the OTF-funded security improvements to Warehouse. My work has primarily revolved around four key changes to the Warehouse code base, to improve the ability for users to secure their accounts as well as the general security posture of the PyPI code base. I can talk about the specifics of those improvements as we go forward, but that was how I got started.
Tobias Macey
0:06:06
And particularly for you, Nicole, what was the state of the system at the time that you first began working on it, and what were any of the notable issues that you were first faced with?
Nicole Harris
0:06:17
So I don't know if you're aware of the full history of PyPI. When I joined the project, I think it was 2015 or 2016, pypi.org was still powered by an old code base that had been written, I think, before web frameworks even existed. I think Donald described it as being from before we even knew how to use Python to build great web experiences. So in terms of the state of the ecosystem, there was this old code base that Donald really discouraged me from diving into. He was like, look, don't look at it, it's not best practice. What we're going to do is rebuild this from scratch. So I had a fairly clean slate in terms of the user interface and, in fact, the HTML and the CSS code base. Donald did have some, I think, Bootstrap templates that were working in the code base, but they weren't particularly finessed, let's say; they were basically just outputting data onto the screen. So I basically rebuilt that from scratch, and made a whole lot of decisions about how we were going to structure, not so much the templates, but certainly the CSS (because we were using Sass, the SCSS code base), so that it would be something that would be easy to maintain moving forward. Because if any of your listeners have experience working on large code bases with CSS, they know it can get out of control pretty quickly. So we needed to put that in from the beginning.
William Woodruff
0:08:01
Yeah, so as I started working on Warehouse, one of the first things I looked at was the present security posture of the site and the various common weak points in package management, such as name squatting, project name reuse, or username reuse. And overall, as far as package managers and package indices go, Warehouse was in a pretty good state. So for example, as I began working, it already supported preventing common typosquatting attacks on packages, and it had rate limiters and other mechanisms in place to prevent these really common low-level attacks against package indices. The things that I ended up working on as part of the OTF-funded scope were things that are sort of above and beyond the current norm for package indices. That would be two-factor authentication, API tokens (which, surprisingly, are not the norm for package indices), and the logging infrastructure.
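As an illustration of the name-squatting checks mentioned here, the following is a minimal sketch and not Warehouse's actual code. `normalize` follows the name normalization rule from PEP 503; `squat_key` and `is_too_similar`, including the choice of look-alike characters, are hypothetical helpers invented for this example.

```python
import re
from typing import Iterable


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . collapse to a single dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def squat_key(name: str) -> str:
    """Hypothetical stricter key that also folds look-alike characters."""
    key = normalize(name).replace("-", "")
    # Illustrative assumption: treat 1/i/l and 0/o as interchangeable.
    return key.translate(str.maketrans("1i0", "llo"))


def is_too_similar(candidate: str, existing: Iterable[str]) -> bool:
    """True if `candidate` collides with any existing project name."""
    return squat_key(candidate) in {squat_key(name) for name in existing}


if __name__ == "__main__":
    print(is_too_similar("Djang0", ["django"]))
    print(is_too_similar("requests", ["django"]))
```

A registry would run a check like this at project creation time, so that a new name differing from an existing one only in punctuation, case, or easily confused characters is rejected.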
Tobias Macey
0:09:01
And Will, I understand that you have also worked on the Homebrew package manager as well. I'm wondering what your initial reactions were as you started digging into Warehouse, and how it compared to your prior experience of working with other package managers and some of the common security pitfalls that are germane to that particular type of application.
William Woodruff
0:09:22
I will say I'm probably Homebrew's current worst maintainer; I'm probably one of the least active ones. But the security issues that Homebrew has to deal with are, somewhat unfortunately, somewhat orthogonal to traditional package management issues, primarily because Homebrew revolves around a central repository for all packages. And so we actually have finer-grained control over both the integrity of packages and their origin, because we can actually see the Git commit, as well as run CI checks, basically, literally, as every package is updated. So it's all centralized in a way that, for example, PyPI can't necessarily do. But that being said...
Tobias Macey
0:10:06
And so for each of you, once you began working on the PyPI code base and working toward some of the initial issues, I'm curious if the problems that you were addressing were identified ahead of time, or what your overall approach was for determining what were the most critical and most important tasks to be undertaking to improve the overall security and user experience of the platform.
Nicole Harris
0:10:35
So I can take this one. I think this kind of relates to the way that this project has actually been funded. As well as being a contributing designer slash developer on PyPI, I'm also a member of the Python Packaging Working Group, which is the working group that works under the Python Software Foundation to raise money for packaging-related projects. And it was through that working group that we actually got funding to make the security improvements that users are starting to see being rolled out on PyPI. So the scope of the work that Will and I have been undertaking is directly related to the application that we made to the Open Technology Fund, who have actually funded this work. What we did is we looked at their mission, their vision, and their values, we looked at the different grant streams, and we made an application for the items that we thought were relevant to their particular fund. And that was what determined the scope of everything that has been funded through that particular initiative. So I think, Will, you'd probably agree that in coming into this project, we had fairly well-defined parameters around what was and what wasn't in scope, based on what was being funded and what we said we were going to do.
William Woodruff
0:12:02
Yeah, I think that's correct. We had a high-level idea of the individual goals we wanted to achieve based on the work that we scoped out with OTF. And then once we actually began work, we prioritized the individual tasks based on what we thought would have the highest user impact, as well as what we could roll out with minimal disruption to things like package upload and the user experience.
Tobias Macey
0:12:28
And given that you're both focusing on somewhat different areas of the platform, I'm wondering how often the issues that you're focusing on have had overlap, and what the cross section ends up being between user experience and security, particularly given that the interfaces that you're dealing with aren't necessarily just the web UI that you see when you load up the web page.
William Woodruff
0:12:52
So I'm actually of the opinion that UI is severely underrated in terms of user security. Users oftentimes don't really know how to engage with the security features that security engineers expose to them. This is an issue that I've run into on other platforms that I've worked on. And I think a huge boon to working with Nicole has been actually setting up a set of features and then seeing how to expose them correctly to users. That's something that I'm not personally equipped to do, and seeing her build this actually extremely pleasant to use and extremely intuitive setup has been really great.
Tobias Macey
0:13:34
And then in terms of the trade-offs that exist, I know that oftentimes there's a conflict between improving the overall security of a system and still making it usable, because as you ratchet down too tightly on making something ultimately secure, you start to encourage people to take shortcuts that ultimately reduce the effectiveness of your practices. I'm curious how you try to balance that issue, and what some of the common patterns are that you have settled on to make sure that you're improving the security as much as possible while still making sure that people are adhering to the security practices.
William Woodruff
0:14:12
Yeah, I think a huge challenge when designing secure systems is security fatigue. One of the last things you want to do is, like I said, ratchet down the system so much that users become frustrated and take shortcuts to achieve their ends. That's one of the issues you often see with two-factor implementations: a two-factor implementation will require a user to sign on or re-authenticate so frequently that users will just move their TOTP setup onto the host itself and just Ctrl-C, Ctrl-V, and thereby dissolve the second-factor component of the authentication scheme.
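For readers unfamiliar with TOTP, the one-time codes being copied around here are derived from a shared secret and the current time. The following is a minimal sketch of the algorithm from RFC 6238 (which builds on the HOTP truncation in RFC 4226), using only the standard library. It is an illustration, not the implementation PyPI uses; the function name `totp` and its defaults are assumptions made for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (HMAC-SHA1 variant)."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of `step`-second windows since the epoch.
    counter = int(at_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte slice.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Secret and timestamp taken from the RFC 6238 test vectors.
    print(totp(b"12345678901234567890", at_time=59, digits=8))
```

Because the code depends only on the secret and the clock, anyone holding the secret on the same machine as the password can reproduce it, which is why keeping both factors on one host weakens the scheme as described above.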
Tobias Macey
0:14:46
And I'm wondering too if you can just enumerate the overall list of interfaces and the total surface area of the problems that you're each working with, as far as the specifics of the PyPI project. With some projects that might be limited to just the web UI; with others it might be just an API. But with PyPI, there's the web interface, there are the APIs that users are using, there's the actual data integrity, as well as the actual interactions that people have in downloading and installing the packages, which is potentially another attack vector that isn't necessarily going to be present in other projects.
William Woodruff
0:15:25
Yeah, so the work that I did primarily centered around the API and the web interface. The security features that we added were specifically two-factor authentication, API tokens, and an audit log. Of those, two-factor authentication is intended primarily for use with the web interface, and audit log visibility is provided via the web interface, although some audit events are actually captured as the user hits the API for sensitive actions, such as package uploads, file uploads, or removals. But also on the API side, there are the API tokens themselves, which the user will interact with via a tool like setuptools, or twine, or any of the other clients that interact with the Warehouse API.
Tobias Macey
0:16:16
And Nicole, for you as well, I'm wondering what the surface areas are that you're dealing with as far as the user experience work, and some of the ways that that manifests in the different trade-offs and the interactions between APIs and web UI, the overall package upload experience, etc.
Nicole Harris
0:16:36
Yeah, so I mean, in terms of this current contract, my work has been limited to basically what Will just described. The first part was making sure that users find it easy to set up two-factor authentication, and then to use that when logging into pypi.org. So that's the first thing we worked on. Then we looked at the API keys. Sorry, API tokens; we're avoiding the word keys, and I can tell you why later. That was about making it easy for users to set up those tokens. And then obviously, as Will said as well, exposing the audit log to the end users. In terms of my work with regard to the way that people interact with PyPI outside the browser, that's really limited to me making sure that the instructional text and the help text that we're showing on pypi.org is actually useful enough for people to be able to do what they need to do. So for example, with the API tokens that we've just deployed, I've been running some user tests that have revealed that perhaps the way that we display the token, and the instructions that we give to users currently, are not good enough for them to understand what they need to do next, using whatever tool they're using. So that's where my sphere of influence sits: making sure that people have the information that they need to be able to then interact with PyPI, however they need to do that.
Tobias Macey
0:18:28
That, I'm sure, is also complicated by the fact that there are any number of different tools that people might be using that would require access to that API token. I know that there's pip, and there's twine for being able to upload things, and flit, and there are, I'm sure, any number of different homegrown applications. I'm wondering how that plays into your efforts to make sure that the instructions are clear and accessible, and I guess how far you're willing to take the effort, and when you decide that you've covered enough ground, the majority of people are handled, and anybody who is in some of these edge cases is there because of something that they've decided to do that isn't necessarily something that would be required to be supported by the people responsible for the PyPI infrastructure.
Nicole Harris
0:19:16
Yeah, I think there are sort of two factors when thinking about, or at least in how I think about, designing for PyPI. It's that, yes, people have different workflows, as you've just described, and also that you have people with really vastly different levels of knowledge as well. Python is now being used a lot as a teaching language, so I'm really aware that PyPI could be the first package index that some people are using or experiencing. They might not be familiar with all of the concepts that we present to them. On the flip side, you have people who've been coding for decades and are really familiar with all the concepts. So it's a real challenge in terms of making sure that you're explaining things enough for beginners, whilst also not talking down to people who are really experienced. So that is a challenge, but I tend to lean on the side of, okay, let's give more information for beginners, because at the end of the day, experienced users can ignore instructions that they already know if they don't need them. In terms of weighing up how much information to give, I tend to take a lot of feedback from the community. So I've run user tests, and thinking less about the API tokens here and more about the two-factor authentication workflow that we worked on, I ran a whole lot of user tests when we were rolling out those interfaces, with people with different levels of experience, and who were using different tools. For example, for TOTP, to authenticate, some people were using a password manager to create a temporary one-time password.
Some people were using a mobile phone, and so on; there were all sorts of different ways that people were doing that. What we did in the end was put a whole lot of examples into our text of, okay, these are the kinds of applications that you might choose to use. And we made sure that we had a good balance there between the most popular tools, so things like Google Authenticator and Authy floated to the top of the list as things that people were mentioning frequently, but also mentioning the less common use cases, making sure, for example, that we were listing non-proprietary solutions as well, because we know that there are members of the community out there who prefer not to use proprietary software. So it's really just about prioritizing the way that you present the information to cover the most common use case first, and then give the information for the edge cases later. And I would say that's the same when we're talking about WebAuthn, which is two-factor authentication with some kind of device. Lots of people understand that as, oh, I authenticate with a YubiKey, because YubiKey is probably the most popular USB key that you can use with that particular standard. But we do have people out there in the community who are using other things. So what we ended up doing was writing the instructional and the help text in such a way as to emphasize USB keys, mentioning certain brand names, so people were associating what we were talking about with the correct concept, and then also mentioning, hey, there are all these other ways that you can do this as well, by the way. So I think that balance is quite good.
Because generally, as you sort of said, if you're not using the most mainstream solution out there on the market, then you're probably a more familiar and more advanced user anyway. In which case, perhaps the help text or the instructional text is less required for you than it might be for someone who's a beginner who's using something that's fairly mainstream.
William Woodruff
0:23:28
Yeah. And to add on to that for the API tokens work we did: one thing that's pretty interesting about the Python package ecosystem as a whole is that there are a whole lot of third-party clients out there and a whole lot of third-party implementations that talk to these APIs. And so as we were designing the initial API keys approach, we realized that we would probably have to make concessions in terms of authentication semantics to make them fit into all of these third-party clients that expect a username and password instead of just a general-purpose key authentication. As we were working on that, we also realized very quickly that the range of continuous integration setups, as well as other automated systems, constrained our ability to add certain token prefixes and certain sub-usernames. So doing all that work was pretty interesting, because it involved community feedback, as well as trying to guess the happy paths and unhappy paths for common uses of tokens, or sorry, API keys.
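The concession described here, fitting a token into clients that only know about a username and a password, can be sketched as follows. PyPI's present-day documentation has clients send the literal username `__token__` with the token as the password, and issued tokens begin with `pypi-`; the beta under discussion in this episode may have differed. The helper names and the placeholder token below are hypothetical.

```python
import base64

TOKEN_USERNAME = "__token__"  # literal username from PyPI's current docs
TOKEN_PREFIX = "pypi-"        # prefix carried by issued tokens


def looks_like_token(password: str) -> bool:
    """Cheap check a server or client could use to tell tokens from passwords."""
    return password.startswith(TOKEN_PREFIX)


def basic_auth_header(token: str, username: str = TOKEN_USERNAME) -> str:
    """Build the HTTP Basic Authorization value a legacy client would send."""
    credentials = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # "pypi-EXAMPLE" is a placeholder, not a real token.
    print(basic_auth_header("pypi-EXAMPLE"))
```

Reusing the username and password slots means that existing upload tools keep working unchanged, at the cost of reserving a special username and a recognizable token prefix.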
Nicole Harris
0:24:26
One thing I'd also like to add to that is, I don't know exactly when this podcast is going to go out, but in terms of those API tokens, I'm still working on improving the help text and the instructional text. I do need to seek feedback from members of the community as to what tools they are using their API tokens with, so that I can make sure that I am covering all of those, well, as many of those use cases as possible within the help and instructional text. So I suppose that's a bit of a call to action, and I know we'll probably get a chance to make another one by the end of this podcast. But if you're a community member out there, and particularly if you're using a continuous integration service to upload your package to PyPI and you'd like to test out the API tokens, then I'd really like to speak to you, because understanding what your workflow is, and how we can document that in the user interface and give brief but useful instructions, would be very valuable.
Tobias Macey
0:25:32
As we've been discussing here, there is a wide variety of people and patterns in terms of how the PyPI infrastructure is interacted with. I'm curious how that informs and affects your overall workflow and strategy for introducing changes to the platform, and how you validate and, I guess, control the rollout of those changes.
Nicole Harris
0:26:00
Yeah, so I can speak on that a little bit. In terms of releasing new features, a lot of this is actually handled by Sumana from Changeset Consulting, who's our project manager for this contract, and she's worked as a project manager for previous contracts as well. What she does is reach out to the community and do a lot of communication about what the upcoming features are going to be. Then, when we release a new feature, it's marked as a beta, or beta, depending on your accent, feature. So it comes with the warning that this is something that we've shipped, but it's still not certified as perfect and production ready, so set things up with the expectation that things might change. She does communication at that point as well, to reach out to the community to say, hey, we've released this new thing, please go and test it. At that stage I obviously also do some outreach in terms of user testing with people, to see if they've got any problems working through the interfaces. But because of her work in communicating what's going on to the wider community, we also tend to get a lot of tickets opened on GitHub, where people say, hey, I've tried out this thing and it's not quite working, there's a bug, or I'm using a browser that you haven't tested it with, or whatever it is. Then we go and address those particular issues before we move out of the beta period. So it's been quite smooth so far. Yeah, there are bugs, but we expect that to happen within that period, and we've been quite good at turning around and fixing those. And because we're labeling things as beta, people understand that that's part of the process of developing software.
Will, did you want to comment on that at all, in terms of some of the changes that we've had to make based on feedback that's come back from the community?
William Woodruff
0:28:11
Yeah. So I think the big things that come to mind are what you mentioned earlier, with confusion about token versus key in the context of security tokens versus what I originally called API tokens, which we quickly realized confuses users because they associate token with a physical device. Also, more on the development side, I think I mentioned earlier that Warehouse has pretty comprehensive unit tests. So as we've been developing, we've been somewhat fortunate to catch things that otherwise probably would have blown up in production, both through unit tests and through smoke tests by either Sumana or the reviewers on the PSF side, that would be Ernest, Donald, and Dustin.
Tobias Macey
0:28:56
So we've mentioned the API keys and some of the two-factor auth features that have been introduced. I'm curious, what have been some of the other notable features or improvements that you've been involved with?
Nicole Harris
0:29:09
Well, I've been involved since very early, so I'm going to scope my answer to this particular contract, which is the OTF contract. So yeah, as you said, two-factor authentication and API tokens. And then the audit log. From my point of view, there are two sides to that audit log. We have an account audit log, so when you log into your account you can see, okay, when did I last change my password? When did I set up an API key? When did I enable two-factor authentication, et cetera. So we've got that exposed. And then we've also got project audit logs, so things that have happened on an individual project: for example, a new release has been made, or an API key has been created that has permissions on this project. The other thing to mention is that the OTF grant doesn't just cover security. When we made the application through the Python Packaging Working Group, we also received funding to improve both the accessibility and the localization of pypi.org. So some of my work, well, I'm already working on this, but it's going to be my work moving forward as well, is to improve the accessibility of pypi.org for people who are using assistive technologies: for example, people who are using screen readers, people who are limited to just using their keyboard, people who are using high contrast mode, et cetera. And we're also going to be implementing localization, so making it possible for us to translate at least the interface copy on pypi.org into local languages, so French, Chinese, whatever community contributions we get for translations. Those things are within the scope of the OTF contract as well. So that's super exciting, because it's not just about thinking about how we can make the site more secure,
but also how we can make it more universally accessible for people who have different needs and who are in different Python communities around the world.
Tobias Macey
0:31:36
And William, in terms of the attack vectors that you have considered for PyPI, I know that you said in general it was in a fairly good security stance, as far as already having some capacity for mitigating typosquatting attacks. But I'm wondering if there are other attack vectors that you have looked at, or other things that you're concerned about for PyPI, recognizing that I'm not asking you to do any sort of improper disclosure, but just in general, some of the thoughts that you have as far as security and attack vectors for package repositories.
William Woodruff
0:32:14
So the really common attack vectors that you see on package indices and package managers are typosquatting, package takeover, and phishing-based attacks, where someone will try to take over an account or add themselves as a contributor to a project and then push up a malicious version of that project that contains a malware dropper or whatever it needs to be. As I said, fortunately PyPI already had a few pretty good mitigations in place, including for typosquatting, and rate limiting to prevent credential brute forcing. There are some things that are already well-known weaknesses in PyPI's setup. Those include the way that roles are currently structured. At the moment, any account can be added to any project as an owner without that targeted user's consent and, prior to this audit logging work, without a ton of history or logging to record that change. So there are big issues with transparency in package ownership, as well as transparency in changes of package control. I'm actually not positive about this, but I believe currently, if you delete your project on pypi.org, another user can claim that name. If that happens, you can imagine a package reuse attack where a popular package gets deleted by an attacker, and then they become, in a sense, a legitimate owner, because they've actually claimed the project rather than taking it over.
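To give a feel for what a typosquatting mitigation can look like, here is a minimal sketch that flags a proposed project name when it is within a small edit distance of an existing one. This is an assumption-laden illustration, not how PyPI or Warehouse actually does it: the function names, the distance threshold of 2, and the sample names are all hypothetical. The name normalization follows the rule given in PEP 503.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from a
                    current[j - 1] + 1,      # insert into a
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def similar_names(candidate: str, existing: list[str], max_distance: int = 2) -> list[str]:
    """Return existing names that are close to, but not the same as, the candidate."""
    normalized = normalize(candidate)
    return sorted(
        name
        for name in existing
        if 0 < edit_distance(normalized, normalize(name)) <= max_distance
    )


if __name__ == "__main__":
    print(similar_names("reqeusts", ["requests", "numpy"]))
```

A real index would need more than this, for example an allowlist for legitimately similar names, which is part of why the problem stays hard.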
Nicole Harris
0:33:49
Yeah, that's correct to my knowledge, Will. However, they can't release any files that have previously been released, if that makes sense, so it would only be new versions moving forward. But you're right in the sense that they would own the package and have the legitimacy of that package name. With regards to your first comment, I know that we do have a pull request in progress, and I'm hoping that we'll be able to address the issue of giving permission to add collaborators soon.
William Woodruff
0:34:29
Yeah. There's also the more general problem of active scanning of projects, or rather packages, as they get uploaded. That's, as far as I know, an unsolved problem in the world of package maintenance, and I don't think it's something that PyPI could fairly be asked to solve.
Nicole Harris
0:34:45
Will, what do you mean by that? You said active scanning.
William Woodruff
0:34:47
Yeah. So imagine scanning for common indicators of compromise, or common indicators that a package is malicious, for some fuzzy definition of malicious, because you can imagine, say, a research package that contains malware samples, or what have you.
Tobias Macey
0:35:05
And particularly given the flexibility of Python and the ability to obfuscate the actual intent of the code, it's definitely a non-trivial and potentially NP-complete problem to definitively determine whether or not a package is malicious or has nefarious intent.
William Woodruff
0:35:24
Yeah, this is a problem that some of the most locked-down platforms in the world struggle with. Apple, with their App Store, struggles with static analysis immensely. So I think it would be completely unreasonable to expect a dynamic language on a community-maintained index to solve this problem.
Tobias Macey
0:35:40
So in terms of your overall experience of working on and with the PyPI platform, and the community of users who rely on it, what have been some of the most interesting or challenging or unexpected aspects of that work?
William Woodruff
0:35:55
I can try answering that. I've done community management before, some of it in my role as a maintainer, some of it on my own open source projects, as well as the open source work that Trail of Bits does. But it is different every time. Especially when dealing with feature changes that affect potentially tens of thousands of people, it can be challenging to get people to see your side of things, particularly when it comes to things like event logs. Very understandably, users are wary of any sort of feature that records their IP address or records security-salient events about their actions. So it can be difficult to justify those events to users who don't necessarily see the value of those recordings from a security perspective. Coming up with a compromise where we're able to record enough information to take action, while also preserving their privacy and mitigating their concerns, can be a challenge, especially for countries where GDPR compliance is key.
Nicole Harris
0:37:05
I think on my side, one of the issues with doing design in the open on open source community projects is that the work is very, very visible, and it is really hard to satisfy everybody. Everybody's using different browsers, everybody has different use cases, and we don't have any full-time resources looking at the user experience of PyPI. It's just me, and the hours that I have, either in my spare time when I'm working as a volunteer or, on this contract, my contracted hours. So it has been challenging to try to satisfy everyone and make everybody happy. That was probably more challenging when we had the transition from the old pypi.org, sorry, the old PyPI code base, to the new pypi.org, when there were a lot of changes, which was disruptive to people's existing workflows. On the other hand, there were a lot of people who were like, yay, PyPI has moved into the modern era, and it works on mobile. So there are two sides to every coin. What I've tried to do in terms of my work with PyPI is make sure that when decisions are made, they're really backed by user research, user feedback, or user testing. So it's not just a case of me saying, well, it's my opinion that it should be like this, and therefore my opinion is most important, but actually being able to show people: I looked into this, I looked at prior art, or I spoke to people within the community, and this is the reason that this decision has been made. When you actually articulate the reason and show people that you've thought about this more than just, this is my opinion, then people are really responsive to that. So I think that's been quite a positive experience for me in interacting with the Python community, who are, as a whole, a very friendly bunch of people.
Tobias Macey
0:39:15
In terms of the future work that you either have planned for your existing contract, or that you have identified as potential improvements to the platform in general, what do you think are most interesting or most notable? And what are some of the ways that listeners and the broader community can get involved and help out with your efforts, and with the overall work needed to keep the PyPI platform healthy and viable for the long run?
Nicole Harris
0:39:45
So I can address that in terms of the current contract. Most of the security work is done now. There are a few things that we need to wrap up, and as I mentioned, I would really like to talk to anybody who's using CI to upload to PyPI, because that would be really helpful for me in terms of making sure that the interface is working for those use cases. In terms of the rest of this contract, as I mentioned earlier, we have accessibility and localization, which are the last two subjects that we need to address. In terms of accessibility, I've also put a call out recently: I'd really like to talk to any members of the Python community who are interacting with websites using assistive technologies. So if you're a user who's online using a screen reader, I would love to speak to you. Same if you're someone who's limited to using a keyboard, or if you're using high contrast mode, or if you're zooming in your browser a lot because of poor eyesight. The reason that I would really like to speak to people who are using the web in those ways is that we're doing an audit against the WCAG 2.0 standards, which is the accessibility standard. But just being able to tick the box isn't, in my view, enough. Obviously we want to check the box and say, yes, we're compliant. But actually being able to test the interface with people who are using assistive technology, and seeing that it's working for them in real life with real-life use cases, is super important as well. So it's not really enough just to check the boxes; we really need to talk to people about how they're using the site as well.
On the localization side, and I think there'll be more communication that will come out about this later as we get into that milestone, we are going to be looking for people to help us actually translate the interface copy into different languages. So once we've got the technical implementation done, we're going to want to get people to translate it into whatever language they'd like to translate it into, barring Arabic and Hebrew and any right-to-left languages, because that is outside the scope of the current project.
William Woodruff
0:42:15
Yeah, also on the security side of things, there are things that are out of scope of the current contract but that I believe are planned for a future iteration on the Warehouse code base. For API keys, the implementation that we went with is based on security tokens called macaroons. One of the interesting things about macaroons is that they have embedded in them something called a caveat language, which allows for a rich description of the permissions associated with each token. Currently we have a version field in our caveat language that allows for those permissions to be iterated on and modified, to allow for really rich interactions with the authentication system. So I think in the future the plan is to add tokens that expire after exactly one use, or are only allowed between certain hours of the day, or can only be used from a certain domain or certain authenticated IP, or things like that. I think we've put out, on the Warehouse issue tracker, a request for help with that.
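To make the caveat idea above concrete, here is a minimal sketch of a macaroon-style token built with only the Python standard library. It is not Warehouse's implementation and does not use a real macaroon library; the caveat fields (`version`, `projects`, `not_after`) and all function names are hypothetical. It illustrates the core property: each caveat is folded into an HMAC chain, so a token holder can add restrictions but cannot remove them without invalidating the signature.

```python
import hashlib
import hmac
import json


def _chain(key: bytes, data: bytes) -> bytes:
    """One link of the signature chain: HMAC-SHA256 of data under key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Create a token with no caveats, signed with the server's root key."""
    signature = _chain(root_key, identifier.encode("utf-8"))
    return {"id": identifier, "caveats": [], "sig": signature.hex()}


def add_caveat(token: dict, caveat: dict) -> dict:
    """Attach a restriction; only the current signature is needed, not the root key."""
    encoded = json.dumps(caveat, sort_keys=True)
    signature = _chain(bytes.fromhex(token["sig"]), encoded.encode("utf-8"))
    return {"id": token["id"], "caveats": token["caveats"] + [encoded], "sig": signature.hex()}


def _caveat_ok(caveat: dict, context: dict) -> bool:
    """Check one hypothetical version-1 caveat against the request context."""
    if caveat.get("version") != 1:
        return False  # unknown caveat versions are rejected, not ignored
    projects = caveat.get("projects")
    if projects is not None and context["project"] not in projects:
        return False
    not_after = caveat.get("not_after")
    if not_after is not None and context["now"] > not_after:
        return False
    return True


def verify(root_key: bytes, token: dict, context: dict) -> bool:
    """Recompute the chain from the root key, then evaluate every caveat."""
    signature = _chain(root_key, token["id"].encode("utf-8"))
    for encoded in token["caveats"]:
        signature = _chain(signature, encoded.encode("utf-8"))
    if not hmac.compare_digest(signature.hex(), token["sig"]):
        return False
    return all(_caveat_ok(json.loads(c), context) for c in token["caveats"])
```

In this sketch the `version` field plays the role described above: a verifier can refuse caveats it does not understand, which leaves room to introduce new kinds of restriction later, such as single-use or time-of-day limits.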
Nicole Harris
0:43:16
Yeah, I should mention here as well that if any of your listeners are interested in contributing to the Warehouse project, the issue tracker is in fairly good shape and fairly well managed. We do tag issues with needs discussion or help required, so going onto the issue tracker and having a look at what discussions are happening is a useful way of finding out where you could help make PyPI more sustainable in terms of the feature development that we're currently working on. The other thing I'd like to mention, and I think what Will already said today reinforces this, is that it's a really nice code base to work on: pretty easy to set up with Docker and Docker Compose, with really great unit test coverage. So if you're looking to make an open source contribution, I think it's a good candidate. And we also welcome people who are making their first contribution to open source, so it's not just your more experienced listeners who can make contributions to the Warehouse code base. We have plenty of tickets tagged with good first issue, which are specifically for people who are looking to make more minor contributions to ease their way into open source.
William Woodruff
0:44:43
Yeah, I do want to hammer that point: it really is a nice code base. I've worked on a lot of both open source and proprietary code bases written in a combination of Python 2 and Python 3, or that are now Python 3 but were migrated from Python 2, with very bespoke setups and environments that were clearly developed from an engineer's desk somewhere inside an office. Warehouse, fortunately, is not one of those code bases.
Tobias Macey
0:45:08
And is it worth digging more into the actual funding behind this work, how that's structured, and some of the overall sustainability efforts to be able to maintain and upgrade the PyPI and Warehouse platform?
Nicole Harris
0:45:22
Yeah, so I can talk about that. As I mentioned earlier, I'm a member of the Python Packaging Working Group, which raises money not just for PyPI but for any packaging-related project. It was through that that we got this grant from the Open Technology Fund, or OTF, to actually be able to do this work. It's the second major grant that we've got for PyPI. You might be familiar with the fact that we were granted a MOSS grant, a Mozilla Open Source Support grant, last year, and that was to migrate from the old version of PyPI to the new Warehouse code base and to retire that old code base. So far, through the Packaging Working Group, we've had two fairly substantial grants, which have allowed us to really improve the package index. That working group continues to work on grant applications for different projects, not just PyPI, but also many of the tools that interact with PyPI, such as pip. So we're hoping that in the next year or so we will have more money coming in from those applications, to be able to fund more sustainable development for the Python packaging ecosystem in general. The other thing to mention is that we are very fortunate with PyPI to have a number of great sponsors who actually give us the infrastructure for free. I don't have the data right now in terms of how much that's worth, but it's certainly millions per year that it costs to actually run the Python Package Index. A lot of that is borne by our CDN, Fastly, whose donation to us is actually quite enormous. So in terms of sustainability, we have a mixture of the funding coming through from grant applications, and these different companies giving us their services to enable us to keep the service up.
The other thing that we appreciate is that we have a donation page on pypi.org, where members of the community can donate towards the Python Packaging Working Group, so that we can have a budget to pay for maintenance and improvements to both PyPI and other projects. An ideal scenario in the future is that we would have enough recurring donations from the community that we would be able to set up a more reliable, either part-time or full-time, situation where we have people working on packaging as their job, because at the moment we really have mostly just contracts that come and go depending on the money that comes in.
Tobias Macey
0:48:18
Are there any other aspects of your current efforts on the Pi Pi infrastructure, or any other aspects of the overall platform that we didn't discuss yet that you'd like to cover before we close out the show?
Nicole Harris
0:48:30
Yeah, I can't think of anything. Can you think of anything, Will?
William Woodruff
0:48:33
Oh, no, not in particular. I mean, there are interesting things about WebAuthn and TOTP that I could go into, but they'd be a bit in the weeds.
Tobias Macey
0:48:42
Well, for anybody who does want to dig deeper into that, if you have any specific references that you found useful, I can add them to the show notes. And for anyone who wants to follow up with either of you, or get in touch and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. So with that, I'll move into the picks, and this week I'm going to choose the show The Expanse. I started watching it recently, and I've gotten through the first season and into the second. It's a very interesting and well done sci-fi series chronicling some dramatic events far in the future, where humans have gone beyond Earth and started populating other areas of the solar system. So it's an interesting and well put together show with a lot of good environmental details, such as the Creole language that people speak further out in the asteroid belt. If you're looking for something new to watch, I recommend that. And so with that, I'll pass it to you. Do you have any picks this week?
William Woodruff
0:49:37
Sure. Yeah, I don't know if I have a media pick. I'm not actually normally a big nonfiction person, but I've been reading a biography of Abraham Lincoln by Carl Sandburg, who's a well-known American poet. So it's a little bit out of, I think, not his expertise, but his field of renown. But it's been a pretty interesting read so far. It's actually a surprisingly nuanced biography of his life, in the sense that it goes through both the political and military failures that he encountered. And it's been interesting to read because you learn this stuff in, like, 10th grade in American high schools, but then it gets dropped.
Nicole Harris
0:50:14
I do have an answer.
0:50:17
So, last week or the week before, I watched a documentary on Netflix called The Great Hack, which was particularly interesting to me because I live in the UK, and it talked about Brexit and Cambridge Analytica and what's been happening. I haven't followed that probably as closely as I should have. So yeah, for anybody out there who's interested in documentaries, it's certainly very interesting and very topical at the moment with regards to the current political climate.
Tobias Macey
0:50:50
Well, thank you both very much for taking the time today to join me and discuss your work on the PyPI platform and infrastructure, and some of the ways that it will improve the overall viability of it in the long term and improve the available workflows for people using it. I appreciate all of your efforts on that front, and I hope you enjoy the rest of your day. Thank you.
Nicole Harris
0:51:11
Thank you.

Learning To Program In Python With CodeGrades - Episode 224

Summary

With the increasing role of software in our world there has been an accompanying focus on teaching people to program. There are numerous approaches that have been attempted to achieve this goal with varying levels of success. Nicholas Tollervey has begun a new effort that blends the approach adopted by musicians and martial artists that uses a series of grades to provide recognition for the achievements of students. In this episode he explains how he has structured the study groups, syllabus, and evaluations to help learners build projects based on their interests and guide their own education while incorporating useful skills that are necessary for a career in software. If you are interested in learning to program, teach others, or act as a mentor then give this a listen and then get in touch with Nicholas to help make this endeavor a success.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today Nicholas Tollervey is back to talk about his work on CodeGrades, a new effort that he is building to blend his backgrounds in music, education, and software to help teach kids of all ages how to program.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what CodeGrades is and what motivated you to start this project?
    • How does it differ from other approaches to teaching software development that you have encountered?
    • Is there a particular age or level of background knowledge that you are targeting with the curriculum that you are developing?
  • What are the criteria that you are measuring against and how does that criteria change as you progress in grade levels?
  • For someone who completes the full set of levels, what level of capability would you expect them to have as a developer?
  • Given your affiliation with the Python community it is understandable that you would target that language initially. What would be involved in adapting the curriculum, mentorship, and assessments to other languages?
    • In what other ways can this idea and platform be adapted to accommodate other engineering skills? (e.g. system administration, statistics, graphic design, etc.)
  • What interesting/exciting/unexpected outcomes and lessons have you found while iterating on this idea?
  • For engineers who would like to be involved in the CodeGrades platform, how can they contribute?
  • What challenges do you anticipate as you continue to develop the curriculum and mentor networks?
  • How do you envision the future of CodeGrades taking shape in the medium to long term?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Protecting The Future Of Python By Hunting Black Swans - Episode 221

Summary

The Python language has seen exponential growth in popularity and usage over the past decade. This has been driven by industry trends such as the rise of data science and the continued growth of complex web applications. It is easy to think that there is no threat to the continued health of Python, its ecosystem, and its community, but there are always outside factors that may pose a threat in the long term. In this episode Russell Keith-Magee reprises his keynote from PyCon US in 2019 and shares his thoughts on potential black swan events and what we can do as engineers and as a community to guard against them.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • To grow your professional network and find opportunities with the startups that are changing the world, Angel List is the place to go. Go to pythonpodcast.com/angel to sign up today.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Russell Keith-Magee about potential black swans for the Python language, ecosystem, and community and what we can do about them

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what a Black Swan is in the context of our conversation?
  • You were the opening keynote for PyCon this year, where you talked about some of the potential challenges facing Python. What motivated you to choose this topic for your presentation?
  • What effect did your talk have on the overall tone and focus of the conversations that you experienced during the rest of the conference?
    • What were some of the most notable or memorable reactions or pieces of feedback that you heard?
  • What are the biggest potential risks for the Python ecosystem that you have identified or discussed with others?
  • What is your overall sentiment about the potential for the future of Python?
  • As developers and technologists, does it really matter if Python continues to be a viable language?
  • What is your personal wish list of new capabilities or new directions for the future of the Python language and ecosystem?
  • For listeners to this podcast and members of the Python community, what are some of the ways that we can contribute to the long-term success of the language?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Behind The Scenes At The Python Software Foundation - Episode 217

Summary

One of the secrets of the success of Python the language is the tireless efforts of the people who work with and for the Python Software Foundation. They have made it their mission to ensure the continued growth and success of the language and its community. In this episode Ewa Jodlowska, the executive director of the PSF, discusses the history of the foundation, the services and support that they provide to the community and language, and how you can help them succeed in their mission.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Podcast.init listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space they have the Beginners Guide To Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. To help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Ewa Jodlowska about the Python Software Foundation and the role that it serves in the language and community

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what the PSF is for anyone who isn’t familiar with it?
    • How did you get involved with the PSF and what is your current role?
  • What was the motivation for creating the PSF?
  • What are the primary responsibilities of the PSF?
    • How has the scope and scale of the responsibilities for the PSF shifted in the years since its foundation?
  • What is the relationship between the PSF and the language core developers?
  • What are some reasons that someone would want to become a member of the PSF and what is involved in gaining membership?
  • What are the challenges confronted by you and the PSF, currently and in the recent past?
  • What are you most worried about and most proud of in the PSF, the core language, or the community?
  • What challenges or changes do you foresee for the PSF in the near to medium future?
  • What are some of the most interesting/unexpected/challenging lessons that you have learned while working with the PSF?
  • How are the PSF and the PSU (Python Secret Underground) related?
  • Outside of the PSF, how can the community contribute to the health and longevity of the language, its ecosystem, and its community?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Algorithmic Trading In Python Using Open Tools And Open Data - Episode 216

Summary

Algorithmic trading is a field that has grown in recent years due to the availability of cheap computing and platforms that grant access to historical financial data. QuantConnect is a business that has focused on community engagement and open data access to grant opportunities for learning and growth to their users. In this episode CEO Jared Broad and senior engineer Alex Catarino explain how they have built an open source engine for testing and running algorithmic trading strategies in multiple languages, the challenges of collecting and serving current and historical financial data, and how they provide training and opportunity to their community members. If you are curious about the financial industry and want to try it out for yourself then be sure to listen to this episode and experiment with the QuantConnect platform for free.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Podcast.init listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • The Python Software Foundation is the lifeblood of the community, supporting all of us who want to run workshops and conferences, run development sprints or meetups, and ensuring that PyCon is a success every year. They have extended the deadline for their 2019 fundraiser until June 30th and they need help to make sure they reach their goal. Go to pythonpodcast.com/psf today to make a donation. If you’re listening to this after June 30th of 2019 then consider making a donation anyway!
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Jared Broad and Alex Catarino about QuantConnect, a platform for building and testing algorithmic trading strategies on open data and cloud resources

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what QuantConnect is and how the business got started?
  • What is your mission for the company?
  • I know that there are a few other entrants in this market. Can you briefly outline how you compare to the other platforms and maybe characterize the state of the industry?
  • What are the main ways that you and your customers use Python?
  • For someone who is new to the space can you talk through what is involved in writing and testing a trading algorithm?
  • Can you talk through how QuantConnect itself is architected and some of the products and components that comprise your overall platform?
  • I noticed that your trading engine is open source. What was your motivation for making that freely available and how has it influenced your design and development of the project?
  • I know that the core product is built in C# and offers a bridge to Python. Can you talk through how that is implemented?
    • How do you address latency and performance when bridging those two runtimes given the time sensitivity of the problem domain?
  • What are the benefits of using Python for algorithmic trading and what are its shortcomings?
    • How useful and practical are machine learning techniques in this domain?
  • Can you also talk through what Alpha Streams is, including what makes it unique and how it benefits the users of your platform?
  • I appreciate the work that you are doing to foster a community around your platform. What are your strategies for building and supporting that interaction and how does it play into your product design?
  • What are the categories of users who tend to join and engage with your community?
  • What are some of the most interesting, innovative, or unexpected tactics that you have seen your users employ?
  • For someone who is interested in getting started on QuantConnect what is the onboarding process like?
    • What are some resources that you would recommend for someone who is interested in digging deeper into this domain?
  • What are the trends in quantitative finance and algorithmic trading that you find most exciting and most concerning?
  • What do you have planned for the future of QuantConnect?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Hardware Hacking Made Easy With CircuitPython - Episode 212

Summary

Learning to program can be a frustrating process, because even the simplest code relies on a complex stack of other moving pieces to function. When working with a microcontroller you are in full control of everything so there are fewer concepts that need to be understood in order to build a functioning project. CircuitPython is a platform for beginner developers that provides easy to use abstractions for working with hardware devices. In this episode Scott Shawcroft explains how the project got started, how it relates to MicroPython, some of the cool ways that it is being used, and how you can get started with it today. If you are interested in playing with low cost devices without having to learn and use C then give this a listen and start tinkering!

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Scott Shawcroft about CircuitPython, the easiest way to program microcontrollers

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what CircuitPython is and how the project got started?
    • I understand that you work at Adafruit and I know that a number of their products support CircuitPython. What other runtimes do you support?
  • Microcontrollers have typically been the domain of C because of the resource and performance constraints. What are the benefits of using Python to program hardware devices?
  • With the wide availability of powerful computing platforms, what are the benefits of experimenting with microcontrollers and their peripherals?
  • I understand that CircuitPython is a friendly fork of MicroPython. What have you changed in your version?
    • How do you structure your development to avoid conflicts with the upstream project?
    • What are some changes that you have contributed back to MicroPython?
  • What are some of the features of CircuitPython that make it easier for users to interact with sensors, motors, etc.?
  • CircuitPython provides an easy on-ramp for experimenting with hardware projects. Is there a point where a user will outgrow it and need to move to a different language or framework?
  • What are some of the most interesting/innovative/unexpected projects that you have seen people build using CircuitPython?
    • Are there any cases of someone building and shipping a production grade project in CircuitPython?
  • What have been some of the most interesting/challenging/unexpected aspects of building and maintaining CircuitPython?
  • What is in store for the future of the project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Hacking The Government With The USDS - Episode 210

Summary

The U.S. government has a vast quantity of software projects across the various agencies, and many of them would benefit from a modern approach to development and deployment. The U.S. Digital Services Agency has been tasked with making that happen. In this episode the current director of engineering for the USDS, David Holmes, explains how the agency operates, how they are using Python in their efforts to provide the greatest good to the largest number of people, and why you might want to get involved. Even if you don’t live in the U.S.A. this conversation is worth listening to so you can see an interesting model of how to improve government services for everyone.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space they have the Beginners Guide To Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. To help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing David Holmes about his work at the US Digital Services organization

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what the USDS is and how you got involved with it?
  • The terminology that is used around "Tours of Service" is interesting. Can you explain what that entails?
    • Relocation
    • What if you have a house and career?
  • Can you explain the model of how the USDS works?
    • What is involved in staffing a new project?
    • What is your typical toolkit, and how does that vary with the specific departments that you are working with?
  • What are some of the most interesting projects that you and the team at USDS have worked on?
  • What are some of the most challenging projects that you have been involved with?
  • What are some projects that you hope to be asked to work on?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Exploring Indico: A Full Featured Event Management Platform - Episode 208

Summary

Managing an event is rife with inherent complexity that scales as you move from scheduling a meeting to organizing a conference. Indico is a platform built at CERN to handle their efforts to organize events such as the Computing in High Energy Physics (CHEP) conference, and now it has grown to manage booking of meeting rooms. In this episode Adrian Mönnich, core developer on the Indico project, explains how it is architected to facilitate this use case, how it has evolved since its first incarnation two decades ago, and what he has learned while working on it. The Indico platform is definitely a feature rich and mature platform that is worth considering if you are responsible for organizing a conference or need a room booking system for your office.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space they have the Beginners Guide To Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. To help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Adrian Mönnich about Indico, the effortless open-source tool for event organisation, archival and collaboration

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Indico is and how the project got started?
    • What are some other projects which target a similar use case and what were they lacking that led to Indico being necessary?
  • Can you talk through an example workflow for setting up and managing an event in Indico?
    • How does the lifecycle change when working with larger events, such as PyCon?
  • Can you describe how Indico is architected and how its design has evolved since it was first built?
    • What are some of the most complex or challenging portions of Indico to implement and maintain?
  • There are a lot of areas for exercising constraint resolution algorithms. Can you talk through some of the business logic of how that operates?
  • Most of Indico is highly configurable and flexible. How do you approach managing sane defaults to prevent users getting overwhelmed when onboarding?
    • What is your approach to testing given how complex the project is?
  • What are some of the most interesting or unexpected ways that you have seen Indico used?
  • What are some of the most interesting/unexpected lessons that you have learned in the process of building Indico?
  • What do you have planned for the future of the project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

A Quick Python Check-in With Naomi Ceder - Episode 204

Summary

Naomi Ceder was fortunate enough to learn Python from Guido himself. Since then she has contributed books, code, and mentorship to the community. Currently she serves as the chair of the board to the Python Software Foundation, leads an engineering team, and has recently completed a new draft of the Quick Python Book. In this episode she shares her story, including a discussion of her experience as a technical author and a detailed account of the role that the PSF plays in supporting and growing the community.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Check out the Practical AI podcast from our friends at Changelog Media to learn and stay up to date with what’s happening in AI
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Your host as usual is Tobias Macey and today I’m interviewing Naomi Ceder about her career and contributions in the Python community

Interview

  • Introductions
  • How did you get introduced to Python?
  • How are you using Python in your current day-to-day?
  • You have been working with Python for a long time at this point, and you have become very involved in supporting and growing the community. What is your motivation for dedicating so much of your time and energy into work that isn’t directly related to paying the bills?
  • You have been the chair of the PSF for a few years now. What are your responsibilities in that position?
  • What do you find to be the most under-rated, misunderstood, or overlooked activities of the PSF?
    • How much of the success of the Python language and its community can be attributed to the presence and support of the PSF?
  • In addition to the work you do with the PSF, other community activities, and your day job, you have also written the 2nd and 3rd editions of the Quick Python Book. Can you give a synopsis of what the book covers and the audience that it is intended for?
  • In the process of writing the book and updating it between revisions, what are some of the features of the language or standard library that you discovered or learned more about which you have been able to use in your work?
  • What are some of the other language communities that you have been involved with and what lessons have you learned from them that you would like to see reflected in Python?
  • What are some of the other projects that you have been involved with that you are most proud of, whether technical or otherwise?
  • What are you most excited about in the near to medium future?

Keep In Touch

Quick Python Book

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA