Libraries

Pandas Extension Arrays with Tom Augspurger - Episode 164

Summary

Pandas is a swiss army knife for data processing in Python but it has long been difficult to customize. In the latest release there is now an extension interface for adding custom data types with namespaced APIs. This allows for building and combining domain specific use cases and alternative storage mechanisms. In this episode Tom Augspurger describes how the new ExtensionArray works, how it came to be, and how you can start building your own extensions today.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases download GoCD, the open source continous delivery server built by Thoughworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Tom Augspurger about the extension interface for Pandas data frames and the use cases that it enables

Interview

  • Introductions
  • How did you get introduced to Python?
  • Most people are familiar with Pandas, but can you describe at a high level the new extension interface?
    • What is the story behind the implementation of this functionality?
    • Prior to this interface what was the option for anyone who wanted to extend Pandas?
  • What are some of the new data types that are available as external packages?
    • What are some of the unique use cases that they enable?
  • How is the new interface implemented within Pandas?
  • What were the most challenging or difficult aspects of building this new functionality?
  • What are some of the more interesting possibilities that you are aware of for new extension types?
  • What are the limitations of the interface for libraries that add new array functionality?
  • What is the next major change or improvement that you would like to add in Pandas?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Asking Questions From Data Using Active Learning with Tivadar Danka - Episode 162

Summary

One of the challenges of machine learning is obtaining large enough volumes of well labelled data. An approach to mitigate the effort required for labelling data sets is active learning, in which outliers are identified and labelled by domain experts. In this episode Tivadar Danka describes how he built modAL to bring active learning to bioinformatics. He is using it for doing human in the loop training of models to detect cell phenotypes with massive unlabelled datasets. He explains how the library works, how he designed it to be modular for a broad set of use cases, and how you can use it for training models of your own.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases download GoCD, the open source continous delivery server built by Thoughworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Tivadar Danka about modAL, a modular active learning framework for Python3

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is active learning?
    • How does it differ from other approaches to machine learning?
  • What is modAL and what was your motivation for starting the project?
  • For someone who is using modAL, what does a typical workflow look like to train their models?
  • How do you avoid oversampling and causing the human in the loop to become overwhelmed with labeling requirements?
  • What are the most challenging aspects of building and using modAL?
  • What do you have planned for the future of modAL?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Synthetic Data Generation Using Mimesis with Nikita Sobolev - Episode 155

Summary

Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases download GoCD, the open source continous delivery server built by Thoughworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • Your host as usual is Tobias Macey and today I’m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is mimesis and how does it compare to other projects such as faker and factory_boy?
    • What was the motivation for creating it?
  • One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?
  • What are the built in mechanisms for generating data?
    • What options do users have for customizing the types of data that can get generated?
  • What are some of the most complicated providers to write and maintain?
  • What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial?
    • How would you use Mimesis to anonymize data from a production environment to be used for testing?
  • What are the most challenging aspects of maintaining the Mimesis project?
  • What are some of the plans that you have for the future of Mimesis?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Salabim: Logistics Simulation with Ruud van der Ham - Episode 151

Summary

Determining the best way to manage the capacity and flow of goods through a system is a complicated issue and can be exceedingly expensive to get wrong. Rather than experimenting with the physical objects to determine the optimal algorithm for managing the logistics of everything from global shipping lanes to your local bank, it is better to do that analysis in a simulation. Ruud van der Ham has been working in this area for the majority of his professional life at the Dutch port of Rotterdam. Using his acquired domain knowledge he wrote Salabim as a library to assist others in writing detailed simulations of their own and make logistical analysis of real world systems accessible to anyone with a Python interpreter.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Ruud van der Ham about Salabim, a Python library for conducting discrete event simulations

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what Discrete Event Simulation is and how Salabim helps with that?
    • Can you explain how you chose the name?
  • What was your motivation for creating Salabim and how does it compare to other tools for discrete event simulation?
  • How does discrete event simulation compare with state machines?
  • How is Salabim implemented and how has the design evolved over the time that you have been working on it?
  • I understand that you have done a majority of Salabim was written on an iPad. Can you speak about why you have chosen that as your development environment and your experience working in that manner?
  • What are some examples of the types of models that you can model with Salabim?
    • What would an implementation of one of these models look like for someone using Salabim?
  • What options does a user have to verify the accuracy of a simulation created with Salabim?
  • One of the nice aspects of Salabim is the fact that it provides a visual output as a simulation runs. Can you describe the workflow for someone who wants to use Salabim for modeling and visualizing a system?
  • At what point does a system become too complex to encapsulate in a simulation and what techniques can you use to modularize it to make a simulation useful?
  • When is Salabim not the right tool to use and what would you suggest for people who find themselves in that situation?
  • What have been some of the most complicated or difficult aspects of building and maintaining Salabim?
  • What are some of the new features or improvements that you have planned for the future of Salabim?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Laboratory: Safer Refactoring with Joe Alcorn - Episode 150

Summary

Every piece of software that has been around long enough ends up with some piece of it that needs to be redesigned and refactored. Often the code that needs to be updated is part of the critical path through the system, increasing the risks associated with any change. One way around this problem is to compare the results of the new code against the existing logic to ensure that you aren’t introducing regressions. This week Joe Alcorn shares his work on Laboratory, how the engineers at GitHub inspired him to create it as an analog to the Scientist gem, and how he is using it for his day job.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • A brief announcement before we start the show:
    • If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to podcastinit.com/odsc-east-2018 and register.
  • Your host as usual is Tobias Macey and today I’m interviewing Joe Alcorn about using Laboratory as a safety net for your refactoring.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start be explaining what Laboratory is and what motivated you to start the project?
  • How much of the design and implementation were directly inspired by the Scientist project from GitHub and how much of it did you have to figure out from scratch due to differences in the target languages?
  • What have been some of the most challenging aspects of building and maintaining Laboratory, and have you had any opportunities to use it on itself?
  • For someone who would like to use Laboratory in their project, what does the workflow look like and what potential pitfalls should they watch out for?
  • In the documentation you mention that portions of code that perform I/O and create side effects should be avoided. Have you found any strategies to allow for stubbing out the external interactions while still executing the rest of the logic?
  • How do you keep track of the results for active experiments and what sort of reporting is available?
  • What are some examples of the types of routines that would be good candidates for conducting an experiment?
  • What are some of the most complicated or difficult pieces of code that you have refactored with the help of Laboratory?
  • Given the fact that Laboratory is intended to be run in production and provide a certain measure of safety, what methods do you use to ensure that users of the library will not suffer from a drastic increase in overhead or unintended aberrations in the functionality of their software?
  • Are there any new features or improvements that you have planned for future releases of Laboratory?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

PyRay: Pure Python 3D Rendering with Rohit Pandey - Episode 147

Summary

Using a rendering library can be a difficult task due to dependency issues and complicated APIs. Rohit Pandey wrote PyRay to address these issues in a pure Python library. In this episode he explains how he uses it to gain a more thorough understanding of mathematical models, how it compares to other options, and how you can use it for creating your own videos and GIFs.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • A few announcements before we start the show:
    • There’s still time to get your tickets for PyCon Colombia, happening February 9th and 10th. Go to pycon.co to learn more and register.
    • There is also still time to register for the O’Reilly Software Architecture Conference in New York. Use the link podcastinit.com/sacon-new-york to register and save 20%
    • If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to podcastinit.com/odsc-east-2018 and register.
  • Your host as usual is Tobias Macey and today I’m interviewing Rohit Pandey about PyRay, a 3d rendering library written completely in python

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what PyRay is and what motivated you to create it?
    [rohit] PyRay is an open source library written completely in Python that let’s you render three and higher dimensional objects and scenes. Development on it has been ongoing and new features have so far come about from videos for my Youtube channel.
  • What does the internal architecture of PyRay look like and how has that design evolved since you first started working on it?
  • What capabilities are unlocked by having a pure Python rendering library which would otherwise be impractical or impossible for Python developers to do with existing options?
    [rohit] Having a pure Python library makes it accessible with minimal fixed cost to Python users. The tradeoff is you lose on speed, but for many applications that isn’t an issue. I haven’t seen a library coded completely in Python that let’s you manipulate 3d and higher dimensional objects. The core usecase right now is Mathematical artwork. Google geometric gifs and you’ll see some fascinating, mesmerizing results. But those are created for the most part using tools that are not Python. Which is a pity since Python has a very extensive library of Mathematical functions.
  • What have been some of the most challenging aspects of building and maintaining PyRay?
    [rohit] 3d objects – getting mesh plots. I have to develop routines from scratch for almost everything – shading objects, etc. Animated routines for characters.

  • What are some of the most interesting or unexpected uses of PyRay that you are aware of?
    [rohit] Physical simulations. Ex: Testing if a solid is a fair die, getting lower bounds for space packing efficiencies of solids. Creating interactive demos where a user can draw to provide input.

  • For someone who wanted to contribute to PyRay are there any particular skills or experience that would be most helpful?
    Basic linear algebra and python
  • What are some of the features or improvements that you have planned for the future of PyRay?

Keep In Touch

pyray repo – https://github.com/ryu577/pyray
Email
GitHub
LinkedIn

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

MonkeyType with Carl Meyer and Matt Page - Episode 146

Summary

One of the draws of Python is how dynamic and flexible the language can be. Sometimes, that flexibility can be problematic if the format of variables at various parts of your program is unclear or the descriptions are inaccurate. The growing middle ground is to use type annotations as a way of providing some verification of the format of data as it flows through your application and enforcing gradual typing. To make it simpler to get started with type hinting, Carl Meyer and Matt Page, along with other engineers at Instagram, created MonkeyType to analyze your code as it runs and generate the type annotations. In this episode they explain how that process works, how it has helped them reduce bugs in their code, and how you can start using it today.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • A few announcements before we start the show:
    • There’s still time to get your tickets for PyCon Colombia, happening February 9th and 10th. Go to pycon.co to learn more and register.
    • There is also still time to register for the O’Reilly Software Architecture Conference in New York Feb 25-28. Use the link podcastinit.com/sacon-new-york to register and save 20%
    • If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to podcastinit.com/odsc-east-2018 and register.
  • Your host as usual is Tobias Macey and today I’m interviewing Carl Meyer and Matt Page about MonkeyType, a system to collect type information at runtime for your Python 3 code

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is MonkeyType and how did the project get started?
  • How much overhead does the MonkeyType tracing add to the running system, and what techniques have you used to minimize the impact on production systems?
  • Given that the type information is collected from call traces at runtime, and some functions may accept multiple different types for the same arguments (e.g. add), do you have any logic that will allow for combining that information into a higher-order type that gets set as the annotation?
  • How does MonkeyType function internally and how has the implementation evolved over the time that you have been working on it?
  • Once the type annotations are present in your code base, what other tooling are you using to take advantage of that information?
  • It seems as though using MonkeyType to trace your running production systems could be a way to inadvertantly identify dead sections of code that aren’t being executed. Have you investigated ways to use the collected type information perform that analysis?
  • What have been some of the most challenging aspects of building, using, and maintaining MonkeyType?
  • What have been some of the most interesting or noteworthy things that you have learned in the process of working on and with MonkeyType?
  • What have you found to be the most useful and most problematic aspects of the typing capabilities provided in recent versions of Python?
  • For someone who wants to start using MonkeyType today, what is involved in getting it set up and using it in a new or existing codebase?
  • What features or improvements do you have planned for future releases of MonkeyType?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Surprise! Recommendation Algorithms with Nicolas Hug - Episode 135

Summary

A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library and his plans for future development on it.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 GBit network in all of their datacenters.
  • If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcatinit.com/gocd. Professional support and enterprise plugins are available for added piece of mind.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected])
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Surprise and what was your motivation for creating it?
  • What are the most challenging aspects of building a recommender system and how does Surprise help simplify that process?
  • What are some of the ways that a user or company can bootstrap a recommender system while they accrue data to use a collaborative algorithm?
  • What are some of the ways that a recommender system can be used, outside of the typical ecommerce example?
  • Once an algorithm has been deployed how can a user test the accuracy of the suggestions?
  • How is Surprise implemented and how has it evolved since you first started working on it?
  • What have been the most difficult aspects of building and maintaining Surprise?
  • competitors?
  • What are the attributes of the system that can be modified to improve the relevance of the recommendations that are provided?
  • For someone who wants to use Surprise in their application, what are the steps involved?
  • What are some of the new features or improvements that you have planned for the future of Surprise?

Keep In Touch

Picks

  • Tobias
    • Silk profiler for Django

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

Yosai with Darin Gordon - Episode 120

Summary

For any program that is used by more than one person you need a way to control identity and permissions. There are myriad solutions to that problem, but most of them are tied to a specific framework. Yosai is a flexible, general purpose framework for managing role-based access to your applications that has been decoupled from the underlying platform. This week the author of Yosai, Darin Gordon, joins us to talk about why he started it, his experience porting it from Java, and where he hopes to take it in the future.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Your host as usual is Tobias Macey and today I’m interviewing
    Darin Gordon about Yosai, a security framework for Python applications

Interview

  • Introductions
  • How did you get introduced to Python?
  • What is Yosai and what is the problem that you were trying to solve when you started it?
  • How does Yosai compare to existing libraries for web frameworks such as Flask-Security or Django Guardian and why might someone choose Yosai instead?
  • In the documentation it mentions that Yosai is a port of the Apache Shiro framework from Java to Python. What was most difficult about exposing a Pythonic interface while maintaining the core principles of the original?
  • Authentication and authorization are difficult problem domains and can cause significant issues if they are not implemented in a secure fashion. How do you ensure an appropriate level of quality in Yosai to be confident having people use it?
  • To start can you describe how the framework is architected and what is involved in integrating it with a project?
  • Outside of the context of web applications, what are some situations where someone should consider integrating authentication and authorization into their project?
  • What have been some of the most challenging aspects of building the Yosai project?
  • Tell us about the Rust extension you wrote earlier this year
  • What do you have planned for the future of Yosai?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

SpaCy with Matthew Honnibal - Episode 87

Summary

As the amount of text available on the internet and in businesses continues to increase, the need for fast and accurate language analysis becomes more prominent. This week Matthew Honnibal, the creator of SpaCy, talks about his experiences researching natural language processing and creating a library to make his findings accessible to industry.

Brief Introduction

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
  • When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
  • You’ll want to make sure that your users don’t have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
  • Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
  • To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers
  • Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
  • Your host as usual is Tobias Macey and today I’m interviewing Matthew Honnibal about SpaCy and Explosion.AI

Interview with Matthew Honnibal

  • Introductions
  • How did you get introduced to Python?
  • Can you start by sharing what SpaCy is and what problem you were trying to solve when you created it?
  • Another project for natural language processing that has been part of the Python ecosystem for a number of years is the Natural Language Tool Kit (NLTK). How does SpaCy differ from the NLTK and are there any cases where that would be the better choice?
  • How much knowledge of NLP and computational linguistics is necessary to be able to use SpaCy?
  • What does the internal design and architecture of SpaCy look like and what are the biggest challenges associated with its development to date and into the future?
  • One of the projects that you have built around SpaCy which I think is really cool and caught my attention when I first found your project is the displaCy visualization tool. Can you explain what that is and why you think it is important?
  • What are some kinds of applications where SpaCy would be useful which might not be obvious candidates for it?
  • Why is speed such an important focus for an NLP library?
  • One of the ways that you have been able to gain a speed boost is through releasing the GIL and allowing for true parallelism via Cython. How have you managed to ensure that this doesn’t lead to data races and program failures?
  • Building on the success of SpaCy you founded a company called Explosion AI. Can you explain what your goals are for this endeavor and the kinds of services that you are offering?
  • What are some of the most interesting uses of SpaCy that you have seen?
  • What do you have planned for the future of SpaCy?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA