Easy Data Validation For Your Python Projects With Pydantic


May 18th, 2020

47 mins 14 secs

Your Hosts

About this Episode

Summary

One of the most common causes of bugs is incorrect data being passed throughout your program. Pydantic is a library that provides runtime checking and validation of the information that you rely on in your code. In this episode Samuel Colvin explains why he created it, the interesting and useful ways that it can be used, and how to integrate it into your own projects. If you are tired of unhelpful errors due to bad data then listen now and try it out today.
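
As a rough illustration of the runtime checking described above, here is a minimal sketch. The `User` model, its fields, and the `load_user` helper are hypothetical names chosen for this example and are not taken from the episode; only `BaseModel` and `ValidationError` come from Pydantic itself.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    # Hypothetical model, used only for illustration.
    id: int
    name: str


def load_user(raw: dict) -> User:
    """Build a User from untrusted input; Pydantic checks the fields at runtime."""
    return User(**raw)


# With Pydantic's default (non-strict) behaviour a numeric string is parsed into an int.
user = load_user({"id": "42", "name": "Ada"})

# Input that cannot be interpreted as the declared type is rejected with a
# ValidationError that names the offending field.
try:
    load_user({"id": "not a number", "name": "Ada"})
    rejected = False
except ValidationError:
    rejected = True
```

The bad record fails at the point where it enters the program, rather than surfacing later as an unrelated error deeper in the code.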

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API, you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show because you love Python and want to keep your skills up to date. Machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
  • Your host as usual is Tobias Macey, and today I’m interviewing Samuel Colvin about Pydantic, a library for enforcing type hints at runtime.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Pydantic is and what motivated you to create it?
  • What are the main use cases that benefit from Pydantic?
  • There are a number of libraries in the Python ecosystem to handle various conventions or "best practices" for settings management. How does Pydantic fit in that category, and why might someone choose to use it over the other options?
  • There are also a number of libraries for defining data schemas or validation such as Marshmallow and Cerberus. How does Pydantic compare to the available options for those cases?
    • What are some of the challenges, whether technical or conceptual, that you face in building a library to address both of these areas?
  • The 3.7 release of Python added built-in support for dataclasses as a means of building containers for data with type annotations. What are the tradeoffs of Pydantic vs the built-in dataclass functionality?
  • How much overhead does Pydantic add for doing runtime validation of the modelled data?
  • In the documentation there is a nuanced point that you make about parsing vs validation and your choices as to what to support in Pydantic. Why is that a necessary distinction to make?
    • What are the limitations in terms of usage that you are accepting by choosing to allow for implicit conversion or potentially silent loss of precision in the parsed data?
    • What are the benefits of punting on the strict validation of data out of the box?
  • What has been your design philosophy for constructing the user facing API?
  • How is Pydantic implemented and how has the overall architecture evolved since you first began working on it?
    • What have you found to be the most challenging aspects of building a library for managing the consistency of data structures in a dynamic language?
      • What are some of the strengths and weaknesses of Python’s type system?
  • What is the workflow for a developer who is using Pydantic in their code?
    • What are some of the pitfalls or edge cases that they might run into?
  • What is involved in integrating with other libraries/frameworks such as Django for web development or Dagster for building data pipelines?
  • What are some of the more advanced capabilities or use cases of Pydantic that are less obvious?
  • What are some of the features or capabilities of Pydantic that are often overlooked which you think should be used more frequently?
  • What are some of the most interesting, innovative, or unexpected ways that you have seen Pydantic used?
  • What are some of the most interesting, challenging, or unexpected lessons that you have learned through your work on or with Pydantic?
  • When is Pydantic the wrong choice?
  • What do you have planned for the future of the project?
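
For context on the dataclass question above, the following sketch uses only the standard library to show that dataclass annotations are not enforced at runtime, and what a hand-written strict check might look like. The `Point` and `CheckedPoint` classes are hypothetical, and the `__post_init__` check is an illustrative assumption, not how Pydantic is implemented.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


# The annotations are hints only: a str is accepted for x without any error.
unchecked = Point(x="1", y=2)


@dataclass
class CheckedPoint:
    x: int
    y: int

    def __post_init__(self):
        # Hand-rolled strict validation. It only works when every annotation
        # is a plain class, and it rejects values instead of converting them.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


valid = CheckedPoint(x=1, y=2)

try:
    CheckedPoint(x="1", y=2)
    strict_rejected = False
except TypeError:
    strict_rejected = True
```

The strict check refuses the string `"1"` outright, whereas a parsing approach would try to convert it to the declared type first. That difference is the parsing vs validation distinction raised in the questions above.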

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA