Automating Application Lifecycles For Developer Happiness At Wayfair

March 20th, 2022

46 mins 11 secs

About this Episode

Summary

A common piece of advice when starting anything new is to "begin with the end in mind". In order to help the engineers at Wayfair manage the complete lifecycle of their applications Joshua Woodward runs a team that provides tooling and assistance along every step of the journey. In this episode he shares some of the lessons and tactics that they have developed while assisting other engineering teams with starting, deploying, and sunsetting projects. This is an interesting look at the inner workings of large organizations and how they invest in the scaffolding that supports their myriad efforts.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
  • Your host as usual is Tobias Macey and today I’m interviewing Joshua Woodward about how the application lifecycle team at Wayfair uses Python to

Interview

  • Introductions
    • Josh Woodward. For the past year I have been managing the application lifecycle team at Wayfair. Prior to that, I was an IC on the Python platforms team, embedding with teams looking to decouple from the monolith and seeing their pain points firsthand.
  • How did you get introduced to Python?
    • High school physics class, TI84 Calculator, friend wrote a program to solve vector problems, I thought it was amazing.
    • Used TI-Basic to solve specific physics problems for me. (Give fixed inputs, run through equation, get outputs)
    • Approaching college, thinking about student loans.
    • Heard about python and decided to give it a shot.
    • Wrote program to simulate various payback / interest scenarios.
    • Went to college for ME, switched to SE when I found out my dorm neighbors were using Python and turtle to draw cool images.
  • Can you describe what the role of the application lifecycle team is and the story behind it?
    • Story behind it:
      • Around 2018, we were in a state where we had deploy congestion, and it was challenging to iterate and ship changes. The tech org invested in containerization and decoupling to directly combat this problem. Teams were incentivized to decouple.
      • While on python platforms, the team had already been experimenting with code templating.
      • Standard cookiecutter template for flask apps.
      • Wayfair experimenting with Kubernetes late 2017.
      • Spent 1 year embedding with 4 different teams to help knowledge transfer re: k8s, containers, application setup, python best practices, testing, linting, etc – through that we got a lot of great feedback on our tooling.
      • It took senior engineers weeks to get something set up.
        • Know who to contact, click the right buttons, file the right ticket
      • Approach: Counted manual steps. Something like 60 distinct / atomic activities that had to be performed to get a "hello world" response from a basic flask app in production.
      • Focus on reducing manual steps
      • Released product (Mamba, on theme of snakes)
      • Initially, supporting one main user story.
      • User story: "As an engineer, I would like to create a production ready application in 10 minutes so that I can have a reliable and standardized application setup that follows best practices."
      • The product grew out of Python platforms and got its own team with its own scope about 1.5 years ago.
  • What is your team’s scope now?
    • Team Scope is to facilitate the creation, maintenance, and decommissioning of decoupled applications at Wayfair.
  • What are the interfaces that your team has to the rest of the organization?
    • People Interfaces:
    • We value getting feedback on our work to build strong products.
    • Make assumptions and be willing to be wrong. Validate assumptions with customers.
    • Software Interfaces:
    • For Mamba, a CLI at first
    • Backstage (open sourced by Spotify)
    • Lots of GitHub
  • What is your method of determining what projects to work on?
    • (See above). Known pain points. Intuition, Free day fridays. Being comfortable taking risk (using friday time). Vet solution with customers.
    • How do you measure the impact of your work on the rest of the organization?
      • We don’t force use of our products, so adoption of our tooling is itself a signal:
        • Number of microservices being spun up.
        • Number of automated pull requests being created, merged.
      • DORA metrics: throughput (deployment frequency, lead time for changes) and stability (change failure rate, mean time to recovery)
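The four DORA metrics named above can be computed from a plain list of deployment records. The following is a minimal sketch, not Wayfair's implementation: the `Deployment` record shape, its field names, and the `dora_metrics` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the change fail in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: List[Deployment], window_days: int) -> dict:
    """Compute throughput and stability metrics over a reporting window."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput
        "deployment_frequency_per_day": len(deployments) / window_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        # Stability
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Two deployments in a one-day window; the second one failed and was
# restored an hour after it shipped.
day = datetime(2022, 3, 1)
history = [
    Deployment(day, day + timedelta(hours=2)),
    Deployment(day, day + timedelta(hours=4), failed=True,
               restored_at=day + timedelta(hours=5)),
]
metrics = dora_metrics(history, window_days=1)
```

Tracking these per application before and after a tool is adopted is one way to relate adoption numbers to delivery outcomes.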
  • What is the role of Python in your work?
    • we use it and love it!
      • existing skillset from incubation phase within python platforms
    • right tool for the job
      • lightweight automation
      • hitting lots of APIs
      • define lots of user facing specifications (json, yaml)
      • pydantic has been great for creating descriptive, human- and machine-readable specifications.
    • open source (we rely on it, we also have some presence)
      • cookiecutter -> columbo
      • gitpython -> pygitops
  • Can you tell me more about your application creation solution. Who can use it, and what does it actually do?
    • Written in python, though it templates out code for any language.
    • Runs automation to onboard an application to production
      • git repo, build pipeline, calling out to various APIs to signal a new app is present
    • Wayfair has a variety of applications (python, java, .net, php, javascript, some go)
    • A team interested in integrating with our solution will create a GitHub repository containing one or more cookiecutter templates.
    • Provide a specification for what questions to ask users.
      • Limitations with cookiecutter: the approach to asking questions isn’t dynamic, and there is a lack of validation.
      • Pat Lannigan -> Columbo (open sourced). Python DSL to describe the set of questions to ask users.
      • python fastapi application will have a completely different set of questions than a java library for example.
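The two gaps mentioned above, dynamic questions and validation, can be illustrated with a small standard-library sketch. This is not Columbo's API: the `Question` class, its fields, and the `collect` function are hypothetical names chosen only to show the idea of a Python-described question set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class Question:
    """A single prompt; every field here is illustrative."""
    name: str
    prompt: str
    # Return an error message for a bad answer, or None if it is valid.
    validate: Callable[[str], Optional[str]] = lambda value: None
    # Decide from earlier answers whether this question applies at all.
    should_ask: Callable[[Answers], bool] = lambda answers: True


def collect(questions: List[Question], reply: Callable[[str], str]) -> Answers:
    """Ask each applicable question, re-asking until the answer validates."""
    answers: Answers = {}
    for q in questions:
        if not q.should_ask(answers):
            continue
        while True:
            value = reply(q.prompt)
            error = q.validate(value)
            if error is None:
                answers[q.name] = value
                break
            print(f"Invalid answer: {error}")
    return answers


def valid_name(value: str) -> Optional[str]:
    ok = value.replace("-", "").isalnum() and value == value.lower()
    return None if ok else "use lowercase letters, digits, and dashes"


questions = [
    Question("app_name", "Application name?", validate=valid_name),
    Question("language", "Language (python/java)?",
             validate=lambda v: None if v in ("python", "java") else "unsupported language"),
    # Only asked when an earlier answer makes it relevant.
    Question("framework", "Web framework?",
             should_ask=lambda a: a["language"] == "python"),
]

# Scripted replies stand in for a user; pass `reply=input` for interactive use.
scripted = iter(["My App", "my-app", "java"])
answers = collect(questions, reply=lambda prompt: next(scripted))
```

Here the first reply is rejected and re-asked, and the framework question is skipped because the chosen language is Java, so different templates can carry entirely different question sets.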
  • You had mentioned that another part of your team scope is to facilitate the maintenance of applications. Can you tell me more about that?
    • Reduce engineering toil around keeping applications up to date.
    • The average engineer owns several, sometimes dozens of, repos
    • Create automated pull requests:
      • Versioned dependencies (Renovate)
      • Propagating platform changes (Gator)
      • Ex1: python apps use "black" to format code and our python platform team would like to prescribe a line length. Our tooling can be used to declare desired changes. yaml specification -> pr automation at scale.
      • Ex2: a shared library releases a new version with a breaking interface change. We can code up instructions for performing AST manipulation and resolve the breaking change for people.
      • Shift from "We need you to do this" to "I am proactively letting you know that something needs to change, and I also made the change for you!"
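The second example above, resolving a breaking interface change through AST manipulation, might look roughly like the sketch below. It uses only Python's standard `ast` module; the library name `shared_lib` and the rename from `connect` to `open_connection` are invented for illustration and do not describe any real Wayfair change.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls of `module.old(...)` to `module.new(...)`."""

    def __init__(self, module: str, old: str, new: str) -> None:
        self.module, self.old, self.new = module, old, new

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == self.old
            and isinstance(func.value, ast.Name)
            and func.value.id == self.module
        ):
            func.attr = self.new
        return node


def migrate(source: str) -> str:
    """Parse source, apply the rename, and emit updated code."""
    tree = RenameCall("shared_lib", "connect", "open_connection").visit(ast.parse(source))
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


before = "conn = shared_lib.connect(host, retries=3)\nother.connect(host)\n"
after = migrate(before)
```

Only the call on `shared_lib` is renamed; the unrelated `other.connect` call is left alone. One caveat of this approach is that `ast.unparse` discards comments and original formatting, so automation that opens pull requests would likely want a formatting-preserving step as well.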
  • How do you actually go about creating automated pull requests?
    • manual steps would involve cloning, checking out feature branch, applying code changes, staging / committing, pushing up branch, creating the PR
    • gitpython is an existing and extremely powerful tool, but its api is fairly involved and (by design) doesn’t provide the type of high level abstractions that we need.
    • created pygitops (open sourced), built completely on top of gitpython
    • high level abstractions for the workflow I described.
    • coolest / most pythonic part about it is the "feature branch" context manager.
    • code changes are made in the context of a feature branch
    • when you intentionally or accidentally leave the context of a feature branch, we want certain things to be true (default / main branch, clean workdir, no unstaged changes)
    • when writing PR automation, don’t have to worry about this!
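The guarantee described above, that leaving the feature branch context always restores the default branch and a clean working directory, can be sketched with `contextlib.contextmanager`. This is an illustration of the pattern only, not pygitops' implementation or signature: `ToyRepo` is an in-memory stand-in for a real repository, and every name here is an assumption.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class ToyRepo:
    """In-memory stand-in for a git repository (illustration only)."""

    def __init__(self, default_branch: str = "main") -> None:
        self.default_branch = default_branch
        self.active_branch = default_branch
        self.branches: Set[str] = {default_branch}
        self.uncommitted: Set[str] = set()  # paths with uncommitted edits

    def checkout(self, branch: str) -> None:
        self.branches.add(branch)
        self.active_branch = branch

    def discard_changes(self) -> None:
        self.uncommitted.clear()


@contextmanager
def feature_branch(repo: ToyRepo, name: str) -> Iterator[None]:
    """Run a block on `name`, then always restore a clean default branch."""
    if repo.active_branch != repo.default_branch or repo.uncommitted:
        raise RuntimeError("expected a clean default branch before starting")
    repo.checkout(name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        repo.discard_changes()
        repo.checkout(repo.default_branch)


repo = ToyRepo()
try:
    with feature_branch(repo, "bump-line-length"):
        repo.uncommitted.add("pyproject.toml")
        raise ValueError("automation failed midway")
except ValueError:
    pass
state = (repo.active_branch, sorted(repo.uncommitted))
```

Even though the block raised partway through, the repository ends up back on `main` with nothing left uncommitted, which is what lets automation process many repositories in a loop without manual cleanup.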
  • Can you describe some of the more technical details about how your change propagation system (Gator) works?
    • heavily inspired by kubernetes resource model (resources are defined via a declarative specification)
    • Kubernetes itself ships with resources that implement behaviors of common resources (pods, services, etc)
    • Gator’s execution model is broken up into two parts:
      • what repos to act on (Source)
      • what are the changes that need to be applied. (Output)
    • Ex: a Source that proxies GitHub search. Write a GitHub search query to get back a list of repos.
    • Ex: an Output that scans a repo for a regex pattern at specified paths and replaces matches with a fixed term. Very popular, engineers love find and replace.
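The Source/Output split described above can be sketched in a few lines. Gator is not open source at the time of these notes, so everything below is hypothetical: the class names, method names, and the fixed-list source (standing in for one backed by a GitHub search query) are assumptions, and a real tool would open a pull request per repo rather than just write files.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class StaticSource:
    """Source: decides which repos to act on (a fixed list in this sketch)."""
    repos: List[Path]

    def list_repos(self) -> List[Path]:
        return self.repos


@dataclass
class RegexReplaceOutput:
    """Output: replace a regex pattern with a fixed term at given paths."""
    pattern: str
    replacement: str
    paths: List[str]  # glob patterns relative to the repo root

    def apply(self, repo: Path) -> List[Path]:
        changed = []
        for glob in self.paths:
            for file in sorted(repo.glob(glob)):
                text = file.read_text()
                new_text = re.sub(self.pattern, self.replacement, text)
                if new_text != text:
                    file.write_text(new_text)
                    changed.append(file)
        return changed


def run(source: StaticSource, output: RegexReplaceOutput) -> List[Path]:
    """Apply the output to every repo the source returns."""
    changed = []
    for repo in source.list_repos():
        changed.extend(output.apply(repo))
    return changed


# Prescribe a black line length across one throwaway "repo".
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "service-a"
    repo.mkdir()
    (repo / "pyproject.toml").write_text("[tool.black]\nline-length = 88\n")
    output = RegexReplaceOutput(r"line-length = \d+", "line-length = 100",
                                ["pyproject.toml"])
    changed = run(StaticSource([repo]), output)
    result = (repo / "pyproject.toml").read_text()
```

Keeping the two halves separate means a new way of selecting repos or a new kind of change can be added without touching the other half, which fits the declarative resource model the notes describe.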
  • What are the most interesting, innovative, or unexpected ways that you have seen mamba / gator used?
    • The resource model of Gator supports the idea that "we don’t know what we don’t know"
    • reference k8s, CRDs, resource model.
    • container execution
    • log4j identification and remediation
      • automate some of the work for identifying vulnerabilities
      • java platform team was able to use java native tooling in the environment of their choosing to identify vulnerable apps.
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on application lifecycle concerns?
  • What do you have planned for the future of application lifecycle management/developer experience improvements at Wayfair?
    • Hope to start open sourcing interesting aspects of our change propagation tool (Gator)
    • For anyone who maintains many open source projects, or who works at the enterprise level, we think that some of our patterns and approaches can be shared (yaml -> code changes).

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA