Faster And Safer Software Development With Feature Flags

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode.

With 200 gigabit private networking, scalable shared block storage, NodeBalancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up.

And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances.

And they also have a new object storage service to make storing data for your apps even easier.

Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show.

And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.

For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season.

We have partnered with organizations such as O'Reilly Media, Corinium Global Intelligence, Alluxio, and Data Council. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host as usual is Tobias Macey, and today I'm interviewing Pete Hodgson about the concept of feature flags and how they can benefit your development workflow. So, Pete, can you start by introducing yourself?

Sure. My name is Pete Hodgson, and I'm an independent software consultant. I spend quite a lot of time helping engineering teams figure out how to build software in a more effective way and get that software into production in a more effective way.

And I know that you don't use Python as your primary language and that you don't generally do too much work in it, but I'm wondering if you can share a bit about how it first came onto your radar and any experience that you do have with it.

Sure. I've been doing software development for 20 years, and early in my career I was trying to figure out how to script things and automate things. I think Python and Ruby were the two that I started playing with. Actually, no: Perl first, and then I rapidly realized that I'd rather use something other than Perl.

So I played with Python and Ruby, and I eventually got into Ruby because I happened to be working somewhere that was doing Rails development, so I didn't use Python that much. I've used it for a fair amount of system automation on and off through the years, and I've had a couple of clients that were building apps in Django.

So I've dabbled with Python there, but I certainly wouldn't consider myself an expert.

And before we get too far into the discussion about how to use feature flags and some of the advanced concepts, can you start by describing what a feature flag is, some of your first experiences with them, and how they affected your overall approach to software development?

Sure. The way I think about it, fundamentally, a feature flag is a way of choosing between two code paths, normally at runtime. You can think of it as a way to dynamically choose or adjust the business logic in your app without recompiling and redeploying that app. We can probably talk a little later on about how dynamic that decision-making process is, but I think those are the fundamentals of what a feature toggle or a feature flag is.
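As a rough illustration of that definition, here is a minimal Python sketch of a flag choosing between two code paths at runtime. The `FLAGS` dictionary, the `new_checkout` flag name, and the checkout functions are all hypothetical names invented for this example; they are not from the conversation.

```python
# Hypothetical in-memory flag store; a real system might load this from
# configuration or a flag service instead.
FLAGS = {"new_checkout": False}


def is_enabled(name):
    """Return the current state of a flag, treating unknown flags as off."""
    return bool(FLAGS.get(name, False))


def legacy_checkout(cart):
    return f"legacy checkout: {len(cart)} item(s)"


def new_checkout(cart):
    return f"new checkout: {len(cart)} item(s)"


def checkout(cart):
    # The flag is consulted on every call, so flipping it takes effect
    # without redeploying the code.
    if is_enabled("new_checkout"):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Both code paths ship together; only the flag value decides which one runs.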

They tend to have a few different names. Feature toggle is the one I used to use. Feature flag is, I think, the term that's more commonly used nowadays. I've also seen it called feature bit or feature flipper. Sometimes people conflate A/B testing and feature flags a little bit. So, yeah, the naming is kind of interesting.

I think my first experience with feature flags was years ago, working at a startup where we implemented them. I think they were called feature bits at that company. We used them a little bit, but I didn't really use them that much for a few years after that. After that company, I worked at a consulting company called Thoughtworks, where we were really big into practices like continuous delivery and trunk-based development. Feature flags play quite heavily into that. When we used them in that way, they pretty radically changed the way that we were able to build software, mainly because they allowed us to get rid of feature branches and have people working directly in a shared branch in our code repo while still being able to deploy software into production very frequently. On a lot of teams I was on at Thoughtworks, we'd be deploying to production on a very regular basis, like once a day maybe, which at the time definitely felt like a pretty rapid tempo.

And we were able to do that without using branches and without having to slow ourselves down with that kind of stuff. I think feature flags were a big part of how we were able to achieve that.

And you mentioned being able to do trunk-based development because of feature gating at the code level rather than at the branch level, and you also mentioned some association with A/B testing. I wonder if you can discuss some of the broader scope of the ways that feature flags are used and some of the ways that they'll manifest in somebody's code base.

Yeah. What's really interesting to me about feature flagging, in my experience over time, is the number of different things that people use feature flags for, and the number of different concepts that get mixed in around feature flagging. I used to get quite annoyed when people referred to something as a feature flag that I personally didn't consider to be a feature flag. A/B testing, for example: people would sometimes say, "Oh, yeah, we're doing feature flagging, we use this A/B testing framework," or "We'll add a feature flag to test whether the color of this button looks good or not." And I would get kind of grumpy and correct people: "Well, technically, that's not a feature flag. That's something different."

What I've come to embrace over time is that these all come down to that same fundamental capability of choosing code paths at runtime. So if you think about the different things you can do with that capability, the way I think about it, there are four main categories of feature flags.

The main category, which was my gateway drug for feature flagging, I guess, is what I would call a release flag or a release toggle. That's a way for an engineer to hide half-finished code from users: the ability to deploy a half-finished feature into production without releasing that feature to your users, because it's protected behind a flag. Let me try to think of an example. Say I'm implementing a new login feature where you can log in to my application via a social login.

I can protect the button that says "log in via Twitter" or Myspace or whatever. I can hide that button behind a feature flag, and that means I can be working on that feature and deploying it into my production environment without exposing it. The code that powers that feature is there, but I haven't turned it on yet by flipping that feature flag on. So I can work in a shared branch and even deploy that stuff out to production without worrying that the half-finished work, that latent code, is going to be shown to a user. So that's the first use case, the release toggle.

Then there's another use case around experimentation, where I might want to try out an A/B test, for example. Let's say I want to see whether a new recommendations engine is more likely to get my users to click on a recommendation. Maybe I'll have the old recommendation system and the new recommendation system both deployed into my production system, and then I can choose whether to use the new recommendation system or not. Maybe I send 50% of my users to the new recommendation system and 50% to my old recommendation system. On the one hand, it feels like a really different use case. But at the end of the day, if you think about what's happening inside of your software, it's the exact same thing. You've got essentially an if/else statement or something similar, and at runtime your code is deciding, "Should I go down this code path or this other code path?" So the second use case is experimentation.
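The 50/50 split described above could be sketched as follows. Hashing the user ID into a bucket is one common way to keep each user in the same group across requests; it is an assumption of this example, not something stated in the conversation, and the function names and the `new-recommender` experiment name are hypothetical.

```python
import hashlib


def bucket(user_id, experiment):
    """Deterministically map a user to a bucket in the range 0..99."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_recommender(user_id, rollout_percent=50):
    # Users whose bucket falls below the threshold get the new code path.
    return bucket(user_id, "new-recommender") < rollout_percent


def old_recommendations(user_id):
    return ["bestseller-1", "bestseller-2"]


def new_recommendations(user_id):
    return [f"personalized-for-{user_id}"]


def recommend(user_id):
    # Still just an if/else at the decision point.
    if use_new_recommender(user_id):
        return new_recommendations(user_id)
    return old_recommendations(user_id)
```

Because the bucket depends only on the experiment name and the user ID, a given user sees the same variant on every request.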

The third use case I've seen people using feature flags for is operational flags. That's a way to disable parts of your system, or dynamically change how your system is operating in production, without having to redeploy your software. Let's say you have a third party that calculates what the shipping costs are going to be for your ecommerce software. This vendor is kind of flaky, and sometimes their system goes down, and you want a way to just dynamically turn off that feature in production. If they're having a bad day, you basically want to turn it off rather than having it be broken on your site. You could have a feature flag protecting that feature, and if you see that the third-party system is having a tough time, you could just turn it off without having to redeploy your software or make some kind of urgent change to your system. So that's the third thing I see people using: operations toggles.

And then there's a fourth thing around permissioning: depending on what user you are, I'm going to let you do something different. If you're an admin user, I'm going to show the edit button, but if you're not, I'm not going to show it. Or if you're a premium user, I'm going to show you the free shipping option, and if you're not a premium user, I'm not. That's a simplification of what you'd probably want to do, but fundamentally there's this idea that, based on who you are, I'm going to turn certain capabilities on or off.
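To make the operational and permissioning cases concrete, here is a small sketch showing that both reduce to the same runtime branch. The `User` class, the `OPS_FLAGS` store, and the function names are hypothetical, and the shipping quote is passed in as a callable so the example does not depend on any real vendor API.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_admin: bool = False
    is_premium: bool = False


# Hypothetical operational flag that an operator could flip in production.
OPS_FLAGS = {"shipping_quotes_enabled": True}


def shipping_cost_text(fetch_quote):
    """Show a live quote unless the operational kill switch is off."""
    if not OPS_FLAGS["shipping_quotes_enabled"]:
        # The flaky vendor is not called at all while the flag is off.
        return "Shipping calculated at checkout"
    return f"Shipping: ${fetch_quote():.2f}"


def visible_actions(user):
    """Permission toggles: the decision depends on who the user is."""
    actions = ["view"]
    if user.is_admin:
        actions.append("edit")
    if user.is_premium:
        actions.append("free_shipping")
    return actions
```

The kill switch is keyed on system state and the permission checks are keyed on the user, but the code shape is the same in both cases.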

That's one where I think a lot of people would say, "Well, that's not a feature flag. That's something different." And I used to be one of those people. But what I've found over time, in talking to different organizations about how they use feature flags, is that there's a benefit in embracing the fact that even though the scenarios and the use cases are quite different, the fundamental capability is the same, and there's a benefit in thinking about these as different types of the same thing.

And in my experience, a lot of developers will start to grow into using feature flags for a particular capability that they're building, either because it's going to take too long to get the whole thing out into production in one fell swoop and they don't want to have a long-lived branch that's going to be a pain to merge three months down the road, or because they have something that they want to be able to launch into a production or preproduction environment so that they can test out how it actually functions outside of their local development context, and they want to be able to turn it on and off until they're sure that it's working properly.

For that kind of short-lived flag, a lot of times you might just have an environment variable or something that lives close to the logic branch that you're dealing with, and then maybe down the road you have another PR that just strips it back out. But for some of these other scenarios, particularly A/B testing or operations toggles where you want this to be more long-lived, the way that you would approach implementing the toggling infrastructure is a lot different, and you want something a bit more sophisticated.

And I'm curious, what are some of the ways that you've seen people approach the overall introduction of feature toggles into their code base, and maybe some of the anti-patterns as to how people have used them in your experience?

Yeah. How people introduce it is an interesting one. My experience has been similar to yours: a lot of times the concept of feature flags initially gets added to a code base by an engineer who, like you're saying, wants to work on something without having to create a big, long-lived feature branch that's going to be a pain to merge. That's one gateway. The other entry point I see, and it's quite different, is usually a product manager who wants to be able to do some kind of A/B testing. Those are quite different entries into the concept of feature flagging, and that's interesting because it can change quite a lot how you get started.

And now I'm forgetting what the second part of your question was, apart from the entry point part.

I'm just curious about any anti-patterns that you've seen around how people either introduce and implement feature flagging or how they're actually employing it in their software.

Gotcha. There are a few things I see people struggling with. The most common problem I see with people who are using feature flags a lot is managing those flags and retiring them, keeping the number of active flags in check. As time goes on, you're motivated to add flags because you want to avoid creating a feature branch, or you want to try out a new feature experiment against a certain number of users, that kind of thing. So you're motivated to add them, and then there tends to be this tech-debt type thing where it's tough to clean them up over time. So one very common challenge that I see people having is figuring out how to track the flags in their system and how to actually manage them and remove them over time.

Kind of related to that is when people use the wrong technique for the type of flag in terms of how they implement that flag. I'll give you two examples. Say I'm doing a release toggle, a release flag, where I just temporarily want to hide this thing. So with my social login, I want to temporarily hide it for a week or so while it's under development, and then once it's done, I'm going to turn it on and remove the if/else statement from my code or whatever. If I know that this is a short-lived flag, then it's actually probably not terrible if I literally implement that with an if/else in my code base.

But if it's a flag that's going to be used for a long period of time, like one of those operational flags, or an "is admin" type of thing that's probably going to be in your code base forever, then implementing that with an if/else is going to get really painful over time, particularly if you've got a lot of flags. When I talk about feature flags and I find people who are skeptical about them, a lot of times they've been burned by a code base where people have just sprinkled if/else statements everywhere. So I think that's a big anti-pattern. Fundamentally, part of it is people not thinking about why they're creating this flag and how long that flag is going to live in the code base. And I think the other part is teams not having the right processes in place to clean those flags up over time: either the management and tracking of which flags are in use and which aren't, or more of a process thing of getting the time to keep the code base cleaned up and keep the campground clean.

Another thing, too, is that because a lot of teams will just introduce flagging ad hoc to begin with, it then starts to catch on as a good idea: "Oh, I can do this, and I don't have to maintain this long-running branch." So another person will implement their own flag, but there isn't really any consensus as to what the common entry point is or what the common design pattern is for how we maintain these flags. Somebody will put their flag in for the block of code that they care about, the next person will put one in for the block of code that they care about, and it can quickly devolve into everybody putting flags wherever they feel like, rather than taking a step back to think, "Okay, now that we're starting to use this, how do we actually design it into a system that is easy to maintain, where we have a common entry point and we can clearly see what flags exist in the code base?" Because otherwise you might start to see basically the same type of flag, or a very similar flag, in two different places, and you don't really know which one does what or why.

Right. Yeah. I describe feature flagging as a little bit of an iceberg, where when you get started with it, it's very simple. Normally you get started and there's literally just one thing you want to toggle. Sometimes people don't even realize that what they're creating is a feature flag. They'll say, "Oh, well, we've already got this configuration system, so we'll add an extra field, like 'use new whatever,' and then we'll just check the configuration here in this part of the code base. If it's on, we'll use it. If it's not, we won't." And then there's this creeping thing over time. Exactly like you're saying, someone else is like, "Oh, yeah, we could do that config thing that we did, but for this other thing," and eventually there are three or four places where you're checking the configuration and making a decision in your code. And then someone's like, "Oh, well, we need to do this per user, depending on who the user is, so I guess we should add this extra thing." Over time, it's kind of boiling a frog. You get to a situation in six months' time where it's a mess, and no one ever decided that they wanted it to be a mess. No one consciously said, "Let's just make this messy." The scope of the thing just grew slowly over time, and the type of things that people wanted to do with it grew over time. If you're not careful, you can end up taking lots of little steps, none of which seem bad, but when you look back at where you are in six months, you're like, "Wow, what the heck are we doing here?"

And so for teams who are starting to think about feature toggling or already have something in place, what are some of the approaches to implementation that you've seen and that you recommend?

And particularly for these different categories that you outlined, do you think it makes sense to have everything route through a single common implementation, or do the different categories of feature toggles require their own implementation logic?

I think that last part is a really good question. What I have found is that there's a lot of value in decoupling the place where you're making the decision from the reason that you're making the decision: decoupling the decision point from the decision logic. What that means is that in the area of the code where I need to make that decision, like "Should I show recommendation engine A or B?", that's really the only question I should be asking. That code should say, "Should I show this recommendation system or this other recommendation system?" or "Which class should I use to implement this algorithm?" That part of your code really shouldn't care about feature flags at all. All it knows is, "I know there are two ways I could do this. Tell me which way to do it." You want to abstract over the reason for the decision.

The other side of that lives somewhere in a centralized place. I think there's a lot of value in centralizing the why behind a decision. That why could be: we're using this recommendation system because this is an A/B test, and the user for the current context, like the current request in a web framework, falls into this cohort, so we're going to use recommendation engine A. Then the next request comes in, and that user is in a different cohort, so we're going to use the other recommendation algorithm. So one thing that I think is really beneficial is to centralize and abstract the reasons behind the decision being made and not let that leak into your code. And again, if you start with just a simple check of a configuration value and then call function A or function B, it feels like overkill to do that. But at some point you come to realize that there's a lot of benefit in abstracting over the reason for the decision versus the place where you're making the decision.
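One way to picture the split between decision point and decision logic is the sketch below. The `FeatureDecisions` class, the `recommendation_ab_test` flag, and the cohort names are hypothetical, and cohort assignment is assumed to have happened elsewhere (for example, in request middleware).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    user_id: str
    cohort: str  # "control" or "treatment"; assigned elsewhere (assumption)


class FeatureDecisions:
    """Central place that knows *why* a given code path is chosen."""

    def __init__(self, flags):
        self._flags = flags

    def recommendation_engine(self, ctx):
        # Decision logic: the A/B test and cohort rules live only here.
        if not self._flags.get("recommendation_ab_test", False):
            return "a"
        return "b" if ctx.cohort == "treatment" else "a"


def engine_a(user_id):
    return [f"a-pick-for-{user_id}"]


def engine_b(user_id):
    return [f"b-pick-for-{user_id}"]


ENGINES = {"a": engine_a, "b": engine_b}


def recommendations(ctx, decisions):
    # Decision point: asks *which* engine to use, never *why*.
    engine = ENGINES[decisions.recommendation_engine(ctx)]
    return engine(ctx.user_id)
```

If the experiment later becomes a permission rule or a plain release toggle, only `FeatureDecisions` changes; `recommendations` stays as it is.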

Another thing that complicates the decision about how specifically to implement the feature flagging is the overall life cycle of the logic branch. For something where you only care about enabling trunk-based development, where everybody can push to one common code branch without having to branch using their source control and can just branch by abstraction instead, there's a high probability that once they finish the feature, they're going to want to take out that code branch. One suggestion that I've heard before is that as soon as you introduce the feature flag, you open another pull request that removes it, but you don't merge it until you're done with the code that requires it, just so that you don't forget about the fact that it's there and that you don't want it to stay there forever.

But for some of these operations toggles or A/B tests that need to run for a longer period of time, there's a high probability that you're going to want to keep it in the code base, possibly forever. And then the other thing is that if the code you're writing isn't going to be deployed and managed by the same team writing it, and is instead intended as a piece of software that gets deployed into a customer's environment, or if it's an open source project that somebody else is going to use, there's the question of how you manage documenting the existence of these different feature toggles and what they're used for.

So I'm curious what you have seen as far as beneficial implementation strategies and triggering strategies for some of these different types of life cycle requirements.

Again, the thing that I've seen burn people on feature flags is if/else statements sprinkled through your code. It makes it really hard to read the code. It makes it hard to even understand whether a given part of the code base is ever called.

So my advice in general is this: if you're really confident that the toggle is only going to be in your code base for a very short period of time, and you're only making the decision at one point in the code, then maybe a good old-fashioned if/else is fine. If I should show the social login, then show it. Otherwise, don't show it.

For anything more complicated than that, or for anything that's going to be in your code base for any period of time, treat this like production code. Use the same good design patterns and software design approaches as you would use for anything else you're doing in your production code base. The strategy pattern, for example, is a classic way of doing this. Rather than having four if/else statements in different areas of a class, you have two implementations of a common interface: two classes, one that implements recommendation engine A and one that implements recommendation engine B. The consumer of that functionality is just given one of those things, and it doesn't know that it's feature flagged. It doesn't know which way it's going. You're basically using polymorphism to represent that dynamism between the two code paths, rather than sprinkling if/else statements everywhere.

I think that's a really important strategy, and I think it comes down to embracing that this is production code. It's kind of like unit testing. Just because you're writing a unit test doesn't mean you're allowed to write crappy code. You should still write good code. It's the same thing with a feature flag: just because it's a feature flag doesn't mean it's not production code that someone's got to look after and maintain. It does impact your system, so you've got to treat it with respect.
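A minimal sketch of the strategy pattern described above might look like this. The class names, the `ProductPage` consumer, and the `build_product_page` factory are hypothetical; the point is only that the flag is read once, where the object graph is assembled, and the consumer never branches on it.

```python
from abc import ABC, abstractmethod
from typing import List


class RecommendationEngine(ABC):
    """Common interface shared by both code paths."""

    @abstractmethod
    def recommend(self, user_id: str) -> List[str]:
        ...


class EngineA(RecommendationEngine):
    def recommend(self, user_id: str) -> List[str]:
        return ["a1", "a2"]


class EngineB(RecommendationEngine):
    def recommend(self, user_id: str) -> List[str]:
        return ["b1", "b2"]


class ProductPage:
    """Consumer: it is handed an engine and has no idea a flag exists."""

    def __init__(self, engine: RecommendationEngine):
        self._engine = engine

    def render(self, user_id: str) -> str:
        picks = self._engine.recommend(user_id)
        return f"Recommended for {user_id}: " + ", ".join(picks)


def build_product_page(use_engine_b: bool) -> ProductPage:
    # The only place that branches on the flag value.
    engine = EngineB() if use_engine_b else EngineA()
    return ProductPage(engine)
```

Retiring the flag then means deleting one class and one line in the factory, rather than hunting down conditionals scattered through `ProductPage`.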

Another decision to make, particularly as you get further down the road of using feature flags and they become more a part of your common development practice and your standard operations environment, is whether to continue to support a homegrown solution or start looking at third-party libraries or service providers that might have more advanced functionality than what you want to build and maintain on your own. I'm curious what you have found to be useful considerations when making that build-versus-buy decision.

So, yeah, I think this goes back to that iceberg thing I mentioned earlier. I see a lot of engineering organizations that are using some kind of home-rolled feature flagging system. In fact, normally, if they're more than a small size, they have three or four home-rolled feature flagging systems. I was actually just talking to a company the other day, and when we started the conversation I said, "How many of these do you have?" and they said, "One." Then I asked a question and they said, "Well, yeah, I guess maybe there's two." Then I said, "Well, do you ever do this thing?" and they were like, "Oh, yeah, I guess we have three." And I was like, "I bet there's someone who's doing this other thing," and they were like, "Yeah, I guess we probably have four." So, yeah, it seems simple to get started, and then the complexity of these things grows over time.

My advice on the build-versus-buy thing, and I would say it's really build versus buy versus borrow or rent, because you can use a SaaS product, you can use an open source implementation, or you can build your own, is this: unless it touches on a core competency of your company, why the hell are you spending time building this thing? This is just the general advice I give to clients on build versus buy. If it's just a commodity, table-stakes thing that you need in order to achieve your business goals, then look to buy it, rent it, or borrow it. Only if it's going to be a differentiating feature of your product should you be building this stuff yourself. You don't have time to mess around implementing your own database. You don't have time to mess around implementing your own UI framework. You actually don't have time to mess around building your own feature flagging system. Why are you doing that? I'm pretty sure it'll be cheaper, and definitely cheaper in terms of opportunity cost, to buy one off the shelf or rent it or whatever.

So I don't think there are that many cases where it makes sense to build your own, unless you're a really large engineering org that has very complex requirements, you need to integrate with some custom internal metrics system, or your entire business is built around showing unique things to each user, or something like that. Most of the time, most companies I speak to should just be using a SaaS product or an open source product.

And 1 thing that's always a consideration,

particularly for production software,

is when you are relying on a third party service, at some point, that third party service ends up impacting your availability

based on what their availability patterns are. And so I'm curious if you have seen anything similar happening with some of these SaaS providers for feature toggles where some sort of system outage on their end might have an unintended consequence in your running application.

Or do you find that they're generally fairly good at maintaining current state when they're going through an outage, and the only thing that is impeded is your ability to kind of update the flags?

It's the kind of thing an engineer thinks of quite a lot when they're trying to decide whether they should use this thing. It's like, well, what will happen if I can't talk to you? And obviously, it's a pretty important question. It's definitely an important thing to think about, because if their SLA was directly tied to your entire system not being able to operate, then that's pretty bad.

All of the SaaS products I'm aware of have pretty good methodologies to solve this. They store the state of the flags locally, and the agent or the library that you're using locally has some kind of built-in handling: it basically fails gracefully if it can't phone home. And I think this actually is a really good example of why you should be buying versus building, because I'm pretty sure that the homegrown thing that people build actually sucks more in terms of handling outages.
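As a rough illustration of that fail-gracefully behavior, here is a minimal Python sketch. The `CachingFlagClient` class, its `fetch_remote` callable, and the flag names are all hypothetical and are not the API of any real SaaS product. The only idea being shown is that the last known flag state is kept locally and reused when the service cannot be reached.

```python
from typing import Callable, Dict


class CachingFlagClient:
    """Hypothetical client that keeps the last known flag state locally."""

    def __init__(self, fetch_remote: Callable[[], Dict[str, bool]],
                 defaults: Dict[str, bool]) -> None:
        self._fetch_remote = fetch_remote   # assumed to raise on network failure
        self._cache = dict(defaults)        # hard-coded defaults until first sync

    def refresh(self) -> bool:
        """Try to sync with the flag service; keep the old state on failure."""
        try:
            self._cache.update(self._fetch_remote())
            return True
        except Exception:
            # Can't phone home: fail gracefully and keep serving cached values.
            return False

    def is_enabled(self, name: str) -> bool:
        # Unknown flags are treated as off, the safe default in this sketch.
        return self._cache.get(name, False)


def healthy_service() -> Dict[str, bool]:
    return {"social_login": True}


def broken_service() -> Dict[str, bool]:
    raise ConnectionError("flag service unreachable")


client = CachingFlagClient(healthy_service, defaults={"social_login": False})
client.refresh()                           # picks up social_login=True
client._fetch_remote = broken_service      # simulate an outage
outage_refresh_ok = client.refresh()       # sync fails, cache is left untouched
still_enabled = client.is_enabled("social_login")
```

During the simulated outage the application keeps seeing the last value it synced. What it loses is only the ability to pick up new flag changes until the service is reachable again.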

And it turns out that when you're running your own software you also sometimes have outages, and it turns out that companies who are dedicated to running a feature flagging service are probably gonna do a better job of running that feature flagging service than your company that's dedicated to selling pants online.

So I think it's a really valid thing to ask. It's a really valid thing to dig into and understand. But normally the answer is kind of the flip side of what you would think. It's normally another good reason to be using a hosted product, because you can essentially pay someone else to do the uptime and the monitoring for all of their user base, rather than you having to make sure that whatever internal system you're using is gonna handle all of these outages and edge cases that could happen.

Another thing to consider when dealing with some sort of dynamic system that can toggle your feature flags or feature branches is the question of auditability. If it all lives alongside your code base, and all you're doing is maybe changing a value in a settings.py file or in a YAML configuration, then you can go back in time and see, okay, this is what the value was at this time, and this is what it is now. Whereas if you're just toggling something in a web UI or sending an API request, I'm wondering what you have seen as far as some of the strategies for auditing those changes over time in that type of context.

Yeah, I think it's a really good question. In general, if you can make the flagging decision static, in terms of: for any request, for this version of the code, it's gonna go this way through the system or the other way through the system, that's the ideal, because then you get that audit trail via source control. Right? So if my decision as to whether to show that social login button is powered entirely by configuration that's hopefully checked into the same repository as my code itself, or maybe in a sister repository, then you get this awesome audit trail. And you also get nice things around availability, because you don't have some external system you need to talk to, etcetera, etcetera.
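A minimal sketch of what that static approach might look like in Python, assuming a hypothetical `flags.py` style module checked into the same repository as the application. The flag names and the `render_login_page` function are made up for illustration.

```python
# Hypothetical static flag configuration, checked into the same
# repository as the application code so git history is the audit trail.
STATIC_FLAGS = {
    "social_login_button": False,   # flip to True in a commit to release
    "new_recommendations": True,
}


def is_enabled(name: str) -> bool:
    """Look up a flag; raise on unknown names so typos surface early."""
    try:
        return STATIC_FLAGS[name]
    except KeyError:
        raise KeyError(f"unknown feature flag: {name!r}") from None


def render_login_page() -> list:
    """Return the login options shown to every user of this code version."""
    options = ["email_and_password"]
    if is_enabled("social_login_button"):
        options.append("social_login")
    return options


login_options = render_login_page()
```

Because the flag value only changes through a commit, every request served by a given version of the code takes the same path, and `git log` on this file shows when each flag was flipped.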

That's great. And there's kind of an argument for doing that for the toggles or the flags that can work that way.

The problem is almost always there's some need for that flagging configuration to be more dynamic. So take an operations toggle, for example. The ideal, if you've got a really good continuous delivery practice, and I just talked to a company the other day that does this: if their hair's on fire and they need to turn off the external tax calculation vendor, or maybe switch from recommendation system A to recommendation system B because recommendation system A is eating all of the CPU in the system, then if you've got really healthy continuous delivery practices, you just update the configuration in code and you run it through your delivery pipeline, and that's how you make that change in production. And if you can do that, then good for you. That's amazing. That's awesome.

Most real life organizations need some kind of capability to do it at runtime, without having to make that configuration change. So in that case, you need it to be more dynamic.

And if you think about it in terms of things like A/B testing, and if you think about it in terms of toggles that are used to incrementally roll out, you know, let's roll out the social button to 10% of our users and make sure that we don't get any 500s, and then let's roll it out to 50% of our users. When you're using feature toggles or feature flags for controlled rollout, you generally need it to be more dynamic than a code change.
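One common way to do that kind of percentage rollout is to hash each user into a stable bucket. The sketch below is an assumption about how such a system might work, not a description of any particular product; the function names and the `social_login` flag are hypothetical.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """A user is in the rollout when their bucket falls below the percentage."""
    return bucket_for(flag_name, user_id) < rollout_percent


users = [f"user-{n}" for n in range(1000)]
at_10 = {u for u in users if is_enabled_for("social_login", u, 10)}
at_50 = {u for u in users if is_enabled_for("social_login", u, 50)}
```

Because the bucket depends only on the flag name and the user ID, a user who saw the feature at 10% keeps seeing it when the rollout moves to 50%. The percentage itself is the dynamic part that would live in the flag management system rather than in code.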

And so in that case, there's two ways to get that auditability. One way is to have an audit trail in whatever feature flag management system you're using. So whenever someone updates what percentage of users should be getting this feature, or whenever someone flips the flag dynamically from off to on or vice versa, you record that in some kind of audit log. So that's one thing, and that's great, and could well be useful for compliance reasons or whatever.
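As a sketch of that first option, here is a hypothetical in-memory flag store that appends an entry to an audit log on every change. The `AuditedFlagStore` and `AuditEntry` names, and the actors, are invented for illustration; a real system would persist the log somewhere durable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    actor: str
    flag: str
    old_value: object
    new_value: object


class AuditedFlagStore:
    """Hypothetical in-memory store that records every flag change."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self.audit_log: List[AuditEntry] = []

    def set(self, flag: str, value: object, actor: str) -> None:
        entry = AuditEntry(
            timestamp=datetime.now(timezone.utc),
            actor=actor,
            flag=flag,
            old_value=self._values.get(flag),
            new_value=value,
        )
        self.audit_log.append(entry)   # append-only record of who changed what
        self._values[flag] = value

    def get(self, flag: str, default: object = False) -> object:
        return self._values.get(flag, default)


store = AuditedFlagStore()
store.set("social_login_rollout_percent", 10, actor="alice")
store.set("social_login_rollout_percent", 50, actor="bob")
history = [(e.actor, e.old_value, e.new_value) for e in store.audit_log]
```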

What's probably more useful is observability around your feature flag decisions. So at the point at which you're making a flagging decision, in the context that you're making it, most commonly that would be while serving a web request, you are deciding to do X, Y, Z. If you can include the state of those flags in your logging, in your metrics, in your observability systems, then you get really, really rich insight. Not just an audit trail as to what was happening: you know, this request had a 500, what was happening? But you get the ability to slice and dice and say, we're seeing latencies going up for a certain percentage of requests. Is that related? Is there any correlation between this increase in latency and this feature flag that we flipped on 5 hours ago?
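A minimal sketch of including flag state in request logs, using only the standard library `logging` and `json` modules. The `handle_request` function, the logger name, and the flag names are hypothetical, and the in-memory stream stands in for whatever log sink is actually in use.

```python
import io
import json
import logging

log_stream = io.StringIO()   # stands in for a real log sink in this sketch
logger = logging.getLogger("flag_observability_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(log_stream))


def handle_request(path: str, flags: dict) -> int:
    """Serve a (pretend) request and log its outcome with the flag state."""
    status = 200
    record = {"path": path, "status": status, "flags": flags}
    # One structured line per request, so logs can later be grouped by flag.
    logger.info(json.dumps(record, sort_keys=True))
    return status


handle_request("/login", {"social_login": True})
handle_request("/login", {"social_login": False})
logged = [json.loads(line) for line in log_stream.getvalue().splitlines()]
```

With the flag state attached to every log line, a question like "did the requests with this flag on get slower?" becomes a filter or group-by in the log tooling rather than guesswork.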

That's like a real superpower, particularly if you're using feature flags heavily: the ability to slice and dice your production system metrics, and ideally your business metrics too. Right? Like being able to look at a graph that says our conversion rate, or the click through rate on our recommendation system, noticeably dropped in the last week. What feature flags did we change around that time? Or even better, is there a correlation where the people with the feature flag on were behaving differently from the people with the feature flag off? That's super useful as a general capability.

It's something that you need if you're using these for A/B testing, because that's kind of the point: to say what's the difference in behavior depending on what the state of this flag is.
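The comparison described here can be sketched as a simple group-by over events that carry the flag state. The event shape, the `conversion_rate_by_flag` helper, and the numbers below are all made up for illustration; they are not real data, and a real analysis would also need a significance test.

```python
from collections import defaultdict


def conversion_rate_by_flag(events: list, flag: str) -> dict:
    """Return {flag_state: converted / total} for the given flag."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for event in events:
        state = event["flags"].get(flag, False)
        totals[state] += 1
        if event["converted"]:
            conversions[state] += 1
    return {state: conversions[state] / totals[state] for state in totals}


# Made-up events purely for illustration; not real data.
events = [
    {"flags": {"social_login": True}, "converted": True},
    {"flags": {"social_login": True}, "converted": True},
    {"flags": {"social_login": True}, "converted": False},
    {"flags": {"social_login": True}, "converted": False},
    {"flags": {"social_login": False}, "converted": True},
    {"flags": {"social_login": False}, "converted": False},
    {"flags": {"social_login": False}, "converted": False},
    {"flags": {"social_login": False}, "converted": False},
]
rates = conversion_rate_by_flag(events, "social_login")
```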

But if you generalize that, and again, I think this is a really good example of why it helps to think about feature flags broadly, you know, think about A/B tests as being in the same conceptual bucket as a release toggle. If we think about all of those the same way, then you start saying, well, why can't we do A/B testing for an operational change, or why can't we do A/B testing for every feature?

I think it's Uber that has this kind of phrase saying that every feature should be an experiment, or something like that. I think what that gets to is, at the end of the day, you should be able to slice and dice any change to the system and say, you know, the people that had this change, how did they behave differently? Whether that was more errors, or increased latency, or lower conversion in terms of people opting to put something into their cart, they're all fundamentally the same kind of question that you're asking.

Yeah. And that definitely gets into some of the more advanced use cases, like you're saying, A/B testing and being able to dynamically route traffic through a certain code path based on whether it's a cookie or a header or a user ID.

And I'm wondering what you have found to be some of the challenges that organizations or teams face as far as how to implement some of those types of dynamic feature toggles, and being able to track the appropriate metrics and get a useful and effective feedback loop for when those feature toggles are causing problems, or how to measure some of the user facing metrics or user interactions based on the feature path that they're going down, and how that factors back into their overall development workflow?

I mean, I think the biggest challenge, and I haven't really seen people solving this, is closing that feedback loop between the state of a flag and the observed behavior. Like, what actually happened? What was the impact of this flag? And what I see a lot is, I mean, at loads of places that I talk to, the only way they can correlate the state of the flag versus the observed behavior is that they have some kind of proxy way of measuring it. So I was just talking to a company the other day.

They had this habit of rolling out new features to a single market first. So their controlled rollout was, let's turn this feature on for Denver, or let's turn this feature on for this specific cohort of users. And then we'll use that as a proxy for what's the impact of that flag. So rather than saying directly, you know, how has this flag affected latency? You look at things and say, how's the latency for users who are in Denver versus users elsewhere? Or do we have a statistically significant change in conversion rate for people in Denver since we rolled out this change? So people do correlation through proxies, or by time: you know, 12:55 PM, what was the behavior before 12:55 and after 12:55? So that kind of works-ish. And you get to do the same thing by market and say, you know, we turned it on in Denver. Did we see anything happen in Denver? That's nice if you don't wanna have it just entirely on or entirely off. You know, you wanna do a controlled rollout for something where there's a bit of risk. But actually being able to have a direct correlation between your metrics, and the hardest one is your business metrics, and the state of that feature flag is something that I think almost no companies I've talked to have fully solved. They've started to solve it, but it's really, really hard. And I think part of what makes it hard is the organizational challenge that the people who are collecting the metrics are not the same people as the people who are building these feature flagging systems, particularly for those marketing and business and product metrics. That's very different from the operations folks who care about latencies, for example.

And then another challenge that we briefly touched on earlier is how do you handle documenting the complete list of feature toggles that are present in a system and identifying what they're for at run time, but more generally useful is: what are all the feature toggles that exist, and how do you avoid overloading their intended purpose by just saying, oh, that's close to the code path that I want, I'll just, you know, piggyback on that feature flag rather than adding a new one? And just some of the overall strategies for making sure that everybody's aware of what the toggles are, what they're for, and what the criteria are for deciding when to add a new one.

Yeah. I mean, that's kind of an example of that iceberg thing: at first, it seems like this is a pretty small, straightforward product that an engineer can bang out on a Friday afternoon, and it will be a fun little project. It turns out that there's all these capabilities you need. So the real answer for a lot of companies is they have a spreadsheet, and they track it in a spreadsheet, or they have a wiki page or something like that, which is a really crappy answer. A more mature feature flagging system has a way of adding metadata for each flag. So things like which team owns this flag: super useful, because then you can actually go and ask that person, or the tech lead for that team, or the product manager for that team, like, hey, are you guys and gals still using this? So being able to attach metadata, I think, is really useful there. Other metadata that's useful is a description. Like, what does this flag actually do? It's amazing to me that there are some systems that don't let you actually add a textual description. It's just the name. So, hopefully, you're good at one of those hard computer science problems of naming things.

Other things that are really useful in terms of managing flags: when was this flag created, and when do we expect this flag to be retired? That goes back to that thing of certain flags you're expecting to only be in your code base for a few weeks, a few days maybe. Certain flags you're expecting to be there for the next 2 years. And being able to ask the system which flags should have expired by now but are still being used is really helpful in terms of keeping in check that tech debt kind of thing of flags growing over time. So I think those are some really useful kinds of extra information for managing those, and a good feature flagging system has that capability.

Even if your feature flagging system doesn't have that, even if your home rolled thing or the open source tool or SaaS tool you're using doesn't have all of those capabilities, the next best thing is to just include that information in source control, next to the place where those flags are defined. This is probably less true in a language like Python, which is more dynamic. But if you're working in a static language, a lot of times where the rubber meets the road for feature flags is somewhere where you've got an enumeration or a map or something that lists all the flags in the system, or that this part of the code base is aware of, and lets you say, is this flag on or off? And you can include in source code that extra contextual information. Even if it's just in a comment, that's useful. It's there somewhere. And of course, if it's in source code, then you've also got that audit trail: you know when it was created, because you can use git log or whatever. And you can even sometimes infer the owner by doing a git blame. So that's the second best thing. I'm not sure if I totally covered all of your question.

No, I think that that was very useful.
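The metadata described above (owner, description, creation date, expected retirement date) can be kept next to the flag definitions in source control. The following is a sketch only; the `FlagInfo` dataclass, the registry contents, team names, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class FlagInfo:
    name: str
    owner: str          # team to ask "are you still using this?"
    description: str    # what the flag actually does
    created: date
    expires: date       # when we expect the flag to be retired


# Hypothetical registry; in a real code base this would list every flag.
FLAGS = [
    FlagInfo("social_login", "identity-team",
             "Show the social login button on the sign-in page.",
             created=date(2019, 1, 7), expires=date(2019, 2, 1)),
    FlagInfo("tax_vendor_kill_switch", "checkout-team",
             "Ops toggle to bypass the external tax calculation vendor.",
             created=date(2018, 6, 1), expires=date(2021, 6, 1)),
]


def expired_flags(flags: List[FlagInfo], today: date) -> List[str]:
    """Names of flags that should have been retired by `today`."""
    return [f.name for f in flags if f.expires < today]


overdue = expired_flags(FLAGS, today=date(2019, 3, 1))
```

A short lived release toggle and a long lived ops toggle sit in the same list here, and the query answers the "which flags should have expired by now" question without needing a separate spreadsheet or wiki page.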

And another thing that is useful in the idea of when did this toggle get introduced, when is it supposed to go out of use, is the idea of being able to mark a flag as being deprecated: we don't wanna support this anymore. It really needs to go away. Please stop using it.

Yeah. Yeah. And I've talked to teams that have these kind of extreme ideas of, like, time bombs. Right? Where if the flag is still in use past its expiration date, then refuse to start the application or something like that. And I was talking to a company the other day about feature flagging, and they said, yeah, we did that. And inevitably, it just meant that people kept on updating the expiration date to further in the future, which is, you know, it's sad, but I still think there's value in that, because you're at least raising awareness of this issue. Right? It's like the difference between something that's going moldy in the back of the fridge and you can't even see it, versus something where you open the fridge and it stinks. You know? At least you're getting that smell, and you know that eventually someone's gonna say, oh gosh, we really gotta clean up this mess.
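The time bomb idea can be sketched as a check that runs at application startup. This is a hypothetical illustration; `ExpiredFlagError`, `check_flag_expirations`, and the flags and dates shown are invented, and a gentler variant could log a warning instead of raising.

```python
from datetime import date
from typing import Dict


class ExpiredFlagError(RuntimeError):
    """Raised at startup when a flag has outlived its expiration date."""


def check_flag_expirations(expirations: Dict[str, date], today: date) -> None:
    """Refuse to continue if any flag is past its expiration date."""
    overdue = sorted(name for name, expires in expirations.items()
                     if expires < today)
    if overdue:
        raise ExpiredFlagError("expired feature flags still defined: "
                               + ", ".join(overdue))


# Hypothetical flags and dates, for illustration only.
EXPIRATIONS = {
    "social_login": date(2019, 2, 1),
    "tax_vendor_kill_switch": date(2021, 6, 1),
}

try:
    check_flag_expirations(EXPIRATIONS, today=date(2019, 3, 1))
    startup_error = None
except ExpiredFlagError as exc:
    startup_error = str(exc)
```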

So talking about feature flags, it's very easy to get into the mindset of, oh, feature flags are fabulous. I'm going to use them all over the place. They're wonderful. There's no downside. So what are some of the cases where adding a feature flag is just not worth it, because of the additional cognitive overhead, or possibly performance overhead, or just the overall difficulty of implementing and maintaining feature flags in a code base? I'm curious if you've run into any situations where somebody was using them, and you said, that's just a horrible idea. It's much simpler if you just have this one if else statement.

Yeah. I mean, I definitely think that for teams who are mature users of feature flags, who've been using them for a while, one of the very common themes you'll hear is, we try and keep the number of flags in check. So literally having a WIP limit, a work in progress limit, where you say we're only allowed 5 active flags, is legit. And I think teams that are doing that would tell you there's a lot of places where you could put it behind a feature flag, but do we really need to?

So, you know, there's an argument to be made for: this is gonna be really fiddly to do with a feature flag, and we are going to go in with our eyes wide open on putting a pause on production deployments for a week for this small system, or making this long lived feature branch even though we know it's gonna be a horrible merge. I'm pragmatic enough, I think, to say, even though I think in general those are bad practices, there's times when, you know, you know the rules well enough to break them. And I think there are definitely places where you could use a feature flag but it's better not to.

In general, there's definitely a lot of places where you can be smart about where you put that flag. And you can also be smart about other ways to sequence your work so a flag is not necessary. So let's say, for example, our social login feature actually needs, like, 4 different changes in the system. We've gotta add a new back end gateway that goes to this authentication system. We've gotta add some tables to a database or something. And then we've also gotta put that piece of UI in. You only need to put a feature flag around that last piece of putting the UI in, and you can do all that other work without a feature flag.

If you're confident that you're gonna detect breakage before you put it into prod, or the risk of breakage is not so much that you're worried that you need to be able to instantly turn it back off again, you can make all of the back end, behind the scenes changes directly on master if you're doing trunk based development, or via a series of small, short lived feature branches that are checked into master. Make all of those changes, setting up the background ahead of time, and then the last thing you do is add that bit of UI that shows that social login button, let's say. And a lot of times that last piece doesn't even need to be feature flagged either, because it's a very small change. You can land that in a single commit or a small, short lived feature branch. And again, if you don't feel like there's a high risk that you'd wanna pull it back straight away, and if you know that you don't need to do any experimentation with this, or a controlled rollout where you only roll it out to 5% of your users or whatever else, then don't bother with a feature flag. Just do all the work behind the scenes, and then do the last piece that finally surfaces that feature to users at the very end, and you can avoid using a feature flag.

And these are sometimes called branch by abstraction techniques. Those general techniques are useful even if you are gonna eventually put the entire feature behind a feature flag. It doesn't mean that all the back end pieces have to be checking that feature flag. They can just sit there, and if they're being used, then presumably the feature flag is on. If they're not being used, then the feature flag is off.

And for anybody who wants to dig deeper into feature flags or learn more about it, what are some of the useful pieces of advice or references that you recommend?

So we've not really touched on it, but the two pieces of advice I have sound a bit contradictory, though I don't think they are. The first piece of advice I have is just start really simply. Don't start with an open source framework. Don't start with a SaaS product. If you literally just wanna get started, just start with a simple if else statement in your code or something like that, and get comfortable with the concepts of feature flagging. But as soon as you realize that this is a useful thing and you wanna start using it more broadly, do not take those small series of steps that eventually end up with you hand rolling the 725th half assed feature flagging system that's inside of a company somewhere. At that point, stop, and don't build it yourself. Look at open source libraries. Look at SaaS products. So many people build this thing themselves, and I think it's just because they enjoy doing it.

The not invented here problem.

Yeah. Oh, I mean, it's not even a not invented here thing. It feels like a fun sized problem. It's normally something which is a little bit below the radar of product managers, because it starts off as an engineering internal feature. And so people can kinda sneak it in, but it has enough justification that you don't have to lobby your product manager or your product owner or scrum master or whatever for permission to do it. And so, honestly, I think a lot of times it's a fun thing for people to build, and so they end up building it themselves, when they should be honest and not do it. And they also just don't realize how much work it's gonna be. Like, I worked at a company where the feature flagging system had been built by an intern, and it was just death by, you know, we were a boiled frog. 6 months later, we're trying to figure out how to get our production systems to be reliable with this code that had its origins in someone's summer project. So it's not a good look.

I think the other recommendation I have, which we touched on already, is to prefer static configuration where you can. If you can have a feature flag flipped on or flipped off by a code change, that's great, because you get to leverage all of the quality checks and safety checks that you have in your delivery pipeline when you make that feature flag change, just as if you'd made a code change. So you flip the flag on, and then, you know, all the tests pass. The integration tests are run, and if you've got performance testing, you check that your performance hasn't been impacted. That's really nice if you can do that via code change, because you get all that stuff for free, versus if it's a dynamic flag, then essentially you're just banging something into production without doing any of the tests that you would do for a code change, which is kinda scary in some ways.

And then my last piece of advice is to read up, beyond feature flagging, on trunk based development practices and continuous delivery practices in general. There's a really good website called, I think it's trunkbaseddevelopment.com, which has a lot of good background material.

The book Continuous Delivery is a little bit old now, but, you know, it's a great book. It's got a load of really good stuff about continuous delivery practices broadly.

Yeah, I'll second that Continuous Delivery book recommendation. The content is a little old, and it has some references that might be a little outdated, but the core principles are definitely still completely valid and still useful to read up on.

Yeah. I was just rereading a section of it the other day. And the one thing that's in there that I'd be interested to actually talk to Jez and Dave about, and see if they still agree with this, is that it advocates for using release branches as a way to orchestrate releases. And I think that advice is now a little out of date, because the CI/CD systems that we have today have delivery pipelines as a first class thing in the system. So some of it is just, like, they're talking about SVN and they're not talking about Git, because Git wasn't around at the time. But some of it is just that the state of the art has moved on a little bit. But that's maybe 1% of the content. 99% of the content is just amazingly super valuable, super useful.

Are there any other aspects of feature flagging, or your experience of using them or working with companies who have implemented them, that we didn't discuss yet that you'd like to cover before we close out the show?

No. I mean, I think we talked about a lot of stuff. The main thing that I want people to try and get their heads around is that all these different types of things are fundamentally the same thing. And building it yourself is probably a fun thing to do, but not necessarily the right thing to do.

Always good advice.

Always good advice. Yeah.

Alright. I think the only other thing that we didn't really touch on is how feature flags manifest in your tests, but I think that anybody who is doing testing can figure that out fairly well, as far as just making sure that you have a test that sets the flag to on and sets it to off, and making sure that both branches have the expected functionality.

Yeah. I mean, we could probably spend another hour talking about this, actually. But the other thing that I would say, and this is another one of those iceberg things, is that it is worthwhile adding some kind of awareness of feature flagging into your testing system. So things that I've seen that are really useful are the ability to tag a test as saying, this test should be run with the feature flag off and on, and it should behave the same. Or having your feature flagging system make it easy for a test to temporarily override the state of a flag.
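Both testing ideas can be sketched with a small context manager that temporarily overrides a flag. Everything here is hypothetical (the `FLAGS` dict, `override_flag`, and the `login_options` function under test); a real flagging library may offer its own test helpers.

```python
from contextlib import contextmanager

FLAGS = {"social_login": False}   # hypothetical flag state for the app


def is_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


@contextmanager
def override_flag(name: str, value: bool):
    """Temporarily force a flag to `value`, restoring the old state after."""
    missing = object()
    previous = FLAGS.get(name, missing)
    FLAGS[name] = value
    try:
        yield
    finally:
        if previous is missing:
            del FLAGS[name]
        else:
            FLAGS[name] = previous


def login_options() -> list:
    options = ["email_and_password"]
    if is_enabled("social_login"):
        options.append("social_login")
    return options


def test_email_login_available_regardless_of_flag():
    # Run the same check with the flag off and on; behavior should not change.
    for state in (False, True):
        with override_flag("social_login", state):
            assert "email_and_password" in login_options()


def test_social_login_only_when_flag_on():
    with override_flag("social_login", True):
        assert "social_login" in login_options()
    assert "social_login" not in login_options()   # override was undone


test_email_login_available_regardless_of_flag()
test_social_login_only_when_flag_on()
```

The first test is the "should behave the same with the flag off and on" case, and the second exercises each branch. Restoring the previous state in `finally` keeps one test's override from leaking into the next.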

And likewise, you talked about this already, but having the ability for a manual tester to temporarily override the state of a flag in order to verify different things. That investment in the feature flagging system for supporting testability is definitely a worthwhile thing to do. Sometimes it's not something that is top of mind for engineers if they're not also doing exploratory testing, but it's a really good investment to do that kind of stuff.

Well, for anybody who wants to get in touch with you and follow along with the work that you're up to, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose the Circuit Playground Express from Adafruit. I started playing around with one a few weeks ago with my kids, and it's been a lot of fun. And I've been able to use the CircuitPython distribution for experimenting with flashing the LEDs and doing all the fun things you can do on them. So if you're looking for a fun little hardware project that's inexpensive and easy to get started with, I recommend checking those out. And with that, I'll pass it to you, Pete. Do you have any picks this week?

Plus one on Adafruit. All of the CircuitPython stuff is super cool. Also, I'm just a huge fan of Adafruit in general, and they've got a load of just really awesome documentation on there. Like, the learning section of their website is amazing, and it's amazing how much stuff they do for free. So, yay, I love Adafruit.

My pick is a book called Accelerate, and this is written by Nicole Forsgren, Jez Humble, and Gene Kim. Jez Humble is also one of the co-authors of that Continuous Delivery book. And they're the people that were behind the DevOps report that has been coming out every year for the last few years. It's amazing because they're using actual science to validate what kind of engineering practices work and what don't, in terms of, like, orgs that are making more money. At the end of the day, they actually look at which organizations are performing better, from either a capitalistic perspective, like they make more money than their peers, or from a social perspective, what are they doing better? And then they trace that all the way back to what are the engineering practices that correlate with, or not even correlate, that drive that kind of success.

And they really dig into the details of what that means. So they talk a lot about things like continuous delivery, but also about cultural aspects of the company. And it's super useful. It's great advice, and most of the stuff they talk about is stuff that I generally agree with anyway, but it's extra wonderful for it to be stuff I agree with that's actually backed by real science that shows that it's true, rather than it just being, you know, a well argued case, and so I'm gonna hope that they're right about it. So I really recommend that book. It's really fun to read as well, and if you're a stats nerd, it has a lot of stuff at the back around how they actually did those stats.

So that's one pick. And then the other pick, which is very, very self-serving, is an article that I wrote for Martin Fowler's website about feature toggles. It covers all the stuff we talked about, but with some more detail about some of the implementation practices. It's a little bit old now, but I think not that much has changed since I wrote it. So if people want to get into more details about feature toggles, then they should read that.

And the last thing I'd say as part of the pick is, you know, I really love talking about this stuff, and I really, really love hearing about what people are doing in the real world around feature flagging. So if anyone who's listening to this has questions, or thinks I'm talking nonsense about one of the points I made, definitely reach out to me. I'd love to chat more about it.

Yeah, I'll second the article. I actually read through a bunch of that to get ready for this conversation, and I'm definitely going to have to add that Accelerate book to my reading list. So thank you very much for taking the time today to join me and share your experiences of working in this space. It's something that's definitely useful to engineers working in any language. So I appreciate your time, and I hope you enjoy the rest of your day.

Absolutely. Thanks so much for having me on.

Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management.

And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.

And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story.

To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
