Open Source Product Analytics With PostHog - Episode 266

Summary

You spend a lot of time and energy on building a great application, but do you know how it’s actually being used? Using a product analytics tool lets you gain visibility into what your users find helpful so that you can prioritize feature development and optimize customer experience. In this episode PostHog CTO Tim Glaser shares his experience building an open source product analytics platform to make it easier and more accessible to understand your product. He shares the story of how and why PostHog was created, how to incorporate it into your projects, the benefits of providing it as open source, and how it is implemented. If you are tired of fighting with your user analytics tools, or unwilling to entrust your data to a third party, then have a listen and then test out PostHog for yourself.

Machine learning is finding its way into every aspect of software engineering, making understanding it critical to future success. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype.

Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.


Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $60 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
  • You listen to this show because you love Python and want to keep your skills up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
  • Your host as usual is Tobias Macey and today I’m interviewing Tim Glaser about PostHog, an open source platform for product analytics

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what PostHog is and what motivated you to build it?
  • What are the goals of PostHog and who are the target audience?
  • In the description of PostHog it mentions being a product focused analytics platform, as opposed to session based. What are the meaningful differences between the two?
  • Customer analytics is a rather crowded market, with a large number of both commercial and open source offerings (e.g. Google Analytics, Heap, Matomo, Snowplow, etc.). How does PostHog fit in that landscape and what are the differentiating factors that would lead someone to select it over the alternatives?
  • For anyone interested in using PostHog, do you offer a migration path from other platforms?
  • necessary features for a customer analytics tool
  • privacy and security issues around analytics
  • How is PostHog implemented and how has its design evolved since you first began building it?
    • reason for choosing Python
    • benefits of Django
  • thoughts on introducing Channels
  • option to include it as a pluggable Django app
  • integration points
  • data lake integration
  • challenges of providing understandable statistics and exposing options for detailed analysis
  • Having data about how users are interacting with your site or application is interesting, but how does it help in determining the useful actions to drive success?
  • business model and project governance
  • What are the most complex, complicated, or misunderstood aspects of building a product analytics platform?
  • What have you found to be the most interesting, unexpected, or challenging lessons that you have learned in the process of building PostHog?
  • When is PostHog the wrong choice?
  • What do you have planned for the future of PostHog?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Raw Transcript
Tobias Macey
0:00:12
When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, load balancers, 40 gigabit networking, dedicated CPU and GPU instances, S3 compatible object storage, and worldwide data centers. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show because you love Python and want to keep your skills up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project based course every student is paired with a machine learning expert who provides unlimited one to one mentorship support throughout the program via video conferences. You'll build up your portfolio of machine learning projects and gain hands on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don't have to pay for the program until you get a job in this space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there's no obligation. Go to pythonpodcast.com/springboard and apply today, and make sure to use the code AISPRINGBOARD when you enroll. Your host as usual is Tobias Macey, and today I'm interviewing Tim Glaser about PostHog, an open source platform for product analytics.
So Tim, can you start by introducing yourself?
Tim Glaser
0:02:02
Yeah. Hey, thanks so much. Thanks for having me. So I'm Tim Glaser, CTO and co-founder of PostHog. As you said, we do open source product analytics. So think of something like Mixpanel or Amplitude, but completely open source: you can host it yourself and have full control over the data.
Tobias Macey
0:02:18
And do you remember how you first got introduced to Python?
Tim Glaser
0:02:21
I actually don't remember the specific point. I started programming when I was fairly young, sort of when I was 10 years old, with HTML, and then had a couple of jobs throughout high school that were mostly PHP. Then I ended up working for Cloud9, where we used Node, but I think some time in between there I started playing around with Python. But the first time I touched Python professionally was at a company called Magnus, where I spent the last sort of four years before starting PostHog.
Tobias Macey
0:02:45
And so in terms of PostHog itself, can you give a bit of a description about what it is and what motivated you to build it?
Tim Glaser
0:02:56
Sure. So like I said, it's really a platform that you can host yourself that allows you to figure out what your users are doing in your app or on your website. And it will help you build a better product, basically, by having all that information. The category of product analytics is fairly well established, right? You have companies like Mixpanel, Amplitude, Heap, and there's a bunch of others that we can kind of dig into. We used a bunch of them before starting PostHog, specifically things like Mixpanel. The problem was that those tools are all really built for product managers, product owners, those kinds of people. I used to be one myself, so I can kind of relate. But the thing they don't do very well is make it accessible to engineers. If you go around companies that use those kinds of platforms, the only engineer that tends to have access to them is the engineer that installed it, and they probably haven't looked at it since. So that's why we started PostHog. We wanted to build something for engineers, something that they could use and look at, because in the end engineers tend to make a lot of product decisions de facto, but at the moment they tend to do it without any data, without having access to it, etc. So that's the overarching philosophy of why we started it. We felt quite strongly that, especially with tools like Mixpanel, they charge you an arm and a leg for getting access to your data; it tends to come with the highest price point plan. And to us that felt really silly, because it's your data. You've generated it, so you should be able to do with it what you want: you should be able to do analytics on it, and you should be able to run queries against it.
So that's why we wanted to open source it. It didn't feel right to us that you should send all this data off to third parties.
Tobias Macey
0:04:48
And also, with the introduction of GDPR and CCPA, the overall increase in awareness of consumer privacy, and the sort of miasma that has begun to surround these different types of products, there is even more motivation to have greater control over who's using your data and what they're using it for, rather than handing it off to Google Analytics or whoever else to improve their own products just because it happens to be free for you.
Tim Glaser
0:05:17
Obviously, GDPR, etc., those are big motivations for it. As engineers, you're always super aware of who's using your data and for what, and it just felt right to us that it should be something that you can control rather than, like I said, sending it off to whoever. And certainly with things like Mixpanel and Amplitude, they really encourage you to send as much data as possible about your users: email addresses, addresses, all that stuff. And that's all hosted on that service, and you can't really do anything with it from that point on.
Tobias Macey
0:05:42
And in terms of PostHog itself, you mentioned that one of the goals is to be more engineer focused. I'm wondering who the overall target audience is for PostHog and what some of the challenges are that come along with building an analytics platform with the developer in mind.
Tim Glaser
0:06:03
Yes, the target audience is definitely engineers over product managers or product owners at the moment. We strongly believe that engineers need access to this data as much as those groups of people. So our target audience really is any engineer in any organization. But at the moment, a lot of the people that are installing it are engineers at smaller organizations. They have a handful of engineers and there's no formal product role yet, but the engineer wants to know what's happening in that product, who's using it, and how they're using it. That's why they're installing us.
Tobias Macey
0:06:35
And in the description of PostHog, as you mentioned at the beginning, it's described as being a product focused analytics platform, and you draw the contrast to session based analytics in the README of the project. So I'm wondering what the meaningful differences are between the two in terms of the types of data that are collected and the use cases that they benefit from.
Tim Glaser
0:07:00
Yeah, absolutely. So something like Google Analytics won't, unless you're paying them a lot of money, give you statistics or data on individual users. That kind of makes sense if you're running something like a media platform. For example, if you're running an online news website, you don't actually care who the individual users are or what they've done; you just care that this article got this many hits, etc. That doesn't work quite as well for a product, right? Because if you have a B2B or even B2C product, your users are going to be very different and behave very differently based on who they are. The customer that's paying you a million dollars a year uses the product very differently than the customer that's paying you $10 a year, so you want to be able to segment on those things. And that's where the original wave of product analytics came in: Mixpanel, Amplitude, etc. They absolutely nailed that use case, and they've done really well out of it. We're not really innovating there, to be honest. The place where we've innovated is open source, right? We have taken something that has clearly proved to work very well; both Amplitude and Mixpanel are sizable companies with a lot of really dedicated users. What we've done is just made it work for engineers. That's the real power of it. I kind of joke about it with some of the people on my team sometimes, but the product development for PostHog wasn't that difficult, because it was very well defined what we had to build, because it kind of already existed. There's a bunch of places where we thought we could do better, but the real innovation was just being able to self host it. And we've seen a lot of use cases where it's really hard, especially in big organizations, to get any software installed.
You have to go through all sorts of processes, etc. But as soon as something is MIT licensed, that's really easy. So yeah, that's just kind of where we've come from.
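To make the session-versus-product distinction concrete, here is a minimal sketch in plain Python. It contrasts an aggregate view (hits per page, users ignored) with the same events segmented by a property of the user. The event shape, the `plan` property, and the sample data are all invented for illustration and are not PostHog's schema.

```python
from collections import Counter, defaultdict

# Hypothetical events as (user_id, page) pairs. Invented sample data.
events = [
    ("u1", "/reports"), ("u1", "/reports"), ("u1", "/billing"),
    ("u2", "/reports"), ("u3", "/settings"), ("u3", "/reports"),
]
# Hypothetical per-user property used for segmentation.
user_plan = {"u1": "enterprise", "u2": "free", "u3": "free"}


def hits_per_page(events):
    """Aggregate view: total hits per page, with users ignored."""
    return Counter(page for _, page in events)


def hits_per_page_by_plan(events, user_plan):
    """Product view: the same hits, segmented by each user's plan."""
    segmented = defaultdict(Counter)
    for user_id, page in events:
        segmented[user_plan[user_id]][page] += 1
    return {plan: dict(counts) for plan, counts in segmented.items()}
```

The aggregate view can only say that `/reports` is popular; the segmented view shows which kind of customer is responsible for that popularity.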
Tobias Macey
0:09:04
And so using open source as the differentiating factor has a lot of benefits in terms of gaining broad based adoption for people who are interested in this kind of thing and want to be able to just pull something off the shelf and experiment with it. But in terms of the actual capabilities of the product that would help to sell it to the people who are actually going to eventually write the check to pay for support or pay for managed hosting, what are some of the capabilities of PostHog that make it stand out in such a crowded marketplace?
Tim Glaser
0:09:36
Sure. So part of it, obviously, is that at the moment, if you're a large organization and you want something like product analytics, you probably have two choices. One is you go to something like Mixpanel or Amplitude and you pay them a small fortune, and you get access to that product, but you don't get access to the underlying data. Sometimes you can pay them to put it in a data warehouse, maybe, but the data isn't stored on your servers. The other side of it is using something like Snowplow and Snowflake, and then using something like Tableau or Looker on the other side to do data analytics. And this is what we've seen; we've talked to a bunch of people who've gone that route. They started with Amplitude or Mixpanel, that became too expensive, and they wanted to keep more control over the data, so they went to something like the setup I just described: Snowflake, Snowplow, Looker. The issue with that is you end up having to hire a data science team, basically, that the rest of your organization needs to ping to build dashboards. And it's very hard to build even a basic funnel, for example. So we think there's a massive gap between the two for something with the ease and usability of Mixpanel or Amplitude, but that you can host yourself and have complete control over, as you would with the Snowflake and Snowplow setup.
Tobias Macey
0:10:07
For people who already have an analytics platform in place, whether it's session based and they're using Google Analytics or Matomo, or they're already using a product focused tool such as Heap or Mixpanel, and they're interested in using PostHog, do you have any sort of migration path for being able to pull data from those systems to populate your platform with the information that has already been collected? Or would it end up having to be just a clean cutover where they start new with PostHog and just use that going forward?
Tim Glaser
0:11:30
Yeah, we've mostly seen clean cut migrations. Part of that is, like I said, and this depends on the provider, unless you're paying a lot of them a lot of money, it's actually really hard to get the data out. A lot of times they won't let you, so there's no way for us to import it anyway. And with something like Google Analytics, again, because they're session based, there's no way of marrying it up, I guess. So a lot of times what we see is just a clear cut transition. It's something I would love to do better, I guess, but that's just how it has ended up. But the thing with these tools is that a lot of them have 90 day retention of data, for example. And the reason is that data older than that tends not to be very relevant to what you're doing now, because your product changes and your website changes. So even if you just run PostHog side by side with those tools, after a couple of weeks or a month you probably have enough data in PostHog, or whatever the new service is, that you can do the right level of analytics with it.
Tobias Macey
0:12:31
And as I was going through the documentation, one of the things that stood out to me about the way that you're approaching event collection in PostHog is that you are taking a similar approach to Heap, in that you're automatically collecting everything so that after the fact you can decide that a particular signal is useful and incorporate it into some sort of dashboard, versus having to spend the engineering time to say that you want to start collecting something new and then having to wait for it to come in. I'm wondering what you have found to be the utility of that approach, and any challenges that come along with it.
Tim Glaser
0:13:05
The utility of it, of course, is what you said, right? It allows you not to spend engineering resources on adding track functions everywhere. With those functions, sometimes it's one engineer who's really enthusiastic, and she'll add them everywhere in one go, and then it kind of languishes and the rest of the team doesn't take it up. So we've seen that that doesn't tend to be a super reliable way of making sure your tracking is up to date. It also allows less technical users to go in and define what they want to look at, and it allows you to do backwards looking stuff. So even if you haven't installed the track functionality, you can still go in and work out what's happened. The downside, or the challenge that we've seen with that, is that you still have to work with the DOM. For example, CSS-in-JS is great, but it creates class names that are completely meaningless, basically, to anyone else. So that makes it quite challenging to write selectors that work for it. We have a point and click interface for defining these events, but if your class names are dynamically generated, that's not going to work. So that's the challenge. You definitely get more reliability if you do track calls everywhere in your code, but you lose some of the flexibility. And obviously we also allow you to do .track calls if you want. It's kind of just a free extra option, and we see it get a lot of use.
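The trade-off Tim describes can be sketched in a few lines of Python. The example below defines a "sign up" event after the fact over clicks that were already captured, and shows why a rule keyed on a generated class name breaks when the class changes between builds, while a rule keyed on a stable attribute keeps working. The `ClickEvent` shape, the `data-attr` attribute, and the class names are assumptions made up for the illustration, not PostHog's data model or matching logic.

```python
from dataclasses import dataclass, field


@dataclass
class ClickEvent:
    """A hypothetical auto-captured click: the element's tag, text and attributes."""
    tag: str
    text: str
    attrs: dict = field(default_factory=dict)


def matches(event, tag=None, text=None, attrs=None):
    """Return True if the event satisfies every rule that was supplied."""
    if tag is not None and event.tag != tag:
        return False
    if text is not None and event.text != text:
        return False
    for key, value in (attrs or {}).items():
        if event.attrs.get(key) != value:
            return False
    return True


# Clicks captured before anyone decided that "sign up" was worth tracking.
captured = [
    ClickEvent("button", "Sign up", {"class": "css-1x2y3z", "data-attr": "signup"}),
    ClickEvent("a", "Pricing", {"class": "css-9k8j7h"}),
    # The same button after a rebuild: the generated class name has changed.
    ClickEvent("button", "Sign up", {"class": "css-7q6w5e", "data-attr": "signup"}),
]

# Rule keyed on the generated class name: misses the click after the rebuild.
by_class = [e for e in captured if matches(e, attrs={"class": "css-1x2y3z"})]
# Rule keyed on a stable attribute: finds both clicks.
by_stable_attr = [
    e for e in captured if matches(e, tag="button", attrs={"data-attr": "signup"})
]
```

Here `by_class` finds one click and `by_stable_attr` finds two, which is the reliability gap between generated class names and attributes a team controls.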
Tobias Macey
0:14:49
And for somebody who is using PostHog, can you talk through the overall workflow of the data collection and then being able to build useful dashboards out of it, along with any integration or enrichment of data sources from additional platforms that you're using, whether it's SaaS or otherwise, or being able to collect information from things like mobile applications?
Tim Glaser
0:15:12
Sure. So if you sign up for an account, or you deploy to Heroku, AWS, whatever it is, we give you a super simple snippet. You put that in your website and you start collecting from the word go. That's the super basic version, and that will allow you to do quite a bit of analytics, because we start collecting events straight off. We have libraries for most popular frameworks and languages: we have React Native, iOS, Android, Python, Ruby, Go, etc. So you can start collecting events from the back end as well. The thing that's most challenging, and this is challenging with all of the product focused analytics tools, is that you need to marry up what the users are doing in the front end with what the users are doing in the back end. Maybe you don't need analytics in the back end, in which case it's a bit easier. But you basically need to send something that uniquely identifies the user, so a user ID, so you can make sure that all the events that one user does correctly get grouped under one user. This tends to be a little bit challenging, and we've made it as easy as possible, but it's a challenge whether you use Amplitude or Mixpanel or PostHog. Then the next step is, in PostHog, you start creating some dashboards. So you've got all this data, and we pre-create some of the dashboards for you. But we have a graph interface that is really powerful and allows you to do all sorts of analytics, things like stickiness and retention. It allows you to filter by any property that you send, and we send a bunch of properties automatically. So that allows you to create really powerful dashboards really, really quickly.
Tobias Macey
0:16:53
And in terms of the collection of the data too, one of the challenges can be in making sure that the structure of the events is able to be joined across the different platforms. So I'm wondering what your approach has been in terms of enforcing some sort of default schema or default attributes for being able to collect the information across multiple different sites or platforms and then join it across them.
Tim Glaser
0:17:17
Yeah. So I guess the upside is that we control all the client libraries as well, which means that all of the attributes across the libraries are consistent. In terms of marrying up the users, we have unified calls, basically. The thing you do is you just send us a user ID, and that will mean that everything that one user does, if they're logged in on your app or whatever, gets attributed to that user. So there's a couple of ways we make sure that all of the data is consistent, whether you've got data coming in from your iOS app, your marketing website, or your online app, so that it all works together.
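The idea of sending a user ID so that front end and back end events end up under one person can be sketched as follows. This is an illustration in plain Python, not PostHog's implementation or client API: the `PersonResolver` class, the `identify` method name, and the ID formats are all hypothetical.

```python
from collections import defaultdict


class PersonResolver:
    """Maps every known ID (anonymous or logged in) to one canonical person."""

    def __init__(self):
        self._canonical = {}

    def resolve(self, distinct_id):
        # Follow alias links until reaching an ID that has no further alias.
        current = distinct_id
        while self._canonical.get(current, current) != current:
            current = self._canonical[current]
        return current

    def identify(self, anonymous_id, user_id):
        """Record that an anonymous ID and a user ID belong to the same person."""
        root_anon = self.resolve(anonymous_id)
        root_user = self.resolve(user_id)
        if root_anon != root_user:
            self._canonical[root_anon] = root_user


def group_by_person(events, resolver):
    """Group event names under the canonical person for each distinct ID."""
    grouped = defaultdict(list)
    for distinct_id, name in events:
        grouped[resolver.resolve(distinct_id)].append(name)
    return dict(grouped)


resolver = PersonResolver()
events = [
    ("anon-123", "viewed pricing"),   # marketing site, before login
    ("user-42", "created project"),   # back end, after login
    ("anon-123", "clicked sign up"),
    ("anon-999", "viewed pricing"),   # a different, still anonymous visitor
]
resolver.identify("anon-123", "user-42")
people = group_by_person(events, resolver)
```

After the `identify` call, all three events from the first visitor are grouped under `user-42`, while the unrelated anonymous visitor stays separate.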
Tobias Macey
0:17:56
And that's another interesting thing too about this sort of product analytics: for things like Google Analytics or Matomo, it's very much that the statistics are collected in this discrete entity of one website. And then there is the possibility of being able to view traffic across different properties, but it's not a first class concern. Whereas as I was looking through the documentation for PostHog, it seems that there is much more built in functionality for being able to say that the information that I'm collecting in this project is something that pertains to the marketing website, as you said, and your web application and maybe your mobile device. And I'm wondering what your thoughts are on the utility of treating the different properties as a single overall experience versus treating everything as a discrete entity in its own right.
Tim Glaser
0:18:45
It's super important, I think. Talk to any marketeer and they love talking about it. In fact, my girlfriend is a senior marketing manager, and she loves talking about things like first touch and last touch. It's so crucial, especially to marketing, and then that filters down into the product, right? You want to know how people first found you. If you're doing paid ads, for example, you want to know, okay, is my Facebook campaign working? But if your KPI is that someone sees an ad, goes to your website, then downloads an app and signs up on the app, you want to be able to track people across all of that, ideally. For example, Slack knows that if you join a couple of channels, you're basically hooked for life. And Netflix has something similar, right? If you watch a couple of movies, you're going to be hooked for life, basically, or for a long time. If that's your KPI and it lives in your mobile app, and you're spending a ton of money getting people from Facebook onto your website, you want to make sure that the types of campaigns that bring people to the website are the types of campaigns that eventually lead people to do whatever that action is in your app. So marrying the two up is a massive challenge. But if you get it right, especially in big organizations, it can be so powerful, because it just allows you to spend much more effectively if you're doing paid marketing, or, if you're doing content marketing, to focus on the things that really matter.
Tobias Macey
0:20:11
In terms of PostHog itself, can you talk through how it's implemented and how the overall design and architecture of the system has evolved since you first began building it?
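As an aside on the first touch and last touch models mentioned in Tim's previous answer, here is a small Python sketch of how each one credits a conversion. The touchpoint tuples, channel names, and the `converted` set are invented sample data; real attribution also has to join touches across devices, which this example assumes has already been done.

```python
from collections import Counter

# Hypothetical touchpoints as (user_id, timestamp, channel). Invented data.
touches = [
    ("u1", 1, "facebook_ad"), ("u1", 5, "newsletter"),
    ("u2", 2, "blog_post"),
    ("u3", 3, "facebook_ad"), ("u3", 4, "blog_post"), ("u3", 9, "newsletter"),
]
# Users who performed the key in-app action (the KPI).
converted = {"u1", "u3"}


def attribute(touches, converted, model="first"):
    """Credit each converted user to their first or last touch channel."""
    if model not in ("first", "last"):
        raise ValueError("model must be 'first' or 'last'")
    chosen = {}
    # Walk the touches in time order so "first" and "last" are well defined.
    for user_id, _, channel in sorted(touches, key=lambda t: t[1]):
        if user_id not in converted:
            continue
        if model == "last" or user_id not in chosen:
            chosen[user_id] = channel
    return Counter(chosen.values())
```

With this data, `attribute(touches, converted, "first")` credits both conversions to `facebook_ad`, while `attribute(touches, converted, "last")` credits both to `newsletter`, so the choice of model changes which campaign looks effective.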
Tim Glaser
0:20:20
Yeah, I should say I sort of consider myself to be a fairly average developer, and I like libraries and frameworks that are well tested, that are a little bit boring, but that just do the job. So what I started with was just Django and a very basic React app, and that's basically still what it is today. Obviously, we've expanded things massively, but in essence it's basically a Django app. We use Django REST framework for the API endpoints. It's a single page application in React, and we use something called Kea for state management. We actually hired Marius, who wrote the Kea framework. It's basically just a layer on top of Redux that takes away a lot of the boilerplate of writing Redux code, and it's really helped us structure our code base much better. So that was probably the single biggest architecture change that we made. Because before, and obviously it's all open source, so you can see exactly how I kind of stuffed this up, we would just have huge classes of React components that did a bunch of magic stuff and would call APIs left, right, and center. And we slowly started migrating to using Kea, so basically having some kind of state management in the app, and it's made a world of difference. So that's definitely been the biggest challenge of the things we've changed.
Tobias Macey
0:21:48
And using Django, as you said, it's a boring technology. It's something that's been battle tested for a long time. Because of the fact that you're building something that is very heavy on the data ingestion and data analytics side of things, I'm wondering what you have found to be its overall capability in terms of performance, and any issues that you're seeing in scaling to larger volumes of data and larger volumes of interactions for maybe large properties.
Tim Glaser
0:22:16
Yeah, so on the analytics side, it's the database that does most of the heavy lifting. On event insertion, we are definitely starting to run up against some issues. Actually, something that we have been working on this week, and I won't get into too much detail, is this basic problem: every time we insert an event, we have to work out if that event is part of an action. Actions, as we call them, are basically a wrapper around something like an auto captured event that makes it human readable. So if you have a call to action on your website that's like "sign up", you can create an action that's called "sign up" and then use our point and click interface to work out which button is the signup button. Now, every time we do an event insert, we check whether that event is part of an action, and we create this fairly horrible SQL query to work this out. The database handles it absolutely fine, but the Django ORM is just quite slow in generating it. It's actually brought down some people that were experiencing high volumes, and we had to quickly remove those actions to make sure it works. So that's something we're looking at at the moment. I think, possibly, for the event ingestion we might move away from the Django ORM, or we might just cache the results of that query, or whatever it is. But apart from that, I absolutely love Django. I think it's one of the best things, certainly in the Python community. We could have never built PostHog as quickly as we've done without Django.
Tobias Macey
0:23:44
Since you have gotten decently far along in the product, I'm wondering what your thoughts are reflecting back on the initial decisions of using Python and Django for building this platform out, and any considerations that you might have done differently now that you are further along in the journey.
Tim Glaser
0:24:03
I would absolutely use Python again, and I would absolutely use Django again. Some of the other mistakes have definitely been on the front end, as I said, around state management, and I sort of wish we'd used TypeScript. I sort of wish we'd used testing there. But yeah, absolutely. Typing in Python is great. The django-stubs library is a little bit funky sometimes, but it's getting better, and that has saved us quite a few times, just having strong typing everywhere and reinforcing that across the codebase. As for doing things differently, this is maybe a little bit niche, but we definitely have large functions in our Django REST framework API. Some of the viewsets and serializers are really, really big. I think the accepted wisdom these days is to have a middle layer between the models and the API, and we still struggle to place things. Some of the logic is in the models, some of the logic is in the API layer, and we're constantly not quite sure where any of it belongs. I think if we'd had some kind of structured middle layer, that would have been useful.
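A minimal sketch of the kind of structured middle layer described here, using the action matching mentioned earlier in the conversation as the example logic. This is not PostHog's code: `Action`, `Event`, and `matching_actions` are hypothetical names, and the matching rule (same event name, all listed properties equal) is an assumption made for illustration. The point is only that the function takes and returns plain data, so a viewset, a serializer, or an ingestion worker could all call it without owning the logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    """Hypothetical stand-in for an action: a readable name plus a match rule."""
    name: str
    event_name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    """Hypothetical stand-in for an ingested event."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


def matching_actions(event: Event, actions: List[Action]) -> List[str]:
    """Service-layer function: return the names of actions this event belongs to.

    It knows nothing about HTTP requests or ORM models, so it can be
    unit tested on its own and reused from any entry point.
    """
    matched = []
    for action in actions:
        if action.event_name != event.name:
            continue
        # Every property the action asks for must be present with the same value.
        if all(event.properties.get(k) == v for k, v in action.properties.items()):
            matched.append(action.name)
    return matched


signup = Action("sign up", "click", {"text": "Sign up"})
clicked = Event("click", {"text": "Sign up", "tag": "button"})
print(matching_actions(clicked, [signup]))
```

In a Django project, a module like this would typically sit next to the models and be imported by the API layer, which keeps viewsets and serializers limited to parsing input and shaping output.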
Tobias Macey
0:25:14
Because this is very I/O heavy, I'm wondering if you have put any thought into using the Channels capabilities in Django. Looking at your requirements, I know that you're on version 3, so you should be able to support the ASGI interface and use Channels to maybe get some extra performance while still taking advantage of all of the stability that Django offers.
Tim Glaser
0:25:35
And that's mostly for event insertion, you're talking about, then?
Tobias Macey
0:25:39
Yeah, primarily for the critical path of getting data into and out of the system.
Tim Glaser
0:25:43
Yeah, to be honest, it's something we haven't considered yet. Our JS library is fairly simple. We do batching, for example, of event insertions, and we do some clever stuff around that on that side. But no, we haven't really looked at it. Event insertion using Channels is, yeah, something we should consider.
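The batching of event insertions mentioned here can be sketched as a small client-side buffer. This is an illustration in Python rather than the actual JS library, and `EventBatcher`, `send`, and `max_batch` are hypothetical names; a real client would probably also flush on a timer and on page unload, which is left out.

```python
from typing import Callable, Dict, List


class EventBatcher:
    """Buffer events and hand them to `send` in batches (hypothetical helper)."""

    def __init__(self, send: Callable[[List[Dict]], None], max_batch: int = 10) -> None:
        self._send = send            # e.g. a function that POSTs one batch
        self._max_batch = max_batch  # flush as soon as this many events are queued
        self._buffer: List[Dict] = []

    def capture(self, event: Dict) -> None:
        """Queue one event, flushing when the batch is full."""
        self._buffer.append(event)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued; a no-op when the buffer is empty."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._send(batch)
```

The trade-off is fewer requests to the ingestion endpoint in exchange for a short delay before events arrive, plus the need to call `flush()` before the process or page goes away.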
Tobias Macey
0:26:03
And one of the interesting things that I've seen from looking at a lot of the different open source analytics offerings is that most of them seem to be implemented in PHP, whether because of the success of the LAMP stack or the success of WordPress and their desire to be able to play within that same ecosystem. I'm wondering what your thoughts are on the trade-offs of PHP as the implementation target for these analytics platforms versus the benefits that you're seeing from building on top of Django.
Tim Glaser
0:26:31
In the same way that, if you go to Sentry's GitHub, it doesn't have much text in the README, but the one thing it says is that the Sentry server is written in Python, but you can use any library or any language to send events to Sentry. That's kind of how I think about it as well. The fact that we're written in Django is kind of irrelevant to how people send us events, because we have a PHP library and we have a Ruby library, and they're all as good as the Python one. So we don't really care what your system is written in, because we will probably have a library for it. So yeah, I think the question of what we're implemented in is almost less interesting. We did have an employee from a large Canadian bank that will remain unnamed reach out to us and say, "We really love what PostHog is doing, but we wish it was written in Java." I guess the reason they wanted that is maybe they wanted to expand it and hack on it. But apart from that, I think what libraries you offer is more important than what you write the software itself in.
Tobias Macey
0:27:39
Yeah, it's just interesting to me that it has actually taken so long for there to be a decent option in Python for building a web analytics platform, or an event analytics platform, versus the plethora of solutions that have cropped up in PHP, probably just because PHP had such dominance over the web for a while, at least until frameworks like Django and Rails came about. And on the subject of PostHog being implemented in Django, I'm wondering what you see as the viability of potentially offering it as a pluggable app for other Django projects to embed within them, versus running it entirely as its own hosted platform.
Tim Glaser
0:28:19
Yeah, that's super interesting to me. We've had quite a lot of people talk to us about wanting to offer PostHog as a thing to their customers. You can imagine if you're building a website builder, for example, you want to give your customers analytics information: how many page views did you get, where did people come from, etc. So they'd be using PostHog for that. Specifically for Django apps, I mean, that is a good idea. In the same way that Sentry, for example, has only a couple of quite specific options for how to deploy it, we require Celery, we require a Redis server, and we sometimes require quite a bit of configuration to make sure that we're up and running. We abstract a lot of that away by having an app.json for Heroku, and having Docker images and a Docker Compose file, Terraform, CloudFormation, all that stuff. So I wonder how feasible it is to expect people to set up all that config in their own Django app just so they can use it as a side project. But apart from that, there have been quite a few use cases of people serving PostHog to their customers in some way, shape, or form, and it's definitely something that we want to encourage.
Tobias Macey
0:29:36
As far as the utility of PostHog, some of the benefits that come from having access to the underlying data and having the APIs available are the options for integration and extensibility. So I'm wondering what you envision, and what already exists, in terms of the options for extension and portability or integration for people who want to use PostHog within the broader context of their systems.
Tim Glaser
0:30:04
Yeah, the thing that we see a lot of is people just deploying it themselves on their own systems. All the events get stored in Postgres, and what we see a lot of is people then using the data from Postgres to do something else. So they're really using PostHog as an event collection thing. They use all of our libraries on their websites and in their iOS apps and whatever to send events to the PostHog instance, and then on the back end they don't really use the PostHog analytics; they use the database and do it that way. We do also have a couple of people using us via the APIs. So again, same story: they set us up as an event ingestion thing, and then they use the APIs, because we have put quite a lot of effort into making the APIs super powerful. Obviously we're using them in the front end to allow you to do all sorts of analytics, but if you want to use the API, go ahead. That's kind of the point of the exercise.
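To make the "use the database directly" pattern concrete, here is a rough sketch of querying collected events with plain SQL. The table and column names (`events`, `distinct_id`, `event`) are assumptions for illustration and not PostHog's actual schema, and an in-memory SQLite database stands in for Postgres so the example is self-contained.

```python
import sqlite3
from typing import List, Tuple


def top_events(conn: sqlite3.Connection, limit: int = 3) -> List[Tuple[str, int]]:
    """Return the most frequent event names with their counts."""
    query = """
        SELECT event, COUNT(*) AS n
        FROM events
        GROUP BY event
        ORDER BY n DESC, event ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# In-memory SQLite stands in for the real Postgres database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (distinct_id TEXT, event TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO events (distinct_id, event) VALUES (?, ?)",
    [("u1", "pageview"), ("u2", "pageview"), ("u1", "sign up")],
)
print(top_events(conn))  # [('pageview', 2), ('sign up', 1)]
```

Against a real Postgres instance the same query shape would apply, though the driver and placeholder style differ (for example, psycopg2 uses `%s` rather than `?`), and the table layout should be checked against the deployed version.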
Tobias Macey
0:31:03
And on the point of scale as well, a lot of people, as they get to a certain volume of data, start to look to data lakes or, if they're more interested in responsive analytics, the data warehouse approach. I'm wondering what your thoughts are on the utility of PostHog within those contexts, or the overall integration path for exposing PostHog as a data source for those different systems, either via an ELT integration or things like that.
Tim Glaser
0:31:33
I think the first thing that I would say is that it is amazing how far Postgres can go. For example, I'm fairly sure Heap is still using Postgres. There's a bunch of companies that have servers with a terabyte of memory to run a massive Postgres database. So I think Postgres can go much further than people give it credit for, and our main focus at the moment is just to support even quite high volume customers with Postgres. However, we absolutely think that there will be a time. We've had inbound interest from large, well-known B2C companies, and they'll be doing volumes that, even if you pull every trick in the book, you might not get with Postgres, and they might already have something set up. So in those cases, we absolutely want to integrate with those data warehouses. The way we're looking at that at the moment is we're just going to wait and see until someone approaches us with that question, and then we're going to work with them to implement it. It's not something we think we need to actively work on right now. For me, at the moment, what I care about most is the individual developer being able to install it quickly and get something up and running super quickly, and that it's stable, it works really well, and it does everything [unclear] supports and more. So that's where most of our focus is right now. And then, if someone comes in and wants to use a data warehouse with us, we'll very happily work with them to make that happen.
Tobias Macey
0:33:05
And the other aspect of building an analytics platform, and you mentioned that you have some pre-built dashboards for people who are getting started with PostHog to get them up and running, is just the overall aspect of making sure that the presentation of the data, and of how to query and investigate the data, is accessible and understandable for the people using it. I'm wondering what you have seen as the challenges of providing an interface that is intuitive and still expressive enough for power users who want to dig deep and do their own analysis.
Tim Glaser
0:33:43
Yeah, this is pretty exciting. I think I underestimated this when starting PostHog. You look at some analytics tools and you're like, "Oh, it's just a couple of options." But the real challenge is in the permutations of the options. If you look at ours, there are a couple of fields that you can change, and it actually looks deceptively simple, but it's really, really powerful. And obviously, if you do the maths on the number of different settings you could use to create a graph, it's immense. So testing that and making sure all of those permutations work is quite challenging. What we found also is that some of the other tools have different views for each variation: they have a view for retention, they have a view for aggregates by people, etc. We've actually managed to get all of that into one view that still looks deceptively simple. I think the way we've done that is really just to take the essence of what users really care about seeing, taking that user-centric approach to building it rather than "this is what our system is technically capable of."
Tobias Macey
0:34:53
And once you have all the information collected, and you're viewing the results and the graphs that show you the data and trend analysis, or the behavioral patterns of your users, what is the useful next step from that? What are some of the ways that you can provide useful, constructive feedback, in terms of: this is the structure of the events, this is the actual benefit that you can gain by having this knowledge, and these are some concrete actions that you can take to improve the growth and viability of your company, your project, or your product?
Tim Glaser
0:35:29
I'd say that if you've just started using product analytics, it's surprising how easy it is to find insights that will help you build a better product. My favorite example is that it tends to be really easy to create a funnel in any product analytics platform, and really just by looking at that funnel and seeing where people drop out, you can double your conversion rate. So if you've got a webshop, or a product that takes someone from the homepage all the way to putting in their credit card, and you create that funnel with all of the steps, and then just see where people drop off the most and do your best to improve that, you can get a huge uptick in conversion rates just from that. I guess the other thing that tends to be really insightful is that a lot of people, and especially engineers, don't know how users use their product. Say you have a sidebar with seven menu items. Do you know which of those people are actually using? What are the top one or two actions that people do most in your system? It will surprise you. Obviously we're using PostHog to track PostHog, and it always surprises us what people end up using the most. It's screens that we have given no thought to, no effort to, that we slapped together in half a day, and they tend to be the most popular screens. So being able to then divert some focus that you would otherwise have spent on a screen that no one looks at, and instead spend it on the second most popular screen, those are huge advantages. At the moment, the way our product works, and the way all of the other product analytics products work, is you have to sit there and create a dashboard or filter a graph down to get these insights. And that's useful.
And it's especially useful if you're someone who likes spending that time looking at dashboards and playing around with them, as I am. But a lot of engineers, I think, don't want to create a dashboard to work out what's going on in their product. So the next thing that we're working on is basically a toolbar that we, and I'll stay out of the technical details, inject into your website. If you're logged into PostHog, it will show you a little icon that you can click to open the toolbar, which will show you exactly what people are doing on your website. It's sort of a heatmap that actually gives you a lot more information than that, so when you're developing, when you're looking at your own app, your own product, you can see exactly where people drop off, where they get stuck, where they get confused, and also, surprisingly, where they are clicking the most. And that might be something totally different from what you thought. So I think that's the next step of this. With PostHog we focused first on getting the dashboards and the graphs, etc., to get to parity. The next step for us is that toolbar, to bring the statistics and the analytics to where you're working.
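The funnel analysis described in this answer (count who reaches each step, then look for the biggest drop) can be sketched in a few lines. This is a simplified illustration, not how PostHog computes funnels: `funnel_counts` and `biggest_drop` are hypothetical helpers, events are assumed to arrive as time-ordered `(user_id, event_name)` pairs, and there is no time window between steps.

```python
from typing import Dict, List, Tuple


def funnel_counts(steps: List[str], events: List[Tuple[str, str]]) -> List[int]:
    """Count the users reaching each funnel step, in order.

    `events` holds (user_id, event_name) pairs in time order. A user
    counts for step i only after completing steps 0..i-1 first.
    """
    progress: Dict[str, int] = {}  # user_id -> number of steps completed so far
    for user_id, name in events:
        done = progress.get(user_id, 0)
        if done < len(steps) and name == steps[done]:
            progress[user_id] = done + 1
    return [sum(1 for done in progress.values() if done > i) for i in range(len(steps))]


def biggest_drop(steps: List[str], counts: List[int]) -> Tuple[str, str, int]:
    """Return (from_step, to_step, users_lost) for the largest drop-off.

    Assumes the funnel has at least two steps.
    """
    drops = [(counts[i] - counts[i + 1], i) for i in range(len(steps) - 1)]
    lost, i = max(drops)
    return steps[i], steps[i + 1], lost


steps = ["view home", "sign up", "add card"]
events = [
    ("u1", "view home"), ("u2", "view home"), ("u3", "view home"), ("u4", "view home"),
    ("u1", "sign up"), ("u2", "sign up"),
    ("u1", "add card"),
    ("u3", "add card"),  # ignored: u3 never signed up
]
counts = funnel_counts(steps, events)
print(counts)                       # [4, 2, 1]
print(biggest_drop(steps, counts))  # ('view home', 'sign up', 2)
```

Here the largest loss is between the homepage and signup, so that step is the first place to look for improvements.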
Tobias Macey
0:38:28
Yeah, as I was looking through the roadmap document that you have on your site, and I'll link to it in the show notes, it was interesting to see some of your vision of where you think you can go with PostHog now that you've got this foundational layer of product analytics. It's functional, and you're able to collect events and view insights from them. But then there's being able to incorporate that into the inner loop of the development cycle, so that you can surface that information at the time that you're actually building the application and the platform that you are building the analytics for, and see how the analytics interact. A lot of the time, the collection of events and the usage of analytics is a secondary concern, something that you implement because somebody in the marketing or sales department asked for it, not because it's something that you are actually engineering as a first-class consideration in the platform. So I'm wondering what your thoughts are in terms of the capabilities that you can unlock in the overall development of these systems, and the ways that PostHog can be integrated into the overall lifecycle of applications.
Tim Glaser
0:39:35
Yeah, I think this is where it gets really exciting. Like you said, we've built the foundational layer now, and we're about 80% of the way to parity with the other tools, and I think it's the important 80%. So the next steps for us: one is that toolbar, and that's going to be super crucial. Like you said, bringing the data to where you're developing, right in the moment that you're developing, not as an afterthought. And this is talking longer term, but in the same way GitLab went from being a GitHub clone to being an all-encompassing DevOps, CI/CD lifecycle behemoth, we think there's a similar place in analytics. At the moment, we're happily ingesting all these events and gathering some super interesting data. The next obvious place that we're going to use that is the toolbar, but we're also thinking of things like A/B testing, for example. Why do you have to use something like Google Analytics, Optimizely, and LaunchDarkly, for feature flags, for A/B testing? We think there should be a single platform to do all of those things, and we think it's going to be that. But that is the future vision of it.
Tobias Macey
0:40:57
And in terms of the product itself, it is an open source platform, but you've also built a business around it. So I'm wondering what your plans are in terms of the business model and your overall approach to project governance to ensure that the open source aspects continue to be useful and attractive to people who are finding it, but at the same time sustainable and something that you're able to run with in the long term.
Tim Glaser
0:41:23
We haven't been around for very long, but at the moment our core focus is our open source product. We want to make sure that the foundational layer is there, that it works great, and that it's really pleasant to use. We want that toolbar that we talked about to be amazing and to work really well. And we want an individual developer at any size of company, from a two-person startup all the way to the largest companies in the world, to be able to pick up our tool and start using it straight away. That's where we're going to start, and that's where our focus is going to be. We have investors who totally understand that. In terms of monetization, we never want to charge individual developers, and we don't want to charge very small teams. We think the people who will be able to pay for what we're building, and who will get the most value out of it in that sense, are the large enterprises. That's why we're talking about this kind of all-encompassing platform where we have things like A/B tests and feature flags, and those really only come into play when you're a larger organization. If you're in your proverbial bedroom coding up your first site, A/B testing is not that relevant to you. So that's how we're thinking of segmenting it. But the core stuff that we're working on now, the analytics capability, event ingestion, that toolbar, that's all going to be free forever, and that's going to be very much open source.
Tobias Macey
0:42:52
As far as the landscape and the exercise of building a product analytics platform, what are some of the most complex, complicated, or misunderstood aspects that you have encountered?
Tim Glaser
0:43:01
The biggest one is probably usability. Obviously, inserting millions of events a day is an interesting technical challenge, but in the end it tends to be a little bit of optimization and a little bit of just buying bigger servers. The real challenge is building an analytics tool that everyone can use. That's why we're thinking about things like the toolbar, because we want it to be really easy to unlock insights from our product. And doing that, we think, is not going to be easy. I'm personally not a great UX designer, for example. So it's a real concerted effort from everyone to make sure that we build a best-in-class analytics platform that's wonderful to use.
Tobias Macey
0:43:53
Then, as far as your experience of building PostHog, what have you found to be the most interesting, unexpected, or challenging lessons that you've learned in the process?
Tim Glaser
0:44:03
So before this, I'd never built an open source project. Maybe a contribution here and there, but I hadn't really contributed much to open source or worked much with open source. And open source is amazing. I know I'm about 20 years late to the party, but especially if you're building something that's meant for developers, it's great. We basically tell founders that we meet with to try open source, because the quality of feedback is so much better than if you're doing a SaaS application that's stuck behind a paywall. Developers can just pick it up, and developers are really picky people. They will tear apart what you've done, and they'll give some really honest, direct feedback. And it's great, because that's the only way you can make your product better. We get hundreds of bits of feedback, whereas if we had to go out and sell this one on one, we would have gotten maybe tens of bits of feedback, and it wouldn't be very honest, because they would be trying to drive us down on price or whatever it is. So having something that's open source is a great way to build a product, and the open source community is great. We're building this company with that ethos as well. Our handbook is online, in the same way that GitLab's is. We're super transparent about our roadmap, about what we're thinking, about how we work, and we want to keep that up and make sure that we really give back to the open source community. That's been the most amazing thing about building PostHog.
Tobias Macey
0:45:29
And when is PostHog the wrong choice, where somebody would be better served using the Google Analytics of the world or some of the other open source offerings that are maybe more limited in scope?
Tim Glaser
0:45:40
So Google Analytics and, special shout-out, [unclear], which is kind of the open source equivalent of that, those are great options if, like I said, you have a website where you care more about things like sessions and clicks: how long are people spending on my site on average, what is the most popular article on my website, where in the world are my visitors coming from? Those kinds of questions they tend to be a lot better at answering. PostHog could possibly be overkill for these use cases. So yeah, there's a bunch of ways in which those tools will be better.
Tobias Macey
0:46:16
And are there any other aspects of your work on PostHog, or the process of building out a product analytics platform, or its utility, or just anything else about the topic at hand that we didn't discuss that you'd like to cover before we close out the show?
Tim Glaser
0:46:30
I think we covered it. I do want to give possibly another shout-out to doing something like this open source, even if you are planning on eventually making a paid and closed source project. Open source has just been a revelation to us. So that's been the best thing to come out of this.
Tobias Macey
0:46:48
All right. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, or give it a try and contribute, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week I'm going to choose The Hitchhiker's Guide to the Galaxy. It's a great book. I've read it a couple of times and just recently started revisiting it with my family as an audiobook. It's always worth going back to, whether reading it for the first time or reading it again if it's been a little while, so I definitely recommend that if you're looking for something to keep you entertained. And with that, I'll pass it to you, Tim. Do you have any picks this week?
Tim Glaser
0:47:22
Yeah, so one book I recently read is Triumph of the City by Edward Glaeser. No relation, I suppose; the surname is spelled differently as well. It's about 10 years old, and it talks about how the way to a sustainable, poverty-free world is cities. Especially now, with Corona and a lot of chatter on Twitter about people moving to barns in Kansas or whatever, it's a really interesting read, reminding us why people go to cities in the first place, and why they're such a great way of being really environmentally friendly and a great way of lifting people out of poverty. So that was a really good read.
Tobias Macey
0:48:04
All right, well, thank you very much for taking the time today to join me and discuss the work that you're doing on PostHog. It's definitely a very interesting product and one that I intend to start experimenting with and using for my own purposes, and probably at my work as well. So thank you for all your time and effort on that, and I hope you enjoy the rest of your day.
Tim Glaser
0:48:22
Yeah, thanks very much for having me. Have a good day.
Tobias Macey
0:48:26
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management, and visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.