From Simple Script To Beautiful Web Application With Streamlit - Episode 238

Summary

Building well designed and easy to use web applications requires a significant amount of knowledge and experience across a range of domains. This can act as an impediment to engineers who primarily work in so-called back-end technologies such as machine learning and systems administration. In this episode Adrien Treuille describes how the Streamlit framework empowers anyone who is comfortable writing Python scripts to create beautiful applications to share their work and make it accessible to their colleagues and customers. If you have ever struggled with hacking together a simple web application to make a useful script self-service then give this episode a listen and then go experiment with how Streamlit can level up your work.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? Check out Linode at linode.com/podcastinit or use the code podcastinit2020 and get a $20 credit to try out their fast and reliable Linux virtual servers. They’ve got lightning fast networking and SSD servers with plenty of power and storage to run whatever you want to experiment on.


What happens when your expanding log & event data threatens to topple your Elasticsearch strategy? Whether you’re running your own ELK Stack or leveraging an Elasticsearch-based service, unexpected costs and data retention limits quickly mount.  Now try CHAOSSEARCH.  Run your entire logging infrastructure on your AWS S3.  Never move your data. Fully managed service.  Half the cost of Elasticsearch. Check out this short video overview of CHAOSSEARCH today!  Forget Elasticsearch! Try CHAOSSEARCH – search analytics on your AWS S3.



Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Having all of your logs and event data in one place makes your life easier when something breaks, unless that something is your Elasticsearch cluster because it’s storing too much data. CHAOSSEARCH frees you from having to worry about data retention, unexpected failures, and expanding operating costs. They give you a fully managed service to search and analyze all of your logs in S3, entirely under your control, all for half the cost of running your own Elasticsearch cluster or using a hosted platform. Try it out for yourself at pythonpodcast.com/chaossearch and don’t forget to thank them for supporting the show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey, and today I’m interviewing Adrien Treuille about Streamlit, an open source app framework built for machine learning and data science teams.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by explaining what Streamlit is and its origin story?
  • What are some of the types of applications that are commonly built by data teams and who are the typical consumers of those projects?
  • What are some of the challenges or complications that are unique to this problem space?
  • What are some of the complications or challenges that you have faced to integrate Streamlit with so many different machine learning frameworks?
  • Can you describe the technical implementation of Streamlit and how it has evolved since you began working on it?
    • How did you approach the design of the API and development workflow to tailor it for the needs and capabilities of machine learning engineers?
    • If you were to start the project from scratch today what would you do differently?
  • What is a typical workflow for someone working on a machine learning application and how does Streamlit fit in?
    • What are some of the types of tools or processes that it replaces?
  • What are some of the most interesting or unexpected ways that you have seen Streamlit used?
  • What have you found to be some of the most challenging or unexpected aspects of building and evolving Streamlit?
  • How do you see Python evolving in light of Streamlit and other work in the machine learning space?
  • What do you have in store for the future of Streamlit or any adjacent products and services?
  • How are you approaching the governance and sustainability of the Streamlit open source project?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

Transcript
Tobias Macey
0:00:12
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you'll get everything you need to scale up. For your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. They also have a new object storage service to make storing data for your apps even easier. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute, and don't forget to thank them for their continued support of this show. Having all of your logs and event data in one place makes your life easier when something breaks, unless that something is your Elasticsearch cluster because it's storing too much data. CHAOSSEARCH frees you from having to worry about data retention, unexpected failures, and expanding operating costs. They give you a fully managed service to search and analyze all of your logs from S3, entirely under your control, all for half the cost of running your own Elasticsearch cluster or using a hosted platform. Try it out for yourself at pythonpodcast.com/chaossearch, and don't forget to thank them for supporting the show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the Data Orchestration Summit and Data Council in New York City.
Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host, as usual, is Tobias Macey, and today I'm interviewing Adrien Treuille about Streamlit, an open source app framework built for machine learning and data science teams. So Adrien, can you start by introducing yourself?
Adrien Treuille
0:02:23
Hi, yeah. So I'm Adrien. I started Streamlit about a year ago with some friends of mine, and we launched it five weeks ago. It's been really exciting. Before that, I was a professor at Carnegie Mellon University, I ran a pretty big AI project at Google X, and I was vice president of simulation at Zoox, which is a unicorn self-driving car startup.
Tobias Macey
0:02:47
And do you remember how you first got introduced to Python?
Adrien Treuille
0:02:50
Yes, I do. I remember very clearly. So the very first time I heard of Python, I was in my first year of computer science grad school. It was 2001, and, you know, I thought Java was so cool because it was garbage collected and stuff, and it was so much better than C++. Anyway, I'm in this class, and someone mentioned this language called Python, and none of us had heard of it. We asked, what's Python? And I remember this grad student said, really derisively, he was like, oh, it's this weird language where white space is important. And, you know, he obviously thought that was really just the stupidest idea in the world. And I guess I sort of agreed with him for no reason. But then the next summer, I went to IBM in Cambridge, and a friend of mine who was super smart, this amazing Cornell undergrad, said, oh, you should check out this Python thing, you know, I'm doing all my work in it. And so I started playing with it and I fell in love, and Python really is the language that carried me through a PhD and then a professorship and then into the corporate world. So I feel like I've been there, not from the start, but from early days, and NumPy, PyPI, all those things, I learned about them and loved them along the way.
Tobias Macey
0:04:11
And at this point, you have decided to start a business and a project based on Python, using it for your core product. And so I'm wondering if you can talk a bit about what the Streamlit product is, and some of the origin story and your inspiration for creating it.
Adrien Treuille
0:04:27
Yeah. So what Streamlit is, is an app framework, really, for the whole Python language. But we started with ML engineers and data scientists, because that's really what our background was, and we had a lot of experience in this kind of numerical computing. One of the things that I noticed, basically at Carnegie Mellon, at Zoox, and at Google, is that if you're in a Python dev team, you spend a lot of time building web tools. So in the self-driving car project, we had internal web tools to, you know, search the entire image data set, a huge number of images. We would run models in real time on the images, we would run simulations, we'd run comparisons between multiple simulations, and we had a scenario search engine. And so these were really the tools that were kind of the lifeblood of the project, and they really kept everyone aligned together. They also, you know, extended into the ops group, the people who were driving cars around and needed to know what their schedules were, and to the executives. And the observation was, it was really difficult to build these tools. Either they were super ad hoc, you know, they were like Jupyter notebooks, and so they weren't really usable by the group. Or if they became important, we would call in the tools team, which was a group of engineers who really had a specialty in React, Vue, server architecture and stuff, and they would sort of bless a tool and then make it this really beautiful polished thing, which was actually amazing. But then they'd say, well, we'll get back to you in two months, because now we're working on the next tool. And so the ML engineers were really, in many ways, disintermediated from the rest of the company by this barrier. So that was really a lot of the original thinking behind Streamlit.
And I think once we released it, I guess five weeks ago now, it was really encouraging to see the community immediately responding and saying, yes, this is super real, and it's awesome that this exists now. And so that's been very validating.
Tobias Macey
0:06:44
And I like what you were saying, too, about the fact that these applications were valuable to people outside of the data team specifically, and to people who just needed to be able to have a view of what the scheduling was for a particular thing. It touches on some of the issues that we have more broadly in software engineering, where the capabilities of what we can build with software are valuable to virtually everyone, but because of the high barrier to entry that we've created, with needing to know so many different levels of the stack to build something effective, it prevents a lot of people from even exploring that space and wanting to build their own tools. And so it's nice to see that Streamlit is another entry point to provide that capability to anyone who has some facility with programming but doesn't necessarily want to know and understand the entire space of building a beautifully designed web application.
Adrien Treuille
0:07:38
Yeah, I think that's exactly the spirit in which we designed the software. You know, it's really true that the data scientists and ML engineers, and it also applies to data engineers and, you know, DevOps, are often doing this hard and amazing work, and yet because of the way these different tech stacks are built, they're sort of disintermediated from their customers. And so Streamlit is very simple in some ways, but it's kind of like a little superpower that turns any Python program into an interactive app that then can, you know, really project the programmer's power throughout the organization. And we're seeing a lot of uses outside of just the ML group as well. So that's really exciting.
Tobias Macey
0:08:23
So for somebody who is building an app with Streamlit, what are some of the types of widgets that you support and the types of applications that you've seen built commonly by data teams, and some of the typical consumers of those applications?
Adrien Treuille
0:08:39
Well, first of all, I have to give a shout out to Jupyter, because they really led the way for having this amazing widget support in the Python community. And so now a lot of great JavaScript libraries, like deck.gl, which is Uber's amazing geographic visualization library, have Python bindings, and the reason is basically because of Jupyter. Streamlit is a different use case from Jupyter. We actually use both side by side. Jupyter is really for interactive exploration and disseminating ideas, and it has many use cases, actually. Streamlit is really for app building. But it turns out that we were really rapidly able to assimilate almost all of the major visualization libraries into Streamlit. So, you know, deck.gl, Matplotlib, Plotly, Seaborn, let's see, I'm missing a whole bunch, Altair, which is an amazing library. And then we have a bunch of basically the standard widgets, so, you know, various kinds of inputs, sliders, text input, data inputs, those kinds of things. Those are sort of the basic atoms in the periodic table of Streamlit. And then the real innovation is in the ability to mix and match those almost instantly, without having to define a complex declarative web layout with divs and spans and all these HTML and CSS things, and just really write it as an ordinary Python script. So, you know, that's allowed us to see a whole bunch of applications. I'm happy to share with you some of the ones that we've seen if you're interested.
Tobias Macey
0:10:23
Yeah, we can dig a bit more into some of the interesting examples later. In the meantime, I'd be interested to dig into some of the challenges and complications that you have run into that you feel are unique to this problem space of building an easy to use application framework for people who don't necessarily have a lot of front end experience or the time and inclination to dig deep into that area.
Adrien Treuille
0:10:49
That, to me, is kind of the central challenge of all of Streamlit, and it's really the animating question that drove us to build it. I think the key thing is, how could we make building web tools as easy as writing Python scripts? The basic idea here is that, in a very logical sense, a web tool, or, you know, a web app on a phone, is really described as this declarative set of widgets, which are then wired together, sort of reactively, in order to create a UI experience of some kind. In Streamlit, our starting point is actually a Python script, so something that executes from top to bottom. And what we wanted to do is let you weave GUI code into that logic without actually subverting or inverting that logic at all, and come out with an app that you can use. It's perhaps a slightly subversive thing to do, but I think it's a very, very Pythonesque thing to do, actually. You know, we always strove to make it easy, and to have one way to do things, or whatever that Python code is. And the response has been great, like extraordinary. I mean, tens of thousands of apps have been created in our five-week lifespan. And one of the really amazing things has just been to see dozens of tweets coming out per day of people saying, look at this app I made in 50 lines of code or 70 lines of code, or, I just turned my Python script into an app that's deployed on Heroku, you know, overnight, the Streamlit hype is real. And that's just been so cool and exciting, and it really completely exceeded our expectations by a large, large margin. So yeah, I feel like people resonated with this approach of looking at app development from the perch of a Python programmer.
Tobias Macey
0:12:46
And the fact that you originally conceived this as targeting machine learning engineers and data scientists is exhibited by the fact that you have some strong integration with a number of different machine learning libraries. I'm curious what you have seen as far as any challenges of being able to cleanly represent those bindings, given the fact that there are so many different libraries that you're working with that might have conflicting views of what a typical workflow might be, or what the necessary bindings are for being able to wire it up to a front end component, for instance.
Adrien Treuille
0:13:21
Yeah, that's a great question. So indeed, our first class citizens include a lot of the basic things that you'll come across in machine learning and data science. So data frames, NumPy arrays, we do a lot of stuff with PyTorch, TensorFlow. And those were really our guiding examples as we were building Streamlit, too. So we feel, you know, most comfortable endorsing it for use in those use cases. And to your question of what was hard and easy: on the one hand, it was really easy, and on the other hand, there were some tricky things. The easy part just comes from starting with Python itself. I mean, you know, Python has become, like, basically the glue language of all of these different ideas. And so people come to us and they say, well, is Streamlit compatible with Spark? Is it compatible with TensorFlow? And we're like, it's compatible with anything that Python's compatible with, because it's pure Python. And that's just an amazing superpower. I mean, other app frameworks that we see coming out in the startup space, for example, are, you know, SaaS platforms, and then every single integration they have to write themselves and wrap in the language. We just present pure, unadulterated Python to our users and let them do whatever they want, which is just so exciting.
Tobias Macey
0:14:39
So digging deeper into Streamlit itself, can you describe a bit about how it's implemented and some of the ways that it's evolved since you first began implementing and iterating on it?
Adrien Treuille
0:14:49
So in terms of how Streamlit is implemented, the basic idea is that instead of saying "python" and then your Python file, you say "streamlit run" and then your Python file. What that does is, we take your Python file, we import all of the imported libraries, and then we run it in what we call a script runner. First of all, the script runner allows your script to connect live to a web browser via a WebSocket connection, so you can transfer information back and forth to the web browser using only Python calls, and all of the details of this are completely hidden from the user. The other thing it allows us to do is rerun your script really efficiently in case anything changes in the web browser. So it gives you kind of an interactive view into a static Python script, which is really the Streamlit magic. And that was the core idea from the start. I mean, Streamlit actually was a solo programming project and not a company in the early days, and that was the core idea that we were working on. What happened was, really early on, a bunch of engineers, first from Uber and then Stitch Fix and some other great companies, started using it and gave us feedback. And so, in a sense, since we had real users, even though there were only two or three in the beginning, it was sort of a crowdsourced development process. We regularly met with them, and they showed us what they were doing, and they told us what they wanted. And really, most of the best ideas in Streamlit came from our users. And so now that we've launched, it's actually just grown exponentially.
So at this point, tens of thousands of people have used Streamlit, just in the first five weeks. And, you know, I think one of the things that's very much a concern for us in the company is, how can we keep this cadence of listening to the community, and keep this development approach as we scale it up, so that people feel like their voices are being heard, and so that information from multiple sources can be assimilated scalably? So it was sort of crowdsourced from the start, and now we're struggling to keep up with the community and scale our processes.
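The rerun idea described above, where the whole script executes again from the top whenever something changes in the browser, can be illustrated with a toy model. This is only a sketch of the concept and not Streamlit's implementation: ToySession, its slider and write methods, and the run helper are all hypothetical names invented for this example, and there is no browser or WebSocket involved.

```python
# Toy model of "rerun the whole script when a widget changes".
# None of these names are Streamlit APIs; they are hypothetical stand-ins.


class ToySession:
    """Holds widget values between runs and collects output for one run."""

    def __init__(self):
        self.widget_values = {}  # label -> latest value set by the "browser"
        self.output = []         # elements emitted during the current run

    def slider(self, label, default):
        # Return the browser's value if the user has moved the slider,
        # otherwise the default written in the script.
        return self.widget_values.get(label, default)

    def write(self, text):
        self.output.append(text)


def script(session):
    """An ordinary top-to-bottom script with one widget woven in."""
    x = session.slider("x", default=3)
    session.write(f"x = {x}")
    session.write(f"x squared = {x * x}")


def run(session, script_fn):
    """Execute the script from the top and return what it rendered."""
    session.output = []
    script_fn(session)
    return list(session.output)


session = ToySession()
first_render = run(session, script)

# Simulate the user dragging the slider in the browser, then rerun.
session.widget_values["x"] = 5
second_render = run(session, script)
```

The script itself never registers a callback. It reads the widget value like any other variable, and the runner produces a fresh render each time by executing it again.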
Tobias Macey
0:17:18
One of the challenges of building a tool like this is identifying what the user facing API should look like to make sure that it is approachable and usable, but at the same time, sufficiently expressive for people to be able to build the types of applications that they want to without having to dig too deep into the guts of it.
Adrien Treuille
0:17:37
Yeah, that's something that we've really approached with, I guess I would say, a huge amount of care and actually a lot of work. We've literally used every competing app framework that we could find on the web, and we wrote detailed notes on how they all worked and what we liked and what we didn't like. And then we worked really, really closely with the community on things like the ordering of arguments to functions, and also just making things work right the first time, as you would expect, which is, of course, the hardest thing to do. It's really easy to say, oh, well, you know, that's user error, or if they want to have it work this other way, then we'll just add another 16 arguments to this function and let them, you know, configure everything. But the cost of that is really paid in the complexity of the API. We recognize that our users' brain cycles are really extremely valuable. And so we worked really hard, and for that matter broke compatibility a few times along the way, in order to get the API right. And that's still an ongoing process. I think that Streamlit is really in its MVP phase right now, and just having this flood of people come in and tell us this didn't work and that didn't work has really sharpened our picture of where Streamlit needs more attention and what parts were working. And so, you know, our roadmap over the next six to eight months is really, really clear, in terms of features that we'd like to bring out and ways that we want to empower the community to build, you know, richer, faster, more beautiful apps quickly. And I guess I would say to the community of users, first of all, thank you for telling us what's broken and what's wrong. It's so valuable. And also, please help us. We've started seeing pull requests trickle in, and users teaching one another tricks about Streamlit that we didn't even realize.
And that's just been so, so cool. And it's necessary to keep the community growing. So yeah.
Tobias Macey
0:20:00
The choice of Python seems fairly natural, given the initial community that you were targeting, because of the fact that it's so widely used in the data space, both for data engineers and for data scientists. But from the perspective of the project itself and the way that you have engineered it, if you were to start it over today, knowing what you do now, what is it that you think you would do differently, either in terms of the overall system design or in the early efforts of building and promoting it?
Adrien Treuille
0:20:29
Yeah, I think technically Streamlit is language agnostic, actually, in the sense that the underlying data layer that intercedes between the browser portion and the server portion is actually written in Google protobuf. So it's sort of a language agnostic layer. That said, you know, we are Pythonistas, that's our background, and it was certainly informed by our experiences as Python programmers. The notion of opening it up to other languages, I think, makes sense and is very exciting, and I think the Streamlit model totally works in other languages. But we are really committed to Python right now. And also, it's just such a great language. It is really great to, you know, be able to write just a small library, in some sense, and then have it be superpowered by the insane reach and compatibility of Python, which is almost unmatched in the language world. In terms of what we would have done differently, I think if you look at early Streamlit, it looks totally different than it does now. So in a sense, I think we have taken the opportunity along the way of changing things when we were wrong, basically. And so I'm actually sort of happy with where it is now, but that's because we really rewrote it along the way a few times. And, you know, we rewrote even the APIs when we realized they were confusing and stuff, so we weren't afraid of that. Now, of course, the big problem is that we don't want to do that to our user community, because Streamlit is really being used in production now. So I think that we are going to have to be more careful as we add new features, and exercise additional judiciousness, because I think it's an important value to maintain backwards compatibility, or at least be very cognizant of the cost of breaking changes.
Tobias Macey
0:22:53
And for somebody who is building on top of Streamlit, can you discuss in a bit more detail the overall workflow, what's involved in designing their script to be compatible with what Streamlit is expecting, and some of the model as far as how you would go about deploying it for use by other people?
Adrien Treuille
0:23:11
So I think the most important point is that we don't expect the user to, like, clean-slate write a Streamlit app. Many of our use cases are people who already have existing Python scripts for training a model or for running a model on some data set. And so what we try to do is let the user instrument their script graphically. So for example, anytime you have a variable in Python, so you could say x equals three, in Streamlit you can simply remove that three and say x equals st.slider. And now that x is a slider that can be changed, and all of the downstream computation will be executed properly as a result of that. And that doesn't apply just to numbers. You can have all kinds of different inputs, and you can actually even get into various kinds of control flow with buttons and checkboxes and stuff. And in a funny way, because the flow of a Streamlit app follows the logical flow of a Python program, the UI follows that flow also. And so a very funny and sort of mysterious aspect of Streamlit, I'd sort of love to write a blog post about it if I can manage to crystallize this idea, is how Streamlit GUIs tend to be logical from the user's perspective without a huge amount of effort or design required. Now, of course, you can also just clean-slate a Streamlit app. We do that, for example, all the time. All of our dashboards, for example, for looking at downloads and GitHub stars and all the telemetry on Streamlit itself, we have written ourselves. That's a first-class Streamlit app that we wrote that lets us look at and understand these things. So you can really do it both ways. And certainly, you know, intentionally writing a Streamlit app makes it possible to also think a little bit about how to make things run quickly. We have some caching technology that allows you to save computation and reuse it across runs.
So there's lots of neat stuff you can do there. Then, the final part of the story arc is deployment. And right now, we don't have a solution to that. We're actually working on a solution, which we're calling Streamlit for Teams, and that's something that is designed to be a sort of enterprise version of Streamlit. But at the same time, the community has now written, you know, probably one or two dozen articles about how to deploy Streamlit on EC2, on Heroku. And so that's great too, and we really encourage that. We've been reading those articles carefully, so thank you guys for writing them. And we want to make sure that there are also great open source solutions for deploying Streamlit.
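The "x equals three becomes x equals st.slider" step mentioned in this answer might look roughly like the following in a script. This is a minimal sketch, not code from the episode: the scaled helper, the widget labels, and the value ranges are made up for illustration, and the guarded import exists only so the helper can be exercised on a machine without Streamlit installed.

```python
try:
    import streamlit as st  # third-party: pip install streamlit
except ImportError:         # lets the helper below be tested without Streamlit
    st = None


def scaled(values, factor):
    """Downstream computation, unchanged from the original script."""
    return [v * factor for v in values]


def main():
    st.title("Scaling demo")
    # Before instrumenting the script this line was simply: factor = 3
    factor = st.slider("factor", min_value=1, max_value=10, value=3)
    # Ordinary Python control flow decides what shows up in the UI.
    if st.checkbox("Show result", value=True):
        st.write(scaled([1, 2, 3], factor))


if st is not None:
    main()
```

Saved under a hypothetical name such as app.py, the script would be started with "streamlit run app.py" rather than "python app.py". Moving the slider reruns the script, and the list written to the page follows the new factor.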
Tobias Macey
0:26:16
And one of the things that I'm working to understand is this: you're building on top of scripts that have a logical progression of running from start to finish, whereas in my experience of building web apps, they're generally run in some sort of daemonized process. I'm curious how that affects the way that you run the Streamlit application and how you make sure that it's always available for user input. And then also, what are some of the challenges in making these Streamlit applications multi-user capable, where you might have more than one person interacting with it at a given time?
Adrien Treuille
0:26:58
So that is really the tech of Streamlit: solving exactly that challenge. And so the actual coding that we do, not the API design and all that stuff, the API is just designed to be very simple. There are actually very few function calls in Streamlit, probably a couple dozen. But underneath the hood, a great deal is happening to enable exactly what you're describing. And so that's one of the reasons why we don't just run your script with python, we run it with streamlit run: Streamlit creates a server, a multi-threaded server. Every time your script is run, it runs in a separate thread that's isolated from all the others, and we preempt threads when events come in. We have our own sort of queuing system on top of the WebSocket layer, which allows events to go from Python to the web browser and back, and then we do a great deal of caching and deduping to make everything fast. So for example, if the user changes an input, like in that example before where x = 3 becomes a slider, we recognize that the graphical elements above that point in your script haven't changed. It looks very much like React in some ways. We have hashing, caching, and deduping happening at almost every layer of Streamlit to make your app as performant as possible. And that is a little bit of deep magic. And at times it doesn't quite do what the user wants, or it requires a little bit of sophistication to understand why this might be fast or slow. And so that's actually something that we're thinking a lot about right now. And I think we have some really neat features coming out in the next few months which are going to simplify it even more and have it work right the first time, every time.
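The hashing and caching mentioned in this answer can be illustrated with a small memoization sketch. It assumes a much simpler scheme than whatever Streamlit does internally: the decorator name `cached`, the `_cache` dictionary, and the example function are all hypothetical, and the key is just a hash of the function name and its pickled arguments.

```python
import functools
import hashlib
import pickle

_cache = {}     # hypothetical store that survives across script reruns
call_count = 0  # counts how often the expensive function really runs


def cached(func):
    """Reuse a previous result when the hashed inputs are unchanged."""
    @functools.wraps(func)
    def wrapper(*args):
        raw = pickle.dumps((func.__name__, args))
        key = hashlib.sha256(raw).hexdigest()
        if key not in _cache:
            _cache[key] = func(*args)
        return _cache[key]
    return wrapper


@cached
def expensive_squares(n):
    """Stand-in for slow work such as loading data or running a model."""
    global call_count
    call_count += 1
    return [i * i for i in range(n)]


run_one = expensive_squares(4)    # computed
run_two = expensive_squares(4)    # same inputs: served from the cache
run_three = expensive_squares(5)  # new input: computed again
```

Since the script reruns from the top on every interaction, a cache like this is what keeps unchanged, expensive steps from being repeated.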
Tobias Macey
0:28:46
Yeah, particularly in the slider case, I can imagine that there's a fair bit of difficulty in terms of debouncing the signal, so that if somebody is playing with the slider a bunch and they haven't really settled on what they want the actual input to be, you're not just constantly sending those signals back and forth to the app and causing it to be preempted so many times.
Adrien Treuille
0:29:04
Yeah, yeah, that's a great point. And I also didn't address the multi-user aspect of what you were saying. But indeed, we have isolated, we call them sessions, so that two users don't see one another, even though they're running in the same Python process. So there's a lot of really interesting stuff going on under the hood. And if I do say so, I think some really wonderful engineers, most of us from great companies, came and worked on Streamlit and contributed and helped form the company. And we love the code too, and it's very well tested and commented and stuff. And so if someone's really interested in how this all works, we would love to answer questions on the forums and encourage you to actually read the code. It's all on GitHub. It's all open source, permissive license, so people are welcome to poke around and see how we're doing this. But yeah, you're totally right, there are questions of not flooding the event queue, because that would not be a good thing.
0:30:09
That would break the illusion.
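The two ideas in this exchange, per-user sessions and not flooding the event queue, can be sketched together. This is a toy under stated assumptions, not Streamlit's design: the `Session` class, its `enqueue` and `flush` methods, and the user names are hypothetical, and the coalescing rule (keep only the newest value per widget) is just one simple way to avoid a rerun per slider tick.

```python
class Session:
    """Per-user state plus a queue that keeps only the newest value per widget."""

    def __init__(self):
        self.state = {}    # widget values this user's script runs will see
        self.pending = {}  # events received but not yet applied

    def enqueue(self, widget, value):
        # A newer event for the same widget replaces the older one,
        # so a fast-moving slider does not pile up stale values.
        self.pending[widget] = value

    def flush(self):
        """Apply pending events; return how many reruns were needed."""
        if not self.pending:
            return 0
        self.state.update(self.pending)
        self.pending.clear()
        return 1


# Two users share one Python process but have separate sessions.
sessions = {"alice": Session(), "bob": Session()}

for value in (1, 2, 3, 4, 5):  # alice drags the slider quickly
    sessions["alice"].enqueue("x", value)

reruns = sessions["alice"].flush()
```

Five slider events collapse into a single rerun with the final value, and the second user's state is untouched.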
Tobias Macey
0:30:11
And then in terms of more advanced uses: for somebody who's writing a simple script and wants a simple application, it's fairly straightforward how they would go about that. But I'm wondering, maybe for the case of you with your dashboards for all these different metrics that you're using to measure the success of Streamlit, whether you have any capacity for composing multiple apps that somebody has built with Streamlit into a single overall experience.
Adrien Treuille
0:30:40
Yeah, totally. And in fact, there's a really cool app that someone created called Awesome Streamlit. I think it's awesome-streamlit.org. And it's sort of a meta app, in the sense that other people can commit apps to it, and then you can run them and see different examples of little code snippets and how they execute in Streamlit. That's really cool. So that's an example of that. And in fact, the creator of Awesome Streamlit, Marc, has been on our forums and has really been pushing the limits of how you do multi-page apps in Streamlit, and applications with lots and lots of files. And so we've actually been learning a lot from his experience and improving Streamlit. So it is absolutely possible to create those kinds of complex apps. And moreover, as we gain more experience in building them, we are adding even more sugar to make it a really, really fun experience.
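One common way to compose several small apps into one, in the spirit of the meta app described above, is a dispatch table from page names to functions. The sketch below is an assumption about the general pattern, not how Awesome Streamlit is built: the page functions, `PAGES`, and `render` are hypothetical, and in a real app the selected name would come from an input widget rather than a function argument.

```python
def downloads_page():
    """Hypothetical page: would draw the downloads dashboard."""
    return "Downloads dashboard"


def stars_page():
    """Hypothetical page: would draw the GitHub stars dashboard."""
    return "GitHub stars dashboard"


# Each entry could live in its own file and be imported here.
PAGES = {
    "Downloads": downloads_page,
    "GitHub stars": stars_page,
}


def render(selected):
    """Run only the page the user picked."""
    page = PAGES[selected]
    return page()


shown = render("GitHub stars")
```

Adding a page is then a matter of writing one more function and registering it in the table.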
Tobias Macey
0:31:37
For people who are first coming to Streamlit, I'm curious what types of feedback you've seen as far as the tools or processes that they had been using previously that they've been able to replace with Streamlit, and some of the other systems or frameworks that you consider to be in the same space that you can either use collaboratively or that Streamlit might replace or supplant?
Adrien Treuille
0:32:05
Okay, I'm sorry. Can you take up the first part of the question? I apologize.
Tobias Macey
0:32:10
Sure. I'm just curious what you have seen in terms of feedback from people who are coming new to Streamlit who had existing workflows or processes: what types of technologies or workflows are they replacing with Streamlit?
Adrien Treuille
0:32:28
Yeah, so I think that there are a bunch of adjacent technologies that overlap one another in the same way that a Jupyter notebook and an Excel spreadsheet overlap: you could do the same things in both, but they also have distinct centers of gravity. And similarly, you could do interactive data exploration in Streamlit, but I would probably recommend Jupyter for that. You could also write an app in Jupyter, but we think Streamlit is a better experience for that kind of thing. So there are all these kinds of overlapping things. But in my experience, the thing that we actually had in mind the most was Flask. We really saw a lot of ML engineers especially saying, you know, I just trained this model on this data set, and now I've created a Flask endpoint that you can go to and type in all kinds of URL parameters, and then I'm going to barf out a bunch of HTML that tells you about whatever it is you're interested in. And that's the tool that the whole team uses, and they all think it's amazing, right? And actually, Flask is an amazing framework, and we're thinking of using it in Streamlit, but in that use case there was a sort of major impedance mismatch. And so we were thinking about how you interleave neural net and other machine learning code into an interactive app in those kinds of use cases. In terms of other adjacent technologies, Plotly Dash is really cool. It's much more customizable visually than Streamlit and has a different kind of event model than we do. And then if you're coming from the R world, Shiny is really cool. And let's see, I think there's Panel. There's Voilà from Jupyter. And there are a bunch of other things.
So I think there's a growing sort of agreement that there is really a use case here that's been underappreciated. But I think Streamlit also has a sort of unique place in that firmament.
Tobias Macey
0:34:35
In terms of some of the uses of Streamlit, you mentioned that there have been a number of interesting or innovative ways that people have leveraged it. And I'm curious what some of the most notable are that you think are worth calling out, or some of the lessons that you've learned as a result of seeing the ways that people have been building with Streamlit that you didn't necessarily think were possible or plausible?
Adrien Treuille
0:34:58
Yeah, I mean, as I mentioned, we really started in our own minds with internal tooling for ML and DS teams. And we've just seen this sort of explosion of cool apps being posted. And in fact, a better answer might be just to go to Twitter and search for Streamlit and see what people are putting up there. We've seen people build explainer demos to help show off their models, you know, a cool NLP model or something. We've seen people show off just their GitHub repos: here's a useful repo to do XYZ, and oh, if you want to see how it works, just run this Streamlit app and all of a sudden you'll be able to play really easily with my code. And let's see, we've seen people create dashboards for marketing teams. That's been really interesting for us to see. For example, we're working with one company where the researchers are building a recommendation engine for the sales team, and doing it in Streamlit allows them to directly create this app for the sales team, without being intermediated by any kind of other app-building team. And so that means that not only is time to market much faster, but the iteration cycle on making changes to the app is really much shorter. We've seen tools for operations teams to view data as it comes off self-driving cars, annotation tools. Somebody created an app which lets you see all of the speakers at NeurIPS, the AI conference. We've seen demos of people's AI research. So yeah, there's a lot of cool stuff out there.
Tobias Macey
0:36:34
And in your own experience of building and evolving Streamlit, what have you found to be some of the most challenging or unexpected aspects of the technical implementation or lessons that you've learned in the process?
Adrien Treuille
0:36:47
Well, I mean, man, we understand Python way better than we ever thought we were going to. Actually, to your earlier question about the different ML frameworks, probably the most challenging thing has been TensorFlow and PyTorch, because TensorFlow and PyTorch, each in their own way, are doing some deep, deep magic in Python. And they are kind of subverting the language for their own meaning. So typically in TensorFlow, a variable x isn't really a variable in the Python sense; it's really a pointer to this graph of computation, which will be executed at a later time. And that has been tricky to weave into Streamlit because, in a sense, like those libraries, we also do some very deep magic and sort of subvert some of the naive assumptions about how a Python program might work. So that's been a really interesting challenge. And I think one of the goals has been to not just brute-force solve those problems, but solve them elegantly. Again, the idea was always that you wouldn't write a Streamlit app with the idea of writing an app; you would write it with the idea that you'd already done something else and you wanted to make it interactive in some way. And we wanted to make that just a super fast process. So that's been really fun. And of course, you know, Python 2 and 3: we still have a bunch of people who use Python 2. And that's a very intricate thing, to do sort of complex Python programming as a library builder, because you have to support both simultaneously.
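The remark that a TensorFlow variable is really a pointer into a graph of computation that runs later can be made concrete with a toy deferred-evaluation class. This is a sketch of the general idea only, not how TensorFlow or PyTorch are implemented: `Node`, `constant`, and `run` are hypothetical names.

```python
class Node:
    """A step in a computation graph; nothing is computed until run()."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def run(self):
        """Walk the graph and evaluate it now."""
        if self.op == "const":
            return self.value
        left, right = (node.run() for node in self.inputs)
        return left + right if self.op == "add" else left * right


def constant(value):
    """Wrap a plain Python number as a leaf of the graph."""
    return Node("const", value=value)


x = constant(3)
y = x * x + constant(1)  # builds a graph; y is a Node, not a number
result = y.run()         # the arithmetic happens only here
```

Here `x` looks like an ordinary variable in the script, but what it holds is a description of work to be done, which is the kind of mismatch with plain Python semantics that the answer describes.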
Tobias Macey
0:38:26
In light of your work on digging into some of the guts of Python and making it do things that are potentially unexpected or out of the bounds of normalcy, and your experience of seeing the same being done in things like PyTorch and TensorFlow, I'm curious what you see as being some of the future evolutionary paths of the language and the overall ecosystem for people who are working with Streamlit or in the machine learning space or just broadly within Python.
Adrien Treuille
0:38:55
We've really been trying to create really simple APIs that often do complicated things, and at times we find ourselves running up against the limits of the language itself, I must say just the syntactic limits of the language. Sometimes it's difficult to express some things in Python. And so as a very simple example, there's not really a notion of blocks. The closest things are function decorators, or if statements, or with statements. And the intersection, the sort of Venn diagram of those three constructs, leaves a lot of room actually for interesting things that you might want to do naturally. And so we think really, really hard about how to create syntactic constructs that will make sense in the Python programmer's mind and that also allow us to do these interesting things that we want. And so as Streamlit has evolved, we've really been thinking about what are some ways, you know, maybe like multi-line anonymous functions or sort of more generalized syntactic structures, also various preprocessing things on a Python script that we could do, to make the language a very natural and beautiful fit for these use cases. Of course, while trying to maintain that just incredible simplicity and understandability that's at the core of Python. So that's a big challenge. And I think it's one that Python has historically been exceptional at. And so I think we want to tread very delicately there. But certainly, we're running up against some interesting logical, operational, and also syntactic limitations of the language, and so we're excited to see how those things evolve over time and hopefully play a role.
Tobias Macey
0:40:57
And for Streamlit, what do you have in store for the future of the project and any of the adjacent products and services that you're looking to build to tie into its ecosystem?
Adrien Treuille
0:41:09
Yeah, so exciting. So first of all, if you are a Streamlit user: a lot of people have had problems with caching, which is one of our big features, and we're working on a lot of improvements there. So we've heard you, and we're doing some cool stuff. Then we have a bunch of features that we think are really going to expand the range of possible Streamlit apps. So we're working on horizontal layout and other kinds of layout primitives, and also on better handling for app state, which will make Streamlit more useful for annotation problems, for example. There's a bunch of really cool features that the community has asked for that are sort of smaller, but we really get why they're important to people, and we're trying to make sure that we save enough space in our cycles to develop those and get those out. And also, I'd encourage people who are excited about adding this or that feature to Streamlit to reach out; we really would love community help. And then the two big ones that are coming are a plugin system, which will basically allow people to write their own React components and then wire those into Streamlit sort of seamlessly, and auto-deploy, Streamlit for Teams, which is the enterprise solution for Streamlit and also something that we hope to figure out a way to give for free to the community as well. So that's something that we're working on a lot, both technically and also from a business perspective. We want to keep the lights on, but we also want to get Streamlit out to as many people as possible. So that's kind of the roadmap right now. And each one of those, I think, in its own way adds a new dimension to Streamlit that I'm just so excited about. So right now it's a square, but the next feature will make it a cube, and the next feature will make it a tesseract, and so on.
And I think that's something I'm just super, super looking forward to. So I think we're just at the beginning of the journey, and I can't wait to keep going.
Tobias Macey
0:43:21
And in terms of the governance and sustainability of the project, I'm curious how you are approaching that, given that the actual code is open source, but you're also trying to build and maintain a business around it, and just how you see those dividing lines and how you're trying to make sure that you're keeping the needs of the community at the forefront.
Adrien Treuille
0:43:43
Yeah, as I mentioned, crowdsourced input from the community has been the driving force of Streamlit's evolution from the start, even when we were in closed beta. And we recognize now, with the huge amount of interest and the rapid growth in the community, that scaling our community involvement is going to be a major challenge. And in many ways, I think that we are the bottleneck right now. The number of questions and GitHub issues coming in is larger than our ability to handle it, quite frankly. So we are really interested in how we can have the community play a role in understanding the issues that people are having, helping one another, triaging bugs, and actually contributing code. So our attitude towards this is that we are really looking forward to working with the community, building a structure of community governance, and having the community have a sense of ownership over the future of the product. And that's something that we are thinking a lot about right now. And so in that spirit, I really would encourage people who have thoughts about great community governance models in other open source projects to reach out, write something down in the forums, and let us know and let the other community members react to it. We are really paying attention, and our goal is to share ownership and share Streamlit with the world. That's ultimately the goal here. Before this was a company, this was just a cool project. And I really think it would be so much fun to get as many people involved as possible.
Tobias Macey
0:45:46
Are there any other aspects of the Streamlit project itself or the ways that it's being used or your goals for it that we didn't discuss yet that you'd like to cover before we close out the show?
Adrien Treuille
0:45:57
No, no, I think we, yeah, I think we got a lot.
Tobias Macey
0:46:00
For anybody who wants to follow up with you or get in touch and keep up to date with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose The Book of Why by Judea Pearl, which I've been reading recently. I'm still only partway into it, but so far it's been quite interesting. It discusses his views on some of the issues around how to systematically represent causation and not just correlation, and some of the ways that that's important in our current age of computing and artificial intelligence. And he's actually one of the Turing Award winners, so definitely somebody who has a lot of thoughts and context on the matter. So I definitely recommend checking that out. And with that, I'll pass it to you, Adrien. Do you have any picks this week?
Adrien Treuille
0:46:47
Yeah. I think one of the most touching books that I've read recently is called No Self, No Problem by Anam Thubten. There are a couple of books with that name, believe it or not; Anam Thubten is the Tibetan Buddhist monk who wrote a book with that name. And the thing about this book that's so special is that it's not at all religious in some ways. It certainly doesn't ask you to believe anything. And yet it's written with this sort of exquisite precision about the world through the eyes of a so-called enlightened being. And it just so unapologetically and beautifully states that there's this way of looking at the world that's accessible to everyone. And when you read it, I just feel like, of course, that's true. And I was really just touched to the bottom of my heart.
Tobias Macey
0:47:57
Well, thank you very much for taking the time today to join me and discuss your experiences building the Streamlit application. It's definitely a very interesting tool and one that I am excited to start playing around with. So thank you for all of your efforts on that front, and I hope you enjoy the rest of your day.
Adrien Treuille
0:48:13
Yeah, thank you. That would be great. Thank you.
Tobias Macey
0:48:19
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.