Summary
The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm welcoming back Tarush Aggarwal to talk about what he and his team at 5x data are building to improve the user experience of the modern data stack.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what 5x is and the story behind it?
- We last spoke in March of 2022. What are the notable changes in the 5x business and product?
- What are the notable shifts in the data ecosystem that have influenced your adoption and product direction?
- What trends are you most focused on tracking as you plan the continued evolution of your offerings?
- What are the points of friction that teams run into when trying to build their data platform?
- Can you describe the design of the system that you have built?
- What are the strategies that you rely on to support adaptability and speed of onboarding for new integrations?
- What are some of the types of edge cases that you have to deal with while integrating and operating the platform implementations that you design for your customers?
- What is your process for selection of vendors to support?
- How would you characterize your relationships with the vendors that you rely on?
- For customers who have pre-existing investment in a portion of the data stack, what is your process for engaging with them to understand how best to support their goals?
- What are the most interesting, innovative, or unexpected ways that you have seen 5XData used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on 5XData?
- When is 5X the wrong choice?
- What do you have planned for the future of 5X?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Starburst: ![Starburst Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/UpvN7wDT.png) This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics. Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
- Materialize: ![Materialize](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/NuMEahiy.png) You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing. Go to [materialize.com](https://materialize.com/register/?utm_source=depodcast&utm_medium=paid&utm_campaign=early-access) today and get 2 weeks free!
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. Data lakes are notoriously complex.
For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods so that you can meet all of your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey, and today I'm welcoming back Tarush Aggarwal to talk about what he and his team at 5X Data are building to improve the user experience of the modern data stack. So, Tarush, can you start by introducing yourself for folks who haven't heard any of your past appearances?
[00:01:48] Unknown:
Of course. Hey, Tobias. Good to be on the show again. I think this is my third time. I'm the founder of 5X. My background has been very focused on the data space: I got to be one of the first data engineers at Salesforce, one of the first data engineers in the world, and most recently ran data for WeWork. I've been very, very narrowly focused on what used to be called the modern data stack, or data infrastructure. And, you know, we started 5X about two and a half years ago, and I'm sure we'll get all into that.
[00:02:21] Unknown:
And, again, for folks who haven't heard your past appearances, do you remember how you first got started working in data management, data engineering, and what it is that's held your attention this long?
[00:02:36] Unknown:
Yeah. Absolutely. You know, obviously, we've been through many, many pivots. And when we first started, we were actually building a course or a program for companies wanting to invest in data. You know, post WeWork, I would get a lot of calls from CEOs looking to bring in a head of data, and this was SMBs, midsized enterprise, real estate, banks, SaaS, ecommerce, you know, across the entire spectrum. And everyone was talking about monetizing their data and how data is the new oil. And in reality, all of those conversations went like, hey, I think you're sitting on a gold mine of data, but the reality is you have to go build this data platform, and these are the vendors to speak to. And, you know, it's not rocket science, but it's highly contextual. And there's this thing called data modeling, and BI and self-service, which are really the first steps.
And so out of that, we just figured that many, many businesses are trying to get started and expertise in this space didn't quite exist. So the first version of 5X was really a course: this is what we've learned in the industry over the last 10 years, and this is the playbook on how to go do it. And that failed miserably. We barely had anyone buy it. But, yeah, that was v1. Back then, we were 5X Data. Now it's just 5X.
[00:04:00] Unknown:
In terms of what you're building at 5x, you gave a little bit of the backstory, how it came to be. And the last time we spoke about your work was in March of 2022, so we're at about a year and a half now. Wondering what are some of the notable changes in the overall product focus, what you're building, how you're pitching it, and just the ways that the drastic shifts in the data infrastructure ecosystem over that time have influenced the ways that you're thinking about that problem?
[00:04:31] Unknown:
Yeah. It's a great question. You know, a few years ago, pre-modern data stack, we had everyone using Informatica. And still today, many, many enterprises are on Informatica. And a big advantage of Informatica was that it's an end-to-end platform. You can do everything on it. You can ingest and transform, and you have all of your capabilities inside a single platform, and that was really powerful. Now, the users, you know, data teams, hated the product because it just isn't a very robust product or very easy to use. And on the back of that, we had all of these different vendors go after individual pieces.
So we had the Fivetrans and the Snowflakes and the dbts and the BI tools and, you know, reverse ETL and 15, 20 other categories. And fast forward a few years, we have 500 vendors in 30 different categories. And today, what's happened, especially in the last 18 months, is that blank checks to your data team just because your investors wanted you to invest in data are no longer a thing. CFOs are getting involved. Data teams have been hit pretty hard, and they need to show ROI. And the analogy we use for what happened after that one platform is that all of these vendors today are selling car parts. If you walked into Honda and, instead of selling you a Honda Civic, they sold you an engine, we would have far fewer cars on the street, because building your car and maintaining your car are significant overhead.
And so we see data teams today struggle to go manage multiple different vendors, especially at the enterprise stage. Like, you know, that lunch with Snowflake and with Looker, 10 lunches every month starts to become too much. Right? And everyone's trying to go optimize for their part of the puzzle. So we're entering this world of do more with less, and data teams are getting back to really focusing on ROI. So what 5X is, is how do we give you that Informatica experience, that end-to-end platform experience: having one neck to choke, unified billing, single provisioning, a single place to operate, except it's built in a very modular way. So depending on your industry, your use case, your size, your budget, we can build you a data platform. And as you grow in scale and requirements change and you need new capabilities, we can add on and swap out existing pieces, and you get the entire end-to-end experience.
So, you know, I think when we spoke a year and a half ago, we were in the early stages of that. But today, we're seeing what this looks like more and more at the enterprise scale, and the difference it's making in making data teams hilariously more efficient.
[00:07:43] Unknown:
Another interesting wrench in the works of the ways that people are thinking about data, particularly in the past 6 months to a year, is the frenetic pace of AI and the fact that every company has decided that they need to implement some sort of AI or LLM capability into their product, whether it makes sense or not, and that also adds additional strain and infrastructure requirements around the data platform. And I'm curious how you've seen that influencing the ways teams are thinking about data, talking about data, building around data, and whether the outside view of, oh, everybody's adding AI is realistic or if it's more just that maybe there are a couple of toy projects that are maybe skunkworks, and the core of the business is still just standard, you know, let's get our business and reporting and maybe a couple ML models, and we're focused on just this core capability and AI is somewhere off in the wings, maybe it'll become a thing later on.
[00:08:43] Unknown:
That's a great question, and I think there are a few different ways to look at it. For sure, we are seeing data teams being pressured to have an AI strategy. So it's very real. Data teams bring this up all the time. We're obviously very much in the infancy of AI inside the data world. You know, we have some text-to-SQL. What I'm actually very, very excited about is conversational BI. In order to get there, things like the data modeling layer and the semantic layer become even more important, to essentially give AI context on a business. Because things like your definition of what is MRR, or how many daily active users you have, are extremely nuanced. Right? So we think the semantic layer is really gonna be our best shot at getting into conversational AI.
Having said all of these things, if you just look at the data life cycle in general, I think there are two different aspects to consider. One is what data practitioners and people in the industry really talk about, and what VCs are investing in, and a lot of that is the future state of data. And what I find really interesting is that companies in just the core data stack, right, the Fivetrans, the dbts, the Snowflakes, are just getting into adoption at the enterprise level.
I think Snowflake is a little bit ahead, but all of the other vendors, just the core vendors, are finally starting to be adopted now. So there's a big delay between what practitioners are talking about and what's happening in the industry versus what enterprise adoption really looks like. So with that lens in mind, from the enterprise standpoint, we are quite far away from this. And from the SMB standpoint, we're looking at something in the next, I think, end of next year. Still quite early, but I think 2025 is when AI and data will start to get really interesting.
[00:11:00] Unknown:
And in terms of the modern data stack, that was a term that started getting thrown around, I'd say, probably in the 2020 time frame in particular, and that was also around the time that the venture investment in data infrastructure and data startups was at an all time high. They would throw money at anybody that had the word data somewhere in their pitch, and now that has dramatically shifted where, data infrastructure is, you know, the previous generation of interest. There's still work being done there. There's still successful businesses, but it's not a situation where there is room for everybody with an idea to get funding and run a business, and so we're starting to see a cycle of consolidation in the ecosystem.
And I'm wondering how you're seeing that influence the ways that people think about what actually constitutes the modern data stack, what are the capabilities that are actually necessary and required and which were the ones that were perhaps frivolous or maybe just a feature of a larger product?
[00:12:03] Unknown:
Yeah. You know, I think having 15 different categories was probably overkill, going from an end-to-end ecosystem to a fragmented ecosystem. And we're gonna see a lot of these categories become features, which can be adopted by other categories. So consolidation has to happen. Right? The way the world stands, we're also looking at a lot of data companies today which will struggle in the next 12 to 18 months as their runway dries up. So we are gonna see consolidation.
And I think that's something which 5X, which has always been about the consolidation of the data stack, is actually very excited about, because what starts to happen when you start to consolidate again is that optionality needs to exist. Like, it's very difficult for Snowflake to go acquire one BI tool, because they then get locked into the very specific use cases which that BI tool is good at. And different companies have different use cases. So I think it's very healthy for those 15 categories to potentially come down to 6, 7, 8 core categories. But inside these categories, you're still gonna have a bunch of dominant players. Right? Even if you just look at the data warehousing space, which is very, very mature, we have Snowflake. We have Databricks. We have BigQuery. We still have companies on Redshift.
DuckDB is a company which has recently made huge headlines. To think of a fifth warehouse right now getting funded in a big way in an already pretty mature space is just more confirmation that there are gonna be many, many different use cases. And depending on what cloud you're on and what type of use cases you have, you're gonna have solutions which make more sense than others. So while consolidation helps decrease the overall footprint, even in a world with 8 different categories you're gonna want to have optionality. 5X is the answer to that. And especially when you look at enterprise, it's not data stack, it's data stacks.
You have different sets of subsidiaries, acquisitions, or even different departments using different tools. Right? Like, what do you find in large enterprise companies? It's not, are they Snowflake or Databricks? Very often, they're both. And so, again, how do you take a now extremely disjointed space? What 5X allows you to do is, we have this concept of workspaces, and a workspace can belong to a department or a subsidiary or the core data team. And you're just looking at which vendors make sense for you. You have a single ID experience to operate those vendors without logging into different tools. I can go forecast what my spend on my particular Snowflake instance looks like. You can have shared tools across different workspaces.
Some might have their own. And these things at the enterprise level become really, really difficult to manage. Even digital identity. Right? Access to all of these different tools, the permissions. And, again, how do you do this across subsidiaries, and who has access to what, and how do you audit this from a central team perspective? So this is a whole suite of tools which has never been addressed given the fragmentation of the space. You would have to go do this yourself, and that goes back to the building-your-own-car analogy. At an SMB, great. There are many tutorials on how to go spin up the core 3 or 4 layers in, like, 2 or 3 hours. But this starts to balloon extremely quickly. So for all of these different types of use cases, even inside a consolidated world, there's gonna be a huge need for this, especially at the mid-market and enterprise level.
[00:15:52] Unknown:
The identity and permissioning aspect is something that is very poignant for me because I'm dealing with that in my own data stack efforts. The situation of the modern data stack is: bring all your tools, but then every tool wants to own that permissioning piece, and none of them really want to talk to each other. There's no central control plane. There's no agreement about how that should even manifest. That's definitely a big problem and one that I can see being compounded when you're in an enterprise and you don't even necessarily have consistency across the tooling in the different business units, and then you have to figure out who has permission to what data, where, and why.
[00:16:36] Unknown:
Yeah. Exactly. And, you know, Okta and all of those digital identity solutions are great, but they've worked more in a software engineering context where they're managed by IT teams, which sit separately from engineers, and then it becomes fairly straightforward. But the world of data is way more nuanced. Right? Not all analysts get the same permissions. You have analysts with different permissions, and you have analysts in different departments. So when you add on that other layer of nuance, it moves away from a tool which is primarily managed by your IT department.
It's something where you wanna have a robust solution which can be managed by your data teams or more operational teams. So we think that for digital identity in the data space, we can go support Okta and work with them, but having that pain actually managed in your data platform becomes really essential at the enterprise level.
[00:17:32] Unknown:
In terms of the consolidation, the shifts in the data economy of we actually need to make sure that the money we're putting into this is worth it, and we're not just gonna throw money at it because everybody says we should throw money at it, and someday it'll be useful going back to the, era of Hadoop of just throw data in there, and, eventually, that data will become useful. Now we're starting to narrow down a little bit more where the last cycle was well, we actually only wanna collect the data that we know is useful because, otherwise, we have liability problems. Now it's we only wanna invest in the data infrastructure and the data capabilities that we know are gonna be useful because, otherwise, we're gonna have money problems.
And I'm curious how you're seeing that calculus start to influence the work that data teams are doing, the ways that they think about building their infrastructure, you know, whether to say yes to all of the data requests that are coming their way, or maybe there is a little bit more pushback about, well, why are you asking for this? I'm just wondering how that is influencing the way the data teams are operating.
[00:18:39] Unknown:
Yeah. So I think there are two elements to that. Right? When you bring FinOps into the conversation, what everyone obviously talks about is the cost of data infrastructure. Right? Like, what is the cost of all of these tools? And they add up pretty quickly. But the second piece, which has been spoken about less, is the people in data. Apart from building a very fragmented ecosystem, we've also invented job titles faster than universities could keep up with actually training people in some of these different professions. So there's consolidation on the infrastructure, which we have been talking about, which is happening. But I think we're also entering the rise of the data generalist, and do more with less is a theme which is being universally applied.
And with consolidation, we're seeing more of a need for people working in the data realm to be able to go manage the platform, do ingestion, do modeling, do BI. We're going back to consolidation across roles, and we're gonna see the rise of much leaner teams. And, again, in 2019 and 2020, and I was guilty of this myself for sure when I was running data teams, in some ways the metric between data leaders was the size of your data team: how big it is, what teams you have, what different roles you're bringing in, what all these different use cases are, and these people would come with very specialized skill sets. I think that's reversing. You're gonna see the rise of lean teams, which are just way more efficient, because you're not paying the most expensive tax in people, which is the communication tax as you build larger and larger teams, and they're just able to do more because tooling is finally at a point where it's more mature. And, again, with the consolidation, one person is just able to do a lot, to be way more end to end. So I think we're actually seeing it from both of these perspectives: there have been layoffs, and I think data teams have been hit particularly hard, as well as more pressure on bringing cost down across infrastructure as a whole.
[00:20:47] Unknown:
Particularly as you start talking about enterprise and coordination across teams and across business units, then you start bringing in the conversations around things like data fabric, data mesh, these architectural principles, data as a product. I'm wondering if you've seen that start to come to fruition, where teams are actually building that and realizing the value that is promised by these approaches, or if it's largely been something that is maybe interesting but not as effective, or harder to put into practice, with a lot of confusion. Just wondering how these architectural ideas are also influencing the ways the teams are thinking about building their systems.
[00:21:26] Unknown:
I think, at the risk of being very controversial there, the question is: are these features, are these just processes and ways of thinking about how to go structure your data teams, or are these actually entire categories? And just going into a few of them, we don't have any problem with your next catalog tool or observability tool or mesh or semantic layer. They make sense as features. But the idea of introducing one more tool into the equation, one more place where your team has to log in, one more vendor you deal with, one more platform where you have to go do digital identity? The answer of, hey, we wanna go solve this problem, and we're gonna build a new platform on top of your existing platforms to go do it, that's not the answer. And I think a lot of these will potentially be rolled into bigger platforms in consolidation.
But, again, going back to the enterprise use cases, we get to speak with some of the biggest data consultancies in the world, and a lot of these consultancies obviously do a lot of referrals and a lot of reselling. And speaking to them, 80 to 90% of their referrals are going into just the core 4 layers: ingestion, storage, modeling, reporting. So going back to what I said earlier, for at least some massive enterprise companies which are implementing these sorts of solutions, I think as we get into this year and next year and the year after that, some of those renewal conversations are gonna be extremely difficult conversations for some of these platforms to have, because of the hype cycle of, great, you have a huge mess.
And this one tool is gonna come and solve everything. Unfortunately, having a layer on top of everything else is not the answer to that.
[00:23:17] Unknown:
You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up to date. With Materialize, you can. It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface.
Go to dataengineeringpodcast.com/materialize today to get 2 weeks free. And talking to maybe the individual team scope and building the infrastructure components, I'm curious what are the main points of friction or the most difficult decisions that they have to make as far as how to implement their data platform?
[00:24:15] Unknown:
Yeah. You know, I think what's happening now is we're seeing companies who have a platform, who have built everything, actually coming to us to go consolidate. Right? Being like, this is becoming really difficult to manage. In reality, we just want one neck to choke. So can you actually go take over all of these different vendors and really go consolidate this? Right? And so we're doing this exercise across mid market, across enterprise. And what we're seeing is, you know, the first question really is, are you using the right vendors? Because the reality of it is this is our full time job. Right? Our entire team is tracking the space. This is what we do on a daily basis, and we can barely keep up with what's happening. So it's hard to think that a data team at an enterprise company is necessarily going to make the best decisions on what vendors to use. And number 2 is, even after what vendors to use, are you using them in the right way? You know, how they're being set up. Like, building your entire modeling layer inside LookML probably isn't the best thing as you think about now reusing a lot of that modeling layer and pushing it into data science, or using reverse ETL. Again, as some of these tools start to overlap, something which we see in enterprise companies all the time is that a portion of the jobs are built inside Fivetran, inside its dbt integration, and some are built traditionally on dbt. Some are built inside Looker's semantic layer. It creates this huge mess all over the place. So the idea of having very consistent workflows that you can do inside a single UI is, again, really simplifying.
It's preventing a big part of that mess in the first place. And we think the solution to this is not so much gonna be a layer on top, but it's really how do you set things up in a way which gives you an end-to-end experience so you don't have a mess in the first place. So a lot of these data teams have spent all this money building this platform, and now they're starting to question the ROI of it. You know, unfortunately, we are gonna have many of them going back to the drawing board and being like, which of these decisions were the right decisions? And, actually, I think the bigger question coming up is, are we even the right people to make a decision on what tool makes sense for us? So I think the direction which we're heading is going back to the basics of what is
[00:26:31] Unknown:
fluff and what is, you know, all of these solutions which actually promised us they can fix it. People are waking up and realizing that a layer on top isn't the answer. We have to kinda go back to the drawing board and figure out how do we do this correctly. Absolutely. And now digging into what you're building at 5X, the ways that you are thinking about how to unify that experience, what are the appropriate tools and vendors, and how to integrate them. I'm wondering if you can talk through some of the learnings that you've gone through as you're building out your current iteration of the platform and some of the choices that you've made and how you've approached that tool and vendor selection.
[00:27:07] Unknown:
Yeah. You know, in terms of vendor selection, we are more like the Apple of the space, you know, instead of the Android. A year and a half ago, we were looking at opening up our ecosystem such that any vendor can kinda go integrate with us. But, really, what we've decided to do is go partner with 15 to 20 core vendors. We work extremely closely with them. We integrate with them at a deep level. We're integrated into the sales process. We have access to APIs which are not public APIs and really provide a much deeper experience. And, you know, we think of the world in terms of capabilities. Right? Like, what are the capabilities which your data team needs? Like, you know, we think reverse ETL is a capability.
We think catalog is a capability. BI is a capability. So when we look at the conversations in your use case and capabilities, we're not just looking at it at an individual vendor level, but at what set of vendors work really well together. And we are doubling down on existing vendors, so we're taking opinions on which ones we think work really well. We have optionality inside categories. So, you know, we don't pick one vendor across the category. But, depending on the needs of our customers, we try to have an open source offering, and we try to have the commercial, non open source products. And, you know, we think about what are the different types of use cases and, for those use cases, what would be the best vendor? And we're coming up with these golden paths. Right? Like, this set of vendors works really well together. And increasingly, more people are paying us for our opinion on, based on my use cases, what works really well together, instead of, we just hired someone who came from x y z vendor, and we're gonna go use x y z vendor, and, you know, I have no idea about the layer before this and the layer after this, and we'll just figure it out. From the perspective of that integration
[00:28:59] Unknown:
and unifying the experience, what are the core elements of making sure that there is a cohesive platform feel to these disparate tools and the engineering work that you and your team have had to do to be able to give that a more contiguous flow? Yeah. So, you know, we have this concept of an IDE. You know, we call it the super IDE
[00:29:21] Unknown:
or also the unified IDE, where you can go operate all of these different vendors inside a single product. So, you know, you have an IDE on top of your warehouse. You can go edit your dbt jobs. You can ingest data. So each product becomes an app inside our experience. And some of them get embedded. You know, we're able to work with the vendors on removing parts of their product inside the embedded experience, so you see a more focused embedded experience. There are some categories, like the core SQL IDE, which we'll go build ourselves. And the idea is, if you're using Fivetran or you're using, hypothetically, Airbyte, the apps will be slightly different because they don't map apples to apples to each other. So the pipeline has got a concept of jobs and Airflow might have a different concept. So the individual apps available inside the IDE are slightly different from each other, but you have a single place to go do this. And why this is really powerful is that data teams can now go figure out what are the ideal workflows. Going back to that example of, are you doing some of your transformations inside Fivetran and then some inside dbt and then some inside LookML?
You know, when you're logging into 5 different tools, it becomes really hard to go police this, because vendors aren't giving you feature flags that let you turn this feature off. Like, why would they do that? They want more adoption. They want more engagement across each of their products. Whereas inside 5X, you know, you could still log in to your vendors. But, again, the goal is, can we give this to you as a single IDE experience? So whatever path your data team decides on, we can make those apps available which go power those workflows. So this leads to a highly consistent experience because, you know, 80% of where someone is logging in on a daily basis and doing their work becomes part of this golden workflow of how things happen. So, from our perspective, initially, you know, we were a platform to go provision.
We would handle the procurement, billing, legal, all of those things. We had the single digital identity layer so you could go manage your users. We had utilization to go look at your spend forecast and, you know, more tools for optimizing spend. We had the security, so audit logs from all these tools going into a single place, which is great for your CISO and your compliance teams. So, you know, we always had those fundamentals. I think a big shift for us has really come in focusing on this unified IDE experience, such that the data teams on a daily basis can go do all of this from one tool. And along with just making their life easier because they're not logged into 5 different tools, it promotes a lot of hygiene in terms of best practices, which can be more standardized.
And, you know, you can have more guardrails up as to, this is how we wanna do things, as opposed to a free-for-all where anyone goes to any vendor, just logs in, and does it that way. And then from that unified IDE perspective,
[00:32:32] Unknown:
there are a couple of interesting elements to that. One is that engineers are very opinionated about the tools that they want to use for doing the work that they do, and I'm wondering what are some of the ways that you make that IDE experience customizable so that they can feel at home doing the work in that context? And, also, maybe some of the ways that you're thinking about how do we extend that experience
[00:32:58] Unknown:
into the tools that people are already using? Yeah. That's a great question. I think, you know, what we're gonna see is a lot more apps on the 5X IDE. Right? And apps are experiences for how you want to basically go operate them. So, you know, we are, again, going and building some of the core experiences. We see that in larger businesses, inside enterprise, there's a little bit less flexibility. I think at the SMB stage there's a lot of flexibility. People are free to kinda use their own tools, and they're just way faster to move and adapt. And, you know, we see circle and we see Hex, and different people are kind of using different things, and that's all fine. Right? It's all very manageable. You can speak to 5 different people and you know it. Again, at enterprise, this becomes very different. Right? So there is a certain level of flexibility and customization. And, you know, there's a lot more stuff which we've planned in terms of the IDE experience to go make it more flexible. But I think what we're really focusing on now is what are some of these core use cases which teams are really focused on, and how do we provide a really solid unified way to go do this. So that's really what the focus is now, but we're gonna see a lot more apps, a lot more experiences, to go modify this in a way that makes sense for you while still having some guardrails, which the company wants to have in order to have a consistent experience.
[00:34:23] Unknown:
And from that consistent experience perspective, what is the ideal flow that users will experience when they say, I have this data problem I need to resolve? Either I need to onboard this data or I need to build this report or I need to ensure that these transformations are running. I'm curious what are the different stages of that development flow, some of the ways that you're thinking about how to manage versioning and change management, the auditability streams that you're integrating into that experience for managers or administrators, and just some of the key touch points in that overall experience and the ways that you're thinking about building this into a cohesive product? Yeah. You know, it's a great question. I think we don't want to go create
[00:35:11] Unknown:
the underlying tooling. Right? Like, the reason we go partner with everyone in this space is we think there are very robust solutions out there which do a phenomenal job in what they're doing. So we don't necessarily wanna go reinvent the wheel. Right? So for a lot of the versioning, a lot of the branches, we think dbt does a great job in a lot of these different layers. And, again, our IDE supports dbt natively. We have our own version of dbt Core, which we've deployed more for smaller customers. But we continue to be partners with dbt at the enterprise level, and we integrate into that. So the whole premise is that the underlying components are powered by a solution which makes sense for your business. Then, you know, in the future, it could be Coalesce. It could be Matillion. It could be whatever tool you want underneath to basically go power the experience. We wanna expose the functionality of that tool and the feature set of that tool to make this really robust for you. And as you are trying to simplify the experience of the end users, you're papering over some of the sharp edges of the different tools and platforms. I'm curious what are some of the edge cases that you've had to engineer around or some of the ways that you need to think about building in escape hatches for the case where somebody really needs to reach deep into the guts of the system that you are trying to present this nice interface to, and some of the engineering work involved there? You can always log in directly into the tools. We never take that away. We're not OEMing any of this. We don't have 5X branding on every different piece.
In fact, we go out of our way to make sure you can see which tools are actually going and powering this. And admins can enable or disable the login button directly from 5X. So you could go to 5X. You could go over a tool, and you could push a button which says login. And you can go directly inside the tool and go do whatever you need to. We're not trying to take away any of that functionality. There are always gonna be edge cases: new features which a vendor launches which might not be integrated into the IDE instantly, or parts of the tool which the IDE currently doesn't support.
You know, we're looking at a world where you will be able to go embed some of these experiences. So if you wanna go do something on Snowpark, or you wanna go run Spark jobs, or you wanna go run Python jobs, you can go embed that page into the IDE so that it's an easy way for people to go operate that. So, you know, our goal is not to take that away. We aren't OEMing any of this, so you have full functionality. What we think about is that, for your day-to-day job, you should be able to do the majority of this inside a single ecosystem. The other aspect of building this unified
[00:38:01] Unknown:
experience is that you want to give people this nice easy flow, but you're also working with companies who probably have already made investments into their data platform that maybe they don't want to get rid of or maybe there's going to be a long deprecation path. I'm curious what that overall integration and migration process looks like where they say, we've already built a bunch of stuff, but we also wanna be able to have this unified experience.
[00:38:25] Unknown:
How do you help to bridge that gap? Yeah. That's a great question. So the 5X platform doesn't care if you already have a vendor or you want us to go manage your vendor relationship and go buy that vendor from us. You can either buy vendors from us and simplify billing, or you can go import your own vendors and have your own vendor relationships. So it doesn't make a difference. We have a bunch of hybrid customers where, you know, they might have a billing relationship with a few vendors, and we introduce a few others. And that works great too. What we're seeing more is that we are very deliberate on the partners we choose to work with. We want extremely widely adopted partners in our ecosystem, because, again, we realized we're not gonna go integrate with 500 different vendors. We're gonna have some vendors which we think are extremely widely adopted, vendors that are gonna stand the test of time, and we bring them on. So what we see is that a mismatch is rare, unless they're not using a warehouse and they're doing stuff on S3, and they have Spark jobs on top of it. And then, you know, it's a completely different paradigm. In general, people are in the warehouse world, and we support all 4 of the big warehouses. We support Snowflake, BigQuery, and Redshift, and next year integrating with Databricks is a big focus for us. So, yeah, we support all of the big players out of the box. So very often when companies want to move to us, for the most part, we're able to support most of the vendors which they are already working with. That's one piece. But, again, I think what we're seeing more, and what we're gonna see a lot more next year, is companies coming to us and being like, this is our data platform.
Can you actually go consolidate all of this, even the vendor relationships? And, you know, we want to have one neck to choke. So you handle all of that, and we can spend 100% of our time focused on actually delivering data without having to manage the vendor relationships.
[00:40:21] Unknown:
In terms of those vendor relationships and the onboarding work, the integration work, you mentioned that you decided fairly early on that it wasn't just going to be an open ecosystem. Anybody can come in and be part of this experience. What are your criteria for deciding which tools, which vendors will be incorporated into that platform, the work that you have to do to be able to integrate and expose that vendor and hook them into
[00:40:50] Unknown:
the overall experience. And I'm particularly interested in how that factors in for those vendors that don't fit cleanly into one category or another, and there's overlap between them? Yeah. That's a great question. We evaluate vendors on, like, 4 different criteria. Obviously, number 1 is the technology. We know what they're solving, what their product is. Number 2 is their road map. So where are they going in the future? What's becoming more and more relevant for them? Number 3 is the partnership. Like, do we have alignment? Do we have a deep partnership at the sales level, at the customer support level, at the product level? And number 4 is just, there are some categories which we haven't entered. Right? Like, we think they still might be relevant, but we don't have a really good opinion on where that category is going and how we look at it. So the fourth one is how do they fit into the general macroeconomic climate. We use these criteria. Again, we wanna partner with some vendors which we think are on the rise. And we wanna build deep integrations into these vendors, so we can go provide the best experience for our customers on 5X.
And I think for vendors who are across multiple different categories, you know, we are starting to see that. Right? For example, we are starting to partner heavily with RudderStack, and they do the CDP stuff, but they also have a reverse ETL offering. So I think all of this is on a case-by-case basis. We're partnering with an end-to-end platform called Peak.ai, which is, you know, an end-to-end experience across data science. And we mainly focus on the data engineering and analyst personas.
So our partnership with Peak is more that a lot of the customers who wanna do data science need a data engineering persona, and a lot of our customers who have infrastructure from 5X ask about a data science platform. So it's, you know, a little bit disjoint. But, again, we're only starting to see multiple different categories overlap. For example, a workflow manager is something which is consistent across both. Right? Both data science use cases and data engineering use cases need a workflow manager.
So as we really sort of now get into going deeper into all of this, you know, it's not gonna be
[00:43:25] Unknown:
as clean. It sort of never is. There's gonna be overlap, and I think our product and engineering teams have started to think about these things. In your work of building the 5X data platform, working with your customers, working with vendors, what are some of the most interesting or innovative or unexpected ways that you've seen your product used?
[00:43:46] Unknown:
So what we very intentionally decided to do a few months ago is double down on our consultancy. We've always had a small consultancy to go help customers. Again, as we look at the enterprise landscape, there's a lot of fragmentation there too, because a particular vendor is gonna go sell you their product, and when you ask for help on it, they're gonna introduce you to an SI or a consultancy. And if we really want to make sure that people are doing things correctly, that means being able to actually offer services and help out with some of these implementations or, as needed, bring in the expertise.
We just think it's part of helping our customers go get value from data. So, you know, we're partnering with a bunch of our vendors. Again, a deep partnership where we're also becoming SI providers and can go do the implementations. And so we have a subset of customers where we're doing end-to-end data as a service. We give you the platform, and they're using us to go build the reporting layer on top of it. And we recently did something with this restaurant chain. You know, they have 50 to 100 different locations.
They are inside the QSR category. For lack of expertise and not having data people in there, they were used to the analytics they got from Uber Eats and Restaurant365 and Postmates and all these different things. And, you know, there are entire marketing agencies which just go focus on that category, and they do what they do, and obviously it's a big business. And what we were able to do with them in just a few months is that the level of data and the insights we could get from the analytics perspective was something which they'd never seen before. So much so that the marketing agency, which works with 1700 different restaurants, was just completely shocked and blown away by it. And they now wanna go do this across all of these different verticals.
I think we're getting exposed to use cases and company types which are really interesting, which are very large, but which haven't had the appetite or the conviction or the expertise to actually go make those investments in data. Because as a data industry, we haven't made it easy to go make those investments in data. Besides the fragmentation, we weren't even giving the support and implementation needed to actually go get value from your product. So we're seeing a lot more of these use cases, where industries which previously haven't entered the ecosystem are now able to be completely disrupted, because what we're able to do is something which, for lack of better words, no one else is able to do.
[00:46:57] Unknown:
And another interesting aspect of what you're building is that because you have this unified experience, the incremental cost of adoption for new tools or new capabilities is much lower for teams than it would be if they had to go out and do that evaluation process and do the integration process themselves, where it turns from, oh, I wanna use dbt or some other tool or vendor, so I just click a button, log in, start working with it, versus, oh, now I have to spend 6 months going through that whole process. And then maybe if you are in a larger organization, you also have to do some selection of source paperwork, get the funding, etcetera, etcetera. I'm wondering how that influences the ways that teams are approaching that process of saying, oh, I wanna add this new capability. I wanna start doing this new thing.
[00:47:46] Unknown:
Yeah. You know, it's a good question, and I wanna answer it this way: once you get used to buying a car, it becomes very difficult to go buy car parts ever again. So once customers see how easy it is to go onboard a vendor and have our expertise on making some of these decisions and helping out with implementation, they have a much lower barrier to, hey, what about the next tool? Because it's not as daunting of a process every single time. So we have companies which start small, and they add capabilities as they need them with time. And they're able to, you know, I like to use the words, do it in a hilariously more efficient manner, as opposed to the way companies do this today. And, again, for us, what's very important is being able to look at this very holistically. Right? Not just from the vendor standpoint and the decision making standpoint, but also the implementation and making sure they're getting value from it. How do we provide customers with the most amazing experience to basically go do this?
And, you know, we're playing a long game over here. Right? Like, we wanna make sure customers are getting value from data because, the way we look at the world, we can go sell all of these different tools, but if customers are not getting value from data, at some point it's gonna come back. Right? Like, either those tools are gonna get axed, or the data team is gonna get axed. And we're not looking to go make a short term buck by going and adding a new tool over there. We wanna make sure that we're looking at this very holistically, and we're just playing a longer game, and we wanna create the best experience for our customers and data teams.
[00:49:34] Unknown:
And in your experience of building this product, building the company, working with customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:49:47] Unknown:
You know, people 2 years ago called us crazy because we weren't building a new category. We weren't building what VCs call a product, where this is a category, this is how you go sell, you sell your product, and this is your ACV. And we are something completely different from how the entire industry works. And everyone called this idea of going and consolidating the stack by building a layer on top of it all sorts of names and all sorts of craziness. And I think the last few months have been really exciting because a lot of people are really seeing the value in this now. So, yeah, it was quite challenging to go get our first integrations. It was challenging to explain to data teams, when they had unlimited funding, why they don't want to go and build their own platform and manage this forever.
And, you know, it's been really reassuring, and I don't even think that's the right word, but we're just very grateful for people to see the value in what we're doing now. So, yeah, it just validates a lot of the decisions we made to stay on our path back then. And, you know, we see a new wave of people really excited to show up to work every day.
[00:51:15] Unknown:
And for people who are in the process of tool evaluation, maybe they're doing their own integration work across different vendors in the modern data ecosystem, what are the cases where 5X is the wrong choice?
[00:51:28] Unknown:
Great question. 5X is not for you if you are not using a warehouse or don't plan to use a data warehouse. We are built on a data warehouse first approach. Apart from that, I think we're relevant across the entire life cycle of your data team. And what I mean by that is, for SMB companies, we have a program where we waive the cost of the 5X platform. You can still take advantage of provisioning vendors from us at list price or even cheaper, and then use 5X to go operate all your vendors. So in some ways, it's a no-brainer if you're building a platform from scratch today to build it on 5X. For mid market and enterprise companies, the other advanced tools in our suite, as we've been discussing on this podcast, become extremely relevant.
So, you know, again, it's a no-brainer to use us to make your data teams more efficient. Overall, I think if you are using a warehouse first approach, using 5X to really simplify the management and operation of your platform is something worth doing.
[00:52:36] Unknown:
And as you continue to build and iterate on your product and business, what do you have planned for the near to medium term or any particular projects or problem areas you're excited to explore?
[00:52:48] Unknown:
I think, you know, we're just focused on the basics. Right? Like, getting our daily active users to go operate inside our platform. A big focus for us is enterprise and enterprise readiness, everything you need to basically go after that segment. Apart from Databricks next year, we're not really looking to expand our vendor footprint drastically. We wanna double down on our existing partnerships and build the best experience all the way from SI work to integration into the sales teams, to integration into their product and engineering. And at a macro level, we're super focused on profitability as a business. So, you know, we're playing a long game, and I came from WeWork. I came from a company which raised a lot of money. And, you know, one of my lessons from over there is you wanna go figure out what business you have sooner rather than later. So we wanna be around for a long time. We're playing a long game, and we wanna get the business in a position in which we can go do that.
[00:54:04] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:54:21] Unknown:
I think, again, you know, we spoke about a lot of those things. If you look at AWS, AWS is a collection of 50 different services, but they give you a really cohesive experience to go manage it. Right? A single place to provision, single digital identity, billing, migration. They give you cost optimization, and they give you role-based access control. All of these different things have just been hilariously missing inside the data ecosystem. So, again, the glue which actually connects all of this together, you know, I still continue to think that that's one of the biggest missing pieces in the space.
[00:54:54] Unknown:
Well, thank you very much for taking the time today to join me and share the work that you're doing at 5X. It's definitely a very interesting problem area that you're trying to address, and an interesting product that you're building around it. So I appreciate all the time and energy that you and your team are putting into making the modern data ecosystem a more tractable and approachable problem space. So thank you for the work you're doing there and for your time, and I hope you enjoy the rest of your day.

Thank you so much for having me. Hopefully, we've added some value to your listeners today.

Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning.
Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Messages
Guest Introduction: Tarush Aggarwal
Tarush's Journey in Data Management
Building 5X: Early Challenges and Pivots
Impact of AI on Data Teams
Modern Data Stack and Market Consolidation
Identity and Permissioning in Data Platforms
ROI and Efficiency in Data Infrastructure
Architectural Principles: Data Fabric and Data Mesh
Consolidating Data Platforms
Unified IDE Experience
Development Flow and Change Management
Integration and Migration Process
Interesting Use Cases and Customer Success Stories
Incremental Cost of Adoption
Challenges and Lessons Learned
Future Plans and Final Thoughts