Science is founded on the collection and analysis of data. For disciplines that rely on data about the earth, the ability to simulate and generate that data has been growing faster than the tools for analyzing it can keep up. To help scale that capacity for everyone working in the geosciences, the Pangeo project compiled a reference stack that combines powerful tools into an out-of-the-box solution so that researchers can be productive in short order. In this episode Ryan Abernathey and Joe Hamman explain what the Pangeo project really is, how they have integrated a combination of Xarray, Dask, and Jupyter to power these analytical workflows, and how it has helped to accelerate research on multidimensional geospatial datasets.
Does everyone in your team ask you which database table they should use? Or if you can help them with their SQL query? If so, check out Select Star! It’s an automated data discovery portal that can save you hours of time every week.
By analyzing your metadata, query logs, and dashboard activity, Select Star automatically documents your datasets. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using the data in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up and easy for both engineering and operations teams to use.
With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets.
Try it out for free at pythonpodcast.com/selectstar. If you’re a Podcast.__init__ subscriber, we’ll double the length of your free trial and send you a swag package when you continue on a paid plan.
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
- Your host as usual is Tobias Macey and today I’m interviewing Ryan Abernathey and Joe Hamman about Pangeo, a community platform for Big Data geoscience.
- How did you get introduced to Python?
- Can you describe what Pangeo is and the story behind it?
- What is your role in the project/community and how did you get involved?
- What are the goals of the project and community?
- What are the areas of effort and how are they organized?
- What are the scientific domains that Pangeo is focused on supporting?
- What are the primary challenges associated with data management and analysis in these scientific communities?
- What are the forms that these data take and how have they been evolving? (e.g. formats/sources)
- What are some of the challenges introduced by the widespread adoption of cloud resources and the associated architectural patterns?
- Can you describe the technical components that fall under the Pangeo umbrella?
- How do they come together to form a functional workflow for the geosciences?
- How has the scope of the Pangeo project changed or evolved since it started?
- What are the most interesting, innovative, or unexpected ways that you have seen Pangeo used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pangeo?
- When is Pangeo the wrong choice?
- What do you have planned for the future of Pangeo?
Keep In Touch
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Pangeo Forge
- Columbia University
- CF Metadata Conventions
- Data Engineering Podcast