The foundation of every ML model is the data that it is trained on. In many cases you will be working with tabular or unstructured information, but there is a growing trend toward networked, or graph data sets. Benedek Rozemberczki has focused his research and career around graph machine learning applications. In this episode he discusses the common sources of networked data, the challenges of working with graph data in machine learning projects, and describes the libraries that he has created to help him in his work. If you are dealing with connected data then this interview will provide a wealth of context and resources to improve your projects.
Hightouch is the leading Reverse ETL platform. Your data warehouse is your source of truth for customer data. Hightouch syncs this data to the tools that your business teams rely on. Hightouch has a catalog of flexible destinations including Salesforce, HubSpot, Zendesk, NetSuite, and ad platforms like Facebook or Google. Hightouch is built for data engineers and is a natural extension to the modern data stack with out-of-the-box integrations with your favorite tools like dbt, Fivetran, Airflow, Slack, PagerDuty, and DataDog.
It’s simple — connect your data warehouse, paste a SQL query, and use our visual mapper to specify how data should appear in downstream tools. No scripts, just SQL. Get started for free at pythonpodcast.com/hightouch
Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $100 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch.
- Your host as usual is Tobias Macey and today I’m interviewing Benedek Rozemberczki about his work on machine learning for graph data, including a variety of libraries to support his efforts
- How did you get introduced to Python?
- Can you start by giving an overview of when you might want to do machine learning on networked/graph data?
- How do networked data sets change the way that you approach machine learning tasks?
- Can you describe the current state of the ecosystem for machine learning on graphs?
- You have created a number of libraries to address different aspects of machine learning on graphs. Can you list them and share some of the stories behind their creation?
- How do the different tools relate to each other?
- Can you talk through some of the structural and user experience design principles that you lean on when building these libraries?
- When you are working with networked data sets, what is your current workflow from idea to completion?
- What are the most difficult aspects of working with networked data sets for machine learning applications?
- What are the most interesting, innovative, or unexpected ways that you have seen graph ML used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on graph ML problems?
- What are some examples of when you would choose not to use some or all of your own libraries?
- What do you have planned for the future of your libraries/what new libraries do you anticipate needing to build?
Keep In Touch
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org) with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Karate Club
- PyTorch Geometric Temporal
- University of Edinburgh
- Bipartite Graph
- Node Classification
- Graph Classification
- PyTorch Geometric
- DGL (Deep Graph Library)
- Parametric Machine Learning
- Little Ball of Fur
- GCN == Graph Convolutional Network
- Nvidia cuGraph
- Random Walk
- Graph Representation Learning by William Hamilton