Exploratory Data Analysis Made Easy At The Command Line

00:00:00
/
00:52:50

September 23rd, 2019

52 mins 50 secs

Your Hosts

About this Episode

Summary

There are countless tools and libraries in Python for data scientists to perform powerful analyses, but they often have a setup cost that acts as a barrier to ad-hoc exploration of data. Visidata is a command line application that eliminates the friction involved with starting the discovery process. In this episode Saul Pwanson explains his motivation for creating it, why a terminal environment is a useful place for this work, and how you can use Visidata for your own work. If you have ever avoided looking at a data set because you couldn’t be bothered with the boilerplate for a Jupyter notebook, then Visidata is the perfect addition to your toolbox.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host as usual is Tobias Macey and today I’m interviewing Saul Pwanson about Visidata, a terminal oriented interactive multitool for tabular data

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Visidata is and how the project got started?
    • What are the main use cases for Visidata?
    • What are some tools that it has replaced in your workflow?
  • Can you talk through a typical workflow for data exploration and analysis with Visidata?
  • One of the capabilities that you mention on the website is quickly opening large files. What are some strategies that you have used to enable performant access for files that might crash a typical editor (e.g. Vim, Emacs)?
  • Can you describe how Visidata is implemented and how it has evolved since you started working on it (including the upcoming 2.0 release)?
    • What libraries or language features have proven most useful?
  • Why did you choose to implement Visidata as a terminal only tool and what constraints does that bring with it?
    • What are some of the most challenging aspects of building a terminal UI for data exploration and analysis?
    • Because of its manifestation as a terminal/CLI application it relies heavily on keyboard bindings. How do you approach key assignments to ensure a consistent and intuitive user experience?
  • What are some of the types of analysis that Visidata can be used for out of the box?
  • What are some of the most interesting/unexpected/innovative ways that you have seen Visidata used?
  • How much community adoption have you seen and how do you approach project governance as a solo developer?
  • What do you have planned for the future of Visidata?

Keep In Touch

Picks

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA