YouTube Scraping, XPath, and Google Sheets

APIs can give you much more power but they are often overkill for what people are trying to around here- lightweight social media

Here’s a lightweight example of how you can use Google Sheets and the IMPORTXML function to grab quite a bit of data from various video pages with no API or technical skills.

Straight off, we’re going to want the URL of the video. We’ll put that in column A and we’ll use it as a variable in all our other formulas.

Getting the Paths to the Data

=IMPORTXML(A2,”(//*[contains(@class, ‘watch-title’)])[1]”)

So how’d that come to be? A2 is just asking what URL we want to go to. The XPATH stuff gets a little more interesting. It’s going to look for any class that is named watch-title. I found out the title was in that div by right clicking on the title and choosing inspect in Chrome. The appended [1] will only give us the first item that meets those qualifications. Otherwise the title shows up twice.

The rest of the formulas are pretty much variations on that theme.

  • =IMPORTXML(A2,”//*[contains(@class, ‘watch-view-count’)]”) – View count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-like-button’)])[1]”) – Likes count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-dislike-button’)])[1]”) – Dislikes count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘yt-user-info’)])[1]”) – User name

Throw in a video of your own if you’d like.

13 thoughts on “YouTube Scraping, XPath, and Google Sheets

  1. Just keeping in mind the traded off when Google changes their layout your xpath stuff can go south. I did this a few years for my cv attribution script and it broke each time Flickr changed their layout. But it’s worth knowing some xpath jujitsu, you can use it in php

    1. I get it but same deal really when they change an API or make some new rules.

      I did some xpath with javascript for something else last week. I find it amusing more than anything else.

      1. A *good* API should not break its past. That’s the reason I love Flickr. All the features they took off the site still work via the API. Twitter is another story

        1. Valid point 🙂

          I was thinking of Instagram which changed its whole authentication requirements recently (and instituted a whole approval deal). I am not a fan.

          1. Oh yeah instagram’s changes with that stupid sandbox are insane. Why does one have to jump through those hoops for public methods. Grrrr

  2. Hey – thanks for posting these! However I get a Formula Parse Error when trying exactly what you have done. Any advice as to what might be going wrong?

    1. It looks like a couple of things happened. One to always look out for when cutting/pasting into Google Sheets (or any programming language) is whether your quote marks/apostrophes ended up being “curly” or plain. For stuff like this you want plain. Secondly, it looked like some extra stuff got in the formula below the view line (some kind of line break or something). It does appear to be working now.

Comments are closed.