YouTube Scraping, XPath, and Google Sheets

APIs can give you much more power but they are often overkill for what people are trying to around here- lightweight social media

Here’s a lightweight example of how you can use Google Sheets and the IMPORTXML function to grab quite a bit of data from various video pages with no API or technical skills.

Straight off, we’re going to want the URL of the video. We’ll put that in column A and we’ll use it as a variable in all our other formulas.

Getting the Paths to the Data

=IMPORTXML(A2,”(//*[contains(@class, ‘watch-title’)])[1]”)

So how’d that come to be? A2 is just asking what URL we want to go to. The XPATH stuff gets a little more interesting. It’s going to look for any class that is named watch-title. I found out the title was in that div by right clicking on the title and choosing inspect in Chrome. The appended [1] will only give us the first item that meets those qualifications. Otherwise the title shows up twice.

The rest of the formulas are pretty much variations on that theme.

  • =IMPORTXML(A2,”//*[contains(@class, ‘watch-view-count’)]”) – View count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-like-button’)])[1]”) – Likes count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-dislike-button’)])[1]”) – Dislikes count
  • =IMPORTXML(A2,”(//*[contains(@class, ‘yt-user-info’)])[1]”) – User name

Throw in a video of your own if you’d like.

Comments on this post

  1. CogDog said on February 17, 2016 at 4:29 pm

    Just keeping in mind the traded off when Google changes their layout your xpath stuff can go south. I did this a few years for my cv attribution script and it broke each time Flickr changed their layout. But it’s worth knowing some xpath jujitsu, you can use it in php

    • Tom Woodward said on February 17, 2016 at 9:51 pm

      I get it but same deal really when they change an API or make some new rules.

      I did some xpath with javascript for something else last week. I find it amusing more than anything else.

      • CogDog said on February 18, 2016 at 11:33 am

        A *good* API should not break its past. That’s the reason I love Flickr. All the features they took off the site still work via the API. Twitter is another story

        • Tom Woodward said on February 18, 2016 at 11:38 am

          Valid point 🙂

          I was thinking of Instagram which changed its whole authentication requirements recently (and instituted a whole approval deal). I am not a fan.

          • CogDog said on February 18, 2016 at 4:55 pm

            Oh yeah instagram’s changes with that stupid sandbox are insane. Why does one have to jump through those hoops for public methods. Grrrr

  2. Nick Lisher said on February 23, 2016 at 5:52 pm

    Hey – thanks for posting these! However I get a Formula Parse Error when trying exactly what you have done. Any advice as to what might be going wrong?

    • Tom Woodward said on February 23, 2016 at 6:14 pm

      Not sure. You want to share the document and I’ll look at it? Gmail is bionicteaching.

    • Tom Woodward said on February 24, 2016 at 8:30 am

      It looks like a couple of things happened. One to always look out for when cutting/pasting into Google Sheets (or any programming language) is whether your quote marks/apostrophes ended up being “curly” or plain. For stuff like this you want plain. Secondly, it looked like some extra stuff got in the formula below the view line (some kind of line break or something). It does appear to be working now.

  3. Nick Lisher said on February 24, 2016 at 1:44 pm

    Thank you Tom! Yes curly apostrophes are always a pain 😉

Leave a Reply

Trackbacks and Pingbacks on this post

  1. […] A quick shout-out: the process below builds on the solid resources written by Nate Shivar and Tom Woodward. […]

  2. […] A quick shout-out: the process below builds on the solid resources written by Nate Shivar and Tom Woodward. […]

TrackBack URL