Scraping with Google Spreadsheets Across Instagram, Flickr, YouTube etc.

I remain kind of amazed with how many little tricks can be done with Google Sheets. After seeing Alan’s post today, I wonder how much of the data I could pull (assuming we had the right user names and knew the services . . . really the harder part) just using Google Sheets. Turns out we could get a pretty good amount. The following is a mix of XPath, regex, and APIs. I started with as little real programming as possible and gradually increased sophistication. The following are just meant to get a rough idea of how much stuff you’ve got in the various spaces. Flickr The URL: http://flickr.com/photos/bionicteaching The function: =IMPORTXML(C2,”//*[@class=’photo-count’]”) This uses a basic Google Sheets function to grab the photo-count content. The function is grabbing the div class with the title photo-count. Vimeo The URL: http://vimeo.com/twwoodward The function: =INDEX(IMPORTXML(C3,”//*[@class=’stat_list_count’]”),1) Pretty similar to the example above but with the addition of INDEX. That solves the problem that there are multiple items that are all in the stat_list_count class and we only want the first matching item. Sound Cloud The URL: http://soundcloud.com/cogdog The function: =REGEXEXTRACT(IMPORTXML(C4,”//*[@name=’description’]/@content”),”([0-9]+) Tracks”) This gets a bit fancier. IMPORTXML brings in a large chunk of content from the page but it wasn’t structured in a way that I could get the exact information I wanted. REGEX […]