Re-Scraping Instagram

Back when Instagram’s API rules didn’t completely suck, I wrote a few posts on scraping it so that some of our faculty could use those data in their research. Then all their rules changed and everything broke. That’s their prerogative but it’s also my option to complain about it. But because I posted about it, I got a comment from raiym1 who let me know he wrote a PHP scraper that avoided the API limitations. I’ve now got that up and running and set up a simple GET so that the URL determines the tagged content that is returned. The PHP for that page is below and allows you to replace the API URL in the old Google Scripts with a new url like You can then make your own custom displays based on that. I made a quick custom page template for the artfulness WP theme (currently showing filler data from the exciting ‘fish’ tag). This example has the tag hardcoded in but could easily use a custom field to pass the value. 1 On this post. And apparently this theme doesn’t support direct links to comments. About time I wrote my own theme . . .

Scraping with Google Spreadsheets Across Instagram, Flickr, YouTube etc.

I remain kind of amazed with how many little tricks can be done with Google Sheets. After seeing Alan’s post today, I wonder how much of the data I could pull (assuming we had the right user names and knew the services . . . really the harder part) just using Google Sheets. Turns out we could get a pretty good amount. The following is a mix of XPath, regex, and APIs. I started with as little real programming as possible and gradually increased sophistication. The following are just meant to get a rough idea of how much stuff you’ve got in the various spaces. Flickr The URL: The function: =IMPORTXML(C2,”//*[@class=’photo-count’]”) This uses a basic Google Sheets function to grab the photo-count content. The function is grabbing the div class with the title photo-count. Vimeo The URL: The function: =INDEX(IMPORTXML(C3,”//*[@class=’stat_list_count’]”),1) Pretty similar to the example above but with the addition of INDEX. That solves the problem that there are multiple items that are all in the stat_list_count class and we only want the first matching item. Sound Cloud The URL: The function: =REGEXEXTRACT(IMPORTXML(C4,”//*[@name=’description’]/@content”),”([0-9]+) Tracks”) This gets a bit fancier. IMPORTXML brings in a large chunk of content from the page but it wasn’t structured in a way that I could get the exact information I wanted. REGEX […]

Insta Snoop

A while back I was messing with getting Instagram data without bothering with their API because I think their most recent API changes are really annoying. I’m also a bit fascinated with the scale of numbers in social media right now. I opted to look at Snoop Dogg’s Instagram followers and plot their change very 10 minutes. Click here or on the image to see the live chart. Get the Instagram Data w/o the API & Put it in the Database It turns out that each Instagram page has an embedded JSON file with the data I wanted. You can see it if you view the source of any page. This Stackoverflow post was kind enough to point it out and you see regex rearing it’s head again. I started out with my standard process of using Google Sheets as the database but decided I’d try MySQL because I wanted to try getting the JSON ought more cleanly. The chunk below grabs the data and puts it in the database. So that gets us the stuff we want in a nice little box on the Internet. I did try to do some fancy mysql stuff to avoid entering the change in followers as an additional field but I failed in enough ways that I just opted to proceed with the […]