Re-Scraping Instagram

Back when Instagram’s API rules didn’t completely suck, I wrote a few posts on scraping it so that some of our faculty could use those data in their research. Then all their rules changed and everything broke. That’s their prerogative but it’s also my option to complain about it. But because I posted about it, I got a comment from raiym1 who let me know he wrote a PHP scraper that avoided the API limitations. I’ve now got that up and running and set up a simple GET so that the URL determines the tagged content that is returned. The PHP for that page is below and allows you to replace the API URL in the old Google Scripts with a new url like You can then make your own custom displays based on that. I made a quick custom page template for the artfulness WP theme (currently showing filler data from the exciting ‘fish’ tag). This example has the tag hardcoded in but could easily use a custom field to pass the value. 1 On this post. And apparently this theme doesn’t support direct links to comments. About time I wrote my own theme . . .

Google Script to Copy Row Above to Blank Row Below

I had a spreadsheet that entered blank cells when there was more than one admin for a WordPress site. So if Site_1 had two admins, I’d get two rows of data. The first row for the site would have- siteURL | siteTitle | siteAdmin but the second row for that site would have something like- < blank > | < blank > | siteAdmin I started to just drag down and fill but there was lots of data and it just felt like something for the machine to do. The following Google Script did it for me in a few seconds despite a couple thousand rows of data. Not rocket science but it might be handy for someone and it was a really convenient example of variables and loops when I had a conversation with my son last night.

Grabbing JSON

flickr photo shared by Library Company of Philadelphia with no copyright restriction (Flickr Commons) What I wanted to do was grab data from the WordPress API and use that to provision chunks of my new portfolio site. The portfolio is hosted on GitHub and GitHub is HTTPS. At the moment my bionicteaching site is not HTTPS.1 That causes problems as secure and insecure are not friends. I wanted a quick and easy solution so I could continue until I do the HTTPS switch. The following is how I wandered towards a solution. A number of the things worked but don’t quite work for what I wanted. So they’re worth remembering/documenting for later and it’s kind of fun to see a mix of javascript, php, url manipulation, Google API, and the WordPress V2 API all in one little bit of wandering. My first thought was to grab the JSON via a Google Script and store it in Google Drive. I can do that but can’t seem to make it available for use the way I want. I tried messing with various URL parameters but it wasn’t flowing and I only started there because I thought it would be easy. I did eventually get the file accessible in DropBox (the only other place I could think of immediately for https file storage) […]

YouTube View Count to Tweet

I was just messing around a bit more with mashing up the YouTube API post from earlier with the Twitter post from earlier today. This Google Script will get the view count of a particular video (Gangnam Style) and send out the total views and the time elapsed watching the video (assuming I did the math right). It has no real purpose (other than the scale of the number amuses me) but it’s one of those things with concepts that I’ll be able to apply more usefully in the future. That’s a gut feeling but I’m pretty confident. Don’t forget to add the Twitter library as directed here.1 1 in the Apps Script project, include the library MKvHYYdYA4G5JJHj7hxIcoh8V4oX7X1M_ with the identifier set to Twitter


YouTube API to Google Spreadsheets

Because I love Alan. Here’s the API version in Google Script to grab YouTube stats. It does a bit more than the previous XPath version and you can set it to be triggered repeatedly. I’m going to add a loop to add multiple videos etc. in the near future but it’s a good start for anyone who’s doing research on stuff like this. It is funny what you might notice when you can see the data like this. I triggered it manually twice just to get a few lines in there. Notice that between the first two entries there are no additional views but a chunk more likes/dislikes. Makes me wonder if people are just weighing in without watching or if the data are collected differently resulting in some delay. Here’s the script1 and it’s pretty well commented up. You’ll need an API key. 🙂 You do see some weird stuff in the raw JSON. Like there’s a Favorites field. Does that exist in YouTube? I didn’t really think about it until it came up 0 for every video . . . even Gangnam Style. Here’s the result running every hour on a video that I’m hoping changes a bit. I got it off the trending page so it has to be cool right? 1 It took me a good […]


YouTube Scraping, XPath, and Google Sheets

APIs can give you much more power but they are often overkill for what people are trying to around here- lightweight social media Here’s a lightweight example of how you can use Google Sheets and the IMPORTXML function to grab quite a bit of data from various video pages with no API or technical skills. Straight off, we’re going to want the URL of the video. We’ll put that in column A and we’ll use it as a variable in all our other formulas. Getting the Paths to the Data =IMPORTXML(A2,”(//*[contains(@class, ‘watch-title’)])[1]”) So how’d that come to be? A2 is just asking what URL we want to go to. The XPATH stuff gets a little more interesting. It’s going to look for any class that is named watch-title. I found out the title was in that div by right clicking on the title and choosing inspect in Chrome. The appended [1] will only give us the first item that meets those qualifications. Otherwise the title shows up twice. The rest of the formulas are pretty much variations on that theme. =IMPORTXML(A2,”//*[contains(@class, ‘watch-view-count’)]”) – View count =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-like-button’)])[1]”) – Likes count =IMPORTXML(A2,”(//*[contains(@class, ‘like-button-renderer-dislike-button’)])[1]”) – Dislikes count =IMPORTXML(A2,”(//*[contains(@class, ‘yt-user-info’)])[1]”) – User name Throw in a video of your own if you’d like.

Random Stuff I Learned This Week

I’m messing around with some workflows to figure out how I’m spending some of my time . . . and, as a result, spending some of my time on how I spend my time. One of those efforts is trying to get some decent numbers around how my own efforts towards rampages support are going. I have my gut feelings but figured I’d see if I could work out a low energy way to document some of it in a more specific manner. I’m currently playing with a simple IFTTT recipe. It takes an email to a specific address with a specific hashtag (#vcusupport in this case) and puts it into a spreadsheet. I modify the subject line so that it’s #vcusupport HELP ME! | faculty | aggregation |. The regex1 below pulls out the elements and puts them in separate columns. Giving me something like the data below. Grabs content between two characters (in this case my “pipes”). =REGEXEXTRACT(B:B,”\|([^()]+?)[\|].*?(([^()]+?)[\|])”)) I started to do some work with Zapier as well. I actually prefer it in a number of ways (much more granular data and other neat things to pull from email) but the price was way too high for this type of use. I might try a more direct API route within Google if I end up feeling the need […]

Community Engaged Learning – Stackoverflow Version

We have two faculty learning communities exploring how technology and the Internet might change how community engaged learning can work. I’ve attended a few meetings and one of the key things I take away from the conversations is that I have a slightly different view of community. It did get me thinking about giving back to the community that’s been helping out so much as I’ve tried to tackle more technical challenges. I’ve learned many things from Stackoverflow so I thought I’d make some effort to contribute some answers. I’ve attempted to comment on things there before but you need 100 reputation points before you can do that. So I needed to actually answer things. Given I’ve been writing on the Internet for a large number of years you’d think I’d have more confidence but the structure of Stackoverflow (and the whole Stack Exchange community) is a bit different. I’m used to just posting what I think or what I managed to get to work. It doesn’t feel quite the same as “this is the answer.” There’s also some angry nerd stuff that goes on at times that I’m not a big fan of. As I’ve gone a little deeper the experience has been pretty positive. While I’ve answered some questions in Stackoverflow, I’ve ended up participating more in a […]


Map a Folder Full of Images to Google Spreadsheet

Imagine you have a large folder of images in Google Drive. I don’t have to imagine this as I do thanks to an IFTTT recipe.1 Google tends to be kind of stingy with the kinds of filtering/interactions you can have with files in their folders and we know that if you get stuff in Google Sheets then a world of other possibilities opens up. I’ve been thinking about what options there are with Google file storage because VCU is a GAFE school and we have unlimited Drive storage. That might open some media storage options with heavy load projects like our Field Botany site or our more recent work with the East End Cemetery. So . . . I wrote a quick script to take a large G Drive folder full of images and write the content to a spreadsheet while embedding an image preview. The script is below. I ended up revamping both enough that I felt it was worth reposting. For the record, the script ran through about 4,500 images but it may have timed out so keep that in mind if you’re dealing with lots of images. 1 I figure if I have enough backups to my online backups then I can pretend I’m safe.

findURL – Custom Google Function Fun

via the magical XKCD Fun being a fairly relative term . . . but I’m amused. The Meat You can write custom functions in Google Spreadsheets and then use them like other built in functions. I didn’t realize that. The script below1 grabs all the URLs from a chunk of text. You could add it to your spreadsheet in Google by going to Tools>Script Editor and opening a blank project. Replace all of the content with this and then save it. You can now use it like other functions by putting =findULR(A1) (assuming A1 is the text chunk you want). It spits out a single cell with the URLs in it on individual lines and with a count of the URLs found at the top of that cell. I’ve commented up the script below in case you want to understand/change it to better suit your needs. Next Steps The regex works well about 90-95% of the time. So it sure beats doing it by hand but it could be improved. You can try your own stuff here. The post I’m using there is pretty messy so it’s a good thing to check against. It’s hard to guess what odd things people will do with URLs. I’ve already seen people doing stuff like “”””. No idea why they did double quotes. […]