Scraping Instagram with Google Script

I thought I’d take a stab at using Google Script to capture Instagram data after being inspired by all the great work Martin Hawksey has done with the TAGS Explorer. I’m doing something very similar, and it turned out to be fairly straightforward to take the PHP I’d written previously and turn it into something that’d function in Google Sheets. I found this example highly useful in creating my own script.1 The following script can also be set with a time-based trigger to fire every X minutes/hours/days, which is perfect for this particular project. You can see it pulling 20 pictures’ worth of data every hour here if you’d like. 1 I basically copied portions of it.
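For anyone curious what that looks like, here’s a minimal sketch, assuming the old tag endpoint URL and made-up JSON field names rather than the actual script linked above:

// A minimal sketch, not the actual script from the post: fetch recent media for a tag
// and append one row per photo to the active sheet. The endpoint URL, client_id,
// and JSON field names are assumptions for illustration.
function pullInstagramData() {
  var url = 'https://api.instagram.com/v1/tags/vaping/media/recent?client_id=YOUR_CLIENT_ID';
  var response = UrlFetchApp.fetch(url);            // grab the JSON from the tag endpoint
  var json = JSON.parse(response.getContentText()); // turn it into an object
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  json.data.forEach(function (item) {               // one row per returned photo
    sheet.appendRow([
      item.created_time,
      item.user.username,
      item.caption ? item.caption.text : '',
      item.likes.count,
      item.link
    ]);
  });
}

// The time-based trigger mentioned above: run pullInstagramData every hour.
function createHourlyTrigger() {
  ScriptApp.newTrigger('pullInstagramData')
    .timeBased()
    .everyHours(1)
    .create();
}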

Scraping Instagram – Take 3

flickr photo shared by JeepersMedia under a Creative Commons (BY) license

I started to apologize for writing three posts on this and to promise not to do any more, but I reconsidered. This is my site. I’ll write whatever I want. Skip it if it bores you, or exile me from your feed reader.1 Alan’s comment got me thinking that using spreadsheet formulas wasn’t necessary, and it felt awkward to me anyway. So I figured out how to do it all in the PHP. I’ll include the relevant portion of the code below. You can get the whole thing here. The key is substr_count, which finds a string inside another string and counts the occurrences. The other little piece is boolval, which returns true if that count is greater than 0. 1 Plus no one reads blogs anymore. Shouldn’t you be on Twitter or vaping?
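The post’s code is PHP, but the same check sketched in JavaScript (with a made-up caption and search term) is just:

// The post's code is PHP (substr_count plus boolval); this is the same idea
// sketched in JavaScript for comparison. The caption and search term are made up.
function flagTerm(text, term) {
  // split on the term and count the pieces: occurrences = pieces - 1
  var count = text.toLowerCase().split(term.toLowerCase()).length - 1; // substr_count equivalent
  var found = count > 0;                                               // boolval equivalent
  return { count: count, found: found };
}

// flagTerm('Loving my new #ecig setup #vape #vapelife', 'vape')
// returns { count: 2, found: true }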

Scraping Instagram #2

flickr photo shared by Marco Gomes under a Creative Commons (BY) license

Remember last night when I posted about scraping data from Instagram? I woke up this morning about 5:30 (literally with a start), astounded by how easy the solution to archiving the paginated returns was. So before I even left for work I managed to get this working much better than my previous attempt. I stripped out all of the previous GitHub stuff once I realized I didn’t really need it. It had provided a nice crutch and let me know I sort of knew what I was doing. The explanation of what’s going on is in the comments interspersed in the code below. There’s a much cleaner way to do this where I don’t duplicate so much code. I could just call the part that builds the CSV1 twice. I may do that at some point, but I think having it all in one place will help people new to this sort of thing see what’s going on more clearly. This is fun stuff. I need to do more of it and more consistently. In the past, I’d do some programming for a few days and then not do any for a number of months. That makes for slow progress and frustration. I’m going to […]
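The post’s version is PHP writing out a CSV; a rough sketch of the same pagination idea in Apps Script terms (the endpoint, the field names like pagination.next_url, and the page cap are all assumptions) looks like:

// A rough sketch of the pagination idea, not the post's PHP: keep following the
// API's "next page" URL and append a row per photo. The endpoint, the field
// names, and the five-page cap are assumptions.
function archiveTagPages() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var url = 'https://api.instagram.com/v1/tags/vaping/media/recent?client_id=YOUR_CLIENT_ID';
  var pages = 0;
  while (url && pages < 5) {                                  // stop after a few pages for the demo
    var json = JSON.parse(UrlFetchApp.fetch(url).getContentText());
    json.data.forEach(function (item) {
      sheet.appendRow([item.created_time, item.user.username, item.link]);
    });
    url = json.pagination ? json.pagination.next_url : null;  // follow the "next page" pointer
    pages++;
  }
}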

Scraping Instagram

flickr photo shared by ajmexico under a Creative Commons (BY) license

I’m trying to step up my programming game a bit.1 APIs are also getting more and more accessible to jokers like myself.2 (In this case I also use PHP, cron, and some regex.) All of this should make Alan very proud. But I’m relatively terrible at doing things without a purpose. Luckily one wandered in on Tuesday. A faculty member who I’ve worked with a few times before came in and asked if there was any way to grab Instagram data for a project on social media and health that focused on vaping and ecigs. I’m not one to look a gift project in the mouth, so I said I’d take a stab at it. Step one was to check out Instagram’s API3 – in particular, I wanted to see the tag endpoints. Those are URLs that give you access to JSON data. To get at them you need to register as an Instagram developer and register a client. This is a pretty straightforward process. After that I browsed around GitHub to see what might already exist. This got me to the Instagram PHP API. I always start by wandering GitHub, much like I start my WordPress work by looking at plugins first. It took me a long […]
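For reference, a tag endpoint call looked roughly like this at the time; the URL pattern and field names are from memory and should be treated as assumptions:

// A quick peek at a tag endpoint, just to show the shape of the request and the
// JSON that comes back. The URL pattern and field names are assumptions.
function peekAtTagEndpoint() {
  var url = 'https://api.instagram.com/v1/tags/ecig/media/recent?client_id=YOUR_CLIENT_ID';
  var json = JSON.parse(UrlFetchApp.fetch(url).getContentText());
  Logger.log(json.data.length);         // how many media items came back
  Logger.log(json.pagination.next_url); // where the next page of results lives
}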


Rampages Growth Plotted

As part of the gen ed seminar I pulled the rampages.us user signup data for Kristina Anthony. It was just a straight export from the wp_users table, stripped of everything but the dates. She pulled it into Excel and used a pivot table to make it manageable. Which is awesome. So I pulled it down and pushed it back up into Google Docs so that I could embed the chart in this post. It makes me feel better to look at the growth over what amounts to around a year of actual use. I tend to focus on places for improvement (and there are many), but it’s worth looking at what ALT Lab has managed to achieve in a fairly short period of time.1 The July-to-February jump of about 6,000 users is pretty insane. I have every expectation that we’ll add another 6,000 or so users next year. Things will certainly only get more interesting. This has been done without huge student training initiatives. For the most part, faculty members are able to support their own students. We have some of that filter up to us and we deal with some troubleshooting online, but there’s no dedicated person (or team) to support WordPress issues or train students. That’s a testament to WordPress. 1 In the higher ed dimension a year is […]

Private Comments via IMPORTXML

Making shareable comments (sharing with a single person or specific group but not with the world) on public writing is a fairly awkward spaaaaaace right now. There are things like AnnotateIt, Awesome Screenshot, and the annotations in Diigo. So I’ve been looking around for other free options and brainstorming odd ideas, not finding a whole lot, and I came up with the following . . . Note: I’m not saying this is a good idea, it may even be a bad idea, but it might inspire someone to do something more interesting down the line.1 I at least found it mildly amusing. Here’s how you might pull an author feed from WordPress into Google Spreadsheets with each paragraph in its own cell (for paragraph-level commenting). The idea is that you can share the Google document with just that student and do the commenting via the GSS commenting feature. Google Spreadsheets will import lots of things (XML, Atom, RSS). WordPress provides lots of specific feeds (author, tag, categories, combinations thereof). So step one is to get the author feed – for example http://rampages.us/fren330/author/sheehantm/feed/. You can then use the IMPORTXML formula in GSS to import that XML and do some XPath parsing of the pieces. In this case I used =IMPORTXML("http://rampages.us/fren330/author/sheehantm/","//p") to pull out the paragraphs. I can then share the […]
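If you’d rather set that up from a script than type the formula by hand, a minimal sketch (assuming the formula lands in A1 of the active sheet) would be:

// A minimal sketch: drop the IMPORTXML formula from the post into A1 so each
// <p> from the author page lands in its own cell. The cell location is an assumption.
function importAuthorParagraphs() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sheet.getRange('A1').setFormula(
    '=IMPORTXML("http://rampages.us/fren330/author/sheehantm/","//p")'
  );
}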

Little Trick, Big Numbers

I often want to know just a bit more about the numbers I see in tables. As I was looking at something today, I stumbled on the Wikipedia page for "List of Most Viewed YouTube Videos". After being more than a bit amazed at the utterly staggering numbers, I wanted to know what they translated to in terms of years because the raw counts were just too big to grasp. I remembered that Google Spreadsheets will let you pull in a table from a website with no fuss. All I needed to do was put =IMPORTHTML("http://en.wikipedia.org/wiki/List_of_most_viewed_YouTube_videos","table",1) in the first cell of the spreadsheet and voila, the table is transcluded. I can now add a few more calculations to figure out the important stuff – like how many years’ worth of time have been spent watching Gangnam Style (16,274.24 years for the record1). You can go mess around with the data here. 1 Assuming I didn’t screw something up.
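Scripted out, the same trick is a couple of lines; the cell locations and the columns for views and video length below are assumptions, so adjust them to wherever the imported table actually puts things:

// A sketch of the same trick from a script: transclude the Wikipedia table with
// IMPORTHTML, then turn views into years of watch time. The cell and column
// choices are assumptions; the imported table decides where things really land.
function importMostViewed() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sheet.getRange('A1').setFormula(
    '=IMPORTHTML("http://en.wikipedia.org/wiki/List_of_most_viewed_YouTube_videos","table",1)'
  );
  // years = views * video length in seconds / seconds in a year
  // C2 is assumed to hold views; D2 is assumed to hold the video length in
  // seconds (you may need to add that column by hand).
  sheet.getRange('H2').setFormula('=C2*D2/(60*60*24*365)');
}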


Citation Workflow – Diigo/Pinterest to Google SS

Talking to Bud the other day, I learned that generating the citation page for his digital stories was something of a pain. I’ve thought about it a bit since then and decided to try to simplify a workflow for this. Odd thing I learned – CHAR(10) is the official way to get line breaks in Google Spreadsheet formulas.

Flickr to Diigo to Google Spreadsheets

Initially, I looked at the Flickr galleries because that’s the option Bud normally uses. I saw that the gallery was in a standard HTML list format and I had some hope. Google Spreadsheets lets you pull lists and tables like these in via the IMPORTHTML function; Martin Hawksey has some good instructions and examples over here. That particular list wouldn’t come in, so that approach failed, though I could import just about every other list on the page. So I decided doing this through Diigo would make pretty decent sense for a number of people. Assuming you choose a unique tag for the images you plan to use (this example just uses "flickr"; I’d suggest something story/movie specific), the basic Diigo URL you’d get is https://www.diigo.com/user/bionicteaching/flickr. Trying to make this really easy for people, I set up the first page to allow you to paste that URL in, and our friendly formulas transform it into https://www.diigo.com/rss/user/bionicteaching/flickr. The example linked here […]
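A stripped-down sketch of those two pieces (the URL rewrite and the CHAR(10) line breaks) might look like this; the SUBSTITUTE/IMPORTFEED approach and the cell layout are assumptions, not necessarily what the linked example does:

// A stripped-down sketch of the two tricks: rewrite the pasted Diigo URL into its
// RSS form, pull the feed in, and glue pieces together with CHAR(10) line breaks.
// The SUBSTITUTE/IMPORTFEED approach and the cell layout are assumptions.
function buildDiigoCitationHelpers() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  // A1 holds the pasted URL, e.g. https://www.diigo.com/user/bionicteaching/flickr
  sheet.getRange('B1').setFormula('=SUBSTITUTE(A1,"diigo.com/user/","diigo.com/rss/user/")');
  // Pull the first 20 items from the RSS version of that page.
  sheet.getRange('A3').setFormula('=IMPORTFEED(B1,"items",FALSE,20)');
  // CHAR(10) gives the line break between pieces inside one citation cell
  // (assuming the item title landed in A3 and the link in C3).
  sheet.getRange('H3').setFormula('=A3&CHAR(10)&C3');
}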


Text Acrobatics in Google Spreadsheets

I’ve established here enough times that I’m not a programmer, so I have to find ways to get things done until I learn more. I found this gigantic list of edtech-related conferences compiled by Dr. Wright thanks to Stephen Downes’ feed. It’s in Word1 for a variety of reasonable reasons, and I can’t fault anyone who puts in this kind of time and energy and then puts the result out there for free. It does make it harder to manipulate, but it is very consistent, which opens up doors that might otherwise be closed. It does cut/paste into a Google Spreadsheet pretty well. The key to things like this is finding ways to break them into pieces. It’s really algebra and variables, but a more entertaining version. You can chop pieces of the block up and then chop up those parsed-out pieces. For convenience’s sake we’ll use cell A2 as the housing for the unparsed information:

December 1-2, 2013 International Conference on Advanced Education Technology and Management Science (AETMS), Hong Kong, China. http://www.aetms.org/

The first thing I did was scan for consistencies that I might use as chop points. The date is always first, and in most cases it’s one word (the month) followed by the dates and a comma. That allows me to do things like =FIND(" ",A2). […]
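Using that example line in A2, the chop points work out roughly like this (the helper cells and the offset numbers are assumptions tuned to that one line):

// A sketch of the chopping, tuned to the example line sitting in A2. The helper
// cells and the offset numbers are assumptions that fit that one line.
function addChopFormulas() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sheet.getRange('B2').setFormula('=FIND(" ",A2)');                    // position of the first space
  sheet.getRange('C2').setFormula('=LEFT(A2,B2-1)');                   // the month, e.g. "December"
  sheet.getRange('D2').setFormula('=LEFT(A2,FIND(",",A2)+5)');         // "December 1-2, 2013"
  sheet.getRange('E2').setFormula('=MID(A2,FIND(",",A2)+7,LEN(A2))');  // everything after the date
}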

Calendar as Unifier

I touched on this with a previous zombie pictures post. Essentially, metadata is awesome because it lets people find your stuff and it helps your stuff find its audience. Metadata is also absent more often than not because people don’t like to type in lots of tags, and they especially don’t like to do it on phones. You see elements of this metadata addition becoming automatic: simple things like Instagram (or maybe IFTTT) auto-tagging my images with instagram and (in my case) iPhone (like the image above). I’ve also seen auto-tagging of image filters, and with EXIF data you get all sorts of interesting automated metadata details, but they tend to be mechanical rather than social. IFTTT, FeedWordPress, and others allow you to do some low-level automatic metadata association. What keeps coming back to me is that it would be relatively simple to let people associate calendars and specific calendar events with online media publishing workflows. This would add the socially relevant automated metadata so the audience could find the media (the end goal being audience rather than metadata). This would work particularly well at institutions that have centralized calendars or, in the case of Udell’s Elm City, aggregated calendars. Take VCU’s calendar of events as an example. It has time, location, and categorical elements already. You […]