Grabbing FaceBook Comments – More Primitive Programming

Functional is a kind way to describe this . . . but it’s hard to argue with computers doing tedious work that was previously done by hand. The goal here was to automate the collection of the comments on 300 or so posts from the CDC’s Facebook page so they could be analyzed. Ebola is in the mix which leads to some comments that may very well rival YouTube comments in terms of causing me sadness. We did end up with around 34,000 comments which would have been hand copied in the past. So some karma balance was achieved. wTake all this with a grain or two of salt. It works but may very well be illegal in some states or do things in a way that would make real programmers weep. Consider my programming skills to be like a man trying to swat flies with a broomstick- lots of furious action, panting, cursing, and every so often I will kill a fly and celebrate wildly.1 The Facebook This was my first experience trying to do anything with Facebook. The hardest part here for me was just getting the stupid access token needed to get at the JSON feeds. I eventually found the secretly named Facebook Login Example. Part of my issue was that I thought I could skip the […]


Facebook Comments for Research

This is a post about (at least temporary) failure. I should be able to do all this via the Facebook API/SDK but I’m doing something wrong. While I’m learning a decent amount in the land of programming, fundamentally I still suck. With that cathartic self-flagellation out of the way . . . I’m working with someone who wants to grab all the comments from a large number of CDC posts about health issues for some research. She is currently doing it by hand. Here are two improvements to that awful reality.1 Expand All Comments In FB land if there are many comments you only see the most recent. And if there are many, many comments you have to keep hitting “View Previous Comments” over and over. That’s super boring to do once. If you have to do it a lot it would really suck. Enter Alec’s bookmarklet. I tweaked it a tiny bit because I think FB changed the wording but it works like a charm. You’d copy the text below. Add a bookmark and then click edit. Replace the URL with this text and name it whatever you want. Grab All Comments Now to grab all these enlightening comments . . . Install the Scraper Chrome Extension. You can now right click a particular item and choose “scrape similar.” […]

Another Google Sheets Tip – Find Non-Standard Characters

flickr photo shared by Lynn Friedman under a Creative Commons ( BY-NC-ND ) license We’re still messing with manipulating data in Google Sheets. I hate to think of humans going through and manually parsing things that the machine should do so I’m putting all this out in the hopes of saving someone somewhere some time. It’s also meant to help our internal people see how it works so it might be more broadly applied. In this case, we’re just looking for a line break and using that to cut out a chunk of text. This Stack Overflow post gave me the char(10) piece so I could find the line break. You can see the details above and more on unicode characters here. It’s broken in two pieces for clarity. Piece one finds the number of characters prior to the line break. The second piece takes that number of letters from the left. I find manipulating text in spreadsheets to be a constant and practical use of algebra.

Scraping Instagram with Google Script

I thought I’d take a stab at using Google Script to capture Instagram data after being inspired by all the great Martin Hawksey has done with the TAGS Explorer. I’m doing something very similar and it turned out to be fairly straightforward to take the PHP I’d written previously and turn it into something that’d function in Google Sheets. I found this example highly useful in creating my own script.1 The following script can also be set with a time based trigger to fire every X minutes/hours/days which is perfect for this particular project. You can see it pulling 20 pictures worth of data every hour here if you’d like. 1 I basically copied portions of it.

Scraping Instagram – Take 3

flickr photo shared by JeepersMedia under a Creative Commons ( BY ) license I started to apologize for writing three posts on this and promising not to do any more but I reconsidered. This is my site. I’ll write whatever I want. Skip it if it bores you or exile me from your feed reader.1 Alan’s comment got me thinking that using spreadsheet formulas was not necessary and it felt awkward to me anyway. So I figured out how to do it all in the php. I’ll include the relevant portion of the code below. You can get the whole thing here. The key is substr_count which will find stuff in a string and count it. The other little piece is boolval which returns true if it’s greater than 0. 1 Plus no one reads blogs any more. Shouldn’t you be on Twitter or vaping?

Scraping Instagram #2

flickr photo shared by Marco Gomes under a Creative Commons ( BY ) license Remember last night when I posted about scraping data from Instagram? I woke up this morning about 5:30 (literally with a start) astounded by how easy the solution to archiving the pagination returns was. So before I even left for work I managed to get this working so much better than my previous attempt. I stripped out all of the previous GitHub stuff as I realized I didn’t really need it. It had provided a nice crutch and let me know I sort of knew what I was doing. The explanation of what’s going on is in the comments interspersed in the code below. There’s a much cleaner way to do this where I don’t duplicate so much code. I could just call the part that builds the csv1 twice. I may do that at some point but I think having it all in one place will help people new to this sort of thing see what’s going on more clearly. This is fun stuff. I need to do more of it and more consistently. In the past, I’d do some programming for a few days and then not do any for a number of months. That makes for slow progress and frustration. I’m going to […]

Scraping Instagram

flickr photo shared by ajmexico under a Creative Commons ( BY ) license I’m trying to step up my programming game a bit.1 APIs are also getting more and more accessible to jokers like myself.2 (In this case I also use php, cron, and some regex.) All of this should make Alan very proud. But I’m relatively terrible at doing things without a purpose. Luckily one wandered in on Tuesday. A faculty member who I’ve worked with a few times before came in and asked if there was any way to grab Instagram data for a project on social media and health that focused on vaping and ecigs. I’m not one to look a gift project in the mouth so I said I’d take a stab at it. Step one was to check out Instagram’s API3– in particular I wanted to see the tag endpoints. Those are URLs that give you access to JSON data. To get at these you need to register as an Instagram developer and register a client. This is a pretty straightforward process. After that I browsed around GitHub to see what might already exist. This got me to the Instagram PHP API. I always start by wandering GitHub much like I start my WordPress work by looking at plugins first. It took me a long […]


Rampages Growth Plotted

As part of the gen ed seminar I pulled the rampages.us user signup data for Kristina Anthony. It was just a straight export from the wp_users table and stripped of everything but the date. She pulled it into Excel and used a pivot table to make it manageable. Which is awesome. So I pulled it down and pushed it back up into Google Docs so that I could embed the chart in this post. It makes me feel better to look at the growth over what amounts to around a year of actual use. I tend to focus on places for improvement (and there are many) but it’s worth looking at what ALT Lab has managed to achieve in a fairly short period of time.1 The July to February jump of about 6000 users is pretty insane. I have every expectation that we’ll add another 6000 or so users next year. Things will certainly only get more interesting. This has been done without huge student training initiatives. For the most part faculty members are able to support their own students. We have some of that filter up and we deal with some troubleshooting online but there’s no dedicated person(s) to support WordPress issues or train students. That’s a testament to WordPress. 1 In the higher ed dimension a year is […]

Private Comments via XMLIMPORT

Making shareable (Sharing with a single person or specific group but not with the world.) comments on public writing is a fairly awkward spaaaaaace right now. There are things like AnnotateIt and Awesome Screenshot and the annotations in Diigo. So I’m looking around for other free options and brain storming odd ideas and not find a whole lot and I came up with the following . . . Note: I’m not saying this is a good idea, it may even be a bad idea but it might inspire someone to do something more interesting down the line.1 I at least found it mildly amusing. Here’s how you might pull an author feed from WordPress into Google Spreadsheets with separate cells each paragraph (for paragraph level commenting). The idea being that you can share the Google document with just that student and do the commenting via the GSS commenting feature. Google spreadsheets will import lots of things (xml, atom, rss). WordPress provides lots of specific feeds (author, tag, categories, combinations thereof). So step one is to get the author feed – for example http://rampages.us/fren330/author/sheehantm/feed/. You can then use the IMPORTXML formula in GSS to import that XML and do some XPATH parsing of the pieces. In this case I used =IMPORTXML(“http://rampages.us/fren330/author/sheehantm/”,”//p”) to pull out the paragraphs. I can then share the […]

Little Trick, Big Numbers

I often want to know just a bit more about numbers I see in tables. As I was looking at some thing today, I stumbled on the Wikipedia page for “List of Most Viewed YouTube Videos“. After being more than a bit amazed at the utterly staggering numbers. I wanted to know what they translated to in terms of years because the numbers were just too big. I remembered that Google Spreadsheets will let you pull in a table from a website with no fuss. All I needed to do was put =IMPORTHTML(“http://en.wikipedia.org/wiki/List_of_most_viewed_YouTube_videos”,”table”,1) in the first cell on the spreadsheet and viola the table is transcluded. I can now add a few more calculations to figure out the import stuff – like how many years worth of time have been spent watching Gangnam Style (16,274.24 years for the record1). You can go mess around with the data here. 1 Assuming I didn’t screw something up.