Facebook Comments for Research

This is a post about (at least temporary) failure. I should be able to do all this via the Facebook API/SDK but I’m doing something wrong. While I’m learning a decent amount in the land of programming, fundamentally I still suck. With that cathartic self-flagellation out of the way . . .

I’m working with someone who wants to grab all the comments from a large number of CDC posts about health issues for some research. She is currently doing it by hand. Here are two improvements to that awful reality.

Expand All Comments: In FB land, if there are many comments you only see the most recent. And if there are many, many comments you have to keep hitting “View Previous Comments” over and over. That’s super boring to do once. If you have to do it a lot it would really suck. Enter Alec’s bookmarklet. I tweaked it a tiny bit because I think FB changed the wording but it works like a charm. You’d copy the text below. Add a bookmark and then click edit. Replace the URL with this text and name it whatever you want.

Grab All Comments: Now to grab all these enlightening comments . . . Install the Scraper Chrome Extension. You can now right click a particular item and choose “scrape similar.” […]

Scraping Wikipedia User Data w Google Spreadsheets

creative commons licensed ( BY-SA ) flickr photo shared by nojhan

Alice Campbell in the VCU library hosted a Wikipedia edit-a-thon today. It was interesting and we had a variety of faculty and even some students show up. Gardner jokingly asked at one point whether we had a leaderboard for edits. It got me thinking. I remembered that Wikipedia keeps track of the edits of logged-in users and I figured I’d take a shot at scraping some of that data so we’d have a rough idea of how many edits were made by our group.

I started off by looking at the contributions page. This URL will get you the page for my user name.

https://en.wikipedia.org/wiki/Special:Contributions/Woodwardtw

I used the IMPORTHTML formula in Google Spreadsheets. It was easy because this was the first list on the page. You can see in the image above that you have the choice between trying to grab a list or a table. The other variable is what number that element is from the top of the page. You can see the working document embedded below.

I considered parsing out the ..(+30).. but after talking to Alice I decided that wasn’t the kind of data that would travel well. She was more interested in number of edits which, as it turns out, is available on the Edit […]
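
If it helps to see the list-or-table-plus-position idea outside a spreadsheet, here is a rough Python sketch of the same lookup. In Sheets the call takes the form `=IMPORTHTML(url, "list", 1)`. Everything below is an assumption made for illustration: the `ListGrabber` class, the `import_html_list` helper, and the `SAMPLE_PAGE` markup are all made up, and the sample only loosely imitates a contributions page rather than copying Wikipedia's real HTML. It is not what the spreadsheet in the post does internally.

```python
from html.parser import HTMLParser


class ListGrabber(HTMLParser):
    """Collect the text of each <li> in every top-level <ul>/<ol> on a page."""

    def __init__(self):
        super().__init__()
        self.lists = []        # one entry per top-level list, in page order
        self._depth = 0        # how deeply nested we are in <ul>/<ol> tags
        self._in_item = False  # True while inside a top-level <li>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._depth += 1
            if self._depth == 1:
                self.lists.append([])
        elif tag == "li" and self._depth == 1:
            self._in_item = True
            self.lists[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._depth > 0:
            self._depth -= 1
        elif tag == "li" and self._depth == 1:
            self._in_item = False

    def handle_data(self, data):
        # Text inside nested lists is folded into the enclosing top-level item.
        if self._in_item:
            self.lists[-1][-1] += data


def import_html_list(html, index):
    """Return the items of the index-th list on the page (1-based, like IMPORTHTML)."""
    grabber = ListGrabber()
    grabber.feed(html)
    return [item.strip() for item in grabber.lists[index - 1]]


# Made-up placeholder markup, NOT real Wikipedia output.
SAMPLE_PAGE = """
<html><body>
<ul>
  <li>10:02, Example article A (+30)</li>
  <li>10:15, Example article B (-4)</li>
</ul>
<ol><li>unrelated footer item</li></ol>
</body></html>
"""

edits = import_html_list(SAMPLE_PAGE, 1)  # "first list on the page"
print(len(edits), "edits found")
for line in edits:
    print(line)
```

Counting the items in the first list gives the rough edit tally the post is after. To run it against a live page you would have to fetch the HTML yourself first, which this sketch leaves out.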