Scraping Wikipedia User Data with Google Spreadsheets

(Creative Commons licensed (BY-SA) Flickr photo shared by nojhan)

Alice Campbell in the VCU library hosted a Wikipedia edit-a-thon today. It was interesting, and a variety of faculty and even some students showed up. Gardner joked at one point about whether we had a leaderboard for edits. It got me thinking. I remembered that Wikipedia keeps track of the edits of logged-in users, and I figured I’d take a shot at scraping some of that data so we’d have a rough idea of how many edits our group made. I started off by looking at the contributions page. This URL will get you the page for my user name: https://en.wikipedia.org/wiki/Special:Contributions/Woodwardtw. I used the IMPORTHTML formula in Google Spreadsheets. It was easy because the contributions were the first list on the page. You can see in the image above that you have the choice between trying to grab a list or a table. The other variable is the element’s position counting from the top of the page (1 for the first list, 2 for the second, and so on). You can see the working document embedded below. I considered parsing out the byte-count changes (the (+30) figures), but after talking to Alice, that wasn’t the kind of data that would travel well. She was more interested in the number of edits, which, as it turns out, is available on the Edit […]
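For readers curious about what the spreadsheet formula is doing, here is a rough Python sketch under stated assumptions. It is not Google’s implementation of IMPORTHTML. The helper names (`contributions_url`, `ListGrabber`, `import_list`) are hypothetical, it uses only the standard library’s `html.parser`, and it assumes a “list” means a top-level `<ul>` or `<ol>`, counted from the top of the page with a 1-based index. Fetching is left out so the parsing can be tried on any saved HTML.

```python
from html.parser import HTMLParser
from urllib.parse import quote


def contributions_url(user: str) -> str:
    """Hypothetical helper: build the Special:Contributions URL for a user name."""
    # Wikipedia titles use underscores for spaces; quote() escapes anything else unsafe.
    return "https://en.wikipedia.org/wiki/Special:Contributions/" + quote(user.replace(" ", "_"))


class ListGrabber(HTMLParser):
    """Collect the text of each <li> in every top-level <ul>/<ol>, in page order."""

    def __init__(self):
        super().__init__()
        self.lists = []    # one inner list of item strings per top-level list
        self._depth = 0    # current <ul>/<ol> nesting depth
        self._item = None  # text fragments of the <li> being read, or None

    def _flush(self):
        # Store the current item (whitespace collapsed) and reset the buffer.
        if self._item is not None:
            self.lists[-1].append(" ".join("".join(self._item).split()))
            self._item = None

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._depth += 1
            if self._depth == 1:
                self.lists.append([])
        elif tag == "li" and self._depth == 1:
            self._flush()  # tolerate an omitted </li>
            self._item = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth == 1:
            self._flush()
        elif tag in ("ul", "ol") and self._depth > 0:
            if self._depth == 1:
                self._flush()
            self._depth -= 1

    def handle_data(self, data):
        # Text inside nested tags (links, bold, nested lists) folds into the item.
        if self._item is not None:
            self._item.append(data)


def import_list(html: str, index: int) -> list:
    """Rough analogue of IMPORTHTML(url, "list", index); index is 1-based."""
    grabber = ListGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.lists[index - 1]
```

Calling `len(import_list(html, 1))` on a saved contributions page would give a rough edit count, but only for the rows on that one page, since the contributions listing is paginated.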

Private Comments via IMPORTXML

Making shareable comments (shared with a single person or a specific group, but not with the world) on public writing is a fairly awkward spaaaaaace right now. There are things like AnnotateIt, Awesome Screenshot, and the annotations in Diigo. So I was looking around for other free options and brainstorming odd ideas, didn’t find a whole lot, and came up with the following . . . Note: I’m not saying this is a good idea; it may even be a bad idea, but it might inspire someone to do something more interesting down the line. I at least found it mildly amusing. Here’s how you might pull an author feed from WordPress into Google Spreadsheets (GSS) with a separate cell for each paragraph (for paragraph-level commenting). The idea is that you can share the Google document with just that student and do the commenting via the GSS commenting feature. Google Spreadsheets will import lots of things (XML, Atom, RSS). WordPress provides lots of specific feeds (author, tag, category, and combinations thereof). So step one is to get the author feed, for example http://rampages.us/fren330/author/sheehantm/feed/. You can then use the IMPORTXML formula in GSS to import that XML and do some XPath parsing of the pieces. In this case I used =IMPORTXML("http://rampages.us/fren330/author/sheehantm/","//p") to pull out the paragraphs. I can then share the […]
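The paragraph-splitting step can be sketched in Python as well, assuming the page is ordinary HTML. This is a hypothetical stand-in for IMPORTXML(url, "//p"), not how Google Spreadsheets evaluates XPath. `ParagraphGrabber`, `import_paragraphs`, and `fetch` are made-up names, `fetch` assumes the page is UTF-8, and empty paragraphs are dropped by choice. Each returned string corresponds to one cell in the column the formula fills.

```python
import urllib.request
from html.parser import HTMLParser


class ParagraphGrabber(HTMLParser):
    """Collect the text of every <p> element, roughly what the XPath //p selects."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = None  # text fragments of the open <p>, or None

    def _flush(self):
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:  # skip empty paragraphs (a choice made for this sketch)
                self.paragraphs.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # a new <p> implicitly closes an open one
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def close(self):
        super().close()  # process any buffered trailing text first
        self._flush()    # then keep a final paragraph left unclosed


def import_paragraphs(html: str) -> list:
    """Return one string per paragraph: one spreadsheet cell each."""
    grabber = ParagraphGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.paragraphs


def fetch(url: str) -> str:
    """Hypothetical helper: download a page, assuming UTF-8 encoding."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

For example, `import_paragraphs(fetch("http://rampages.us/fren330/author/sheehantm/"))` would return one string per paragraph on that author page, ready to be written one per row. The parser treats a new `<p>` as closing any open one but does not model every implicit close HTML allows (a `<div>` also ends an open paragraph, for instance), so it is only an approximation.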