Scraping Wikipedia User Data with Google Spreadsheets

Creative Commons licensed (BY-SA) Flickr photo shared by nojhan

Alice Campbell in the VCU library hosted a Wikipedia edit-a-thon today. It was interesting, and a variety of faculty and even some students showed up. At one point Gardner jokingly asked whether we had a leaderboard for edits. That got me thinking. I remembered that Wikipedia keeps track of the edits made by logged-in users, so I figured I’d take a shot at scraping some of that data to get a rough idea of how many edits our group made.

I started off by looking at the contributions page. This URL will get you the page for my user name: https://en.wikipedia.org/wiki/Special:Contributions/Woodwardtw

I used the IMPORTHTML formula in Google Spreadsheets. It was easy because the list of contributions was the first list on the page. You can see in the image above that you have the choice between trying to grab a list or a table. The other variable is the position of that element, counting from the top of the page. You can see the working document embedded below.

I considered parsing out the ..(+30).. but, after talking to Alice, I decided that wasn’t the kind of data that would travel well. She was more interested in the number of edits which, as it turns out, is available on the Edit […]
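
For readers who want to see what the IMPORTHTML step is doing, the formula takes three arguments: a URL, the word "list" or "table", and a 1-based position on the page, so the call described above would look something like =IMPORTHTML(url, "list", 1). What follows is a rough Python sketch of the same idea, not the spreadsheet's actual implementation. The names NthListParser, import_list, and SAMPLE are made up for illustration, and the sample HTML is invented rather than copied from a real contributions page.

```python
from html.parser import HTMLParser


class NthListParser(HTMLParser):
    """Collect the text of each <li> in the n-th <ul>/<ol> of a page (1-based)."""

    def __init__(self, index):
        super().__init__()
        self.index = index   # which list to grab, counting from 1
        self.seen = 0        # lists opened so far (outside the target list)
        self.depth = 0       # > 0 while we are inside the target list
        self.items = []      # one cleaned-up string per top-level <li>
        self._buf = None     # text fragments of the <li> currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            if self.depth:                   # a list nested inside the target
                self.depth += 1
            else:
                self.seen += 1
                if self.seen == self.index:  # found the list we want
                    self.depth = 1
        elif tag == "li" and self.depth == 1:
            self._flush()                    # tolerate a missing </li>
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self._flush()
        elif tag == "li" and self.depth == 1:
            self._flush()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _flush(self):
        """Store the current item with its whitespace collapsed."""
        if self._buf is not None:
            self.items.append(" ".join("".join(self._buf).split()))
            self._buf = None


def import_list(html, index=1):
    """Hypothetical stand-in for IMPORTHTML(url, "list", index) on an HTML string."""
    parser = NthListParser(index)
    parser.feed(html)
    parser.close()
    return parser.items


# Invented sample markup: the first list plays the role of the contributions list.
SAMPLE = """
<ul>
  <li>12:01, edited <a href="#">Page A</a> (+30)</li>
  <li>12:05, edited <a href="#">Page B</a> (-4)</li>
</ul>
<ul><li>Footer link</li></ul>
"""

if __name__ == "__main__":
    for row in import_list(SAMPLE, 1):
        print(row)
```

The sketch works on an HTML string so it can run offline. To try it on a live page you would first download the HTML yourself, for example with urllib.request from the standard library, and pass the decoded text to import_list.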