Data Games with Google Sheets

This is for a workshop with students on Saturday. It’s a revision of a thing Jeff and I did a way back in 2018. I think I’ve come up with some improved examples and I’ve revised a few things. The goal is to let any skill level start to see how spreadsheets can be useful to them. The videos are rough walkthroughs of what’s written here. The sheets have links to the various function help pages so I’m not going to duplicate that here.

Get your own copy of the importing sheet.

Get your own copy of the manipulating data sheet.

Get your own copy of the getting data out of sheet.

Get Data In

The new example sheet for getting data into sheets is here.

It has an example of the main ways I can think of to get data into Google sheets that exists elsewhere.

importHTML

=IMPORTHTML("https://en.wikipedia.org/wiki/World_War_II_casualties", "table",2)

This goes to the URL indicated and imports the second table. Most of the time you’d just cut/paste from the table but there are scenarios (like a Wikipedia page) where the content continues to grow where you might prefer this route. It’s also a bit harder to use these days as some sites load this content via javascript and it ends up being inaccessible to this particular function.

importData

=IMPORTDATA("https://raw.githubusercontent.com/wblakecannon/DataCamp/master/19-unsupervised-learning-in-python/data/fish.csv")

Another option that pulls in CSV or TSV files and probably another scenario where I’d be linking to a source I expect to change. Otherwise just upload it and open as a sheet or import it directly into the sheet.

importXML

=IMPORTXML("https://en.wikipedia.org/wiki/Special:Contributions/Woodwardtw","//*[@id='mw-content-text']/ul/li")

Now things get a bit more interesting. The importXML option allows you to scrape a page using xpath. It’s a nice option for people not wanting to program or wanting something quick and easy for a particular project.

The example above goes to a Wikipedia page, finds the element with the id of “mw-content-text” and then pulls each li element into the spreadsheet. It’s worth comparing it to the copy/paste option as it does a much cleaner job. In this example it wouldn’t matter a ton as it’s not a lot to clean up but think about the hassle it saves with larger amounts of data.

importFEED

=IMPORTFEED("http://digitalhumanitiesnow.org/feed/", "items", FALSE, 20)

Probably of lesser use but pulls in the RSS feed so you could analyze blog posts etc.

API Example

Since APIs differ quite a bit, this is just a simple example using a weather API. But this path is endlessly useful. I’ve had sites running for months gathering data every hour with sheets having many thousands of rows.1

function otherWeather(){
  const ss = SpreadsheetApp.getActiveSpreadsheet();//gets your spreadsheet
  const sheet = ss.getSheetByName('APIData');//gets the sheet in your spreadsheet with this name
  const apiUrl  = 'http://api.openweathermap.org/data/2.5/weather?zip=05753,us&appid=YOUR_API_KEY&units=imperial';//goes to the API
  const response = UrlFetchApp.fetch(apiUrl);//gets the API data
  const json = JSON.parse(response.getContentText());//treat it as json
  const data = [json.dt,json.weather[0].main, json.main.temp, json.wind.speed];//get the data we want from the API
  sheet.appendRow(data);//write it to the sheet
}

We can hook it to a timer and run it every hour or whatever.

I’ve used this for all sorts of things.

It’s the basis of my Gangnam Style weekly tweet.

Having it in a spreadsheet lets calculate the weekly change.

It’s part of the Trump Dump project (which will load but has ~25,000 items so it takes a while).2 The project is explained here.

Cleaning Up Data

Generally, the data you get is not exactly the data you need. You either need it combined or you need it broken apart.

Sticking stuff together

=CONCATENATE(A2,", ",B2,", ",C2)

You can stick multiple cell data together. This is a nicer options than I’ve used previously and creates a comma separated list without the hassle of ampersands. Nice that you can easily mix cell values and your own data.

Breaking stuff apart

SPLIT(A4, ",")

And if you want to break things apart by whatever you want split is so very handy.

Getting specific pieces of the whole

Many times you can scrape, import, or have chunks of data you’d like to clean up. If that data has consistent patterns then you can do magic using a variety of functions.

The following examples are just ways to look for some character (a period, an @, etc.) and then use where that is in the text string to get back what you want. It’s a bit like algebra and kind of fun in a weird way. These examples will make more sense if you look at the spreadsheet data source but the short explanation is that if your data has a pattern, you can chop that data up across all your examples in ways that become more useful to you.

=LEFT(A10,FIND(".",A10))
=RIGHT(A12,(LEN(A12)-(FIND("@",A12))))
=MID(A13,B13+1,(C13-B13)+1)

Historically, I’ve just used LEFT, RIGHT, and MID to do this stuff but REGEX is more powerful and sophisticated if you’re willing to learn it. It’s really magical and you can do insane things if you invest the time and energy.

=REGEXEXTRACT(A8,"\([A-Z]+\)")

– gets you the first instance of text surrounded by ()

=REGEXEXTRACT(A9,"\([^A-Z]+\)")

– gets you the last instance of text surrounded by ()

Getting Data Out

Now that you got your data in, got it doing what you needed, now you need to get it out in the way that you want.

Publish/Embed

The easiest way to get data out of sheets is to publish a sheet or chart. You can then link to it directly or embed that element in another website.

The charts/graphs sync up with changes to the spreadsheet in about 5 minutes which isn’t bad. If you want to take a deeper dive, you can make that happen nearly live.

Keep in mind you can also send it to other places like Voyant for additional options on display and analysis.

Custom Fitted

If you want to pay for it there are many services that will take your Google Sheets and turn them into sites, apps, and other magical creatures. I used to recommend Awesome Tables but they’ve locked down their free stuff pretty hard of late. There are lots of other options out there. It might be worth it for you.

Depending on your goals, your background, and what you have time for . . . it might be worth building your own front-end for the data. Here’s the codepen example. If you search this site or my codepen you’ll find tons more using the WordPress API, Vue, jQuery etc.


1 I think the cap is 5 million rows. Google has yet to cut me off but I’ll keep trying.

2 Should I fix this? No.