Text Acrobatics in Google Spreadsheets

I’ve established here enough times I’m not a programmer so I have to find ways to get things done until I learn more.

I found this gigantic list of edtech related conferences compiled by Dr. Wright thanks to Stephen Downes’ feed. It’s in Word1 for a variety of reasonable reasons and I can’t fault anyone who puts in this kind of time and energy and puts the result out there for free. It does make it harder to manipulate but it is very consistent which opens up doors which might otherwise be closed. It does cut/paste into a Google Spreadsheet pretty well.

The key to things like this is finding ways to break them into pieces. It is really algebra and variables but a more entertaining version. You can chop pieces of the block up and then chop up those parsed out pieces.

For convenience’s sake we’ll use cell A2 as the housing for the unparsed information.

December 1-2, 2013 International Conference on Advanced Education Technology and Management Science (AETMS), Hong Kong, China. http://www.aetms.org/

The first thing I did was scan for consistencies that I might use as chop points. The date is always first and in most cases it’s one word (the month) followed by the dates and a comma.

That allows me to do things like =FIND(” “,A2). That will give me the number of the first space- in this case 9 (written in cell B2). Now no matter what the first month is I can pull it out as a separate piece by putting in a formula like =LEFT(A2,B2). That gives me the first X letter of whatever is in cell A2.

I can also get fancy and find the second space. Something like =FIND(” “,A2,B2+1). That will start us at one character beyond the first space and will find the next space in the sequence.

Other useful tricks include =LEN(A2) which will tell you how long the text is (148 in this case and written in cell C2). Now if I =FIND(“http”,A2), I’ll get 128 (written in cell D2). So now by subtracting (148-128) we know the URL is 20 characters long and since it’s at the end of the line we can get it by =RIGHT(A2,(C2-D2)). If it wasn’t at the start/end of a line, you can opt to use =MID and play with those variables.

In any case, I have little hope this makes sense and slightly less hope anyone would struggle through it but the spreadsheet is here and while you can’t edit it you can see the formulas and comment. I have done some erroneous/useless parsing that needs to be pulled out but that’s what weekends are for.

The reason, of course, that I did this was to pull the information in Exhibit. The cramped version is below and the(seems to break things so it’s now gone) full version is over here. The example has December through March. I may do the rest at some point.

1 Although I believe a Word table or Excel might have all the advantages and allow for easier transitions to other formats.

Comments on this post

  1. CogDog said on November 19, 2013 at 8:07 pm

    It’s worth learning a little grep too – I do this in BBEdit like patterns that can replace all urls with HTML hyperlinked urls. I bet one could run it through something that would get tabs in there so you could spreadsheet it easier.

    Still it’s super handy to have the spreadsheet formula chops. I’m thinking it’s worth learning the scripting language for gdocs– apparently you can do more with forms and build apps that run in gDrive. When I grow up I want to script like Martin Hawksey

    • Tom Woodward said on November 19, 2013 at 10:55 pm

      I’ll have to work on that. I’ve been looking at scraping options etc. The spreadsheet is certainly not the most efficient of tools. I’m going to be improving my skill set out of necessity with the added beauty of time, focus, and interest.

  2. CogDog said on November 19, 2013 at 11:28 pm

    Too bad the dude does not put info in a database and then generate as needed. Unstructured data, blerg