Get the PDFs – Google Search to Google Folder

An old newspaper ad showing an automatic printer.

Here’s a neat little pattern that might interest others. We got a version of the question below yesterday.

Is there a way to automatically get the links from the search linked below into a spreadsheet?

https://www.google.com/search?q=site%3Aedu+filetype%3Apdf+syllabus+education&oq=site%3Aedu+filetype%3Apdf+syllabus+education

Then, from there, is there a way to automagically get the pdf files into a Drive folder?

Step 1: Get Google Search into Google Sheets

At first it seemed this would be really simple. Amit1 had done this really well back in 2015. Unfortunately, Google has started blocking this . . . even when you do it within Google Sheets/Scripts. This made me sad. Browser emulators and Python were dancing in my head but it seemed a bit too complex for a one time action.

Instead of over-complicating things, I opted to use a Chrome plugin called Scraper. I’ve had it installed for a long time. It lets you easily do xpath scraping of websites. You can see in action in the video below.

https://www.youtube.com/watch?v=KKPXPr26uvU

I also used the search settings to change the number of sites per page up to 100. Once I captured the info to the clipboard I just pasted it into Google Sheets.

Step 2: Save the PDFs

Now I just needed to loop through the URLs and save the PDFs to a particular Google Folder.

This google script gets my data and loops through it.

function getUrls(){
   var ss = SpreadsheetApp.getActiveSpreadsheet();//get the spreadsheet
   var sheet = ss.getActiveSheet();//get the right sheet
   var lastRow = sheet.getLastRow();//get the last row
   var urls = sheet.getRange('B1:B'+ lastRow).getValues();//get the urls column and gets the data within it
  for each (url in urls){
    saveInDriveFolder(url);//this will do the saving
  }
}

Now to save the PDFs to a particular folder . . .

function saveInDriveFolder(url){
  var folder = DriveApp.getFolderById('YOUR_ID_HEREE');// get the folder by ID is easy and you can just copy the ID from the URL 
  var options = {'muteHttpExceptions':true}; //important as there were missing files and if you don't mute the exceptions the script will fail 
  var file = UrlFetchApp.fetch(url, options); // get the file 
  if (file) {
  folder.createFile(file);//create the file in the folder
  }
}

1 A Google scripts man/myth/legend