Scraping Instagram #2


flickr photo shared by Marco Gomes under a Creative Commons ( BY ) license

Remember last night when I posted about scraping data from Instagram?

I woke up this morning around 5:30 (literally with a start), astounded by how easy it would be to archive the paginated results. Before I even left for work, I had this working much better than my previous attempt.

I stripped out all of the previous GitHub stuff as I realized I didn’t really need it. It had provided a nice crutch and let me know I sort of knew what I was doing.

The explanation of what’s going on is in the comments interspersed in the code below. There’s a much cleaner way to do this where I don’t duplicate so much code. I could just call the part that builds the CSV [1] twice. I may do that at some point, but I think having it all in one place will help people new to this sort of thing see what’s going on more clearly.

This is fun stuff. I need to do more of it and more consistently. In the past, I’d do some programming for a few days and then not do any for a number of months. That makes for slow progress and frustration. I’m going to try to do some programming every day for a few months and see what that feels like.


<?php
//this is hard coded for now but the API endpoints for Instagram are pretty well documented - this should work for you if you replace THIS_IS_WHERE_YOU_PUT_YOUR_CLIENT_ID with your client ID.
$url = 'https://api.instagram.com/v1/tags/vape/media/recent?client_id=THIS_IS_WHERE_YOU_PUT_YOUR_CLIENT_ID';
//gets the URL of the json which is our source data
$json = file_get_contents($url);
//decodes the json into an object PHP can work with
$obj = json_decode($json);
//setting the time zone
date_default_timezone_set('EST');


//this grabs the stuff we want from the json file and adds a date gathered column
$list = array();
foreach ($obj->data as $media) {
    //each row is kept as an array, so a '?' inside a caption or link can't split a column
    $list[] = array(
        $media->user->username,
        $media->likes->count,
        $media->comments->count,
        $media->link,
        //caption is null when a post has no caption
        isset($media->caption->text) ? $media->caption->text : '',
        $media->filter,
        date(DATE_RFC2822),
    );
}
//a+ appends, so every run adds to the existing file
$file = fopen("vapedataBIG.csv", "a+");

foreach ($list as $line) {
    fputcsv($file, $line);
}

fclose($file); ?>
<?php
//this is the addition and it works through the content 11 more times (remember 0 counts) giving us a total of 12 runs of 20 (240 results), each loop sets the json url to the next_url

for ($x = 0; $x <= 10; $x++) {
    //the json stream has a built-in url for the next set of data we need so we'll use that to move through the content
    //stop early if there is no next page
    if (!isset($obj->pagination->next_url)) {
        break;
    }
    $url = $obj->pagination->next_url;
    $json = file_get_contents($url);
    $obj = json_decode($json);
    //stop if the request failed or came back without data
    if (!isset($obj->data)) {
        break;
    }
    echo $url . '<br>';
    $list = array();
    foreach ($obj->data as $media) {
        $list[] = array(
            $media->user->username,
            $media->likes->count,
            $media->comments->count,
            $media->link,
            isset($media->caption->text) ? $media->caption->text : '',
            $media->filter,
            date(DATE_RFC2822),
        );
    }
    $file = fopen("vapedataBIG.csv", "a+");

    foreach ($list as $line) {
        fputcsv($file, $line);
    }

    fclose($file);
}
?>
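
A sketch of the cleaner version mentioned above, where the part that builds the CSV is called once per page instead of being pasted twice. This is an illustration, not the script that was actually run: the function names media_to_rows, append_rows and scrape_tag are made up here, and the $fetch parameter is an assumption added only so the paging can be tried without hitting the API.

```php
<?php
// Hypothetical helper names; none of these appear in the original script.

// Turn one decoded API response into an array of CSV rows.
function media_to_rows($obj, $gathered) {
    $rows = array();
    if (!isset($obj->data)) {
        return $rows;
    }
    foreach ($obj->data as $media) {
        $rows[] = array(
            $media->user->username,
            $media->likes->count,
            $media->comments->count,
            $media->link,
            // caption can be missing, so fall back to an empty string
            isset($media->caption->text) ? $media->caption->text : '',
            $media->filter,
            $gathered,
        );
    }
    return $rows;
}

// Append rows to the CSV file, one fputcsv call per row.
function append_rows($path, array $rows) {
    $file = fopen($path, 'a+');
    foreach ($rows as $row) {
        fputcsv($file, $row, ',', '"', '\\');
    }
    fclose($file);
}

// Fetch the first page, then follow pagination->next_url, up to $maxPages pages in total.
// $fetch is any callable that takes a URL and returns a JSON string.
function scrape_tag($url, $path, $maxPages = 12, $fetch = 'file_get_contents') {
    $pages = 0;
    while ($url !== null && $pages < $maxPages) {
        $obj = json_decode(call_user_func($fetch, $url));
        append_rows($path, media_to_rows($obj, date(DATE_RFC2822)));
        $pages++;
        $url = isset($obj->pagination->next_url) ? $obj->pagination->next_url : null;
    }
    return $pages;
}
```

With these in place the whole script would shrink to a single call such as scrape_tag($url, "vapedataBIG.csv"), where $url is the same hard-coded endpoint as before. The default of 12 pages matches the 12 runs in the longhand version.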
      



[1] Is it still a CSV file if it isn’t separated by commas?
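
For what it’s worth, the file itself does come out comma-separated: fputcsv() uses a comma unless told otherwise, and it quotes any field that contains one. The small sketch below shows the same row written with the default comma and with a tab. The csv_line helper and the sample row are made up for illustration.

```php
<?php
// csv_line() is a hypothetical helper, not part of the original script.
// It writes one row to an in-memory stream and returns the resulting line.
function csv_line(array $row, $separator = ',') {
    $handle = fopen('php://memory', 'w+');
    fputcsv($handle, $row, $separator, '"', '\\');
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);
    return rtrim($line, "\n");
}

// Invented sample data: username, likes, caption.
$row = array('someuser', 12, 'nice clouds, right?');

echo csv_line($row), "\n";        // someuser,12,"nice clouds, right?"
echo csv_line($row, "\t"), "\n";  // same fields, separated by tabs instead
```

The third argument to fputcsv() is the separator, so a tab- or semicolon-separated file is one parameter away. Whether that still counts as "CSV" is mostly a naming question.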

2 thoughts on “Scraping Instagram #2”

    1. It was really weird. Just shot up in bed and it was all so simple.

      Reminds me of a friend in college who used to wake up doing pass blocking drills. Except he’d wake up because he nailed the cement wall in his sleep.
