Flickr photo shared by Marco Gomes under a Creative Commons (BY) license
Remember last night when I posted about scraping data from Instagram?
I woke up this morning at about 5:30 (literally with a start), astounded by how simple the solution to archiving the paginated results was. Before I even left for work, I had this working much better than my previous attempt.
I stripped out all of the previous GitHub stuff because I realized I didn't really need it. It had been a nice crutch and reassured me that I sort of knew what I was doing.
The comments interspersed in the code below explain what's going on. There's a much cleaner way to do this that doesn't duplicate so much code: I could write the part that builds the CSV¹ once and call it twice. I may do that at some point, but I think having it all in one place will help people new to this sort of thing see what's going on more clearly.
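As a rough sketch of that cleaner approach, the row-building and CSV-writing steps could live in two small functions that get called once per page. This assumes the same response fields the post's code reads (`user->username`, `likes->count`, and so on); the names `media_rows` and `append_rows` are hypothetical, not something from the post.

```php
<?php
// Sketch of the "write it once, call it for every page" idea.
// media_rows() and append_rows() are hypothetical helper names; the field
// names are the ones the post's code reads from each Instagram media item.

// Turn one decoded API response into an array of CSV rows.
function media_rows(object $page, string $gathered): array
{
    $rows = [];
    foreach ($page->data ?? [] as $media) {
        $rows[] = [
            $media->user->username,
            $media->likes->count,
            $media->comments->count,
            $media->link,
            // Posts without a caption have caption = null; ?? avoids an error.
            $media->caption->text ?? '',
            $media->filter,
            $gathered,
        ];
    }
    return $rows;
}

// Write rows to an open file handle. Passing arrays straight to fputcsv
// means no intermediate separator character is needed.
function append_rows($handle, array $rows): void
{
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are PHP's defaults, spelled out.
        fputcsv($handle, $row, ',', '"', '\\');
    }
}

// Usage: one call per page, e.g.
// append_rows($file, media_rows($obj, date(DATE_RFC2822)));
```

With this split, the first request and every paginated request share one code path, so a fix (like the caption guard) only has to be made once.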
This is fun stuff. I need to do more of it and more consistently. In the past, I’d do some programming for a few days and then not do any for a number of months. That makes for slow progress and frustration. I’m going to try to do some programming every day for a few months and see what that feels like.
```php
<?php
// This is hard-coded for now, but Instagram's API endpoints are well documented.
// It should work for you if you replace THIS_IS_WHERE_YOU_PUT_YOUR_CLIENT_ID with your client ID.
$url = 'https://api.instagram.com/v1/tags/vape/media/recent?client_id=THIS_IS_WHERE_YOU_PUT_YOUR_CLIENT_ID';

// Fetch the JSON from that URL; this is our source data.
$json = file_get_contents($url);

// Decode the JSON so PHP can read it.
$obj = json_decode($json);

// Set the time zone for the "date gathered" column. 'America/New_York' follows
// Eastern daylight saving time; the original 'EST' is a fixed UTC-5 offset.
date_default_timezone_set('America/New_York');

// Grab the fields we want from each post and add a "date gathered" column.
// Each row stays an array, so a caption containing '?' or commas can't split it.
$list = array();
foreach ($obj->data ?? array() as $media) {
    $list[] = array(
        $media->user->username,
        $media->likes->count,
        $media->comments->count,
        $media->link,
        $media->caption->text ?? '', // posts without a caption have caption = null
        $media->filter,
        date(DATE_RFC2822),
    );
}

// Append the rows to the CSV file.
$file = fopen("vapedataBIG.csv", "a+");
foreach ($list as $line) {
    fputcsv($file, $line);
}
fclose($file);
?>
<?php
// This is the addition. It works through the content 11 more times (remember, 0 counts),
// giving us up to 12 runs of 20 (240 results). Each pass sets the JSON URL to next_url.
for ($x = 0; $x <= 10; $x++) {
    // The JSON response has a built-in URL for the next set of data, so we use it
    // to move through the content. The last page has no next_url, so stop there.
    if (empty($obj->pagination->next_url)) {
        break;
    }
    $next = $obj->pagination->next_url;
    $url = $next;
    $json = file_get_contents($url);
    $obj = json_decode($json);
    echo $next . '<br>';

    $list = array();
    foreach ($obj->data ?? array() as $media) {
        $list[] = array(
            $media->user->username,
            $media->likes->count,
            $media->comments->count,
            $media->link,
            $media->caption->text ?? '', // posts without a caption have caption = null
            $media->filter,
            date(DATE_RFC2822),
        );
    }

    $file = fopen("vapedataBIG.csv", "a+");
    foreach ($list as $line) {
        fputcsv($file, $line);
    }
    fclose($file);
}
?>
```
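The loop above asks for a fixed number of extra pages. A slightly more general sketch wraps the paging in a function that follows `pagination->next_url` until it is missing or a page cap is reached. `collect_pages` is a hypothetical name, and passing the fetcher in as a callable is a design choice of this sketch (not the post's code) so the paging logic can be exercised without touching the network.

```php
<?php
// Sketch: follow pagination->next_url until it is missing or $maxPages
// pages have been fetched. collect_pages() is a hypothetical helper, and
// $fetch stands in for file_get_contents() (URL in, JSON string or false out).
function collect_pages(string $startUrl, int $maxPages, callable $fetch): array
{
    $pages = [];
    $next = $startUrl;
    while ($next !== null && count($pages) < $maxPages) {
        $json = $fetch($next);
        if ($json === false) {
            break; // request failed: keep the pages we already have
        }
        $page = json_decode($json);
        if (!is_object($page)) {
            break; // response was not a JSON object
        }
        $pages[] = $page;
        // The last page has no next_url, which ends the loop.
        $next = $page->pagination->next_url ?? null;
    }
    return $pages;
}
```

For the post's case, something like `collect_pages($url, 12, 'file_get_contents')` would request up to the same 12 pages, stopping early if Instagram runs out of results.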
¹ Is it still a CSV file if it isn't separated by commas?
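On that question: `fputcsv` uses a comma as its default delimiter, so what lands in the file is ordinary CSV. The short sketch below shows that, and also why joining fields with a character such as `?` and splitting them again is fragile: a caption containing the separator turns into extra columns. The sample row is made up for illustration.

```php
<?php
// A made-up row whose caption contains a question mark.
$fields = ['alice', 'Anyone tried this?', 'Normal'];

// Joining with '?' and splitting again: the '?' inside the caption
// produces an extra, empty field, so the columns shift.
$roundTrip = explode('?', implode('?', $fields));
// $roundTrip is ['alice', 'Anyone tried this', '', 'Normal']

// fputcsv with its default comma delimiter; fields containing spaces,
// commas or quotes are wrapped in double quotes.
$handle = fopen('php://memory', 'w+');
fputcsv($handle, $fields, ',', '"', '\\');
rewind($handle);
$line = stream_get_contents($handle);
fclose($handle);
// $line is "alice,\"Anyone tried this?\",Normal\n"

// Reading the line back gives the original three fields.
$parsed = str_getcsv(rtrim($line, "\n"), ',', '"', '\\');
```

Handing `fputcsv` the array directly skips the join/split step entirely, which is the simpler fix.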
God help him, he is dreaming in code! Non-programmistan is pissed right now 😉
It was really weird. Just shot up in bed and it was all so simple.
Reminds me of a friend in college who used to wake up doing pass blocking drills. Except he’d wake up because he nailed the cement wall in his sleep.