Scraping Instagram – Take 3

flickr photo shared by JeepersMedia under a Creative Commons (BY) license

I started to apologize for writing three posts on this and promising not to do any more but I reconsidered. This is my site. I’ll write whatever I want. Skip it if it bores you or exile me from your feed reader.1

Alan’s comment got me thinking that the spreadsheet formulas were not necessary, and they felt awkward to me anyway. So I figured out how to do it all in PHP. I’ll include the relevant portion of the code below. You can get the whole thing here.

The key is substr_count, which counts how many times a substring appears in a string. The other little piece is boolval, which returns true if that count is anything other than 0.

//$caption gets all the text associated with the Instagram post
$caption = $media->caption->text;
$filter = $media->filter;
//$hashcount looks at $caption and counts how many times it finds a #
$hashcount = substr_count($caption, '#');
//$hashtrue looks at $hashcount and is set to 'true' if it's >0, 'false' otherwise
$hashtrue = (boolval($hashcount) ? 'true' : 'false');
//same pattern here counting @ instead
$atcount = substr_count($caption, '@');
$attrue = (boolval($atcount) ? 'true' : 'false');
//add the results to the CSV
array_push($list, $username . '?' . $likes . '?' . $comments . '?' . $link . '?' . $caption . '?' . $filter . '?' . date(DATE_RFC2822) . '?' . $hashtrue . '?' . $hashcount . '?' . $attrue . '?' . $atcount);
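
If you want to poke at the counting logic without an Instagram API response handy, here is a minimal, self-contained sketch. The function name summarize_caption and the sample caption are made up for illustration; only substr_count and boolval come from the code above, and boolval needs PHP 5.5 or later. Keep in mind that substr_count counts characters, so a stray # or @ that isn't really a hashtag or mention still gets counted.

```php
<?php
// Hypothetical helper (not part of the original script): counts '#' and '@'
// in a caption and reports each as a 'true'/'false' string plus a count.
function summarize_caption($caption)
{
    $hashcount = substr_count($caption, '#');
    $atcount   = substr_count($caption, '@');

    return array(
        'hashtrue'  => boolval($hashcount) ? 'true' : 'false',
        'hashcount' => $hashcount,
        'attrue'    => boolval($atcount) ? 'true' : 'false',
        'atcount'   => $atcount,
    );
}

// Made-up caption for illustration.
$summary = summarize_caption('Lunch with @alan #tacos #mexico');

// Join the fields with the same '?' separator used above.
echo implode('?', $summary), "\n"; // prints "true?2?true?1"
```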

1 Plus no one reads blogs any more. Shouldn’t you be on Twitter or vaping?

Comments on this post

  1. Luke said on July 24, 2015 at 3:31 pm

    I’m definitely going to delete you from Google Reader after this one.

    • Tom Woodward said on July 24, 2015 at 7:48 pm

      Damn it. Now I’m eating into my core constituency. It’s you, Alan, and Jim. But I must maintain my integrity as a blogger. Come what may.

      • Luke said on July 25, 2015 at 11:38 am

        Sorry. I gotta do what I gotta do. But you might be temporarily saved, apparently Google Reader is like down for some maintenance or something. I’ll have to try again later.

  2. CogDog said on July 24, 2015 at 4:09 pm

    I may have half submitted a comment while on a mobile in the back of a car in Mexico. Keep the code brewing.

    The boolval function is not needed; there are more direct ways to test values in an if () statement. For example, if your $hashcount finds any matches (1, 2, 234), a simple if ($hashcount) evaluates to true, because 0 is treated as false and any non-zero integer as true. With a string value, if ($string) evaluates to false for an empty string (and for the string "0") and true otherwise.

    So you can go shorter with

    if ($hashcount) {
        // do stuff if there are hashtags
    } else {
        // do other stuff
    }

    PS What is “Vaping”?

    • Tom Woodward said on July 24, 2015 at 7:52 pm

      Would I want to do that if all I want is true/false? There is no ‘else’ that I want. I just want it to write true if it’s not 0 and false if it is. It seems economical.

      Not arguing, just trying to see how that’s more direct.

      Vaping is a bizarre e-cigarette thing taken to the next level.

      • CogDog said on July 27, 2015 at 3:34 am

        6 of 1, etc. I’ve never even used/seen boolval(). You could do it either way:

        $hashtrue = ($hashcount) ? 'true' : 'false';

        Or, it looks like with that function this is the most compact:

        $hashtrue = boolval($hashcount);

        They all do much the same thing, though the last one gives you a real boolean instead of the string 'true' or 'false'.
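
To see how the variants in this thread compare, here is a small sketch with a made-up $hashcount. The two ternaries should give identical strings, while boolval() by itself gives a real boolean. That matters once the value is concatenated into the CSV line, since true turns into '1' and false into an empty string. The last few lines illustrate the string rule mentioned earlier, including the '0' case.

```php
<?php
$hashcount = 3; // made-up value for illustration

$a = boolval($hashcount) ? 'true' : 'false'; // string 'true'
$b = $hashcount ? 'true' : 'false';          // string 'true', no boolval needed
$c = boolval($hashcount);                    // boolean true, not a string

var_dump($a === $b);   // bool(true)
var_dump($c === true); // bool(true)

// Booleans change when concatenated into a string:
var_dump('x' . true);  // string(2) "x1"
var_dump('x' . false); // string(1) "x"

// One string gotcha: '0' is falsy, just like the empty string.
var_dump((bool) '');    // bool(false)
var_dump((bool) '0');   // bool(false)
var_dump((bool) '#ds'); // bool(true)
```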

        • Tom Woodward said on July 27, 2015 at 7:45 am

          Ah. Got it. I just didn’t get the structure when you weren’t really doing anything. There are some basic things that I’ve skipped in my miseducation.