Scraping Imgurl.com for images!

Posted on Dec 28, 2018

Like most of the code I write in my spare time these days, this was just for a little bit of fun! This time, I wanted to grab a random bunch of images from Imgurl.com to collect some “memes”. I’ll warn you now, the internet is not a nice place. People upload all sorts of random stuff to Imgurl. Be warned!

Let’s get technical!

Firstly, I decided to use Python 3 on an EC2 micro (AWS virtual server). Secondly, it was written in about 10 minutes. I’m sharing it for future use (if anyone dares find a legitimate reason for using it). Plus I just wanted a reason to embed GitHub’s Gist into my blog!

I use a random string generator function (id_generator) to create a new ID for the Imgurl path, then use urllib to make an HTTP request and read the data from that path. I then pass the data I’ve just read into Python’s Image library and compare the image size (test.size) to check whether the image is removed.png (the image displayed when no image has been uploaded, or the image has been removed). If the image is removed.png, I don’t print anything and move on to another ID. When I find an image whose size doesn’t match removed.png, I output the ID, image size and image format.
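For anyone reading without the Gist to hand, here is a rough sketch of those steps. It is a reconstruction, not the original script: the URL pattern, the ID length, the placeholder dimensions and the helper names fetch_image_info and is_placeholder are all my assumptions, and it assumes the Image library is Pillow.

```python
import io
import random
import string
import urllib.request

# Assumptions for illustration only: none of these values come from the post.
BASE_URL = "https://i.example.com/{image_id}.jpg"  # hypothetical URL pattern
ID_LENGTH = 5                                      # assumed ID length
PLACEHOLDER_SIZE = (161, 81)                       # assumed size of removed.png


def id_generator(length=ID_LENGTH, chars=string.ascii_letters + string.digits):
    """Return a random alphanumeric string to try as an image ID."""
    return "".join(random.choice(chars) for _ in range(length))


def fetch_image_info(image_id):
    """Download the image for image_id and return (size, format)."""
    # Imported here so the helpers above still work without Pillow installed.
    from PIL import Image

    url = BASE_URL.format(image_id=image_id)
    with urllib.request.urlopen(url, timeout=10) as response:
        data = response.read()
    image = Image.open(io.BytesIO(data))
    return image.size, image.format


def is_placeholder(size):
    """True if the dimensions match the assumed removed.png placeholder."""
    return size == PLACEHOLDER_SIZE
```

Comparing sizes is a cheap heuristic: a real upload that happens to have the same dimensions as the placeholder would be skipped too.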

I added a try/except to deal with KeyboardInterrupt. You can press CTRL+C and the while loop will exit gracefully (well, as gracefully as it can).
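The interrupt handling can be sketched roughly like this. Again, this is an illustration under my own assumptions rather than the original code: scan and its check argument are hypothetical names, and it loops over any iterable of IDs instead of a bare while loop so that it is easy to try out.

```python
def scan(candidate_ids, check):
    """Run check on each ID until the IDs run out or CTRL+C is pressed.

    check(image_id) should return (size, format) for a real image,
    or None when the ID should be skipped.
    """
    found = []
    try:
        for image_id in candidate_ids:
            result = check(image_id)
            if result is not None:
                size, image_format = result
                print(image_id, size, image_format)
                found.append(image_id)
    except KeyboardInterrupt:
        # CTRL+C lands here, so the loop stops without a traceback.
        print("\nInterrupted, stopping.")
    return found
```

Passing an endless generator of random IDs as candidate_ids gives the same behaviour as a while loop: it keeps going until you press CTRL+C, then returns whatever it had found up to that point.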

Code:

This will output the image IDs which are valid in the console, along with the image size and format. You can then, of course, use the ID as you wish (open them in a browser or embed in HTML etc).

That’s as far as I got before I realised it was a terrible project and moved on to something else…