
Teaching basic lab skills
for research computing

Multimedia Programming: Images

Multimedia Programming/Images at YouTube

Hello, and welcome to the first episode of the Software Carpentry lecture on multimedia programming. In this episode, we'll have a look at how to manipulate images with programs.

Pictures predate writing by tens of thousands of years. They're just as easy to manipulate with programs as text… …provided you have the right libraries, and know how to use them. We'll use the Python Imaging Library, or PIL, for our examples… …but every other language has similar tools that work in similar ways.

Let's start by loading an image from a file into memory. We import the 'Image' module from PIL… …then create an in-memory copy of the image data using 'Image.open'. This function's single argument is the path to the image file we want to load.
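In code, the loading step might look like this. Since we don't have the lecture's star-field image on disk, this sketch first creates and saves a small throwaway image; 'example.png' is just a placeholder filename.

```python
from PIL import Image

# Create and save a small throwaway image so this sketch is
# self-contained; in practice, you would already have a file on disk.
Image.new("RGB", (640, 480)).save("example.png")

# Image.open reads the file and gives us an in-memory copy of its data.
pic = Image.open("example.png")
print(pic.size)   # (640, 480) — width and height in pixels
```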

Its size is 640×480 pixels.

Before we go any further, we need to talk a bit about coordinate systems, including the one used for color. You may not think of colors as coordinates, but the most common color scheme, called RGB, stores the red, green, and blue values of each pixel in one byte each. This is an additive color model: the color we see is the sum of the individual color values, each of which can range between 0 and 255, the maximum integer that can be stored in an 8-bit byte. Black is (0, 0, 0), i.e., nothing of any color. White is the maximum value of all three colors, which is (255, 255, 255), or equivalently (0xFF, 0xFF, 0xFF) in hexadecimal. We can think of this color model as a cube: the three axes represent the primary colors, while secondary colors are combinations of maximum values. Each actual color is a coordinate in this cube.
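The additive model is easy to check interactively: colors are just triples of byte values, and hexadecimal is simply another way of writing the same numbers.

```python
BLACK = (0, 0, 0)          # no light of any color
WHITE = (255, 255, 255)    # maximum of all three primaries
assert WHITE == (0xFF, 0xFF, 0xFF)   # the same values written in hexadecimal

# A secondary color combines two primaries at full strength.
YELLOW = (255, 255, 0)     # red + green
```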

The other coordinate system we need is one that identifies pixel locations. As you'd expect, image libraries use (x, y) coordinates. What you might not expect is that (0, 0) is the upper left corner of the image, rather than the lower left. This is a holdover from the days when images were displayed by analog devices like cathode ray tubes, which drew that pixel first. As this example shows, once the variable pic refers to a picture, we can get the RGB triple representing the color of the pixel at (x, y) with pic.getpixel((x, y)). Notice that this method takes one argument, which is a tuple of two values, rather than taking the x and y coordinates separately.
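Here's a small demonstration of 'getpixel' using a synthetic 2×2 image rather than a file (Image.new fills every pixel with black by default):

```python
from PIL import Image

# A 2x2 test image; Image.new fills every pixel with black by default.
pic = Image.new("RGB", (2, 2))
pic.putpixel((1, 0), (255, 0, 0))    # make the upper-right pixel red

# getpixel takes a single (x, y) tuple, not two separate arguments.
print(pic.getpixel((1, 0)))   # (255, 0, 0)
print(pic.getpixel((0, 1)))   # (0, 0, 0)
```

Remember that (0, 0) is the upper-left corner, so (1, 0) is one pixel to the right of it, not one pixel up.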

For our first exercise, let's find the brightest pixel in the image, which we might need to do if we're normalizing the image's color values.

The first thing we have to do is figure out what we mean by "brightest": is a pixel with a lot of red but no green or blue brighter than a pixel with some green and blue but no red? To keep things simple, we'll just add up each pixel's color values to approximate its overall luminance.

Our code is then a straightforward double loop: the outer loop goes through possible values of x, while the inner goes through possible values for y. When we find a pixel whose luminance is greater than the greatest seen so far, we record that value, along with its coordinates.
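A sketch of that double loop is below. Since we don't have the lecture's star field on disk, the example plants a single bright pixel in a synthetic image at the coordinates the lecture reports.

```python
from PIL import Image

pic = Image.new("RGB", (640, 480))
pic.putpixel((59, 345), (253, 252, 253))   # plant a bright pixel to find

max_lum = -1
bx = by = 0
for x in range(pic.size[0]):          # outer loop: possible x values
    for y in range(pic.size[1]):      # inner loop: possible y values
        r, g, b = pic.getpixel((x, y))
        lum = r + g + b               # our simple measure of luminance
        if lum > max_lum:             # brighter than anything so far?
            max_lum, bx, by = lum, x, y

print(bx, by, max_lum)   # 59 345 758
```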

This simple piece of code tells us that the brightest pixel is at (59, 345), and that its total luminance is 758. By comparison, the greatest possible value is 3 * 255, or 765.

Now, how fast was that program? We normally wouldn't bother asking this question unless we were sure performance was a problem, but modern cameras produce gigapixels of information, and doing anything a few billion times is likely to be slow.

First, we'll put our code in a function (as we should have done in the first place).

Next, we'll import a function called 'time' from the 'time' library. Each time this function is called, it returns the current value of the computer's clock in seconds, measured since the rather arbitrary zero date of January 1, 1970. To find out how long a function takes to run, we just call 'time' before and after calling the function, and take the difference between the two values. Here, we've put that logic in a function called 'elapsed', which takes a function and a picture as arguments, applies the function to the picture, and returns the elapsed time along with whatever the function itself returned.
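The timing logic might be sketched like this; 'fake_analysis' is a hypothetical stand-in for 'brightest', included only to show the calling pattern.

```python
from time import time

def elapsed(func, picture):
    # Record the clock, run the function, then record the clock again:
    # the difference is how long the call took, in seconds.
    start = time()
    result = func(picture)
    return time() - start, result

# A trivial stand-in for 'brightest', just to show the calling pattern.
def fake_analysis(picture):
    return len(picture)

seconds, answer = elapsed(fake_analysis, [1, 2, 3])
```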

If we use 'elapsed' to run 'brightest', we find that it takes 0.63 seconds to find the brightest pixel. That's pretty fast, but we can do a lot better.

Let's ignore coordinates for a moment, and simply find the luminance of the brightest pixel.

This function, 'faster', uses 'picture.getdata' to unpack the row-and-column representation of the image to create a vector of pixels, and then loops over that.
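A sketch of 'faster', again tested against a synthetic image with one planted bright pixel:

```python
from PIL import Image

def faster(picture):
    # getdata flattens the image into a row-by-row sequence of
    # (r, g, b) tuples, so a single loop visits every pixel.
    max_lum = -1
    for (r, g, b) in picture.getdata():
        lum = r + g + b
        if lum > max_lum:
            max_lum = lum
    return max_lum

pic = Image.new("RGB", (640, 480))
pic.putpixel((59, 345), (253, 252, 253))
print(faster(pic))   # 758
```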

This picture shows how the pixels are unpacked row by row to create the vector.

This function is more than nine times faster than its predecessor, partly because we are not translating between (x,y) coordinates and pixel locations in memory over and over again, and partly because the 'getdata' method unpacks the pixels to make them more accessible.

As an exercise, modify this function so that it returns the (x, y) coordinates of the brightest pixel by counting pixels inside the loop, and converting that count back to x and y values after the loop is over.
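As a hint, one way to turn a flat pixel index back into coordinates uses 'divmod': pixels are unpacked row by row, so dividing the index by the image width gives the row (y), and the remainder gives the column (x).

```python
# Pixels are unpacked row by row, so for a flat index i:
#   y = i // width,  x = i % width
width = 640
index = 345 * width + 59      # where our bright pixel lands in the vector
y, x = divmod(index, width)
print(x, y)   # 59 345
```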

While speeding things up by a factor of nine is worthwhile, having to calculate pixels' (x,y) coordinates manually is a pain.

A useful compromise between the two is to call 'picture.load', which unpacks the picture's pixels in memory, so that you can index the picture as if it were an array.
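That version might look like this; the only change from the original double loop is that we index 'pixels[x, y]' instead of calling 'getpixel' each time.

```python
from PIL import Image

pic = Image.new("RGB", (640, 480))
pic.putpixel((59, 345), (253, 252, 253))

pixels = pic.load()    # gives array-like access to the pixels
max_lum = -1
bx = by = 0
for x in range(pic.size[0]):
    for y in range(pic.size[1]):
        r, g, b = pixels[x, y]       # index with [x, y] instead of getpixel
        lum = r + g + b
        if lum > max_lum:
            max_lum, bx, by = lum, x, y

print(bx, by, max_lum)   # 59 345 758
```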

This version of our pixel finder runs in 0.13 seconds: half the speed of the vector version, but still almost five times faster than the original. Which of the three forms you should use in a particular situation depends on what information you need from the image, and how big the images you're working with are.

One of the things an astronomer might want to do with an image like this is count how many stars it contains.

As a first step, let's convert the image to black and white, so that it's unambiguous which pixels belong to stars and which don't.

We'll use black for stars and white for background, since it's easier to see black-on-white than the reverse.

Our function, 'monochrome', loops over the pixels in the loaded image, replacing the RGB values of each with either black or white depending on whether its total luminance is above or below some threshold passed in by the user.

Let's run our function with 200 + 200 + 200 as a threshold, and use the image's 'save' method to save the result in a file.

Remember, this threshold is a scalar, not an RGB triple: we're looking for pixels whose total color value is 600 or greater.
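A sketch of 'monochrome', exercised on a small synthetic image with one pixel bright enough to count as a star:

```python
from PIL import Image

BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

def monochrome(picture, threshold):
    # Replace each pixel with black (a star) if its total luminance
    # reaches the threshold, and with white (background) otherwise.
    pixels = picture.load()
    for x in range(picture.size[0]):
        for y in range(picture.size[1]):
            r, g, b = pixels[x, y]
            if r + g + b >= threshold:
                pixels[x, y] = BLACK
            else:
                pixels[x, y] = WHITE

pic = Image.new("RGB", (4, 4))
pic.putpixel((2, 1), (210, 210, 210))    # one "star" above the threshold
monochrome(pic, 200 + 200 + 200)
print(pic.getpixel((2, 1)))   # (0, 0, 0): the star, now black
print(pic.getpixel((0, 0)))   # (255, 255, 255): background, now white
```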

Here's our output: a lot of speckles that are only a couple of pixels wide, and a few larger dots representing larger, brighter objects.

With this in hand, we can start counting stars, which will be the subject of our next episode.

Thank you.