Find us on GitHub

Teaching basic lab skills
for research computing

Matrix Programming: Indexing

Matrix Programming/Indexing at YouTube

Hello, and welcome to another episode of the Software Carpentry lecture on matrix programming. In this episode, we'll have a look at some of the ways you can index arrays. It may not seem important at first, but as we'll see, clever indexing allows you to avoid writing loops, which both reduces the size of your code, and makes it more efficient.

Arrays are subscripted by integers, just like lists and other sequences, so they can be sliced like other sequences as well. For example, if 'block' is the array shown here… …then 'block[0:3, 0:2]' selects its first three rows and the first two columns.

Please take a moment and ask yourself whether a slice like this is an alias for the original data, or a copy of it. If you have a Python prompt handy, please type in a couple of lines of code to test your guess.

As with other sliceable types, it's possible to assign to slices. For example, we can assign zero to columns 1 and 2 in row 1 of 'block' in a single statement.

Slicing on both sides gives us a way to shift data along the axes. If 'vector' is a one-dimensional array as shown here, then 'vector[0:3]' selects slots 0, 1, and 2… …while 'vector[1:4]' selects the values in slots 1, 2, and 3. Doing the assignment overwrites the lower three values (but note, it leaves the uppermost untouched).

As an exercise, write a loop that does the same thing, and then write a loop that shifts values up by one space instead. Which form do you find easier to understand?

If we want to do more sophisticated things, we can use a list or an array as a subscript. For example, here's our four-element vector again… …and a list with three legal subscripts: 3, 1, and 2. The expression 'vector[subscript]' creates a new array, whose elements are selected from 'vector' as you'd expect.

This works in multiple dimensions as well… …although the syntax and rules are not immediately obvious. For example, if we have a 2×2 matrix… …and subscript it with the list containing only the index '1'… …the result is its second row.

The details are explained in the NumPy documentation. If you're going to spend any time programming with arrays—in NumPy or anything else—it's worth learning these rules so that you can get the most out of the tools your language or library.

Remember, if you're looping over the elements in an array, you're probably doing the wrong thing.

Let's have a look at another way to subscript. If we compare our vector's elements to the value 25, we get back a vector with True where the element passed the test, and False where it didn't. As we saw in the previous episode, 'dtype=bool' is NumPy's way of telling us what the array elements' data type is.

We can use a Boolean array like this as a mask to select certain elements from our original array. Here, for example, the expression 'vector[vector<25]' gives us a vector containing only the elements that passed the test.

Again, take a moment and see if you can guess whether 'result' is a copy of the original data, or an alias, and why, and then test your guess.

We can use Boolean masking on the left side of assignment as well, though we have to be careful about its meaning. If we use a mask directly, elements are taken in order from the source on the right and assigned to elements corresponding to True values in the mask.

The 'putmask' function works slightly differently: it pulls values corresponding to True's in the mask from the source, and pushes them into corresponding slots in the destination. In both cases, only locations corresponding to True values in the mask are affected; it's what happens at the source that changes.

Operators like '<' and '==' work the way you would expect with arrays, but there is one trick. Python does not allow objects to re-define the meaning of 'and', 'or', and 'not', since they are keywords. The expression '(vector <= 20) and (vector >= 20)' therefore doesn't actually select elements with the value 20. Instead, it produces an error message.

One way around this is to use functions like 'loglcal_and', which combine Boolean arrays in the way you'd expect. Another is to use bitwise operators like '|' and '&', which operate on the bit-level representations of the Boolean values in the arrays.

These can produce some surprising results. For example, the bitwise or of anything with -1 is -1, since -1's bit representation is 11111…. In contrast, logical_and and related functions treat any nonzero value as True.

NumPy provides a whole-array alternative to 'if' and 'else' called 'where'. Its first argument is a Boolean mask. Where that is true, it takes the value from its second argument; where it is false, it takes its third. For example, 'where(vector < 25, vector, 0)' produces an array with the values from the original that are less than 25, or 0. Similarly, 'where(vector > 25, vector/10, vector)' scales large values or leaves values alone.

As an exercise, have a look at what the 'choose' and 'select' functions do, and try to think of cases where you would use them.

To review… Arrays can be sliced like lists… …or subscripts with vectors of indices… …or masked with conditionals.

No matter what you do, if you are writing loops over array elements, you have probably missed something, or are doing something wrong.

In our next episode, we'll use the tools we have looked at so far to explore some linear algebra.