Hello, and welcome to the Software Carpentry episode on MATLAB programming. In this episode, we will cover the array data type, which is the primary data type in MATLAB.
MATLAB was originally designed to provide an easy way to manipulate matrices. If you've ever tried to use an array of arrays to store a matrix, you know how many loops are required to write even simple programs. MATLAB performs those loops for you, so you can write "x times A times x transpose" rather than writing matrix-vector multiplication functions.
The main data structure in MATLAB is an array of double precision, floating point numbers. Arrays can have one or more dimensions, and they can be manipulated and combined in many ways that we will cover in subsequent lectures.Many applications utilize two dimensional arrays as matrices, so you'll hear us use the terms array and matrix interchangeably. However, if we say matrix, we always mean an array with 2 dimensions.
Arrays can be created from the MATLAB prompt. When you assign an array, the entire set of numbers is enclosed in brackets. Elements in the same row are separated by spaces or commas and a semi-colon denotes a row boundary. Just like in linear algebra, a row array is different from a column array.
Sometimes, it is helpful to create a block array by using other arrays rather than scalars in the definition. This is fine as long as the sizes of the arrays match up.
In this case, we tried to combine a 2 by 4 matrix with a scalar, which is the same as a 1 by 1 matrix. MATLAB will not allow this.
There are other ways to make matrices. The functions zeros and ones will create matrices that have all ones or all zeros. Another special function is eye, which creates an identity matrix. Finally, rand creates a matrix of random doubles between zero and one.
In a later episode, we will talk about how to program your own functions, but for now, let's explore some of the built in functions to examine the shape and type of an array.
The simplest functions are arithmetic operators, which operate on arrays in the way you would expect for matrices and vectors. Addition is elementwise and assumes that the two arrays have the same size.
Multiplication is matrix multiplication and assume that the inner dimensions agree. As an exercise, see if you can figure out what happens when you multiply two 3 dimensional arrays. As a hint, try to see what sizes must match between the arrays.
Other important operators are the transpose operator, which is a single quote, and two kinds of division. The first division a forward slash, is equivalent to a divided by b if a and b are scalars. If they are arrays, it is equivalent to a times the inverse of b. As a convenience, MATLAB provides a backslash division, which is the inverse of forward slash.
Rather than loop through each element of two arrays to combine them elementwise, most operators can be made to perform elementwise operations by putting a period before the operator. For instance, a dot times b is returns the elementwise product of a and b. Do you see how this might help the programmer avoid writing a loop?
Arrays can also be reshaped using two methods. The first is the function reshape. The first parameter to reshape is the array you want to reshape while the second is a vector of dimension sizes. In this case, we want to transform a in to a vector with 4 rows and 1 column. Since vectorizing an array is such a popular operation, there is an equivalent shortcut called the colon operator.
If B is a 3 by 4 by 5 array then it has 60 elements. We want to transform it to a 2 by 3 by something array, and it isn't too hard to see that the third dimension must be of size 10. What if we don't want to compute the extra dimensions every time?
It turns out MATLAB can compute a single missing dimension in reshape. In this case, we pass each dimension size as a separate parameter. This is fine for reshape. For the last size, we put in a pair of empty brackets, which is a sign to MATLAB to figure out that dimension for us. The reshape function will not add or delete elements from an array, so if the first two dimension sizes make it impossible to define an array, MATLAB is going to complain
Another important consideration is the order that reshape lists its elements. In order to answer this question, we need to explore a little bit about how the computer stores a matrix.When we see a matrix, we think about a two dimensional set of memory locations that each hold a number. The problem is that computer memory is 1-dimensional. In fact, it is one long array. To think about two dimensional matrices, we need to decide on a convention on how store the matrix in memory.
One possibility is row-major order, which concatenates the rows.This is the convention used in the C programming language and in Python, since Python is written in C.
In contrast, column-major order concatenates the columns.The Fortran language uses column-major order, and since MATLAB uses many matrix manipulation programs in from Fortran, MATLAB uses the column-major ordering.
As an exercise, see if you can guess how the two types of languages store 3-dimensional or higher-dimensional data.
It turns out that reshape always fills the new shape by pulling elements in the order in which they are stored. Since in MATLAB this is column major order, reshape fills the new array by pulling elements from column 1, then column 2, and so on.
Array can also be dynamically resized. If an element is assigned beyond the bounds of the current array, the array is filled with zeros for all spots that need to be created. Keep this in mind if you assign a zero to, say, the millionth column of an array: MATLAB is going to create 1 million columns of zeros in order to have space for the new element.
Remember that an array is stored in column-major order. If an array is resized in a way that changes the size of the columns, then the ordering is destroyed. A whole new block of memory gets allocated to hold the array, then the existing values are copied to new the memory location. If we continuously add elements to the end of an array, like on the last media, the overhead can be very expensive. That is why it is better to initialize an array to the size you know you will need then go back and fill the array one slot at a time.
To review, in MATLAB, the primary data type is the multidimensional array of double precision floating point numbers. All of the arithmetic operations operate directly on arrays, and MATLAB provides hundreds of functions that also operate on arrays. Among those functions are reshape and implicit resizing. We'll see many more operations in the coming episodes, including a rich library of visualization functions.
In the next episode, we'll have a look at how we can select values from arrays using indexes.