Find us on GitHub

Teaching basic lab skills
for research computing

MATLAB: More Flow Control

MATLAB/More Flow Control at YouTube

Hello, and welcome to another episode of the Software Carpentry lecture on MATLAB programming. This is part two of the series on flow control. In this episode, we will take a closer look at functions.

In the last episode, we introduced the if-else and for loop statements, which provide basic flow control in MATLAB. Another important flow control construct is the function, which helps divide a large program into more manageable sections.

Each function represents a single task that can be separated from the whole. We created a function in the last episode to hold our interpolation example. Now, we will update that function to be useful in more general situations.

Any program can be broken into a series of independent tasks. The task of loading data should be independent from the task of performing a calculation on that data, which should then be independent of displaying the result of the computation.

Each subtask has its own set of subtasks. For instance, to load data, one first has to figure out where the data is located, then read it from the disk or network, and finally test that the format is correct. A large calculation often has many subtasks that should be logically independent from the rest of the program. A function can be used to encapsulate each of these tasks into its own logical unit.

In MATLAB, most user activity occurs in the main workspace, which can be controlled from the command line in MATLAB's interface. The main workspace has a state, which contains the set of defined variables and their current values. When a user calls a function, she passes some variables to the function as parameters. The function performs its task and may pass some return values to the caller. The function has its own state: it doesn't have access to the variables inside of the main workspace unless they are passed as parameters.

As an example, consider the problem of computing eigenvectors. The eigenvectors of a matrix are the set of vectors that only change by a scalar under matrix multiplication.

The scalar multiplier is called an eigenvalue, and eigen pairs form a key theoretical concept that is used in many fields. Computing eigenvectors is a subtask that should be handled by a function.

MATLAB provides a function "eig" that computes the eigenvalues of a matrix and returns them in a vector.

The same function can compute both eigenvalues and eigenvectors. If we ask for two return values, then the first return value is eigenvectors and the second is the set of eigenvalues. That means that the first return value's meaning actually changed when we request two returns!

The eigenvalue function, and many other functions in MATLAB, makes use of two special functions called nargin and nargout to tell how many parameters were passed in to the function and how many return values are expected.

Nargout should be thought of as an extra, implicit parameter that is passed to every function. Just like any other parameter, nargout can be used to dictate the behaviour of the function.

Let's return to the interpolate function that we introduced in part one of the flow control lectures. In that episode, we defined a function that would accept time series data that contained bad measurements. All of the bad measurements were assumed to be zero, and our function filled in those zeros by interpolating from nearby values.

In this episode, we will make our function more useful by adding extra parameters and return types. First, we will let the user choose a range of acceptable measurements by specifying a lower bound other than the default of zero as well as by letting them choose an upper bound. We will also add a few status variables to the output.

In our new function, we will follow a general convention. The first return value should be the result of the calculation, and generally, that value means the same thing no matter how many return values are requested. If the user asks for more return values, those should be treated as status variables. For instance, the min function returns the minimum entry in a vector as its first return value. If another return value is requested, that value holds the index of the location that contained the minimum value.

Our new function will accept an optional minimum and maximum acceptable value. It will also declare that it can return up to three values. The second return value is the number of locations we replaced. The third return value is the set of indices of locations that we replaced. Note that we are following the pattern of less to more information in our extra return variables.

At the start of our function, we test whether a second parameter was passed. If not, then the min is set to the default value of zero.

Next, we test whether a third argument was passed. If so, we find all values that are larger than the max and add them to the list of values to replace.

After we've interpolated all of the values that need to change, we assign the extra return values as needed. If the caller requested a second variable, we assign the length of the replacement vector to num_replaced. If they want all of the locations that were replaced, then we assign the set of row and column indices to replaced_locations.

To review, functions can be used to encapsulate a single task or idea in a way that separates it logically from the rest of the code, and that separates variable names to avoid accidentally overwriting globally defined variables. MATLAB functions can have a variable number of parameters and return types, and the function has access to the number of each through the use of nargin and nargout.

In the next episode, we will examine cell arrays and structures, which are data types that are useful when working with data that is not well suited for an array.