Matrix Programming: Introduction

Hello, and welcome to Software Carpentry's lecture series on matrix programming in NumPy. In this episode, we will talk about three ways to work with numerical data, and we will explain why a numerical package like NumPy is the best option.

Suppose a scientist is studying patients with Babbage's Syndrome, and she has a response value for each patient and each potential treatment. We might ask several questions about the responses, like how similar patients' responses are, or whether one treatment is recommended over another.

Both of these questions can be answered using matrix algebra, but there are several ways to program with matrices. One option is to write the loops ourselves in a general-purpose language. These kinds of programs have a lot of loops, which can hide the underlying mathematical operation. In addition, the code is hard to debug, and almost impossible to tune.
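To make the hand-written approach concrete, here is a minimal sketch in plain Python that multiplies a small matrix by a vector. The data and the names `responses`, `weights`, and `matvec` are invented for illustration and are not taken from the study.

```python
# Hypothetical data: 3 patients (rows) x 2 treatments (columns).
responses = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
# Hypothetical weight assigned to each treatment.
weights = [0.5, 2.0]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector using explicit loops."""
    result = []
    for i in range(len(matrix)):
        total = 0.0
        for j in range(len(vector)):
            total += matrix[i][j] * vector[j]
        result.append(total)
    return result


scores = matvec(responses, weights)
print(scores)  # [4.5, 9.5, 14.5]
```

Notice that the mathematical idea, a single matrix-vector product, is spread across two loops, an accumulator, and index bookkeeping.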

Another option is to use high-performance libraries written in low-level languages like Fortran and C. For instance, this Fortran subroutine is part of a numerical package called BLAS. It computes a constant times a vector plus another vector.
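The operation that subroutine performs can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Fortran source: the helper name `axpy` is a local choice, and unlike the library routine, which typically overwrites its second vector, this version returns a new list.

```python
def axpy(a, x, y):
    """Return a * x + y, element by element, for equal-length lists x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


result = axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```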

A third option is to use a high-level environment like MATLAB, or Python with NumPy. These use a data-parallel programming model, which means operations act on entire arrays at once, rather than being spelled out in a lot of loops. In addition, high-level packages hide optimization details. In fact, many of these packages, including NumPy, are wrappers around those clunky high-performance libraries we just saw.
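As a small sketch of the data-parallel style, the NumPy code below computes a constant times a vector plus another vector, and then a per-treatment average, without writing any explicit loops. The values and variable names are made up for this example.

```python
import numpy as np

a = 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# One expression acts on every element of the arrays.
z = a * x + y
print(z)

# Hypothetical responses: 3 patients (rows) x 2 treatments (columns).
responses = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Average over patients (axis 0) to get one value per treatment.
mean_per_treatment = responses.mean(axis=0)
print(mean_per_treatment)
```

Each line reads much like the formula it implements, which is the main point of working with whole arrays.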

To review, NumPy is a package that makes matrix operations easier to implement and easier to debug, and that emphasizes the mathematical problem over the implementation details. In the next lecture we will learn about the 'array' type, which is the basic data type in NumPy.