Find us on GitHub

Teaching basic lab skills
for research computing

Spreadsheets: Introduction

Spreadsheets/Introduction at YouTube

Hello, and welcome to the first episode of the Software Carpentry lecture on spreadsheets. In this episode, we'll have a look at a few simple things they can do.

A spreadsheet is a tool for performing computations on tables of data. They're most useful when your data is numeric, and organized into rows and columns.

We'll use Microsoft Excel, since it's cross-platform and widely available. There are alternatives, such as OpenOffice Calc and Gnumeric, that have similar features.

Each rectangle in a spreadsheet is called a cell.

Cells can contains numbers, dates, times, or plain text.

In this spreadsheet, the cells contain information about Canada's provinces and territories.

Information for each province and territory is in a single row; the columns hold their areas, populations, and CO2 emissions for 1990 and 2008.

Those CO2 emissions are measured in megatons. If we want the 2008 values in kilotons, we can put formulas in cells to do the conversions.

First, let's add a heading for our new emissions column. We want each cell in this column to contain the corresponding cell value in column E, times 1000.

Let's start by putting our cursor in cell F2. We want F2 to contain the value in E2 times a thousand, so we type "=E2*1000".

Every formula starts with an equals sign. As usual, "*" is the multiplication operator, and E2 is a cell reference. It works just like a variable: using it means, "The value located in that cell."

We could emissions in kilotons for the other regions by typing a new formula for each, but that would be tedious, and we'd probably make mistakes. There's a better way.

Let's copy the formula in cell F2 and paste it into cell F3. If you look at the formula bar, you'll see that Excel automatically changed the formula to a thousand times E3, which is what we want.

Let's copy it again, then select the rest of the cells in the column, and paste it into all of them at once.

Looking at the cells one by one, we can see that Excel has adjusted the cell reference in each.

If this was a program, we wouldn't repeat the number 1000 over and over again. Instead, we would put it in a variable of its own and reference that.

We can do that with spreadsheets as well. Let's go down to row 16 and create a new heading "megatons to kilotons". Excel doesn't require this, but it's a good habit to get into, since a spreadsheet with thousands of cells and no labels is pretty hard to figure out.

We'll put the value 1000 in cell B16, then delete all of the formulas in column F, and type in a new formula: F2 equals E2 times B16.

That does what we want, but when we copy the formula and paste it into cell F3, we get zero instead of 244,300. What's gone wrong?

Looking at the formula bar, we can see that Excel has adjusted the cell references so that the formula is E3 times B17. The first change is right, but the second makes the formula refer to an empty cell.

We can prevent the spreadsheet from automatically adjusting the reference to B16 by using an absolute reference. An absolute reference has a dollar sign before the column name, row number, or both.

In our case, since we don't want B16 to be translated in any direction, we'll reference it using $B$16.

When we copy the formula into other cells, Excel only updates one of the references, not the other, which is what we want.

And now, if we want emissions in tons instead of kilotons, all we have to do is change the value in B16 from a thousand to a million, and all of the cells that depend on it are automatically updated.

If we change the value back, the cells are updated again.

This is why people use spreadsheets: they automatically keep track of what depends on what, and update things as needed whenever any values change.