Find us on GitHub

Teaching basic lab skills
for research computing

Databases: Selecting Data

Databases/Selecting Data at YouTube

Welcome to the Software Carpentry screencast on Databases. This screencast is on retreiving data from tables in a database.

Databases come in many flavours. In these screencasts we will be using the SQLite Manager plugin for Firefox to interact with our database.

As we've seen, a database is a way to store information that is arranged as tables.

We will be using a database that has a single table named Experiment.

This table is a log of all of the work done on experiments in a research lab, broken down by project and scientist

The table has a column for the login id of the scientist, the name of their project, a numeric id for their experiment, how many hours they spent on it, and when it took place.

Each row, or record, in this table describes one scientist's work on a certain experiment on a given date.

To start, lets write an SQL query that retrieves the loginID and the Project name from the Experiment table.

We do that by using the SQL command, SELECT. We then list the columns we want to read from the database table.

We want the LoginID, and Project name, so we write those column names, and then we write FROM, and the name of the table we want the data from, Experiment:

SELECT LoginID, Project FROM Experiment;

We put a semi-colon at the end to tell the database this is the end of the command.

I've capitalised the words SELECT, and FROM because they are SQL keywords. Capitalisation isn't necessary, but we'll continue this throughout the screencast so that it is clear what is a keyword and what is a table name or field name.

When we run the command, it shows us all of the data from the Experiment table for the two columns that we asked for: the LoginID, and the name of the Project.

If wanted the Hours column, we'd just add that to the list of columns in the SELECT clause:

SELECT LoginID, Project, Hours FROM Experiment;

So, After the SELECT command you listed the fields you want returned. You can place them in any order. We could write:

SELECT Project, LoginID,

And you can repeat field names if you'd like:

SELECT Project, LoginID, LoginID, Project

If you want to pull up all of the columns in a table, you can use the asterisk, or star, after SELECT. The asterisk means "all of the column names" — it is just a shortcut. So, if we run this query:

SELECT * FROM Experiment;

we see all of the columns from the Experiment table.

If the there are duplicate rows returned by your query, it is possible to remove the duplicates. For example, we can fetch only the Project column from the Experiment table:

SELECT Project FROM Experiment;

and if we just wanted to know which different Projects the scientists were working on , we put DISTINCT keyword right after the SELECT keyword,

SELECT DISTINCT Project FROM Experiment;

This lists all of the projects, but only once.

If select more than one column name:

SELECT DISTINCT Project, LoginID FROM Experiment;

then only the distinct pairs of projects and login IDs are returned.

Suppose that 10% of the time spent on each experiment was prep. work which needs to be accounted for separately.

In our SELECT statement we can add expressions that do computations on each row.

So, to calculate 10% of the time spent on each experiment, we can add an expression to the list of columns in the SELECT field (in this case we'll leave the * there to select all of the columns), but we'll add the expression Hours * .1

SELECT *, Hours * .1 FROM Experiment

When we run the query, the expression Hours * .1 is evaluated for each row and appended to the output.

When appending expressions in the SELECT clause, you can use any of the fields, all of the arthimetic operators can be used here, as well as a certain built-in functions.

For instance, We could round these values to the first decimal place by using the ROUND function:

SELECT *, ROUND(Hours * .1, 1) FROM Experiment

In this screencast we've introduced the very basics of interacting with and retrieving data from database.

We've seen that you can select columns from a table to retreive, use the DISTINCT keyword to only return unique rows, as well as append calculated columns to output.