Hello, and welcome to another episode of the Software Carpentry lecture on MATLAB programming. In this episode, we will examine visualization techniques using the functions plot, image, and imagesc.
Visualization has always played a big role in understanding our data and ideas. This image, from Isaac Newton, demonstrates the ideas that led to the integral. In order to clearly express our findings, scientists need to be able to create interesting and relevant images.
One advantage of plotting with MATLAB is that the plots are created by the same program that performs the calculations. As we will see in this lecture, producing informative plots may require transforming the data, which is easy to do within MATLAB.
MATLAB contains several plotting functions that are highly customizable. The simplest is plot, which creates a line graph. When called as plot(M) with a matrix M, it draws one line for each column of M.
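A minimal sketch of plot with a matrix argument, using made-up data: the three columns of M become three separate lines, each plotted against the row index.

```matlab
% Made-up data for illustration: three curves sampled at 100 points.
x = linspace(0, 2*pi, 100)';      % column vector of sample points
M = [sin(x), cos(x), sin(2*x)];   % 100-by-3 matrix, one data set per column

figure;
h = plot(M);                      % one line per column, plotted against the row index 1..100
legend('sin(x)', 'cos(x)', 'sin(2x)');
```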
In this example, we will use two data sets that contain historical financial data for the Dow Jones Industrial Average and the Standard & Poor's 500.
Both files contain four columns. The first three form a date and the fourth value is the closing value of one of the two indices on that day.
First, we load our data into two variables dow and sp using the importdata function that we introduced in the lecture on file IO. Then, it is easy to plot the column of closing values for the Dow Jones.
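A sketch of the loading step, assuming (as described above) a plain-text file with four numeric columns. The file name and the three rows are made up and written to a temporary file only so the example runs on its own; the real Dow Jones and S&P 500 files would be loaded the same way.

```matlab
% Hypothetical file contents: year, month, day, closing value (values made up).
sample = [2000 1 3 100
          2000 1 4 102
          2000 1 5 101];
fname = [tempname '.txt'];              % temporary file so the sketch is self-contained
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d %g\n', sample'); % transpose so fprintf writes one row per line
fclose(fid);

dow = importdata(fname);                % purely numeric file -> numeric matrix
delete(fname);

figure;
plot(dow(:,4));                         % closing values against their row index
```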
The shape of the plot will look familiar to anyone who follows finance, but the axes may not make sense.
If you are an economist, you know that there was a crash in 1987. The crash might be that small bump before the graph starts growing, but it is hard to say.
Since we only gave MATLAB a single vector, the X axis shows the index of each element. If we pass two vectors to plot, it uses the first for the X values and the second for the Y values.
The first three columns in the matrix dow are the year, month, and day of the associated closing value. We need to transform these into a decimal year.
As a rough approximation, we divide the month by twelve and the day by thirty times twelve, and add both to the year. This converts every date into units of years. This time, when we call plot, we pass the X values first and the Y values second.
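The conversion described above, sketched on a few made-up rows in the assumed year, month, day, closing-value layout:

```matlab
% Made-up rows: year, month, day, closing value.
dow = [1987  1  2  100
       1987  6 15  120
       1987 10 19   90];

% Rough decimal year: a year of twelve 30-day months.
t = dow(:,1) + dow(:,2)/12 + dow(:,3)/(30*12);

figure;
plot(t, dow(:,4));     % X values first, Y values second
xlabel('Year');
ylabel('Closing value');
```

Because the month is not offset by one, every date is shifted about one month later (January 2 becomes roughly 1987.09); that shift is the same for all dates, so the shape of the curve is unaffected. For exact day counts, MATLAB's datenum and datetick functions are an alternative.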
MATLAB chose sensible values for the X axis. Generally, MATLAB is very good at choosing tick marks for the axes of a plot. If you want to change them, you can do so from the Tools menu.
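Tick marks can also be set in code through the axes' XTick property instead of the menus. A sketch with made-up data (newer releases also provide an xticks function for the same purpose):

```matlab
t = linspace(1985, 2010, 200)';   % made-up decimal years
y = (t - 1985).^2;                % made-up values to plot
figure;
plot(t, y);
ax = gca;                         % handle to the current axes
set(ax, 'XTick', 1985:5:2010);    % place a tick mark every five years
xlabel('Year');
ylabel('Value');
```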
If you want to make further edits to the format of the plot, you can use the Insert and Tools menus of the figure window.
Most options are available under the Tools menu.
The first option, Edit Plot, provides access to every aspect of the plot's format.
An easier way to make quick changes is to right-click on the part of the plot you want to edit. Here, we right-clicked on the line, which brings up a context menu with controls for the line style, width, and markers. For more detailed edits, use the Property Editor.
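The line style, width, and marker offered by the context menu are also properties of the line object, so they can be set in code. A sketch with made-up data, using the LineStyle, LineWidth, and Marker properties:

```matlab
x = linspace(0, 10, 50);          % made-up data
figure;
h = plot(x, sqrt(x));             % h is a handle to the line object
% The same settings the context menu offers, set as line properties:
set(h, 'LineStyle', '--', 'LineWidth', 2, 'Marker', 'o');
% Equivalent at creation time: plot(x, sqrt(x), '--o', 'LineWidth', 2)
```

Setting properties in code has the advantage that the formatting is reproduced every time the script runs.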
We already saw that when plot receives two inputs, it treats the first as the X values and the second as the Y values. The second input can also be a matrix, in which case each of its columns is plotted against X. In this example, we compare the two stock indices by placing their closing values side by side in a matrix and plotting the matrix against time.
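A sketch of this comparison with made-up closing values. It assumes both data sets list the same dates in the same row order, and checks that assumption before combining them:

```matlab
% Made-up rows: year, month, day, closing value.
dow = [2000 1 3 100; 2000 2 1 104;  2000 3 1 103];
sp  = [2000 1 3  10; 2000 2 1 10.8; 2000 3 1 10.5];

% Assumption: both data sets cover the same dates in the same order.
assert(isequal(dow(:,1:3), sp(:,1:3)), 'Dates in the two data sets do not line up');

t = dow(:,1) + dow(:,2)/12 + dow(:,3)/(30*12);   % rough decimal year
stocks = [dow(:,4), sp(:,4)];                    % one column per index
figure;
h = plot(t, stocks);                             % each column becomes its own curve
legend('Dow Jones', 'S&P 500');
```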
Each column of the matrix stocks is plotted as its own curve. The blue line is the Dow Jones curve we saw before, and the green line is the S&P 500. Unfortunately, this plot is not very informative because the two indices are on different scales. For instance, it is hard to tell which index gained more as a percentage of its starting value.
It is better to rescale each index relative to its starting value, by dividing every closing value by the first one. The rescaled curves show the cumulative return of each index since the start of the data.
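A sketch of the rescaling with made-up values. Dividing each column by its first entry makes both curves start at 1; subtracting 1 gives the fractional return since the first date:

```matlab
t = [2000.0; 2000.5; 2001.0];          % made-up decimal years
stocks = [100 10; 104 10.8; 103 10.5]; % made-up closing values, one column per index

% Divide each column by its first value. bsxfun works in all releases;
% from R2016b on, stocks ./ stocks(1,:) gives the same result.
relative = bsxfun(@rdivide, stocks, stocks(1,:));
returns = relative - 1;                % fractional return since the first date

figure;
plot(t, relative);
ylabel('Value relative to first date');
legend('Dow Jones', 'S&P 500');
```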
When the indices are rescaled, we see that the S&P 500 actually outperforms the Dow Jones, but that it is significantly more volatile. The important point is that it may take several iterations before a visualization conveys its full message.
MATLAB offers several other plotting functions, including pie and bar for the standard pie and bar charts. The hist function makes a histogram. If your data requires it, MATLAB can also make many kinds of three-dimensional charts. Again, all of these plots are fully customizable.
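A quick tour of these functions on made-up data (newer releases also offer histogram as a successor to hist):

```matlab
counts = [5 3 8 2];                      % made-up category counts
figure; bar(counts);                     % bar chart
figure; pie(counts);                     % pie chart

samples = randn(1000, 1);                % made-up normally distributed samples
figure; hist(samples, 20);               % histogram with 20 bins

[X, Y] = meshgrid(-2:0.1:2);             % grid for a 3-D surface
figure; surf(X, Y, exp(-X.^2 - Y.^2));   % one of the 3-D plot types
```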
Another way to visualize data is to treat a matrix as an image. In this example, we will examine a data set of public, geolocated Twitter messages near Toronto.
We start with a question: where in Toronto are people most likely to send a geolocated tweet? To answer it, I recorded all geolocated tweets in downtown Toronto for two months. Then I divided the city into a grid and counted the relative number of tweets in each cell of the grid.
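The lecture does not show how the tweets were gridded. One possible approach, sketched here with hypothetical coordinates, a hypothetical bounding box, and a hypothetical grid size, is to turn each longitude and latitude into a column and row index and count tweets per cell with accumarray:

```matlab
% Hypothetical tweet coordinates (made up for illustration).
lon = [-79.415 -79.395 -79.395 -79.385 -79.415];
lat = [ 43.645  43.655  43.655  43.675  43.645];

% Hypothetical grid: bounding box and number of cells per side.
lonEdges = [-79.42 -79.38];
latEdges = [ 43.64  43.68];
nCells = 4;

% Map each coordinate to a cell index in 1..nCells, clamping points on the edges.
col = floor((lon - lonEdges(1)) / diff(lonEdges) * nCells) + 1;
row = floor((lat - latEdges(1)) / diff(latEdges) * nCells) + 1;
col = min(max(col, 1), nCells);
row = min(max(row, 1), nCells);

counts = accumarray([row(:), col(:)], 1, [nCells nCells]);  % tweets per cell
C = counts / sum(counts(:));                                % relative number of tweets
```

In this layout row 1 holds the southernmost cells, while image and imagesc draw row 1 at the top; calling axis xy after imaging puts north at the top.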
The result is a data matrix in which each element holds the relative number of tweets near one grid center on the map.
Of course, like many other data sets, this one is best thought of as a matrix.
The simplest way to create an image in MATLAB is to call image and pass it a matrix. Unfortunately, this matrix isn't very amenable to imaging.
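A minimal sketch of image on a made-up matrix of small relative counts. With image's default direct mapping on double data, each value is used as a colormap row index, and values of 1 or less all use the first row, so the result is a single flat color:

```matlab
C = [0.01 0.02; 0.30 0.67];   % made-up relative counts, all below 1
figure;
h = image(C);                 % default direct mapping: values are colormap row indices
colorbar;
```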
Only a few pixels are not dark blue; everything else is the same color.
To understand what happened, we need to take a closer look at the image function. image takes either an N by M matrix or an N by M by 3 array. If the input is two-dimensional, each element is treated as an intensity value and converted to a color through the colormap, a K by 3 matrix whose rows hold the red, green, and blue components of each color. If the input has three dimensions, each pixel is described directly by three intensity values, one for each color channel, and the colormap is not used.
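A small made-up example of the three-dimensional case: a 2 by 2 truecolor image whose third dimension holds the red, green, and blue channels, each between 0 and 1.

```matlab
rgb = zeros(2, 2, 3);     % 2-by-2 image, 3 color channels, all black to start
rgb(1, 1, 1) = 1;         % top-left pixel: red channel on
rgb(2, 2, 3) = 1;         % bottom-right pixel: blue channel on
rgb(1, 2, :) = 1;         % top-right pixel: all channels on (white)
figure;
image(rgb);               % truecolor input: the colormap is not used
```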
For this simple four-row colormap, the range of values from 0 to 64 is split into four equal sets, and each set is mapped to one row of the colormap. Since the data is two-dimensional, each value selects a whole row of the colormap, and the three columns of that row give the red, green, and blue components of its color.
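A sketch of a hypothetical four-row colormap applied to a two-dimensional matrix. With image's default direct mapping, each value is used as a row index into the colormap, so here the values 1 to 4 pick rows 1 to 4:

```matlab
% Hypothetical 4-color colormap: each row is one RGB color with components in [0, 1].
cmap = [0   0   0.5    % dark blue
        0   0.6 1      % light blue
        1   1   0      % yellow
        1   0   0];    % red
A = [1 2; 3 4];        % made-up matrix of colormap row indices
figure;
image(A);              % 2-D input: each value selects one row of cmap
colormap(cmap);
```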
The intensity values between 0 and 64 are mapped evenly onto the rows of the colormap, which produces the colored output. MATLAB has many built-in colormaps; type help colormap to see them. In a moment, we will explore a few more of the standard colormaps.
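An important point about image is that it expects all of the values to fall between 0 and 64, the range covered by the default 64-row colormap. Values outside that range are not rescaled but clipped: values below the range get the first color, and values above it get the last color.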
It is usually better to use imagesc, which rescales the values of the matrix so that the smallest value maps to the first color of the colormap and the largest maps to the last, ensuring that the entire colormap is used. Unfortunately, we still cannot see much structure in this data.
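A sketch with a made-up skewed matrix. imagesc sets the axes' color limits (CLim) to the minimum and maximum of the data, so the smallest value gets the first color and the largest gets the last:

```matlab
C = [0.001 0.002; 0.004 0.05];   % made-up relative counts with one large value
figure;
h = imagesc(C);                  % linear scaling: min(C(:)) -> first color, max(C(:)) -> last
colorbar;                        % shows which data values correspond to which colors
```

Because the scaling is linear, the three small values here all land near the bottom of the colormap, which is the problem described next.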
The reason is that imagesc scales the data linearly between its lowest and highest values. Our data follows an exponential distribution, so the largest values are far larger than the average value, and most cells end up squeezed into the first few colors.
When we image the logarithm of the data, a much more interesting pattern appears. We can pick out major streets and public areas in this plot.
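A sketch of the log transform on made-up data. Empty cells would give log(0) = -Inf, so this sketch adds a small offset first; using the smallest nonzero value as the offset is an assumption, one reasonable choice among several:

```matlab
C = [0 0.001; 0.01 0.5];       % made-up, highly skewed relative counts with one empty cell
offset = min(C(C > 0));        % assumption: smallest nonzero value as the offset
L = log(C + offset);           % compresses the large values, spreads out the small ones
figure;
imagesc(L);
colorbar;
```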
Depending on the data, it might be worth trying other colormaps. The gray colormap is grayscale, with lighter shades corresponding to higher intensities.
The hot colormap runs from black through red and yellow to white, like a heat map.
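A sketch that displays the same made-up data with both colormaps, so the two can be compared side by side:

```matlab
[X, Y] = meshgrid(linspace(-1, 1, 100));
Z = exp(-4 * (X.^2 + Y.^2));     % made-up smooth data to display

figure; imagesc(Z); colormap(gray); colorbar;   % dark = low, light = high
figure; imagesc(Z); colormap(hot);  colorbar;   % black -> red -> yellow -> white
```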
In conclusion, plots and images are powerful tools for exploring data, but make sure the visualization brings out the patterns that are actually in the data. Sometimes you may need to rescale or otherwise transform the data to capture its meaning fully.