The Value of Visualization 2
G Janacek
GJJ () Visualization 1 / 21
Parallel coordinates
Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate data. To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. These lines represent the variables. An observation is then taken and its value for each variable is plotted on the corresponding parallel line. These points are then joined to give a segmented line. We repeat for all the observations.
Consider the dogs data

      modern  jackel  cwolf  iwolf  cuon  dingo  preh
x1        97      81    135    115   107     96   103
x2       210     167    273    243   235    226   221
x3       194     183    268    245   214    211   191
x4        77      70    106     93    85     83    81
x5       320     303    419    400   288    344   323
x6       365     329    481    446   376    431   350
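The construction described on the previous slide can be sketched in a few lines of Python for the dogs data above. This is only an illustration: the function name `parallel_coordinates` is made up for this example, and rescaling every variable to the range 0 to 1 (min-max scaling) is an assumption of the sketch, chosen so that axes with different ranges share one vertical extent.

```python
# Dogs data from the table above: one list per variable, one entry per group.
groups = ["modern", "jackel", "cwolf", "iwolf", "cuon", "dingo", "preh"]
data = {
    "x1": [97, 81, 135, 115, 107, 96, 103],
    "x2": [210, 167, 273, 243, 235, 226, 221],
    "x3": [194, 183, 268, 245, 214, 211, 191],
    "x4": [77, 70, 106, 93, 85, 83, 81],
    "x5": [320, 303, 419, 400, 288, 344, 323],
    "x6": [365, 329, 481, 446, 376, 431, 350],
}


def parallel_coordinates(data, labels):
    """Return {label: [(axis_position, scaled_value), ...]}.

    Axis k is the vertical line at horizontal position k. Each variable is
    min-max scaled to [0, 1] (an assumption of this sketch).
    """
    polylines = {label: [] for label in labels}
    for position, values in enumerate(data.values()):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # guard against a constant variable
        for label, value in zip(labels, values):
            polylines[label].append((position, (value - lo) / span))
    return polylines


lines = parallel_coordinates(data, groups)
for label in groups:
    # Joining these vertices in order gives the segmented line for one group.
    print(label, [round(y, 2) for _, y in lines[label]])
```

Each list of vertices is one segmented line; passing the vertices to any plotting routine draws the chart.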
[Figure: parallel coordinates plot of the dogs data; labels in the plot: modern, jackel, cwolf, iwolf, cuon, dingo, pre and x1 to x6]
Or using Mondrian
[Figure: Mondrian parallel coordinates plot of the dogs data; axes x1 to x6]
Like all good visualisations, parallel coordinates can also show both the forest and the tree. The big picture can be seen in the patterns of lines; individual lines can be highlighted to see detailed performance of specific data elements.
What to watch out for when using parallel coordinates?
With their power to visualise multi-dimensional data, why aren't parallel coordinate charts more popular? Here are a few of the issues:
- Large data sets create a lot of visual clutter. More from S. Few: Most of us who have used parallel coordinates to explore and analyse multivariate data would agree that meaningful patterns can be obscured in a clutter of lines, especially with large data sets.
- The order of the axes impacts how the reader understands the data. Relationships between adjacent dimensions are easier to perceive than between non-adjacent dimensions.
- As the axes get closer to each other it becomes more difficult to perceive structure or clusters.
- Depending on the data, each axis can have a different scale, which is difficult to display and for the reader to absorb.
- Lines may be mistaken for trends or changes in value even though they only show which points belong to the same observation.
Multidimensional Scaling
From a non-technical point of view, the purpose of multidimensional scaling (MDS) is to provide a visual representation of the pattern of proximities (i.e., similarities or distances) among a set of objects. For example, given a matrix of perceived similarities between various brands of air fresheners, MDS plots the brands on a map such that those brands that are perceived to be very similar to each other are placed near each other on the map, and those brands that are perceived to be very different from each other are placed far away from each other on the map.
            Athens  Barcelona  Brussels  Calais  Cherbourg  Cologne  Copenhagen
Athens           0       3313      2963    3175       3339     2762        3276
Barcelona     3313          0      1318    1326       1294     1498        2218
Brussels      2963       1318         0     204        583      206         966
Calais        3175       1326       204       0        460      409        1136
Cherbourg     3339       1294       583     460          0      785        1545
Cologne       2762       1498       206     409        785        0         760
Copenhagen    3276       2218       966    1136       1545      760           0
We can draw a map based on these distances:
cmdscale(eurodist)
[Figure: map produced from the distances, with points labelled Copenhagen, Brussels, Calais, Cherbourg, Cologne, Athens, Barcelona]
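R's cmdscale performs classical (metric) multidimensional scaling. As a rough illustration of what such a routine does, the Python sketch below applies the textbook steps (double-centring the squared distances, then taking the leading eigenvectors) to the seven-city table shown earlier. It is not the R implementation: power iteration with deflation is swapped in for a library eigen-solver so that the example needs nothing beyond the standard library, and all function names are invented for this example.

```python
import math

cities = ["Athens", "Barcelona", "Brussels", "Calais",
          "Cherbourg", "Cologne", "Copenhagen"]
D = [
    [0, 3313, 2963, 3175, 3339, 2762, 3276],
    [3313, 0, 1318, 1326, 1294, 1498, 2218],
    [2963, 1318, 0, 204, 583, 206, 966],
    [3175, 1326, 204, 0, 460, 409, 1136],
    [3339, 1294, 583, 460, 0, 785, 1545],
    [2762, 1498, 206, 409, 785, 0, 760],
    [3276, 2218, 966, 1136, 1545, 760, 0],
]


def double_centre(D):
    """B = -1/2 * J D^2 J with J = I - (1/n) 1 1^T, for a symmetric D."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # D is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def matvec(B, v):
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def top_eigenpair(B, iterations=500):
    """Dominant eigenvalue and unit eigenvector of symmetric B (power iteration)."""
    v = [float(i + 1) for i in range(len(B))]  # arbitrary non-constant start
    for _ in range(iterations):
        w = matvec(B, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(B, v)))
    return eigenvalue, v


def classical_mds(D, k=2):
    """Return (coordinates, eigenvalues) for a k-dimensional configuration."""
    B = double_centre(D)
    n = len(B)
    coords = [[0.0] * k for _ in range(n)]
    eigenvalues = []
    for axis in range(k):
        lam, v = top_eigenpair(B)
        eigenvalues.append(lam)
        scale = math.sqrt(max(lam, 0.0))  # non-positive eigenvalues give no axis
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate: remove the component just found before the next pass.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, eigenvalues


coords, eigenvalues = classical_mds(D, k=2)
for city, (x, y) in zip(cities, coords):
    print(f"{city:10s} {x:9.1f} {y:9.1f}")
```

Plotting the two columns against each other gives a map of the cities; its orientation is arbitrary, so it may appear rotated or mirrored relative to a geographic map.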
For the dogs data using Mondrian.
[Figure: Mondrian plot of the dogs data; axis limits -0.69 to 1.11 and -3.1 to 4.0]
At a more technical level
The data to be analysed is a collection of $N$ objects (colours, faces, stocks, ...) on which a distance function is defined; $d_{ij}$ is the distance between the $i$th and $j$th objects. These distances are the entries of the dissimilarity matrix
\[
D = \begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1N} \\
d_{21} & d_{22} & \cdots & d_{2N} \\
d_{31} & d_{32} & \cdots & d_{3N} \\
\vdots & \vdots & \ddots & \vdots \\
d_{N1} & d_{N2} & \cdots & d_{NN}
\end{pmatrix}
\]
The goal of MDS is, given $D$, to find vectors $x_1, \ldots, x_N$ such that $\|x_i - x_j\| \approx d_{ij}$ for all $i, j$, for some vector norm $\|\cdot\|$.
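To make the notation concrete, the following Python sketch builds a dissimilarity matrix from a distance function and measures how far a candidate configuration is from satisfying the goal stated above. The three objects are hypothetical, the function names are invented for this example, and the sum of squared discrepancies used here is just one simple way to quantify the fit; it is an assumption of the sketch, not something taken from the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dissimilarity_matrix(objects, distance):
    """N x N matrix whose (i, j) entry is distance(objects[i], objects[j])."""
    n = len(objects)
    return [[distance(objects[i], objects[j]) for j in range(n)]
            for i in range(n)]


def raw_stress(X, D):
    """Sum over pairs i < j of (||x_i - x_j|| - d_ij)^2; zero is a perfect fit."""
    n = len(X)
    return sum((euclidean(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Three hypothetical objects described by three variables each.
objects = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
D = dissimilarity_matrix(objects, euclidean)

# A 2-D configuration (a 5-12-13 right triangle) reproduces D exactly...
planar = [(0.0, 0.0), (5.0, 0.0), (5.0, 12.0)]
# ...whereas points on a line cannot match all three distances at once.
on_a_line = [(0.0,), (5.0,), (13.0,)]

print(D)
print(raw_stress(planar, D), raw_stress(on_a_line, D))
```

The comparison shows why the number of dimensions matters: three points always fit in a plane, but forcing them onto a line distorts at least one distance.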
In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function. In other words, MDS attempts to find an embedding from the objects into $\mathbb{R}^k$ such that distances are preserved as closely as possible; the dimension $k$ of the embedding is chosen by the analyst and is not the number of objects $N$. If $k$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualisation of the similarities between the objects. Note that the vectors are not unique: with the Euclidean distance, they may be arbitrarily translated and rotated, since these transformations do not change the pairwise distances.
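The non-uniqueness is easy to check numerically. The Python sketch below rotates and then translates a small, hypothetical 2-D configuration and confirms that the pairwise Euclidean distances are unchanged; the helper names, the angle and the shift are arbitrary choices made for this example.

```python
import math


def pairwise_distances(points):
    """Euclidean distances for all pairs i < j, in a fixed order."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)]


def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def translate(points, dx, dy):
    """Shift every 2-D point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical configuration: a 3-4-5 right triangle.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
moved = translate(rotate(points, math.radians(30)), 10.0, -2.0)

before = pairwise_distances(points)
after = pairwise_distances(moved)
unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(unchanged)
```

Two MDS plots of the same data can therefore look different while representing the same configuration.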
For the Whiskey data
[Figure: plot of the Whiskey data; axis limits -5.2 to 4.8 and -8.6 to 5.6]
ggobi
I want to take this discussion of multidimensional data a little further, and to do so I am going to introduce yet another piece of open source software, ggobi. ggobi is an open source (free!) visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots. Plots are interactive and linked with brushing and identification.
Our main interest in ggobi is 2-D displays of projections of points and edges in high-dimensional spaces. However, scatterplot matrices, parallel coordinate plots, time series plots and bar charts are also provided by the software. Projection tools include average shifted histograms of single variables, plots of pairs of variables, and grand tours of multiple variables. Points can be labelled and brushed with glyphs and colors. Several displays can be open simultaneously and linked for labelling and brushing. There is also provision for missing data, and patterns of missing data can be examined.
ggobi is fully documented in the ggobi book: Cook, D. and D.F. Swayne (2007). Interactive and Dynamic Graphics for Data Analysis. Springer. See http://www.ggobi.org for the download and documentation.
Looking at data through various graphs can reveal more information about the distribution than just looking at the numbers or a summary of them. Using the different tools within GGobi, clusters, non-linear distributions, outliers, and other important variations in the data can be discovered. It is a program for exploratory analysis of multi-dimensional data.
Types of graphics
- 1D: Average shifted histogram, textured dot plot, barchart, spineplot
- 2D: Scatterplot
- High-D:
  - Scatterplot matrix
  - Parallel coordinates
  - Grand tour, projection pursuit guided tour, manual tour
- Time series plot
These tools can be used to pick out special points or clusters of data.
Brushing
As the brush moves over a point, the point will be highlighted. If persistent is selected, the points the brush has moved over will remain painted.
Identify
As the cursor moves over a point, a label or variable value will appear at the top of the graphic screen.
Linking
Multiple plots are linked, so identifying one point in one plot will identify the same point in all other plots, and brushing a group of points in one plot will highlight the same points in other plots. The linking can be one-to-one, or according to the values of a categorical variable in the data set. Points in a plot can be moved interactively, e.g. to gauge results from multidimensional scaling. Points or edges can also be added or removed.