The Value of Visualization 2

G Janacek

Parallel coordinates

Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate data. To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced; these represent the variables. An observation is then taken and its value for each variable is plotted on the corresponding parallel line. These points are then joined to give a segmented line (a polyline). We repeat this for all the observations.

Consider the dogs data:

      modern  jackal   cwolf   iwolf    cuon   dingo    preh
x1        97      81     135     115     107      96     103
x2       210     167     273     243     235     226     221
x3       194     183     268     245     214     211     191
x4        77      70     106      93      85      83      81
x5       320     303     419     400     288     344     323
x6       365     329     481     446     376     431     350
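The construction described above can be sketched in Python for the dogs data. This is a minimal, hypothetical sketch, not the code behind the figures: each variable is rescaled to [0, 1] using its own minimum and maximum (one common choice, assumed here), and each group becomes a list of (axis position, scaled value) vertices that a plotting library would join into a polyline. The names `GROUPS`, `VALUES` and `polylines` are illustrative.

```python
# Dogs data from the table above: one row per variable x1..x6, one column per group.
GROUPS = ["modern", "jackal", "cwolf", "iwolf", "cuon", "dingo", "preh"]
VALUES = [
    [97, 81, 135, 115, 107, 96, 103],
    [210, 167, 273, 243, 235, 226, 221],
    [194, 183, 268, 245, 214, 211, 191],
    [77, 70, 106, 93, 85, 83, 81],
    [320, 303, 419, 400, 288, 344, 323],
    [365, 329, 481, 446, 376, 431, 350],
]


def polylines(values, groups):
    """Return {group: [(axis_index, scaled_value), ...]} for a parallel coordinates plot.

    Assumption: each variable (row of `values`) is min-max scaled onto its own
    vertical axis, so variables with different ranges can share one plot.
    """
    lines = {g: [] for g in groups}
    for axis, row in enumerate(values):
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1  # guard against a constant variable
        for g, v in zip(groups, row):
            lines[g].append((axis, (v - lo) / span))
    return lines


if __name__ == "__main__":
    for group, vertices in polylines(VALUES, GROUPS).items():
        print(f"{group:7s}", " ".join(f"{y:.2f}" for _, y in vertices))
```

With this scaling, cwolf sits at the top of every axis, and jackal at the bottom of every axis except x5, where cuon has the smallest value.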

[Figure: parallel coordinates plot of the dogs data, with axes x1 to x6 and one line per group: modern, jackal, cwolf, iwolf, cuon, dingo, preh.]

Or, using Mondrian:

[Figure: parallel coordinates plot of the dogs data drawn in Mondrian, axes labelled x1 to x6.]

Like all good visualisations, parallel coordinates can show both the forest and the trees: the big picture appears in the overall pattern of lines, while individual lines can be highlighted to see the detailed values of specific observations.

What to watch out for when using parallel coordinates

Given their power to visualise multidimensional data, why aren't parallel coordinate charts more popular? Here are a few of the issues:

- Large data sets create a lot of visual clutter. In the words of S. Few: "Most of us who have used parallel coordinates to explore and analyse multivariate data would agree that meaningful patterns can be obscured in a clutter of lines, especially with large data sets."
- The order of the axes affects how the reader understands the data. Relationships between adjacent dimensions are easier to perceive than those between non-adjacent dimensions.
- As the axes get closer to each other, it becomes more difficult to perceive structure or clusters.
- Depending on the data, each axis can have a different scale, which is difficult to display and difficult for the reader to absorb.
- Lines may be mistaken for trends or changes in value, even though they are only used to connect the points belonging to the same observation.
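The point about axis order can be made concrete: with n axes, any one ordering places only n − 1 of the n(n − 1)/2 variable pairs side by side. The hypothetical helper `adjacent_pairs` below lists the neighbouring pairs for the natural order x1, ..., x6 of the dogs data and for two alternative orderings chosen by hand purely for illustration; together the three orderings put every pair next to each other exactly once.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Set of variable pairs that are neighbours in a parallel coordinates axis order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


variables = ["x1", "x2", "x3", "x4", "x5", "x6"]
all_pairs = {frozenset(p) for p in combinations(variables, 2)}  # 6 * 5 / 2 = 15 pairs

orders = {
    "natural": ["x1", "x2", "x3", "x4", "x5", "x6"],
    # Two alternative orderings picked by hand for this illustration (hypothetical).
    "alt1": ["x2", "x4", "x6", "x1", "x3", "x5"],
    "alt2": ["x4", "x1", "x5", "x2", "x6", "x3"],
}

covered = set()
for name, order in orders.items():
    pairs = adjacent_pairs(order)
    covered |= pairs
    labels = sorted("-".join(sorted(p)) for p in pairs)
    print(f"{name:8s} shows {len(pairs)} of {len(all_pairs)} pairs: {labels}")

print("every pair adjacent in some ordering:", covered == all_pairs)
```

The natural order and the first alternative share no neighbouring pair, so the same data can give quite different impressions depending on the order chosen.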

Multidimensional Scaling

From a non-technical point of view, the purpose of multidimensional scaling (MDS) is to provide a visual representation of the pattern of proximities (i.e., similarities or distances) among a set of objects. For example, given a matrix of perceived similarities between various brands of air fresheners, MDS plots the brands on a map such that brands perceived to be very similar to each other are placed near each other, and brands perceived to be very different from each other are placed far apart.

Road distances (km) between some European cities, from R's eurodist data:

              Athens  Barcelona  Brussels  Calais  Cherbourg  Cologne  Copenhagen
Athens             0       3313      2963    3175       3339     2762        3276
Barcelona       3313          0      1318    1326       1294     1498        2218
Brussels        2963       1318         0     204        583      206         966
Calais          3175       1326       204       0        460      409        1136
Cherbourg       3339       1294       583     460          0      785        1545
Cologne         2762       1498       206     409        785        0         760
Copenhagen      3276       2218       966    1136       1545      760           0

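We can draw a map based on these distances with the R command cmdscale(eurodist), which performs classical MDS:

[Figure: two-dimensional MDS map of the cities, including Copenhagen, Brussels, Calais, Cherbourg, Cologne, Athens and Barcelona.]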
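Classical (Torgerson) MDS, which is what cmdscale computes, squares the distances, double-centres them, and uses the top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. Below is a minimal pure-Python sketch of those steps applied only to the seven cities in the table above, not the full eurodist data. The Jacobi eigen-solver and the names `jacobi_eigen` and `classical_mds` are our own choices for a dependency-free example. The axes may come out flipped or rotated relative to R's map, since MDS coordinates are only defined up to such transformations.

```python
import math

CITIES = ["Athens", "Barcelona", "Brussels", "Calais", "Cherbourg", "Cologne", "Copenhagen"]
# Road distances in km, copied from the table above.
EURO = [
    [0, 3313, 2963, 3175, 3339, 2762, 3276],
    [3313, 0, 1318, 1326, 1294, 1498, 2218],
    [2963, 1318, 0, 204, 583, 206, 966],
    [3175, 1326, 204, 0, 460, 409, 1136],
    [3339, 1294, 583, 460, 0, 785, 1545],
    [2762, 1498, 206, 409, 785, 0, 760],
    [3276, 2218, 966, 1136, 1545, 760, 0],
]


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d, k=2):
    """Double-centre the squared distances and keep the top-k eigenpairs as coordinates."""
    n = len(d)
    sq = [[float(x) ** 2 for x in row] for row in d]
    means = [sum(row) / n for row in sq]  # row means (= column means, since d is symmetric)
    grand = sum(means) / n
    b = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda j: vals[j], reverse=True)[:k]
    # Negative eigenvalues (distances that are not exactly Euclidean) are clipped to zero.
    return [[vecs[i][j] * math.sqrt(max(vals[j], 0.0)) for j in top] for i in range(n)]


if __name__ == "__main__":
    for city, (x, y) in zip(CITIES, classical_mds(EURO)):
        print(f"{city:11s} {x:8.1f} {y:8.1f}")
```

Road distances are not exactly Euclidean, so the two-dimensional map only approximates the table; for points that really lie in a plane, the same function recovers the distances exactly.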

For the dogs data, using Mondrian:

[Figure: MDS map of the dogs data produced in Mondrian.]

At a more technical level

The data to be analysed is a collection of $N$ objects (colours, faces, stocks, ...) on which a distance function is defined; $d_{ij}$ is the distance between the $i$-th and $j$-th objects. These distances are the entries of the dissimilarity matrix

$$
D = \begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1N} \\
d_{21} & d_{22} & \cdots & d_{2N} \\
d_{31} & d_{32} & \cdots & d_{3N} \\
\vdots & \vdots & \ddots & \vdots \\
d_{N1} & d_{N2} & \cdots & d_{NN}
\end{pmatrix}.
$$

The goal of MDS is, given $D$, to find vectors $x_1, \dots, x_N$ such that

$$
\lVert x_i - x_j \rVert \approx d_{ij} \quad \text{for all } i, j,
$$

for some vector norm $\lVert \cdot \rVert$.
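One common way to measure how well "≈" holds is raw stress, $\sum_{i<j} (\lVert x_i - x_j \rVert - d_{ij})^2$, which is zero only when every distance is reproduced exactly. The hypothetical helper `raw_stress` below computes it with the Euclidean norm for a toy example. It only illustrates the goal: classical MDS itself minimises a different criterion (often called strain).

```python
import math


def raw_stress(points, d):
    """Sum over pairs i < j of (||x_i - x_j|| - d_ij)^2, using the Euclidean norm."""
    n = len(points)
    return sum(
        (math.dist(points[i], points[j]) - d[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy example (illustrative values): three objects whose dissimilarities form a 3-4-5 triangle.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact = [(0, 0), (3, 0), (0, 4)]      # reproduces every d_ij
squashed = [(0, 0), (3, 0), (0, 2)]   # third object placed too close to the first

print(raw_stress(exact, D))
print(raw_stress(squashed, D))
```

The first configuration has zero stress; squashing the third point distorts two of the three distances and the stress rises accordingly.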

In classical MDS this norm is the Euclidean distance but, in a broader sense, it may be a metric or an arbitrary distance function. In other words, MDS attempts to find an embedding of the N objects into $\mathbb{R}^p$, for some chosen dimension $p$, such that distances are preserved. If $p$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualisation of the similarities between the objects. Note that the vectors are not unique: with the Euclidean distance they may be arbitrarily translated, rotated or reflected, since these transformations do not change the pairwise distances.
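The non-uniqueness is easy to check numerically: applying the same rotation and translation to every $x_i$ leaves all pairwise Euclidean distances unchanged, so any MDS solution can be rotated or shifted freely. A minimal sketch; the configuration, angle and shift are arbitrary illustrative values.

```python
import math


def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_and_translate(points, angle, shift):
    """Rotate 2-D points about the origin by `angle` radians, then add `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in points]


config = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.5, -2.0)]  # illustrative configuration
moved = rotate_and_translate(config, angle=0.7, shift=(10.0, -3.0))

before = pairwise_distances(config)
after = pairwise_distances(moved)
max_change = max(abs(b - a) for rb, ra in zip(before, after) for b, a in zip(rb, ra))
print(f"largest change in any pairwise distance: {max_change:.2e}")
```

The printed change is at the level of floating-point rounding, even though every point has moved.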

For the Whiskey data:

[Figure: MDS map of the Whiskey data.]

GGobi

I want to take this discussion of multidimensional data a little further, and to do so I am going to introduce yet another piece of open-source software, GGobi. GGobi is an open-source (free!) visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plot. Plots are interactive and linked with brushing and identification.

Our main interest in GGobi is 2-D displays of projections of points and edges in high-dimensional spaces. However, scatterplot matrices, parallel coordinate plots, time series plots and bar charts are also provided. Projection tools include average shifted histograms of single variables, plots of pairs of variables, and grand tours of multiple variables. Points can be labelled and brushed with glyphs and colours. Several displays can be open simultaneously and linked for labelling and brushing. There is also provision for missing data, and patterns of missing data can be examined.
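An average shifted histogram (ASH) averages m ordinary histograms that share a bin width h but whose bin origins are shifted by h/m each, giving a smoother estimate than any single histogram. Below is a minimal, direct (unoptimised) sketch, assuming density-scaled histograms and a starting origin of 0. The function name `ash_density` is hypothetical; this is not GGobi's implementation.

```python
import math


def ash_density(data, x, h, m, origin=0.0):
    """Average shifted histogram estimate of the density at x.

    Averages m histograms of bin width h whose bin edges start at
    origin, origin + h/m, ..., origin + (m - 1) * h/m.
    """
    n = len(data)
    total = 0.0
    for j in range(m):
        shift = origin + j * h / m
        bin_of_x = math.floor((x - shift) / h)
        count = sum(1 for v in data if math.floor((v - shift) / h) == bin_of_x)
        total += count / (n * h)  # density-scaled height of the bin containing x
    return total / m


if __name__ == "__main__":
    sample = [1.2, 1.9, 2.1, 2.4, 2.5, 3.0, 3.3, 4.8]  # illustrative values only
    for x in [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]:
        print(x, round(ash_density(sample, x, h=1.0, m=4), 3))
```

Each shifted histogram integrates to one, so their average is also a proper density; larger m gives a smoother curve for the same h.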

GGobi is fully documented in the GGobi book: Cook, D. and D. F. Swayne (2007). Interactive and Dynamic Graphics for Data Analysis. Springer. See http://www.ggobi.org for the download and documentation.

Looking at data through various graphs can reveal more about the distribution than looking at the numbers alone or at a summary of them. Using the different tools within GGobi, clusters, non-linear distributions, outliers and other important variations in the data can be discovered. It is a program for exploratory analysis of multidimensional data.

Types of graphics

- 1D: average shifted histogram, textured dot plot, barchart, spineplot
- 2D: scatterplot
- High-D: scatterplot matrix, parallel coordinates, grand tour, projection pursuit guided tour, manual tour
- Time series plot

These tools can be used to pick out special points or clusters of data.
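Each frame of a grand tour is an ordinary 2-D scatterplot of the data projected onto a plane spanned by two orthonormal directions in the variable space; the tour then moves smoothly from one such plane to the next. The sketch below shows only the projection step for one randomly chosen plane (Gram–Schmidt on two random vectors), using the dogs data with each group as a point in 6-D. The interpolation between planes, and GGobi's own implementation, are not reproduced, and the names `random_plane` and `project` are hypothetical.

```python
import math
import random

# Dogs data from the earlier table: each group is one point in 6-D (x1..x6).
DOGS = {
    "modern": [97, 210, 194, 77, 320, 365],
    "jackal": [81, 167, 183, 70, 303, 329],
    "cwolf": [135, 273, 268, 106, 419, 481],
    "iwolf": [115, 243, 245, 93, 400, 446],
    "cuon": [107, 235, 214, 85, 288, 376],
    "dingo": [96, 226, 211, 83, 344, 431],
    "preh": [103, 221, 191, 81, 323, 350],
}


def random_plane(p, rng):
    """Two orthonormal p-vectors spanning a random 2-D plane (Gram-Schmidt)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = unit([rng.gauss(0, 1) for _ in range(p)])
    b = [rng.gauss(0, 1) for _ in range(p)]
    dot = sum(x * y for x, y in zip(a, b))
    b = unit([y - dot * x for x, y in zip(a, b)])  # remove the component along a
    return a, b


def project(points, plane):
    """Centre the points on the variable means, then take their coordinates in the plane."""
    p = len(points[0])
    means = [sum(pt[k] for pt in points) / len(points) for k in range(p)]
    a, b = plane
    out = []
    for pt in points:
        c = [v - mu for v, mu in zip(pt, means)]
        out.append((sum(x * y for x, y in zip(c, a)), sum(x * y for x, y in zip(c, b))))
    return out


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the frame is reproducible
    plane = random_plane(6, rng)
    for name, (u, v) in zip(DOGS, project(list(DOGS.values()), plane)):
        print(f"{name:7s} {u:8.1f} {v:8.1f}")
```

Calling `random_plane` repeatedly gives a sequence of random views; a real grand tour also interpolates smoothly between successive planes.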

Brushing

As the brush moves over a point, the point is highlighted. If persistent brushing is selected, the points the brush has moved over remain painted.

Identify

As the cursor moves over a point, a label or variable value appears at the top of the graphics window.
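The difference between transient and persistent brushing comes down to whether the highlighted set is replaced or accumulated as the brush moves. A minimal, hypothetical sketch with a rectangular brush over 2-D points; the class and method names are illustrative and are not GGobi's API.

```python
from dataclasses import dataclass, field


@dataclass
class Brush:
    """Rectangular brush of the given width and height over a list of (x, y) points."""
    points: list
    width: float
    height: float
    persistent: bool = False
    painted: set = field(default_factory=set)

    def move_to(self, cx, cy):
        """Centre the brush at (cx, cy) and return the indices now highlighted."""
        under = {
            i for i, (x, y) in enumerate(self.points)
            if abs(x - cx) <= self.width / 2 and abs(y - cy) <= self.height / 2
        }
        if self.persistent:
            self.painted |= under   # points stay painted after the brush moves on
        else:
            self.painted = under    # only points currently under the brush are highlighted
        return set(self.painted)


pts = [(0, 0), (1, 0), (5, 5), (6, 5)]  # illustrative points
transient = Brush(pts, 2, 2)
persistent = Brush(pts, 2, 2, persistent=True)
for centre in [(0.5, 0), (5.5, 5)]:
    print(transient.move_to(*centre), persistent.move_to(*centre))
```

After the second move, the transient brush highlights only the two points now under it, while the persistent brush keeps all four painted.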

Linking

Multiple plots are linked, so identifying a point in one plot identifies the same point in all the other plots, and brushing a group of points in one plot highlights the same points in the other plots. The linking can be one-to-one, or according to the values of a categorical variable in the data set. Points in a plot can be moved interactively, e.g. to gauge results from multidimensional scaling, and points or edges can be added or removed.
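Linking works because every plot refers to the same rows of one data set, so a selection can be stored once and shared by all views. Below is a minimal, hypothetical sketch of that idea, covering both one-to-one linking and linking by a categorical variable. The class names and the small data set are illustrative only.

```python
class LinkedData:
    """Rows shared by several plots, plus a single selection that all plots read."""

    def __init__(self, rows):
        self.rows = rows
        self.selected = set()
        self.views = []

    def brush(self, indices, link_by=None):
        """Select rows; if link_by names a categorical column, extend to rows sharing its value."""
        chosen = set(indices)
        if link_by is not None:
            levels = {self.rows[i][link_by] for i in chosen}
            chosen = {i for i, r in enumerate(self.rows) if r[link_by] in levels}
        self.selected = chosen
        for view in self.views:
            view.redraw()


class View:
    """Stand-in for a plot: it simply records which rows it would highlight."""

    def __init__(self, name, data):
        self.name, self.data = name, data
        self.highlighted = set()
        data.views.append(self)

    def redraw(self):
        self.highlighted = set(self.data.selected)


rows = [  # illustrative rows
    {"x": 1.0, "y": 2.0, "group": "a"},
    {"x": 1.5, "y": 1.0, "group": "b"},
    {"x": 3.0, "y": 0.5, "group": "a"},
]
data = LinkedData(rows)
scatter, parcoords = View("scatterplot", data), View("parallel coordinates", data)

data.brush([0])                     # one-to-one linking
print(scatter.highlighted, parcoords.highlighted)
data.brush([0], link_by="group")    # categorical linking: every row in group "a"
print(scatter.highlighted, parcoords.highlighted)
```

Because the selection lives in the shared data object rather than in any one plot, adding another view requires no extra bookkeeping.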