Graphical Representation of Multivariate Data




One difficulty with multivariate data is visualizing them, particularly when p > 3. At the very least, we can construct pairwise scatter plots of the variables.
Figure: data from Exercise 1.1 (transpose of Figure 1.1).

Graphs and Visualization (cont'd)
Graphs convey information about associations between variables and also about unusual observations.
Example 1.3: the correlation between number of employees and productivity at 16 publishing firms is -0.39 for all firms and -0.56 if Dun & Bradstreet is not included.
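
To make the effect of a single unusual observation concrete, the following is a minimal sketch that computes the sample correlation with and without one outlying point. The data are made up for illustration and are not the publishing-firm data of Example 1.3; the function name pearson is likewise only a label chosen here.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (NOT the data from Example 1.3).
# The last unit is deliberately far from the pattern of the first five.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [5.0, 4.1, 3.2, 1.9, 1.0, 4.5]

r_all = pearson(x, y)                 # all six units
r_without = pearson(x[:-1], y[:-1])   # unusual unit dropped

print(f"all units: {r_all:.2f}, without the unusual unit: {r_without:.2f}")
```

With these toy numbers the first five units are almost perfectly negatively correlated, while including the sixth unit changes the sign of the coefficient. A scatter plot would show the same thing at a glance.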

Multiple Scatter Plots
Example 1.5: Paper quality data: X_1 = density (g/cc), X_2 = strength (pounds) in the machine direction, X_3 = strength (pounds) in the cross direction.
Figure: scatter plot matrix of X_1, X_2, and X_3, displayed in R via the pairs() function.

Three-dimensional Plots
Example 1.6: Lizard mass (g), snout-vent length (mm), and hind limb span (mm).

Three-dimensional Plot: after standardizing
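
The slide does not say how the variables were standardized. A common choice, assumed here, is the z-score: subtract the sample mean and divide by the sample standard deviation (divisor n - 1). The measurements below are hypothetical stand-ins, not the lizard data of Example 1.6.

```python
import math

def standardize(column):
    """Return z-scores: (value - sample mean) / sample SD, with divisor n - 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

# Hypothetical measurements on five units (NOT the data of Example 1.6).
mass = [5.5, 10.4, 9.2, 6.9, 8.1]        # grams
length = [59.0, 75.0, 69.0, 67.5, 70.0]  # millimetres

z_mass = standardize(mass)
z_length = standardize(length)

# After standardizing, both variables are unit-free with mean 0 and SD 1,
# so neither dominates the axes of a three-dimensional plot.
print([round(z, 2) for z in z_mass])
print([round(z, 2) for z in z_length])
```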

Four-dimensional Plots?
The fourth dimension is the plotting character, which denotes gender.

Other Graphical Methods
Specialized software to visualize data in high dimensions is now available. One example is GGobi, downloadable free of charge from the ISU Statistics Department web site. Typically, these programs provide dynamic two-dimensional projections of high-dimensional data that reveal:
- Associations among variables
- Groupings of the units in the sample
- Other data attributes
Graphical methods to display data are very context-dependent.

Other Graphical Methods
Growth curves: appropriate when multivariate observations consist of repeated measurements on a set of units.
Example 1.10: weight and length of female bears over three years.

Even More Graphical Methods
Stars: suitable for non-negative measurements (standardize the data for each variable and let the center of the circle correspond to the minimum value).
- Draw a circle of fixed radius with p equally spaced rays, one ray for each variable.
- Represent the value of each variable by the length of its ray (the distance from the minimum value over all sample units).
- Construct one circle for each unit in the sample.
- Group units with similar patterns.
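
The construction above can be sketched as a computation of ray endpoints. This sketch assumes min-max scaling, so that the minimum of each variable sits at the center and the maximum reaches the circle of the given radius; the slide's own standardization may differ. The function name star_vertices and the data are hypothetical.

```python
import math

def star_vertices(row, mins, maxs, radius=1.0):
    """Endpoints (x, y) of the p rays of one star.

    Ray j points in direction 2*pi*j/p; its length is the distance of the
    value from the column minimum, scaled so the column maximum has
    length `radius` (assumed min-max scaling).
    """
    p = len(row)
    points = []
    for j, value in enumerate(row):
        span = maxs[j] - mins[j]
        length = radius * (value - mins[j]) / span if span else 0.0
        angle = 2 * math.pi * j / p
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

# Made-up data: three units measured on p = 4 variables.
data = [
    [1.0, 10.0, 0.2, 5.0],
    [3.0, 30.0, 0.8, 5.5],
    [2.0, 20.0, 0.5, 7.0],
]
mins = [min(col) for col in zip(*data)]
maxs = [max(col) for col in zip(*data)]

# One star (a list of p ray endpoints) per unit in the sample.
stars = [star_vertices(row, mins, maxs) for row in data]
for star in stars:
    print([(round(x, 2), round(y, 2)) for x, y in star])
```

Joining the endpoints of each star in order gives the familiar polygon outline; units with similar outlines can then be grouped by eye.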

Example 1.11: 22 Public Utility Companies
X_1: Fixed-charge coverage
X_2: Rate of return on capital
X_3: Cost per kw capacity
X_4: Annual load factor
X_5: Peak kw hours demand
X_6: Annual sales (kw hours)
X_7: Percent nuclear
X_8: Fuel costs (cents per kwh)
- The variables are standardized.
- Each variable receives equal weight in the visual impression.
- Focus on variables X_6 and X_8 in the next display.


Even More Graphical Methods
Chernoff faces: the same idea as stars, but now each facial characteristic (nose, mouth, etc.) represents a variable.
- Some facial features can have greater impact than others on how the faces are compared.
- Group units with similar faces.

Example 1.12: 22 Public Utility Companies
Variable                          Facial characteristic
X_1: Fixed-charge coverage        Half-height of face
X_2: Rate of return on capital    Face width
X_3: Cost per kw capacity         Position of mouth
X_4: Annual load factor           Slant of eyes
X_5: Peak kw hours demand         Eccentricity (ht/w) of eyes
X_6: Sales (kw hours per year)    Half length of eye
X_7: Percent nuclear              Curvature of mouth
X_8: Fuel costs (cents per kwh)   Length of nose


Additional Techniques for Visualizing High-Dimensional Datasets
- Data in more than two dimensions are difficult to represent.
- However, the information visualization literature in computer science offers quite a few suggestions.
- Here, we study some static display options beyond the ones studied so far.

Survey Plots
A simple technique of extending a point in a line graph (as in a bar graph) down to an axis has been used in many systems. A simple variation extends a line from a center point, where the line length corresponds to the dimensional value. This visualization of n-dimensional data lets one see correlations between any two variables, especially when the data are sorted according to a particular dimension. Using color for different classifications may help (perhaps after sorting) to determine the best coordinates for classifying the data.
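
As a rough text-only sketch of the idea, the code below draws one column per variable and one row per record, with a centered bar whose length is proportional to the min-max scaled value, after sorting the records on one chosen dimension. The helper name survey_cells, the bar width, and the use of "#" characters are choices made here for illustration; the six rows are iris-style measurements typed in by hand.

```python
def survey_cells(data, sort_by, width=11):
    """Return one list of text cells per record, sorted on column `sort_by`.

    Each cell is a bar of '#' centered in `width` characters; its length is
    proportional to the value's position between the column min and max.
    """
    columns = list(zip(*data))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    rows = []
    for record in sorted(data, key=lambda r: r[sort_by]):
        cells = []
        for j, value in enumerate(record):
            span = hi[j] - lo[j]
            frac = (value - lo[j]) / span if span else 0.0
            cells.append(("#" * round(frac * width)).center(width))
        rows.append(cells)
    return rows

# Sepal.Length, Sepal.Width, Petal.Length, Petal.Width (hand-entered rows).
sample = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]

cells = survey_cells(sample, sort_by=0)  # sort on Sepal.Length
print("\n".join(" | ".join(row) for row in cells))
```

Columns whose bars grow together down the page suggest a positive association with the sorting variable; a real survey plot would also color each row by class.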

Survey Plot of Iris Data
Figure: survey plot for the features Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.

Parallel Coordinates Plots
Parallel coordinates plots represent multidimensional data using lines. A vertical line represents each dimension or attribute. The maximum and minimum values of that dimension are usually scaled to the upper and lower ends of its vertical line. An n-dimensional point is represented by n - 1 line segments, connected to each vertical line at the appropriate dimensional value.
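
The scaling and the segment count can be checked with a small sketch. It assumes min-max scaling of each axis to [0, 1] and places axis j at horizontal position j; the function name parallel_polyline and the hand-entered iris-style rows are illustrative only.

```python
def parallel_polyline(record, mins, maxs):
    """Vertices (axis index, scaled height) of the polyline for one record.

    Heights are min-max scaled so each axis runs from 0 (column minimum)
    to 1 (column maximum).
    """
    vertices = []
    for j, value in enumerate(record):
        span = maxs[j] - mins[j]
        height = (value - mins[j]) / span if span else 0.5
        vertices.append((j, height))
    return vertices

# Sepal.Length, Sepal.Width, Petal.Length, Petal.Width (hand-entered rows).
sample = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
mins = [min(c) for c in zip(*sample)]
maxs = [max(c) for c in zip(*sample)]

poly = parallel_polyline(sample[0], mins, maxs)
segments = list(zip(poly, poly[1:]))  # consecutive vertices joined by a line

print(poly)
print(len(segments), "segments for a", len(sample[0]), "dimensional point")
```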

Parallel Coordinates Plots: Iris Data
Figure: parallel coordinate plot for Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width (original attribute order).

Andrews Curves
Andrews curves plot each p-dimensional point x = (x_1, x_2, ..., x_p) as a curved line using the function
f(t) = x_1 + x_2 sin t + x_3 cos t + x_4 sin 2t + x_5 cos 2t + ...
The function is usually plotted in the interval -π < t < π. This is similar to a Fourier transform of a data point.
- Advantage: can represent many dimensions.
- Disadvantage: computational time to display each p-dimensional point for large datasets. Also, the order of the variables matters.
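
A minimal sketch of evaluating such a curve follows. It uses the form of f(t) exactly as written on the slide; note that Andrews's original definition scales the first term as x_1/√2. The function name andrews, the grid of 101 points, and the example row are choices made here for illustration.

```python
import math

def andrews(x, t):
    """f(t) = x_1 + x_2 sin t + x_3 cos t + x_4 sin 2t + x_5 cos 2t + ...

    Follows the form on the slide (first term unscaled).
    """
    total = x[0]
    for j, coef in enumerate(x[1:], start=1):
        k = (j + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        # Odd positions get sine terms, even positions get cosine terms.
        total += coef * (math.sin(k * t) if j % 2 == 1 else math.cos(k * t))
    return total

# Evaluate one curve on an even grid covering the interval from -pi to pi.
ts = [-math.pi + 2 * math.pi * i / 100 for i in range(101)]
row = [5.1, 3.5, 1.4, 0.2]  # one hand-entered iris-style record
curve = [andrews(row, t) for t in ts]

# Order matters: permuting the variables gives a different curve.
swapped = [andrews([row[0], row[2], row[1], row[3]], t) for t in ts]
print(round(curve[50], 3), round(swapped[50], 3))
print(round(curve[75], 3), round(swapped[75], 3))
```

Each record costs one pass over p terms per grid point, which is the display cost the slide mentions for large datasets.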

Andrews Curves: Iris Data
Figure: Andrews curves for the iris data.

Radviz or Radial Visualization Plots
The idea behind radial visualization plots, or Radviz, is that spring constants can be used to represent relational values between points. Like parallel coordinates, this is also a lossless visualization method. Here the p dimensions are laid out as points equally spaced around the perimeter of a circle.

One end of each of the p springs is attached to one of these p perimeter points. The other ends of the springs are attached to a data point. The spring constant K_i equals the value of the ith coordinate of the data point. Each data point is then displayed where the sum of the spring forces equals 0. All the data point values are usually normalized to lie between 0 and 1. For example, if all p coordinates have the same value, the data point will lie exactly in the center of the circle.

If the point is a unit vector along one coordinate axis, then it will lie exactly at the fixed point on the edge of the circle (where the spring for that dimension is fixed). Many points can map to the same position. This represents a non-linear transformation of the data, which preserves certain symmetries and produces an intuitive display. Some features of this visualization are:
- Points with approximately equal coordinate values lie close to the center.
- Points with similar values whose dimensions are opposite each other on the circle lie near the center.
- Points that have one or two coordinate values greater than the others lie closer to those dimensions.
- A p-dimensional line maps to a line.
- A sphere maps to an ellipse.
- A p-dimensional plane maps to a bounded polygon.
- Radviz cannot really handle severely high-dimensional datasets.
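
The spring picture leads to a simple formula. If S_j is the perimeter point for dimension j and the spring constant is the normalized value x_j, the forces balance where sum_j x_j (S_j - u) = 0, that is, at the weighted average u = (sum_j x_j S_j) / (sum_j x_j). The sketch below assumes the values are already normalized to [0, 1] and that dimension j is anchored at angle 2πj/p on the unit circle; the function name radviz_point is a label chosen here.

```python
import math

def radviz_point(x):
    """Equilibrium position of one data point with normalized coordinates x.

    Anchor j sits at angle 2*pi*j/p on the unit circle; the point is the
    average of the anchors weighted by the coordinate values.
    """
    p = len(x)
    total = sum(x)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls at all: leave the point at the center
    px = sum(v * math.cos(2 * math.pi * j / p) for j, v in enumerate(x)) / total
    py = sum(v * math.sin(2 * math.pi * j / p) for j, v in enumerate(x)) / total
    return (px, py)

center = radviz_point([0.5, 0.5, 0.5, 0.5])    # equal coordinates
on_edge = radviz_point([0.0, 1.0, 0.0, 0.0])   # unit vector along dimension 2
opposite = radviz_point([0.7, 0.0, 0.7, 0.0])  # equal values on opposite anchors
low = radviz_point([0.2, 0.2, 0.2, 0.2])       # different point, same position

for name, pt in [("center", center), ("on_edge", on_edge),
                 ("opposite", opposite), ("low", low)]:
    print(name, (round(pt[0], 3), round(pt[1], 3)))
```

The first and last calls illustrate that many points can map to the same position: any point with all coordinates equal lands at the center, whatever the common value.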

Radial Visualization Plot: Iris Data
Figure: 2D Radviz plot with anchors Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width; classes setosa, versicolor, and virginica.