Chapter 3 - Multidimensional Information Visualization II

Similar documents

Chapter 3 - Multidimensional Information Visualization II

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Exercise 1.12 (Pg )

Introduction to Exploratory Data Analysis

Information Visualization Multivariate Data Visualization Krešimir Matković

Multi-Dimensional Data Visualization. Slides courtesy of Chris North

Data Exploration Data Visualization

Visualization Techniques in Data Mining

Geostatistics Exploratory Analysis

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Diagrams and Graphs of Statistical Data

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

3: Summary Statistics

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Exploratory data analysis (Chapter 2) Fall 2011

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Data Visualization Handbook

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Northumberland Knowledge

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

9. Text & Documents. Visualizing and Searching Documents. Dr. Thorsten Büring, 20. Dezember 2007, Vorlesung Wintersemester 2007/08

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

3D Interactive Information Visualization: Guidelines from experience and analysis of applications

Cours de Visualisation d'information InfoVis Lecture. Multivariate Data Sets

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Visual quality metrics and human perception: an initial study on 2D projections of large multidimensional data.

Data exploration with Microsoft Excel: analysing more than one variable

COMP Visualization. Lecture 11 Interacting with Visualizations

Using SPSS, Chapter 2: Descriptive Statistics

Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015

Data Analysis Tools. Tools for Summarizing Data

Visualization methods for patent data

The Visual Statistics System. ViSta

The Value of Visualization 2

Exploratory Data Analysis for Ecological Modelling and Decision Support

Visualization Software

Intro to GIS Winter Data Visualization Part I

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values

Azure Machine Learning, SQL Data Mining and R

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Name: Date: Use the following to answer questions 2-3:

2013 MBA Jump Start Program. Statistics Module Part 3

Clustering & Visualization

Descriptive Statistics

An introduction to using Microsoft Excel for quantitative data analysis

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Scatter Plots with Error Bars

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Descriptive Statistics

January 26, 2009 The Faculty Center for Teaching and Learning

Multivariate Analysis of Ecological Data

Big Data: Rethinking Text Visualization

Module 3: Correlation and Covariance

List of Examples. Examples 319

Data Visualization Techniques

MTH 140 Statistics Videos

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Additional sources Compilation of sources:

Dimensionality Reduction: Principal Components Analysis

Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 2. Summarizing the Sample

Simple Predictive Analytics Curtis Seare

Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions

AP STATISTICS REVIEW (YMS Chapters 1-8)

MEASURES OF VARIATION

DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Data Visualization Techniques

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Gephi Tutorial Quick Start

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Exploratory Data Analysis

Exploratory Data Analysis

Introduction to Regression and Data Analysis

How To Write A Data Analysis

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Common Core Unit Summary Grades 6 to 8

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

EXPLORING SPATIAL PATTERNS IN YOUR DATA

Data Analysis for Yield Improvement using TIBCO s Spotfire Data Analysis Software

Foundation of Quantitative Data Analysis

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Transcription:

Chapter 3 - Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien (4th revised edition): Thorsten Büring, Andreas Butz, Michael Burch 1

Outline Reference model and data terminology Visualizing data with < 4 variables Visualizing multivariable data Geometric transformation Glyphs Pixel-based Downscaling of dimensions Case studies: support for exploring multidimensional data Rank-by-feature Dust & magnet Clutter reduction techniques 2

Exploring Multivariate Data 3

Rank-By-Feature Seo & Shneiderman 2004 Part of the Hierarchical Clustering Explorer (HCE) (http://www.cs.umd.edu/hcil/multi-cluster/) Tabs: histogram and scatterplot ordering Implements systematic approach for data exploration (1) study 1D, study 2D, then find features (2) ranking guides insight, statistics confirm Tool provides low-dimensional projections as a histogram (1D) or scatterplot (2D) Users can select a feature detection criterion (e.g. test for normal distribution (1D), correlation coefficient (2D)) to rank projections The ranking facility is particularly helpful when the number of possible projections is too large to investigate: concentrate on the interesting ones 4

Rank-By-Feature Users start with 1D projections (histogram ordering) Four coordinated views A: selection of ranking criterion B: overview of scores for all dimensions (color coding: the brighter the color, the higher the score) C: numerical / statistical detail for each dimension (e.g. score, mean, standard deviation) D: display of histogram + boxplot (minimum, first quartile, median, third quartile, maximum) 5

Rank-By-Feature Some basic statistical terms Mean: Sum of all values divided by the number of values Median: Middle value of a distribution of values when ranked in order of magnitude Mode: Single most common value Variance: average squared deviation between the mean and the values Standard deviation: square root of the variance (translates the variance into the original units of measurement) Statistical tests supported by HCE for 1D ranking Normality of the distribution: distribution of items forms a symmetric, bell-shaped curve Uniformity of the distribution: all of the values of a random variable occur with equal probability (results in a flat histogram) Number of potential outliers Number of unique values 6

Mode The most frequent score Describes how most items behave Pros: Easy to calculate and understand Can be used with nominal data Cons: There can be more than one mode Mode can change dramatically by adding only one dataset Independent of all other data in the set y c n e u q r e F 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 Score mode Negatively Skewed Distribution 7

Median Middle score of the distribution Example data: 1 7 3 9 6 9 2 Sorted by magnitude: 9 9 7 6 3 2 1 median = 6 If #scores even average two middle scores Example data: 1 7 3 9 4 6 9 2 Sorted by magnitude: 9 9 7 6 4 3 2 1 median = 5 Pros: Relatively unaffected by outliers (very low or high scores) and skewed distributions Can be used with ordinal, interval and ratio data Cons: Does not consider all scores of the data set Not very stable 8

Mean Sum of all scores divided by #scores: Most often used if average is mentioned Pros: Considers every score most accurate summary of the data Resistant to sampling variation: removing one sample changes the mean far less than mode or median Cons: 1 m = n Heavily affected by extreme scores and skewed distributions Can only be used with interval and ratio data n i= 1 X i 9

Standard Deviation and Variance How do you measure the accuracy of the mean? Example data set 1: 5 5 5 5 5 mean = 5 Example data set 2: 6 8 4 1 6 mean = 5 Which of the data sets is better reflected by the mean? If x1, x2, xn are the data in a sample with mean m Deviation = difference between mean and scores = (xi - m) 2 2 (xi m) ) Variance s = n Standard deviation (SD) s = Var(X) Both variance and standard deviations measure the accuracy of the data set variability of the data 10

Quantile, Quartile, and Percentile Quantile Cut points' that divide a sample of data into groups containing (as far as possible) equal numbers of observations. Quartile (Quantile of 4) Values that divide a sample of data into 4 groups containing (as far as possible) equal numbers of observations Percentile (Quantile of 100) Values that divide a sample of data into 100 groups containing (as far as possible) equal numbers of observations lower quartile median upper quartile 11

Rank-By-Feature Move on to 2D projections (scatterplot ordering) Identify pairwise relationships between dimensions B: prism provides overview of scores for dimension pairs; score is color coded D: scatterplot browser; multiple browsers are possible; Ranking criteria Correlation coefficient: direction and strength of linear relationship Least square error for simple linear / curvilinear regression: how well does the regression model fit Number of items in a user-defined region of interest & uniformity of scatterplot 12

Dust & Magnet Yi et al. 2005 Data cases are represented as particles of iron dust Magnets represent the dimensions of the data set Users manipulate the magnets to move the dust Dust moves at different speed depending on its data values for the magnet dimensions 13

Clutter Reduction Techniques 15

Clutter Reduction Techniques Clutter: crowded and disordered visual entities that obscure the structure in visual displays Avoiding clutter is one of the main challenges in Information Visualization Techniques to reduce clutter Dimension reordering Sampling Constant-density visualization Point displacement Aggregation / Clustering (one mark represents more than one case, e.g. group day sales into months) Filtering (Dynamic queries, zooming) 16

Dimension Reordering Peng et al. 2004 Automatically identify the views with the least amount of visual clutter Clutter definition and algorithms for different visualization techniques 17

Dimension Reordering 18

Sampling Reduce the density of visual representation by displaying a random subset of the data Random sampling preserves the distribution of data Overall trends (e.g. correlation) can still be detected at a reduced density Uniform sampling (Ellis & Dix 2004) Applying the same (manually or automatically defined) sampling factor to the entire data space Problem: areas with low density may become empty Non-uniform sampling (Bertini & Santucci 2004) Preserving relative density Model to compute where, how, and how much to sample to preserve image characteristics 19

Bertini & Santucci 2004 20

Sampling Lens Ellis et al. 2005 Magic Lens approach to apply random sampling to user-defined regions while maintaining the original data density in the context 21

Constant Information Density Woodruff et al. 1998 Data objects in areas with high information density are represented by small-sized low-detail glyphs Objects in sparse-density areas are represented by larger, more detailed glyphs The number of information objects is kept constant as the user scales and moves the view 22

Constant Information Density 23

Point Displacement Items that would overlap other items are moved to adjacent free positions Position of the items and their distance should be preserved as much as possible Three algorithms for displacing pixels (e.g. adding abstract data to geographical maps) (Keim & Hermann 1998) Nearest-Neighbor Curve-based Gridfit Algorithms share the same procedure All data points which have a unique position are placed on the display Keim 2000: data with and without overlap New positions are determined for the remaining points 24

Point Displacement Nearest-Neighbor algorithm For all points which have not been set in the first step, place them on the nearest unoccupied position Fast to compute, but limited effectiveness for very dense displays (pixels may be placed very far from their original positions) Curve-based algorithm For all points which have not been set in the first step Compute the nearest unoccupied position on a given screen-filling curve Shift all points between the occupied pixel and the unoccupied position along the screen-filling curve Place the point on the newly available unoccupied pixel 25

Chapter 4 - Interaction with Visualizations Dynamic linking, brushing and filtering in Information Visualization displays Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien (4th revised edition): Thorsten Büring, Andreas Butz, Michael Burch 1

Outline Definitions Direct Manipulation (DM) Dynamic Queries Movable Filters Common Interaction Techniques Select Explore Reconfigure Encode Abstract / Elaborate Filter Connect 2

Definitions 3

InfoVis & Interaction Information Visualization research originally: focus on finding novel visual representations Increasing interest in interaction design, HCI models and evaluation as well as aesthetics in the InfoVis community HCI Interaction models help us to better understand the complex concepts of human-machine communication Norman s execution-evaluation cycle (Norman 1988) 1. Establishing the goal 2. Forming the intention 3. Specifying the action sequence 4. Executing the interaction 5. Perceiving the system state 6. Interpreting the system state 7. Evaluating the system state with respect to the goals and intentions 4

InfoVis & Interaction InfoVis systems are composed of two major components: Representation Roots lie in the field of Computer Graphics Mapping from data to geometrical objects and How these objects are rendered Interaction Roots lie in the field of Human-Computer Interaction (HCI) Dialog between user and the system User explores the data from different perspectives to find insights 5

Representation and Interaction Although discussed as two separate components, representation and interaction clearly are not mutually exclusive. For example: Interaction with a system may activate a change in representation. Representation component has received vast majority of attention in InfoVis research. Scan of recent conference proceedings and journal issues in InfoVis uncovers many articles about new representations of datasets but interaction is often relegated to a secondary role. 6

Representation and Interaction Interaction rarely is the main focus of research in InfoVis. Making it the little brother of InfoVis Interaction is overshadowed by the more noteworthy representation aspects. Interaction is an essential part of InfoVis Without it, an InfoVis technique or system becomes a static image or autonomously animated images. Static images clearly have analytic and expressive value But usefulness becomes more limited as the data set they represent grows larger with more variables 7

Representation and Interaction By the way: Also with a static image such as a poster, a user (or reader) will often perform interactions Rotating the poster Looking closer/further Jotting down notes on the poster Spence suggests the notion of passive interaction through which the user s mental model on the data set is changed and enhanced. Finally, through interaction, some limits of a representation can be overcome, and the cognition of a user can be further amplified. Importance of interaction and the need for its further study seem undisputed. 8

Definition Interaction: Before discussing the term interaction, we will find a solid definition for it. But: Finding a solid definition for it is challenging! In the broader context of Human-Computer Interaction (HCI) it is simply described as the communication between user and the system. [Dix et al., 2004] Becker, Cleveland, and Wilks compactly define interaction as direct manipulation and instantianeous change. [Becker et al., 1987] 9

Definition asymmetry in data rates. [Ware, 2000] The amount of data flowing from InfoVis systems to users is far greater than from users to systems. Thus, interaction techniques in InfoVis seem more designed for changing and adjusting visual representation than for entering data into systems, which clearly is an important aspect of interaction in HCI. 10

Definition We view interaction techniques in InfoVis as the features that provide users with the ability to directly or indirectly manipulate and interpret representation. According to this view, a static image or an autonomously animated representation does not have associated interaction techniques. However, a menu interface for changing from a scatter plot to a parallel coordinates plot is an interaction technique since it allows a user to manipulate a representation even if it may be less interactive or direct. 11

Direct Manipulation 12

Direct Manipulation (DM) Shneiderman 1982 DM features Visibility of all objects of interest Incremental actions with rapid feedback on all actions Reversibility of all actions, so that users are encouraged to explore without penalties Syntactic correctness of all actions, so that every user action is a legal operation DM does not only make interaction easier for novice users but fundamentally extends visualization capabilities 13

Example: Brushing Becker & Cleveland 1987 A collection of dynamic methods for viewing multidimensional data Brush is an interactive interface tool to select / mark subsets of data in a single view, e.g. by sweeping a virtual brush across items of interest Given linked views (e.g. scatterplot matrix) the brushing can support the identification of correlations across multiple dimensions (brushing & linking) Usually used to visually filter data (via highlighting) Additional manipulation / operations may be performed on the subsets (masking, magnification, labeling etc.) Different types of brushes (Hauser et al. 2002) Simple brush via sweeping Composite brush: composed multiple single-axis brushes by the use of logical operators Angular brush Smooth brush Composite scatterplot brushes - Hauser et al. 2002 14

Example: Brushing Brushing one dimension in parallel coordinates to highlight car data objects with 4 cylinders Hauser et al. 2002 15

Brushing a parallel coordinate plot Robert Kosara, Poster: Indirect Multi-Touch Interaction for Brushing in Parallel Coordinates, IEEE Information Visualization Posters, 2010. 16

Angular Brush Angular brush: brushing by specifying a slope range highlight correlation and outliers between two dimensions Hauser et al. 2002 17

Smooth Brush Non-binary brushing Degree-of-interest defined by distance to brushed range Decreasing degree is mapped to decreasing drawing intensity Hauser et al. 2002 18

Another Brushing Example Example for composite (AND) brush in Parallel Coordinate Plot find the cities with high wages, small prices and many paid holiday days Demo InfoScope: http://www.macrofocus.com/public/ products/infoscope.html (free trial and applet) 19

Dynamic Queries Shneiderman 1994 Explore and search databases SQL example: SELECT customer_id, customer_name, COUNT(order_id) as total FROM customers INNER JOIN orders ON customers.customer_id = orders.customer_id GROUP BY customer_id, customer_name HAVING COUNT(order_id) > 5 ORDER BY COUNT(order_id) DESC Problems Takes time to learn Takes time to formulate and reformulate User must know what she is looking for only exact matches Lots of ways to fail SQL error messages helpful? Zero hits what component is to be changed? 20

Dynamic Queries Based on Direct Manipulation (DM) DM principles with regard to Dynamic Queries Visual presentation of the query s components Visual presentation of results Rapid, incremental, and reversible control of the query Selection by pointing, not typing Immediate, continuous feedback Implementation approach Graphical query formulation: Users formulate queries by adjusting sliders, pressing buttons, bounding box selection Search results displayed are continuously updated (< 100 ms) 21

Examples Visual representations of data to query? Some examples: geographic data, starfields, tables etc. Shneiderman 1994 22

HomeFinder One of the first DQ interfaces Williamson & Shneiderman 1983(!) 23

FilmFinder Ahlberg & Shneiderman 1994 26

Dynamic Query Controls Check boxes and buttons (Nominal with low cardinality) Sliders and range slider (ordinal and quantitative data) Alphaslider (ordinal data) (Ahlberg & Shneiderman 1994) Small-sized widget to search sorted lists Online-text output Two-tiled slider thumb for dragging operations with different granularities Letter index visualizing the distribution of initial letters jump to a position in the slider Locating an item out of 10,000 items ~ 28s for novice users Pros and cons to text entry? Extend data sliders with data visualization (Eick 1994) 28

Dynamic Queries Online Online examples: Apartment Search (http://immo.search.ch) Diamond search (http://www.bluenile.com) 29

DQ in current search interfaces DQ have become widespread with fast search algorithms and increased computing capacity search happens while typing in search terms in google search new routes are calculated while point is dragged in google maps 30

Making Money with Dynamic Queries Starfield displays and Dynamic Queries provided the basis for SpotFire Christopher Ahlberg 1991: Visiting student from Sweden at the HCIL University of Maryland 1996: Founder of SpotFire 2007: SpotFire was sold for 195 Mio. $ Well done! 31

Summary Dynamic Queries Users can rapidly, safely playfully explore a data space no false input possible Users can rapidly generate new queries based on incidental learning Visual representation of data supports data exploration Analysis by continuously developing and testing hypotheses (detect clusters, outliers, trends in multivariate data) Provides straightforward undo and reversing of actions Potential problems with DQ as implemented in the FilmFinder? Limit of query complexity filters are always conjunctive Performance is limited for very large data sets and client / server applications Controls require valuable display space Information is pruned Only single range queries and single selection in the alphaslider 32