Chapter 3 - Multidimensional Information Visualization II Concepts for visualizing univariate to hypervariate data Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien (4th revised edition): Thorsten Büring, Andreas Butz, Michael Burch 1
Outline Reference model and data terminology Visualizing data with < 4 variables Visualizing multivariable data Geometric transformation Glyphs Pixel-based Downscaling of dimensions Case studies: support for exploring multidimensional data Rank-by-feature Dust & magnet Clutter reduction techniques 2
Exploring Multivariate Data 3
Rank-By-Feature Seo & Shneiderman 2004 Part of the Hierarchical Clustering Explorer (HCE) (http://www.cs.umd.edu/hcil/multi-cluster/) Tabs: histogram and scatterplot ordering Implements systematic approach for data exploration (1) study 1D, study 2D, then find features (2) ranking guides insight, statistics confirm Tool provides low-dimensional projections as a histogram (1D) or scatterplot (2D) Users can select a feature detection criterion (e.g. test for normal distribution (1D), correlation coefficient (2D)) to rank projections The ranking facility is particularly helpful when the number of possible projections is too large to investigate: concentrate on the interesting ones 4
Rank-By-Feature Users start with 1D projections (histogram ordering) Four coordinated views A: selection of ranking criterion B: overview of scores for all dimensions (color coding: the brighter the color, the higher the score) C: numerical / statistical detail for each dimension (e.g. score, mean, standard deviation) D: display of histogram + boxplot (minimum, first quartile, median, third quartile, maximum) 5
Rank-By-Feature Some basic statistical terms Mean: Sum of all values divided by the number of values Median: Middle value of a distribution of values when ranked in order of magnitude Mode: Single most common value Variance: average squared deviation between the mean and the values Standard deviation: square root of the variance (translates the variance into the original units of measurement) Statistical tests supported by HCE for 1D ranking Normality of the distribution: distribution of items forms a symmetric, bell-shaped curve Uniformity of the distribution: all of the values of a random variable occur with equal probability (results in a flat histogram) Number of potential outliers Number of unique values 6
Mode The most frequent score Describes how most items behave Pros: Easy to calculate and understand Can be used with nominal data Cons: There can be more than one mode Mode can change dramatically by adding only one dataset Independent of all other data in the set y c n e u q r e F 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 Score mode Negatively Skewed Distribution 7
Median Middle score of the distribution Example data: 1 7 3 9 6 9 2 Sorted by magnitude: 9 9 7 6 3 2 1 median = 6 If #scores even average two middle scores Example data: 1 7 3 9 4 6 9 2 Sorted by magnitude: 9 9 7 6 4 3 2 1 median = 5 Pros: Relatively unaffected by outliers (very low or high scores) and skewed distributions Can be used with ordinal, interval and ratio data Cons: Does not consider all scores of the data set Not very stable 8
Mean Sum of all scores divided by #scores: Most often used if average is mentioned Pros: Considers every score most accurate summary of the data Resistant to sampling variation: removing one sample changes the mean far less than mode or median Cons: 1 m = n Heavily affected by extreme scores and skewed distributions Can only be used with interval and ratio data n i= 1 X i 9
Standard Deviation and Variance How do you measure the accuracy of the mean? Example data set 1: 5 5 5 5 5 mean = 5 Example data set 2: 6 8 4 1 6 mean = 5 Which of the data sets is better reflected by the mean? If x1, x2, xn are the data in a sample with mean m Deviation = difference between mean and scores = (xi - m) 2 2 (xi m) ) Variance s = n Standard deviation (SD) s = Var(X) Both variance and standard deviations measure the accuracy of the data set variability of the data 10
Quantile, Quartile, and Percentile Quantile Cut points' that divide a sample of data into groups containing (as far as possible) equal numbers of observations. Quartile (Quantile of 4) Values that divide a sample of data into 4 groups containing (as far as possible) equal numbers of observations Percentile (Quantile of 100) Values that divide a sample of data into 100 groups containing (as far as possible) equal numbers of observations lower quartile median upper quartile 11
Rank-By-Feature Move on to 2D projections (scatterplot ordering) Identify pairwise relationships between dimensions B: prism provides overview of scores for dimension pairs; score is color coded D: scatterplot browser; multiple browsers are possible; Ranking criteria Correlation coefficient: direction and strength of linear relationship Least square error for simple linear / curvilinear regression: how well does the regression model fit Number of items in a user-defined region of interest & uniformity of scatterplot 12
Dust & Magnet Yi et al. 2005 Data cases are represented as particles of iron dust Magnets represent the dimensions of the data set Users manipulate the magnets to move the dust Dust moves at different speed depending on its data values for the magnet dimensions 13
Clutter Reduction Techniques 15
Clutter Reduction Techniques Clutter: crowded and disordered visual entities that obscure the structure in visual displays Avoiding clutter is one of the main challenges in Information Visualization Techniques to reduce clutter Dimension reordering Sampling Constant-density visualization Point displacement Aggregation / Clustering (one mark represents more than one case, e.g. group day sales into months) Filtering (Dynamic queries, zooming) 16
Dimension Reordering Peng et al. 2004 Automatically identify the views with the least amount of visual clutter Clutter definition and algorithms for different visualization techniques 17
Dimension Reordering 18
Sampling Reduce the density of visual representation by displaying a random subset of the data Random sampling preserves the distribution of data Overall trends (e.g. correlation) can still be detected at a reduced density Uniform sampling (Ellis & Dix 2004) Applying the same (manually or automatically defined) sampling factor to the entire data space Problem: areas with low density may become empty Non-uniform sampling (Bertini & Santucci 2004) Preserving relative density Model to compute where, how, and how much to sample to preserve image characteristics 19
Bertini & Santucci 2004 20
Sampling Lens Ellis et al. 2005 Magic Lens approach to apply random sampling to user-defined regions while maintaining the original data density in the context 21
Constant Information Density Woodruff et al. 1998 Data objects in areas with high information density are represented by small-sized low-detail glyphs Objects in sparse-density areas are represented by larger, more detailed glyphs The number of information objects is kept constant as the user scales and moves the view 22
Constant Information Density 23
Point Displacement Items that would overlap other items are moved to adjacent free positions Position of the items and their distance should be preserved as much as possible Three algorithms for displacing pixels (e.g. adding abstract data to geographical maps) (Keim & Hermann 1998) Nearest-Neighbor Curve-based Gridfit Algorithms share the same procedure All data points which have a unique position are placed on the display Keim 2000: data with and without overlap New positions are determined for the remaining points 24
Point Displacement Nearest-Neighbor algorithm For all points which have not been set in the first step, place them on the nearest unoccupied position Fast to compute, but limited effectiveness for very dense displays (pixels may be placed very far from their original positions) Curve-based algorithm For all points which have not been set in the first step Compute the nearest unoccupied position on a given screen-filling curve Shift all points between the occupied pixel and the unoccupied position along the screen-filling curve Place the point on the newly available unoccupied pixel 25
Chapter 4 - Interaction with Visualizations Dynamic linking, brushing and filtering in Information Visualization displays Vorlesung Informationsvisualisierung Prof. Dr. Florian Alt, WS 2013/14 Konzept und Folien (4th revised edition): Thorsten Büring, Andreas Butz, Michael Burch 1
Outline Definitions Direct Manipulation (DM) Dynamic Queries Movable Filters Common Interaction Techniques Select Explore Reconfigure Encode Abstract / Elaborate Filter Connect 2
Definitions 3
InfoVis & Interaction Information Visualization research originally: focus on finding novel visual representations Increasing interest in interaction design, HCI models and evaluation as well as aesthetics in the InfoVis community HCI Interaction models help us to better understand the complex concepts of human-machine communication Norman s execution-evaluation cycle (Norman 1988) 1. Establishing the goal 2. Forming the intention 3. Specifying the action sequence 4. Executing the interaction 5. Perceiving the system state 6. Interpreting the system state 7. Evaluating the system state with respect to the goals and intentions 4
InfoVis & Interaction InfoVis systems are composed of two major components: Representation Roots lie in the field of Computer Graphics Mapping from data to geometrical objects and How these objects are rendered Interaction Roots lie in the field of Human-Computer Interaction (HCI) Dialog between user and the system User explores the data from different perspectives to find insights 5
Representation and Interaction Although discussed as two separate components, representation and interaction clearly are not mutually exclusive. For example: Interaction with a system may activate a change in representation. Representation component has received vast majority of attention in InfoVis research. Scan of recent conference proceedings and journal issues in InfoVis uncovers many articles about new representations of datasets but interaction is often relegated to a secondary role. 6
Representation and Interaction Interaction rarely is the main focus of research in InfoVis. Making it the little brother of InfoVis Interaction is overshadowed by the more noteworthy representation aspects. Interaction is an essential part of InfoVis Without it, an InfoVis technique or system becomes a static image or autonomously animated images. Static images clearly have analytic and expressive value But usefulness becomes more limited as the data set they represent grows larger with more variables 7
Representation and Interaction By the way: Also with a static image such as a poster, a user (or reader) will often perform interactions Rotating the poster Looking closer/further Jotting down notes on the poster Spence suggests the notion of passive interaction through which the user s mental model on the data set is changed and enhanced. Finally, through interaction, some limits of a representation can be overcome, and the cognition of a user can be further amplified. Importance of interaction and the need for its further study seem undisputed. 8
Definition Interaction: Before discussing the term interaction, we will find a solid definition for it. But: Finding a solid definition for it is challenging! In the broader context of Human-Computer Interaction (HCI) it is simply described as the communication between user and the system. [Dix et al., 2004] Becker, Cleveland, and Wilks compactly define interaction as direct manipulation and instantianeous change. [Becker et al., 1987] 9
Definition asymmetry in data rates. [Ware, 2000] The amount of data flowing from InfoVis systems to users is far greater than from users to systems. Thus, interaction techniques in InfoVis seem more designed for changing and adjusting visual representation than for entering data into systems, which clearly is an important aspect of interaction in HCI. 10
Definition We view interaction techniques in InfoVis as the features that provide users with the ability to directly or indirectly manipulate and interpret representation. According to this view, a static image or an autonomously animated representation does not have associated interaction techniques. However, a menu interface for changing from a scatter plot to a parallel coordinates plot is an interaction technique since it allows a user to manipulate a representation even if it may be less interactive or direct. 11
Direct Manipulation 12
Direct Manipulation (DM) Shneiderman 1982 DM features Visibility of all objects of interest Incremental actions with rapid feedback on all actions Reversibility of all actions, so that users are encouraged to explore without penalties Syntactic correctness of all actions, so that every user action is a legal operation DM does not only make interaction easier for novice users but fundamentally extends visualization capabilities 13
Example: Brushing Becker & Cleveland 1987 A collection of dynamic methods for viewing multidimensional data Brush is an interactive interface tool to select / mark subsets of data in a single view, e.g. by sweeping a virtual brush across items of interest Given linked views (e.g. scatterplot matrix) the brushing can support the identification of correlations across multiple dimensions (brushing & linking) Usually used to visually filter data (via highlighting) Additional manipulation / operations may be performed on the subsets (masking, magnification, labeling etc.) Different types of brushes (Hauser et al. 2002) Simple brush via sweeping Composite brush: composed multiple single-axis brushes by the use of logical operators Angular brush Smooth brush Composite scatterplot brushes - Hauser et al. 2002 14
Example: Brushing Brushing one dimension in parallel coordinates to highlight car data objects with 4 cylinders Hauser et al. 2002 15
Brushing a parallel coordinate plot Robert Kosara, Poster: Indirect Multi-Touch Interaction for Brushing in Parallel Coordinates, IEEE Information Visualization Posters, 2010. 16
Angular Brush Angular brush: brushing by specifying a slope range highlight correlation and outliers between two dimensions Hauser et al. 2002 17
Smooth Brush Non-binary brushing Degree-of-interest defined by distance to brushed range Decreasing degree is mapped to decreasing drawing intensity Hauser et al. 2002 18
Another Brushing Example Example for composite (AND) brush in Parallel Coordinate Plot find the cities with high wages, small prices and many paid holiday days Demo InfoScope: http://www.macrofocus.com/public/ products/infoscope.html (free trial and applet) 19
Dynamic Queries Shneiderman 1994 Explore and search databases SQL example: SELECT customer_id, customer_name, COUNT(order_id) as total FROM customers INNER JOIN orders ON customers.customer_id = orders.customer_id GROUP BY customer_id, customer_name HAVING COUNT(order_id) > 5 ORDER BY COUNT(order_id) DESC Problems Takes time to learn Takes time to formulate and reformulate User must know what she is looking for only exact matches Lots of ways to fail SQL error messages helpful? Zero hits what component is to be changed? 20
Dynamic Queries Based on Direct Manipulation (DM) DM principles with regard to Dynamic Queries Visual presentation of the query s components Visual presentation of results Rapid, incremental, and reversible control of the query Selection by pointing, not typing Immediate, continuous feedback Implementation approach Graphical query formulation: Users formulate queries by adjusting sliders, pressing buttons, bounding box selection Search results displayed are continuously updated (< 100 ms) 21
Examples Visual representations of data to query? Some examples: geographic data, starfields, tables etc. Shneiderman 1994 22
HomeFinder One of the first DQ interfaces Williamson & Shneiderman 1983(!) 23
FilmFinder Ahlberg & Shneiderman 1994 26
Dynamic Query Controls Check boxes and buttons (Nominal with low cardinality) Sliders and range slider (ordinal and quantitative data) Alphaslider (ordinal data) (Ahlberg & Shneiderman 1994) Small-sized widget to search sorted lists Online-text output Two-tiled slider thumb for dragging operations with different granularities Letter index visualizing the distribution of initial letters jump to a position in the slider Locating an item out of 10,000 items ~ 28s for novice users Pros and cons to text entry? Extend data sliders with data visualization (Eick 1994) 28
Dynamic Queries Online Online examples: Apartment Search (http://immo.search.ch) Diamond search (http://www.bluenile.com) 29
DQ in current search interfaces DQ have become widespread with fast search algorithms and increased computing capacity search happens while typing in search terms in google search new routes are calculated while point is dragged in google maps 30
Making Money with Dynamic Queries Starfield displays and Dynamic Queries provided the basis for SpotFire Christopher Ahlberg 1991: Visiting student from Sweden at the HCIL University of Maryland 1996: Founder of SpotFire 2007: SpotFire was sold for 195 Mio. $ Well done! 31
Summary Dynamic Queries Users can rapidly, safely playfully explore a data space no false input possible Users can rapidly generate new queries based on incidental learning Visual representation of data supports data exploration Analysis by continuously developing and testing hypotheses (detect clusters, outliers, trends in multivariate data) Provides straightforward undo and reversing of actions Potential problems with DQ as implemented in the FilmFinder? Limit of query complexity filters are always conjunctive Performance is limited for very large data sets and client / server applications Controls require valuable display space Information is pruned Only single range queries and single selection in the alphaslider 32