Information Visualization Course (Cours de Visualisation d'information)
InfoVis Lecture: Multivariate Data Sets
Frédéric Vernier, Lecturer (Maître de conférences), Univ. Paris Sud
Inspired by CS 7450 (John Stasko) and CS 5764 (Chris North)
Data Sets
Ø Data comes in many different forms
Ø Typically, not in the way you want it
Ø How is it stored (in the raw)?
Ø Heterogeneous data is often seen as multiple dimensions of elements extracted by patterns or needs.
Data set!
Schema
Ø Cars
  Ø brand
  Ø model
  Ø year
  Ø cost
  Ø size
  Ø weight
  Ø miles per gallon
M2R InfoVis Lecture. 2011. Univ. Paris Sud
Data Tables
Ø Often, we take raw data and transform it into a form that is more workable
Ø Main idea:
  Ø Individual items are called cases
  Ø Cases have variables (attributes)
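A minimal sketch of the cases/variables idea, assuming a few attributes of the Cars schema. The rows and the helper `column` are made up for illustration and are not from the lecture.

```python
# Hypothetical data table: each dict is one case, each key is one variable.
cars = [
    {"brand": "BrandA", "model": "M1", "year": 2008, "miles_per_gallon": 32.0},
    {"brand": "BrandB", "model": "M2", "year": 2010, "miles_per_gallon": 41.5},
    {"brand": "BrandA", "model": "M3", "year": 2011, "miles_per_gallon": 28.0},
]


def column(table, variable):
    """Return the values of one variable across all cases."""
    return [case[variable] for case in table]


variables = list(cars[0])      # variable (attribute) names
years = column(cars, "year")   # one value per case
```

Reading the table by row gives a case; reading it by column gives a variable, which is what most of the techniques in this lecture plot.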
Variable Types
Ø N-Nominal (equal or not equal to other values)
  Ø Example: gender, hair color (blond, brown, black, red)
Ø O-Ordinal (obeys < relation, ordered set)
  Ø Example: soccer leagues, rainbow colors
Ø Q-Quantitative (can do math on them)
  Ø Example: age, Photoshop colors
Variable Types
Ø Three main types of variables
  Ø N-Nominal
    Ø By class: data belong or not to classes (.org, .com, .fr)
    Ø Partially ordered: order on classes (engineering students)
  Ø O-Ordinal
  Ø Q-Quantitative
    Ø Quantitative + 0 (clear 0)
Ø Sometimes the type depends on the context
Ø O-Ordinal is always possible
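A small sketch of which operations make sense for each variable type. The sample values and the league ordering `league_order` are assumptions made up for this example.

```python
from statistics import mean, mode

# Nominal: only equality tests are meaningful, so we can count equal values.
hair = ["blond", "brown", "brown", "red"]
most_common_hair = mode(hair)

# Ordinal: values can be ranked, but differences between ranks mean nothing.
league_order = ["regional", "national", "premier"]   # hypothetical ordered scale
leagues = ["national", "regional", "premier"]
ranked = sorted(leagues, key=league_order.index)
middle_league = ranked[len(ranked) // 2]             # a median is valid

# Quantitative: arithmetic is meaningful, so a mean is valid.
ages = [23, 31, 27]
mean_age = mean(ages)
```

Note that each type supports the operations of the previous one: quantitative values can also be ranked, and ranked values can also be compared for equality.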
Example Baseball statistics
Metadata
Ø Descriptive information about the data
Ø Might be something as simple as the type of a variable (INT), or could be more complex
Ø For times when the table itself just isn't enough
  Ø AtBats Hit HomeRuns
  Ø if YearInMasterLeague = 1 then AtBats = CareerAtBat
  Ø if a player is injured for more than half of the season, the average does not take that season into account
  Ø 1st-season stats are not backed up by the
How Many Variables?
Ø Data sets of dimensions 1, 2, 3 are common
Ø Number of variables per class
  Ø 1 - Univariate data (e.g., timeline)
  Ø 2 - Bivariate data (e.g., maps)
  Ø 3 - Trivariate data (volume)
  Ø >3 - Hypervariate data (???)
Ø Example: www.nationmaster.com
Ø Cases always the same
Univariate
Ø Representations
  Ø Dot plot
  Ø Bar chart (item vs. attribute)
  Ø Tukey box plot
  Ø Histogram
[Figure: example plot, axis ticks 1 to 7, item label "Bill"]
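A hedged sketch of the numbers behind a Tukey box plot. It assumes the common 1.5 x IQR whisker rule and Python's "inclusive" quartile method; other quartile conventions give slightly different values. The function name and the sample data are made up for illustration.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # smallest value inside the fences
        "whisker_high": inside[-1],   # largest value inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

The box spans q1 to q3, the whiskers reach the last values inside the fences, and anything beyond is drawn as an individual point.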
Bivariate
Ø Scatterplot
Common BUT powerful
Density problem
Trivariate
Ø 3D scatterplot, 2D plot + size, 2D plot + color, 3x bar chart
Hypervariate Data
Ø What about data sets with MANY variables?
Ø Often the interesting ones
Ø n-D: what does 10-D space look like?
Multiple Projections
Give each variable its own display

      A  B  C  D  E
  1   4  1  8  3  5
  2   6  3  4  2  1
  3   5  7  2  4  3
  4   2  6  3  1  5

[Figure: displays labeled 1 to 4, axes A B C D E]
What if more than 4 cases?
Help me Infovis! Ø smart layout Ø using graphical
Scatterplot Matrix
All pairs of variables, each in its own 2-D scatterplot
Brushing (subset) & Linking (sync.) [Voigt, 2002]
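A minimal sketch of a scatterplot matrix with brushing and linking, without any drawing. The table values and the helper names `brush` and `linked_points` are assumptions for illustration only.

```python
from itertools import combinations

# Made-up table: 4 cases, variables A to E (illustrative values only).
table = {
    "A": [4, 6, 5, 2],
    "B": [1, 3, 7, 6],
    "C": [8, 4, 2, 3],
    "D": [3, 2, 4, 1],
    "E": [5, 1, 3, 5],
}

# One panel per unordered pair of variables: n * (n - 1) / 2 panels.
panels = list(combinations(table, 2))


def brush(table, x_var, y_var, x_range, y_range):
    """Brushing: indices of the cases inside a rectangle of one panel."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, (x, y) in enumerate(zip(table[x_var], table[y_var]))
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    }


def linked_points(table, x_var, y_var, selected):
    """Linking: the same cases located in another panel, to be highlighted."""
    return [(table[x_var][i], table[y_var][i]) for i in sorted(selected)]


selected = brush(table, "A", "B", (4, 6), (0, 4))
highlight_cd = linked_points(table, "C", "D", selected)
```

The selection is stored as case indices rather than coordinates, which is what lets every other panel highlight the same cases.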
label, dot plot, scale Histogram > dot plot for distribution Scale row & column
On steroids
Chernoff Faces
Encode the values of different variables in characteristics of a human face
Simple Example [Turner, 1977] [Spinelli and Zhou, 2004]
On steroids Look at faces, not colors
Star Plots / Glyphs
Space out the n variables at equal angles around a circle
Each spoke encodes a variable's value
[Figure: star plot with spokes Var 1 to Var 5, spoke length = value]
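A small sketch of the star plot geometry: spoke k sits at angle 2*pi*k/n and its length is the normalized value. The function name `star_vertices` and the normalization by a single `max_value` are assumptions; a real plot would usually normalize each variable by its own range.

```python
import math


def star_vertices(values, max_value):
    """Return the (x, y) end point of each spoke of one star glyph."""
    n = len(values)
    points = []
    for k, value in enumerate(values):
        angle = 2 * math.pi * k / n      # equal angles around the circle
        radius = value / max_value       # spoke length encodes the value
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


glyph = star_vertices([8, 4, 8, 4], max_value=8)
```

Connecting the returned points in order gives the outline of the glyph for one case.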
Examples: circular parallel coordinates
Star plot or glyph plot => freedom on layout!
On prednisone... just 2 dims [Bertillon]
population x percent foreigners
area = number of foreigners
On steroids (count)
On steroids (dim)
Star Coordinates E. Kandogan, Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions, InfoVis 2000 Late-Breaking Hot Topics, Oct. 2000
Demo - Interaction
Ø Activate / deactivate axis
Ø Color selection or axis
Ø Glyph coordinates
Ø Scale axis
Ø Rotate axis
Ø Dot size
Ø Brushing on axis
Ø Trail
Ø Inspector
Ø Panning
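A hedged sketch of the Star Coordinates mapping: each case is placed at the sum of the axis vectors, each scaled by the normalized value on that axis. The function names, the `axes` parameter as a list of (angle, length) pairs, and the sample rows are assumptions for illustration, not the API of Kandogan's tool.

```python
import math


def default_axes(n):
    """n unit-length axes at equal angles, as (angle, length) pairs."""
    return [(2 * math.pi * i / n, 1.0) for i in range(n)]


def star_coordinates(row, mins, maxs, axes=None):
    """Project one n-dimensional case to a 2-D point."""
    axes = axes or default_axes(len(row))
    x = y = 0.0
    for value, lo, hi, (angle, length) in zip(row, mins, maxs, axes):
        t = (value - lo) / (hi - lo)     # normalize the value to [0, 1]
        x += t * length * math.cos(angle)
        y += t * length * math.sin(angle)
    return x, y


mins, maxs = [0] * 4, [1] * 4
origin = star_coordinates([0, 0, 0, 0], mins, maxs)
overlap = star_coordinates([1, 0, 1, 0], mins, maxs)   # also lands near (0, 0)

# "Scale axis": shrinking axis 2 separates the two cases again.
axes = default_axes(4)
axes[2] = (axes[2][0], 0.5)
separated = star_coordinates([1, 0, 1, 0], mins, maxs, axes)
```

Different cases can land on the same point, which is why scaling and rotating the axes interactively matters for this technique.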
Parallel Coordinates
By A. Inselberg
Encode variables along a horizontal row
Vertical line specifies values
[Figure: vertical axes V1 V2 V3 V4 V5]
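A minimal sketch of how one case becomes a polyline in parallel coordinates: axis i sits at horizontal position i, and the height on that axis is the normalized value. The function name `polyline` and the sample numbers are made up for illustration.

```python
def polyline(row, mins, maxs):
    """Return the (axis position, normalized height) vertices for one case."""
    return [
        (i, (value - lo) / (hi - lo))
        for i, (value, lo, hi) in enumerate(zip(row, mins, maxs))
    ]


# Hypothetical case with three variables V1, V2, V3 and their ranges.
vertices = polyline([4, 1, 8], mins=[2, 1, 2], maxs=[6, 7, 8])
```

Each case is one polyline across all axes, so patterns are read mostly between adjacent axes.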
Parallel Coords Example
[Figure panels: Basic, Grayscale, Color]
From: Dean F. Jerding and John T. Stasko
http://www.cc.gatech.edu/gvu/softviz/infoviz/information_mural.html
And more cars
With brushing
and more brushing
On steroids
VisDB
Ø Database of data items, each of n dimensions
Ø Issue a query that specifies a target value of the dimensions
Ø Often get back no exact matches
Ø Want to find near matches
Ø Relevance factor
Ø metadata
Taken from: D. Keim, H-P Kriegel, VisDB: Database Exploration Using Multidimensional Visualization, IEEE CG&A, 1994.
Technique
Ø Calculate relevance of all data points
Ø Sort items based on relevance
Ø Use spiral technique to order the values
Ø Color items based on relevance
[Figure: color scale from High to Low relevance, empirically established]
Display Methodology
Highest relevance value in the center, values decrease outward
Items ordered by total relevance
Spiral in each window
Same item appears in the same place in each window
[Figure: windows for Total relevance, Dim 1, Dim 2, Dim 3, Dim 4, Dim 5]
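A hedged sketch of the relevance-then-spiral idea. The distance used here (sum of absolute differences to the query target) is a simple stand-in and not the relevance factor defined in the VisDB paper; the function names and sample items are also assumptions.

```python
def spiral_cells():
    """Yield grid cells in a rectangular spiral starting at the center (0, 0)."""
    x = y = 0
    yield (x, y)
    step = 1
    while True:
        # right and up by `step`, then left and down by `step + 1`
        for dx, dy, count in ((1, 0, step), (0, 1, step),
                              (-1, 0, step + 1), (0, -1, step + 1)):
            for _ in range(count):
                x, y = x + dx, y + dy
                yield (x, y)
        step += 2


def distance(item, target):
    """Stand-in relevance measure: a smaller distance means more relevant."""
    return sum(abs(v - t) for v, t in zip(item, target))


def spiral_layout(items, target):
    """Map item index -> cell, most relevant item in the center."""
    order = sorted(range(len(items)), key=lambda i: distance(items[i], target))
    return dict(zip(order, spiral_cells()))


items = [(5, 5), (1, 9), (5, 6), (0, 0)]
layout = spiral_layout(items, target=(5, 5))
```

Because the layout is keyed by item index, the same positions can be reused in every per-dimension window; only the color of each cell changes from window to window.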
Figure from Paper
Example Display
Alternative
Ø Grouping arrangement => single window
Ø Create all relevance dimensional depictions for an item and group them
Ø Spiral out the different data items
Example: 8 dimensions, 1000 items
[Figure panels: Multi-window, Grouping]
On Steroids?
Overview
Scatterplot Matrix
Chernoff Faces
Star Plots / Glyphs
Star Coordinates
Parallel Coordinates
Spiral plots
More techniques?
Ø Combinations
Ø More integrated software
Ø legacy spreadsheet layout
SeeIt
Highlighted Dynamic Table Viewer Nada Golmie & Bill Kules
InfoZoom
Spotfire
Spotfire
Advizor
IBM ILOG Discovery
Eureka / TableLens Rao & Card 94
Focus + context
EZChooser: K. Wittenburg
Comparisons
Ø ParCoord: <1000 items, <20 attrs
  Ø Relate between adjacent attr pairs
Ø StarCoord: <1,000,000 items, <20 attrs
  Ø Interaction intensive
Ø TableLens: similar to par-coords
  Ø more items with aggregation
  Ø Relate 1:m attrs (sorting), short learn time
Ø VisDB: 100,000 items with 10 attrs
  Ø Items*attrs = screenspace, long learn time, must query
Ø Spotfire: <1,000,000 items, <10 attrs (DQ many)
  Ø Filtering, short learn time
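A small worked example of the "items * attrs = screenspace" budget for a pixel-oriented display such as VisDB. The 1280 x 1024 resolution is an assumption chosen for illustration; the item and attribute counts are the ones quoted above.

```python
def pixels_needed(items, attrs):
    """One pixel per item per attribute."""
    return items * attrs


screen_pixels = 1280 * 1024            # assumed display resolution
needed = pixels_needed(100_000, 10)    # VisDB figures from the comparison
fits = needed <= screen_pixels
```

Under this assumption the quoted VisDB limit uses about one million pixels, which is close to the whole screen.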
MultiVariate Visu Tools INTERACTION is the key!
Paper presentations
Ø Hajar Falih
  Ø Multi-Dimensional Detective
Ø Thibaut Jacob
  Ø Rolling the Dice: Multidimensional Visual Exploration using Scatterplot Matrix Navigation
06/12/2011
90 min Lecture: Multi-dimensional Data Visualization
10 min Break
30 min Paper presentations (students)
40 min Lab work on Processing: interaction (Dragicevic & Vernier)