TIES443. Lecture 9: Visualization. Lecture 9. Course webpage: http://www.cs.jyu.fi/~mpechen/ties443. November 17, 2006



Similar documents
Data Visualization. or Graphical Data Presentation. Jerzy Stefanowski Instytut Informatyki

Multi-Dimensional Data Visualization. Slides courtesy of Chris North

An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values

Visualization Techniques in Data Mining

Information Visualization Multivariate Data Visualization Krešimir Matković

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Exploration Data Visualization

Time Series Data Visualization

COC131 Data Mining - Clustering

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015

an introduction to VISUALIZING DATA by joel laumans

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Cours de Visualisation d'information InfoVis Lecture. Multivariate Data Sets

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Big Data: Rethinking Text Visualization

Chapter 3 - Multidimensional Information Visualization II

Outline. Fundamentals. Rendering (of 3D data) Data mappings. Evaluation Interaction

CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

How To Create A Data Visualization

Visualization methods for patent data

Diagrams and Graphs of Statistical Data

Data Visualization - A Very Rough Guide

Graphical Representation of Multivariate Data

Visualization Software

Visualization of Multivariate Data. Dr. Yan Liu Department of Biomedical, Industrial and Human Factors Engineering Wright State University

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC Politecnico di Milano)

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Visualization Quick Guide

On History of Information Visualization

Data Visualization VINH PHAN AW1 06/01/2014

Getting Started with R and RStudio 1

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns

Visualization of missing values using the R-package VIM

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Security visualisation

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Data Visualization Handbook

3D Interactive Information Visualization: Guidelines from experience and analysis of applications

Visibility optimization for data visualization: A Survey of Issues and Techniques

Analyzing The Role Of Dimension Arrangement For Data Visualization in Radviz

Data exploration with Microsoft Excel: analysing more than one variable

The course: An Introduction to Information Visualization Techniques for Exploring Large Database

Introduction to Data Visualization

2D, 3D and High-Dimensional Data and Information Visualization

TEXT-FILLED STACKED AREA GRAPHS Martin Kraus

Data, Measurements, Features

A Comparative Study of Visualization Techniques for Data Mining

CSE 564: Visualization. Time Series Data. Time Series Data Are Everywhere. Temporal relationships can be highly complex

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Multivariate Analysis

The Visualization Pipeline

Relationships Between Two Variables: Scatterplots and Correlation

Week 1. Exploratory Data Analysis

COSC 6344 Visualization

Quantitative vs. Categorical Data: A Difference Worth Knowing Stephen Few April 2005

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Universidade de Aveiro Departamento de Electrónica, Telecomunicações e Informática. Introduction to Information Visualization

A Short Introduction on Data Visualization. Guoning Chen

Effective Visualization Techniques for Data Discovery and Analysis

Data visualization Zhao Kaidi School of Computing, National University of Singapore

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Data Mining and Visualization

COMP Visualization. Lecture 11 Interacting with Visualizations

Dynamic Visualization and Time

Understanding Data: A Comparison of Information Visualization Tools and Techniques

Common Tools for Displaying and Communicating Data for Process Improvement

Introduction to D3.js Interactive Data Visualization in the Web Browser

SPSS Manual for Introductory Applied Statistics: A Variable Approach

Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER

Hierarchy and Tree Visualization

Data Visualization Techniques

Introduction of Information Visualization and Visual Analytics. Chapter 2. Introduction and Motivation

BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data

Introduction to Information Visualization

0 Introduction to Data Analysis Using an Excel Spreadsheet

Good Graphs: Graphical Perception and Data Visualization

Visualisatie BMT. Introduction, visualization, visualization pipeline. Arjan Kok Huub van de Wetering

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

2. Simple Linear Regression

Criteria for Evaluating Visual EDA Tools

Transcription:

TIES443 Lecture 9 Visualization Mykola Pechenizkiy Course webpage: http://www.cs.jyu.fi/~mpechen/ties443 Department of Mathematical Information Technology University of Jyväskylä November 17, 2006 1 Topics for today Purpose of visualization in DM/KDD Visualization of data 1D, 2D, 3D, 3+D Visualization of data mining results Model visualization, model performance visualization Visualization of data mining processes Application Interactive data mining and visualization Time-series visualization Data mining as part of visualization system or vice versa? 2 1

Sources Books: Information Visualization in DM and KD Fayyad et al. Information Visualization: Beyond the Horizon : 2nd ed, C. Chen The Visual Display of Quantitative Information, 2nd ed, E. Tufte People: Edward Tufte, Ben Shneiderman, Daniel A. Keim, Marti Hearst, Tamara Munzner, and other Excellent handouts and recommended reading on information visualization by Tamara Munzner http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/ Some papers: Visual Data Mining Framework for Convenient Identification of Useful Knowledge http://www.cs.uic.edu/~kzhao/papers/06_icdm_zhao_visual.pdf Lots of other sources 3 Visualization & related fields User Interface Studies Computer Vision Cognitive Science Computer-Aided Design Geometric Modelling Computer Graphics Signal Processing Approximation Theory Fig. by Jack van Wijk from Views on Visualization 4 2

Purpose of Visualization in DM/KDD Data Visualization Qualitative overview of large and complex data sets Data summarization and (interactive) exploration Regions of interest identification Support humans in looking for structures, features, patterns, trends, and anomalies in data Perceptual capabilities of the human visual system Model (and its performance) Visualization Process Visualization Disadvantages? requires human eyes can be misleading "The purpose of computing is insight, not numbers" Richard Hamming 5 Visualization Techniques Categorization Based on task at hand To explore data To confirm a hypothesis Based on structure of underlying data set Based on the dimension of the display Stationary Animated Interactive Geometric vs. Symbolic Results of physical models, simulations, computations Nonnumeric data Networks, relations as graphs; icons, arrays 1D-2D-3D, stereoscopic, virtual reality vs. 3+D Static vs. Dynamic 6 3

Some Basic Visualization Terminology Visualization Interaction Graphical attributes Mapping Rendering Field Scalar, Vector, Tensor Isosurface Glyph 7 Perception in Visualization What graphical primitives do humans detect quickly? What graphical attributes can be accurately measured? How many distinct values for a particular attribute can be used without confusion? How do we combine primitives to recognize complex phenomena? Human perception and information theory 8 4

The most of following slides for data visualization were adopted from the presentation Visualization and Data Mining by G. Piatetsky-Shapiro (available from www.kdnuggets.com) 9 Data Visualization Outline Graphical excellence and lie factor Representing data in 1,2, and 3-D Representing data in 4+ dimensions Parallel coordinates Scatterplots Stick figures 10 5

Napoleon Invasion of Russia, 1812 Napoleon 11 www.odt.org, from http://www.odt.org/pictures/minard.jpg, used by permission 12 6

Snow s Cholera Map, 1855 13 Asia at night 14 7

Bad Visualization: misleading Y -axis Year Sales 1999 2110 Sales 2000 2001 2002 2003 2105 2120 2121 2124 2130 2125 2120 2115 2110 2105 2100 2095 1999 2000 2001 2002 2003 Sales Y-Axis scale gives WRONG impression of big change 15 Better Visualization Year Sales 1999 2110 Sales 2000 2001 2002 2003 2105 2120 2121 2124 3000 2500 2000 1500 1000 500 Sales 0 1999 2000 2001 2002 2003 Axis from 0 to 2000 scale gives correct impression of small change 16 8

Lie Factor=14.8 (E.R. Tufte, The Visual Display of Quantitative Information,, 2nd edition) 17 Lie Factor Lie Factor = size of effect shown in graphic size of effect in data = = (5.3 0.6) 0.6 (27.5 18.0) 18 7.833 = = 14.8 0.528 Tufte requirement: 0.95<Lie Factor<1.05 (E.R. Tufte, The Visual Display of Quantitative Information,, 2nd edition) 18 9

Give the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. Tufte s Principles of Graphical Excellence Tell the truth about the data! (E.R. Tufte, The Visual Display of Quantitative Information,, 2nd edition) 19 1-D D (Univariate) Data Representations 20 10

2-D D (Bivariate( Bivariate) ) Data Scatter plot, price mileage 21 3-D D Data (projection) price 22 11

3-D D image (requires 3-D 3 D blue and red glasses) Taken by Mars Rover Spirit, Jan 2004 23 Visualizing in 4+ Dimensions Scatterplots Parallel coordinates Chernoff faces Stick figures Star glyphs 24 12

Multiple Views Give each variable its own display 1 A B C D E 1 4 1 8 3 5 2 6 3 4 2 1 3 5 7 2 4 3 4 2 6 3 1 5 2 3 4 Problem: does not show correlations A B C D E 25 Scatterplot Matrix Represent each possible pair of variables in their own 2-D scatterplot (car data) Q: Useful for what? A: linear correlations (e.g. horsepower & weight) Q: Misses what? A: multivariate effects For 13 dimensions (columns) : Number of histograms = 13 Number of scatterplots = C(13,2) = 78 26 13

Parallel Coordinates Encode variables along a horizontal row Vertical line specifies values Dataset in a Cartesian coordinates Same dataset in parallel coordinates Invented by Alfred Inselberg while at IBM, 1985 27 Example: Visualizing Iris Data Iris setosa sepal sepal petal petal length width length width 5.1 3.5 1.4 0.2 4.9 3 1.4 0.2............ 5.9 3 5.1 1.8 Iris versicolor Iris virginica 28 14

Parallel Coordinates: 2 D Sepal Length Sepal Width 3.5 5.1 sepal sepal petal petal length width length width 5.1 3.5 1.4 0.2 29 Parallel Coordinates: 4 D Sepal Length Sepal Width Petal length Petal Width 3.5 5.1 1.4 0.2 sepal sepal petal petal length width length width 5.1 3.5 1.4 0.2 30 15

Parallel Visualization of Iris data 3.5 5.1 1.4 0.2 31 Parallel Coordinates: Correllation Hyperdimensional Data Analysis Using Parallel Coordinates. Edward J. Wegman. Journal of the American Statistical Association, 85(411), Sep 1990, p 664-675. 32 16

Hierarchical Parallel Coordinates Hierarchical Parallel Coordinates for Visualizing Large Multivariate Data Sets. Fua, Ward, and Rundensteiner, IEEE Visualization 99 33 Chernoff Faces Encode different variables values in characteristics of human face Cute applets: http://www.cs.uchicago.edu/~wiseman/chernoff/ http://hesketh.com/schampeo/projects/faces/chernoff.html 34 17

Chernoff faces, example 35 Stick Figures Two variables are mapped to X, Y axes Other variables are mapped to limb lengths and angles Texture patterns can show data characteristics 36 18

Stick figures, example census data showing age, income, sex, education, etc. Closed figures correspond to women and we can see more of them on the left. Note also a young woman with high income 37 Star Glyphs from Xmdv Two variables are mapped to X, Y axes Other variables are mapped to limb lengths and angles Texture patterns can show data characteristics 38 19

39 Model Visualization 40 20

DM Model Visualization: Why Understanding the model allow a user to discuss and explain the logic behind the model also with colleagues, customers, and other users more than just comprehension; it also involves business context financial indicators like profit and cost visualization of the DM output in a meaningful way allowing the user to interact with the visualization simple what if questions can be answered 3 whales: representation, interaction, and integration Trusting the model good quantitative measures of "trust" the overall goal of "visualizing trust" understanding the limitations of the model Comparing Different Models using Visualization as Input-Output Mappings, as Algorithms, as Processes 41 Model Visualization: Performance Evaluation 1 0.9 Normalized Expected Cost 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 PC(+) Probability Cost Cost-curves (Holt) 42 21

Visualization of DM/KDD Process 43 Applications Bioinformatics Visual genomics Social networks Process mining Time series visualization 44 22

Visualization of Clusters of Microarray Data 45 Visual Genomics Pictures by Christoph W. Sensen www.visualgenomics.ca/index.php?option=com_content&task=section&id=15&itemid=143 46 23

The Result of Process Mining From presentation by Wil van der Aalst, TU/e www.processmining.org 47 Design Studies 48 24

Focus+Context 49 Time Series Spirals Spiral Axis = serial attributes are encoded as line thickness Radii = periodic attributes One year of power demand data Wednesday Tuesday Monday Friday Thursday Sunday Saturday Carlis & Konstan. UIST-98 Independently rediscovered by Weber, Alexa & Müller InfoVis-01 But dates back to 1888! (c) Eamonn Keogh, eamonn@cs.ucr.edu 50 25

Power Consumption van Wijk and van Selow, Cluster and Calender based Visualization of Time Series Data, InfoVis99, http://www.win.tue.nl/ vanwijk/clv.pdf 51 Time Series Spirals April July January October Chimpanzees Monthly Food Intake 1980-1988 The spokes are months, and spiral guide lines are years 112 types of food Simple and intuitive, but works only with (c) Eamonn Keogh, eamonn@cs.ucr.edu periodic data, and if the period is known 52 26

ThemeRiver Current width = strength of theme River width = global strength Color mapping (similar themes/ same color family) Time axis External events can be linked Havre, Hetzler, Whitney & Nowell InfoVis 2000 A company s patent activity 1988 to 1998 53 ThemeRiver Castro confiscates American oil refineries oil Fidel Castro s speeches 1960-1961 Simple and intuitive Many extensions possible Scalability is still an issue (c) Eamonn Keogh, eamonn@cs.ucr.edu dot.com stocks 1999-2002 54 27

TimeSearcher Comments Simple and intuitive Highly dynamic exploration Query power may be limited and simplistic Limited scalability Hochheiser, and Shneiderman (c) Eamonn Keogh, eamonn@cs.ucr.edu 55 01011001011110011010010000100 01010011011010111000010101011 10111110001101101101111110100 11001001000110100011110011011 01000101111000101101001101100 11010000001001100010011100000 11101001100101100001010010 VizTree 10001000101001000101010100001 01010001010111011110101101001 01110100101010011101010101001 01001010101110101010010101010 11010101001011001011101111010 00111000010100001001110101000 11100001010101100101110101 Which set of bit strings is generated by a human and which one is generated by a computer? the frequencies of all triplets are encoded in the thickness of branches humans usually try to fake randomness by alternating patterns adopted from ex. by Eamonn Keogh 56 28

VizTree The trick on the previous slide only works for discrete data, but time series are real valued. we will discuss how we can SAX up a time series to make it discrete during the next tutorial on time series mining Details 2 Zoom in Overview Details 1 Overview, zoom & filter, details on demand main principles of visualization Ben Shneiderman (c) Eamonn Keogh, eamonn@cs.ucr.edu 57 Visualization Summary Purpose of visualization in DM/KDD Visualization of data - Many methods 1D, 2D, 3D Visualization is possible in more than 3-D Visualization of data mining results Model visualization, model performance visualization Visualization of data mining processes Application Interactive and integrative data mining and visualization Time-series, symbolic, networks visualization One picture may worth 1000 words! What else did you get from this lecture? 58 29

Additional Slides 59 Visualization Software Free and Open-source Ggobi Xmdv Some techniques are available in WEKA and YALE as you have seen already Many more - see www.kdnuggets.com/software/visualization.html Matlab has also many nice integrated tools for viz. 60 30