Today s Class. High Dimensional Data & Dimensionality Reduc9on. Readings from Last Week: Readings from Last Week: Scien9fic Data.

Similar documents
ANALYTICAL TECHNIQUES FOR DATA VISUALIZATION

Dimensionality Reduction: Principal Components Analysis

Finding Anomalies in Time- Series using Visual Correla/on for Interac/ve Root Cause Analysis

Multi-Dimensional Data Visualization. Slides courtesy of Chris North

Data Visualization Techniques

3D Distance from a Point to a Triangle

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Multivariate data visualization using shadow

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

Data Visualization Techniques

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

an introduction to VISUALIZING DATA by joel laumans

Diagrams and Graphs of Statistical Data

Manifold Learning Examples PCA, LLE and ISOMAP

Principal components analysis

Common factor analysis

The Scientific Data Mining Process

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Microsoft Business Intelligence Visualization Comparisons by Tool

DEVELOPMENT OF AN IMAGING SYSTEM FOR THE CHARACTERIZATION OF THE THORACIC AORTA.

CAS CS 565, Data Mining

Component Ordering in Independent Component Analysis Based on Data Power

Visualization and Feature Extraction, FLOW Spring School 2016 Prof. Dr. Tino Weinkauf. Flow Visualization. Image-Based Methods (integration-based)

Multivariate Normal Distribution

TEXT-FILLED STACKED AREA GRAPHS Martin Kraus

Common Core Unit Summary Grades 6 to 8

Visualization methods for patent data

Data Mining: Algorithms and Applications Matrix Math Review

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

Data Visualization. BUS 230: Business and Economic Research and Communication

Environmental Remote Sensing GEOG 2021

Multi scale random field simulation program

Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on

Principal Component Analysis

Figure 1. An embedded chart on a worksheet.

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Biometric Authentication using Online Signatures

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Going Big in Data Dimensionality:

Dynamic Visualization and Time

Big Data Text Mining and Visualization. Anton Heijs

Introduction to Principal Components and FactorAnalysis

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

Pennsylvania System of School Assessment

Voronoi Treemaps in D3

Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important

Introduction to Computer Graphics. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Data Mining: An Overview. David Madigan

Visualization Techniques in Data Mining

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

The ith principal component (PC) is the line that follows the eigenvector associated with the ith largest eigenvalue.

V6 AIRS Spectral Calibra2on

A Short Introduction to Computer Graphics

Subspace Analysis and Optimization for AAM Based Face Alignment

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Netzcope - A Tool to Display and Analyze Complex Networks

McAFEE IDENTITY. October 2011

Information Literacy Program

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

. Address the following issues in your solution:

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Chapter 7 Factor Analysis SPSS

Topographic Change Detection Using CloudCompare Version 1.0

Tutorial on Exploratory Data Analysis

How To Create A Data Visualization

The Value of Visualization 2

Anatomic Surface Reconstruc1on from Sampled Point Cloud Data and Prior Models

White Paper. Data Visualization Techniques. From Basics to Big Data With SAS Visual Analytics

Visualization Process. Alark Joshi

Astronomy of Planets

TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS

Visualization. For Novices. ( Ted Hall ) University of Michigan 3D Lab Digital Media Commons, Library

An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values

Choosing Colors for Data Visualization Maureen Stone January 17, 2006

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

A Demonstration of Hierarchical Clustering

How To Cluster

CS 325 Computer Graphics

Spatio-Temporal Mapping -A Technique for Overview Visualization of Time-Series Datasets-

Pro/ENGINEER Wildfire 4.0 Basic Design

Hierarchical Data Visualization. Ai Nakatani IAT 814 February 21, 2007

Expert Color Choices for Presenting Data

Cork Education and Training Board. Programme Module for. 3 Dimensional Computer Graphics. Leading to. Level 5 FETAC

High-Dimensional Data Visualization by PCA and LDA

Ins+tuto Superior Técnico Technical University of Lisbon. Big Data. Bruno Lopes Catarina Moreira João Pinho

COMP Visualization. Lecture 11 Interacting with Visualizations

A. OPENING POINT CLOUDS. (Notepad++ Text editor) (Cloud Compare Point cloud and mesh editor) (MeshLab Point cloud and mesh editor)

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC Politecnico di Milano)

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

P6 Analytics Reference Manual

How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)

Microsoft Excel 2010 Charts and Graphs

Visibility optimization for data visualization: A Survey of Issues and Techniques

Scatter Plot, Correlation, and Regression on the TI-83/84

Business Intelligence and Process Modelling

Transcription:

High Dimensional Data & Dimensionality Reduc9on Readings from Last Week: Readings from Last Week: QSplat: A Mul9resolu9on Point Rendering System for Large Meshes, Rusinkiewicz & Levoy, SIGGRAPH 2000 Tree visualiza9on with Tree maps: A 2 d space filling approach, Ben Shneiderman, 1991 Scien9fic Data For many 3D spa9al loca9ons during an experiment or simula9on Time varying temperature, velocity, pressure, humidity, etc. h^p://www.ncnr.nist.gov/dave/screenshots.html 1

Misc. Personal Data Gene Expression Height, weight, eye color, phone number, address, IQ, age, cholesterol score, grade in Data Structures, etc. Example Hypotheses: Expression level for hundreds of genes For many trials (different individuals/condi9ons) Discover correla9ons: genes that commonly work together or in opposi9on There is no correla9on between height & phone number There is a posi9ve correla9on between the ages of spouses Sca^er plot: Look at 2 dimensions at a 9me Units on each axis are different! h^p://www.mzandee.net/~zandee/sta9s9ek/stat online/chapter4/pearson.html Mathematicians and physicists talk about more than three spatial dimensions Not us Simply ask How many numbers does it take to describe a data point? This is the dimensionality of the data Feature Vectors o Compute properties at each point in a data set and store them all in a long vector associated with each point Examples o SIFT descriptors (128-D) h^p://www.imbb.forth.gr/people/poirazi/researchep.html h^p://www.bioss.ac.uk/~dirk/essays/geneexpression/bayes_net.html Time o A particle moving in 3D can be described by its location (X,Y,Z) at a particular time (t) o To describe it, we can specify a vector (X,Y,Z,t) o 4D Color o A colored point can be specified by its location (X,Y,Z) and its color (R,G,B) o To describe it, we can specify a vector (X,Y,Z,R,G,B) o 6D Computational o Nearest neighbor searches are very expensive o People have developed approximate nearest neighbor algorithms Visualization o How can we display/see more than colored 3D points? 2

K Means Clustering Simply throw away most of the dimensions and keep some that you can visualize This is equivalent to projecting the data onto an axis aligned hyperplane Important qualities of the data may not be visible For a set of 2D/3D/nD points: 1. Choose k, how many clusters you want (oracle) 2. Select k points from your data at random as initial team representative 3. Every other point determines which team representative it is closest to and joins that team 4. The team averages the positions of all members, this is the team s new representative 5. Repeat 3-5 until change < threshold Advanced Debugging Debugging Level 1: Remove syntax errors in compila9on Debugging Level 2: Produces an answer Debugging Level 3: Matches the output provided by the instructor Debugging Level 4: Hypothesize system behavior Develop & run experiments Collect data & analyze results Validate (or repeat process) Applies to sohware development, and other sciences too! A visualization technique used to plot high dimensional data Each of the dimensions corresponds to a vertical axis Each data point is displayed as a series of lines. 3

Designing Visualiza9ons using How many dimensions (ver9cal axes)? In what order should the axes appear? Which direc9on should each axis run (up or down?) How many data points (lines)? How could color, line thickness, etc. be used to highlight pa^erns in the data? Data explora9on or debugging tool? (Iterate) Or final visualiza9on? // Set up a 2D scene, add an XY chart to it vtkcontextview* view = vtkcontextview::new(); vtkchartparallelcoordinates* chart = vtkchartparallelcoordinates::new(); view->getscene()->additem(chart); vtktable* table = vtktable::new(); vtkfloatarray* array1 = vtkfloatarray::new(); array1->setname("field 1"); table->addcolumn(array1);.. Repeat for each dimension/axis // Generate 4D data points [i, cos(i), sin(i), tan(i)] int numpoints = 200; table->setnumberofrows(numpoints); for (int i = 0; i < numpoints; ++i) { table->setvalue(i, 0, i); table->setvalue(i, 1, cos(i)); table->setvalue(i, 2, sin(i)); table->setvalue(i, 3, tan(i)); } chart->getplot(0)->setinput(table); view->getrenderwindow()->setmultisamples(0); view->getinteractor()->initialize(); view->getinteractor()->start(); Interaction really helps o Highlighting o Filtering o Roll-over detail Takes high dimensional data, where some/many axes are correlated h^p://cnx.org/content/m11461/latest/ Reduce to a smaller set of dimensions that are not correlated Dimensions/axes form a new basis/coordinate system Each example from the original data can be defined as a linear combina9on of the new axes Essen9ally we want to find the internal structure that best explains the variance in the data 4

PCA Example: Material Reflectance Model How many eigenvalues/ dimensions are necessary to represent this data? Matusik, Pfister, Brand, & McMillan, A Data Driven Reflectance Model SIGGRAPH 2002 PCA Example: Face Modeling & Transfer Face Transfer with Mul9linear Models, Vlasic, Brand, Pfister, & Popovic, SIGGRAPH 2005 Find the important directions in the data If we could fix the data and rotate the coordinate axes before slicing, how would we rotate them? Use PCA o Eigenvectors of the scatter matrix o Not important, just need to know what it does // Create the data vtkdoublearray* xarray = xarray->setnumberofcomponents(1); xarray->setname("x"); // Load the data into a table vtktable* datasettable = vtktable::new(); datasettable->addcolumn(xarray); datasettable->addcolumn(yarray); vtkdoublearray* yarray = yarray->setnumberofcomponents(1); yarray->setname("y"); vtkpcastatistics* pcastatistics = vtkpcastatistics::new(); for(vtkidtype i = 0; i < polydata->getnumberofpoints(); i++) { double p[3]; polydata->getpoint(i,p); xarray->insertnextvalue(p[0]); yarray->insertnextvalue(p[1]); } pcastatistics->setcolumnstatus("x", 1 ); pcastatistics>setcolumnstatus("y", 1 ); pcastatistics->requestselectedcolumns(); pcastatistics>setderiveoption(true); pcastatistics->update(); pcastatistics->setinput( vtkstatisticsalgorithm::input_data, datasettable ); 5

// Get eigenvalues vtksmartpointer<vtkdoublearray> eigenvalues = vtksmartpointer<vtkdoublearray>::new(); pcastatistics>geteigenvalues(eigenvalues); for(vtkidtype i = 0; i < eigenvalues->getnumberoftuples(); i++) { std::cout << "Eigenvalue " << i << " = " << eigenvalues->getvalue(i) << std::endl; } // Get eigenvectors vtkdoublearray* eigenvectors = pcastatistics->geteigenvectors(eigenvectors); vtkdoublearray* evec1 = pcastatistics->geteigenvector(0, evec1); std::cout << evec1: << evec1->getvalue(0) << << evec1->getvalue(1); vtkdoublearray* evec2 = pcastatistics->geteigenvector(1, evec2); std::cout << evec2: << evec2->getvalue(0) << << evec2->getvalue(1); Readings for Next Week: Ben Schneiderman, The eyes have it: A task by data type taxonomy for informa9on visualiza9on, Visual Languages, 1996 Visual lnforma7on Seeking Mantra: overview first zoom and filter then details on demand Readings for Next Week: Colin Ware, "Quan9ta9ve Texton Sequences for Legible Bivariate Maps," IEEE Transac7ons on Visualiza7on and Computer Graphics, 2009. 6

Studio 2 Setup? Mul9ple projec9on surfaces Fixed & mobile Varied size/purpose/resolu9on Visualiza9on Interac9on Central control sta9on Individual input: iclicker, smart phone, etc. Audio & MaxMSP G.W. T.S. Microfluidic Visualiza9on W.F. Audio A.K. Anima$on/Video Anima9on of Erosion C.S. Graph Layout A.S. C.O. Flight Simulator F.A. Network Mul$ple Visualiza9on Projec$on A.G. Surfaces? Naviga9ng a Maze E.S. Visualizing HMMs E.G. Input & Interac$on D.S. Mul9 Dimensional Popula9on Data J.M. Game A.D. Projec$on on Moveable Screens I Spy M.D. Final Project Guidelines Read & summarize 2 3 papers related to your project & incorporate/extend components of this work Save early itera9ons of the visualiza9on (and any bloopers ), to show the progression of your visualiza9on design and data explora9on Tues Nov 16 th or Fri Nov 19 th 10 11:50am? pre review of final projects in Visual Design class 7