Multivariate Data Visualization



Similar documents
Data Visualization - A Very Rough Guide

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

3D Interactive Information Visualization: Guidelines from experience and analysis of applications

An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns

Big Data: Rethinking Text Visualization

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

Exploratory Data Analysis with MATLAB

Visualization methods for patent data

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Interactive Data Mining and Visualization

DATA VISUALIZATION. Lecture 1 Introduction. Lin Lu llu@sdu.edu.cn

Clustering & Visualization

HT2015: SC4 Statistical Data Mining and Machine Learning

The Value of Visualization 2

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

Exploratory Data Analysis for Ecological Modelling and Decision Support

Outline. Fundamentals. Rendering (of 3D data) Data mappings. Evaluation Interaction

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Analysis Tools and Libraries for BigData

DATA MINING TECHNIQUES AND APPLICATIONS

SPECIAL PERTURBATIONS UNCORRELATED TRACK PROCESSING

Tutorial 2 Online and offline Ship Visualization tool Table of Contents

Data Mining Introduction

Big Data Analytics. Tools and Techniques

Monitoring chemical processes for early fault detection using multivariate data analysis methods

DHL Data Mining Project. Customer Segmentation with Clustering

Integration of Cluster Analysis and Visualization Techniques for Visual Data Analysis

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

Predictive Analytics Powered by SAP HANA. Cary Bourgeois Principal Solution Advisor Platform and Analytics

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Machine Learning with MATLAB David Willingham Application Engineer

Data Mining mit der JMSL Numerical Library for Java Applications

COC131 Data Mining - Clustering

Data Mining: STATISTICA

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Principles of Data Mining by Hand&Mannila&Smyth

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo Database And Data Mining Research Group

Application of Data Visualization

Visualization of Software

Geovisual Analytics Exploring and analyzing large spatial and multivariate data. Prof Mikael Jern & Civ IngTobias Åström.

Model Deployment. Dr. Saed Sayad. University of Toronto

ICT Perspectives on Big Data: Well Sorted Materials

Computer Science Electives and Clusters

Data Visualization Principles: Interaction, Filtering, Aggregation

Interactive Visual Data Analysis in the Times of Big Data

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Nagarjuna College Of

An In-Depth Look at In-Memory Predictive Analytics for Developers

Confidently Anticipate and Drive Better Business Outcomes

Dynamic Data in terms of Data Mining Streams

Visualization of large data sets using MDS combined with LVQ.

The Scientific Data Mining Process

CITY UNIVERSITY OF HONG KONG 香 港 城 市 大 學. Self-Organizing Map: Visualization and Data Handling 自 組 織 神 經 網 絡 : 可 視 化 和 數 據 處 理

Advanced Big Data Analytics with R and Hadoop

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Data mining as a tool of revealing the hidden connection of the plant

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

from Larson Text By Susan Miertschin

SIMCA 14 MASTER YOUR DATA SIMCA THE STANDARD IN MULTIVARIATE DATA ANALYSIS

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

High-dimensional labeled data analysis with Gabriel graphs

Customer and Business Analytic

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Multi-Dimensional Data Visualization. Slides courtesy of Chris North

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Using Data Mining for Mobile Communication Clustering and Characterization

Today's Topics. COMP 388/441: Human-Computer Interaction. simple 2D plotting. 1D techniques. Ancient plotting techniques. Data Visualization:

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining Part 5. Prediction

VisIVO, a VO-Enabled tool for Scientific Visualization and Data Analysis: Overview and Demo

Azure Machine Learning, SQL Data Mining and R

Scalable Developments for Big Data Analytics in Remote Sensing

Chapter ML:XI (continued)

Chapter 3 - Multidimensional Information Visualization II

Spatial Data Analysis

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Introduction. A. Bellaachia Page: 1

DEA implementation and clustering analysis using the K-Means algorithm

What is Data mining?

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

ADVANCED VISUALIZATION

ADVANCED DATA VISUALIZATION

Transcription:

Multivariate Data Visualization VOTech/Universities of Leeds & Edinburgh Richard Holbrey

Outline Focus on data exploration Joined VOTech in June, so...... informal introduction Examine some definitions and assumptions Describe visual & data mining approaches in general Open discussion Some demos of simple vis tools Feedback

Definition: Visualization Born as a computing discipline in 1987 with publication of NSF Report Gurus tell us: use of computer-supported, interactive, visual representations of data to amplify cognition (Card, McKinlay, Shneiderman) Emphasises human brain as pattern recognition tool...but can only accept so much input eg. active region of the eye update rate (c. 25Hz eye, 1000Hz touch)

Building mental maps HCI FilmFinder tool (Shneiderman et al) Web maps www.inxight.com

Data mining Cynical view Statistics with better marketing More mainstream Practical way of dealing with large data sets employing computing power to handle complexity (n^?)

In practice Classification rules/decision tree eg. if a and b then x Association/Dissociation rules eg. loyalty cards multiple, categorical data items Numeric data clustering regression neural nets ( black box ) Statistical modelling needed where uncertainty occurs

Straw poll I do pattern matching by eye likely many spurious correlations otherwise Some tools popular with astronomers IDL in the UK Mirage cf Weka cf Topcat (RM & MA) need to handle various formats manipulate/add columns to table data overlay sources and models graphically SRM doc need for models, function setting visualization as an endpoint

My assumptions Background bone density study and real-time VR surgical simulation Working in 3D or higher difficult 6dof, focus, context 2D much more obvious.....but can t lose 3D Sound? Haptics?

Some example techniques

Projections from nd 2D One of the oldest vis tricks Visual techniques Parallel coordinates, dendrograms, MDS, stacking Projections PCA, Kohonen maps, Sammon maps Problem: tend to lose relationship with original variates Class-preserving CViz approach

Parallel coordinates Xmdvtool originally limited to 20 columns of data practical limit Brushing and clustering to compensate for screen real-estate

Class-projection CViz aim is to tour through the data, through a series of 2D projections only a sample of the data is shown Authors tested 26 variables with some difficulty

Clustering-centred HCE: Hierarchical Clustering Explorer linked trees and display tools Copes well with ~10 variates

R (http://cran.r-project.org) Large statistically-oriented library with many contributed packages Powerful vectorized scripting can do almost anything Scales to large number of variates Command-line oriented: graphics, but noninteractive in the usual sense Can run under CEA!

Work at Leeds - Hypercell Extends hyperslice method (2D plots of all variates) Each attribute has a range of interest and a focus value These values can be dynamically changed

Re-thinking Joint study Leeds/Bob Mann SuperCOSMOS archive..made us rethink some ideas! Only looked at subset of 57 attributes and 1000 observations Analytical task: Calibration of SSA data Look for expected and unexpected correlations

A result.. Subspace (l, b, ebmv) with colouring by meanclass attribute An outlier is evident

..and a plane in 3D Subspace defined by (classmag(b-r1), gcormag(b-r1), scormag(b-r1))

Also, gviz Developed at Leeds Grid-based collaborative visualization..but not tied to grid Can push geometry or data In principle, client can be any capable renderer (eg. Iris explorer, Matlab, VTK,...)

gviz Allowing steering or collaboration

Discuss... These and other tools? 3D, 4D, hyperslice? Black-box techniques? Intuitive nature of interface?

Some demos

Conclusions? Visualization demands high level of interaction (and good HCI) Interactivity on AG does tied to specific applications could we make use of pipe/svg model? 3D can be challenging, but can t afford to lose focus, context, manipulation Using R within AG demonstrated several problems passing files, security, interactivity Gap between moving from 100s of variates to 10s can we prioritize data? Preliminary survey of data mining tools: http://wiki.eurovotech.org/bin/view/votech/ds6preliminary