Interactive Visual Data Analysis in the Times of Big Data



Similar documents
Information Visualization WS 2013/14 11 Visual Analytics

Big Data a threat or a chance?

Big Data in Pictures: Data Visualization

An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values

UniGR Workshop: Big Data «The challenge of visualizing big data»

Information Visualization and Visual Analytics

Geovisual Analytics Exploring and analyzing large spatial and multivariate data. Prof Mikael Jern & Civ IngTobias Åström.

Introduction to Data Mining

Statistics for BIG data

Visualization Techniques in Data Mining

IC05 Introduction on Networks &Visualization Nov

Introduction of Information Visualization and Visual Analytics. Chapter 2. Introduction and Motivation

Fundamentals of Visualizing Biological Data

ICT Perspectives on Big Data: Well Sorted Materials

What is Visualization? Information Visualization An Overview. Information Visualization. Definitions

Information Management course

Universidade de Aveiro Departamento de Electrónica, Telecomunicações e Informática. Introduction to Information Visualization

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

Geographically weighted visualization interactive graphics for scale-varying exploratory analysis

UNDERSTANDING HUMAN FACTORS OF BIG DATA VISUALIZATION

TravelOAC: development of travel geodemographic classifications for England and Wales based on open data

MS1b Statistical Data Mining

Sanjeev Kumar. contribute

How To Make Visual Analytics With Big Data Visual

2 Visual Analytics. 2.1 Application of Visual Analytics

Introduction. A. Bellaachia Page: 1

An Interactive Web Based Spatio-Temporal Visualization System

Data Visualization - A Very Rough Guide

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Driven Discovery In the Social, Behavioral, and Economic Sciences

Introduction to Information Visualization

Spatio-Temporal Networks:

Exploratory Data Analysis for Ecological Modelling and Decision Support

CS171 Visualization. The Visualization Alphabet: Marks and Channels. Alexander Lex [xkcd]

The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. Ben Shneiderman, 1996

Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Comparison of K-means and Backpropagation Data Mining Algorithms

ADVANCED VISUALIZATION

Data, Measurements, Features

CS Data Science and Visualization Spring 2016

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

Challenges for Data Driven Systems

BS COMPUTER SCIENCE BEST THESIS

Visual Exploration of Geographic Time Series using Interactive Maps

Using Visual Analytics to Enhance Data Exploration and Knowledge Discovery in Financial Systemic Risk Analysis: The Multivariate Density Estimator

Project Participants

Tableau's data visualization software is provided through the Tableau for Teaching program.

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Big Data Hope or Hype?

Visualization methods for patent data

Information Visualisation and Visual Analytics for Governance and Policy Modelling

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Multivariate Data Visualization

Learning Analytics & Educational Data Mining FP7 Objective ICT b

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Interactive Data Mining and Visualization

CLUSTER ANALYSIS WITH R

A Proposal for the use of Artificial Intelligence in Spend-Analytics

Big Data Analytics. Chances and Challenges. Volker Markl

How To Create A Data Visualization

Investigating and Reflecting on the Integration of Automatic Data Analysis and Visualization in Knowledge Discovery

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

JustClust User Manual

A Short Introduction on Data Visualization. Guoning Chen

TIES443. Lecture 9: Visualization. Lecture 9. Course webpage: November 17, 2006

Visualisatie BMT. Introduction, visualization, visualization pipeline. Arjan Kok Huub van de Wetering

Predictive Analytics: Extracts from Red Olive foundational course

NStreamAware: Real-Time Visual Analytics for Data Streams to Enhance Situational Awareness

Business Intelligence. Data Mining and Optimization for Decision Making

High Performance Computing

Exploration and Visualization of Post-Market Data

Visual Analytics. Daniel A. Keim, Florian Mansmann, Andreas Stoffel, Hartmut Ziegler University of Konstanz, Germany

Transcription:

Interactive Visual Data Analysis in the Times of Big Data Cagatay Turkay * gicentre, City University London Who? Lecturer (Asst. Prof.) in Applied Data Science Started December 2013 @ the gicentre (gicentre.net) PhD @ VisGroup at Univ. of Bergen, Norway with Helwig Hauser MSc @ CGLab at Sabanci University, Istanbul, Turkey with Selim Balcisoy 1

gicentre 6 academics 3 postdocs 5 PhDs Research to date Integrating Computational Tools in Interactive Visual Analysis Methods Perceptually Optimized Visualization Information Theoretical Approach to Crowd Simulation 2

Interactive Visual Data Analysis in the Times of Big Data Data supported science Data analysis in almost all scientific fields Biology, medicine, astronomy, psychology, Data driven science Research in several fields Visualization Data Mining Machine Learning Statistics 3

The times of Big Data http://www.ibmbigdatahub.com/infographic/four-vs-big-data More V s added Volume Velocity Variety Veracity Visualisation Value big data on Google Trends, as of today http://www.google.com/trends/explore#q=big%20data 4

Getting the best of Big data Limited value in size! Real value in complexity Heterogeneous Several modalities Many variables Spatially & Temporally varying Visual data analysis can help! 5

Visualization? Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively. [Tamara Munzner, 2014] The use of computer-generated, interactive, visual representations of data to amplify cognition [Card, Mackinlay, & Shneiderman 1999] Visualization a mature field 6

Interactive Visual Analysis (a.k.a. Visual Analytics) the science of analytical reasoning facilitated by interactive visual interfaces. [Thomas and Cook, 2005] combining visualization, human factors and data analysis. [Keim et al., 2006] Combine the best of two worlds: human capabilities and computing power [Seo & Shneiderman, 2002] [Tableau Software] How visual analysis can help? Ease of interpretation Ease of communication Flexibly varying & relating multiple aspects Compare multiple computational outputs & communicate uncertainties Seamless integration of computation 7

1- Flexibly vary & relate multiple aspects Basics -- Multiple linked views Several visualizations Linking + brushing Combining selections Commercial success ComVis [Tableau Software] 8

Example : Analyzing Crime Statistics through Selection Combinations and Data Derivation Analyzing Crime Statistics Crime statistics over Birmingham Crime records Crime types Offender location Crime location Want to identify.. Hot spots Groups Trends / patterns 9

Crime Locations Offender Locations Difference in Location (derived) Offence types Day of week Other details 10

A key aspect Iterative process Learn through the process Evaluate & re-perform Hauser, 2013 Turkay, 2012 Sense making loop by Pirolli and Card, 2005 2- Compare multiple computational outputs & communicate uncertainties 11

Issues in computational analysis Computational tools have helped Several tasks: find structures, summarize, trends Rich set of methods, PCA, LDA, Clustering Have issues, unfortunately Not flexible Limited interaction Reliability (noisy data, assumptions) Visual analysis to: Compare multiple computational outputs & communicate uncertainties Example : Comparing and Characterizing Clustering Results in Cancer Subtype Analysis [ Turkay, Cagatay, et al.., Characterizing Cancer Subtypes Using Dual Analysis in Caleydo StratomeX IEEE computer graphics and applications 34.2 (2014): 38-47.] 12

Items 10/20/2014 Cluster analysis Cluster is a group of similar entities Several algorithms important to compare Challenging to evaluate/interpret Dimensions (variables) Thanks to Alexander Lex for the slides Multiple Clusterings (& Datasets) Cluster A1 C B1 Cluster A2 C B2 Cluster A3 Clustering 1 e.g., k-means Clustering 2, e.g., hierarchical 26 Thanks to Alexander Lex for the slides 13

Multiple Clusterings (& Datasets) Cluster A1 C B1 Cluster A2 C B2 Cluster A3 Clustering 1 e.g., k-means Clustering 2, e.g., hierarchical 27 Thanks to Alexander Lex for the slides Multiple Clusterings (& Datasets) Cluster A1 C B1 Dep. C1 Cluster A2 C B2 Dep. C2 Cluster A3 Clustering 1, e.g., k-means Clustering 2, Meta Data, e.g., hierarchical e.g. clinical data 28 Thanks to Alexander Lex for the slides 14

Gene Overlaps?? Patients (samples) 10/20/2014 Cancer types are not homogeneous, i.e., subtypes Subtypes are identified by clustering datasets, e.g., based on an expression pattern a mutation status a copy number alteration a combination of these Header / Summary of whole Stratification http://caleydo.org/ Genes Candidate Subtype / Heat Map Thanks to Alexander Lex for the slides Multiple Clustering Results Sample Overlaps Many shared Patients Clustering 1 Clustering 2 15

Integrated detail visualisations Only the member samples visualized Communicating (Un)Certainty Only the member samples used in computations Genes significantly different for a cluster 16

Some observations Compare many results to get support and increase trust Identify how and why results change Communicate uncertainties Applied to many other problems Prediction Simulations Booshehrian et al., 2012 Waser, 2012 17

3- Seamless integration of computation Seamless integration Best of human + computer the ultimate goal of VA! Computational algorithms implicit Embedded within interaction Basic interaction methods, e.g., select, pan, zoom, etc. Muhlbacher, 2014 (VAST) 18

Example : Dynamically generated visual (statistical) summaries for geographical data analysis [ Turkay, Cagatay, et al., Attribute Signatures: Dynamic Visual Summaries for Analyzing Multivariate Geographical Data, IEEE TVCG (InfoVis 2014)] Visualization to understand how things change with geography 19

ONE VARIABLE MULTIPLE VARIABLES? visualizations to look at many variables and computational support to generate comparative summaries for the visualizations 20

Example: Analysis of UK OA data UK Census of Population in 2001 and 2011 for the 181,000 Output Areas (OA) for 41 indicators 41 X 181,000 Integrated summary computation Variable comparison baseline (e.g., country or city avg.) Above baseline Below baseline Variation axis Each value is computed only for the selection Dynamic computations to compare 21

Along the Southwest England coast An interactive transect through London 22

More examples of integrations Multi-dimensional projection Regression analysis Janicke et al., 2008 Muhlbacher, T., Piringer, 2013 Finding structures Oliver et al., 2010 Time to wrap-up! 23

Lessons learned All starts with problems and related data Need to earn users trust! Visual analysis as a means to increase data s value Enhanced analysis through informed use of computation Interactive visual methods improve reliability and interpretability Tight integration enables quick hypothesis prototyping Important to communicate the certainty of the findings Looking into the future Understanding the real value of data Scalable analysis processes Improved methods for visual guidance Enable interactive response in seamless integrations Adapt to changing information characters & problems Dynamic data, e.g., data streams Prediction 24

Thank you! Helwig Hauser Peter Filzmoser gicentre folks: Aidan Slingsby, Jason Dykes, Jo Wood Arvid Lundervold, Astri Lundervold, Nathalie Reuter, the Caleydo Team 25