Data Visualization. Nils Gehlenborg (nils@hms.harvard.edu)




Data Visualization
Nils Gehlenborg (nils@hms.harvard.edu)
Center for Biomedical Informatics / Harvard Medical School
Cancer Program / Broad Institute of MIT and Harvard
ISMB/ECCB 2011

http://www.biovis.net Flyers at ISCB booth!

"A good sketch is better than a long speech." (Napoleon Bonaparte)

Minard 1869: Napoleon's March on Moscow

"I believe when I see it." (Unknown)

Anscombe 1973, The American Statistician: Anscombe's Quartet
Four data sets with nearly identical summary statistics:
- mean(x) = 9, var(x) = 11
- mean(y) = 7.5, var(y) = 4.12
- cor(x, y) = 0.816
- linear regression line: y = 3 + 0.5x

[Figure: Anscombe's Quartet, scatter plots of the four data sets (Anscombe 1973, The American Statistician)]
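The point of the quartet can be checked numerically: the summary statistics listed above come out almost identical for all four data sets, even though their scatter plots look nothing alike. A minimal Python sketch using only the standard library; the data values are Anscombe's published quartet, and `summarize` is an illustrative helper, not part of any library.

```python
from math import sqrt

# Anscombe's (1973) quartet: four x/y data sets with 11 points each.
# Data sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(v):
    return sum(v) / len(v)


def var(v):
    # Sample variance (n - 1 in the denominator).
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def summarize(x, y):
    """Summary statistics plus the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / var(x)
    return {
        "mean_x": mx,
        "var_x": var(x),
        "mean_y": my,
        "var_y": var(y),
        "cor": cov / sqrt(var(x) * var(y)),
        "intercept": my - slope * mx,
        "slope": slope,
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Every printed row shows mean(x) = 9, var(x) = 11, mean(y) of about 7.50, var(y) of about 4.12 to 4.13, cor(x, y) of about 0.816 and a regression line of roughly y = 3 + 0.5x; only plotting the points reveals how different the data sets are.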

Exploration: Hypothesis Generation
Patterns: trends, gaps, outliers, clusters
- A large data set is given and the goal is to learn something about it.
- Visualization is employed to detect patterns using the human visual system.
- The goal is to generate hypotheses that can be tested with statistical methods or follow-up experiments.

Visualization Use Cases: Presentation, Confirmation, Exploration

Definition
"The use of computer-supported, interactive, visual representations of data to amplify cognition." (Stu Card, Jock Mackinlay & Ben Shneiderman)
"Computer-based visualization systems provide visual representations of datasets intended to help people carry out some task more effectively." (Tamara Munzner)

Tasks (Rule #1 - Know Your Users: User-centered design)

Low-level analytical tasks: Amar et al. 2005, Proceedings of InfoVis 2005
Tasks in Gene Expression Analysis
- Task 1: Determine the expression level of a given gene in a given sample (retrieve value).
  - Visualization: Provide the whole profile as context for the particular measurement.
- Task 2: Determine the range of expression levels in a given profile and how much they vary across the profile (extrema, range, characterize distribution).
  - Visualization: Present the profile so that the range and distribution of expression levels can be evaluated efficiently.

[Figure: Tasks in Gene Expression Analysis; alternative plots of one expression profile, log expression ratio vs. time (0 to 119 min)]
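Both tasks can be phrased as computations on a single profile, and a good visualization lets the user read those quantities off directly. A minimal Python sketch, assuming a made-up profile on the slides' 0 to 119 min time axis; the helper names are hypothetical, and the crude text plot only stands in for a real chart that keeps the whole profile visible around the retrieved value.

```python
from statistics import median

# Hypothetical profile: log expression ratios at 0, 7, ..., 119 min (the time
# axis used on the slides). The values are invented for illustration only.
TIMES = list(range(0, 120, 7))
PROFILE = [0.0, 0.4, 1.1, 1.6, 1.2, 0.5, -0.2, -0.9, -1.3,
           -0.8, -0.1, 0.6, 1.0, 0.7, 0.2, -0.4, -0.7, -0.3]


def retrieve_value(times, profile, t):
    """Task 1 (retrieve value): expression level at time point t."""
    return profile[times.index(t)]


def characterize(profile):
    """Task 2 (extrema, range, characterize distribution) for one profile."""
    lo, hi = min(profile), max(profile)
    return {"min": lo, "max": hi, "range": hi - lo, "median": median(profile)}


def text_profile(times, profile, highlight=None):
    """Crude text plot, one row per time point, so that a retrieved value is
    always shown in the context of the whole profile (marked with '<')."""
    rows = []
    for t, v in zip(times, profile):
        bar = ("+" if v >= 0 else "-") * round(abs(v) * 10)
        mark = " <" if t == highlight else ""
        rows.append(f"{t:4d} min {v:5.1f} {bar}{mark}")
    return "\n".join(rows)


print(retrieve_value(TIMES, PROFILE, 21))  # 1.6
print(characterize(PROFILE))
print(text_profile(TIMES, PROFILE, highlight=21))
```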

Visual Representation

Data Types (slide adapted from Munzner 2011, Visualization Principles)
- tabular
  - categorical: apples, oranges, bananas
  - ordered
    - ordinal: small, medium, large
    - quantitative: 10 inches, 13 inches, 18.5 inches
- relational: trees, networks
- spatial: intrinsic position

Data Types: tabular data (categorical, ordinal, quantitative) and relational data are abstract; spatial data has an intrinsic position.
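One way to make the distinction concrete is to check which operations make sense for each type. A minimal Python sketch using the example values from the slides; the mapping of operations to types follows the usual categorical/ordinal/quantitative distinction, and the small tree is a made-up example of relational data.

```python
# Categorical: only equality tests are meaningful (no inherent order).
fruits = ["apples", "oranges", "bananas", "apples"]
n_apples = sum(1 for f in fruits if f == "apples")

# Ordinal: values can be ranked, but differences between them are undefined.
SIZE_ORDER = ["small", "medium", "large"]
sizes = ["large", "small", "medium", "small"]
sizes_sorted = sorted(sizes, key=SIZE_ORDER.index)

# Quantitative: differences (and ratios) are meaningful.
lengths_in = [10, 13, 18.5]
spread_in = max(lengths_in) - min(lengths_in)

# Relational: items plus links between them (a small tree as an adjacency list).
tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
n_edges = sum(len(children) for children in tree.values())

print(n_apples, sizes_sorted, spread_in, n_edges)
# 2 ['small', 'small', 'medium', 'large'] 8.5 3
```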

Marks (geometric primitives): points, lines, areas

Visual Channels (appearance of marks): position, size, shape, angle, color (hue, saturation), texture

[Figure: Ranking of Encodings; channels ranked from best to worst encoding for each data type. Munzner 2009, in Fundamentals of Computer Graphics (redrawn from Mackinlay 1986)]

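[Figure: Rankings in Action; categories A, B, C, D compared between Year 1 and Year 2]

[Figure: Rankings in Action (continued); categories A, B, C, D in Year 1 and Year 2, plotted against a common value axis from 0 to 27]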

Ranking of Encodings
- How accurately can the data be read from the visualization?
- How many classes can be distinguished?
- Can the channels be separated from each other?
- Which channels are processed preattentively?
Principle of Importance Ordering (Mackinlay 1986): Encode more important information more effectively.
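The principle of importance ordering can be sketched as a greedy assignment: walk through the data fields from most to least important and give each one the best-ranked channel for its data type that is still free. A minimal Python sketch, assuming the channel rankings as they are commonly reproduced from Mackinlay (1986); `assign_channels` and the example field names are hypothetical, and a real design tool would also weigh the other questions above (distinguishable classes, separability, preattentive processing).

```python
# Channel rankings per data type, best first, following Mackinlay's (1986)
# ranking as it is commonly reproduced (he calls categorical data "nominal").
RANKINGS = {
    "quantitative": ["position", "length", "angle", "slope", "area", "volume",
                     "density", "saturation", "hue", "texture", "connection",
                     "containment", "shape"],
    "ordinal": ["position", "density", "saturation", "hue", "texture",
                "connection", "containment", "length", "angle", "slope",
                "area", "volume", "shape"],
    "categorical": ["position", "hue", "texture", "connection", "containment",
                    "density", "saturation", "shape", "length", "angle",
                    "slope", "area", "volume"],
}


def assign_channels(fields, position_slots=2):
    """Greedy importance ordering (hypothetical helper).

    fields: (name, data_type) pairs, most important first. Each field gets the
    best-ranked channel for its type that is still free. Position can be used
    `position_slots` times (x and y axes); every other channel only once.
    """
    remaining = {"position": position_slots}
    assignment = {}
    for name, dtype in fields:
        for channel in RANKINGS[dtype]:
            if remaining.get(channel, 1) > 0:
                assignment[name] = channel
                remaining[channel] = remaining.get(channel, 1) - 1
                break
    return assignment


fields = [("expression", "quantitative"), ("time", "quantitative"),
          ("cluster", "categorical"), ("significance", "ordinal")]
print(assign_channels(fields))
# {'expression': 'position', 'time': 'position', 'cluster': 'hue', 'significance': 'density'}
```

Because the most important fields are placed first, they claim the most accurate channels (position), and less important fields fall back to channels lower in the ranking.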

Interaction

Information Seeking Mantra (Shneiderman 1996, in Proceedings IEEE Symposium on Visual Languages)
- In exploratory settings the user is normally dealing with large amounts of data.
- It is impossible to grasp everything at once.
- Solution: Make visualizations interactive to support the user in exploring subsets of the data at different resolutions.
- Ben Shneiderman's Information Seeking Mantra: "Overview first, zoom and filter, then details on demand."
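The three steps of the mantra map naturally onto three kinds of queries against the data behind a view. A minimal Python sketch, assuming a tiny hypothetical gene table; the record fields, gene names and function names are invented for illustration, and a real tool would render each result rather than print it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    position: int      # coordinate on the chromosome
    log_ratio: float   # log expression ratio


# Hypothetical records, invented for illustration.
GENES = [
    Gene("geneA", "chr1", 1_200, 2.1),
    Gene("geneB", "chr1", 5_400, -0.2),
    Gene("geneC", "chr1", 9_800, -1.7),
    Gene("geneD", "chr2", 3_300, 0.9),
    Gene("geneE", "chr2", 7_700, -2.4),
]


def overview(genes):
    """Overview first: a coarse summary of everything (genes per chromosome)."""
    return Counter(g.chrom for g in genes)


def zoom_and_filter(genes, chrom, start, end, min_abs_ratio=0.0):
    """Zoom into one region, then filter out weakly changed genes."""
    return [g for g in genes
            if g.chrom == chrom and start <= g.position < end
            and abs(g.log_ratio) >= min_abs_ratio]


def details_on_demand(genes, name):
    """Details on demand: the full record for the item the user picks."""
    return next(g for g in genes if g.name == name)


print(overview(GENES))  # Counter({'chr1': 3, 'chr2': 2})
print([g.name for g in zoom_and_filter(GENES, "chr1", 0, 10_000, 1.0)])  # ['geneA', 'geneC']
print(details_on_demand(GENES, "geneC"))
```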

Linked Views (Roberts 2007, Coordinated and Multiple Views in Exploratory Visualization)
- Go beyond static views: multiple, interactive, linked views of the data.
- Allow the user to have a dialog with the data.
- A technique that supports data exploration.

[Figure: large pse outliers; scatter plot with both axes ranging from 0.1 to 0.9. Slide from Miriah Meyer @ Utah; images courtesy of Angela DePace and Charles Fowlkes]


Meyer et al. 2010, MulteeSum: A Tool for Comparative Spatial and Temporal Gene Expression Data
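Linked views are commonly built around a selection shared by all views: brushing items in one view updates the selection, and every other view highlights the same items. A minimal Python sketch of that idea using a simple observer pattern; `SelectionModel` and `View` are hypothetical classes, not the API of MulteeSum or any toolkit, and the views only record highlights instead of drawing.

```python
class SelectionModel:
    """Selection shared by all linked views (a simple observer pattern)."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def select(self, ids, source):
        self.selected = set(ids)
        # Notify every view except the one where the user made the selection.
        for view in self._views:
            if view is not source:
                view.on_selection_changed(self.selected)


class View:
    """Stand-in for a plot; it only records which items it highlights."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.highlighted = set()
        model.attach(self)

    def brush(self, ids):
        """The user brushes items in this view."""
        self.highlighted = set(ids)
        self.model.select(ids, source=self)

    def on_selection_changed(self, ids):
        self.highlighted = set(ids)
        print(f"{self.name}: highlight {sorted(ids)}")


model = SelectionModel()
scatter = View("scatter plot", model)
heat_map = View("heat map", model)
profiles = View("profile plot", model)
scatter.brush({"g2", "g5"})
# heat map: highlight ['g2', 'g5']
# profile plot: highlight ['g2', 'g5']
```

Keeping the selection in one shared object means each view only needs to know how to highlight items, not which other views exist.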

Implementation

Considerations
- Rendering: render loop or event-driven?
- Platform: performance, deployment, UI support, libraries, plugin vs. application
- Backend: local or remote, database or files, web service, HTTP/FTP, in memory, etc.
- Data summaries: precompute or compute on the fly?
[Figure: two architectures, each consisting of a renderer and a backend]
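The "precompute or compute on the fly" question for data summaries can be illustrated with per-bin summaries at several zoom levels. A minimal Python sketch, assuming a hypothetical 1D signal and (min, max) per bin as the summary; the zoom levels and function names are illustrative choices, not a prescribed design.

```python
from functools import lru_cache

# Hypothetical 1D signal with 1,024 values (e.g. a measurement along a genome).
SIGNAL = tuple((i * 37) % 101 for i in range(1024))


def summarize_bins(values, bin_size):
    """(min, max) per bin: enough to draw a faithful overview at a coarse zoom."""
    return tuple((min(values[i:i + bin_size]), max(values[i:i + bin_size]))
                 for i in range(0, len(values), bin_size))


# Option 1: precompute summaries for a fixed set of zoom levels up front.
PYRAMID = {size: summarize_bins(SIGNAL, size) for size in (4, 16, 64, 256)}


# Option 2: compute a zoom level on the fly the first time it is requested,
# then serve it from a cache.
@lru_cache(maxsize=None)
def summaries_on_the_fly(bin_size):
    return summarize_bins(SIGNAL, bin_size)


print(len(PYRAMID[256]), PYRAMID[256][0])  # 4 (0, 100)
print(summaries_on_the_fly(16) == PYRAMID[16])  # True
```

Precomputing costs memory and start-up time but makes every listed zoom level instant; computing on the fly supports arbitrary bin sizes at the price of a delay the first time each one is requested.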

Platforms and APIs
- Java: Java 2D, Java 3D, JOGL (OpenGL)
- JavaScript: SVG, HTML5 Canvas, WebGL ("OpenGL light")
- Flash + ActionScript: hardware-accelerated rendering in version 11
- Others: deployment? UI support? cross-platform compatibility?

Visualization Toolkits for the Web (* indicates a high-level visualization library)
- Java applets: Processing, Prefuse*
- Flash: Flare*
- JavaScript
  - SVG: Google Chart Tools*, Protovis/D3.js*, Raphael
  - HTML5 Canvas: Flot*, TheJIT, Three.js, ProcessingJS
  - WebGL: Three.js, PhiloGL

Take Home Message
- Carefully analyze the tasks that need to be supported.
- Make informed decisions about your visual encodings.
- Use interaction and multi-scale approaches to get a handle on the data size.
- Choose your platform wisely.
- Visualization is science, not art.

Acknowledgements (Slides & Ideas): Miriah Meyer (University of Utah), Tamara Munzner (University of British Columbia)

Resources

Scientific and Information Visualization
- Scientific Visualization ("scivis") and Information Visualization ("infovis") are very ill-defined terms.
- Scientific Visualization is often used to describe visualization of data that is intrinsically spatial (such as medical imaging data, fluid flows or protein structures).
- Information Visualization is typically used to describe visualization of abstract data (such as gene expression data or interaction networks).
- There is plenty of overlap, and the separation is quite arbitrary.
- Both Scientific and Information Visualization are used to visualize scientific data.

Recommended Books
- Information Visualization: Perception for Design. Colin Ware, Morgan Kaufmann, 2004
- Information Visualization: Using Vision to Think. Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman, Morgan Kaufmann, 1999
- The Visual Display of Quantitative Information (2nd Edition). Edward R. Tufte, Graphics Press, 2001

Recommended Books
- Fundamentals of Computer Graphics (3rd Edition). Peter Shirley, Steve Marschner, AK Peters Publishers, 2009 (in particular Chapter 27, Visualization, also available as a free PDF from Tamara Munzner's website)
- The Non-Designer's Design Book (3rd Edition). Robin Williams, Peachpit Press, 2008

Recommended Resources on Color
- A Field Guide to Digital Color. Maureen C. Stone, AK Peters Publishers, 2003
- ColorBrewer 2.0. Cynthia Brewer, Mark Harrower: http://www.colorbrewer2.org
- VisCheck: http://www.vischeck.com
- Color Oracle: http://colororacle.cartography.ch

Recommended Journals
- Nature Methods Special Issue on Visualizing Biological Data: http://www.nature.com/nmeth/journal/v7/n3s
- Nature Methods "Points of View" column by Bang Wong: http://bang.clearscience.info/?p=546
- IEEE Transactions on Visualization and Computer Graphics: http://www.computer.org/portal/web/tvcg
- IEEE Computer Graphics and Applications: http://www.computer.org/portal/web/cga/home

Recommended Meetings
- IEEE Symposium on Biological Data Visualization (BioVis): http://www.biovis.net
- Workshop on Visualizing Biological Data (VIZBI): http://www.vizbi.org
- IEEE VisWeek with InfoVis, Vis and VAST Conferences: http://www.visweek.org

Gehlenborg et al. 2010, Nature Methods: Tools for Interaction Network Visualization
(Name (Cost; Availability): Description. URL)
Stand-alone
- Arena 3D (Free; Win, Mac, Linux): Visualization of biological multi-layer networks in 3D. http://www.arena3d.org
- BiNA (Free; Win, Mac, Linux): Exploration and interactive visualization of pathways. http://www.bnplusplus.org/bina
- BioLayout Express 3D (Free; Win, Mac, Linux): Generation and cluster analysis of networks with 2D/3D visualization. http://www.biolayout.org
- BiologicalNetworks 2 (Free; Win, Mac, Linux): Analysis suite; visualizes networks and heat maps; maps abundance data. http://www.biologicalnetworks.org
- Cytoscape (Free; Win, Mac, Linux): Network analysis; extensive list of plug-ins for advanced visualization. http://www.cytoscape.org
- GENeVis (Free; Win, Mac, Linux): Network and pathway visualization; abundance data. http://tinyurl.com/genevis
- Medusa (Free; Win, Mac, Linux): Basic network visualization tool. http://coot.embl.de/medusa
- NBrowse (Free; Win, Mac, Linux): Network visualization software for heterogeneous interaction data. http://www.gnetbrowse.org
- NAViGaTOR (Free; Win, Mac, Linux): Visualization of large protein-protein interaction data sets; abundance data. http://tinyurl.com/navigator1
- Ondex (Free; Win, Mac, Linux): Integrative workbench; large network visualizations; abundance data. http://www.ondex.org
- Osprey (Free; Win, Mac, Linux): Tool for visualization of interaction networks. http://tinyurl.com/osprey1
- Pajek (Free; Win): Generic network visualization and analysis tool. http://pajek.imfm.si
- ProViz (Free; Win, Mac, Linux): Software for visualization and exploration of interaction networks. http://tinyurl.com/proviz
- SpectralNET (Free; Win): Network visualizations; scatter plots for dimensionality reduction methods. http://tinyurl.com/spectralnet
- Tulip (Free; Win, Mac, Linux): Generic visualization and analysis tool; extremely large networks; 3D support. http://tulip.labri.fr/tulipdrupal
- VANTED (Free; Win, Mac, Linux): Combined visualization of abundance data and pathways. http://tinyurl.com/vanted
- yEd (Free; Win, Mac, Linux): Generic network visualization software; offers many layout algorithms. http://tinyurl.com/yedgraph
Cytoscape Plug-ins
- BiNoM (Free; Win, Mac, Linux): Extensive support for common systems biology network formats. http://tinyurl.com/binom1
- BioModules (Free; Win, Mac, Linux): Detects modules in networks; maps abundance data onto nodes and modules. http://tinyurl.com/biomodules
- Cerebral (Free; Win, Mac, Linux): Biologically motivated layout algorithm; maps abundance data; clustering. http://tinyurl.com/cerebral1
- MCODE (Free; Win, Mac, Linux): Network clustering algorithm; support for manual cluster refinement. http://preview.tinyurl.com/mcode123
- VistaClara (Free; Win, Mac, Linux): Mapping of abundance data to nodes and "heat strips"; provides heat map. http://www.cytoscape.org/plugins
Web-based
- Graphle (Free): Distributed client/server network exploration and visualization tool. http://tinyurl.com/graphle
- Lichen (Free): Library for web-based visualization of network and abundance matrix data. http://tinyurl.com/lichen1
- MAGGIE Data Viewer (Free): Visualization of networks; abundance data in heat maps and profile plots. http://maggie.systemsbiology.net
- STITCH 2 (Free): Construction and visualization of networks from a wide range of sources. http://stitch.embl.de
- VisANT (Free; Win, Mac, Linux): Analysis, mining and visualization of pathways and integrated omics data. http://visant.bu.edu

Gehlenborg et al. 2010, Nature Methods: Tools for Pathway Visualization
(Name (Cost; Availability): Description. URL)
Stand-alone
- BioTapestry (Free; Win, Mac, Linux): Visualization of genetic regulatory networks, also with experimental data. http://www.biotapestry.org
- Caleydo (Free; Win, Linux): Interactive framework for pathway and expression data; 3D bucket view. http://www.caleydo.org
- CellDesigner (Free; Win, Mac, Linux): Drawing and simulation of pathways and models; supports SBGN. http://www.celldesigner.org
- Edinburgh Pathway Editor (Free; Win, Mac, Linux): Construction and visualization of pathway diagrams; supports SBGN. http://tinyurl.com/edinburghpe
- GenMAPP 2 (Free; Win): Pathway visualization and construction; abundance data. http://www.genmapp.org
- IngenuityPathways ($; Win, Mac, Linux): Full analysis suite; network and pathway visualizations; abundance data. http://tinyurl.com/ingenuitypath
- JDesigner (Free; Win): Drawing and simulation of pathways and models. http://tinyurl.com/jdesigner
- KaPPA View (Free; Win): Analysis and visualization of plant pathways and mapped abundance data. http://tinyurl.com/kappa-view
- KEGG Atlas (Free; Win, Mac, Linux): Visualization of abundance data on interactive KEGG pathways. http://www.genome.jp/kegg
- MetaCore ($; Win, Mac, Linux): Pathway, network and omics data analysis and visualization suite. http://www.genego.com
- PathVisio (Free; Win, Mac, Linux): Visualization and editing of pathways; supports mapping of omics data. http://www.pathvisio.org
- VitaPad (Free; Win, Mac, Linux): Editing of pathway diagrams; integration of abundance data. http://tinyurl.com/vitapad
Web-based
- ArrayXPath (Free): Mapping of abundance data to pathway visualizations. http://tinyurl.com/arrayxpath
- GEPA (Free): Analysis suite; visualization of transcriptomics data on pathway maps. http://tinyurl.com/gepat1
- ipath (Free): Visualization and exploration of combined KEGG pathways. http://pathways.embl.de
- MapMan (Free): Application that visualizes abundance data on metabolic pathways. http://tinyurl.com/mapmanapp
- Omics Viewer (Free): Tool that maps abundance data to BioCyc pathway diagrams. http://www.biocyc.org
- Pathway Explorer (Free): Visualization of abundance data on pathways. http://tinyurl.com/pathwayexp
- PATIKA (Free): Extensive pathway visualization tool; good support for signaling pathways. http://www.patika.org
- Payaologue (Free): Collaborative pathway annotation and visualization tool. http://celldesigner.org/payao
- ProMeTra (Free): Maps abundance matrices of multiple omics data types on pathways. http://tinyurl.com/prometra
- Reactome SkyPainter (Free): Visualization of overrepresented pathways and reactions from gene lists. http://reactome.org
- WikiPathways (Free): Wiki-based, community-driven pathway curation and visualization tool. http://www.wikipathways.org

Gehlenborg et al. 2010, Nature Methods: Tools for Visualization of Multivariate Data
(Name (Cost; OS): Description. URL)
Stand-alone
- BicOverlapper (Free; Win, Mac, Linux): Visualization of biclusters combined with profile plots and heat maps. http://vis.usal.es/bicoverlapper/
- BiGGEsTS (Free; Win, Mac, Linux): Heat map-based bicluster visualization. http://tinyurl.com/biggests
- Brain Explorer (Free; Win, Mac): Visualization of 3D transcription data in the central nervous system. http://tinyurl.com/brainexplorer
- Caryoscope (Free; Win, Mac, Linux): Abundance data mapped to chromosomal location. http://tinyurl.com/caryoscope
- Data Matrix Viewer (Free; Win, Mac, Linux): Simple profile plot visualization; supports Gaggle. http://gaggle.systemsbiology.net
- EXPANDER (Free; Win, Linux): Heat maps, scatter plots and profile plots of cluster averages. http://acgt.cs.tau.ac.il/expander
- GENESIS (Free; Win, Mac, Linux): Analysis suite; offers several interactive visualizations. http://genome.tugraz.at
- GeneSpring GX ($; Win, Mac, Linux): Analysis suite; interactive and linked visualizations; also networks. http://tinyurl.com/genespring
- GeneVAnD (Free; Win, Mac, Linux): Linked heat maps, dendrograms and 2D/3D scatter plots. http://tinyurl.com/genevand
- geworkbench (Free; Win, Mac, Linux): Modular suite; heat maps, dendrograms, profile and scatter plots. http://tinyurl.com/geworkbench
- Hierarchical Clustering Explorer (Free; Win): Linked heat map, profile and scatter plots; systematic exploration. http://tinyurl.com/hcexplorer
- Java TreeView (Free; Win, Mac, Linux): Linked heat maps, karyoscopes, sequence alignments, scatter plots. http://jtreeview.sourceforge.net
- Mayday (Free; Win, Mac, Linux): Modular suite; many linked visualizations; enhanced heat map. http://tinyurl.com/maydaywp
- MultiExperiment Viewer (Free; Win, Mac, Linux): Analysis suite; heat maps, dendrograms, profile and scatter plots. http://www.tm4.org
- PointCloudXplore (Free; Win, Mac, Linux): Visualization of 3D transcription data in Drosophila embryos. http://tinyurl.com/pointcloudxplore
- Spotfire Functional Genomics ($; Win): Analysis suite; many linked visualizations and exploration tools. http://spotfire.tibco.com
- TimeSearcher (Free; Win): Exploration and analysis of time series; advanced profile plots. http://tinyurl.com/timesearcher
R/BioConductor
- Geneplotter (Free; Win, Mac, Linux): Karyoscope-style plots and other visualizations. http://www.bioconductor.org
Web-based
- ExpressionProfiler (Free): Transcriptomics data analysis suite with basic visualizations. http://tinyurl.com/exprespro
- GenePattern (Free): Modular analysis platform; several visualization modules available. http://tinyurl.com/genepatt