Graphics - an Ace up a Statistician's Sleeve




Heike Hofmann

Bad graphics
Beginning of Statistical Graphics
Milestones in Graphics
Interactive Graphics

BAD Graphics
Guidelines for a bad graphic (Howard Wainer):
  don't show much data
  show the data inaccurately
  obfuscate the data
Criteria for bad graphics (Edward Tufte):
  Lie Factor = size of effect in graphic / size of effect in data

Lie Factor: Increase in Mileage
The Lie Factor (from Tufte, 1983, p. 57); gif image by Clay Helberg, Pitfalls of Data Analysis.
This graph, from the NY Times, purports to show the mandated fuel economy standards set by the US Department of Transportation. The standard required an increase in mileage from 18 to 27.5, an increase of 53%. The magnitude of increase shown in the graph is 783%, for a whopping lie factor of 783/53 = 14.8!
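The arithmetic behind this lie factor can be checked with a short sketch. The function names are illustrative and not from the slides; the 783% effect in the graphic is taken as given from the slide rather than measured from the image.

```python
def relative_change(start, end):
    """Relative change from start to end, e.g. 0.5 for a 50% increase."""
    return (end - start) / start


def lie_factor(effect_in_graphic, effect_in_data):
    """Tufte's Lie Factor: size of effect in graphic / size of effect in data."""
    return effect_in_graphic / effect_in_data


# Mileage standard rises from 18 to 27.5 (numbers from the slide).
data_effect = relative_change(18.0, 27.5)
# The slide reports a 783% increase in the drawn graphic; taken as given here.
graphic_effect = 7.83

print(f"effect in data:    {data_effect:.0%}")
print(f"effect in graphic: {graphic_effect:.0%}")
print(f"lie factor:        {lie_factor(graphic_effect, data_effect):.1f}")
```

A graphic that represents the data faithfully has a lie factor of 1; values far above or below 1 indicate distortion.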

BAD Graphics
Goals for a bad graphic (Howard Wainer):
  don't show much data
  show the data inaccurately
  obfuscate the data
Criteria for bad graphics (Edward Tufte):
  Lie Factor = size of effect in graphic / size of effect in data
  Data-Ink Ratio = data ink / total ink used in graphic

Worst Graphic ever... in print (Tufte)
Age Structure of College Enrollment: Art or Artifice?
As a substitute for substance, one can try lots of color, 3D effects, or disguised redundancy. This graph uses all three techniques to display just five numbers. Note the clever use of mirror-imaging -- the top series is just (100 - the bottom series) -- and the interesting use of curved lines, front and back, to avoid the appearance that there's a lot less here than meets the eye. Tufte (1983, p. 118) says: "This may well be the worst graphic ever to find its way into print."

Beginnings of Graphics

Beginnings of Statistical Graphics
William Playfair (1759-1823)
  Scottish economist
  author of The Commercial and Political Atlas (1786)
  includes 44 charts: time series plots, one bar chart
  simple in design, yet data rich
Joseph Minard (1781-1870)
  Mathematician
  École Nationale des Ponts et Chaussées (ENPC)
  1844-1870 draws maps and data flow graphs

2003 WNAR/IMS meeting
Playfair: Price of Wheat
Price of a quarter of wheat (28 pounds) from 1565 to 1821, compared with weekly wages, with a timeline of the reigns of different rulers

Minard: Napoleon's Russian Campaign 1812
6-dimensional data on the army:
  geographic location
  size of army
  time
  temperature
  direction of movement

Overlaid Maps
Cholera Outbreak in Central London, September 1854
Dr John Snow plotted deaths as dots and water pumps as crosses

Overlaid Maps
Armoring Airplanes during WWII
Abraham Wald was challenged to add extra armor to airplanes, based on the pattern of bullet holes in returning aircraft
Wald determined where the returning planes had been shot
Conclusion: put extra armor everywhere else! (Planes hit in those places did not return.)

Train schedule: Paris - Lyon
Marey's Plot (1880)
today's TGV

Modern Dark Ages (1900-1949)
  only a few innovations
  rise of "classical" statistics: distributions, hypothesis tests, parameter estimates, ...
Re-Birth of Statistical Graphics (1950-1974)
  John W. Tukey: variety of new simple graphics, Exploratory Data Analysis
  Jacques Bertin: Sémiologie Graphique, organizes visual and perceptual elements of graphics
    http://viscog.beckman.uiuc.edu/djs_lab/demos.html
  Computers become available

High Dimensional, High Interaction Graphics
With increasing computer power, processing of high dimensional data becomes possible
High-interaction graphics with new paradigms: selection, linked highlighting, brushing, logical zooming
New methods:
  for continuous variables: Scatterplot Matrix, Grand Tour + Projection Pursuit, Parallel Coordinate Plots
  for categorical variables: Mosaic Plots, Tree Maps
Ever expanding application to new areas
Wide range of commercial and free software:
  DataDesk, Spotfire, Statistica, JMP, Visual Insights
  GGobi, Manet, Mondrian

Biplots & Grand Tour
Biplots (Gabriel 1971)
  idea: scatterplot of 1st & 2nd principal component, add original variables as lines
  [Figure: biplot with axes 1st pc and 2nd pc; variables %weaver 18, %unknown 18, %patrician 18, %merchants 18, %women 18, %widow 18, %baker 18, %goldsmith 18, %textiles 18]
Grand Tour (Asimov 1984)
  walk along a path of ALL POSSIBLE d-dimensional projections
  additional indices for optimization: Projection Pursuit

Parallel Coordinates
allow high dimensional visualization of data (Ed Wegman, Al Inselberg)
Non-Euclidean Geometry: points map to lines and lines map to points
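The point-to-line and line-to-point correspondence can be sketched for two variables. The sketch assumes two vertical axes placed at horizontal positions 0 and d; the function names and the example line are illustrative, not from the slides. Each data point (x1, x2) is drawn as a line joining height x1 on the first axis to height x2 on the second. All points lying on one line x2 = m*x1 + b (with m != 1) then produce lines that meet in a single point.

```python
def height_at(x1, x2, t, d=1.0):
    """Height at horizontal position t of the line drawn for the point (x1, x2).

    The X1 axis stands at horizontal position 0, the X2 axis at position d.
    """
    return x1 + (t / d) * (x2 - x1)


def dual_point(m, b, d=1.0):
    """Common intersection of all lines drawn for points on x2 = m*x1 + b."""
    if m == 1:
        raise ValueError("slope 1 gives parallel lines with no finite intersection")
    return d / (1 - m), b / (1 - m)


# Illustrative line x2 = -x1 + 2 (an assumption for the demo).
m, b = -1.0, 2.0
t, h = dual_point(m, b)
for x1 in [0.0, 1.0, 2.0, 5.0]:
    x2 = m * x1 + b
    # Every line for a point on the data line passes through (t, h).
    print(x1, x2, height_at(x1, x2, t), "expected", h)
```

For negative slopes the intersection falls between the two axes, which is why negatively correlated variables show up as a crossing pattern in a parallel coordinate plot.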

Mosaic Plots
Visualization of high-dimensional contingency tables (Hartigan & Kleiner)
further development (Friendly) and variations (Hofmann)
area based plots: one rectangle for each cell in the table, area is proportional to cell size
Variation: Double Decker Plot
[Figures: mosaic plot and double decker plot of Class (First, Second, Third, Crew), Age (Adult, Child) and Sex (Female, Male)]
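How the rectangle areas come about can be sketched for a two-way table. This is a minimal sketch, not the construction used by any particular software: the unit square is first split horizontally by the row totals, then each strip is split vertically by the conditional proportions. The counts and labels are made up for illustration, and every row total is assumed to be positive.

```python
def mosaic_rectangles(table):
    """Return {(row, col): (x, y, width, height)} inside the unit square.

    table maps row label -> {column label -> count}.
    """
    total = sum(sum(cols.values()) for cols in table.values())
    rects = {}
    x = 0.0
    for row, cols in table.items():
        row_total = sum(cols.values())
        width = row_total / total          # marginal proportion of the row
        y = 0.0
        for col, count in cols.items():
            height = count / row_total     # conditional proportion within the row
            rects[(row, col)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Hypothetical counts, for illustration only.
counts = {"A": {"yes": 30, "no": 10}, "B": {"yes": 20, "no": 40}}
for cell, (x, y, w, h) in mosaic_rectangles(counts).items():
    print(cell, "area =", round(w * h, 3))
```

Because width is the marginal proportion and height the conditional proportion, their product is the cell's share of the total, which is the area property stated on the slide.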

Tree Maps
Ben Shneiderman
Splits on the same level can be according to different variables
Not all cells are on the same level
Aspect ratio optimized (close to 1): squares are easier to compare than skew rectangles
Green shading indicates development of stock
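The basic idea of a tree map can be sketched with the simple slice-and-dice layout, which alternates the split direction at each level of the hierarchy. Note that this sketch does not optimize the aspect ratio as described above; it only shows how leaf areas become proportional to their weights when leaves sit at different depths. The tree, its labels and its weights are hypothetical.

```python
def weight(node):
    """Total weight of a node: a number for a leaf, a dict of children otherwise."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def treemap(node, x=0.0, y=0.0, w=1.0, h=1.0, horizontal=True, path=()):
    """Slice-and-dice layout: return {leaf path: (x, y, width, height)}."""
    if not isinstance(node, dict):
        return {path: (x, y, w, h)}
    total = weight(node)
    rects = {}
    offset = 0.0
    for label, child in node.items():
        share = weight(child) / total
        if horizontal:   # split the width among the children
            rects.update(treemap(child, x + offset * w, y, share * w, h,
                                 False, path + (label,)))
        else:            # split the height among the children
            rects.update(treemap(child, x, y + offset * h, w, share * h,
                                 True, path + (label,)))
        offset += share
    return rects


# Hypothetical portfolio; leaves sit at different depths.
portfolio = {"tech": {"a": 4, "b": 2}, "energy": 3,
             "bank": {"c": 1, "d": {"e": 1, "f": 1}}}
for leaf, (x, y, w, h) in treemap(portfolio).items():
    print("/".join(leaf), "area =", round(w * h, 4))
```

Slice-and-dice tends to produce long thin rectangles for deep or unbalanced trees, which is the problem that aspect-ratio optimized layouts address.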

Where do new ideas come from?
stimulated by applications, new data types
here: network data
Network Graphs (Graham Wills)
problems:
  overview vs close-ups
  layout

Application: Gene Expression Data
Experimental setup:
  2 genotypes: wildtype (WT), growth-impaired mutant
  2 treatments: cure added to soil or not
  2 replicates each
  [Figure: 2 x 2 design table, cure added? (no, yes) by genotype (WT, mutant)]
Goal: identify genes with changes in gene expression due to treatment or genotype or both
Classical ANOVA Problem!

ANOVA Model
Model setup (for each gene):
  Y_{ijk} = \mu + \lambda^C_i + \lambda^T_j + \lambda^{CT}_{ij} + \varepsilon_{ijk}
where
  Y_{ijk}            gene expression level
  \mu                base expression level (average)
  \lambda^C_i        effect of cure (i: no, yes)
  \lambda^T_j        effect of genotype (j: WT, mutant)
  \lambda^{CT}_{ij}  interaction effect of cure & genotype
  \varepsilon_{ijk}  error term for replicate k
Compute F statistics, get P-values -> P-values of < 5% show significant effects... or NOT??
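For a single gene, the F statistics of this model can be sketched for the balanced 2 x 2 design with 2 replicates per cell. The expression values below are hypothetical, and the function name and the generic factor labels in the result are illustrative. The step from F statistics to P-values is left out because it needs the F distribution (here with 1 and 4 degrees of freedom), for example from a statistics library such as SciPy.

```python
from itertools import product


def two_way_anova_f(cells):
    """F statistics for a balanced two-way layout with interaction.

    cells maps (level_a, level_b) -> list of replicate values,
    with the same number of replicates (at least 2) in every cell.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    na, nb = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))          # replicates per cell

    def mean(xs):
        return sum(xs) / len(xs)

    grand = mean([y for ys in cells.values() for y in ys])
    cell_mean = {key: mean(ys) for key, ys in cells.items()}
    mean_a = {a: mean([cell_mean[a, b] for b in levels_b]) for a in levels_a}
    mean_b = {b: mean([cell_mean[a, b] for a in levels_a]) for b in levels_b}

    ss_a = n * nb * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * na * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum((cell_mean[a, b] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a, b in product(levels_a, levels_b))
    ss_err = sum((y - cell_mean[key]) ** 2
                 for key, ys in cells.items() for y in ys)

    mse = ss_err / (na * nb * (n - 1))           # error mean square
    return {
        "factor_a": ss_a / (na - 1) / mse,
        "factor_b": ss_b / (nb - 1) / mse,
        "interaction": ss_ab / ((na - 1) * (nb - 1)) / mse,
    }


# Hypothetical expression values for one gene: (cure added?, genotype).
gene = {
    ("no", "WT"): [10.0, 12.0], ("no", "mutant"): [6.0, 8.0],
    ("yes", "WT"): [11.0, 13.0], ("yes", "mutant"): [13.0, 15.0],
}
print(two_way_anova_f(gene))
```

Here factor_a corresponds to the cure effect, factor_b to the genotype effect, and interaction to the combined cure & genotype effect of the model above.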

Summary
Statistical graphics have a beautiful & interesting past
Successful applications, which in some cases saved human lives
Development goes through cycles
  it seems that graphics are once more in a highly productive phase
Stimulation from application areas
  massive data sets, new areas with problems on a new scale
  data mining / knowledge discovery

Sources
Howard Wainer: "Visual Revelations - Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot"
Edward Tufte: "The Visual Display of Quantitative Information"
Michael Friendly's Data Visualization Gallery, Milestones Project: http://www.math.yorku.ca/scs/Gallery/milestone/