Introduction to Data Visualization



Similar documents
Diagrams and Graphs of Statistical Data

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Data exploration with Microsoft Excel: analysing more than one variable

Visualization and descriptive statistics. D.A. Forsyth

Unresolved issues with the course, grades, or instructor, should be taken to the point of contact.

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Effective Big Data Visualization

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

The Comparisons. Grade Levels Comparisons. Focal PSSM K-8. Points PSSM CCSS 9-12 PSSM CCSS. Color Coding Legend. Not Identified in the Grade Band

The importance of graphing the data: Anscombe s regression examples

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Univariate Regression

Exploratory Data Analysis with #codemash

Descriptive statistics parameters: Measures of centrality

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

DATA ANALYSIS. QEM Network HBCU-UP Fundamentals of Education Research Workshop Gerunda B. Hughes, Ph.D. Howard University

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

Visualizing Data from Government Census and Surveys: Plans for the Future

Algebra 1 Course Information

Data Visualisation and Its Application in Official Statistics. Olivia Or Census and Statistics Department, Hong Kong, China

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

Exploratory data analysis (Chapter 2) Fall 2011

BANA6037 Data Visualization Fall Semester 2014 (14FS) / First Half Session Section 001 S 9:00a- 12:50p Lindner 107

Fairfield Public Schools

Introduction. Chapter Before you start Formulation

CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

Multivariate Logistic Regression

STAT355 - Probability & Statistics

AP STATISTICS REVIEW (YMS Chapters 1-8)

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

Data Exploration Data Visualization

Lecture 2. Summarizing the Sample

Organizing Your Approach to a Data Analysis

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Healthcare data analytics. Da-Wei Wang Institute of Information Science

2. Filling Data Gaps, Data validation & Descriptive Statistics

with functions, expressions and equations which follow in units 3 and 4.

Simple Predictive Analytics Curtis Seare

PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION

Overview of InfoVis. Exercise. Get out pencil and paper. CS Information Visualization Aug. 19, 2015 John Stasko. Fall 2015 CS

Security visualisation

Introduction to Regression and Data Analysis

Common Core Unit Summary Grades 6 to 8

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Intro to Parametric & Nonparametric Statistics

Data Visualization in R

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Representing Data Using Frequency Graphs


430 Statistics and Financial Mathematics for Business

An introduction to using Microsoft Excel for quantitative data analysis

Analyzing Experimental Data

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

SPSS Guide: Regression Analysis

Exploratory Data Analysis

R and Rcmdr : Basic Functions for Managing Data

Business Intelligence and Process Modelling

PREPARING TO TEACH DATA ANALYSIS AND PROBABILITY WITH TECHNOLOGY

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Bachelor's Degree in Business Administration and Master's Degree course description

an introduction to VISUALIZING DATA by joel laumans

Data Visualization Basics for Students

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Introduction to Statistics for Psychology. Quantitative Methods for Human Sciences

II. DISTRIBUTIONS distribution normal distribution. standard scores

How To Write A Data Analysis

?????? Data Analytics

Survey Data Analysis. Qatar University. Dr. Kenneth M.Coleman - University of Michigan

Get The Picture: Visualizing Financial Data part 1

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Data exploration with Microsoft Excel: univariate analysis

Session 7 Bivariate Data and Analysis

Module 3: Correlation and Covariance

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Tableau's data visualization software is provided through the Tableau for Teaching program.

For example, estimate the population of the United States as 3 times 10⁸ and the

Introduction to Statistics and Quantitative Research Methods

Data Visualization Techniques

Exercise 1.12 (Pg )

List of Examples. Examples 319

Statistics. Measurement. Scales of Measurement 7/18/2012

Computation of the Aggregate Claim Amount Distribution Using R and actuar. Vincent Goulet, Ph.D.

240ST014 - Data Analysis of Transport and Logistics

Big Data in Pictures: Data Visualization

Pennsylvania System of School Assessment

Data Mining. Dr. Saed Sayad. University of Toronto

Grade 6 Mathematics Assessment. Eligible Texas Essential Knowledge and Skills

Transcription:

Introduction to Data Visualization STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/teaching/stat133

Graphics 2

Data Visualization Using only numerical reduction methods in data analyses is far too limiting 3

Motivation Consider some data (four pairs of variables) ## x1 y1 x2 y2 x3 y3 x4 y4 ## 1 10 8.04 10 9.14 10 7.46 8 6.58 ## 2 8 6.95 8 8.14 8 6.77 8 5.76 ## 3 13 7.58 13 8.74 13 12.74 8 7.71 ## 4 9 8.81 9 8.77 9 7.11 8 8.84 ## 5 11 8.33 11 9.26 11 7.81 8 8.47 ## 6 14 9.96 14 8.10 14 8.84 8 7.04 ## 7 6 7.24 6 6.13 6 6.08 8 5.25 ## 8 4 4.26 4 3.10 4 5.39 19 12.50 ## 9 12 10.84 12 9.13 12 8.15 8 5.56 ## 10 7 4.82 7 7.26 7 6.42 8 7.91 ## 11 5 5.68 5 4.74 5 5.73 8 6.89 4

What things would you like to calculate for each variable? 5

Motivation ## x1 x2 x3 x4 ## Min. : 4.0 Min. : 4.0 Min. : 4.0 Min. : 8 ## 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 6.5 1st Qu.: 8 ## Median : 9.0 Median : 9.0 Median : 9.0 Median : 8 ## Mean : 9.0 Mean : 9.0 Mean : 9.0 Mean : 9 ## 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.:11.5 3rd Qu.: 8 ## Max. :14.0 Max. :14.0 Max. :14.0 Max. :19 ## y1 y2 y3 y4 ## Min. : 4.260 Min. :3.100 Min. : 5.39 Min. : 5.250 ## 1st Qu.: 6.315 1st Qu.:6.695 1st Qu.: 6.25 1st Qu.: 6.170 ## Median : 7.580 Median :8.140 Median : 7.11 Median : 7.040 ## Mean : 7.501 Mean :7.501 Mean : 7.50 Mean : 7.501 ## 3rd Qu.: 8.570 3rd Qu.:8.950 3rd Qu.: 7.98 3rd Qu.: 8.190 ## Max. :10.840 Max. :9.260 Max. :12.74 Max. :12.500 6

What things would you like to calculate for each pair of variables (e.g. x1, y1)? 7

Motivation cor(anscombe$x1, anscombe$y1) ## [1] 0.8164205 cor(anscombe$x2, anscombe$y2) ## [1] 0.8162365 cor(anscombe$x3, anscombe$y3) ## [1] 0.8162867 cor(anscombe$x4, anscombe$y4) ## [1] 0.8165214 8

Motivation Mean of x values = 9.0 Mean of y values = 7.5 least squares equation: y = 3 + 0.5x Sum of squared errors: 110 Correlation coefficient: 0.816 9

Why Graphics? Are you able to see any patterns, associations, relations? ## x1 y1 x2 y2 x3 y3 x4 y4 ## 1 10 8.04 10 9.14 10 7.46 8 6.58 ## 2 8 6.95 8 8.14 8 6.77 8 5.76 ## 3 13 7.58 13 8.74 13 12.74 8 7.71 ## 4 9 8.81 9 8.77 9 7.11 8 8.84 ## 5 11 8.33 11 9.26 11 7.81 8 8.47 ## 6 14 9.96 14 8.10 14 8.84 8 7.04 ## 7 6 7.24 6 6.13 6 6.08 8 5.25 ## 8 4 4.26 4 3.10 4 5.39 19 12.50 ## 9 12 10.84 12 9.13 12 8.15 8 5.56 ## 10 7 4.82 7 7.26 7 6.42 8 7.91 ## 11 5 5.68 5 4.74 5 5.73 8 6.89 Famous dataset "anscombe" (four data sets) 10

Why Graphics? How are these two variables associated? What does these data values look like? ## x1 y1 ## 1 10 8.04 ## 2 8 6.95 ## 3 13 7.58 ## 4 9 8.81 ## 5 11 8.33 ## 6 14 9.96 ## 7 6 7.24 ## 8 4 4.26 ## 9 12 10.84 ## 10 7 4.82 ## 11 5 5.68 11

Our eyes are not very good at making sense when looking at (many) numbers 12

Our eyes are not very good at making sense when looking at (many) numbers But they are great for looking at shapes and detecting patterns 12

Why Graphics y1 4 6 8 10 y2 3 5 7 9 4 6 8 10 12 14 x1 4 6 8 10 12 14 x2 y3 6 8 10 12 y4 6 8 10 12 4 6 8 10 12 14 x3 8 10 12 14 16 18 x4 13

Data Visualization Using only numerical reduction methods in data analyses is far too limiting. Visualization provides insight that cannot be appreciated by any other approach to learning from data. (W. S. Cleveland) 14

Data Visualization A key component of computing with data consists of Data Visualization Google "data visualization" 15

Data Visualization 16

Data Visualization Data Visualization Statistical Graphics? Computer Graphics? Computer Vision? Infographics? Data Art? 17

Infographic 18

Scientific Imaging 19

Data Art 20

Visualization Continuum Statistical Graphics Facts Data Art Entertainment 21

Data Art? There s value in entertaining, putting a smile on someone s face, and making people feel something, as much as there is in optimized presentation. Nathan Yau, 2013 (Data Points, p 69) 22

Data Art? Data Art: visualizations that strive to entertain or to create aesthetic experiences with little concern for informing. Stephen Few, 2012 23

Data Visualization 24

Stats Graphics 25

Stats Graphics Things commonly said about statistical graphics The data should stand out Story telling Big Picture The purpose of visualization is insight, not pictures (Ben Shneiderman) We ll focus on statistical graphics and other visual displays of data in science and technology 26

Stats Graphics Graphics for Exploration & Communication 27

Graphics for Exploration graphics for understanding data the analyst is the main (and usually only) consumer typically quick & dirty (not much care about visual appearance and design principles) lifespan of a few seconds 28

Graphics for Exploration 0 2 4 6 8 A B C 29

Graphics for Communication graphics for presenting data to be consumed by others must care about visual appearance and design require a lot of iterations in order to get the final version what s the message? who s the audience? on what type of media / format? 30

Graphics for Communication 10 Average Score 8 6 4 2 0 A B C 31

Graphics for Communication Use visualization to communicate ideas, influence, explain persuade Visuals can serve as evidence or support 32

Visualization Visuals can frequently take the place of many words, tables, and numbers Visuals can summarize, aggregate, unite, explain Sometimes words are needed, however 33

Graphics (Part I) In this first part of the course we ll focus on: graphics for exploration types of statistical graphics understanding graphics system in R traditional R graphics and graphics with "ggplot2" 34

Graphics (Part II) Later in the course we ll talk about: graphics for communication design principles color theory and use of color guidelines and good practices "shiny" and interactive graphics (time permitting) 35

Considerations Number of Variables Type of Variables 36

How many variables? Variables in datasets: 1 - univariate data 2 - bivariate data 3 - trivariate data multivariate data 37

What type of variables? Quantitative -vs- Qualitative Continuous -vs- Discrete 38

Univariate Quantitative variable: How values are distributed max, min, ranges measures of center measures of spread areas of concentration outliers interesting patterns 39

Univariate Qualitative variable: Counts and proportions (i.e. frequencies) Common values Most typical value Distribution of frequencies 40

Bivariate Quantitative-Quantitative Qualitative-Quantitative Qualitative-Qualitative In general we care about association (correlation, relationships) 41

Multivariate Quantitative Qualitative Mixed In general we care about association (correlation, relationships) 42

What about individuals? Resemblance Similarities and disimilarities Typologies 43