Scientific data visualization



Similar documents
Data Visualization with R Language

Getting started with qplot

Analysis One Code Desc. Transaction Amount. Fiscal Period

Data Visualization in R

5 Correlation and Data Exploration

Assignment 4 CPSC 217 L02 Purpose. Important Note. Data visualization

Scientific Graphing in Excel 2010

Data Visualization. Richard T. Watson. apple ibooks Author

5 Point Choice ( 五 分 選 擇 題 ): Allow a single rating of between 1 and 5 for the question at hand. Date ( 日 期 ): Enter a date Eg: What is your birthdate

AT&T Global Network Client for Windows Product Support Matrix January 29, 2015

Package RIGHT. March 30, 2015

Getting Started with R and RStudio 1

Scatter Plots with Error Bars

A Layered Grammar of Graphics

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

COMPARISON OF FIXED & VARIABLE RATES (25 YEARS) CHARTERED BANK ADMINISTERED INTEREST RATES - PRIME BUSINESS*

Based on Chapter 11, Excel 2007 Dashboards & Reports (Alexander) and Create Dynamic Charts in Microsoft Office Excel 2007 and Beyond (Scheck)

Bigger data analysis. Hadley Chief Scientist, RStudio. Thursday, July 18, 13

Basic Understandings

a. mean b. interquartile range c. range d. median

KaleidaGraph Quick Start Guide

Tutorial: ggplot2. Ramon Saccilotto Universitätsspital Basel Hebelstrasse 10 T F saccilottor@uhbs.ch

Plotting: Customizing the Graph

Infographics in the Classroom: Using Data Visualization to Engage in Scientific Practices

Hands-On Data Science with R Dealing with Big Data. Graham.Williams@togaware.com. 27th November 2014 DRAFT

MetroBoston DataCommon Training

If the World Were Our Classroom. Brief Overview:


WEB TRADER USER MANUAL

DIGITAL DESIGN APPLICATIONS Word Exam REVIEW

Formulas, Functions and Charts

PURPOSE OF GRAPHS YOU ARE ABOUT TO BUILD. To explore for a relationship between the categories of two discrete variables

Web Development with R

Plots, Curve-Fitting, and Data Modeling in Microsoft Excel

Case 2:08-cv ABC-E Document 1-4 Filed 04/15/2008 Page 1 of 138. Exhibit 8

Data Analytics and Business Intelligence (8696/8697)

Introduction Course in SPSS - Evening 1

MIXED MODEL ANALYSIS USING R

3.2 LOGARITHMIC FUNCTIONS AND THEIR GRAPHS. Copyright Cengage Learning. All rights reserved.

Comparing share-price performance of a stock

CALL VOLUME FORECASTING FOR SERVICE DESKS

Information Literacy Program

Create Charts in Excel

How To Perform Predictive Analysis On Your Web Analytics Data In R 2.5

Each function call carries out a single task associated with drawing the graph.

Working with Excel in Origin

Exploratory Data Analysis and Plotting

Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

Psychology 205: Research Methods in Psychology

Intermediate PowerPoint

Enhanced Vessel Traffic Management System Booking Slots Available and Vessels Booked per Day From 12-JAN-2016 To 30-JUN-2017

Excel 2010: Create your first spreadsheet

Using SPSS, Chapter 2: Descriptive Statistics

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

NASA Explorer Schools Pre-Algebra Unit Lesson 2 Student Workbook. Solar System Math. Comparing Mass, Gravity, Composition, & Density

Main Effects and Interactions

Updates to Graphing with Excel

Data Visualization. Scientific Principles, Design Choices and Implementation in LabKey. Cory Nathe Software Engineer, LabKey

Reading and writing files

Gestation Period as a function of Lifespan

Designing forms for auto field detection in Adobe Acrobat

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Microsoft Excel. Qi Wei

Data exploration with Microsoft Excel: analysing more than one variable

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Lecture 2: Exploratory Data Analysis with R

Graphics in R. Biostatistics 615/815

Introduction to Microsoft Excel 2007/2010

Visualization Quick Guide

APES Math Review. For each problem show every step of your work, and indicate the cancellation of all units No Calculators!!

PharmaSUG Paper DG06

Example of an APA-style manuscript for Research Methods in Psychology

Systems Dynamics Using Vensim Personal Learning Edition (PLE) Download Vensim PLE at

4. Descriptive Statistics: Measures of Variability and Central Tendency

Volumes of Revolution

What s new in TIBCO Spotfire 7.0

R Commander Tutorial

Publisher 2010 Cheat Sheet

Interactive Applications for Modeling and Analysis with Shiny

Example of an APA-style manuscript for Research Methods in Psychology. William Revelle. Department of Psychology. Northwestern University

Microfinance Credit Risk Dashboard User Guide

User Recognition and Preference of App Icon Stylization Design on the Smartphone

Introduction to SPSS 16.0

General instructions for the content of all StatTools assignments and the use of StatTools:

MARS STUDENT IMAGING PROJECT

Introduction to Exploratory Data Analysis

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

1. Introduction. 2. User Instructions. 2.1 Set-up

Software Re-Engineering and Ux Improvement for ElegantJ BI Business Intelligence Suite

Plotting Data with Microsoft Excel

Tutorial: Managing Ocean Data with R

Guidelines for Writing An APA Style Lab Report

Excel -- Creating Charts

Getting started in Excel

Data Science in the Cloud with Microsoft Azure Machine. Learning and R. Stephen F. Elston. Learning and R by Stephen F. Elston

Charts for SharePoint

Transcription:

Scientific data visualization Using ggplot2 Sacha Epskamp University of Amsterdam Department of Psychological Methods 11-04-2014

Hadley Wickham

Hadley Wickham

Evolution of data visualization

Scientific data visualization Data and analysis results are best communicated through visualizations The leading software for statistical analyses is the statistical programming language R The leading R extension for data visualization is ggplot2 This presentation will quickly teach you strong visualization techniques in R

First use of R We will use the environment RStudio for our work in R RStudio has 4 panels: Console This is the actual R window, you can enter commands here and execute them by pressing enter Source This is where we can edit scripts. It is where you should always be working. Control-enter sends selected codes to the console Plots/Help This is where plots and help pages will be shown Workspace Shows which objects you currently have Anything following a # symbol is treated as a comment!

R workflow File New File R script Write codes in the R script Select codes and press control + enter to execute them

Import data File <- "http://sachaepskamp.com/files/opdata.csv" Data <- read.csv(file)

Look at data head(data) ## userid Measurement Gender Age Study Work Neuroticism ## 1 1 1 female 24 yes part time low ## 2 1 2 female 24 yes part time low ## 3 1 3 female 24 yes part time low ## 4 1 4 female 24 yes part time low ## 5 1 5 female 24 yes part time low ## 6 1 6 female 24 yes part time low ## Extraversion Openness Conscienciousness Agreeableness ## 1 low high high high ## 2 low high high high ## 3 low high high high ## 4 low high high high ## 5 low high high high ## 6 low high high high ## Stress ## 1 0.375 ## 2 0.875 ## 3 1.375 ## 4 1.875 ## 5 1.875 ## 6 0.750

Look at data names(data) ## [1] "userid" "Measurement" ## [3] "Gender" "Age" ## [5] "Study" "Work" ## [7] "Neuroticism" "Extraversion" ## [9] "Openness" "Conscienciousness" ## [11] "Agreeableness" "Stress"

Look at data str(data) ## 'data.frame': 750 obs. of 12 variables: ## $ userid : int 1 1 1 1 1 1 1 1 1 1... ## $ Measurement : int 1 2 3 4 5 6 7 8 9 10... ## $ Gender : Factor w/ 2 levels "female","male": 1 1 1 1 1 ## $ Age : int 24 24 24 24 24 24 24 24 24 24... ## $ Study : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 ## $ Work : Factor w/ 3 levels "full time","none",..: 3 3 ## $ Neuroticism : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 ## $ Extraversion : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 ## $ Openness : Factor w/ 2 levels "high","low": 1 1 1 1 1 1 ## $ Conscienciousness: Factor w/ 2 levels "high","low": 1 1 1 1 1 1 ## $ Agreeableness : Factor w/ 2 levels "high","low": 1 1 1 1 1 1 ## $ Stress : num 0.375 0.875 1.375 1.875 1.875...

Look at data View(Data)

ggplot2 ggplot2 (Wickham, 2009) is an implementation of the Grammer of Graphics (Wilkinson, Wills, Rope, Norton, & Dubbs, 2006) Very different from base R plotting but also very flexible and powerfull Uses data frames as input Data must be in long format This means that each row is an observation and each column a variable Use reshape2 to get data in long format Also check out dplyr (http://sachaepskamp.com/ files/dplyrtutorial.html)

Basics of a plot A plot is a 2D repressentation of data, in which variables can be visualized by, e.g.,: Horizontal placing Vertical placing Color Different Lines Line type Size Shape... These are called aesthetics In ggplot2 we first set aesthetic mapping of our data using aes() inside ggplot() Which variables will be mapped to which aesthetics?

install.packages("ggplot2") library("ggplot2") ggplot(data, aes(x = Measurement, y = Stress)) ## Error: No layers in plot

Geometrics Next, we define how these aesthetics are used and what we are plotting: Lines Points Boxplots Curves... These are called geometrics (geoms) We can add these to the plot using +

ggplot(data, aes(x = Measurement, y = Stress)) + geom_point() 3 2 Stress 1 0 0 10 20 30 Measurement

ggplot(data, aes(x = Measurement, y = Stress)) + geom_boxplot() 3 2 Stress 1 0 10 20 Measurement

ggplot(data, aes(x = Measurement, y = Stress, group = Measurement)) + geom_boxplot() 3 2 Stress 1 0 0 10 20 30 Measurement

ggplot(data, aes(x = Measurement, y = Stress, group = userid) ) + geom_line() 3 2 Stress 1 0 0 10 20 30 Measurement

ggplot(data, aes(x = Measurement, y = Stress, group = userid, colour = Age)) + geom_line() + facet_grid(gender ~.) 3 Stress 2 1 0 3 2 1 female male Age 50 40 30 20 0 0 10 20 30 Measurement

Store elements in an object: g <- ggplot(data, aes(x = Measurement, y = Stress, group = userid, colour = Age)) g <- g + geom_line() g <- g +facet_grid(gender ~.)

Print the object to plot: print(g) 3 Stress 2 1 0 3 2 1 female male Age 50 40 30 20 0 0 10 20 30 Measurement

Many more graphical options can be added to ggplot calls xlab Label of x-axis ylab Label of y-axis ggtitle Title of plot theme Many, many graphical settings theme_bw() A default black and white theme Use Google!

g + xlab("time") + ylab("amount of Stress") + ggtitle("a very fancy plot") + theme_bw() 3 A very fancy plot Amount of Stress 2 1 0 3 2 1 female male Age 50 40 30 20 0 0 10 20 30 Time

str(sumdata) ## 'data.frame': 66095 obs. of 5 variables: ## $ user_id : num 1456 1713 1837 1845 21167... ## $ rating : num 0.482-5.225-6.639-0.417-3.008... ## $ date_of_birth: Date, format: "2001-06-03"... ## $ grade : num 8 8 8 8 7 8 7 8 8 8... ## $ gender : chr "f" "m" "m" "f"...

ggplot(sumdata, aes(x = date_of_birth, y = rating)) + geom_point() 20 rating 10 0-10 -20 2002 2004 2006 date_of_birth 2008 2010

ggplot(sumdata, aes(x = date_of_birth, y = rating)) + stat_binhex() 20 10 rating 0 count 800 600 400 200-10 -20 2002 2004 2006 2008 2010 date_of_birth

ggplot(sumdata, aes(x = date_of_birth, y = rating, colour = factor(grade))) + geom_point() 20 10 factor(grade) 1 rating 2 3 0 4 5 6 7-10 8-20 2002 2004 2006 date_of_birth 2008 2010

ggplot(sumdata, aes(x = date_of_birth, y = rating, colour = factor(grade), fill = factor(grade))) + geom_point() + geom_smooth(col = "black", method = "lm") 20 10 factor(grade) 1 rating 2 3 0 4 5 6 7-10 8-20 2002 2004 2006 date_of_birth 2008 2010

ggplot(sumdata, aes(x = date_of_birth, y = rating, colour = factor(grade), fill = factor(grade))) + geom_point() + geom_smooth(col = "black", method = "lm", formula = y ~ poly(x, 2)) 20 10 factor(grade) 1 rating 2 3 0 4 5 6 7-10 8-20 2002 2004 2006 date_of_birth 2008 2010

ggplot(sumdata, aes(x = grade)) + geom_histogram() 12000 9000 count 6000 3000 0 2 4 6 8 grade

ggplot(sumdata, aes(x = grade, y = rating, colour = factor(grade)) ) + geom_violin() 20 10 rating 0-10 factor(grade) 1 2 3 4 5 6 7 8-20 2 4 6 8 grade

Xss Xso Xsb Xli Oun Oin Ocr Oaa Hsi Hmo Hga Hfa Ese Efe Ede Ean Cpr Cpe Cor Cdi Apa Age Afo Afl Betweenness Closeness Strength Zhang Onnela 0 10 20 30 0.0020 0.0025 0.0030 0.0035 0.6 0.9 1.2 1.5 0.05 0.10 0.15 0.20 0.00 0.05 0.10 0.15

Spel: mollenspel 20 Rating 10 0 Sequence length 1 2 3 4 5 6 7 8 9 Apr 01 Apr 15 May 01 May 15 Jun 01 Date

Spel: mollenspel 20 Rating 10 0 Sequence length 1 2 3 4 5 6 7 8 9 10 Nov 03 Nov 10 Nov 17 Nov 24 Dec 01 Dec 08 Dec 15 Date

woord.pdf 8 userrating 4 0 4 Nov 15 Dec 01 Dec 15 Jan 01 created Jan 15 Feb 01

woord.pdf 8 Binned by item 4 Rating 0 Number of items made 500 400 300 200 100 4 Nov 15 Dec 01 Dec 15 Jan 01 Jan 15 Feb 01

woord.pdf 8 Binned by item 4 Rating 0 Score expected 0.5 0.0 0.5 1.0 4 Nov 15 Dec 01 Dec 15 Jan 01 Jan 15 Feb 01

woord.pdf 8 Binned by item 4 Rating 0 Response time 40000 30000 20000 10000 4 Nov 15 Dec 01 Dec 15 Jan 01 Jan 15 Feb 01

More ggplot2

More ggplot2

More ggplot2 likert package (Bryer & Speerschneider, 2013)

More ggplot2 sjplot package (Lüdecke, 2014)

More ggplot2 sjplot package (Lüdecke, 2014)

More ggplot2 sjplot package (Lüdecke, 2014)

More ggplot2 sjplot package (Lüdecke, 2014)

ggplot2 conclusion ggplot2 can create very complex visualizations with minimal codes Automatizes convenient things such as Margins Legend Documentation: http://docs.ggplot2.org/

Thank you for your attention! Exercises are on http://sachaepskamp.com/files/ ggplot2_exercises.html

References I Bryer, J., & Speerschneider, K. (2013). likert: Functions to analyze and visualize likert type items [Computer software manual]. Retrieved from http://cran.r-project.org/package=likert (R package version 1.1) Lüdecke, D. (2014). sjplot: sjplot - data visualization for statistics in social science [Computer software manual]. Retrieved from http://cran.r-project.org/package=sjplot (R package version 1.3) Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York. Retrieved from http://had.co.nz/ggplot2/book Wilkinson, L., Wills, D., Rope, D., Norton, A., & Dubbs, R. (2006). The grammar of graphics. Springer.