Data Visualization with R Language



Similar documents
OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE

Scientific data visualization

Data Visualization in R

Getting started with qplot

A Short Guide to R with RStudio

R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol

Using stargazer to report regression output and descriptive statistics in R (for non-latex users) (v1.0 draft)

Lecture 2: Exploratory Data Analysis with R

Tutorial: ggplot2. Ramon Saccilotto Universitätsspital Basel Hebelstrasse 10 T F saccilottor@uhbs.ch

Diagrams and Graphs of Statistical Data

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

5 Correlation and Data Exploration

Graphics in R. Biostatistics 615/815

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Package RIGHT. March 30, 2015

Data Exploration Data Visualization

Viewing Ecological data using R graphics

Exploratory Data Analysis

Text Mining with R Twitter Data Analysis 1

Exploratory data analysis (Chapter 2) Fall 2011

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

How To Perform Predictive Analysis On Your Web Analytics Data In R 2.5

Getting Started with R and RStudio 1

Time Series Analysis AMS 316

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Exploratory Data Analysis

Descriptive Statistics

MAS8381: Statistics for Big data Part 2: Multivariate Data Analysis using R

Bigger data analysis. Hadley Chief Scientist, RStudio. Thursday, July 18, 13

EXPLORING SPATIAL PATTERNS IN YOUR DATA

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Visualization Quick Guide

Introduction to R and Exploratory data analysis

Introduction to Exploratory Data Analysis

Regression and Programming in R. Anja Bråthen Kristoffersen Biomedical Research Group

The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Variables. Exploratory Data Analysis

Data exploration with Microsoft Excel: analysing more than one variable

Tableau Your Data! Wiley. with Tableau Software. the InterWorks Bl Team. Fast and Easy Visual Analysis. Daniel G. Murray and

Visualising big data in R

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Introduction to Data Visualization

Package ggrepel. February 8, 2016

Getting started in Excel

Demographics of Atlanta, Georgia:

A Layered Grammar of Graphics

Summarizing and Displaying Categorical Data

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

Cluster Analysis using R

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

MetroBoston DataCommon Training

Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16

Web Development with R

QAC Data Visualization. General Information. Course Description. Course Materials. Course Topics. Syllabus: Fall 2015

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Installing R and the psych package

Data Visualization Techniques

Solutions to Homework 3 Statistics 302 Professor Larget

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Exercise 1: How to Record and Present Your Data Graphically Using Excel Dr. Chris Paradise, edited by Steven J. Price

AMS 7L LAB #2 Spring, Exploratory Data Analysis

Evaluating the results of a car crash study using Statistical Analysis System. Kennesaw State University

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

COMMON CORE STATE STANDARDS FOR

Data Visualization. or Graphical Data Presentation. Jerzy Stefanowski Instytut Informatyki

Introduction Course in SPSS - Evening 1

Data Visualization Handbook

Data Visualization: Some Basics

Visualizing Data from Government Census and Surveys: Plans for the Future

Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

An introduction to using Microsoft Excel for quantitative data analysis

Data Visualization. Richard T. Watson. apple ibooks Author

TDWI 2013 Munich. Training - Using R for Business Intelligence in Big Data

Graphs. Exploratory data analysis. Graphs. Standard forms. A graph is a suitable way of representing data if:

Package MBA. February 19, Index 7. Canopy LIDAR data

A Correlation of. to the. South Carolina Data Analysis and Probability Standards

Data mining as a tool of revealing the hidden connection of the plant

Figure 1. An embedded chart on a worksheet.

Module 2 Basic Data Management, Graphs, and Log-Files

Employee Survey Analysis

ggplot2 Version of Figures in Lattice: Multivariate Data Visualization with R

% 40 = = M 28 28

Data Visualization. BUS 230: Business and Economic Research and Communication

Interactive Applications for Modeling and Analysis with Shiny

Data Science in the Cloud with Microsoft Azure Machine. Learning and R. Stephen F. Elston. Learning and R by Stephen F. Elston

Hands-On Data Science with R Exploring Data with GGPlot2. Graham.Williams@togaware.com. 22nd May 2015 DRAFT

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Visualization and descriptive statistics. D.A. Forsyth

Package hazus. February 20, 2015

R Graphics II: Graphics for Exploratory Data Analysis

Each function call carries out a single task associated with drawing the graph.

GETTING STARTED WITH R AND DATA ANALYSIS

Data exploration with Microsoft Excel: univariate analysis

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Transcription:

1 Data Visualization with R Language DENG, Xiaodong (xiaodong_deng@nuhs.edu.sg ) Research Assistant Saw Swee Hock School of Public Health, National University of Singapore

Why Visualize Data? For better presentation and communication 2

What Can We Do with R? Exploratory Visualization demand 8 10 14 18 1 2 3 4 5 6 7 Time 3

What Can We Do with R? Plotting for deeper insights 3e+05 AgeGroup <5 Thousands 2e+05 1e+05 5-14 15-24 25-34 35-44 45-54 55-64 >64 0e+00 1900 1925 1950 1975 2000 Year The change of age structure Correlation matrix with correlation coefficient 4

Outline Basic Plotting Function in R Advanced Plotting in R: ggplot2 packages 5

Data Visualization with R Language: Basic Plotting Function in R 6

Dataset I BOD data The biochemical oxygen demand and corresponding time was recorded in an evaluation of water quality. Time 1 2 3 4 5 6 demand 8.3 10.3 19.0 16.0 15.6 19.8 7

Dataset I BOD data R object 8

Basic Plotting Functions Line Plot demand 8 10 14 18 1 2 3 4 5 6 7 Time Line only demand 8 10 14 18 1 2 3 4 5 6 7 Time Line and dot 9

Basic Plotting Functions Commonly Used Function for Plotting plot(x = BOD$Time, y = BOD$demand, type="l") Function Name Function we use Arguments - 1 Specify the data we use Arguments 2 (optional) Plotting options 10

Basic Plotting Functions Line Plot plot(x=bod$time, y=bod$demand, type="l") plot(x=bod$time, y=bod$demand, type="b") demand 8 10 14 18 demand 8 10 14 18 1 2 3 4 5 6 7 Time 1 2 3 4 5 6 7 Time 11

Basic Plotting Functions Line Plot Previous usage plot(x=bod$time, y=bod$demand, type="l") Alternative commands plot(demand ~ Time, data=bod, type="l" 12

Basic Plotting Functions More Options - Color plot(demand ~ Time, data=bod, type="b", col="red") plot(demand ~ Time, data=bod, type="b", col="blue") demand 8 10 14 18 1 2 3 4 5 6 7 demand 8 10 14 18 1 2 3 4 5 6 7 Time Time 13

Basic Plotting Functions More Options Shape plot(demand ~ Time, data=bod, type="b", pch=1:6) 14

Basic Plotting Functions More Options on shape by using pch argument 15

Basic Plotting Functions More Options Line Width plot(demand ~ Time, data=bod, type="l") plot(demand ~ Time, data=bod, type="l", lwd=5) demand 8 10 14 18 1 2 3 4 5 6 7 Time demand 8 10 14 18 1 2 3 4 5 6 7 Time 16

Basic Plotting Functions More Options Add Title plot(demand ~ Time, data=bod, type="l", main="line plot-test") 17

Basic Plotting Functions More Options Change Labels plot(demand ~ Time, data=bod, type="l", xlab="x-axis", ylab="y-axis") 18

Basic Plotting Functions More Options To explore more options in plotting functions,?par 19

Dataset II mtcars data The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973 74 models) mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 ---: numerical ---: categorical Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2 20

Basic Plotting Functions Scatter Plot plot(mpg ~ wt, data = mtcars, type = "p") 21

Basic Plotting Functions Boxplot Upper adjacent value 75% quantile Median (50% quantile) 25% quantile Low er adjacent value boxplot(mpg ~ cyl, data=mtcars) 22

Basic Plotting Functions Histogram hist(mtcars$qsec, freq = FALSE) 23

Basic Plotting Functions Histogram hist(mtcars$qsec, freq = FALSE) d <- density(mtcars$qsec) lines(d) 24

Basic Plotting Functions 3-D Graphics x <- seq(-10, 10, length= 30) y <- x f <- function(x, y) { r <- sqrt(x^2+y^2) 10 * sin(r)/r } z <- outer(x, y, f) persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue") z x y 25

Basic Plotting Functions 3-D Graphics (GIF made with animation package) 26

Basic Plotting Functions Contour Plot x <- seq(-10, 10, length= 30) y <- x f <- function(x, y) { r <- sqrt(x^2+y^2) 10 * sin(r)/r } z <- outer(x, y, f) contour(x, y, z) 27

Data Visualization with R Language: ggplot2 package 28

What is ggplot2 A packaged developed by Hadley Wickham (Chief scientist at Rstudio); A plotting system for R, based on the grammar of graphics by Leland Wilkonson - An abstraction which makes thinking, reasoning and communicating graphics easier - Enables us to concisely describe the components of a graphic Produce complex multi-layered graphics 29

An Example Call ggplot2 package library(ggplot2) ggplot2 is a separate package, so we need to call it before we use it. 30

An Example A wrong example library(ggplot2) ggplot(bod, aes(x = Time, y = demand)) Error: No layers in plot ggplot() is typically used to construct a plot incrementally Use + operator to add layers to the existing ggplot object Explicitly decide which layers are added and the order in which they are added 31

Common Pattern Common pattern in the usage of ggplot2 q <- ggplot(bod, aes(x = Time, y = demand)) q + geom_line() + geom_point() +... Construct a ggplot object from data frame Layer - 1 line layer Layer - 2 point layer More layers 32

An Example Add a point layer q_0 <- ggplot(mtcars, aes(x = wt, y = mpg)) q_1 <- q_0+ geom_point() q_1 33

An Example Add a point layer q_0 <- ggplot(mtcars, aes(x = wt, y = mpg)) q_1 <- q_0 + geom_point(col="red", size=5) q_1 34

An Example Add a point layer q_0 <- ggplot(mtcars, aes(x = wt, y = mpg, col=cyl)) q_1 <- q_0 + geom_point(size=5) q_1 35

An Example Add a multiple layer q_0 <- ggplot(bod, aes(x = Time, y = demand)) q_1 <- q_0 + geom_point() q_1 q_2 <- q_1 + geom_line() q_2 36

An Example Add a multiple layer q_0 <- ggplot(bod, aes(x = Time, y = demand)) q_1 <- q_0 + geom_line() + geom_point() q_1 Layer - 1 line layer Layer - 2 point layer 37

ggplot2 Other Plots Histogram q_0 <- ggplot(mtcars, aes(x=qsec)) q_1 <- q_0 + geom_histogram() q_1 38

ggplot2 Other Plots Histogram q_0 <- ggplot(mtcars, aes(x=qsec)) q_1 <- q_0 + geom_histogram(col="black", fill = "lightblue") q_1 39

ggplot2 Other Plots Boxplot q_0 <- ggplot(mtcars, aes(y=qsec, x=factor(carb))) q_1 <- q_0 + geom_boxplot() q_1 40

ggplot2 Other Plots Boxplot q_0 <- ggplot(mtcars, aes(y=qsec, x=factor(carb), fill=factor(carb))) q_1 <- q_0 + geom_boxplot() q_1 41

ggplot2 Other Plots Stacked Area Graph library(gcookbook) ggplot(uspopage, aes(x=year, y=thousands, fill=agegroup)) + geom_area() Thousands 3e+05 2e+05 1e+05 0e+00 AgeGroup <5 5-14 15-24 25-34 35-44 45-54 55-64 >64 1900 1925 1950 1975 2000 Year 42

ggplot2 Other Options Others Applicable Layers geom_abline geom_area geom_bar geom_boxplot geom_contour geom_density geom_dotplot geom_histogram geom_line geom_map geom_path geom_point geom_rug geom_step geom_text geom_violin 43

Reference Hadley Wickham, A Layered Grammar of Graphics, http://vita.had.co.nz/papers/layered-grammar.pdf http://fbmap.bitaesthetics.com/ https://support.rstudio.com/hc/en-us/articles/200551906-interactive- Plotting-with-Manipulate http://www.cookbook-r.com/graphs/ Book: Winston Chang, R Graphics Cookbook, O Reilly 44

Thanks 45