Graphics in R. Biostatistics 615/815



Similar documents
Viewing Ecological data using R graphics

Each function call carries out a single task associated with drawing the graph.

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

sample median Sample quartiles sample deciles sample quantiles sample percentiles Exercise 1 five number summary # Create and view a sorted

Getting Started with R and RStudio 1

Getting started with qplot

R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol

Scatter Plots with Error Bars

R Graphics II: Graphics for Exploratory Data Analysis

Introduction to R and Exploratory data analysis

Describing, Exploring, and Comparing Data

Time Series Analysis AMS 316

5 Correlation and Data Exploration

Journal of Statistical Software

Package tagcloud. R topics documented: July 3, 2015

An R Tutorial. 1. Starting Out

Exploratory data analysis (Chapter 2) Fall 2011

Lecture 2: Exploratory Data Analysis with R

NMSA230: Software for Mathematics and Stochastics Sweave Example file

GeoGebra Statistics and Probability

Using SPSS, Chapter 2: Descriptive Statistics

Cluster Analysis using R

STATISTICAL LABORATORY, USING R FOR BASIC STATISTICAL ANALYSIS

Analysis of System Performance IN2072 Chapter M Matlab Tutorial

Using R for Windows and Macintosh

R: A self-learn tutorial

Plotting: Customizing the Graph

STATGRAPHICS Online. Statistical Analysis and Data Visualization System. Revised 6/21/2012. Copyright 2012 by StatPoint Technologies, Inc.

Gestation Period as a function of Lifespan

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Diagrams and Graphs of Statistical Data

Big Data and Scripting. Plotting in R

Getting started in Excel

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

STC: Descriptive Statistics in Excel Running Descriptive and Correlational Analysis in Excel 2013

OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE

Beginner s Matlab Tutorial

Data exploration with Microsoft Excel: analysing more than one variable

SPSS Manual for Introductory Applied Statistics: A Variable Approach

4 Other useful features on the course web page. 5 Accessing SAS

Data Exploration Data Visualization

Alternative Graphics System Lattice

Basic Excel Handbook

Data exploration with Microsoft Excel: univariate analysis

Chapter 1 Exploratory data analysis

Dongfeng Li. Autumn 2010

Computer Training Centre University College Cork. Excel 2013 Pivot Tables

MetroBoston DataCommon Training

Visualization Quick Guide

Public Health Activities and Services Tracking (PHAST) Interactive Data Visualization Tool User Manual

Variables. Exploratory Data Analysis

Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16

IBM SPSS Statistics for Beginners for Windows

Data representation and analysis in Excel

A short course in Longitudinal Data Analysis ESRC Research Methods and Short Course Material for Practicals with the joiner package.

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

The data set we have taken is about calculating body fat percentage for an individual.

Time Series Analysis with R - Part I. Walter Zucchini, Oleg Nenadić

Microsoft Excel 2010 Pivot Tables

Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller

KaleidaGraph Quick Start Guide

Package survpresmooth

Data Visualization with R Language

Introduction to R and UNIX Working with microarray data in a multi-user environment

GeoGebra. 10 lessons. Gerrit Stols

MatLab - Systems of Differential Equations

Package lattice. July 14, 2015

Prism 6 Step-by-Step Example Linear Standard Curves Interpolating from a standard curve is a common way of quantifying the concentration of a sample.

Exploratory Data Analysis

An introduction to using Microsoft Excel for quantitative data analysis

Exercise 1: How to Record and Present Your Data Graphically Using Excel Dr. Chris Paradise, edited by Steven J. Price

Integrated Company Analysis

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

To draw a line. To create a construction line by specifying two points

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

UCL Depthmap 7: Data Analysis

Quickstart for Desktop Version

Computer Statistics with R

TIPS FOR DOING STATISTICS IN EXCEL

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Objective: Use calculator to comprehend transformations.

G563 Quantitative Paleontology. SQL databases. An introduction. Department of Geological Sciences Indiana University. (c) 2012, P.

Statgraphics Getting started

Introduction Course in SPSS - Evening 1

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Introduction to Microsoft Excel 2007/2010

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

Drawing a histogram using Excel

Visualization and descriptive statistics. D.A. Forsyth

AMS 7L LAB #2 Spring, Exploratory Data Analysis

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Bernd Klaus, some input from Wolfgang Huber, EMBL

The Center for Teaching, Learning, & Technology

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Package plan. R topics documented: February 20, 2015

Appendix 2.1 Tabular and Graphical Methods Using Excel

Using Excel for descriptive statistics

Transcription:

Graphics in R Biostatistics 615/815

Last Lecture Introduction to R Programming Controlling Loops Defining your own functions

Today Introduction to Graphics in R Examples of commonly used graphics functions Common options for customizing graphs

Computer Graphics Graphics are important for conveying important features of the data They can be used to examine Marginal distributions Relationships between variables Summary of very large data Important complement to many statistical and computational techniques

Example Data The examples in this lecture will be based on a dataset with six variables: Height (in cm) Weight (in kg) Waist Circumference (in cm) Hip Circumference (in cm) Systolic Blood Pressure Diastolic Blood Pressure

The Data File Height Weight Waist Hip bp.sys bp.dia 172 72 87 94 127.5 80 166 91 109 107 172.5 100 174 80 95 101 123 64 176 79 93 100 117 76 166 55 70 94 100 60 163 76 96 99 160 87.5 154 84 98 118 130 80 165 90 108 101 139 80 155 66 80 96 120 70 146 59 77 96 112.5 75 164 62 76 93 130 47.5 159 59 76 96 109 69 163 69 96 99 155 100 143 73 97 117 137.5 85...

Reading in the Data > dataset <- read.table( 815data.txt", header = T) > summary(dataset) Height Weight Waist Min. :131.0 Min. : 0.00 Min. : 0.0 1st Qu.:153.0 1st Qu.: 55.00 1st Qu.: 74.0 Median :159.0 Median : 63.00 Median : 84.0 Mean :159.6 Mean : 64.78 Mean : 84.6 3rd Qu.:166.0 3rd Qu.: 74.00 3rd Qu.: 94.0 Max. :196.0 Max. :135.00 Max. :134.0...

Graphics in R plot() is the main graphing function Automatically produces simple plots for vectors, functions or data frames Many useful customization options

Plotting a Vector plot(v) will print the elements of the vector v according to their index # Plot height for each observation > plot(dataset$height) # Plot values against their ranks > plot(sort(dataset$height))

Plotting a Vector 130 140 150 160 170 180 190 sort(dataset$height) 130 140 150 160 170 180 190 0 1000 2000 3000 4000 0 1000 2000 3000 4000 Index Index plot(dataset$height) plot(sort(dataset$height)) dataset$height

Common Parameters for plot() Specifying labels: main provides a title xlab label for the x axis ylab label for the y axis Specifying range limits ylim 2-element vector gives range for x axis xlim 2-element vector gives range for y axis

A Properly Labeled Plot Distribution of Heights Height (in cm) 120 140 160 180 200 0 1000 2000 3000 4000 Rank plot(sort(dataset$height), ylim = c(120,200), ylab = "Height (in cm)", xlab = "Rank", main = "Distribution of Heights")

Plotting Two Vectors plot() can pair elements from 2 vectors to produce x-y coordinates plot() and pairs() can also produce composite plots that pair all the variables in a data frame.

Plotting Two Vectors Circumference (in cm) Waist 60 80 100 120 40 60 80 100 120 140 Hip plot(dataset$hip, dataset$waist, xlab = "Hip", ylab = "Waist", main = "Circumference (in cm)", pch = 2, col = "blue")

Plotting Two Vectors Circumference (in cm) Possible Outlier Waist 60 80 100 120 40 60 80 100 120 140 Hip plot(dataset$hip, dataset$waist, xlab = "Hip", ylab = "Waist", main = "Circumference (in cm)", pch = 2, col = "blue")

Plotting Two Vectors Circumference (in cm) Possible Outlier Waist 60 80 100 120 These options set the plotting symbol (pch) and line color (col) 40 60 80 100 120 140 Hip plot(dataset$hip, dataset$waist, xlab = "Hip", ylab = "Waist", main = "Circumference (in cm)", pch = 2, col = "blue")

Plotting Contents of a Dataset 40 60 80 100 120 Height 130 150 170 190 40 60 80 100 Weight Waist 60 80 100 120 130 150 170 190 60 80 100 120 plot(dataset[-c(4,5,6)])

Plotting Contents of a Dataset Height 40 60 80 100 120 130 150 170 190 Weight and Waist Circumference Appear Strongly Correlated 40 60 80 100 Weight You could check this with the cor() function. Waist 60 80 100 120 130 150 170 190 60 80 100 120 plot(dataset[-c(4,5,6)])

Histograms Generated by the hist() function The parameter breaks is key Specifies the number of categories to plot or Specifies the breakpoints for each category The xlab, ylab, xlim, ylim options work as expected

Histogram Blood Pressure Frequency 0 200 400 600 800 1000 80 100 120 140 160 180 200 220 Systolic Blood Pressure hist(dataset$bp.sys, col = "lightblue", xlab = "Systolic Blood Pressure", main = "Blood Pressure")

Histogram, Changed breaks Blood Pressure Frequency 0 100 200 300 400 Can you explain the peculiar pattern? Graphical representations of data are useful at identifying these sorts of artifacts 80 100 120 140 160 180 200 220 Systolic Blood Pressure hist(dataset$bp.sys, col = "lightblue", breaks = seq(80,220,by=2), xlab = "Systolic Blood Pressure", main = "Blood Pressure")

Boxplots Generated by the boxplot() function Draws plot summarizing Median Quartiles (Q1, Q3) Outliers by default, observations more than 1.5 * (Q1 Q3) distant from nearest quartile

A Simple Boxplot Appropriate Units 50 100 150 200 Height Weight Waist Hip bp.sys bp.dia boxplot(dataset, col = rainbow(6), ylab = "Appropriate Units")

Adding Individual Observations rug() can add a tick for each observation to the side of a boxplot() and other plots. The side parameter specifies where tickmarks are drawn 40 60 80 100 120 Weight (in kg) > boxplot(dataset$weight, main = "Weight (in kg)", col = "red") > rug(dataset$weight, side = 2)

Customizing Plots R provides a series of functions for adding text, lines and points to a plot We will illustrate some useful ones, but look at demo(graphics) for more examples

Drawing on a plot To add additional data use points(x,y) lines(x,y) For freehand drawing use polygon() rect()

Text Drawing Two commonly used functions: text() writes inside the plot region, could be used to label datapoints mtext() writes on the margins, can be used to add multiline legends These two functions can print mathematical expressions created with expression()

Plotting Two Data Series > x <- seq(0,2*pi, by = 0.1) > y <- sin(x) > y1 <- cos(x) > plot(x,y, col = "green", type = "l", lwd = 3) > lines(x,y1, col = "red", lwd = 3) > mtext("sine and Cosine Plot", side = 3, line = 1) Sine and Cosine Plot y -1.0 0.0 1.0 0 1 2 3 4 5 6 x

Printing on Margins, Using Symbolic Expressions > f <- function(x) x * (x + 1) / 2 > x <- 1:20 > y <- f(x) > plot(x, y, xlab = "", ylab = "") > mtext("plotting the expression", side = 3, line = 2.5) > mtext(expression(y == sum(i,1,x,i)), side = 3, line = 0) > mtext("the first variable", side = 1, line = 3) > mtext("the second variable", side = 2, line = 3) The second variable 0 100 200 Plotting the expression x y = i 1 5 10 15 20 The first variable

Adding a Label Inside a Plot Who will develop obesity? Frequency 0 200 400 600 800 1000 At Risk 20 40 60 80 100 120 140 > hist(dataset$weight, xlab = "Weight", main = "Who will develop obesity?", col = "blue") > rect(90, 0, 120, 1000, border = "red", lwd = 4) > text(105, 1100, "At Risk", col = "red", cex = 1.25) Weight

Symbolic Math Example from demo(plotmath) sum(x[i], i = 1, n) Big Operators n x i 1 prod(plain(p)(x == x), x) x P(X = x) integral(f(x) * dx, a, b) union(a[i], i == 1, n) b f(x)dx a n Ai i=1 intersect(a[i], i == 1, n) i=1 n Ai lim(f(x), x %->% 0) min(g(x), x >= 0) inf(s) sup(s) lim f(x) x 0 min g(x) x 0 infs sup S

Further Customization The par() function can change defaults for graphics parameters, affecting subsequent calls to plot() and friends. Parameters include: cex, mex text character and margin size pch plotting character xlog, ylog to select logarithmic axis scaling

Multiple Plots on A Page Set the mfrow or mfcol options Take 2 dimensional vector as an argument The first value specifies the number of rows The second specifies the number of columns The 2 options differ in the order individual plots are printed

Multiple Plots > par(mfcol = c(3,1)) > hist(dataset$height, breaks = 10, main = "Height (in cm)", xlab = "Height") > hist(dataset$height * 10, breaks = 10, main = "Height (in mm)", xlab = "Height") > hist(dataset$height / 2.54, breaks = 10, main = "Height (in inches)", xlab = "Height") Frequency Frequency Frequency 0 400 800 0 400 800 0 400 800 Height (in cm) 130 140 150 160 170 180 190 200 Height Height (in mm) 1300 1400 1500 1600 1700 1800 1900 2000 Height Height (in inches) 50 55 60 65 70 75 Height

Outputting R Plots R usually generates output to the screen In Windows and the Mac, you can point and click on a graph to copy it to the clipboard However, R can also save its graphics output in a file that you can distribute or include in a document prepared with Word or LATEX

Selecting a Graphics Device To redirect graphics output, first select a device: pdf() high quality, portable format postscript() high quality format png() low quality, but suitable for the web After you generate your graphics, simply close the device dev.off()

Example of Output Redirection > x <- runif(100) > y <- runif(100) * 0.5 + x * 0.5 # This graph is plotted on the screen > plot(x, y, ylab = This is a simple graph ) # This graph is plotted to the PDF file > pdf( my_graph.pdf ) > plot(x, y, ylab = This is a simple graph ) > dev.close() # Where does this one go? > plot(x, y, ylab = This is a simple graph )

Today Introduction to Graphics in R Examples of commonly used graphics functions Common options for customizing graphs