# Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Save this PDF as:

Size: px
Start display at page:

Download "Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller"

## Transcription

1 Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize oneself with the data at hand (this is often called exploratory data analysis). This usually involves graphing the variables in various distributional displays (histograms, box plots etc.) and plotting the relationships between variables (scatter plots, box plots etc). The focus of this tutorial is therefore on how to use some graphical and other tools in R that are generally useful in preliminary graphic data analyses. Exploratory analysis also helps us decide on whether or not to transform a variable to meet some statistical requirements. In this tutorial we will cover some of the basic plotting functions that are built into the R system. In particular, we will cover histograms, boxplots, and scatterplots (with linear regression). Basic plotting functions in R Before starting, create a data frame named fish of the data contained in eg1.txt (this file should be available to you along with this tutorial). The data in this example (kindly provided by Joe Travis) are derived from a larger study of character variation in sailfin mollies (Poecilia latipinna). Specifically, the data consists of selected measurements on females from three populations (identified by the POP variable, populations 1, 2 and 5). We can ignore the column headed IDNO: this matches the line in the example data set to a record in a lab notebook where the data was originally recorded. The actual data recorded includes several measurements taken on individual fish: RAYNO, a discrete variable for the number of rays (finger-like bones) in the dorsal fin; SL, a quantitative variable of the body lengths of the females in millimeters; FINAREA, a quantitative variable containing measurements for the area of the dorsal fin in square millimeters (a function of the fin s length, height and shape), and TAREA, a quantitative variable of the tail-fin area in square millimeters (a function of the tail s length height and shape). The Histogram in R The most common way to examine the distribution of a quantitative (continuous) variable is by means of a histogram. A histogram dissects the range of the variable into equal-width class intervals called bins and then plots the number of observations falling in each bin as a bar chart (i.e. the height of the bar represents the number, proportion or percentage of observations in that class). To plot histograms of the variables SL, FINAREA and TAREA, use the hist command as follows (remember to attach the fish data frame first to make the variables visible to R): >hist(sl)

2 For histograms of the other two variables simply replace the input in parenthesis by the variable names (note that R is case-sensitive). These data come from three different populations; can we plot histograms separately for each population? Recall that POP is a factor identifying which population each measurement came from. Therefore we can simply specify which variable and from which population to plot using the population variable as follows: >hist(sl[pop==1]) >hist(finarea[pop==5]) Note the double equals sign is used for specifying the population: a single equals sign is used to change the actual values stored in POP, so don t do this. In addition, there cannot be a space between the equal signs. Often a factor such as the POP variable will be specified by a name (i.e., a text string) rather than a number. In this case, if we want to specify the level of the factor we place it inside inverted commas (Note: a factor is just another name for a categorical variable where the different classes of the variable are known as the levels of the factor). For example if the POP factor contained the entries U.S.A, Mexico, Canada etc, we could specify the USA level with: >hist(sl[pop=="u.s.a ]) Note also how R automatically creates the axes labels. To modify an axis label we set the properties xlab or ylab (or both) within the histogram command, separated from the other function arguments by commas. The axis title is given between inverted commas. Thus >hist(sl[pop==1], xlab = body length for population 1 ) produces a histogram of SL for population 1, where the X axis is labeled with the text

3 between the inverted commas. The main property is used to give a main heading to your graph. So to give the graph above the heading Size we would type: > hist(sl[pop == 1], xlab = body length for population 1, main= Size ) Type?hist or help(hist) to find out about more options on how to customize your histogram. Note that there are quite a few options concerning bar color, number of bins to use etc. The available colors are encoded by numbers or you can try some color names in quotes (e.g., col = blue ). Try different colors for your histogram by setting the col property. Eg: > hist(sl[pop == 1], xlab = body length for population 1, main= Size, col=2) should change your bars to red. If you prefer shading lines rather then a full color for you histogram bars then set the density function: > hist(sl[pop == 1], xlab = body length for population 1, main= Size, col=2, density = 5) Try some different density values to see what happens (you can go up to pretty high numbers (50+) to get a shading effect). The Box Plot in R The box plot is another useful way to examine the distribution of a variable. A box plot consists of a box on a set of axes where the top and bottom lines of the box represent the upper and lower quartiles respectively (see the example in the figure below). A horizontal line within the box shows the median. The length of the box is known as the inter-quartile range. Box plots also include vertical lines (called whiskers), extending from the top and bottom of the box up to the largest and smallest observations respectively, that fall within 1.5 inter-quartile ranges. All observations that fall beyond these limits are plotted individually as points. Statisticians really like boxplots because a number of distributional properties are easily visualized. Box plots show a measure of location (the median line), dispersion (the length of the box and distance between the upper and lower whiskers), skewness (asymmetry of the upper and lower portions of the box and asymmetry of the whisker lengths), and long drawn out tails (for example whisker length in relation to the length of the box). Box plots are particularly useful as a quick visual comparison between 2 or more samples. To create a box plot in R use the command boxplot(variable), where variable is the name of the variable you want to plot. Adding labels to the axes is done in exactly the same way as for hist, as is changing the color (use?boxplot to check all the options, some differ from the hist function). A very useful feature of the boxplot function is the ability to plot a number of box plots of different variables, parallel to each other on

4 the same plot. Thus >boxplot(sl[pop==1]) Creates a box plot of the SL measurements from population 1 whereas: >boxplot(sl[pop==1], SL[POP== 2], SL[POP==5]) creates parallel box plots of the SL variable from all three populations. Note here that you can see that both the median (solid bar in box) and range (whiskers) increase as we go from population 1 to 5. By the way, there is a short-cut that you can use to see all boxplots for all the subclasses of a variable: >boxplot(sl~pop) Neat, eh? Just use the tilda (~). This works for a lot of the graphics, but not generally for other functions. If you are the nonvisual type and prefer text output rather than a graph the function summary could help you out. Summary returns the minimum and maximum values, the 1 st and 3rd quartiles, the mean and the median of a given vector. For example, to return these values for RAYNO from population 5 type use: >summary(rayno[pop= =5]) which returns:

5 Min. 1st Qu. Median Mean 3rd Qu. Max Whereas: >summary(fish) would return the same statistics for all the variables in the data frame fish (not by population however). There are many other ways of customizing your boxplot (as with any plot in R), but that is something you can check for yourself using the help functions or user manuals. As a final example, let s say we want to plot some blue boxplots of the SL variable on top of each other rather than next to each other, then we would type: >boxplot(sl [POP==1], SL[POP==2], SL[POP==5], col=4, horizontal = TRUE) All these parameters can be found in the help(boxplot) information. The Scatter Plot in R The scatter plot is the standard graph for examining the relationship between two quantitative variables. The plot function is used to create scatter plots (amongst other things), where two numeric vectors are required as arguments to the function. For example, to create a simple scatter plot of FINAREA against TAREA type: >plot(finarea, TAREA)

6 R interprets the first vector as the horizontal coordinates and the second as the vertical coordinates. An alternative means to plotting the relationship is by specifying a simple linear model in the function arguments. This is done with the tilde symbol (~): >plot(finarea~tarea) Note that using the tilda switches the axes: the response or dependent variable (FINAREA) is now plotted on the vertical (y) axis whereas the predictor or independent variable (TAREA) is on the horizontal (x) axis. Once again xlab and ylab can be used to add labels to the axes (most of the plotting functions use roughly same arguments to modify their attributes). Interpretation of a scatter plot is often assisted by enhancing the plot with least squares or non-parametric regression lines. As such lines are often dependent on the particular statistical model we are applying we won t go into too much depth here, but just to see what I mean, let s add a simple linear regression line to the plot above. The function lm can be used to specify a linear model and abline to add a line of the slope-intercept form. So to add a simple linear regression line to the plot above we type: >abline(lm(finarea~tarea)) Note: the graph has to already be there, so if you have closed the graphics panel, first replot the variables. As we are busy with graphics, we might as well discuss some ways to modify the line (i.e. width, type and color). The attributes lwd, lty, and col are used to specify the line width, line type and line color respectively. First, close the graphics window and plot FINAREA against TAREA again (this is because we are going to add a line to the graph but at the moment, we are adding a new line, not modifying the existing

7 one, so it will be difficult to see against the one already plotted). Next, type: > abline(lm(finarea~tarea), lwd = 10, lty = 2, col=3) Wow, didn t expect that, did you? Repeat this a few times with different values for the line parameters to see what s available. Often scatter plots include a number of overlapping points (this is especially true when we are plotting discrete variables, as they can only take so many values). For example create a scatter plot of RAYNO against POP. You will notice that only nine points are shown on the plot. This is because a number of these points are overlapping (i.e. POP can only take on three different values and RAYNO 4, so there is bound to be a lot of overlap). In these situations, the jitter function comes in handy. The jitter function adds a small random quantity to the data coordinates thus serving to separate the overplotted points. Try the following: >plot(jitter(rayno), POP) >plot(jitter(rayno), jitter(pop)) And, finally: >plot(jitter(rayno), jitter(pop), xlab= number of rays, ylab= population, main= A jittered scatter plot ) Multiple Figures on One Plot You will have noticed by now that each time a plotting function is used, the previous graph is replaced by the new one. Although you can copy and paste the graphs into some

8 other program in order to compare them, it might be useful to plot them simultaneously in R in one panel. One way of doing this is to use the split.screen function (as per usual, there are many other ways, but we will start with this one). Let s say that we want to plot histograms and box plots of the SL variable for the three different populations, all on one screen. We could begin by using split.screen to create 6 separate areas in the graphics panel as follows: >split.screen(figs=c(2, 3)) The function above splits the graphics panel into a matrix of 2 rows and 3 columns where the screens are numbered 1-6 by rows (the original graphics surface is now called screen 0). Next we specify which screen we want to plot in. for example to plot a histogram of the SL variable from population 1 in the first screen we use the following two commands: >screen(1) >hist(sl[pop= =1]) We can now plot the rest of the graphs on the screen by specifying which screen to use each time, then typing the actual plotting function. E.g.: >screen(2) >hist(sl[pop==2]). >screen(4) >boxplot(sl[pop== 1]) etc.

10 Answers: 1. >split.screen(fig=c(1,3)); >screen(1); >boxplot(data1,xlab="population",ylab="range",col=5); >screen(2); >boxplot(data2,xlab="population",ylab="range",main="these are the results of my first function and the graph heading was made a little smaller in order to fit on the page",cex.main=0.6,col=6); >screen(3); >boxplot(data3,xlab="population",ylab="range",col=7); >close.screen(all=t); 2. >par(mfrow=c(2,3)); >boxplot(sl[pop==1],xlab="population",ylab="range",col=4); >boxplot(sl[pop==2],xlab="population",ylab="range",col=5); >boxplot(sl[pop==5],xlab="population",ylab="range",col=6); >plot(tarea[pop==1],finarea[pop==1],col=4); >abline(lm(finarea[pop==1]~tarea[pop==1]),col=4) >plot(tarea[pop==2],finarea[pop==2],col=5); >abline(lm(finarea[pop==2]~tarea[pop==2]),col=5) >plot(tarea[pop==5],finarea[pop==5],col=6); >abline(lm(finarea[pop==5]~tarea[pop==5]),col=6) >close.screen(all=t);

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Drawing a histogram using Excel

Drawing a histogram using Excel STEP 1: Examine the data to decide how many class intervals you need and what the class boundaries should be. (In an assignment you may be told what class boundaries to

### Getting started with qplot

Chapter 2 Getting started with qplot 2.1 Introduction In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy

### SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

### Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

### AMS 7L LAB #2 Spring, 2009. Exploratory Data Analysis

AMS 7L LAB #2 Spring, 2009 Exploratory Data Analysis Name: Lab Section: Instructions: The TAs/lab assistants are available to help you if you have any questions about this lab exercise. If you have any

### Lecture 2: Exploratory Data Analysis with R

Lecture 2: Exploratory Data Analysis with R Last Time: 1. Introduction: Why use R? / Syllabus 2. R as calculator 3. Import/Export of datasets 4. Data structures 5. Getting help, adding packages 6. Homework

### Exploratory Spatial Data Analysis

Exploratory Spatial Data Analysis Part II Dynamically Linked Views 1 Contents Introduction: why to use non-cartographic data displays Display linking by object highlighting Dynamic Query Object classification

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin

My presentation is about data visualization. How to use visual graphs and charts in order to explore data, discover meaning and report findings. The goal is to show that visual displays can be very effective

### A Guide for a Selection of SPSS Functions

A Guide for a Selection of SPSS Functions IBM SPSS Statistics 19 Compiled by Beth Gaedy, Math Specialist, Viterbo University - 2012 Using documents prepared by Drs. Sheldon Lee, Marcus Saegrove, Jennifer

### SPSS Manual for Introductory Applied Statistics: A Variable Approach

SPSS Manual for Introductory Applied Statistics: A Variable Approach John Gabrosek Department of Statistics Grand Valley State University Allendale, MI USA August 2013 2 Copyright 2013 John Gabrosek. All

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Visualization Quick Guide

Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Getting started in Excel

Getting started in Excel Disclaimer: This guide is not complete. It is rather a chronicle of my attempts to start using Excel for data analysis. As I use a Mac with OS X, these directions may need to be

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Statistics Chapter 2

Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students

### Getting Started with R and RStudio 1

Getting Started with R and RStudio 1 1 What is R? R is a system for statistical computation and graphics. It is the statistical system that is used in Mathematics 241, Engineering Statistics, for the following

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### Graphics in R. Biostatistics 615/815

Graphics in R Biostatistics 615/815 Last Lecture Introduction to R Programming Controlling Loops Defining your own functions Today Introduction to Graphics in R Examples of commonly used graphics functions

### Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

### Statistics Revision Sheet Question 6 of Paper 2

Statistics Revision Sheet Question 6 of Paper The Statistics question is concerned mainly with the following terms. The Mean and the Median and are two ways of measuring the average. sumof values no. of

### Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

### Using SPSS, Chapter 2: Descriptive Statistics

1 Using SPSS, Chapter 2: Descriptive Statistics Chapters 2.1 & 2.2 Descriptive Statistics 2 Mean, Standard Deviation, Variance, Range, Minimum, Maximum 2 Mean, Median, Mode, Standard Deviation, Variance,

### Tutorial 2: Using Excel in Data Analysis

Tutorial 2: Using Excel in Data Analysis This tutorial guide addresses several issues particularly relevant in the context of the level 1 Physics lab sessions at Durham: organising your work sheet neatly,

### 5 Correlation and Data Exploration

5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both

### 0 Introduction to Data Analysis Using an Excel Spreadsheet

Experiment 0 Introduction to Data Analysis Using an Excel Spreadsheet I. Purpose The purpose of this introductory lab is to teach you a few basic things about how to use an EXCEL 2010 spreadsheet to do

### WHICH TYPE OF GRAPH SHOULD YOU CHOOSE?

PRESENTING GRAPHS WHICH TYPE OF GRAPH SHOULD YOU CHOOSE? CHOOSING THE RIGHT TYPE OF GRAPH You will usually choose one of four very common graph types: Line graph Bar graph Pie chart Histograms LINE GRAPHS

### Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller

Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Most of you want to use R to analyze data. However, while R does have a data editor, other programs such as excel are often better

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### IQR Rule for Outliers

1. Arrange data in order. IQR Rule for Outliers 2. Calculate first quartile (Q1), third quartile (Q3) and the interquartile range (IQR=Q3-Q1). CO2 emissions example: Q1=0.9, Q3=6.05, IQR=5.15. 3. Compute

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

### There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet

### Psychology 205: Research Methods in Psychology

Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS

Biostatistics: A QUICK GUIDE TO THE USE AND CHOICE OF GRAPHS AND CHARTS 1. Introduction, and choosing a graph or chart Graphs and charts provide a powerful way of summarising data and presenting them in

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

### Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics

Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics Why visualize data? The human eye is extremely sensitive to differences in: Pattern Colors

### Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

1 Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab I m sure you ve wondered about the absorbency of paper towel brands as you ve quickly tried to mop up spilled soda from

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Graphical and Tabular. Summarization of Data OPRE 6301

Graphical and Tabular Summarization of Data OPRE 6301 Introduction and Re-cap... Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information

### Exploratory Data Analysis

Exploratory Data Analysis Learning Objectives: 1. After completion of this module, the student will be able to explore data graphically in Excel using histogram boxplot bar chart scatter plot 2. After

### Center: Finding the Median. Median. Spread: Home on the Range. Center: Finding the Median (cont.)

Center: Finding the Median When we think of a typical value, we usually look for the center of the distribution. For a unimodal, symmetric distribution, it s easy to find the center it s just the center

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Hierarchical Clustering Analysis

Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.

### Brief Review of Median

Session 36 Five-Number Summary and Box Plots Interpret the information given in the following box-and-whisker plot. The results from a pre-test for students for the year 2000 and the year 2010 are illustrated

### 3: Summary Statistics

3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### GeoGebra. 10 lessons. Gerrit Stols

GeoGebra in 10 lessons Gerrit Stols Acknowledgements GeoGebra is dynamic mathematics open source (free) software for learning and teaching mathematics in schools. It was developed by Markus Hohenwarter

### Basic Tools for Process Improvement

What is a Histogram? A Histogram is a vertical bar chart that depicts the distribution of a set of data. Unlike Run Charts or Control Charts, which are discussed in other modules, a Histogram does not

### Descriptive Statistics

Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. It includes the construction of graphs, charts and tables, as well various descriptive measures such

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Viewing Ecological data using R graphics

Biostatistics Illustrations in Viewing Ecological data using R graphics A.B. Dufour & N. Pettorelli April 9, 2009 Presentation of the principal graphics dealing with discrete or continuous variables. Course

### The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Paper 156-2010 The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA Abstract JMP has a rich set of visual displays that can help you see the information

### Intro to Statistics 8 Curriculum

Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

### San Jose State University Engineering 10 1

KY San Jose State University Engineering 10 1 Select Insert from the main menu Plotting in Excel Select All Chart Types San Jose State University Engineering 10 2 Definition: A chart that consists of multiple

### Universal Simple Control, USC-1

Universal Simple Control, USC-1 Data and Event Logging with the USB Flash Drive DATA-PAK The USC-1 universal simple voltage regulator control uses a flash drive to store data. Then a propriety Data and

### Each function call carries out a single task associated with drawing the graph.

Chapter 3 Graphics with R 3.1 Low-Level Graphics R has extensive facilities for producing graphs. There are both low- and high-level graphics facilities. The low-level graphics facilities provide basic

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### Chapter 4 Creating Charts and Graphs

Calc Guide Chapter 4 OpenOffice.org Copyright This document is Copyright 2006 by its contributors as listed in the section titled Authors. You can distribute it and/or modify it under the terms of either

### Introduction to R and Exploratory data analysis

Introduction to R and Exploratory data analysis Gavin Simpson November 2006 Summary In this practical class we will introduce you to working with R. You will complete an introductory session with R and

### Introduction Course in SPSS - Evening 1

ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

### Chapter 6: Constructing and Interpreting Graphic Displays of Behavioral Data

Chapter 6: Constructing and Interpreting Graphic Displays of Behavioral Data Chapter Focus Questions What are the benefits of graphic display and visual analysis of behavioral data? What are the fundamental

### Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

### CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

Curriculum Map/Pacing Guide page 1 of 14 Quarter I start (CP & HN) 170 96 Unit 1: Number Sense and Operations 24 11 Totals Always Include 2 blocks for Review & Test Operating with Real Numbers: How are

### Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

### Appendix 2.1 Tabular and Graphical Methods Using Excel

Appendix 2.1 Tabular and Graphical Methods Using Excel 1 Appendix 2.1 Tabular and Graphical Methods Using Excel The instructions in this section begin by describing the entry of data into an Excel spreadsheet.

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Lean Six Sigma Training/Certification Book: Volume 1

Lean Six Sigma Training/Certification Book: Volume 1 Six Sigma Quality: Concepts & Cases Volume I (Statistical Tools in Six Sigma DMAIC process with MINITAB Applications Chapter 1 Introduction to Six Sigma,

### Data Visualization Techniques

Data Visualization Techniques From Basics to Big Data with SAS Visual Analytics WHITE PAPER SAS White Paper Table of Contents Introduction.... 1 Generating the Best Visualizations for Your Data... 2 The

### MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

### How Does My TI-84 Do That

How Does My TI-84 Do That A guide to using the TI-84 for statistics Austin Peay State University Clarksville, Tennessee How Does My TI-84 Do That A guide to using the TI-84 for statistics Table of Contents

Using Your TI-NSpire Calculator: Descriptive Statistics Dr. Laura Schultz Statistics I This handout is intended to get you started using your TI-Nspire graphing calculator for statistical applications.

### Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate