5 Correlation and Data Exploration



Correlation

In Unit 3, we carried out correlation analyses of data from studies of the acquisition order and acquisition difficulty of English morphemes among both child and adult learners of L2 English. We used a Spearman rank-order correlation test to compare the acquisition orders of different groups of learners and found statistically significant relationships (i.e. p < 0.05). We also used a Pearson correlation test to find out whether morpheme acquisition difficulty was similar across groups of learners. The results were mixed: some correlations were statistically significant, but others were not.

Correlation tests tell us how strongly two variables vary together. Figure 1 shows scatterplots of pairs of variables with different correlation strengths (r = 0.90, r = 0.50 and r = 0.00), each with a regression line and its 95% confidence interval. (Regression is a statistical technique closely related to correlation; we shall look at it later.) The 95% confidence interval is a band around the regression line that shows the range of regression lines that are plausible given the sample. The wider the band, the less precise our estimate of the regression line is likely to be.

Figure 1. Scatterplots of variables at different correlation coefficients (r) with 95% confidence intervals

In the scatterplot on the left of Figure 1, there is a very strong relationship (r = 0.90) between the variables. All the points are close to the regression line, and most of them lie in the bottom-left and top-right quadrants (the quadrants formed by the means of x and y). The 95% confidence interval is also relatively narrow. In the middle scatterplot the relationship is weaker (r = 0.50). The points are more spread out and further from the regression line, but most are still in the bottom-left and top-right quadrants, and the 95% confidence interval is wider. In the scatterplot on the right, there is no relationship between the variables (r = 0.00). The points are randomly scattered over the graph, with roughly the same number in each of the four quadrants. The regression line is now a horizontal line through the mean of y, so it is difficult to see. The 95% confidence interval is also the widest.

Data Exploration

Data exploration means examining our data in detail so that we can understand its characteristics. It is an essential step before carrying out any statistical test, although researchers in the field of SLA often seem to skip it. The first step is usually to calculate descriptive statistics: the mean, the median, the minimum, the maximum, the range, the standard deviation, the standard error, the 95% confidence interval of the mean, skewness and kurtosis. Graphical techniques are also increasingly used, such as histograms, density plots, box plots, and scatterplots with regression lines and/or smoothed trend lines (loess or lowess) and their confidence intervals.

Exploring the Critical Period Hypothesis (DeKeyser, 2000)

In our quick look at correlation (above), one of our assumptions was that the relationship between the two variables is linear (a straight line). However, the Critical Period Hypothesis (CPH) claims that the relationship between age of acquisition (AoA) and ultimate attainment is non-linear (see Unit 5, Figure 1). In this section, we will explore the data from one study that claimed to support the CPH and see whether they suggest a non-linear relationship.
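The two correlation tests mentioned above, and the descriptive statistics listed under Data Exploration, can all be computed with base R. The sketch below uses simulated data (the variables x and y and the target correlation of 0.9 are assumptions for illustration, not data from Unit 3). The helper describe() is a hypothetical function written for this example, not part of R; its skewness and kurtosis use simple moment formulas, so packages that apply bias corrections may report slightly different values.

```r
# Simulated data for illustration only (not data from any study in this unit)
set.seed(1)
n <- 50
x <- rnorm(n)
# y is constructed so that its population correlation with x is 0.9
y <- 0.9 * x + sqrt(1 - 0.9^2) * rnorm(n)

# Pearson correlation test (measures the strength of a linear relationship)
pearson <- cor.test(x, y, method = "pearson")
# Spearman rank-order correlation test (uses only the ranks of the values)
spearman <- cor.test(x, y, method = "spearman")
print(c(r = unname(pearson$estimate), p = pearson$p.value))
print(c(rho = unname(spearman$estimate), p = spearman$p.value))

# Hypothetical helper: descriptive statistics for one numeric vector
describe <- function(v) {
  n  <- length(v)
  m  <- mean(v)
  s  <- sd(v)
  se <- s / sqrt(n)                        # standard error of the mean
  t_crit <- qt(0.975, df = n - 1)          # t value for a 95% CI of the mean
  z <- (v - m) / sqrt(mean((v - m)^2))     # standardised with the population SD
  c(n = n, mean = m, median = median(v),
    min = min(v), max = max(v), range = max(v) - min(v),
    sd = s, se = se,
    ci_lower = m - t_crit * se, ci_upper = m + t_crit * se,
    skewness = mean(z^3),                  # moment-based skewness
    kurtosis = mean(z^4) - 3)              # excess kurtosis (0 for a normal distribution)
}

print(round(describe(y), 3))
```

A histogram (hist(y)) or box plot (boxplot(y)) of the same vector is a quick graphical complement to these numbers.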
We will begin by making a scatterplot of the data with a regression line and its 95% confidence interval, and a loess (smoothed trend) line and its 95% confidence interval. The graph will look like Figure 2. In order to make this graph, you will need to have the ggplot2 package installed in R. (If it is not installed, follow the instructions in Appendix A to install it or, if you cannot install it, follow the instructions in Appendix B to create the graph with R's built-in plotting functions.) The data you will need is in a file called dekeyser.txt. This file begins with a header, which contains the names of the variables ("AoA", "GJT", "Status"), and below it are three columns of data. The columns (and variable names) are separated by an invisible tab character ("\t" in R). The first line of the file is the header:

"AoA"	"GJT"	"Status"

and each of the first few rows of data has the Status value "Under 15".

Figure 2. Scatterplot of scores on a grammaticality judgement task (GJT) and age of acquisition (AoA) with regression and loess lines and their 95% confidence intervals, produced using the ggplot() function in the ggplot2 package (data from DeKeyser, 2000)

First, you will need to read the data into R and store it in a variable. There are several ways to do this, but the following is the most similar to other software. The command has several parts. dekeyser is the name of the data frame that you are going to store the data in (you can choose another name if you prefer). read.table() is the function that actually reads the data. file.choose() is another function that opens a file dialogue box, similar to those in other programs on Windows and Mac. header = TRUE indicates that the first line of the file is a header and NOT data. The final argument, sep = "\t", indicates that the columns are separated by a tab character. Type the following (without the leading "> ") and choose the file dekeyser.txt.

> dekeyser <- read.table(file.choose(), header = TRUE, sep = "\t")

If you get an error message, just try again. Now let's see what the data look like. Type:

> head(dekeyser)

You should see the first six lines of the data. The first row (AoA GJT Status) is the header. You can also see that R has added row numbers (1 to 6) at the beginning of each row of data.

Now that the data have been imported, we can start to plot the graph. The first thing to do is to load the ggplot2 package with the library() function:

> library(ggplot2)

Next, we use the ggplot() function to plot the graph. ggplot() is a bit different from other functions we have used: the plot is built up from parts joined by the + symbol.

> ggplot(data = dekeyser, aes(AoA, GJT)) + geom_point() + geom_smooth(method = "lm") + geom_smooth(colour = "red")

The first part, ggplot(), initialises the plot but does not draw anything. In this example, it has two arguments: data = dekeyser tells ggplot() to use the data frame called dekeyser, and aes(AoA, GJT) tells it to use the AoA and GJT variables (notice that the order is x-axis, then y-axis, and that R is case-sensitive, so the names must be typed exactly as they appear in the header). geom_point() plots the points. geom_smooth() draws lines (and their 95% confidence intervals) calculated from the data: geom_smooth(method = "lm") draws a straight regression line ("lm" stands for linear model), and geom_smooth(colour = "red") draws a red loess trend line. Notice that the method does not need to be specified for the loess trend line, because loess is the default for data sets with fewer than 1,000 observations. When you have finished, you can detach the package with:

> detach(package:ggplot2)

Interpretation

What does the graph we have produced tell us about the data? Is there any evidence for a Critical Period?

Figure 3. Regression and loess lines with 95% confidence intervals for the DeKeyser (2000) data

I think the most important things to look at are the regression line (blue) and the confidence interval of the loess line. If the regression line goes outside the confidence interval of the loess line, the relationship may not be linear, and there may be evidence for a Critical Period. In this case, we can see that the regression line is outside the confidence interval from about 20 to 24 years old. However, this seems very late, as the Critical Period is assumed to end at puberty, which is usually thought to be from 13 to 15 years old. In other words, these data do not appear to support the Critical Period Hypothesis. Of course, more sophisticated statistical techniques are needed to show whether this conclusion is likely to be correct, but a visual analysis of the data can also be extremely helpful.

Assignments

Create similar graphs using the files dekeyserisr.txt, dekeyserus.txt and FlegeSimple.txt. Is there any evidence of a Critical Period?
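The visual check in the Interpretation section (does the straight regression line leave the loess confidence band?) can also be done numerically. The sketch below is one way to do this with base R only; the function outside_loess_band() is hypothetical, and the simulated "flat until 15, then declining" data are made up for illustration and are not DeKeyser's data. It uses loess() with its default settings, so the band may differ slightly from the one ggplot2 draws. Because the bands are pointwise, a few grid points outside the band are a hint worth exploring rather than a formal test.

```r
# Hypothetical helper: x values at which the straight regression line lies
# outside the pointwise confidence band of a loess trend line
outside_loess_band <- function(data, x, y, step = 0.1, level = 0.95) {
  f <- reformulate(x, response = y)            # e.g. GJT ~ AoA
  fit_lm <- lm(f, data = data)
  fit_lo <- loess(f, data = data)              # loess() defaults: span = 0.75, degree = 2
  # Evaluate both fits on a fine grid inside the observed range
  # (loess returns NA outside the range of the data)
  grid <- data.frame(seq(min(data[[x]]), max(data[[x]]), by = step))
  names(grid) <- x
  line_lm <- predict(fit_lm, newdata = grid)
  lo <- predict(fit_lo, newdata = grid, se = TRUE)
  t_crit <- qt((1 + level) / 2, df = lo$df)
  lower <- lo$fit - t_crit * lo$se.fit
  upper <- lo$fit + t_crit * lo$se.fit
  outside <- line_lm < lower | line_lm > upper
  grid[[x]][which(outside)]                    # which() drops any NA comparisons
}

# Simulated data for illustration only (a made-up pattern, not DeKeyser's data)
set.seed(2000)
sim <- data.frame(AoA = runif(60, min = 3, max = 40))
sim$GJT <- ifelse(sim$AoA < 15, 190, 190 - 2.5 * (sim$AoA - 15)) +
  rnorm(60, sd = 8)
ages <- outside_loess_band(sim, x = "AoA", y = "GJT")
print(ages)
```

With the real data, outside_loess_band(dekeyser, "AoA", "GJT") would list the ages at which the two lines disagree, which can then be compared with the 20 to 24 range read off Figure 3.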

Appendix A
Installing packages in R

This section shows you how to install packages in R. For instructions on installing R, refer to:

The easiest way to install new packages in R is to use the menus. First, click Packages (パッケージ) and select Install package(s) (パッケージのインストール). A list of servers will appear (see below).

Next, select the server from which to download the package. Here, the default server (0-Cloud) has been selected. If you prefer, you may scroll down and select a server in Japan. A list of packages then appears. Scroll down until you find ggplot2, select it and click OK. The package will be installed automatically.
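Packages can also be installed by typing a command instead of using the menus; install.packages() is R's standard function for this. The sketch below first checks which packages are missing, so nothing is reinstalled unnecessarily. The helper name missing_packages() is hypothetical. requireNamespace() loads (but does not attach) each package that is available, and the installation step only runs when you are typing at the console, where R may ask you to choose a server (CRAN mirror) just as the menu does.

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  # TRUE for each package whose namespace can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages("ggplot2")
# Only install in an interactive session; R may prompt for a CRAN mirror
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```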

Appendix B
Plotting the data without the ggplot2 package

Using R's built-in plotting functions (e.g., plot(), abline() and lines()) is more complicated than using the functions in the ggplot2 package. The steps to produce a graph like Figure 2 are explained below.

Figure 4. Scatterplot of scores on a grammaticality judgement task (GJT) and age of acquisition (AoA) with regression and loess lines and their 95% confidence intervals

The data you will need is in a file called dekeyser.txt. This file begins with a header, which contains the names of the variables ("AoA", "GJT", "Status"), and below it are three columns of data. The columns (and variable names) are separated by an invisible tab character ("\t" in R). The first line of the file is the header:

"AoA"	"GJT"	"Status"

and each of the first few rows of data has the Status value "Under 15".

First, you will need to read the data into R and store it in a variable. There are several ways to do this, but the following is the most similar to other software. The command has several parts. dekeyser is the name of the variable that you are going to store the data in (you can choose another name if you prefer). read.table() is the function that actually reads the data. file.choose() is another function that opens a file dialogue box, similar to those in other programs on Windows and Mac. header = TRUE indicates that the first line of the file is a header and NOT data. The final argument, sep = "\t", indicates that the columns are separated by a tab character. Type the following (without the leading "> ") and choose the file dekeyser.txt.

> dekeyser <- read.table(file.choose(), header = TRUE, sep = "\t")

If you get an error message, just try again. Now let's see what the data look like. Type:

> head(dekeyser)

You should see the first six lines of the data. The first row (AoA GJT Status) is the header. You can also see that R has added row numbers (1 to 6) at the beginning of each row of data.

The variable dekeyser is different from the vector variables that we used before: it is a data frame. In order to use the variables in it like vectors, type the following:

> attach(dekeyser)

Now we shall do some calculations that the graphics functions need in order to draw the lines. The first, lm(), calculates the regression line for GJT (y-axis) and AoA (x-axis) and stores it in dekeyser.lm. (After you have done this, type dekeyser.lm to see what the regression line data look like.)

> dekeyser.lm <- lm(GJT ~ AoA)

The next command stores a sequence of numbers from 0 to 45, in steps of 0.1, in newx. (After you have done it, type newx to see what it looks like.)

> newx <- seq(0, 45, 0.1)

The next command, predict.lm(), calculates the predicted values of GJT and their confidence intervals and stores them in pred, a matrix whose three columns are the fitted values (fit) and the lower (lwr) and upper (upr) confidence limits. Because AoA does not have many distinct values, confidence lines drawn only at those values may not be very smooth, so newx is used instead of the original AoA values to make smoother lines. (Once again, you can type pred to see what this data looks like.)

> pred <- predict.lm(dekeyser.lm, newdata = data.frame(AoA = newx), interval = "confidence")

Now we can start to plot the graph. First, the points, the regression line and its 95% confidence interval:

> plot(GJT ~ AoA, bty = "n", col = "grey", ylim = c(80, 210))
> abline(dekeyser.lm)
> lines(pred[, 2] ~ newx, lty = 2, col = "grey")
> lines(pred[, 3] ~ newx, lty = 2, col = "grey")

The next step is to do the calculations for the loess trend line and its confidence interval. The sequence is similar to that for the regression line but, because pred2 has a different structure from pred (it is a list containing fit, se.fit and df rather than a matrix), the arguments used for drawing the lines are different. The confidence limits are the fitted values plus or minus qt(0.975, df) times their standard errors. Loess does not extrapolate, so values of newx outside the range of AoA give NA and are left out of the lines.

> dekeyser.lo <- loess(GJT ~ AoA)
> newx <- seq(0, 45, 0.1)
> pred2 <- predict(dekeyser.lo, newdata = data.frame(AoA = newx), se = TRUE)
> lines(pred2$fit ~ newx, col = "red4")
> lines(pred2$fit - qt(0.975, pred2$df) * pred2$se.fit ~ newx, lty = 2, col = "pink3")
> lines(pred2$fit + qt(0.975, pred2$df) * pred2$se.fit ~ newx, lty = 2, col = "pink3")
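The loess band above is built by hand as the fitted value plus or minus qt(0.975, df) times the standard error. As far as we know, this is the same recipe that predict.lm() uses internally for interval = "confidence", with the residual degrees of freedom of the linear model. The sketch below checks this on simulated data (the values are made up, and the column names simply mirror dekeyser.txt), which may make the two sets of lines() commands easier to compare.

```r
# Simulated data for illustration only
set.seed(42)
sim <- data.frame(AoA = runif(40, 3, 40))
sim$GJT <- 200 - 2 * sim$AoA + rnorm(40, sd = 10)

fit <- lm(GJT ~ AoA, data = sim)
nd <- data.frame(AoA = seq(5, 35, 0.1))

# Built-in interval: a matrix with columns fit, lwr, upr
ci_builtin <- predict(fit, newdata = nd, interval = "confidence")

# The same interval by hand: fit +/- t(0.975, residual df) * standard error
p <- predict(fit, newdata = nd, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)
ci_manual <- cbind(fit = p$fit,
                   lwr = p$fit - t_crit * p$se.fit,
                   upr = p$fit + t_crit * p$se.fit)

# TRUE if the two intervals agree (up to rounding error)
print(isTRUE(all.equal(unname(ci_builtin), unname(ci_manual))))
```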

Summary of commands

> attach(dekeyser)
> dekeyser.lm <- lm(GJT ~ AoA)
> newx <- seq(0, 45, 0.1)
> pred <- predict.lm(dekeyser.lm, newdata = data.frame(AoA = newx), interval = "confidence")
> plot(GJT ~ AoA, bty = "n", col = "grey", ylim = c(80, 210))
> abline(dekeyser.lm)
> lines(pred[, 2] ~ newx, lty = 2, col = "grey")
> lines(pred[, 3] ~ newx, lty = 2, col = "grey")
> dekeyser.lo <- loess(GJT ~ AoA)
> newx <- seq(0, 45, 0.1)
> pred2 <- predict(dekeyser.lo, newdata = data.frame(AoA = newx), se = TRUE)
> lines(pred2$fit ~ newx, col = "red4")
> lines(pred2$fit - qt(0.975, pred2$df) * pred2$se.fit ~ newx, lty = 2, col = "pink3")
> lines(pred2$fit + qt(0.975, pred2$df) * pred2$se.fit ~ newx, lty = 2, col = "pink3")
> detach(dekeyser)
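attach() makes the columns of dekeyser available as ordinary variables, but objects in your workspace with the same names (for example, an existing GJT vector) take precedence over the attached columns, which can lead to confusing results. A minimal sketch of an alternative that passes the data frame explicitly through data = arguments is shown below. The function name plot_lm_loess() and its arguments are hypothetical, and unlike the commands above it evaluates the lines only over the observed range of AoA rather than from 0 to 45.

```r
# Hypothetical helper: scatterplot with regression and loess lines plus their
# 95% confidence intervals, using data = arguments instead of attach()
plot_lm_loess <- function(data, grid_step = 0.1, ylim = NULL) {
  fit_lm <- lm(GJT ~ AoA, data = data)
  fit_lo <- loess(GJT ~ AoA, data = data)
  newx <- seq(min(data$AoA), max(data$AoA), grid_step)
  nd <- data.frame(AoA = newx)

  pred  <- predict(fit_lm, newdata = nd, interval = "confidence")  # fit, lwr, upr
  pred2 <- predict(fit_lo, newdata = nd, se = TRUE)                # fit, se.fit, df
  t_lo  <- qt(0.975, pred2$df)

  plot(GJT ~ AoA, data = data, bty = "n", col = "grey", ylim = ylim)
  abline(fit_lm)                                         # regression line
  lines(newx, pred[, "lwr"], lty = 2, col = "grey")      # regression CI
  lines(newx, pred[, "upr"], lty = 2, col = "grey")
  lines(newx, pred2$fit, col = "red4")                   # loess trend line
  lines(newx, pred2$fit - t_lo * pred2$se.fit, lty = 2, col = "pink3")
  lines(newx, pred2$fit + t_lo * pred2$se.fit, lty = 2, col = "pink3")
  invisible(list(lm = fit_lm, loess = fit_lo))
}
```

After reading dekeyser.txt with read.table() as above, plot_lm_loess(dekeyser, ylim = c(80, 210)) should draw a graph similar to Figure 4 without any need for attach() or detach().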


More information

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs

Data Analysis. Using Excel. Jeffrey L. Rummel. BBA Seminar. Data in Excel. Excel Calculations of Descriptive Statistics. Single Variable Graphs Using Excel Jeffrey L. Rummel Emory University Goizueta Business School BBA Seminar Jeffrey L. Rummel BBA Seminar 1 / 54 Excel Calculations of Descriptive Statistics Single Variable Graphs Relationships

More information

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Introduction Course in SPSS - Evening 1

Introduction Course in SPSS - Evening 1 ETH Zürich Seminar für Statistik Introduction Course in SPSS - Evening 1 Seminar für Statistik, ETH Zürich All data used during the course can be downloaded from the following ftp server: ftp://stat.ethz.ch/u/sfs/spsskurs/

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Biology statistics made simple using Excel

Biology statistics made simple using Excel Millar Biology statistics made simple using Excel Biology statistics made simple using Excel Neil Millar Spreadsheet programs such as Microsoft Excel can transform the use of statistics in A-level science

More information

Each function call carries out a single task associated with drawing the graph.

Each function call carries out a single task associated with drawing the graph. Chapter 3 Graphics with R 3.1 Low-Level Graphics R has extensive facilities for producing graphs. There are both low- and high-level graphics facilities. The low-level graphics facilities provide basic

More information

Excel Tutorial. Bio 150B Excel Tutorial 1

Excel Tutorial. Bio 150B Excel Tutorial 1 Bio 15B Excel Tutorial 1 Excel Tutorial As part of your laboratory write-ups and reports during this semester you will be required to collect and present data in an appropriate format. To organize and

More information

Chapter 4 Creating Charts and Graphs

Chapter 4 Creating Charts and Graphs Calc Guide Chapter 4 OpenOffice.org Copyright This document is Copyright 2006 by its contributors as listed in the section titled Authors. You can distribute it and/or modify it under the terms of either

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang

Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical

More information

Final Software Tools and Services for Traders

Final Software Tools and Services for Traders Final Software Tools and Services for Traders TPO and Volume Profile Chart for NinjaTrader Trial Period The software gives you a 7-day free evaluation period starting after loading and first running the

More information

Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure

Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure Note: there is a second document that goes with this one! 2046 - Absorbance Spectrophotometry. Make sure you

More information


OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE OVERVIEW OF R SOFTWARE AND PRACTICAL EXERCISE Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-110012 1. INTRODUCTION R is a free software environment for statistical computing

More information

UCL Depthmap 7: Data Analysis

UCL Depthmap 7: Data Analysis UCL Depthmap 7: Data Analysis Version 7.12.00c Outline Data analysis in Depthmap Although Depthmap is primarily a graph analysis tool, it does allow you to investigate data that you produce. This tutorial

More information

Homework 11. Part 1. Name: Score: / null

Homework 11. Part 1. Name: Score: / null Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

More information

Beginner s Matlab Tutorial

Beginner s Matlab Tutorial Christopher Lum lum@u.washington.edu Introduction Beginner s Matlab Tutorial This document is designed to act as a tutorial for an individual who has had no prior experience with Matlab. For any questions

More information

Microsoft Excel Tutorial

Microsoft Excel Tutorial Microsoft Excel Tutorial Microsoft Excel spreadsheets are a powerful and easy to use tool to record, plot and analyze experimental data. Excel is commonly used by engineers to tackle sophisticated computations

More information

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

MetroBoston DataCommon Training

MetroBoston DataCommon Training MetroBoston DataCommon Training Whether you are a data novice or an expert researcher, the MetroBoston DataCommon can help you get the information you need to learn more about your community, understand

More information

UCINET Visualization and Quantitative Analysis Tutorial

UCINET Visualization and Quantitative Analysis Tutorial UCINET Visualization and Quantitative Analysis Tutorial Session 1 Network Visualization Session 2 Quantitative Techniques Page 2 An Overview of UCINET (6.437) Page 3 Transferring Data from Excel (From

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Graphics in R. Biostatistics 615/815

Graphics in R. Biostatistics 615/815 Graphics in R Biostatistics 615/815 Last Lecture Introduction to R Programming Controlling Loops Defining your own functions Today Introduction to Graphics in R Examples of commonly used graphics functions

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Updates to Graphing with Excel

Updates to Graphing with Excel Updates to Graphing with Excel NCC has recently upgraded to a new version of the Microsoft Office suite of programs. As such, many of the directions in the Biology Student Handbook for how to graph with

More information

SPSS Introduction. Yi Li

SPSS Introduction. Yi Li SPSS Introduction Yi Li Note: The report is based on the websites below http://glimo.vub.ac.be/downloads/eng_spss_basic.pdf http://academic.udayton.edu/gregelvers/psy216/spss http://www.nursing.ucdenver.edu/pdf/factoranalysishowto.pdf

More information


CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Workspaces Creating and Opening Pages Creating Ticker Lists Looking up Ticker Symbols Ticker Sync Groups Market Summary Snap Quote Key Statistics

Workspaces Creating and Opening Pages Creating Ticker Lists Looking up Ticker Symbols Ticker Sync Groups Market Summary Snap Quote Key Statistics Getting Started Workspaces Creating and Opening Pages Creating Ticker Lists Looking up Ticker Symbols Ticker Sync Groups Market Summary Snap Quote Key Statistics Snap Report Price Charts Comparing Price

More information

Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller

Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Most of you want to use R to analyze data. However, while R does have a data editor, other programs such as excel are often better

More information

General instructions for the content of all StatTools assignments and the use of StatTools:

General instructions for the content of all StatTools assignments and the use of StatTools: General instructions for the content of all StatTools assignments and the use of StatTools: An important part of Business Management 330 is learning how to conduct statistical analyses and to write text

More information

IBM SPSS Statistics 20 Part 1: Descriptive Statistics

IBM SPSS Statistics 20 Part 1: Descriptive Statistics CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 1: Descriptive Statistics Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

TI-Inspire manual 1. I n str uctions. Ti-Inspire for statistics. General Introduction

TI-Inspire manual 1. I n str uctions. Ti-Inspire for statistics. General Introduction TI-Inspire manual 1 I n str uctions Ti-Inspire for statistics General Introduction TI-Inspire manual 2 General instructions Press the Home Button to go to home page Pages you will use the most #1 is a

More information

Introduction and usefull hints for the R software

Introduction and usefull hints for the R software What is R Statistical software and programming language Freely available (inluding source code) Started as a free re-implementation of the S-plus programming language Introduction and usefull hints for

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information