Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions
Tom Ilvento, University of Delaware
Dr. Mugdim Pašić, Sarajevo Graduate School of Business

Often our interest in data analysis is how two or more variables influence each other. We may be searching for a driver that helps explain sales, profits, or revenues; we may be interested in factors that better explain the performance of employees; or we may want to know which marketing method has the most impact on sales. A basic starting point for understanding a relationship between two variables is covariance or its more common, standardized counterpart, correlation. Covariance and correlation are both measures of association that show the linear relationship between two variables. Each provides a single, easily interpreted summary measure of association, and each is a building block for more advanced techniques such as regression.

You will see that correlation and covariance are closely related concepts and are linked mathematically. Of the two terms, correlation is used more often in everyday language. When we say two things are correlated, we mean that they are related to each other. The correlation can be strong or weak, but we understand it as a relationship. In statistics, correlation has the same meaning, but it is expressed in mathematical terms with a specific interpretation of direction (positive or negative) and strength.

In particular, the correlation coefficient provides a good starting point for more advanced data analysis. Along with scatter plots, the correlation coefficient provides insight into bivariate, or two-variable, relationships. It is a flexible measure of association that can be used with continuous variables, ordinal variables, and dummy variables. I think you will find the correlation coefficient an intuitive and useful tool to summarize a relationship between two variables. It also has a direct connection with bivariate regression.

Key Objectives
- Understand the properties of measures of association
- Understand covariance and correlation as bivariate measures of association
- Understand how to interpret the correlation coefficient and how to read and interpret a correlation matrix
- Understand how to use correlations as an intermediate step in data analysis

In this Module We Will:
- Describe measures of association
- Look at covariance and correlation matrices, along with corresponding scatter plots
- Begin to link correlation with regression

For more information, contact: Tom Ilvento, 213 Townsend Hall, Newark, DE

MEASURES OF ASSOCIATION

Measures of association show the relationship between two variables. A measure of association is numerical and in most cases a single number (although it can be several numbers). Most often, measures of association focus on how two variables vary together (or not). There are many measures of association in statistics, developed for their usefulness with different types of data and different situations. Some of them have inferential properties, and some are useful solely for their ability to help describe a relationship. Examples include the correlation coefficient, the odds ratio, R² in regression, and the regression coefficient.

A good starting point for discussing measures of association is to understand the criteria used to evaluate any of them. These criteria help us compare various measures of association and, as a result, interpret them. The criteria focus on the range of the measure, whether it is bounded by an upper or lower limit, whether it is symmetrical, and how to interpret it. Each is discussed briefly below.

What is the range (from high to low)? We want to know the possible range of a measure of association in order to gain some sense of what is a high or low value. We might ask whether it can take on negative values or only positive ones, whether it is centered around a natural midpoint, and whether its upper and lower values are the same no matter which variables it is calculated for.

Is it bounded? Similar to the last point, we want to know if there is a natural upper or lower bound to our measure of association. Some measures of association (such as an odds ratio) have a lower bound but no upper bound; as a result, an odds ratio can be very large. Other measures of association have natural upper and lower bounds that make it easier to judge whether a relationship is strong or weak. In some cases, statisticians have been able to reformulate a measure of association to create an upper and lower bound.

Is it symmetrical? If a measure of association is symmetrical, the relationship between two variables, say X and Y, is the same whether we specify it from X to Y or from Y to X. This implies that we do not have to designate one variable as the independent variable or as necessarily influencing the other.
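To make the boundedness criterion concrete, here is a minimal Python sketch comparing an odds ratio with the phi coefficient (the Pearson correlation of two 0/1 variables) computed on the same 2x2 tables. The table counts are made up, and the cell layout (a, b, c, d) is an assumption of this sketch. As the association strengthens, the odds ratio grows without limit, while phi stays between -1 and 1.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table of counts, using this (assumed) layout:

                Y = 1   Y = 0
        X = 1     a       b
        X = 0     c       d

    Bounded below by 0, with no upper bound.
    """
    return (a * d) / (b * c)


def phi(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient: Pearson r of the two 0/1 variables, always in [-1, 1]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Made-up tables with increasingly strong association.
for table in [(60, 40, 40, 60), (90, 10, 10, 90), (99, 1, 1, 99)]:
    print(table, "odds ratio =", round(odds_ratio(*table), 2),
          "phi =", round(phi(*table), 3))
# (60, 40, 40, 60) odds ratio = 2.25 phi = 0.2
# (90, 10, 10, 90) odds ratio = 81.0 phi = 0.8
# (99, 1, 1, 99) odds ratio = 9801.0 phi = 0.98
```

Both measures are also symmetrical: swapping the roles of X and Y (exchanging b and c) leaves each value unchanged.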

How to interpret? Interpretation should be the key criterion for any measure of association: what does it mean for my data? We usually start by trying to understand the extremes. What does it mean to have a perfect relationship (the highest or the lowest value)? What does it mean if there is no relationship? Once you clearly understand the extremes, you can begin to gain a sense of what an intermediate value means.

The next section discusses covariance and then correlation. We will return to these criteria as a way to interpret and compare these two measures of association.

COVARIANCE

We have already discussed how a single variable varies about its mean as a measure of the spread of the data. We identified the variance as the total sum of squared deviations about the mean (the Total Sum of Squares) divided by n-1 (the degrees of freedom). We will use a similar concept to describe how two variables vary about their means together.

The formula for covariance is given below. If you focus on the numerator, it shows that we are looking at how two variables vary about their means together.

$$Cov_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}$$

Another way to express the formula uses $SS_{XY}$, called the sum of squares cross product, which is the numerator above:

$$Cov_{XY} = \frac{SS_{XY}}{n}$$

Note that this formula divides by n rather than n-1.

Let me use an illustration to show how covariance works, and then we will use a data example. Figure 1 represents a scatter plot of X (on the horizontal axis) and Y (on the vertical axis). I have marked the X-mean and the Y-mean on the graph with lines that divide it into four quadrants. A data point that is above the mean for both X and Y falls in the first quadrant (I), and a data point that is below the mean for both X and Y falls in the third quadrant (III). In quadrants I and III the deviations of X and Y have the same sign, so their cross-product is positive; in quadrants II and IV they have opposite signs, so the cross-product is negative.

If a scatter plot has values that fall mainly in the first and third quadrants, the covariance between the two variables will be positive: values of X tend to vary about their mean in the same direction that values of Y vary about theirs. Likewise, if values tend to fall in the second and fourth quadrants, deviations of X values about the X-mean tend to be in the opposite direction from deviations of Y values about the Y-mean. This results in a negative covariance.
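The quadrant logic can be checked numerically. Below is a minimal Python sketch of the covariance formula above (dividing by n, as the module's formula does), applied to four made-up points chosen so that one falls in each quadrant. The function names are illustrative, not from any library. Points in quadrants I and III contribute positive cross-products, points in II and IV contribute negative ones, and the sign of the total decides the sign of the covariance.

```python
def covariance(x: list[float], y: list[float]) -> float:
    """Cov_XY = SS_XY / n: sum of cross-products of deviations, divided by n."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / n


def quadrant(dx: float, dy: float) -> str:
    """Quadrant of a point, given its deviations from the X-mean and Y-mean."""
    if dx > 0 and dy > 0:
        return "I"
    if dx < 0 and dy > 0:
        return "II"
    if dx < 0 and dy < 0:
        return "III"
    if dx > 0 and dy < 0:
        return "IV"
    return "on a mean line"


x = [1, 2, 4, 5]  # made-up data, X-mean = 3
y = [1, 4, 2, 5]  # made-up data, Y-mean = 3
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar
    print(f"({xi}, {yi}) quadrant {quadrant(dx, dy)}: cross-product {dx * dy:+.1f}")
print("Cov_XY =", covariance(x, y))
# (1, 1) quadrant III: cross-product +4.0
# (2, 4) quadrant II: cross-product -1.0
# (4, 2) quadrant IV: cross-product -1.0
# (5, 5) quadrant I: cross-product +4.0
# Cov_XY = 1.5
```

The two points in quadrants I and III outweigh the two in II and IV, so the covariance is positive.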

Figure 1. Graphic Depiction of Covariance Between Two Variables, X and Y (the X-mean and Y-mean lines divide the scatter plot into quadrants I through IV)

Let's look at a data example. The following data describe mid-level managers in a company. The variables are RATING, a rating of the managers on a scale from 0 to 10; SALARY, the manager's salary (in $1,000s); YEARS, years of service at the company; and ORIGIN, a dummy variable indicating whether the manager was promoted from inside the company (coded 0) or recruited from outside the company (coded 1). The descriptive statistics for these variables are given in Table 1.

Table 1. Descriptive Statistics in the Manager Salary Example (columns: RATING, SALARY, YEARS, ORIGIN; rows: Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count)

The mean salary level is 71.63, or $71,630. The mean for ORIGIN is .59, indicating that 59 percent of the managers were recruited from outside the company. The mean and median for each variable are very close to each other, indicating no great skew in any of the variables. The coefficients of variation (data not shown) indicate that YEARS has the most variability (CV = 48%).

The covariance matrix is given in Table 2. A covariance matrix shows the covariance of each variable with each of the other variables and with itself. It is a symmetric matrix (the covariance of X with Y is the same as the covariance of Y with X), so output generally shows only half of the matrix; the rest is redundant. The values on the diagonal are the covariance of each variable with itself, in other words, the variances. If you compare these values with the variances in the descriptive statistics table, you will notice a slight difference; for example, compare the diagonal entry for RATING in Table 2 with the sample variance of RATING in Table 1. The difference arises because the descriptive statistics use the sample formula for the variance, which divides by n-1, while the covariance matrix divides by n.

Table 2. Covariance Matrix of Manager Salary Data (rows and columns: RATING, SALARY, YEARS, ORIGIN)

Limitations of covariance:
- Covariance is measured in cross-product units (the units of X times the units of Y)
- There is no fixed upper bound
- Covariances are hard to interpret and compare

The covariance values in Table 2 point out some of the problems with using covariance as a measure of association. The values are in cross-product units, which are hard to interpret. Each value has a sign (positive or negative), but it is not clear how to interpret a magnitude expressed in cross-product units. Covariances are unbounded, so it is difficult to determine whether a value is large or small. As a result, interpretation is difficult.

Most of these problems are solved by transforming the covariance into a correlation coefficient. However, the covariance is the building block for regression and many other multivariate analyses. It is important to at least grasp the basic concept of covariance: it is based on how two variables vary about their means together; it is similar to the variance and places the measure of association in the context of the variability of the variables; and it is a symmetric measure of association.
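A covariance matrix and its diagonal can be sketched in a few lines of Python. This sketch assumes, as the text describes for Table 2, that the matrix divides by n, while the sample variance divides by n-1; the data are made up and the function name `cov_matrix` is illustrative. `statistics.pvariance` and `statistics.variance` are the standard-library population (divide by n) and sample (divide by n-1) variances.

```python
import math
import statistics


def cov_matrix(data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Covariance of every pair of variables (including each with itself), dividing by n."""
    names = list(data)
    n = len(data[names[0]])
    means = {name: sum(values) / n for name, values in data.items()}
    return {
        (a, b): sum((xa - means[a]) * (xb - means[b])
                    for xa, xb in zip(data[a], data[b])) / n
        for a in names
        for b in names
    }


data = {"x": [2, 4, 4, 4, 5, 5, 7, 9], "y": [1, 3, 2, 5, 4, 6, 8, 7]}  # made-up values
m = cov_matrix(data)
for a in data:
    print(a, [m[(a, b)] for b in data])

# Symmetric: Cov(x, y) equals Cov(y, x), so half the matrix is redundant.
print(m[("x", "y")] == m[("y", "x")])
# Diagonal = variance dividing by n; the sample variance (n - 1) is slightly larger.
print(math.isclose(m[("x", "x")], statistics.pvariance(data["x"])))
print(statistics.variance(data["x"]), "vs", m[("x", "x")])
```

The gap between the two variances shrinks as n grows, which is why the difference noted for Table 1 and Table 2 is only slight.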

CORRELATION

If we divide the covariance by the product of the standard deviations of the two variables, we generate a new measure of association, the correlation coefficient (often designated by r). The correlation coefficient is a standardized version of the covariance. It is bounded between -1 and 1, and zero means there is no linear relationship between the two variables. Correlation coefficients provide an easy way to summarize the relationship between two variables, which is why they are so often used. Note that the correlation coefficient requires paired observations: an observation with a missing value on either variable is removed from the calculation for that pair of variables (this is called pair-wise deletion).

The formula for the correlation coefficient (also known as the Pearson Product Moment Correlation Coefficient) is given below.

$$r = \frac{Cov_{XY}}{\sigma_X \sigma_Y}$$

The correlation coefficient (r) has the following useful properties:
- It is bounded between -1 and 1
- It is a symmetric measure of association
- It is a standardized measure and easy to compare
- It is invariant to scale

r ranges from -1 to 1. A value of -1 means perfect negative correlation, a value of 1 means perfect positive correlation, and a value of 0 means no linear association. If you obtain a value greater than 1 or less than -1, something is wrong!

The correlation coefficient is a symmetrical measure of association: the correlation between X and Y is the same as the correlation between Y and X ($r_{XY} = r_{YX}$).

The correlation coefficient is invariant to scale. By this I mean that if you add or subtract a constant to each value of a variable, or multiply or divide each value by a positive constant, the correlation between the two variables does not change. (Multiplying by a negative constant reverses the sign of r but leaves its size unchanged.) For example, expressing income in thousands of dollars will not change the correlation of income with sales.

As with covariance, the correlation matrix is usually presented as half a matrix because the values are symmetrical. Table 3 contains the correlations for the Manager Salary data.
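The properties listed above can be checked with a minimal Python sketch on made-up data. Because the covariance and both standard deviations divide by the same n, the n's cancel, so r can be computed as SS_XY / sqrt(SS_X * SS_Y); the function name `pearson_r` is illustrative. The checks show symmetry, invariance when a variable is shifted and rescaled by a positive constant (as when changing units), a sign flip under a negative constant, and r = 1 for an exact positive linear relationship.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson r = Cov_XY / (sigma_X * sigma_Y) = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_x * ss_y)


x = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
y = [1, 3, 2, 5, 4, 6, 8, 7]
r = pearson_r(x, y)
print(round(r, 3))  # 0.846, within the bounds -1 and 1

print(math.isclose(pearson_r(y, x), r))                          # symmetric: r_XY == r_YX
print(math.isclose(pearson_r([1000 * v + 50 for v in x], y), r))  # positive rescale + shift
print(math.isclose(pearson_r([-2 * v for v in x], y), -r))        # negative constant flips sign
print(math.isclose(pearson_r(x, [3 * v + 1 for v in x]), 1.0))    # exact linear relationship
```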

Table 3. Correlation Coefficients for the Manager Salary Data (rows and columns: RATING, SALARY, YEARS, ORIGIN)

The values on the diagonal are all 1, indicating that each variable is perfectly correlated with itself. The value of .684 is the correlation between RATING and SALARY. Its interpretation is that managers with higher salaries tend to get higher ratings. The correlation is not perfect, but it is moderately large (we will see a scatter plot of these two variables to get a better sense of what a correlation of .684 looks like).

Any correlation with a dummy variable (one that has only two values, zero and one) has a very simple interpretation: the sign of the correlation coefficient shows which group has the higher average level of the other variable. If the correlation is positive, the group coded 1 tends to have higher average values of the other variable; if it is negative, the group coded 1 tends to have lower average values. For example, the correlation between ORIGIN and SALARY is negative. This means that managers who were recruited from outside the company (ORIGIN = 1) have, on average, lower salaries.

The correlation coefficient is a useful summary measure of a relationship between two variables. With a single value you can talk about the strength and direction of the relationship. However, we need to be cautious in its use. For one thing, it is a linear measure of association between two variables. A correlation of zero means there is no linear relationship between two variables; in a graph, the best-fitting line would be flat. However, if the relationship is nonlinear, the correlation coefficient will fail to capture the full relationship. Figure 2 shows an obvious and perfect nonlinear relationship. Such a relationship would most likely have a correlation near zero.

Figure 2. Graphic of a Non-Linear Relationship
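Both cautions in this section can be checked with a short Python sketch on made-up numbers (the salaries below are illustrative, not the module's manager data). The sign of a correlation with a 0/1 dummy matches the sign of the difference in group means, and a perfect but nonlinear relationship, Y = X² on X values symmetric around zero, gives r = 0. The helper `pearson_r` is an illustrative function computing r as SS_XY / sqrt(SS_X * SS_Y).

```python
import math


def pearson_r(x, y):
    """Illustrative helper: r = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_x * ss_y)


# Dummy variable: made-up salaries ($1,000s) for inside (0) and outside (1) hires.
origin = [0, 0, 0, 1, 1, 1]
salary = [80, 75, 78, 70, 68, 72]
mean_0 = sum(s for g, s in zip(origin, salary) if g == 0) / origin.count(0)
mean_1 = sum(s for g, s in zip(origin, salary) if g == 1) / origin.count(1)
r_dummy = pearson_r(origin, salary)
print(round(r_dummy, 3), mean_1 - mean_0)  # both negative: group coded 1 averages lower

# Perfect nonlinear relationship: Y = X squared, with X symmetric around 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # 0.0: no linear association despite a perfect relationship
```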

A second caution with correlations is that they do not reflect causality; the fact that two things are correlated does not mean one causes the other. This is an easy trap to fall into, and as we will see in multiple regression, bivariate relationships can be deceiving. For example, in the summer there is a correlation between ice cream sales and the number of people who drown in cities and towns across America. This does not mean that eating ice cream causes people to drown; both tend to happen more in the summertime, and the season is a third variable that is related to both of the others. Be careful not to imply a causal relationship when using correlation coefficients.

GRAPHICAL EXAMPLES OF CORRELATIONS

A value of 1 or -1, or a value of zero, is a relatively easy correlation to interpret. A value of 1 or -1 reflects a perfect linear relationship between two variables. A value of zero reflects no linear relationship: if we drew the best-fitting line on a scatter plot for a correlation of zero, it would be flat, because changes in X are not linearly associated with changes in Y. Intermediate values of correlations, however, are not as easy to interpret. Often what counts as large or small depends upon the data you are using and your discipline. When the units of analysis are people, correlations of .5 to .6 are relatively large. When looking at data over time, however, correlations tend to be much higher, often .90 to .99.

Figure 3. Scatter Plot of Salary Versus Rating (SALARY in $1,000s on the vertical axis, RATING on the horizontal axis)

Scatter plots are a useful way to look at the relationship between two variables. Figure 3 shows the scatter plot of SALARY (Y-axis) against RATING (X-axis). Earlier we noted that the correlation between these two variables is .684. From the graph we can see that the relationship is linear, but not perfect: if we fit a line to the data, not all the points would fall on the line.

Figure 4. Scatter Plot with Trendline, Equation, and R² (Scatter Plot of Salary vs Employee Rating; SALARY in $1,000s versus RATING)

Excel will allow us to fit a best-fitting straight line to the scatter plot; this line is generated from a regression of SALARY on RATING. Using options with the Chart feature in Excel, we can add a trend line, include the equation of the line on the chart, and include a measure of association called R². Figure 4 shows the same graph with these options. The options can be accessed by selecting the graph in Excel, clicking on Chart in the menu bar, and then clicking on Add Trendline. In Trendline, click on Linear, and then you can access the options for including the equation and R².

The best-fitting line in Figure 4 is a regression line. From the graph we can see that the line fits the data reasonably well. The equation for the line follows the classic formula for a line with an intercept term (a) and a slope coefficient (b), $Y = a + bX$. Our line is not a perfect deterministic function (there is scatter around the line), so I express it as an estimate, $\hat{Y} = a + bX$, with the estimated intercept and slope shown on the chart.

The R² given on the graph is a measure of association from regression. More will be said about this in the next module on regression. For now we can say that R² shows how much of the variability in the dependent variable (in this case SALARY) is explained by knowing something about the independent variable. It ranges from zero to one. In this case, R² indicates that 46.7 percent of the variability in SALARY is explained by knowing the RATING of the employee. You should also note that squaring the correlation coefficient gives R² ($r^2 = R^2$ for a bivariate regression). Try it and see. Thus the squared correlation coefficient has another interpretation: how much of the variability in one variable is explained by knowing something about another variable.
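The claim that r² equals R² in a bivariate regression can be verified with a short Python sketch. It fits the ordinary least-squares line (slope b = SS_XY / SS_X, intercept a = mean of Y minus b times mean of X), which is the kind of line a linear trendline draws, computes R² as 1 - SSE/SST, and compares it with the squared correlation. The data are made up, and all function names are illustrative.

```python
import math


def least_squares_line(x, y):
    """Ordinary least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = ss_xy / ss_x
    a = y_bar - b * x_bar
    return a, b


def r_squared(x, y, a, b):
    """R^2 = 1 - SSE / SST: share of the variability in Y explained by the line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """r = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_x * ss_y)


x = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
y = [1, 3, 2, 5, 4, 6, 8, 7]
a, b = least_squares_line(x, y)
print(f"Estimated Y = {a:.3f} + {b:.3f}X")  # Estimated Y = -0.344 + 0.969X
print(round(r_squared(x, y, a, b), 4), round(pearson_r(x, y) ** 2, 4))
```

The two printed values agree, which is the "try it and see" check on a small example.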

Figure 5. Average State Verbal SAT Scores Versus Math Scores, 2001 (axes: Verbal Scores, Math Scores)

Scatter plots are a good way to see the correlation between two variables. Let's look at a few other graphic depictions of correlations to better see what a high or low correlation looks like. In Figure 5 we have a scatter plot of average state verbal versus math SAT scores. The correlation is very high, .970. You can see that the pattern is linear and there is very little scatter of the data points around the best-fitting line. The positive correlation tells us that states with higher average verbal scores also tend to have higher average math scores, as might be expected. Notice also that R² for this line is very high: since .970 squared is about .94, roughly 94 percent of the variability in verbal scores is explained by knowing the math scores.

A scatter plot can show the strength and direction of the relationship, as well as whether the relationship is in fact linear. Figure 6 shows a strong negative correlation between average state SAT scores (verbal plus math) and the percent of the high school class that took the SAT. The scatter plot shows the downward slope of the relationship, and that the fit of the line is good, but not perfect.

Figure 6. Average SAT Scores (Math + Verbal) Versus Percent of High School Class Taking SAT, 2001

Figure 7. Scatter Plot of a Low Correlation Between Salary and Years of Service (Salary in $1,000s versus Years of Service)

Finally, the last graph shows a weak correlation between two variables (Figure 7). The correlation between managers' salary and years of service is negative: the more years of service, the lower the salary, but the relationship is weak. Figure 7 shows far more scatter around the best-fitting line. We can see the relationship in the graph, but there is considerably more scatter in the data than in the other graphs.

CONCLUSIONS

Measures of association are useful summary statistics to describe a relationship between two or more variables. In this module we looked at covariance and correlation as two measures of linear association between two variables. These measures are related to each other and to regression. The correlation coefficient is a standardized version of the covariance, so it has a known range: it is bounded between -1 and 1, with zero indicating no linear relationship. In a single number, the correlation coefficient provides an indication of the strength and direction of the relationship. A useful next step in data analysis is to examine bivariate relationships with correlation coefficients and to graph those relationships.

We also noted that caution should be taken with correlation coefficients in two main areas. First, the correlation coefficient is a linear measure of association. We cannot assume that a low correlation means there is no association, only that there is no linear association. Second, be careful not to imply causation when dealing with correlation coefficients. Although we can establish that two variables are related to each other, we should not say that one variable causes the other.

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

The correlation coefficient

The correlation coefficient The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

More information

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Multiple regression - Matrices

Multiple regression - Matrices Multiple regression - Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,

More information

Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1. Relationships between two numerical variables Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information
