Module 3: Correlation and Covariance


Using Statistical Data to Make Decisions
Module 3: Correlation and Covariance
Tom Ilvento, University of Delaware
Dr. Mugdim Pašić, Sarajevo Graduate School of Business

Often our interest in data analysis is in how two or more variables influence each other. We may be searching for a driver that helps explain sales, profits, or revenues; we may be interested in factors that better explain the performance of employees; or we may want to know which marketing method has the most impact on sales. A basic starting point for understanding a relationship between two variables is covariance, or the more common and standardized measure, correlation. Covariance and correlation are both measures of association that show the linear relationship between two variables. Each provides a single summary measure of association that is easily interpreted, and each provides a building block for more advanced techniques, such as regression.

You will see that correlation and covariance are very similar concepts and are related mathematically. However, of the two terms, correlation is used more often in everyday language. When we say two things are correlated we mean that the two things are related to each other. The correlation can be strong or weak, but we understand it as a relationship. In statistics, correlation has the same meaning, but it is expressed in mathematical terms with a specific interpretation, direction (positive or negative), and strength.

In particular, the correlation coefficient provides a good starting point for more advanced data analysis. Along with scatter plots, the correlation coefficient provides insight into bivariate, or two-variable, relationships. It is a flexible measure of association which can be used with continuous variables, ordinal variables, and dummy variables. I think you will find the correlation coefficient an intuitive and useful tool to summarize a relationship between two variables.
It also has a direct connection with bivariate regression.

Key Objectives
- Understand the properties of measures of association
- Understand covariance and correlation as bivariate measures of association
- Understand how to interpret the correlation coefficient and how to read and interpret a correlation matrix
- Understand how to use correlations as an intermediate step in data analysis

In this Module We Will:
- Describe measures of association
- Look at covariance and correlation matrices, along with corresponding scatter plots
- Begin the linkage of correlation with regression

For more information, contact: Tom Ilvento, 213 Townsend Hall, Newark, DE
MEASURES OF ASSOCIATION

Measures of association show the relationship between two variables. A measure of association is a numerical measure, in most cases a single number (although it can be several numbers), and most often it focuses on how two variables vary together (or not). There are many measures of association in statistics, developed for their usefulness with different types of data and different situations. Some of them have inferential properties and some are useful solely for their ability to help describe a relationship. Example measures of association include the correlation coefficient, the odds ratio, $R^2$ in regression, and the regression coefficient.

A good starting point for a discussion of measures of association is to understand some criteria that apply to any measure of association. These criteria are used to evaluate and compare various measures of association, and as such they help us to interpret the measure. The criteria focus on the range of the measure, whether it is bounded by an upper or lower level, whether it is symmetrical, and how to interpret the measure. Each is discussed briefly below.

Criteria for Measures of Association
- What is the range?
- Is it bounded?
- Is it symmetrical?
- How to interpret?

What is the range (from high to low)? We want to know the possible range of a measure of association in order to gain some sense of what is a high or low value. We might ask whether it can take on negative values or is only positive; whether it is centered around a natural midpoint; and whether the upper and lower values are the same no matter which variables it is calculated for.

Is it bounded? Similar to the last point, we want to know if there is a natural upper or lower bound to our measure of association. Some measures of association (such as an odds ratio) have a lower bound, but no upper bound.
As a result, an odds ratio can be very large. Other measures of association do have natural upper and lower bounds, which make it easier to judge whether a relationship is strong or weak. In some cases, statisticians have been able to reformulate a measure of association to create an upper and lower bound.

Is it symmetrical? If a measure of association is symmetrical, the relationship between two variables, say X and Y, is the same whether we specify it as X to Y or as Y to X. This implies that we do not have to designate one variable as preliminary, independent, or as necessarily influencing the other.
How to interpret? Interpretation should be the key criterion for any measure of association: what does it mean for my data? We usually start by trying to understand the extremes. What does it mean to have a perfect relationship (the highest value or the lowest value)? What does it mean if there is no relationship? If you have a clear understanding of the extremes you can begin to gain a sense of what an intermediate value means.

The next section will begin to discuss covariance and then correlation. We will return to these criteria as a way to interpret and compare these two measures of association.

COVARIANCE

We have already started with the concept of how a single variable varies about its mean as a measure of the spread of the data. We identified the variance as the total sum of squared deviations about the mean (Total Sum of Squares) divided by n - 1 (the degrees of freedom). We will use a similar concept to talk about how two variables vary about their means together.

The formula for covariance is given below. If you focus on the numerator, it shows that we are looking at how two variables vary about their means together.

$$\mathrm{Cov}_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Another way to express the formula for covariance is given below. $SS_{XY}$ is called the sum of squares cross product.

$$\mathrm{Cov}_{XY} = \frac{SS_{XY}}{n - 1}$$

Let me use an illustration to show how covariance works, and then we will use a data example. Figure 1 represents a scatter plot of X (on the horizontal axis) against Y (on the vertical axis). I have marked the Y mean and the X mean on the graph with lines which divide the graph into four quadrants. A data point that is above the mean for both X and Y will fall in the first quadrant, and a data point that is below the mean for both X and Y will fall in the third quadrant.
If a scatter plot tends to have values that fall mainly in the first and third quadrants, the covariance between the two variables will be positive: values of X tend to vary about the X mean in the same direction that values of Y vary about the Y mean. Likewise, if values tend to fall in the second and fourth quadrants, deviations of X values about the X mean tend to be in the opposite direction from deviations of Y values about the Y mean. This is associated with negative covariance.
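To make the quadrant idea concrete, here is a minimal sketch in Python. The data are made up for illustration (they are not the manager data used later), the function names are my own, and I assume the sample form of the covariance that divides by n - 1. It labels each point by quadrant and then computes the covariance.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def cross_product_sum(x, y):
    """SS_XY: sum over all points of (x_i - mean_x) * (y_i - mean_y)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))


def covariance(x, y):
    """Sample covariance (assumed form): SS_XY divided by n - 1."""
    return cross_product_sum(x, y) / (len(x) - 1)


def quadrant(xi, yi, x_bar, y_bar):
    """Quadrant of a point relative to the two mean lines (assumes no ties)."""
    if xi > x_bar:
        return "I" if yi > y_bar else "IV"
    return "II" if yi > y_bar else "III"


# Hypothetical toy data, chosen so no point sits exactly on a mean line.
x = [1, 2, 4, 5]
y_up = [1, 3, 6, 10]     # tends to rise with x
y_down = [10, 6, 3, 1]   # tends to fall as x rises

for label, y in (("y_up", y_up), ("y_down", y_down)):
    quads = [quadrant(xi, yi, mean(x), mean(y)) for xi, yi in zip(x, y)]
    print(label, quads, covariance(x, y))
# y_up   -> quadrants III, III, I, I   and covariance  7.0
# y_down -> quadrants II, II, IV, IV   and covariance -7.0
```

Points in quadrants I and III contribute positive cross products, and points in quadrants II and IV contribute negative ones, so the sign of the sum reflects where most of the points fall.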
[Figure 1. Graphic depiction of covariance between two variables, X and Y. Lines at the X mean and the Y mean divide the plot into quadrants I (upper right), II (upper left), III (lower left), and IV (lower right).]

Let's look at a data example. The following are some data about mid-level managers in a company. The variables are RATING, a rating of the managers on a scale from 0 to 10; SALARY, the salary of the manager (in $1,000s); YEARS, years of service at the company; and ORIGIN, a dummy variable indicating whether they were promoted inside the company (coded as 0) or were recruited from outside the company (coded as 1). The descriptive statistics for these variables are given below.

[Table 1. Descriptive Statistics in the Manager Salary Example. Columns: RATING, SALARY, YEARS, ORIGIN. Rows: Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count.]
The mean salary level is 71.63, or $71,630. The mean for ORIGIN is 0.59, indicating that 59 percent of the managers were recruited from outside the company. The mean and the median for all the variables are very close to each other, indicating no great skew in any of the variables. The coefficients of variation (data not shown) indicate that the most variability is in the variable YEARS (CV = 48%).

The covariance matrix is given in Table 2. A covariance matrix shows the covariance of each variable with the other variables and with itself. It is a symmetric matrix (the covariance of X with Y is the same as the covariance of Y with X). As a result, you generally only see half the matrix presented as output (the rest is redundant). The values on the diagonal are the covariance of each variable with itself; in other words, the variances. If you compare these values with the variances in the descriptive statistics table you will notice a slight difference. For example, the covariance of RATING with itself in Table 2 is slightly different from the sample variance of RATING in Table 1. The slight difference arises because the descriptive statistics use the sample formula, which divides by n - 1.

[Table 2. Covariance Matrix of Manager Salary Data. Rows and columns: RATING, SALARY, YEARS, ORIGIN.]

Limitations of Covariance
- Covariance is measured in squared cross-product terms
- The upper bound is not known
- Hard to interpret and compare

The covariance values in Table 2 point out some of the problems with using covariance as a measure of association. The values are in squared cross-product terms and are hard to interpret. There is a sign to the values (either positive or negative), but it is not clear how to interpret something in squared cross-product terms. Covariances are unbounded, and thus it is difficult to determine whether a value is large or small. As a result, interpretation is difficult.
Most of these problems will be solved by transforming the covariance into a correlation coefficient. However, the covariance is the building block for regression and many other multivariate analyses. It is important to at least grasp the basic concept of covariance: that it is based on how two variables vary about their means together; that it is similar to the variance and seeks to place the measure of association in the context of the variability of the variables; and that it is a symmetric measure of association.
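The slight difference between the diagonal of a covariance matrix and the sample variances comes down to the divisor. The sketch below uses made-up ratings (not the values from Table 1 or Table 2) and assumes, as the text suggests, that the matrix divides the sum of cross products by n while the descriptive statistics divide by n - 1.

```python
def sum_of_cross_products(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over all observations."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))


# Hypothetical ratings for eight managers (illustrative only).
rating = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(rating)

# The covariance of a variable with itself uses the sum of squared deviations.
ss = sum_of_cross_products(rating, rating)

divide_by_n = ss / n                # assumed divisor for the covariance matrix
divide_by_n_minus_1 = ss / (n - 1)  # sample variance, as in descriptive statistics

print(ss, divide_by_n, divide_by_n_minus_1)
# The two results always differ by the factor (n - 1) / n.
print(divide_by_n / divide_by_n_minus_1, (n - 1) / n)
```

With a large sample the factor (n - 1) / n is close to one, which is why the difference is slight.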
CORRELATION

If we divide the covariance by the product of the two standard deviations, we generate a new measure of association, the correlation coefficient (often designated by r). The correlation coefficient is a standardized version of the covariance. It is bounded between -1 and 1, and zero means there is no linear relationship between the two variables. Correlation coefficients provide an easy way to summarize the relationship between two variables, and that is why they are so often used. You should note that the correlation coefficient requires an equal sample size for both variables, and any missing value for one variable will cause that observation to be removed from the analysis (this is called pairwise deletion).

The formula for the correlation coefficient (also known as the Pearson Product Moment Correlation Coefficient) is given below.

$$r = \frac{\mathrm{Cov}_{XY}}{\sigma_X \sigma_Y}$$

The correlation coefficient (r) has many nice properties:
- It is bounded between -1 and 1
- It is a symmetric measure of association
- It is a standardized measure and easy to compare
- It is invariant to scale

r has a range from -1 to 1. A value of -1 means perfect negative correlation, a value of 1 means perfect positive correlation, and a value of 0 means no linear association. Thus, it is bounded between -1 and 1. If you obtain a value greater than 1 or less than -1, something is wrong!

The correlation coefficient is a symmetrical measure of association. The correlation between X and Y is the same as the correlation between Y and X ($r_{XY} = r_{YX}$).

The correlation coefficient is invariant to scale. By this I mean that if you add or subtract a constant to each value in the data set, or you multiply or divide by a positive constant, it does not change the correlation between the two variables. For example, if you express income in $1,000s, it will not change the relationship of income and sales.
As with covariance, the correlation matrix is usually presented as half a matrix because the values are symmetrical. Table 3 contains the correlations for the Manager Salary data.
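The properties of r can be checked with a short sketch. The income and sales figures are hypothetical, the function name is my own, and I use sample standard deviations (dividing by n - 1); any consistent divisor cancels out of the ratio.

```python
import math


def pearson_r(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)


# Hypothetical toy data (illustrative only).
income_dollars = [42000, 55000, 61000, 78000, 90000]
sales = [10, 14, 13, 19, 22]

r = pearson_r(income_dollars, sales)

# Symmetric: swapping X and Y gives the same value.
r_swapped = pearson_r(sales, income_dollars)

# Invariant to scale: income in $1,000s, or income plus a constant.
r_thousands = pearson_r([v / 1000 for v in income_dollars], sales)
r_shifted = pearson_r([v + 500 for v in income_dollars], sales)

print(r, r_swapped, r_thousands, r_shifted)
```

All four printed values should agree up to floating-point rounding, and each should fall between -1 and 1.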
[Table 3. Correlation Coefficients for the Manager Salary Data. Rows and columns: RATING, SALARY, YEARS, ORIGIN.]

The values on the diagonal are all 1, indicating that each variable is perfectly correlated with itself. The value of 0.684 shows the correlation between RATING and SALARY. Its interpretation is that managers with higher salaries tend to get higher ratings. The correlation is not perfect, but it is moderately large (we will see a scatter plot of these two variables to get a better sense of what a correlation of 0.684 looks like).

Any correlation with a dummy variable (one which has only two values, zero and one) has a very simple interpretation. Since a dummy variable takes on only two values, the correlation coefficient reflects which group has the higher average level of the other variable. If the correlation is positive, the category coded as one tends to have higher average values of the continuous variable. If the correlation is negative, the group coded as one has lower average values. For example, the correlation between ORIGIN and SALARY is negative. This means that managers who were recruited from outside the company (ORIGIN = 1) have, on average, lower salaries.

The correlation coefficient is a useful summary measure of a relationship between two variables. With a single value you can talk about the strength and direction of the relationship. However, we need to be cautious in its use. For one thing, it is a linear measure of association between two variables. A correlation of zero only means there is no linear relationship between the two variables. It would be represented by a flat line in a graphical representation. However, if the relationship is nonlinear, the correlation coefficient would fail to capture the full relationship. Figure 2 shows a graphical depiction of an obvious and perfect nonlinear relationship. Such a relationship would most likely have a correlation near zero.

[Figure 2. Graphic of a Nonlinear Relationship.]
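Both cautions about interpretation can be illustrated with a small sketch. All numbers are invented for illustration: the first data set is a perfect U-shaped relationship, and the second pairs a 0/1 dummy with hypothetical salaries (these are not the manager data).

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# 1. A perfect but nonlinear relationship: Y is exactly X squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curve = pearson_r(x, y)
print("U-shaped relationship, r =", r_curve)

# 2. A dummy variable: 0 = promoted inside, 1 = recruited outside (hypothetical).
origin = [0, 0, 0, 1, 1, 1]
salary = [80, 75, 78, 70, 68, 72]   # in $1,000s, made up
r_dummy = pearson_r(origin, salary)

inside = [s for o, s in zip(origin, salary) if o == 0]
outside = [s for o, s in zip(origin, salary) if o == 1]
mean_inside = sum(inside) / len(inside)
mean_outside = sum(outside) / len(outside)
print("dummy correlation, r =", r_dummy)
print("mean salary inside vs outside:", mean_inside, mean_outside)
```

In the first case r is zero even though Y is completely determined by X. In the second case r is negative, which matches the lower average salary of the group coded as one.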
A second caution with correlations is that they do not reflect causality; the fact that two things are correlated does not mean one variable causes the other. This is an easy trap to fall into, but as we will see in multiple regression, bivariate relationships can be deceiving. For example, there is a correlation between ice cream sales and the number of people who drown in cities and towns across America. This does not mean that eating ice cream causes people to drown. The two things tend to happen more in the summer time, and the season is the third variable that is related to both of the others. Correlation does not imply causality: be careful not to imply a causal relationship when using correlation coefficients.

GRAPHICAL EXAMPLES OF CORRELATIONS

A value of -1 or 1, or a value of zero, is relatively easy to interpret. A value of -1 or 1 reflects a perfect linear relationship between two variables. A value of zero reflects no linear relationship. If we drew a line on a scatter plot for a correlation of zero it would be a flat line: any change in the value of X does not influence the value of Y. However, intermediate values of correlations are not as easy to interpret. Often what is large or small depends upon the data you are using and the discipline you are involved with. When the units of analysis are people, correlations of 0.5 to 0.6 are relatively large. However, when looking at data over time, correlations tend to be much higher: 0.90 to 0.99.

[Figure 3. Scatter Plot of Salary Versus Rating. Vertical axis: SALARY ($1,000s); horizontal axis: RATING.]

Scatter plots are a useful way to look at the relationship between two variables. Figure 3 shows the scatter plot of the relationship between SALARY (Y-axis) and RATING (X-axis). Earlier we noted that the correlation between these two variables was 0.684.
From the graph we can see that the relationship is linear, but not perfect. If we fit a line to the data, not all of the points would fall on the line.
[Figure 4. Scatter Plot with Trendline, Equation, and $R^2$. Vertical axis: SALARY ($1,000s); horizontal axis: RATING.]

In fact, Excel will allow us to fit a best-fitting straight line, which is generated from a regression of SALARY on RATING. Using options with the Chart feature in Excel we can add a trend line, include the equation of the line on the chart, and include a measure of association called $R^2$. Figure 4 shows the same graph with these options. The options can be accessed by selecting the graph in Excel, clicking on Chart in the menu bar, and then clicking on Add Trendline. Once in Trendline you should click on Linear, and then you can access the options for including the equation and $R^2$.

The best-fitting line in Figure 4 is actually a regression line. From the graph we can see that the line fits the data very well. The equation for the line follows the classic formula for a line with an intercept term (a) and a slope coefficient (b): Y = a + b(X). Our line is not a perfect deterministic function (there is scatter around the line), so I am expressing it as an estimate, Estimated Y = a + b(X), with the fitted values of a and b shown on the chart.

The $R^2$ given on the graph is a measure of association from regression. More will be said about this in the next module on regression. For now we can say that $R^2$ shows how much of the variability in the dependent variable (in this case SALARY) is explained by knowing something about the independent variable. It ranges from zero to one. In this case, an $R^2$ of 0.467 means that 46.7 percent of the variability in SALARY is explained by knowing the RATING of the employee. You should also note that if we squared the correlation coefficient it would equal $R^2$ ($r^2 = R^2$ for a bivariate regression). Try it and see.
Thus, another interpretation of the correlation coefficient, if squared, is how much variability in one variable is explained by knowing something about another variable.
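If you would like to try it outside Excel, the sketch below fits a least-squares line and compares $R^2$ with the squared correlation. The ratings and salaries are hypothetical (not the manager data), the function names are my own, and $R^2$ is computed as one minus the ratio of the unexplained to the total sum of squares.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for Estimated Y = a + b(X)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b


def r_squared(x, y, a, b):
    """Share of the variability in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - unexplained / total


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical toy data (illustrative only).
rating = [3, 4, 5, 6, 7, 8]
salary = [55, 62, 60, 71, 75, 80]   # in $1,000s

a, b = fit_line(rating, salary)
big_r2 = r_squared(rating, salary, a, b)
r = pearson_r(rating, salary)

print("Estimated Y = %.2f + %.2f(X)" % (a, b))
print("R squared:", big_r2, " r squared:", r ** 2)
```

The two printed values should agree up to floating-point rounding, which holds only for a regression with a single independent variable.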
[Figure 5. Average State Verbal SAT Scores Versus Math Scores, 2001. Vertical axis: verbal scores; horizontal axis: math scores; with trendline equation and $R^2$.]

Let's look at a few other graphic depictions of correlations to better see what a high or low correlation looks like. In Figure 5 we have a scatter plot of average state verbal versus math SAT scores. The correlation is very high, 0.970. You can see that the pattern is linear and there is very little scatter of the data points around the best-fitting line. The positive correlation tells us that states with higher average verbal scores also tend to have higher average math scores, as might be expected. Notice also that $R^2$ for this line is very high: since $r^2 = R^2$ for a bivariate regression, roughly 94 percent of the variability in verbal scores is explained by knowing the math scores. A scatter plot can show the strength and direction of the relationship, as well as whether the relationship is in fact linear.

[Figure 6. Average SAT Scores (Math + Verbal) Versus Percent of High School Class Taking SAT, 2001. Vertical axis: average SAT (math + verbal); horizontal axis: percent taking; with trendline equation and $R^2$.]

Figure 6 shows a strong negative correlation between the average state SAT score (verbal plus math) and the percent of the high school class that took the SAT test. The scatter plot shows the downward slope of the relationship and that the fit of the line is good, but not perfect.
[Figure 7. Scatter Plot of a Low Correlation Between Salary and Years of Service. Vertical axis: salary ($1,000s); horizontal axis: years of service; with trendline equation and $R^2$.]

Finally, the last graph shows a weak correlation between two variables (Figure 7). The correlation between the managers' salary and years of service is negative: the more years of service, the lower the salary, but the relationship is weak. Figure 7 shows far more scatter around the best-fitting line. We can see the relationship in the graph, but there is considerably more scatter in the data than in the other graphs.

CONCLUSIONS

Measures of association are useful summary statistics to describe a relationship between two or more variables. In this module we looked at covariance and correlation as two measures of linear association between two variables. Both of these measures are related to each other and to regression. The correlation coefficient is a standardized version of the covariance, so it has a known range and is bounded between -1 and 1, with zero indicating no linear relationship. In a single number, the correlation coefficient provides an indication of the strength and direction of the relationship. It is a useful next step in data analysis to begin to examine bivariate relationships with correlation coefficients and to graph these relationships.

We also noted that caution should be taken with correlation coefficients in two main areas. First, the correlation coefficient is a linear measure of association. We cannot assume that a low value of a correlation means that there is no association, only that there is no linear association. The second issue is to be careful not to imply causation when dealing with correlation coefficients. While we can establish that two variables are related to each other, care should be taken not to say that one variable causes the other.
More informationElements of a graph. Click on the links below to jump directly to the relevant section
Click on the links below to jump directly to the relevant section Elements of a graph Linear equations and their graphs What is slope? Slope and yintercept in the equation of a line Comparing lines on
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 111) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationBelow is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.
Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data
More informationCORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there
CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationDESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi  110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationUsing Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a stepbystep guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More informationElasticity. I. What is Elasticity?
Elasticity I. What is Elasticity? The purpose of this section is to develop some general rules about elasticity, which may them be applied to the four different specific types of elasticity discussed in
More informationUsing Excel for Statistical Analysis
Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure
More information2) The three categories of forecasting models are time series, quantitative, and qualitative. 2)
Exam Name TRUE/FALSE. Write 'T' if the statement is true and 'F' if the statement is false. 1) Regression is always a superior forecasting method to exponential smoothing, so regression should be used
More information3.2. Solving quadratic equations. Introduction. Prerequisites. Learning Outcomes. Learning Style
Solving quadratic equations 3.2 Introduction A quadratic equation is one which can be written in the form ax 2 + bx + c = 0 where a, b and c are numbers and x is the unknown whose value(s) we wish to find.
More informationThis unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
More informationData exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
More informationCorrelational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots
Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship
More informationLean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY
TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationCovariance and Correlation
Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a databased relative frequency distribution by measures of location and spread, such
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression  ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationGraphing Linear Equations in Two Variables
Math 123 Section 3.2  Graphing Linear Equations Using Intercepts  Page 1 Graphing Linear Equations in Two Variables I. Graphing Lines A. The graph of a line is just the set of solution points of the
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationOneWay ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate
1 OneWay ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,
More information1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand
More informationDescriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion
Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationPearson s Correlation Coefficient
Pearson s Correlation Coefficient In this lesson, we will find a quantitative measure to describe the strength of a linear relationship (instead of using the terms strong or weak). A quantitative measure
More informationThe Big Picture. Correlation. Scatter Plots. Data
The Big Picture Correlation Bret Hanlon and Bret Larget Department of Statistics Universit of Wisconsin Madison December 6, We have just completed a length series of lectures on ANOVA where we considered
More informationFactor Analysis. Chapter 420. Introduction
Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.
More informationA full analysis example Multiple correlations Partial correlations
A full analysis example Multiple correlations Partial correlations New Dataset: Confidence This is a dataset taken of the confidence scales of 41 employees some years ago using 4 facets of confidence (Physical,
More informationRegression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
More informationChapter 2: Descriptive Statistics
Chapter 2: Descriptive Statistics **This chapter corresponds to chapters 2 ( Means to an End ) and 3 ( Vive la Difference ) of your book. What it is: Descriptive statistics are values that describe the
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationCharts, Tables, and Graphs
Charts, Tables, and Graphs The Mathematics sections of the SAT also include some questions about charts, tables, and graphs. You should know how to (1) read and understand information that is given; (2)
More informationAlgebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More information5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.
The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution
More informationDefinition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality.
8 Inequalities Concepts: Equivalent Inequalities Linear and Nonlinear Inequalities Absolute Value Inequalities (Sections 4.6 and 1.1) 8.1 Equivalent Inequalities Definition 8.1 Two inequalities are equivalent
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationMeasures of Central Tendency and Variability: Summarizing your Data for Others
Measures of Central Tendency and Variability: Summarizing your Data for Others 1 I. Measures of Central Tendency: Allow us to summarize an entire data set with a single value (the midpoint). 1. Mode :
More informationMeasurement with Ratios
Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve realworld and mathematical
More informationCORRELATION ANALYSIS
CORRELATION ANALYSIS Learning Objectives Understand how correlation can be used to demonstrate a relationship between two factors. Know how to perform a correlation analysis and calculate the coefficient
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More informationPearson s Correlation
Pearson s Correlation Correlation the degree to which two variables are associated (covary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationIntroduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra  1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
More informationINTRODUCTION TO MULTIPLE CORRELATION
CHAPTER 13 INTRODUCTION TO MULTIPLE CORRELATION Chapter 12 introduced you to the concept of partialling and how partialling could assist you in better interpreting the relationship between two primary
More informationGeorgia Standards of Excellence Curriculum Map. Mathematics. GSE 8 th Grade
Georgia Standards of Excellence Curriculum Map Mathematics GSE 8 th Grade These materials are for nonprofit educational purposes only. Any other use may constitute copyright infringement. GSE Eighth Grade
More informationModule 5: Measuring (step 3) Inequality Measures
Module 5: Measuring (step 3) Inequality Measures Topics 1. Why measure inequality? 2. Basic dispersion measures 1. Charting inequality for basic dispersion measures 2. Basic dispersion measures (dispersion
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationBusiness Valuation Review
Business Valuation Review Regression Analysis in Valuation Engagements By: George B. Hawkins, ASA, CFA Introduction Business valuation is as much as art as it is science. Sage advice, however, quantitative
More informationThis activity will show you how to draw graphs of algebraic functions in Excel.
This activity will show you how to draw graphs of algebraic functions in Excel. Open a new Excel workbook. This is Excel in Office 2007. You may not have used this version before but it is very much the
More informationCOMP6053 lecture: Relationship between two variables: correlation, covariance and rsquared. jn2@ecs.soton.ac.uk
COMP6053 lecture: Relationship between two variables: correlation, covariance and rsquared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution
More informationAlgebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
More informationChapter 3. Introduction to Linear Correlation and Regression Part 1
Tuesday, December 12, 2000 Ch3 Intro Correlation Pt 1 Page: 1 Richard Lowry, 19992000 All rights reserved. Chapter 3. Introduction to Linear Correlation and Regression Part 1 Correlation and regression
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More information