Module 3: Correlation and Covariance


Using Statistical Data to Make Decisions

Tom Ilvento, University of Delaware
Dr. Mugdim Pašić, Sarajevo Graduate School of Business

Often our interest in data analysis is in how two or more variables influence each other. We may be searching for a driver that helps explain sales, profits, or revenues; we may be interested in factors that better explain the performance of employees; or we may want to know which marketing method has the most impact on sales. A basic starting point for understanding a relationship between two variables is covariance, or the more common and standardized measure, correlation.

Covariance and correlation are both measures of association between two variables that show the linear relationship between the variables. Each provides a single summary measure of association that is easily interpreted, and each provides a building block for more advanced techniques, such as regression.

You will see that correlation and covariance are very similar concepts and are related mathematically. However, of the two terms, correlation is used more often in everyday language. When we say two things are correlated we mean that the two things are related to each other. The correlation can be strong or weak, but we understand it as a relationship. In statistics, correlation has the same meaning, but it is expressed in mathematical terms with a specific interpretation, direction (positive or negative), and strength.

In particular, the correlation coefficient provides a good starting point for more advanced data analysis. Along with scatter plots, the correlation coefficient provides insight into bivariate, or two-variable, relationships. It is a flexible measure of association which can be used with continuous variables, ordinal variables, and dummy variables. I think you will find the correlation coefficient an intuitive and useful tool to summarize a relationship between two variables. It also has a direct connection with bivariate regression.

Key Objectives
- Understand the properties of measures of association
- Understand covariance and correlation as bivariate measures of association
- Understand how to interpret the correlation coefficient and how to read and interpret a correlation matrix
- Understand how to use correlations as an intermediate step in data analysis

In this Module We Will:
- Describe measures of association
- Look at covariance and correlation matrices, along with corresponding scatter plots
- Begin the linkage of correlation with regression

For more information, contact: Tom Ilvento 213 Townsend Hall, Newark, DE
MEASURES OF ASSOCIATION

Measures of association show the relationship between two variables. A measure of association is numerical, and in most cases it is a single number (although it can be several numbers). Most often, these measures focus on how two variables vary together (or not). There are many measures of association in statistics, developed for their usefulness with different types of data and different situations. Some of them have inferential properties and some are useful solely for their ability to help describe a relationship. Example measures of association include the correlation coefficient, an odds ratio, R² in regression, and the regression coefficient.

A good starting point for a discussion of measures of association is to understand some criteria that apply to any measure of association. These criteria are used to evaluate and compare various measures of association, and as such they help us to interpret the measure. The criteria focus on the range of the measure, whether it is bounded by an upper or lower level, whether it is symmetrical, and how to interpret the measure. Each is discussed briefly below.

Criteria for Measures of Association
- What is the range?
- Is it bounded?
- Is it symmetrical?
- How to interpret?

What is the range (from high to low)? We want to know the possible range of a measure of association in order to gain some sense of what is a high or low value. We might ask whether it can take on negative values or is only positive; whether it is centered around a natural midpoint; and whether the upper and lower values are the same no matter which variables it is calculated for.

Is it bounded? Similar to the last point, we want to know if there is a natural upper or lower bound to our measure of association. Some measures of association (such as an odds ratio) have a lower bound, but no upper bound. As a result, an odds ratio can be very large. Other measures of association do have natural upper and lower bounds, which makes it easier to tell whether there is a strong or weak relationship. In some cases, statisticians have been able to reformulate a measure of association to create an upper and lower bound.

Is it symmetrical? If a measure of association is symmetrical, the relationship between two variables, say X and Y, is the same whether we specify it as X to Y or Y to X. This implies that we do not have to designate one variable as preliminary, independent, or as necessarily influencing the other.
How to interpret? Interpretation should be the key criterion for any measure of association: what does it mean for my data? We usually start with trying to understand the extremes. What does it mean to have a perfect relationship (the highest value or the lowest value)? What does it mean if there is no relationship? If you have a clear understanding of the extremes you can begin to gain a sense of what an intermediate value means.

The next section will begin to discuss covariance and then correlation. We will return to these criteria as a way to interpret and compare these two measures of association.

COVARIANCE

We have already started with the concept of how a single variable varies about its mean as a measure of the spread of the data. We identified the variance as the total sum of squared deviations about the mean (Total Sum of Squares) divided by n - 1 (the degrees of freedom). We will use a similar concept to talk about how two variables vary about their means together.

The formula for covariance is given below. If you focus on the numerator, it shows that we are looking at how two variables vary about their means together.

\[ \mathrm{Cov}_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]

Another way to express the formula for covariance is given below. SS_XY is called the sum of squares cross product.

\[ \mathrm{Cov}_{XY} = \frac{SS_{XY}}{n - 1} \]

Let me use an illustration to show how covariance works, and then we will use a data example. Figure 1 represents a scatter plot between X (on the horizontal axis) and Y (on the vertical axis). I have marked the Y mean and the X mean values on the graph with lines which divide the graph into four quadrants. A data point that is above the mean for both X and Y will fall in the first quadrant, and a data point that is below the mean for both X and Y will fall in the third quadrant.

If a scatter plot tends to have values that fall mainly in the first and third quadrants, the covariance between the two variables will be positive: values of X tend to vary about its mean in the same way that values of Y vary about its mean. Likewise, if values tend to fall in the second and fourth quadrants, deviations of X values about the X mean tend to be in the opposite direction from deviations of Y values about the Y mean. This is associated with negative covariance.
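The quadrant argument can be checked with a few lines of Python. This is only a sketch: the six (x, y) pairs are invented for illustration, the function names are my own, and the n - 1 divisor is assumed to mirror the sample variance described earlier in the module.

```python
from statistics import mean

# Hypothetical (x, y) pairs invented for illustration; not the module's data.
x = [1, 2, 3, 4, 5, 6]
y = [1, 3, 2, 5, 4, 6]


def sum_cross_products(xs, ys):
    """SS_XY: sum of products of the deviations of X and Y about their means."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))


def sample_covariance(xs, ys):
    """SS_XY divided by n - 1 (assumed divisor, mirroring the sample variance)."""
    return sum_cross_products(xs, ys) / (len(xs) - 1)


def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by the X mean and Y mean lines."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for xi, yi in zip(xs, ys):
        dx, dy = xi - x_bar, yi - y_bar
        if dx > 0 and dy > 0:
            counts["I"] += 1      # above both means
        elif dx < 0 and dy > 0:
            counts["II"] += 1     # below X mean, above Y mean
        elif dx < 0 and dy < 0:
            counts["III"] += 1    # below both means
        elif dx > 0 and dy < 0:
            counts["IV"] += 1     # above X mean, below Y mean
    return counts


# Reversing y sends the points into quadrants II and IV instead.
y_reversed = list(reversed(y))

print(quadrant_counts(x, y), sample_covariance(x, y))
print(quadrant_counts(x, y_reversed), sample_covariance(x, y_reversed))
```

With the original ordering every point sits in quadrant I or III and the covariance is positive; with y reversed every point sits in quadrant II or IV and the covariance has the same size but a negative sign.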
[Figure 1. Graphic Depiction of Covariance Between Two Variables, X and Y: quadrants I through IV formed by the Xmean and Ymean lines]

Let's look at a data example. The following is some data about mid-level managers in a company. The variables are RATING, a rating scale of the managers from 0 to 10; SALARY, the salary of the manager (in $1,000s); YEARS, years of service at the company; and ORIGIN, a dummy variable indicating whether they were promoted inside the company (coded as 0) or were recruited from outside the company (coded as 1). The descriptive statistics for these variables are given below.

[Table 1. Descriptive Statistics in the Manager Salary Example. Columns: RATING, SALARY, YEARS, ORIGIN. Rows: Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, Count]
The mean salary level is 71.63, or $71,630. The mean for ORIGIN is .59, indicating that 59 percent of the managers were recruited from outside the company. The mean and the median levels for all the variables are very close to each other, indicating no great skew in any of the variables. The coefficients of variation (data not shown) indicate that the most variability is in the variable YEARS (CV = 48%).

The covariance matrix is given in Table 2. A covariance matrix shows the covariance of each variable with the other variables and with itself. It is a symmetric matrix (the covariance of X with Y is the same as the covariance of Y with X). As a result, you generally only see half the matrix presented as output (the rest is redundant). The values on the diagonal are the covariance of each variable with itself; in other words, the variances. If you compare these values with the variances in the descriptive statistics table you will notice a slight difference. For example, the covariance of RATING with itself is slightly different from the variance given for RATING in Table 1. The slight difference arises because the descriptive statistics use the sample formula, which divides by n - 1.

[Table 2. Covariance Matrix of Manager Salary Data. Rows and columns: RATING, SALARY, YEARS, ORIGIN]

Limitations of Covariance
- Covariance is measured in squared cross-product terms
- The upper bound is not known
- Hard to interpret and compare

The covariance values in Table 2 point out some of the problems with using covariance as a measure of association. The values are in squared cross-product terms and are hard to interpret. There is a sign to the values (either positive or negative), but it is not clear how to interpret something in squared cross-product terms. Covariances are unbounded, and thus it is difficult to determine if a value is large or small. As a result, interpretation is difficult.

Most of these problems will be solved by transforming the covariance into a correlation coefficient. However, the covariance is the building block for regression and many other multivariate analyses. It is important to at least grasp the basic concept of covariance: that it is based on how two variables vary about their means together; that it is similar to the variance and seeks to place the measure of association in the context of the variability of the variables; and that it is a symmetric measure of association.
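To see why the diagonal of a covariance matrix can differ slightly from the variances in a descriptive statistics table, here is a small Python sketch. The three columns reuse the variable names from the example, but the numbers are invented for illustration, and `covariance_matrix` and its `use_sample_formula` flag are hypothetical helpers of my own, not anything from Excel.

```python
from statistics import mean, pvariance, variance

# Hypothetical values invented for illustration; not the manager salary data.
data = {
    "RATING": [5, 6, 7, 8, 9],
    "SALARY": [60, 66, 70, 75, 79],
    "YEARS": [12, 3, 8, 5, 2],
}


def covariance(xs, ys, divisor):
    """Sum of cross products of deviations about the means, over a divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return ss_xy / divisor


def covariance_matrix(columns, use_sample_formula):
    """Full matrix as nested dicts; divides by n - 1 or by n."""
    n = len(next(iter(columns.values())))
    divisor = n - 1 if use_sample_formula else n
    return {
        a: {b: covariance(columns[a], columns[b], divisor) for b in columns}
        for a in columns
    }


divide_by_n = covariance_matrix(data, use_sample_formula=False)
divide_by_n_minus_1 = covariance_matrix(data, use_sample_formula=True)

# The diagonal holds each variable's variance under the chosen divisor.
print(divide_by_n["RATING"]["RATING"], pvariance(data["RATING"]))
print(divide_by_n_minus_1["RATING"]["RATING"], variance(data["RATING"]))

# The matrix is symmetric, so half of it is redundant.
print(divide_by_n["RATING"]["SALARY"] == divide_by_n["SALARY"]["RATING"])
```

The two versions differ only by the factor (n - 1)/n, so the gap shrinks as the sample grows.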
CORRELATION

If we divide the covariance by the product of the two standard deviations we generate a new measure of association, the correlation coefficient (often designated by r). The correlation coefficient is a standardized version of the covariance. It is bounded between -1 and 1, and zero means there is no linear relationship between the two variables. Correlation coefficients provide an easy way to summarize the relationship between two variables, and that is why they are so often used. You should note that the correlation coefficient requires an equal sample size for both variables, and any missing value for one variable will cause that observation to be removed from the analysis (this is called pairwise deletion).

The formula for the correlation coefficient (also known as the Pearson Product Moment Correlation Coefficient) is given below.

\[ r = \frac{\mathrm{Cov}_{XY}}{\sigma_X \sigma_Y} \]

The correlation coefficient (r) has the following useful properties:
- It is bounded between -1 and 1
- It is a symmetric measure of association
- It is a standardized measure and easy to compare
- It is invariant to scale

r has a range from -1 to 1. A value of -1 means perfect negative correlation, a value of 1 means perfect positive correlation, and a value of 0 means no linear association. Thus, it is bounded between -1 and 1. If you obtain a value greater than 1 or less than -1, something is wrong!

The correlation coefficient is a symmetrical measure of association. The correlation between X and Y is the same as the correlation between Y and X (r_XY = r_YX).

The correlation coefficient is invariant to scale. By this I mean that if you add or subtract a constant to each value in the data set, or you multiply or divide by a positive constant, it does not change the correlation between the two variables. For example, if you express income in $1,000s, it will not change the relationship of income and sales.
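These properties are easy to verify numerically. The sketch below is one way to do it in Python; the income and sales figures are made up for illustration and `pearson_r` is my own helper name. It uses the fact that the divisor in the covariance and in the two standard deviations cancels, so r can be computed directly from sums of squares and cross products.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (divisors cancel)."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    ss_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


# Hypothetical income (in dollars) and sales figures, invented for illustration.
income_dollars = [42000, 55000, 61000, 48000, 73000]
sales = [12, 18, 21, 15, 24]

r = pearson_r(income_dollars, sales)

# Symmetric: swapping X and Y gives the same value.
r_swapped = pearson_r(sales, income_dollars)

# Invariant to scale: income in $1,000s, or shifted by a constant.
r_thousands = pearson_r([v / 1000 for v in income_dollars], sales)
r_shifted = pearson_r([v + 5000 for v in income_dollars], sales)

# Multiplying by a negative constant keeps the strength but flips the sign.
r_negated = pearson_r([-v for v in income_dollars], sales)

print(r, r_swapped, r_thousands, r_shifted, r_negated)
```

The first four printed values agree up to floating-point rounding, and the last is the same number with the opposite sign.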
As with covariance, the correlation matrix is usually presented as half a matrix because the values are symmetrical. Table 3 contains the correlations for the Manager Salary data.
[Table 3. Correlation Coefficients for the Manager Salary Data. Rows and columns: RATING, SALARY, YEARS, ORIGIN]

The values on the diagonal are all 1, indicating that each variable is perfectly correlated with itself. The value of .684 shows the correlation between RATING and SALARY. Its interpretation is that managers with higher salaries tend to get higher ratings. The correlation is not perfect, but it is moderately large (we will see a scatter plot of these two variables to get a better sense of what a correlation of .684 looks like).

Any correlation with a dummy variable (one which has only two values, zero and one) has a very simple interpretation. Since a dummy variable only takes on two values, the correlation coefficient reflects which group has the higher average level of the other variable. If the correlation is positive, the category coded as one tends to have higher values of the continuous variable on average. If the correlation is negative, the category coded as one tends to have lower values on average. For example, the correlation between ORIGIN and SALARY is negative. This means that managers who were recruited from outside the company (ORIGIN = 1) have, on average, lower salaries.

The correlation coefficient is a useful summary measure of a relationship between two variables. With a single value you can talk about the strength and direction of the relationship. However, we need to be cautious in its use. For one thing, it is a linear measure of association between two variables. A correlation of zero means only that there is no linear relationship between two variables. It would be represented by a flat line in a graphical representation. However, if the relationship is nonlinear, the correlation coefficient would fail to capture the full relationship. Figure 2 shows a graphical depiction of an obvious and perfect nonlinear relationship. Such a relationship would most likely have a correlation near zero.

[Figure 2. Graphic of a Nonlinear Relationship]
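Both cautions about interpretation can be illustrated with a short Python sketch. Everything here is invented for illustration: the curve y = x squared stands in for a perfect nonlinear relationship, the six salaries are hypothetical, and `pearson_r` is my own helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross products."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    ss_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


# A perfect but nonlinear relationship: y is completely determined by x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
r_curve = pearson_r(x, y)

# A dummy variable (0 = promoted inside, 1 = recruited outside) against
# hypothetical salaries in $1,000s.
origin = [0, 0, 0, 1, 1, 1]
salary = [75, 80, 78, 68, 70, 66]
r_dummy = pearson_r(origin, salary)

inside_mean = mean([s for o, s in zip(origin, salary) if o == 0])
outside_mean = mean([s for o, s in zip(origin, salary) if o == 1])

print(r_curve)                             # zero linear association
print(r_dummy, inside_mean, outside_mean)  # negative r: group 1 averages lower
```

In the first case the positive and negative cross products cancel exactly, so r is zero even though Y is a perfect function of X. In the second, the sign of r simply tells you which group has the higher average.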
A second caution with correlation is that it does not reflect causality; the fact that two things are correlated does not mean one variable causes the other. This is an easy trap to fall into, but as we will see in multiple regression, bivariate relationships can be deceiving. For example, there is a correlation between ice cream sales and the number of people who drown in cities and towns across America. This does not mean that eating ice cream causes people to drown. The two things both tend to happen more in the summertime, and the season is a third variable that is related to both of the others.

GRAPHICAL EXAMPLES OF CORRELATIONS

A value of -1 or 1, or a value of zero, is a relatively easy correlation to interpret. A value of -1 or 1 reflects a perfect linear relationship between two variables. A value of zero reflects no linear relationship. If we drew a line on a scatter plot for a correlation of zero it would be a flat line: a change in the value of X is not associated with any change in the value of Y. However, intermediate values of correlations are not as easy to interpret. Often what is large or small depends upon the data you are using and the discipline you are involved with. When the units of analysis are people, correlations of .5 to .6 are relatively large. However, when looking at data over time, correlations tend to be much higher: .90 to .99.

[Figure 3. Scatter Plot of Salary Versus Rating: SALARY ($1,000s) on the Y-axis, RATING on the X-axis]

Scatter plots are a useful way to look at the relationship between two variables. Figure 3 shows the scatter plot of the relationship between SALARY (Y-axis) and RATING (X-axis). Earlier we noted that the correlation between these two variables was .684. From the graph we can see that the relationship is linear, but not perfect. If we fit a line to the data, not all of the points would fall on the line.
[Figure 4. Scatter Plot with Trendline, Equation, and R²: SALARY ($1,000s) versus RATING]

In fact, Excel will allow us to fit a best-fitting straight line, which is generated from a regression of SALARY on RATING. Using options with the Chart feature in Excel we can add a trend line, include the equation of the line on the chart, and include a measure of association called R². Figure 4 shows the same graph with these options. The options can be accessed by selecting the graph in Excel, clicking on Chart in the menu bar, and then clicking on Add Trendline. Once in Trendline you should click on Linear, and then you can access the options for including the equation and R².

The best-fitting line in Figure 4 is actually a regression line. From the graph we can see that the line fits the data very well. The equation for the line follows the classic formula for a line with an intercept term (a) and a slope coefficient (b): Y = a + b(X). Our line is not a perfect deterministic function (there is scatter around the line), so I am expressing it as an estimate, Estimated Y = a + b(X), with the fitted values of a and b shown on the chart.

The R² given on the graph is a measure of association from regression. More will be said about this in the next module on regression. For now we can say that R² shows how much of the variability in the dependent variable (in this case SALARY) is explained by knowing something about the independent variable. It ranges from zero to one. In this case, an R² of .467 means that 46.7 percent of the variability in SALARY is explained by knowing the RATING of the employee. You should also note that if we squared the correlation coefficient it would equal R² (r² = R² for a bivariate regression). Try it and see.

Thus, another interpretation of the correlation coefficient, if squared, is how much of the variability in one variable is explained by knowing something about another variable.
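If you would rather try this outside Excel, the sketch below fits a least-squares line and compares R² with the squared correlation in Python. The ratings and salaries are invented for illustration (they are not the manager data), and the helper names `fit_line`, `r_squared`, and `pearson_r` are my own.

```python
from math import sqrt
from statistics import mean

# Hypothetical ratings and salaries ($1,000s), invented for illustration.
rating = [4, 5, 6, 6, 7, 8, 9]
salary = [62, 70, 65, 74, 72, 80, 78]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Estimated Y = a + b(X)."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variability in Y explained by the fitted line."""
    y_bar = mean(ys)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    sst = sum((yi - y_bar) ** 2 for yi in ys)
    return 1 - sse / sst


def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross products."""
    x_bar, y_bar = mean(xs), mean(ys)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    ss_xx = sum((xi - x_bar) ** 2 for xi in xs)
    ss_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


a, b = fit_line(rating, salary)
r2_from_line = r_squared(rating, salary, a, b)
r2_from_r = pearson_r(rating, salary) ** 2

print(a, b)
print(r2_from_line, r2_from_r)  # equal up to floating-point rounding
```

The same check works on the numbers reported in the module: .684 squared is about .468, in line with the 46.7 percent of variability in SALARY explained by RATING.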
[Figure 5. Average State Verbal SAT Scores Versus Math Scores, 2001: Verbal Scores on the Y-axis, Math Scores on the X-axis, with trendline, equation, and R²]

Let's look at a few other graphic depictions of correlations to better see what a high or low correlation looks like. In Figure 5 we have a scatter plot of average state verbal versus math SAT scores. The correlation is very high, .970. You can see that the pattern is linear and there is very little scatter of the data points around the best-fitting line. The positive correlation tells us that states with higher average verbal scores also tend to have higher average math scores, as might be expected. Notice also that R² for this line is very high: squaring the correlation (.970² ≈ .94) shows that about 94 percent of the variability in verbal scores is explained by knowing the math scores.

A scatter plot can show the strength and direction of the relationship, as well as whether the relationship is in fact linear. Figure 6 shows a strong negative correlation between the average state SAT scores (verbal plus math) and the percent of the high school class that took the SAT test. The scatter plot shows the downward slope of the relationship and that the fit of the line is good, but not perfect.

[Figure 6. Average SAT Scores (Math + Verbal) Versus Percent of High School Class Taking SAT, 2001: Average SAT (Math + Verbal) on the Y-axis, Percent Taking on the X-axis, with trendline, equation, and R²]
[Figure 7. Scatter Plot of a Low Correlation Between Salary and Years of Service: Salary ($1,000s) on the Y-axis, Years of Service on the X-axis, with trendline, equation, and R²]

Finally, the last graph shows a weak correlation between two variables (Figure 7). The correlation between the managers' salary and years of service is negative: the more years of service, the lower the salary, but the relationship is weak. Figure 7 shows far more scatter around the best-fitting line. We can see the relationship in the graph, but there is considerably more scatter in the data than in the other graphs.

CONCLUSIONS

Measures of association are useful summary statistics to describe a relationship between two or more variables. In this module we looked at covariance and correlation as two measures of linear association between two variables. Both of these measures are related to each other and to regression. The correlation coefficient is a standardized version of the covariance, so it has a known range and is bounded between -1 and 1, with zero indicating no linear relationship. In a single number, the correlation coefficient provides an indication of the strength and direction of the relationship. It is a useful next step in data analysis to begin to examine bivariate relationships with correlation coefficients and to graph these relationships.

We also noted that caution should be taken with correlation coefficients in two main areas. First, the correlation coefficient is a linear measure of association. We cannot assume that a low value of a correlation means that there is no association, only that there is no linear association. The second issue is to be careful not to imply causation when dealing with correlation coefficients. While we can establish that two variables are related to each other, care should be taken not to say that one variable causes the other.
Simple Regression and Correlation Today, we are going to discuss a powerful statistical technique for examining whether or not two variables are related. Specifically, we are going to talk about the ideas
More informationAlgebra I Vocabulary Cards
Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression
More informationCourse Title: Honors Algebra Course Level: Honors Textbook: Algebra 1 Publisher: McDougall Littell
Course Title: Honors Algebra Course Level: Honors Textbook: Algebra Publisher: McDougall Littell The following is a list of key topics studied in Honors Algebra. Identify and use the properties of operations
More informationResearch Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement
Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.
More informationContent DESCRIPTIVE STATISTICS. Data & Statistic. Statistics. Example: DATA VS. STATISTIC VS. STATISTICS
Content DESCRIPTIVE STATISTICS Dr Najib Majdi bin Yaacob MD, MPH, DrPH (Epidemiology) USM Unit of Biostatistics & Research Methodology School of Medical Sciences Universiti Sains Malaysia. Introduction
More informationAlgebra I: Lesson 54 (5074) SAS Curriculum Pathways
TwoVariable Quantitative Data: Lesson Summary with Examples Bivariate data involves two quantitative variables and deals with relationships between those variables. By plotting bivariate data as ordered
More informationCHAPTER 3: GRAPHS OF QUADRATIC RELATIONS
CHAPTER 3: GRAPHS OF QUADRATIC RELATIONS Specific Expectations Addressed in the Chapter Collect data that can be represented as a quadratic relation, from experiments using appropriate equipment and technology
More informationF. Farrokhyar, MPhil, PhD, PDoc
Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationData. ECON 251 Research Methods. 1. Data and Descriptive Statistics (Review) CrossSectional and TimeSeries Data. Population vs.
ECO 51 Research Methods 1. Data and Descriptive Statistics (Review) Data A variable  a characteristic of population or sample that is of interest for us. Data  the actual values of variables Quantitative
More informationSPSS: Descriptive and Inferential Statistics. For Windows
For Windows August 2012 Table of Contents Section 1: Summarizing Data...3 1.1 Descriptive Statistics...3 Section 2: Inferential Statistics... 10 2.1 ChiSquare Test... 10 2.2 T tests... 11 2.3 Correlation...
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationChapter 3 Descriptive Statistics: Numerical Measures. Learning objectives
Chapter 3 Descriptive Statistics: Numerical Measures Slide 1 Learning objectives 1. Single variable Part I (Basic) 1.1. How to calculate and use the measures of location 1.. How to calculate and use the
More informationChapter 15 Multiple Choice Questions (The answers are provided after the last question.)
Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More informationData reduction and descriptive statistics
Data reduction and descriptive statistics dr. Reinout Heijungs Department of Econometrics and Operations Research VU University Amsterdam August 2014 1 Introduction Economics, marketing, finance, and most
More information17.0 Linear Regression
17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was
More informationUNDERSTANDING MULTIPLE REGRESSION
UNDERSTANDING Multiple regression analysis (MRA) is any of several related statistical methods for evaluating the effects of more than one independent (or predictor) variable on a dependent (or outcome)
More informationAlgebra 1 Chapter 3 Vocabulary. equivalent  Equations with the same solutions as the original equation are called.
Chapter 3 Vocabulary equivalent  Equations with the same solutions as the original equation are called. formula  An algebraic equation that relates two or more reallife quantities. unit rate  A rate
More informationMathematics. Probability and Statistics Curriculum Guide. Revised 2010
Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More information2.7. The straight line. Introduction. Prerequisites. Learning Outcomes. Learning Style
The straight line 2.7 Introduction Probably the most important function and graph that you will use are those associated with the straight line. A large number of relationships between engineering variables
More informationAMS7: WEEK 8. CLASS 1. Correlation Monday May 18th, 2015
AMS7: WEEK 8. CLASS 1 Correlation Monday May 18th, 2015 Type of Data and objectives of the analysis Paired sample data (Bivariate data) Determine whether there is an association between two variables This
More informationBelow is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.
Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data
More informationBiostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
More informationMultiple regression  Matrices
Multiple regression  Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationRegression. In this class we will:
AMS 5 REGRESSION Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationHigh School Algebra 1 Common Core Standards & Learning Targets
High School Algebra 1 Common Core Standards & Learning Targets Unit 1: Relationships between Quantities and Reasoning with Equations CCS Standards: Quantities NQ.1. Use units as a way to understand problems
More informationHomework 11. Part 1. Name: Score: / null
Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = 0.80 C. r = 0.10 D. There is
More informationWe are often interested in the relationship between two variables. Do people with more years of fulltime education earn higher salaries?
Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of fulltime education earn higher salaries? Do
More informationGCSE HIGHER Statistics Key Facts
GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information
More informationRegression III: Dummy Variable Regression
Regression III: Dummy Variable Regression Tom Ilvento FREC 408 Linear Regression Assumptions about the error term Mean of Probability Distribution of the Error term is zero Probability Distribution of
More informationOutline. Correlation & Regression, III. Review. Relationship between r and regression
Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation
More informationDescribe what is meant by a placebo Contrast the doubleblind procedure with the singleblind procedure Review the structure for organizing a memo
Readings: Ha and Ha Textbook  Chapters 1 8 Appendix D & E (online) Plous  Chapters 10, 11, 12 and 14 Chapter 10: The Representativeness Heuristic Chapter 11: The Availability Heuristic Chapter 12: Probability
More informationCanonical Correlation
Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present
More informationLecture 18 Linear Regression
Lecture 18 Statistics Unit Andrew Nunekpeku / Charles Jackson Fall 2011 Outline 1 1 Situation  used to model quantitative dependent variable using linear function of quantitative predictor(s). Situation
More informationStudy Resources For Algebra I. Unit 1C Analyzing Data Sets for Two Quantitative Variables
Study Resources For Algebra I Unit 1C Analyzing Data Sets for Two Quantitative Variables This unit explores linear functions as they apply to data analysis of scatter plots. Information compiled and written
More informationStatistical Foundations: Measures of Location and Central Tendency and Summation and Expectation
Statistical Foundations: and Central Tendency and and Lecture 4 September 5, 2006 Psychology 790 Lecture #49/05/2006 Slide 1 of 26 Today s Lecture Today s Lecture Where this Fits central tendency/location
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationEXCEL Tutorial: How to use EXCEL for Graphs and Calculations.
EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your
More informationFor example, enter the following data in three COLUMNS in a new View window.
Statistics with Statview  18 Paired ttest A paired ttest compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the
More informationCourse Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.
SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed
More informationCORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there
CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the
More informationSIMPLE REGRESSION ANALYSIS
SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two
More informationMEASURES OF VARIATION
NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are
More informationChapter 14: Analyzing Relationships Between Variables
Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between
More informationPrentice Hall Mathematics: Algebra 1 2007 Correlated to: Michigan Merit Curriculum for Algebra 1
STRAND 1: QUANTITATIVE LITERACY AND LOGIC STANDARD L1: REASONING ABOUT NUMBERS, SYSTEMS, AND QUANTITATIVE SITUATIONS Based on their knowledge of the properties of arithmetic, students understand and reason
More informationDESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS
DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi  110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics
More information5 Week Modular Course in Statistics & Probability Strand 1. Module 2
5 Week Modular Course in Statistics & Probability Strand 1 Module 2 Analysing Data Numerically Measures of Central Tendency Mean Median Mode Measures of Spread Range Standard Deviation InterQuartile Range
More information0.1 Multiple Regression Models
0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationWEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6
WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent beforethefact, expected values. In particular, the beta coefficient used in
More informationData exploration with Microsoft Excel: analysing more than one variable
Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical
More informationThere are six different windows that can be opened when using SPSS. The following will give a description of each of them.
SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet
More informationUsing Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationPortable Assisted Study Sequence ALGEBRA IIA
SCOPE This course is divided into two semesters of study (A & B) comprised of five units each. Each unit teaches concepts and strategies recommended for intermediate algebra students. The first half of
More informationCOMPARING LINEAR AND NONLINEAR FUNCTIONS
1 COMPARING LINEAR AND NONLINEAR FUNCTIONS LEARNING MAP INFORMATION STANDARDS 8.F.2 Compare two s, each in a way (algebraically, graphically, numerically in tables, or by verbal descriptions). For example,
More informationwhere b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.
Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationSan Jose State University Engineering 10 1
KY San Jose State University Engineering 10 1 Select Insert from the main menu Plotting in Excel Select All Chart Types San Jose State University Engineering 10 2 Definition: A chart that consists of multiple
More information