Covariance and Correlation




(© Robert J. Serfling. Not for reproduction or distribution.)

We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such as the sample mean and sample variance. Likewise, we have seen how to summarize the probability distribution of a random variable X by similar measures of location and spread, the mean and variance parameters. Now we ask: for a pair of random variables X and Y having joint probability distribution p(x, y) = P(X = x, Y = y), how might we summarize the distribution?

For location features of a joint distribution, we simply use the means µ_X and µ_Y of the corresponding marginal distributions for X and Y. Likewise, for spread features we use σ²_X and σ²_Y. For joint distributions, however, we can go further and explore a further type of feature: the manner in which X and Y are interrelated, or manifest dependence. For example, consider the joint distribution given by the following table for p(x, y):

              Y = 0   Y = 1
    X = 0      .7      .1
    X = 1      .1      .1

We see that there is indeed some dependence here: if a pair (X, Y) is selected at random according to this distribution, the probability that (X, Y) = (0, 0) is selected is .70, whereas the product of the probabilities that X = 0 and Y = 0 is .8 × .8 = .64 ≠ .70. So the events X = 0 and Y = 0 are dependent events. But we can go further, asking: how might we characterize the extent or quantity of this dependence feature? There are in fact a variety of possible ways to formulate a suitable measure of dependence. We shall consider here one very useful approach: covariance.
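As a quick numerical check of the dependence claim above, here is a minimal Python sketch (the array name p and the layout are illustrative choices, not part of the original notes) that computes the marginals from the joint table and compares P(X = 0, Y = 0) with P(X = 0) · P(Y = 0):

```python
import numpy as np

# Joint probability table p(x, y): rows index x = 0, 1; columns index y = 0, 1.
p = np.array([[0.7, 0.1],
              [0.1, 0.1]])

p_x = p.sum(axis=1)          # marginal distribution of X: [0.8, 0.2]
p_y = p.sum(axis=0)          # marginal distribution of Y: [0.8, 0.2]

print(p[0, 0])               # P(X = 0, Y = 0) = 0.70
print(p_x[0] * p_y[0])       # P(X = 0) * P(Y = 0) = 0.64
```

Since .70 ≠ .64, the factorization required for independence fails, which is exactly the observation made above.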

Covariance

One way that X and Y can exhibit dependence is to vary together: i.e., the distribution p(x, y) might attach relatively high probability to pairs (x, y) for which the deviation of x above its mean, x − µ_X, and the deviation of y above its mean, y − µ_Y, are either both positive or both negative and relatively large in magnitude. Thus, for example, the information that a pair (x, y) had an x with positive deviation x − µ_X would suggest that, unless something unusual had occurred, the y of the given pair also had a positive deviation above its mean. A natural numerical measure which takes account of this type of information is the sum of terms

    (1)    Σ_x Σ_y (x − µ_X)(y − µ_Y) p(x, y).

For the kind of dependence just described, this sum would tend to be dominated by large positive terms. Another way that X and Y could exhibit dependence is to vary oppositely, in which case pairs (x, y) such that one of x − µ_X and y − µ_Y is positive and the other negative would receive relatively high probability. In this case the sum (1) would tend to be dominated by negative terms. Consequently, the sum (1) tends to indicate something about the kind of dependence between X and Y. We call it the covariance of X and Y and use the following notation and representations:

    Cov(X, Y) = σ_XY = Σ_x Σ_y (x − µ_X)(y − µ_Y) p(x, y) = E[(X − µ_X)(Y − µ_Y)].

It is easily checked that an equivalent formula for computing the covariance is

    σ_XY = E(XY) − E(X)E(Y).

For the example of p(x, y) considered above, we find µ_X = µ_Y = .2, E(XY) = .1, and hence σ_XY = .1 − (.2)(.2) = .06. Does this indicate a strong relationship between X and Y? What kind of relationship? How can we tell whether a particular value for σ_XY is meaningfully large or not? We'll return to these questions below, but for now let us explore some other aspects of covariance.

Independence of X and Y implies Covariance = 0. This important fact is seen (for the discrete case) as follows. If X and Y are independent, then their joint distribution factors into the product of marginals. Using this, we have

    σ_XY = Σ_x Σ_y (x − µ_X)(y − µ_Y) p(x, y)
         = Σ_x Σ_y (x − µ_X)(y − µ_Y) p_X(x) p_Y(y)
         = [ Σ_x (x − µ_X) p_X(x) ] [ Σ_y (y − µ_Y) p_Y(y) ]
         = [µ_X − µ_X][µ_Y − µ_Y]
         = 0 · 0 = 0.

Of course, we should want any reasonable measure of dependence to reduce to the value 0 in the case of an absence of dependence.
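As a sanity check on the value σ_XY = .06 obtained above, here is a short sketch (again, the variable names are illustrative, not from the notes) that evaluates the covariance of the example table both from the defining sum and from the shortcut E(XY) − E(X)E(Y):

```python
import numpy as np

# Joint table from the running example: rows are x = 0, 1; columns are y = 0, 1.
p = np.array([[0.7, 0.1],
              [0.1, 0.1]])
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])

p_x, p_y = p.sum(axis=1), p.sum(axis=0)
mu_x, mu_y = x_vals @ p_x, y_vals @ p_y          # both equal 0.2

# Definition: sum over all (x, y) of (x - mu_x)(y - mu_y) p(x, y)
cov_def = sum((x - mu_x) * (y - mu_y) * p[i, j]
              for i, x in enumerate(x_vals)
              for j, y in enumerate(y_vals))

# Shortcut: E(XY) - E(X)E(Y)
e_xy = sum(x * y * p[i, j]
           for i, x in enumerate(x_vals)
           for j, y in enumerate(y_vals))
cov_short = e_xy - mu_x * mu_y

print(cov_def, cov_short)    # both print 0.06 (up to floating-point rounding)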

For some measures of dependence which have been proposed in the literature, the converse holds as well: if the measure has value 0, then the variables are independent. However, for the covariance measure, this converse is not true.

EXAMPLE. Consider the joint probability distribution

              Y = −1   Y = 0   Y = 1
    X = 0       0       1/3      0
    X = 1      1/3       0      1/3

Note that for this distribution we have a very strong relationship between X and Y: ignoring pairs (x, y) with 0 probability, we have X = Y². On the other hand, since E(Y) = 0 and XY = Y³ on the pairs with positive probability,

    σ_XY = E(XY) − E(X)E(Y) = E(Y³) − E(X) · 0 = E(Y³) = 0.

Thus the covariance measure fails to detect the dependence structure.

Preliminary Conclusions about Covariance:

a) Independence of X and Y implies σ_XY = 0;
b) σ_XY = 0 does not necessarily indicate independence;
c) σ_XY ≠ 0 indicates some kind of dependence is present (what kind?).

In order to be able to interpret the meaning of a nonzero covariance value, we ask: how big can σ_XY be in value? It turns out that the covariance always lies between two limits which may be expressed in terms of the variances of X and Y:

    −σ_X σ_Y ≤ σ_XY ≤ +σ_X σ_Y.

(This follows from the Cauchy-Schwarz inequality: for any two r.v.'s W and Z, |E(WZ)| ≤ [E(W²)]^(1/2) [E(Z²)]^(1/2), with equality if and only if W and Z are proportional.) Moreover, the covariance can attain one of these limits only in the case that X − µ_X and Y − µ_Y are proportional: i.e., for some constant c, X − µ_X = c(Y − µ_Y), i.e., Y = (1/c)X + (µ_Y − µ_X/c), i.e., X and Y satisfy a linear relationship, Y = aX + b for some choice of a and b.
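Returning to the X = Y² example above, a brief numerical confirmation (array layout and names are my own, not from the notes) that the covariance comes out exactly 0 even though X is a deterministic function of Y:

```python
import numpy as np

# Joint table for the quadratic example: rows are x = 0, 1; columns are y = -1, 0, 1.
p = np.array([[0.0, 1/3, 0.0],
              [1/3, 0.0, 1/3]])
x_vals = np.array([0.0, 1.0])
y_vals = np.array([-1.0, 0.0, 1.0])

mu_x = x_vals @ p.sum(axis=1)    # E(X) = 2/3
mu_y = y_vals @ p.sum(axis=0)    # E(Y) = 0

e_xy = sum(x * y * p[i, j]
           for i, x in enumerate(x_vals)
           for j, y in enumerate(y_vals))

cov = e_xy - mu_x * mu_y
print(cov)    # 0.0: covariance misses the purely quadratic dependence X = Y**2
```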

Interpretation of covariance. The above considerations lead to an interpretation of covariance: σ_XY measures the degree of linear relationship between X and Y. (This is why σ_XY = 0 for the example in which X = Y² with Y symmetric about 0. That is a purely quadratic relationship, quite nonlinear.) Thus we see that covariance measures a particular kind of dependence, the degree of linear relationship. We can assess the strength of a covariance measure by comparing its magnitude with the largest possible value, σ_X σ_Y. If σ_XY attains this magnitude, then the variables X and Y have a purely linear relationship. If σ_XY = 0, however, then either the relationship between X and Y can be assumed to be of some nonlinear type, or else the variables are independent. If σ_XY lies between these values in magnitude, then we conclude that X and Y have a relationship which is a mixture of linear and other components.

A somewhat undesirable aspect of the covariance measure is that its value changes if we transform the variables involved to other units. For example, if the variables are X and Y and we transform to new variables X′ = cX, Y′ = dY, note that

    Cov(X′, Y′) = E[(X′ − E(X′))(Y′ − E(Y′))] = E[c(X − E(X)) · d(Y − E(Y))] = cd Cov(X, Y).

Thus, counter to our intuition that dependence relationships should not be altered by simply rescaling the variables, the covariance measure indeed is affected. It is preferable to have a dependence measure which is not sensitive to irrelevant details such as units of measurement. Next we see how to convert to an equivalent measure which is dimensionless in this sense.

Correlation

Recall that the upper and lower limits for the possible values which the covariance can take are given in terms of the variances, which also change with rescaling of variables. Consequently, in order to decide whether a particular value for the covariance is big or small, we need to assess it relative to the variances of the two variables. One way to do this is to divide the covariance by the product of the standard deviations of the variables, producing the quantity

    ρ_XY = E[ ((X − µ_X)/σ_X) · ((Y − µ_Y)/σ_Y) ] = Cov(X, Y) / √(Var(X) Var(Y)) = σ_XY / (σ_X σ_Y),

which we call the correlation of X and Y. It is easily seen (check) that this measure does not change when we rescale the variables X and Y to X′ = cX, Y′ = dY as considered above.
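Here is a quick numerical check of that invariance (the scale factors c = 2.54 and d = 1000 and the helper function are illustrative choices, not from the notes): for the running example, the covariance is multiplied by cd while the correlation is unchanged.

```python
import numpy as np

# Joint table from the running example: rows x = 0, 1; columns y = 0, 1.
p = np.array([[0.7, 0.1],
              [0.1, 0.1]])
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])

def cov_and_corr(xv, yv):
    """Cov and Corr of X, Y when X takes values xv and Y takes values yv under table p."""
    mu_x = xv @ p.sum(axis=1)
    mu_y = yv @ p.sum(axis=0)
    cov = sum((a - mu_x) * (b - mu_y) * p[i, j]
              for i, a in enumerate(xv) for j, b in enumerate(yv))
    var_x = sum((a - mu_x) ** 2 * px for a, px in zip(xv, p.sum(axis=1)))
    var_y = sum((b - mu_y) ** 2 * py for b, py in zip(yv, p.sum(axis=0)))
    return cov, cov / np.sqrt(var_x * var_y)

c, d = 2.54, 1000.0                        # arbitrary changes of units
cov_1, rho_1 = cov_and_corr(x_vals, y_vals)
cov_2, rho_2 = cov_and_corr(c * x_vals, d * y_vals)

print(cov_1, cov_2, cov_2 / (c * d))       # 0.06, 152.4, 0.06: covariance scales by c*d
print(rho_1, rho_2)                        # both 0.375: correlation is scale-free
```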

Also, the upper and lower limits we saw for the covariance translate to the following limits for the value of the correlation:

    −1 ≤ ρ_XY ≤ 1.

Thus we can judge the strength of the dependence by how close the correlation measure comes to either of its extreme values +1 or −1 (and away from the value 0 that suggests an absence of dependence). However, keep in mind that, like the covariance, the correlation is especially sensitive to a certain kind of dependence, namely linear dependence, and can have a small value (near 0) even when there is strong dependence of other kinds. (We saw in the previous lecture an example in which the covariance, and hence the correlation, is 0 even when there is strong dependence, but of a nonlinear kind.)

Let us now illustrate the use of the correlation parameter to judge the degree of (linear) dependence in the following probability distribution which we considered above:

              Y = 0   Y = 1
    X = 0      .7      .1
    X = 1      .1      .1

We had calculated its covariance to be σ_XY = .06, but at that time we did not assess how meaningful this value is. Now we do so, by converting to correlation. First finding the variances σ²_X = .16 = σ²_Y, we then compute

    ρ_XY = .06 / √(.16 × .16) = .375,

which we might interpret as moderate (not close to 0, but not close to 1, either). Moreover, note that a positive value of correlation is obtained in this example, indicating that deviations of X from its mean tend to be associated with deviations of Y from its mean in the same direction.

In what cases does correlation attain the value +1 or −1? Suppose that Y is related to X exactly as a linear transformation of X: Y = a + bX, for some choice of a, b. Then we have the following analysis:

    E(XY) = E[X(a + bX)] = aE(X) + bE(X²)

and so

    σ_XY = E(XY) − E(X)E(Y) = [aE(X) + bE(X²)] − E(X)E(a + bX) = b[E(X²) − E(X)²] = bσ²_X,

and, since σ_Y = |b| σ_X, finally

    ρ_XY = bσ²_X / (σ_X σ_Y) = bσ²_X / (σ_X · |b| σ_X) = b / |b|,

which we see takes value +1 if b > 0 and value −1 if b < 0.
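A small numerical check of the b/|b| conclusion (the particular distribution of X and the constants a, b below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)   # any X with positive variance works

for a, b in [(3.0, 2.5), (3.0, -2.5)]:
    y = a + b * x                                  # exact linear relationship Y = a + bX
    rho = np.corrcoef(x, y)[0, 1]
    print(b, round(rho, 6))                        # rho = +1 for b > 0, -1 for b < 0
```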

It can be shown, also, that the values +1 and −1 are attainable only in the case of exactly linear relationships. This is the basis for characterizing correlation as a measure of the degree of linear relationship between X and Y. Despite this intuitive appeal of the correlation measure, note that it really doesn't leave us with a precise meaning in the case of a value intermediate between 0 and ±1.

Despite any shortcomings the correlation measure may have, it has very wide application. This is based on: its fairly successful intuitive appeal as a measure of dependence; the central importance of confirming (or disproving) linear relationships; its application to calculating variances of linear combinations of r.v.'s; and its application in analyzing and interpreting regression models.

As a final illustration, let us consider the special case that Y is defined to be just X itself, i.e., Y = X. This is a special case of the linear transformation with a = 0 and b = 1, so we have ρ = 1, i.e., Corr(X, X) = 1. Likewise, we can talk of the covariance of a r.v. X with itself, and we readily find that it reduces to the variance of X:

    Cov(X, X) = Var(X).

(To see this immediately, just go back and check the definition of covariance.)
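Finally, a one-line numerical confirmation of these two closing identities (the sample values below are arbitrary, chosen only for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 7.0, 9.0])            # arbitrary sample values

# Covariance of X with itself (divide-by-n convention) versus its variance
cov_xx = np.mean((x - x.mean()) ** 2)
print(cov_xx, np.var(x), np.corrcoef(x, x)[0, 1])  # Cov(X,X) = Var(X); Corr(X,X) = 1
```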