Covariance and Correlation Class 7, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom


1 Learning Goals

1. Understand the meaning of covariance and correlation.
2. Be able to compute the covariance and correlation of two random variables.

2 Covariance

Covariance is a measure of how much two random variables vary together. For example, height and weight of giraffes have positive covariance because when one is big the other tends also to be big.

Definition: Suppose $X$ and $Y$ are random variables with means $\mu_X$ and $\mu_Y$. The covariance of $X$ and $Y$ is defined as

$$\operatorname{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y)).$$

2.1 Properties of covariance

1. $\operatorname{Cov}(aX + b, cY + d) = ac\,\operatorname{Cov}(X, Y)$ for constants $a, b, c, d$.
2. $\operatorname{Cov}(X_1 + X_2, Y) = \operatorname{Cov}(X_1, Y) + \operatorname{Cov}(X_2, Y)$.
3. $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.
4. $\operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$.
5. $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$ for any $X$ and $Y$.
6. If $X$ and $Y$ are independent then $\operatorname{Cov}(X, Y) = 0$.
   Warning: The converse is false. Zero covariance does not always imply independence.

Note that by Property 6, the formula in Property 5 reduces to the earlier formula $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ when $X$ and $Y$ are independent.

We give the proofs below. However, understanding and using these properties is more important than memorizing their proofs.

2.2 Sums and integrals for computing covariance

Since covariance is defined as an expected value, we compute it in the usual way as a sum or integral.

Discrete case: If $X$ and $Y$ have joint pmf $p(x_i, y_j)$ then

$$\operatorname{Cov}(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j)(x_i - \mu_X)(y_j - \mu_Y) = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_i, y_j)\, x_i y_j \right) - \mu_X \mu_Y.$$

Continuous case: If $X$ and $Y$ have joint pdf $f(x, y)$ over range $[a, b] \times [c, d]$ then

$$\operatorname{Cov}(X, Y) = \int_c^d \int_a^b (x - \mu_X)(y - \mu_Y) f(x, y)\, dx\, dy = \left( \int_c^d \int_a^b xy\, f(x, y)\, dx\, dy \right) - \mu_X \mu_Y.$$

2.3 Examples

Example 1. Flip a fair coin 3 times. Let $X$ be the number of heads in the first 2 flips and let $Y$ be the number of heads on the last 2 flips (so there is overlap on the middle flip). Compute $\operatorname{Cov}(X, Y)$.

answer: We'll do this twice, first using the joint probability table so you can see how that works, and then using the properties of covariance.

With 3 tosses there are only 8 outcomes {HHH, HHT, ...}, so we can create the joint probability table directly.

| X \ Y | 0 | 1 | 2 | p(x_i) |
|---|---|---|---|---|
| 0 | 1/8 | 1/8 | 0 | 1/4 |
| 1 | 1/8 | 2/8 | 1/8 | 1/2 |
| 2 | 0 | 1/8 | 1/8 | 1/4 |
| p(y_j) | 1/4 | 1/2 | 1/4 | 1 |

From the marginals we compute $E(X) = 1 = E(Y)$. From the full table we compute

$$E(XY) = 1 \cdot \frac{2}{8} + 2 \cdot \frac{1}{8} + 2 \cdot \frac{1}{8} + 4 \cdot \frac{1}{8} = \frac{5}{4}.$$

So $\operatorname{Cov}(X, Y) = \frac{5}{4} - 1 = \frac{1}{4}$.

Next we compute $\operatorname{Cov}(X, Y)$ using the properties of covariance. As usual, let $X_i$ be the result of the $i$th flip, so $X_i \sim \text{Bernoulli}(0.5)$. We have $X = X_1 + X_2$ and $Y = X_2 + X_3$. We know $E(X_i) = 1/2$ and $\operatorname{Var}(X_i) = 1/4$. Therefore $\mu_X = 1 = \mu_Y$. Using Property 2 of covariance, we have

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(X_1 + X_2, X_2 + X_3) = \operatorname{Cov}(X_1, X_2) + \operatorname{Cov}(X_1, X_3) + \operatorname{Cov}(X_2, X_2) + \operatorname{Cov}(X_2, X_3).$$

Since the different tosses are independent we know

$$\operatorname{Cov}(X_1, X_2) = \operatorname{Cov}(X_1, X_3) = \operatorname{Cov}(X_2, X_3) = 0.$$

Looking at the expression for $\operatorname{Cov}(X, Y)$ there is only one non-zero term:

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(X_2, X_2) = \operatorname{Var}(X_2) = \frac{1}{4}.$$

Example 2. (Zero covariance does not imply independence.) Let $X$ be a random variable that takes values $-2, -1, 0, 1, 2$, each with probability 1/5. Let $Y = X^2$. Show that $\operatorname{Cov}(X, Y) = 0$ but $X$ and $Y$ are not independent.

answer: We make a joint probability table:

| Y \ X | -2 | -1 | 0 | 1 | 2 | p(y_j) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1/5 | 0 | 0 | 1/5 |
| 1 | 0 | 1/5 | 0 | 1/5 | 0 | 2/5 |
| 4 | 1/5 | 0 | 0 | 0 | 1/5 | 2/5 |
| p(x_i) | 1/5 | 1/5 | 1/5 | 1/5 | 1/5 | 1 |

Using the marginals we compute the means $E(X) = 0$ and $E(Y) = 2$.

Next we show that $X$ and $Y$ are not independent by finding one place where $p(x_i, y_j) \neq p(x_i)\,p(y_j)$:

$$P(X = -2, Y = 0) = 0 \neq \frac{1}{25} = P(X = -2) \cdot P(Y = 0).$$

Finally we compute the covariance:

$$\operatorname{Cov}(X, Y) = \frac{1}{5}(-8 - 1 + 0 + 1 + 8) - \mu_X \mu_Y = 0.$$

Discussion: This example shows that $\operatorname{Cov}(X, Y) = 0$ does not imply that $X$ and $Y$ are independent. In fact, $X$ and $X^2$ are as dependent as random variables can be: if you know the value of $X$ then you know the value of $X^2$ with 100% certainty.

The key point is that $\operatorname{Cov}(X, Y)$ measures the linear relationship between $X$ and $Y$. In the above example $X$ and $X^2$ have a quadratic relationship that is completely missed by $\operatorname{Cov}(X, Y)$.

2.4 Proofs of the properties of covariance

1 and 2 follow from similar properties for expected value.

3. This is the definition of variance:

$$\operatorname{Cov}(X, X) = E((X - \mu_X)(X - \mu_X)) = E((X - \mu_X)^2) = \operatorname{Var}(X).$$

4. Recall that $E(X - \mu_X) = 0$. So

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E((X - \mu_X)(Y - \mu_Y)) \\
&= E(XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y) \\
&= E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y.
\end{aligned}$$

5. Using properties 3 and 2 we get

$$\operatorname{Var}(X + Y) = \operatorname{Cov}(X + Y, X + Y) = \operatorname{Cov}(X, X) + 2\operatorname{Cov}(X, Y) + \operatorname{Cov}(Y, Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y).$$

6. If $X$ and $Y$ are independent then $f(x, y) = f_X(x) f_Y(y)$. Therefore

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= \int\!\!\int (x - \mu_X)(y - \mu_Y) f_X(x) f_Y(y)\, dx\, dy \\
&= \int (x - \mu_X) f_X(x)\, dx \int (y - \mu_Y) f_Y(y)\, dy \\
&= E(X - \mu_X)\, E(Y - \mu_Y) \\
&= 0.
\end{aligned}$$

3 Correlation

The units of covariance $\operatorname{Cov}(X, Y)$ are "units of $X$ times units of $Y$". This makes it hard to compare covariances: if we change scales then the covariance changes as well. Correlation is a way to remove the scale from the covariance.

Definition: The correlation coefficient between $X$ and $Y$ is defined by

$$\operatorname{Cor}(X, Y) = \rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$

3.1 Properties of correlation

1. $\rho$ is the covariance of the standardizations of $X$ and $Y$.
2. $\rho$ is dimensionless (it's a ratio).
3. $-1 \le \rho \le 1$. Furthermore,
   $\rho = +1$ if and only if $Y = aX + b$ with $a > 0$,
   $\rho = -1$ if and only if $Y = aX + b$ with $a < 0$.

Property 3 shows that $\rho$ measures the linear relationship between variables. If the correlation is positive then when $X$ is large, $Y$ will tend to be large as well. If the correlation is negative then when $X$ is large, $Y$ will tend to be small.

Example 2 shows that correlation can completely miss higher order relationships.

3.2 Proof of Property 3 of correlation

(This is for the mathematically interested.)

$$0 \le \operatorname{Var}\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right) = \operatorname{Var}\left(\frac{X}{\sigma_X}\right) + \operatorname{Var}\left(\frac{Y}{\sigma_Y}\right) - 2\operatorname{Cov}\left(\frac{X}{\sigma_X}, \frac{Y}{\sigma_Y}\right) = 2 - 2\rho \;\Rightarrow\; \rho \le 1.$$

Likewise

$$0 \le \operatorname{Var}\left(\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}\right) \;\Rightarrow\; -1 \le \rho.$$

If $\rho = 1$ then

$$0 = \operatorname{Var}\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right) \;\Rightarrow\; \frac{X}{\sigma_X} - \frac{Y}{\sigma_Y} = c,$$

that is, $Y$ is a linear function of $X$ with positive slope.

Example. We continue Example 1. To compute the correlation we divide the covariance by the standard deviations. In Example 1 we found $\operatorname{Cov}(X, Y) = 1/4$ and $\operatorname{Var}(X) = 2\operatorname{Var}(X_j) = 1/2$. So $\sigma_X = 1/\sqrt{2}$. Likewise $\sigma_Y = 1/\sqrt{2}$. Thus

$$\operatorname{Cor}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{1/4}{1/2} = \frac{1}{2}.$$

We see a positive correlation, which means that larger $X$ tend to go with larger $Y$ and smaller $X$ with smaller $Y$. In Example 1 this happens because toss 2 is included in both $X$ and $Y$, so it contributes to the size of both.

3.3 Bivariate normal distributions

The bivariate normal distribution has density

$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} \right] \right).$$

For this distribution, the marginal distributions for $X$ and $Y$ are normal and the correlation between $X$ and $Y$ is $\rho$.

In the figures below we used R to simulate the distribution for various values of $\rho$. Individually $X$ and $Y$ are standard normal, i.e. $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$. The figures show scatter plots of the results.

These plots and the next set show an important feature of correlation. We divide the data into quadrants by drawing a horizontal and a vertical line at the means of the $y$ data and $x$ data respectively. A positive correlation corresponds to the data tending to lie in the 1st and 3rd quadrants. A negative correlation corresponds to data tending to lie in the 2nd and 4th quadrants. You can see the data gathering about a line as $\rho$ becomes closer to $\pm 1$.

[Figure: scatter plots of the simulated data, one panel per value of rho (including rho=0.30).]
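The simulation described above was run in R. The following is a minimal Python sketch of the same idea, not the authors' code. The function names, the sample size, the seed, and the list of rho values are all illustrative assumptions. It builds correlated standard normals from two independent ones, then reports the sample correlation and the fraction of points falling in the 1st and 3rd quadrants around the sample means.

```python
import math
import random


def simulate_bivariate_normal(rho, n=1000, seed=0):
    """Return n pairs (x, y) of standard normals with correlation rho.

    Uses X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2 with Z1, Z2 independent
    standard normals, so Var(Y) = 1 and Cov(X, Y) = rho. Requires |rho| <= 1.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs


def sample_correlation(pairs):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def fraction_in_quadrants_1_and_3(pairs):
    """Fraction of points on the same side of both sample means."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum(1 for x, y in pairs if (x - mx) * (y - my) > 0) / n


if __name__ == "__main__":
    # Illustrative values of rho, not necessarily those used in the figures.
    for rho in (-0.9, -0.5, 0.0, 0.3, 0.7):
        pairs = simulate_bivariate_normal(rho)
        print(f"rho={rho:+.2f}  sample_cor={sample_correlation(pairs):+.2f}  "
              f"frac_q1_q3={fraction_in_quadrants_1_and_3(pairs):.2f}")
```

With positive rho the quadrant fraction should come out above one half, and with negative rho below it, which is the pattern the quadrant description refers to.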

[Figure, continued: four more scatter plots, one panel per value of rho.]

3.4 Overlapping uniform distributions

We ran simulations in R of the following scenario. $X_1, X_2, \ldots, X_{20}$ are i.i.d. and follow a $U(0, 1)$ distribution. $X$ and $Y$ are both sums of the same number of $X_i$. We call the number of $X_i$ common to both $X$ and $Y$ the overlap. The notation in the figures below indicates the number of $X_i$ being summed and the number which overlap. For example, 5,3 indicates that $X$ and $Y$ were each the sum of 5 of the $X_i$ and that 3 of the $X_i$ were common to both sums. (The data was generated using rand(1,1000);)
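The scenario above can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' R code: the function names, the seed, and the choice of 1000 trials per case are assumptions. In each trial, X sums the first n uniforms and Y sums the last n of a list of 2n - overlap uniforms, so exactly `overlap` terms are shared.

```python
import math
import random


def overlapping_sums(n, overlap, trials=1000, seed=0):
    """Simulate pairs (X, Y), each a sum of n U(0,1) draws sharing `overlap` terms."""
    if not 0 <= overlap <= n:
        raise ValueError("overlap must be between 0 and n")
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        u = [rng.random() for _ in range(2 * n - overlap)]
        x = sum(u[:n])             # first n terms
        y = sum(u[n - overlap:])   # last n terms; shares `overlap` terms with x
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Sample covariance divided by the product of sample standard deviations."""
    count = len(pairs)
    mx = sum(x for x, _ in pairs) / count
    my = sum(y for _, y in pairs) / count
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # (number summed, overlap) cases in the notation described above.
    for n, overlap in [(1, 0), (2, 1), (5, 1), (5, 3), (10, 5), (10, 8)]:
        pairs = overlapping_sums(n, overlap)
        print(f"({n}, {overlap}) sample_cor={sample_correlation(pairs):.2f}")
```

The sample correlation varies from run to run with the seed and the number of trials; increasing `trials` tightens it.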

[Figure: scatter plots of X against Y for six cases. Panel titles, in the form (number summed, overlap) cor, sample_cor: (1, 0) cor=0.00; (2, 1) cor=0.50; (5, 1) cor=0.20; (5, 3) cor=0.60; (10, 5) cor=0.50; (10, 8) cor=0.80, sample_cor=0.81.]
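The cor values in the panel titles are consistent with the short derivation below, offered as a sketch using the properties of covariance from Section 2.1. The symbols $n$ (number of $X_i$ in each sum), $k$ (the overlap) and $\sigma^2$ (the common variance of the $X_i$, equal to $1/12$ for $U(0,1)$) are introduced here for illustration. Because the $X_i$ are independent, only the $k$ shared terms contribute to the covariance.

```latex
\begin{aligned}
\operatorname{Cov}(X, Y) &= \sum_{\text{shared } i} \operatorname{Cov}(X_i, X_i) = k\,\sigma^2, \\
\operatorname{Var}(X) &= \operatorname{Var}(Y) = n\,\sigma^2, \\
\operatorname{Cor}(X, Y) &= \frac{k\,\sigma^2}{\sqrt{n\,\sigma^2}\,\sqrt{n\,\sigma^2}} = \frac{k}{n}.
\end{aligned}
```

For instance, the case (5, 3) gives $3/5 = 0.60$ and the case (10, 8) gives $8/10 = 0.80$. The result does not depend on $\sigma^2$, so the same ratio would hold for any i.i.d. summands with finite nonzero variance.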

MIT OpenCourseWare
Introduction to Probability and Statistics, Spring 2014
For information about citing these materials or our Terms of Use, visit:
