Joint Distribution and Correlation

1 Joint Distribution and Correlation Michael Ash Lecture 3

2 Reminder: Start working on the Problem Set
Mean and Variance of Linear Functions of an R.V.
Linear function of an R.V.: \(Y = a + bX\)
What are the properties of an R.V. built from an underlying R.V.?

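3 Examples
1. After-Tax Earnings: See the treatment in the book. Ask me if you have questions. Here \(Y = a + bX\) with slope \(b = 0.8\).
2. HEN example: Suppose that the cost of the program per senior (\(W\)) is $10 whether or not the senior participates, plus $800 for seniors who participate. With \(G = 1\) for a senior who participates and \(G = 0\) otherwise, \(W = 10 + 800G\).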

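4 Principles
\[ E(Y) = E(a + bX) = E(a) + E(bX) = a + bE(X) \]
or equivalently \(\mu_Y = a + b\mu_X\).
\[
\begin{aligned}
\operatorname{var}(Y) &= E\left[(Y - E(Y))^2\right] \\
&= E\left[(a + bX - E(a + bX))^2\right] \\
&= E\left[(a - E(a) + bX - E(bX))^2\right] \\
&= E\left[(b(X - E(X)))^2\right] \\
&= E\left[b^2 (X - E(X))^2\right] \\
&= b^2 E\left[(X - E(X))^2\right] \\
&= b^2 \operatorname{var}(X)
\end{aligned}
\]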

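5 Examples
After-Tax Earnings:
\[ \mu_Y = a + 0.8\,\mu_X, \qquad \sigma_Y^2 = (0.8)^2 \sigma_X^2 = 0.64\,\sigma_X^2 \]
HEN Example (Warning: Corrections since class!):
\[ \mu_W = 10 + 800\,\mu_G, \qquad \sigma_W^2 = (800)^2 \sigma_G^2, \qquad \sigma_W = 800\,\sigma_G \]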
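The two rules for a linear function can be checked exactly on a small discrete distribution. The sketch below is only an illustration: the participation rate of 0.3 is a hypothetical value that does not come from the slides, and the cost rule W = 10 + 800G is read off the $10 and $800 figures in the HEN example. The helper names are likewise made up for this example.

```python
from fractions import Fraction

def mean(pmf):
    """Expected value of a discrete R.V. given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance: expected squared deviation from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def linear(pmf, a, b):
    """Distribution of a + b*X (assumes b != 0, so outcomes stay distinct)."""
    return {a + b * x: p for x, p in pmf.items()}

# ASSUMPTION: hypothetical participation rate; the slides give no value.
p = Fraction(3, 10)
G = {0: 1 - p, 1: p}          # Bernoulli participation indicator
W = linear(G, a=10, b=800)    # cost per senior: W = 10 + 800 G

# The mean shifts and scales; the variance only scales, by b squared.
assert mean(W) == 10 + 800 * mean(G)
assert variance(W) == 800 ** 2 * variance(G)
print(mean(W), variance(W))
```

Using `Fraction` keeps the arithmetic exact, so the two identities can be compared with `==` rather than with a floating-point tolerance.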

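6 Exercise 2.4
The random variable \(Y\) has a mean of 1 and a variance of 4. Let \(Z = \frac{1}{2}(Y - 1)\). Compute \(\mu_Z\) and \(\sigma_Z^2\).
\[
\begin{aligned}
E(Z) &= E\left[\tfrac{1}{2}(Y - 1)\right] \\
&= E\left[\tfrac{1}{2}Y - \tfrac{1}{2}\right] \\
&= \tfrac{1}{2}E[Y] - \tfrac{1}{2} \\
&= \tfrac{1}{2} \cdot 1 - \tfrac{1}{2} = 0
\end{aligned}
\]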

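7 Exercise 2.4
The random variable \(Y\) has a mean of 1 and a variance of 4. Let \(Z = \frac{1}{2}(Y - 1)\). Compute \(\sigma_Z^2\).
\[
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}\left(\tfrac{1}{2}(Y - 1)\right) \\
&= \operatorname{var}\left(\tfrac{1}{2}Y - \tfrac{1}{2}\right) \\
&= \left(\tfrac{1}{2}\right)^2 \operatorname{var}(Y) \\
&= \tfrac{1}{4} \cdot 4 = 1
\end{aligned}
\]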

8 Two Variables: Joint Distribution and Correlation
\(\Pr(Y = y \mid X)\), \(E(Y \mid X)\)

9 Two Variables
The probability distribution of Y given X: \(\Pr(Y = y \mid X = x)\)
The expected value of Y given X: \(E(Y \mid X = x)\)
Are some outcomes of Y associated with some outcomes of X? If so, then we can use X as a predictor of Y (and may be prepared to consider arguments that X causes Y).

10 Joint Distribution The probability that X is x and Y is y. See Table 2.2. Pr(X = x, Y = y)

11 Marginal and Conditional Distributions
Marginal Distribution: The probability distribution of Y, ignoring X.
Conditional Distributions: The probability distribution of Y given, or conditional on, X: \(\Pr(Y = y \mid X = x)\)

12 Review joint, marginal, and conditional distributions with Table 2.3
Half, or 0.50, of the time we get an old computer (A = 0).
Thirty-five percent, or 0.35, of the time we have an old computer and experience no crashes (A = 0 and M = 0).
So of the 0.50 of the time that we get an old computer, 0.35 of the time we have no crashes.
This means that, conditional on having an old computer, we experience no crashes 0.35/0.50 = 0.70 of the time.

13 Bayes' Law
Start with the intuitive (say this in words): What is the probability that X = x and Y = y are both true? It's the probability that Y = y is true given that X = x is true, times the probability that X = x is true.
\[ \Pr(X = x, Y = y) = \Pr(Y = y \mid X = x)\,\Pr(X = x) \]
Reorganize into Bayes' Law:
\[ \Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \]

14 Bayes' Law: Alternative
Note, by the way, that an alternative decomposition was possible:
\[ \Pr(X = x, Y = y) = \Pr(X = x \mid Y = y)\,\Pr(Y = y) \]
Reorganize into Bayes' Law:
\[ \Pr(X = x \mid Y = y) = \frac{\Pr(X = x, Y = y)}{\Pr(Y = y)} \]

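15 Bayes' Law: Final form and interpretation
\[
\begin{aligned}
\Pr(Y = y \mid X = x) &= \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \\
&= \frac{\Pr(X = x \mid Y = y)\,\Pr(Y = y)}{\Pr(X = x)} \\
&= \frac{\Pr(X = x \mid Y = y)\,\Pr(Y = y)}{\Pr(X = x \mid Y = y)\,\Pr(Y = y) + \Pr(X = x \mid Y \neq y)\,\Pr(Y \neq y)}
\end{aligned}
\]
Posterior probability depends on the prior and the evidence.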

16 Bayes' Law: Example
Surprising result from false positives on a test for a rare disease.
Suppose Y is a Bernoulli random variable for having a rare disease. \(\Pr(Y = 1) = 0.01\), i.e., one percent prevalence in the population.
Suppose X is a Bernoulli random variable for testing positive for the disease.
The test can deliver both false positives and false negatives, but it is fairly accurate: \(\Pr(X = 1 \mid Y = 1) = 0.95\), so the false negative rate is 0.05, and the false positive rate is \(\Pr(X = 1 \mid Y = 0) = 1 - \Pr(X = 0 \mid Y = 0)\).
Is a positive test result very bad news?
\[ \Pr(Y = 1 \mid X = 1) = \frac{\Pr(X = 1 \mid Y = 1)\,\Pr(Y = 1)}{\Pr(X = 1)} = 0.12 \]
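The rare-disease calculation can be sketched in a few lines of Python. The prevalence (0.01) and the detection rate (0.95) are taken from the slide. The false positive rate is not stated in the text above, so the 0.07 used here is an assumption, chosen because it reproduces the slide's answer of 0.12. The function name `posterior` is made up for this example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Pr(Y=1 | X=1) via Bayes' law for a binary condition and a binary test."""
    true_pos = sensitivity * prior                   # Pr(X=1 | Y=1) Pr(Y=1)
    false_pos = false_positive_rate * (1 - prior)    # Pr(X=1 | Y=0) Pr(Y=0)
    return true_pos / (true_pos + false_pos)         # denominator is Pr(X=1)

# prior and sensitivity are from the slide.
# ASSUMPTION: false_positive_rate=0.07 is not given in the text.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.07)
print(round(p, 2))
```

Because the disease is rare, the healthy 99 percent of the population generates far more positive tests than the sick 1 percent, which is why the posterior stays low even for a fairly accurate test.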

17 Independence
Learning X does not improve our guess about Y:
\[ \Pr(Y = y \mid X = x) = \Pr(Y = y) \]
From Probability Distribution to Expected Value & Variance
Key concept: repeated application of the definition of E()

18 Exercise 2.3 applied to Table 2.2 (Rain and Commute)
Compute \(E(Y)\).
The long-commute rate is the fraction of days that have long commutes. Show that the long-commute rate is given by \(1 - E(Y)\).
Calculate \(E(Y \mid X = 1)\) and \(E(Y \mid X = 0)\).
Calculate the long-commute rate for (i) non-rainy days and (ii) rainy days.
A randomly selected day was a long commute. What is the probability that it was a non-rainy day? A rainy day?
Are weather and commute time independent? Explain.

19 Exercise 2.3 applied to Table 2.2 (Rain and Commute)
Compute \(E(Y)\).
\[ E(Y) = 0 \cdot \Pr(Y = 0) + 1 \cdot \Pr(Y = 1) = 0 \times 0.22 + 1 \times 0.78 = 0.78 \]
The long-commute rate is the fraction of days that have long commutes. Show that the long-commute rate is given by \(1 - E(Y)\).
Create a long-commute random variable, W. Let \(W \equiv 1 - Y\). Then
\[ E(W) = E(1 - Y) = 1 - E(Y) \]
For discussion: why expected value, not probability?

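20 Calculate \(E(Y \mid X = 1)\) and \(E(Y \mid X = 0)\).
\[ E(Y \mid X = 1) = 0 \cdot \Pr(Y = 0 \mid X = 1) + 1 \cdot \Pr(Y = 1 \mid X = 1) \]
\[ \Pr(Y = 0 \mid X = 1) = \frac{\Pr(Y = 0, X = 1)}{\Pr(X = 1)} = 0.1 \]
\[ \Pr(Y = 1 \mid X = 1) = \frac{\Pr(Y = 1, X = 1)}{\Pr(X = 1)} = 0.9 \]
\[ E(Y \mid X = 1) = 0 \times 0.1 + 1 \times 0.9 = 0.9 \]
What does this mean in words?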

21
\[ E(Y \mid X = 0) = 0 \cdot \Pr(Y = 0 \mid X = 0) + 1 \cdot \Pr(Y = 1 \mid X = 0) \]
\[ \Pr(Y = 0 \mid X = 0) = \frac{\Pr(Y = 0, X = 0)}{\Pr(X = 0)} = 0.5 \]
\[ \Pr(Y = 1 \mid X = 0) = \frac{\Pr(Y = 1, X = 0)}{\Pr(X = 0)} = 0.5 \]
\[ E(Y \mid X = 0) = 0 \times 0.5 + 1 \times 0.5 = 0.5 \]
What does this mean in words?
Calculate the long-commute rate for (i) non-rainy days and (ii) rainy days.
(i) What is the term that we want to compute?
\[ E(W \mid X = 1) = 1 - E(Y \mid X = 1) = 0.1 \]
(ii) What is the term that we want to compute?
\[ E(W \mid X = 0) = 1 - E(Y \mid X = 0) = 0.5 \]

22 A randomly selected day was a long commute. What is the probability that it was a non-rainy day? A rainy day?
What is the term that we want to compute?
\[ \Pr(X = 1 \mid Y = 0) = \frac{\Pr(X = 1, Y = 0)}{\Pr(Y = 0)} \]
What is the term that we want to compute?
\[ \Pr(X = 0 \mid Y = 0) = \frac{\Pr(X = 0, Y = 0)}{\Pr(Y = 0)} \]
Are weather and commute time independent? Explain.
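The whole exercise can be run from a joint table. In the sketch below the four joint probabilities are an assumption: they are recalled from Table 2.2 of the textbook rather than given in these slides, so check them against the book. They do reproduce the slide values E(Y) = 0.78, E(Y | X = 1) = 0.9 and E(Y | X = 0) = 0.5. The coding X = 1 for no rain and Y = 1 for a short commute follows the exercise, and the function names are made up for this example.

```python
from fractions import Fraction as F

# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# X = 1: no rain, X = 0: rain; Y = 1: short commute, Y = 0: long commute.
# ASSUMPTION: values recalled from the textbook's Table 2.2, not from the slides.
joint = {
    (0, 0): F(15, 100), (1, 0): F(7, 100),
    (0, 1): F(15, 100), (1, 1): F(63, 100),
}

def pr_x(x):
    """Marginal Pr(X = x): sum the joint over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def pr_y(y):
    """Marginal Pr(Y = y): sum the joint over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def pr_y_given_x(y, x):
    """Conditional Pr(Y = y | X = x) = joint / marginal of X."""
    return joint[(x, y)] / pr_x(x)

def pr_x_given_y(x, y):
    """Conditional Pr(X = x | Y = y) = joint / marginal of Y."""
    return joint[(x, y)] / pr_y(y)

e_y = sum(y * pr_y(y) for y in (0, 1))
e_y_given = {x: sum(y * pr_y_given_x(y, x) for y in (0, 1)) for x in (0, 1)}
long_rate = {x: 1 - e_y_given[x] for x in (0, 1)}   # E(W | X = x), W = 1 - Y

# Independence requires joint = product of marginals in every cell.
independent = all(joint[(x, y)] == pr_x(x) * pr_y(y) for (x, y) in joint)

print("E(Y) =", e_y)
print("long-commute rate, no rain / rain:", long_rate[1], long_rate[0])
print("Pr(no rain | long), Pr(rain | long):", pr_x_given_y(1, 0), pr_x_given_y(0, 0))
print("independent:", independent)
```

Under these assumed values the long-commute rate differs between rainy and non-rainy days, so the independence check fails, which is the point of the last question in the exercise.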

23 Covariance
Covariance is another mean: the expected value of the product of the deviation of Y from its mean and the deviation of X from its mean.
\[ \operatorname{cov}(X, Y) = \sum_{i=1}^{k} \sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y)\,\Pr(X = x_j, Y = y_i) \]
Observations
This is another adding-up (\(\sum\)) over all the possible outcomes, weighted by the likelihood of each outcome.
Focus on the key term: \((x_j - \mu_X)(y_i - \mu_Y)\)

24 Interpreting covariance
Key term: \((x_j - \mu_X)(y_i - \mu_Y)\)
Are cases where X is above its mean usually paired with cases where Y is above its mean? (If so, then it will also be true that cases where X is below its mean will usually be paired with cases where Y is below its mean.) In this case, the key term will be positive because (+) times (+) is positive and (-) times (-) is positive.
Are cases where X is above its mean usually paired with cases where Y is below its mean? (If so, then it will also be true that cases where X is below its mean will usually be paired with cases where Y is above its mean.) In this case, the key term will be negative because (+) times (-) is negative and (-) times (+) is negative.

25 Summary of covariance: Very Important Positive covariance means that X and Y are typically big together or small together. Negative covariance means that when X is big, Y is small (and vice versa).

26 Units and Correlation
Covariance has awkward units (units of X times units of Y). A convenient division gives a unitless measure that is bounded between -1 and +1:
\[ \operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\mathrm{s.d.}(X)\,\mathrm{s.d.}(Y)} \]
(Recall that s.d.(X) is measured in units of X and s.d.(Y) is measured in units of Y.)
Correlation near +1 means that X and Y are typically big together or small together.
Correlation near -1 means that when X is big, Y is small (and vice versa).
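Covariance and correlation can be computed directly from a joint distribution by the weighted adding-up described above. The sketch below reuses the rain and commute setting; the four joint probabilities are an assumption (recalled from the textbook's Table 2.2, not stated in these slides), so treat the numbers as illustrative only.

```python
import math

# Joint distribution Pr(X = x, Y = y), keyed by (x, y).
# ASSUMPTION: illustrative values recalled from the textbook's Table 2.2.
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}

# Means of X and Y, each a probability-weighted sum over all outcomes.
mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())

# Covariance: weighted sum of the key term (x - mu_x)(y - mu_y).
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Variances, needed for the standard deviations in the denominator.
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())

# Correlation: covariance divided by the product of standard deviations.
corr = cov / (math.sqrt(var_x) * math.sqrt(var_y))
print(f"cov = {cov:.3f}, corr = {corr:.3f}")
```

The covariance comes out positive here because no-rain days (X above its mean) are mostly paired with short commutes (Y above its mean); dividing by the two standard deviations removes the units and keeps the result between -1 and +1.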

27 Mean and Variance of Sums of R.V.s
See Key Concept 2.3.
Suppose that in a sample of couples X is income earned by the first partner and Y is income earned by the other partner. Household income is defined as the sum of these incomes, or X + Y.
The mean value of household income is the sum of the mean value of the first person's earnings and the mean value of the second person's earnings:
\[ E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y \]

28 Mean and Variance of Sums of R.V.s: Example
The variance of household income, an interesting measure of inter-household inequality, is more complicated:
\[ \operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} \]
The spread of household income depends on the spread of income for each of the earners and on whether high earners are paired with high earners or with low earners. (Can you think of economic or sociological reasons to expect cov(X, Y) to be positive or negative? What about change over time?)
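The variance formula for a sum follows from the same "repeated application of the definition of E()" used for linear functions. A short derivation, offered as a sketch that uses only the definitions of variance and covariance from the earlier slides:

```latex
\begin{aligned}
\operatorname{var}(X + Y)
  &= E\left[\left((X + Y) - (\mu_X + \mu_Y)\right)^2\right] \\
  &= E\left[\left((X - \mu_X) + (Y - \mu_Y)\right)^2\right] \\
  &= E\left[(X - \mu_X)^2\right] + E\left[(Y - \mu_Y)^2\right]
     + 2\,E\left[(X - \mu_X)(Y - \mu_Y)\right] \\
  &= \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}
\end{aligned}
```

The cross term is exactly the covariance, which is why pairing high earners with high earners (positive covariance) widens the spread of household income, while pairing high with low earners narrows it.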
