Chapter 5: Multivariate Distributions
1 Chapter 5: Multivariate Distributions. Professor Ron Fricker, Naval Postgraduate School, Monterey, California. 3/15/15. Reading Assignment: Sections
2 Goals for this Chapter: bivariate and multivariate probability distributions; the bivariate normal distribution; the multinomial distribution; marginal and conditional distributions; independence, covariance, and correlation; expected value & variance of a function of r.v.s, linear functions of r.v.s in particular; conditional expectation and variance.
3 Section 5.1: Bivariate and Multivariate Distributions. A bivariate distribution is a probability distribution on two random variables; i.e., it gives the probability of the simultaneous outcome of the random variables. For example, when playing craps a gambler might want to know the probability that, in a simultaneous roll of two dice, each comes up 1. Another example: in an air attack on a bunker, the probability that the bunker is not hardened and does not have SAM protection is of interest. A multivariate distribution is a probability distribution for more than two r.v.s.
4 Joint Probabilities. Bivariate and multivariate distributions are joint probabilities: the probability that two or more events occur. It's the probability of the intersection of n ≥ 2 events: {Y1 = y1}, {Y2 = y2}, ..., {Yn = yn}. We'll denote the joint (discrete) probabilities as P(Y1 = y1, Y2 = y2, ..., Yn = yn), and we'll sometimes use the shorthand notation p(y1, y2, ..., yn). This is the probability that the event {Y1 = y1} and the event {Y2 = y2} and ... and the event {Yn = yn} all occur simultaneously.
5 Section 5.2: Bivariate and Multivariate Distributions. A simple example of a bivariate distribution arises when throwing two dice. Let Y1 be the number of spots on die 1 and Y2 the number of spots on die 2. Then there are 36 possible joint outcomes (remember the mn rule: 6 × 6 = 36). The outcomes (y1, y2) are (1,1), (1,2), (1,3), ..., (6,6). Assuming the dice are independent, all the outcomes are equally likely, so p(y1, y2) = P(Y1 = y1, Y2 = y2) = 1/36, y1 = 1, 2, ..., 6, y2 = 1, 2, ..., 6.
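Since the sample space is tiny, the joint pmf above can be checked by direct enumeration. The course's demos are in R; this equivalent sketch uses Python, with Fraction keeping the arithmetic exact:

```python
from itertools import product
from fractions import Fraction

# Joint pmf for two fair, independent dice: every (y1, y2) pair gets 1/36.
p = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}

assert len(p) == 36                  # 36 equally likely joint outcomes
assert sum(p.values()) == 1          # Theorem 5.1: probabilities sum to 1
assert p[(1, 1)] == Fraction(1, 36)  # P(both dice show 1)
```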
6 Bivariate PMF for the Example. Source: Wackerly, D.D., W.M. Mendenhall III, and R.L. Scheaffer (2008). Mathematical Statistics with Applications, 7th edition, Thomson Brooks/Cole.
7 Defining Joint Probability Functions (for Discrete Random Variables). Definition 5.1: Let Y1 and Y2 be discrete random variables. The joint (or bivariate) probability (mass) function for Y1 and Y2 is p(y1, y2) = P(Y1 = y1, Y2 = y2), −∞ < y1 < ∞, −∞ < y2 < ∞. Theorem 5.1: If Y1 and Y2 are discrete r.v.s with joint probability function p(y1, y2), then 1. p(y1, y2) ≥ 0 for all y1, y2; 2. Σ p(y1, y2) = 1, where the sum is over all (y1, y2) with non-zero p(y1, y2).
8 Back to the Dice Example. Find P(2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2). Solution:
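The event {2 ≤ Y1 ≤ 3, 1 ≤ Y2 ≤ 2} contains the four equally likely outcomes (2,1), (2,2), (3,1), and (3,2), so the probability is 4/36 = 1/9. A quick enumeration check (Python used here as an illustration; the course's demos are in R):

```python
from itertools import product
from fractions import Fraction

# Joint pmf for two fair dice.
p = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}

# Sum the pmf over the event {2 <= Y1 <= 3, 1 <= Y2 <= 2}: four outcomes.
prob = sum(v for (y1, y2), v in p.items() if 2 <= y1 <= 3 and 1 <= y2 <= 2)
assert prob == Fraction(1, 9)
```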
9 Textbook Example 5.1. A NEX has 3 checkout counters. Two customers arrive independently at the counters at different times, when no other customers are present. Let Y1 denote the number (of the two) who choose counter 1 and let Y2 be the number who choose counter 2. Find the joint distribution of Y1 and Y2. Solution:
10 Textbook Example 5.1 Solution Continued
11 Joint Distribution Functions. Definition 5.2: For any random variables (discrete or continuous) Y1 and Y2, the joint (bivariate) (cumulative) distribution function is F(y1, y2) = P(Y1 ≤ y1, Y2 ≤ y2), −∞ < y1 < ∞, −∞ < y2 < ∞. For two discrete r.v.s Y1 and Y2, F(y1, y2) = Σ_{t1 ≤ y1} Σ_{t2 ≤ y2} p(t1, t2).
12 Back to the Dice Example. Find F(2, 3) = P(Y1 ≤ 2, Y2 ≤ 3). Solution:
13 Textbook Example 5.2. Back to Example 5.1. Find F(−1, 2), F(1.5, 2), and F(5, 7). Solution:
14 Properties of Joint CDFs. Theorem 5.2: If Y1 and Y2 are random variables with joint cdf F(y1, y2), then: 1. F(−∞, −∞) = F(y1, −∞) = F(−∞, y2) = 0; 2. F(∞, ∞) = 1; 3. If y1* ≥ y1 and y2* ≥ y2, then F(y1*, y2*) − F(y1*, y2) − F(y1, y2*) + F(y1, y2) ≥ 0. Property 3 follows because F(y1*, y2*) − F(y1*, y2) − F(y1, y2*) + F(y1, y2) = P(y1 < Y1 ≤ y1*, y2 < Y2 ≤ y2*) ≥ 0.
15 Joint Probability Density Functions. Definition 5.3: Let Y1 and Y2 be continuous random variables with joint distribution function F(y1, y2). If there exists a nonnegative function f(y1, y2) such that F(y1, y2) = ∫_{t1=−∞}^{y1} ∫_{t2=−∞}^{y2} f(t1, t2) dt2 dt1 for all −∞ < y1 < ∞, −∞ < y2 < ∞, then Y1 and Y2 are said to be jointly continuous random variables. The function f(y1, y2) is called the joint probability density function.
16 Properties of Joint PDFs. Theorem 5.3: If Y1 and Y2 are random variables with joint pdf f(y1, y2), then: 1. f(y1, y2) ≥ 0 for all y1, y2; 2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(y1, y2) dy1 dy2 = 1. [Figure: an illustrative joint pdf f(y1, y2), a surface in 3 dimensions over the (y1, y2) plane.]
17 Volumes Correspond to Probabilities. For jointly continuous random variables, volumes under the pdf surface correspond to probabilities. E.g., for the pdf in Figure 5.2, the probability P(a1 ≤ Y1 ≤ a2, b1 ≤ Y2 ≤ b2) corresponds to the volume shown. It's equal to ∫_{b1}^{b2} ∫_{a1}^{a2} f(y1, y2) dy1 dy2.
18 Textbook Example 5.3. Suppose a radioactive particle is located in a square with sides of unit length. Let (Y1, Y2) denote the particle's location and assume it is uniformly distributed in the square: f(y1, y2) = 1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. a. Sketch the pdf. b. Find F(0.2, 0.4). c. Find P(0.1 ≤ Y1 ≤ 0.3, 0.0 ≤ Y2 ≤ 0.5).
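Parts b and c reduce to integrating a constant density over rectangles, so the answers are just areas: F(0.2, 0.4) = 0.2 × 0.4 = 0.08 and P(0.1 ≤ Y1 ≤ 0.3, 0 ≤ Y2 ≤ 0.5) = 0.2 × 0.5 = 0.1. A numerical check with SciPy (Python here purely as an illustration; the course uses R):

```python
from scipy.integrate import dblquad

# Joint pdf for Example 5.3: f(y1, y2) = 1 on the unit square.
# dblquad's integrand takes the inner variable first: f(y2, y1).
f = lambda y2, y1: 1.0

# b. F(0.2, 0.4) = P(Y1 <= 0.2, Y2 <= 0.4)
F, _ = dblquad(f, 0, 0.2, lambda y1: 0.0, lambda y1: 0.4)
# c. P(0.1 <= Y1 <= 0.3, 0 <= Y2 <= 0.5)
P, _ = dblquad(f, 0.1, 0.3, lambda y1: 0.0, lambda y1: 0.5)

assert abs(F - 0.08) < 1e-9
assert abs(P - 0.10) < 1e-9
```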
19 Textbook Example 5.3 Solution
20 Textbook Example 5.3 Solution (cont'd)
22 Textbook Example 5.4. Gasoline is stored on a FOB in a bulk tank. Let Y1 denote the proportion of the tank available at the beginning of the week after restocking, and let Y2 denote the proportion of the tank that is dispensed over the week. Note that Y1 and Y2 must be between 0 and 1 and that y2 must be less than or equal to y1. Let the joint pdf be f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere. a. Sketch the pdf. b. Find P(0 ≤ Y1 ≤ 0.5, 0.25 ≤ Y2).
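Part b can be checked numerically. Intersecting the event with the support 0 ≤ y2 ≤ y1 ≤ 1 forces 0.25 ≤ y1 ≤ 0.5, with y2 running from 0.25 up to y1, and the double integral comes out to 5/128 ≈ 0.039. A SciPy sketch (Python used for illustration; the course's demos are in R):

```python
from scipy.integrate import dblquad

# Example 5.4 pdf: f(y1, y2) = 3*y1 on 0 <= y2 <= y1 <= 1.
# dblquad's integrand takes the inner variable (y2) first.
f = lambda y2, y1: 3.0 * y1

# Event {0 <= Y1 <= 0.5, Y2 >= 0.25}, intersected with the support:
# y1 from 0.25 to 0.5, y2 from 0.25 up to y1.
P, _ = dblquad(f, 0.25, 0.5, lambda y1: 0.25, lambda y1: y1)

assert abs(P - 5 / 128) < 1e-9   # 5/128 = 0.0390625
```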
23 Textbook Example 5.4 Solution. First, sketching the pdf:
25 Textbook Example 5.4 Solution. Now, to solve for the probability:
26 Calculating Probabilities from Multivariate Distributions. Finding probabilities from a bivariate pdf requires double integration over the proper region of distributional support. For multivariate distributions the idea is the same; you just have to integrate over n dimensions: P(Y1 ≤ y1, Y2 ≤ y2, ..., Yn ≤ yn) = F(y1, y2, ..., yn) = ∫_{−∞}^{y1} ∫_{−∞}^{y2} ... ∫_{−∞}^{yn} f(t1, t2, ..., tn) dtn ... dt1.
27 Section 5.2 Homework. Do problems 5.1, 5.4, 5.5, 5.9.
28 Section 5.3: Marginal and Conditional Distributions. Marginal distributions connect the concept of (bivariate) joint distributions to univariate distributions (such as those in Chapters 3 & 4). As we'll see, in the discrete case the name "marginal distribution" follows from summing across rows or down columns of a table. Conditional distributions are what arise when, in a joint distribution, we fix the value of one of the random variables. The name follows from traditional lexicon like "The distribution of Y1, conditional on Y2 = y2, is ..."
29 An Illustrative Example of Marginal Distributions. Remember the dice-tossing experiment of the previous section, where we defined Y1 to be the number of spots on die 1 and Y2 to be the number of spots on die 2. There are 36 possible joint outcomes (y1, y2): (1,1), (1,2), (1,3), ..., (6,6). Assuming the dice are independent, the joint distribution is p(y1, y2) = P(Y1 = y1, Y2 = y2) = 1/36, y1 = 1, 2, ..., 6, y2 = 1, 2, ..., 6.
30 An Illustrative Example (cont'd). Now, we might want to know the probability of the univariate event P(Y1 = y1). Given that all of the joint events are mutually exclusive, we have that P(Y1 = y1) = Σ_{y2=1}^{6} p(y1, y2). For example, P(Y1 = 1) = p1(1) = Σ_{y2=1}^{6} p(1, y2) = p(1,1) + p(1,2) + p(1,3) + p(1,4) + p(1,5) + p(1,6) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6.
31 An Illustrative Example (cont'd). In tabular form:

          y2=1   y2=2   y2=3   y2=4   y2=5   y2=6   Total
  y1=1    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  y1=2    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  y1=3    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  y1=4    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  y1=5    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  y1=6    1/36   1/36   1/36   1/36   1/36   1/36   1/6
  Total   1/6    1/6    1/6    1/6    1/6    1/6    1

The body of the table is the joint distribution of Y1 and Y2, p(y1, y2); the column of row totals is the marginal distribution of Y1, p1(y1); and the row of column totals is the marginal distribution of Y2, p2(y2).
32 Defining Marginal Probability Functions. Definition 5.4a: Let Y1 and Y2 be discrete random variables with pmf p(y1, y2). The marginal probability functions of Y1 and Y2, respectively, are p1(y1) = Σ_{all y2} p(y1, y2) and p2(y2) = Σ_{all y1} p(y1, y2).
33 Defining Marginal Density Functions. Definition 5.4b: Let Y1 and Y2 be jointly continuous random variables with pdf f(y1, y2). The marginal density functions of Y1 and Y2, respectively, are f1(y1) = ∫_{−∞}^{∞} f(y1, y2) dy2 and f2(y2) = ∫_{−∞}^{∞} f(y1, y2) dy1.
34 Textbook Example 5.5. From a group of three Navy officers, two Marine Corps officers, and one Army officer, an amphibious assault planning team of two officers is to be randomly selected. Let Y1 denote the number of Navy officers and Y2 the number of Marine Corps officers on the planning team. Find the joint probability function of Y1 and Y2, and then find the marginal pmf of Y1. Solution:
35 Textbook Example 5.5 Solution (cont'd)
36 Textbook Example 5.6. Let Y1 and Y2 have joint density f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Sketch f(y1, y2) and find the marginal densities for Y1 and Y2. Solution:
37 Textbook Example 5.6 Solution (cont'd)
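The marginals in Example 5.6 can be verified symbolically: integrating out y2 gives f1(y1) = 2y1, and integrating out y1 gives f2(y2) = 1, the uniform density on [0, 1]. A SymPy sketch (Python, for illustration; the course's demos are in R):

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2', nonnegative=True)
f = 2 * y1  # joint pdf on the unit square, 0 elsewhere

f1 = sp.integrate(f, (y2, 0, 1))  # marginal density of Y1
f2 = sp.integrate(f, (y1, 0, 1))  # marginal density of Y2

assert f1 == 2 * y1
assert f2 == 1
```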
38 Conditional Distributions. The idea of conditional distributions is that we know some information about one (or more) of the jointly distributed r.v.s, and this knowledge changes our probability calculus. Example: an unconditional distribution is the distribution of the height of a randomly chosen NPS student; a conditional distribution is the distribution of heights given that the randomly chosen student is female. The information that the randomly chosen student is female changes the distribution of heights.
39 Connecting Back to Event-based Probabilities. Remember from Chapter 2 that, for events A and B, the probability of the intersection is P(A ∩ B) = P(A)P(B|A). Now consider two numerical events, (Y1 = y1) and (Y2 = y2), for which we can then write p(y1, y2) = p1(y1) p(y2|y1) = p2(y2) p(y1|y2), where the interpretation of p(y1|y2) is the probability that r.v. Y1 equals y1 given that Y2 takes on the value y2.
40 Defining Conditional Probability Functions. Definition 5.5: If Y1 and Y2 are jointly discrete r.v.s with pmf p(y1, y2) and marginal pmfs p1(y1) and p2(y2), respectively, then the conditional probability function of Y1 given Y2 is p(y1|y2) = P(Y1 = y1 | Y2 = y2) = P(Y1 = y1, Y2 = y2)/P(Y2 = y2) = p(y1, y2)/p2(y2). Note that p(y1|y2) is not defined if p2(y2) = 0. The conditional probability function of Y2 given Y1 is similarly defined.
41 Textbook Example 5.7. Back to Example 5.5: Find the conditional distribution of Y1 given Y2 = 1. That is, find the conditional distribution of the number of Navy officers on the planning team given that we're told one Marine Corps officer is on the team. Solution:
42 Textbook Example 5.7 Solution (cont'd)
43 Generalizing to Continuous R.V.s. Defining a conditional probability function for continuous r.v.s is problematic because P(Y1 = y1 | Y2 = y2) is undefined, since both (Y1 = y1) and (Y2 = y2) are events with zero probability. The problem, of course, is that for continuous random variables the probability the r.v. takes on any particular value is zero. So generalizing Definition 5.5 doesn't work because the denominator is always zero. But we can make headway by using a different approach.
44 Defining Conditional Distribution Functions. Definition 5.6: Let Y1 and Y2 be jointly continuous random variables with pdf f(y1, y2). The conditional distribution function of Y1 given Y2 = y2 is F(y1|y2) = P(Y1 ≤ y1 | Y2 = y2). And, with a bit of derivation (see the text), we have that F(y1|y2) = ∫_{−∞}^{y1} [f(t1, y2)/f2(y2)] dt1. From this, we can define conditional pdfs.
45 Defining Conditional Density Functions. Definition 5.7: Let Y1 and Y2 be jointly continuous r.v.s with pdf f(y1, y2) and marginal densities f1(y1) and f2(y2), respectively. For any y2 such that f2(y2) > 0, the conditional density function of Y1 given Y2 = y2 is f(y1|y2) = f(y1, y2)/f2(y2). The conditional pdf of Y2 given Y1 = y1 is similarly defined as f(y2|y1) = f(y1, y2)/f1(y1).
46 Textbook Example 5.8. A soft drink machine has a random amount Y2 (in gallons) in supply at the beginning of the day and dispenses a random amount Y1 during the day. It is not resupplied during the day, so Y1 ≤ Y2, and the joint pdf is f(y1, y2) = 1/2 for 0 ≤ y1 ≤ y2 ≤ 2, and 0 elsewhere. What is the probability that less than 1/2 gallon will be sold given that ("conditional on") the machine contains 1.5 gallons at the start of the day?
47 Textbook Example 5.8 Solution. Solution:
48 Textbook Example 5.8 Solution (cont'd)
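One way to sanity-check Example 5.8: f2(1.5) = ∫ from 0 to 1.5 of (1/2) dy1 = 3/4, so f(y1 | y2 = 1.5) = (1/2)/(3/4) = 2/3 on [0, 1.5], and P(Y1 < 1/2 | Y2 = 1.5) = (1/2)(2/3) = 1/3. The same steps in SymPy (Python, as an illustration; the course uses R):

```python
import sympy as sp

y1 = sp.symbols('y1', nonnegative=True)
joint = sp.Rational(1, 2)  # f(y1, y2) = 1/2 on 0 <= y1 <= y2 <= 2

# Marginal of Y2 at y2 = 1.5: integrate the joint over 0 <= y1 <= 1.5.
f2_at_1p5 = sp.integrate(joint, (y1, 0, sp.Rational(3, 2)))
cond = joint / f2_at_1p5   # conditional density f(y1 | y2 = 1.5)
P = sp.integrate(cond, (y1, 0, sp.Rational(1, 2)))

assert f2_at_1p5 == sp.Rational(3, 4)
assert P == sp.Rational(1, 3)
```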
49 Section 5.3 Homework. Do problems 5.19, 5.22, ...
50 Section 5.4: Independent Random Variables. In Chapter 2 we defined the independence of events; here we extend that idea to the independence of random variables. For two jointly distributed, independent random variables Y1 and Y2, the probabilities associated with Y1 are the same regardless of the observed value of Y2. The idea is that learning something about Y2 does not tell you anything, probabilistically speaking, about the distribution of Y1.
51 The Idea of Independence. Remember that two events A and B are independent if P(A ∩ B) = P(A) P(B). Now, when talking about r.v.s, we're often interested in events like {a < Y1 ≤ b} and {c < Y2 ≤ d} for some numbers a < b, c < d. So, for consistency with the notion of independence of events, what we want is, for any real numbers a < b and c < d, P(a < Y1 ≤ b, c < Y2 ≤ d) = P(a < Y1 ≤ b) P(c < Y2 ≤ d). That is, if Y1 and Y2 are independent, then the joint probabilities are the product of the marginal probabilities.
52 Defining Independence for R.V.s. Definition 5.8: Let Y1 have cdf F1(y1), Y2 have cdf F2(y2), and Y1 and Y2 have joint cdf F(y1, y2). Then Y1 and Y2 are independent iff F(y1, y2) = F1(y1)F2(y2) for every pair of real numbers (y1, y2). If Y1 and Y2 are not independent, then they are said to be dependent. It is hard to demonstrate independence according to this definition; the next theorem will make it easier to determine whether two r.v.s are independent.
53 Determining Independence. Theorem 5.4: If Y1 and Y2 are discrete r.v.s with joint pmf p(y1, y2) and marginal pmfs p1(y1) and p2(y2), then Y1 and Y2 are independent iff p(y1, y2) = p1(y1)p2(y2) for all pairs of real numbers (y1, y2). If Y1 and Y2 are continuous r.v.s with joint pdf f(y1, y2) and marginal pdfs f1(y1) and f2(y2), then Y1 and Y2 are independent iff f(y1, y2) = f1(y1)f2(y2) for all pairs of real numbers (y1, y2).
54 Textbook Example 5.9. For the die-tossing problem of Section 5.2, show that Y1 and Y2 are independent. Solution:
55 Textbook Example 5.10. In Example 5.5, is the number of Navy officers independent of the number of Marine Corps officers? (I.e., is Y1 independent of Y2?) Solution:
56 Textbook Example 5.11. Let f(y1, y2) = 6y1y2^2 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Show that Y1 and Y2 are independent. Solution:
57 Textbook Example 5.11 Solution (cont'd)
58 Textbook Example 5.12. Let f(y1, y2) = 2 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere. Show that Y1 and Y2 are dependent. Solution:
59 Textbook Example 5.12 Solution (cont'd)
60 A Useful Theorem for Determining Independence of Two R.V.s. Theorem 5.5: Let Y1 and Y2 have joint density f(y1, y2) that is positive iff a ≤ y1 ≤ b and c ≤ y2 ≤ d, for constants a, b, c, and d, and f(y1, y2) = 0 otherwise. Then Y1 and Y2 are independent r.v.s iff f(y1, y2) = g(y1)h(y2), where g(y1) is a nonnegative function of y1 alone and h(y2) is a nonnegative function of y2 alone. Note that g(y1) and h(y2) do not have to be densities!
61 Textbook Example 5.13. Let Y1 and Y2 have joint density f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Are Y1 and Y2 independent? Solution:
62 Textbook Example 5.13 Solution (cont'd)
63 Textbook Example 5.14. Referring back to Example 5.4, is Y1, the proportion of the tank available at the beginning of the week, independent of Y2, the proportion of the tank that is dispensed over the week? Remember, the joint pdf is f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere. Solution:
64 Textbook Example 5.14 Solution (cont'd)
65 Section 5.4 Homework. Do problems 5.45, 5.49, ...
66 Section 5.5: Expected Value of a Function of R.V.s. Definition 5.9: Let g(Y1, Y2, ..., Yk) be a function of the discrete random variables Y1, Y2, ..., Yk, which have pmf p(y1, y2, ..., yk). Then the expected value of g(Y1, Y2, ..., Yk) is E[g(Y1, Y2, ..., Yk)] = Σ_{all yk} ... Σ_{all y2} Σ_{all y1} g(y1, y2, ..., yk) p(y1, y2, ..., yk). If Y1, Y2, ..., Yk are continuous with pdf f(y1, y2, ..., yk), then E[g(Y1, Y2, ..., Yk)] = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} g(y1, y2, ..., yk) f(y1, y2, ..., yk) dy1 dy2 ... dyk.
67 Textbook Example 5.15. Let Y1 and Y2 have joint density f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find E(Y1Y2). Solution:
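By Definition 5.9 this is E(Y1Y2) = ∫∫ y1 y2 · 2y1 dy1 dy2 over the unit square, which works out to (2/3)(1/2) = 1/3. A SymPy check (Python, for illustration; the course's demos are in R):

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
f = 2 * y1  # joint pdf on the unit square, 0 elsewhere

# E(Y1*Y2) = double integral of y1*y2*f over the support.
E = sp.integrate(y1 * y2 * f, (y1, 0, 1), (y2, 0, 1))
assert E == sp.Rational(1, 3)
```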
68 Note the Consistency in Univariate and Bivariate Expectation Definitions. Let g(Y1, Y2) = Y1. Then we have E[g(Y1, Y2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(y1, y2) f(y1, y2) dy2 dy1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y1 f(y1, y2) dy2 dy1 = ∫_{−∞}^{∞} y1 [∫_{−∞}^{∞} f(y1, y2) dy2] dy1 = ∫_{−∞}^{∞} y1 f1(y1) dy1 = E(Y1).
69 Textbook Example 5.16. Let Y1 and Y2 have joint density f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find E(Y1). Solution:
70 Textbook Example 5.17. For Y1 and Y2 with joint density as described in Example 5.15, in Figure 5.6 it appears that the mean of Y2 is 0.5. Find E(Y2) and evaluate whether the visual estimate is correct. Solution:
71 Textbook Example 5.18. Let Y1 and Y2 have joint density f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find V(Y1). Solution:
72 Textbook Example 5.19. An industrial chemical process yields a product with two types of impurities. In a given sample, let Y1 denote the proportion of impurities in the sample and let Y2 denote the proportion of type I impurities (out of all the impurities). The joint density of Y1 and Y2 is f(y1, y2) = 2(1 − y1) for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find the expected value of the proportion of type I impurities in the sample.
73 Textbook Example 5.19 Solution
74 Section 5.6: Special Theorems. The univariate expectation results from Sections 3.3 and 4.3 generalize directly to joint distributions of random variables. Theorem 5.6: Let c be a constant. Then E(c) = c. Theorem 5.7: Let g(Y1, Y2) be a function of the random variables Y1 and Y2 and let c be a constant. Then E[cg(Y1, Y2)] = cE[g(Y1, Y2)].
75 Theorem 5.8: Let Y1 and Y2 be random variables and g1(Y1, Y2), g2(Y1, Y2), ..., gk(Y1, Y2) be functions of Y1 and Y2. Then E[g1(Y1, Y2) + g2(Y1, Y2) + ... + gk(Y1, Y2)] = E[g1(Y1, Y2)] + E[g2(Y1, Y2)] + ... + E[gk(Y1, Y2)]. We won't prove these; the proofs follow directly from the way we proved the results of Sections 3.3 and 4.3.
76 Textbook Example 5.20. Referring back to Example 5.4, find the expected value of Y1 − Y2, where f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere. Solution:
77 Textbook Example 5.20 Solution (cont'd)
78 A Useful Theorem for Calculating Expectations. Theorem 5.9: Let Y1 and Y2 be independent r.v.s and g(Y1) and h(Y2) be functions of only Y1 and Y2, respectively. Then E[g(Y1)h(Y2)] = E[g(Y1)] E[h(Y2)], provided the expectations exist. Proof:
79 Proof of Theorem 5.9 Continued
80 Textbook Example 5.21. Referring back to Example 5.19, with the pdf below, Y1 and Y2 are independent: f(y1, y2) = 2(1 − y1) for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find E(Y1Y2). Solution:
81 Textbook Example 5.21 Solution (cont'd)
82 Section 5.5 and 5.6 Homework. Do problems 5.72, 5.74, ...
83 Section 5.7: The Covariance of Two Random Variables. When looking at two random variables, a common question is whether they are associated with one another; that is, for example, as one tends to increase, does the other do so as well? If so, the colloquial terminology is to say that the two variables are correlated. Correlation is also a technical term used to describe the association, but it has a precise definition (one that is more restrictive than the colloquial use of the term). And there is a second measure from which correlation is derived: covariance.
84 Covariance and Correlation. Covariance and correlation are measures of dependence between two random variables. The figure below shows two extremes: perfect dependence and independence.
85 Defining Covariance. Definition 5.10: If Y1 and Y2 are r.v.s with means µ1 and µ2, respectively, then the covariance of Y1 and Y2 is defined as Cov(Y1, Y2) = E[(Y1 − µ1)(Y2 − µ2)]. Idea: for observations scattered around a line with positive slope, the points with both values above their means have (Y1 − µ1)(Y2 − µ2) > 0, and the points with both values below their means also have (Y1 − µ1)(Y2 − µ2) > 0, so the expected value is positive. And note that if the observations had fallen around a line with a negative slope, then the expected value would be negative.
86 An Example of Low Covariance. Relative to the point (µ1, µ2): for observations in the upper-left quadrant, (Y1 − µ1)(Y2 − µ2) < 0; in the upper-right quadrant, (Y1 − µ1)(Y2 − µ2) > 0; in the lower-left quadrant, (Y1 − µ1)(Y2 − µ2) > 0; and in the lower-right quadrant, (Y1 − µ1)(Y2 − µ2) < 0. So the positives and negatives balance out and the expected value is very small to zero.
87 Understanding Covariance. Covariance gives the strength and direction of the linear relationship between Y1 and Y2: strong positive relationship, large positive covariance; strong negative, large negative covariance; weak positive, small positive covariance; weak negative, small negative covariance. But what is "large" and what is "small"? It depends on the measurement units of Y1 and Y2.
88 Big Covariance? [Two scatterplots of the same parents' height data: measured in inches, Cov = 1.46; measured in feet, Cov = ...] Same picture, different scale, different covariance.
89 Correlation of Random Variables. The correlation (coefficient) ρ of two random variables is defined as ρ = Cov(Y1, Y2)/(σ1σ2). As with covariance, correlation is a measure of the dependence of two r.v.s Y1 and Y2, but note that it's re-scaled to be unit-free and measurement invariant. In particular, for any random variables Y1 and Y2, −1 ≤ ρ ≤ +1.
90 Correlation is Unitless. [The same two scatterplots of parents' heights, in inches and in feet: ρ = 0.339 for both.]
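The rescaling point on the last two slides is easy to reproduce with simulated data: dividing both variables by 12 shrinks the covariance by a factor of 12 × 12 = 144 but leaves the correlation untouched. A NumPy sketch (the heights, noise model, and seed here are made up purely for illustration; the course's demos are in R):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(66, 3, 1000)            # synthetic "heights" in inches
y = x + rng.normal(0, 4, 1000)         # a correlated second variable, inches

cov_in = np.cov(x, y)[0, 1]            # covariance in inches^2
cov_ft = np.cov(x / 12, y / 12)[0, 1]  # same data rescaled to feet
r_in = np.corrcoef(x, y)[0, 1]
r_ft = np.corrcoef(x / 12, y / 12)[0, 1]

assert abs(cov_ft - cov_in / 144) < 1e-9   # covariance shrinks by 12*12
assert abs(r_in - r_ft) < 1e-12            # correlation is unchanged
```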
91 Interpreting the Correlation Coefficient. A positive correlation coefficient (ρ > 0) means there is a positive association between Y1 and Y2: as Y1 increases, Y2 also tends to increase. A negative correlation coefficient (ρ < 0) means there is a negative association: as Y1 increases, Y2 tends to decrease. A correlation coefficient equal to zero (ρ = 0) means there is no linear association between Y1 and Y2.
92 Interactive Correlation Demo. From R (with the shiny package installed), run: shiny::runGitHub('correlationdemo', 'rdfricker')
93 An Alternative Covariance Formula. Theorem 5.10: If Y1 and Y2 are r.v.s with means µ1 and µ2, respectively, then Cov(Y1, Y2) = E[(Y1 − µ1)(Y2 − µ2)] = E(Y1Y2) − E(Y1)E(Y2). Proof:
94 Textbook Example 5.22. Referring back to Example 5.4, find the covariance between Y1 and Y2, where f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere. Solution:
95 Textbook Example 5.22 Solution (cont'd)
96 Textbook Example 5.23. Let Y1 and Y2 have the joint pdf f(y1, y2) = 2y1 for 0 ≤ y1 ≤ 1, 0 ≤ y2 ≤ 1, and 0 elsewhere. Find the covariance of Y1 and Y2. Solution:
97 Correlation and Independence. Theorem 5.11: If Y1 and Y2 are independent r.v.s, then Cov(Y1, Y2) = 0. That is, independent r.v.s are uncorrelated. Proof:
98 Textbook Example 5.24. Let Y1 and Y2 be discrete r.v.s with joint pmf as shown in Table 5.3. Show that Y1 and Y2 are dependent but have zero covariance. Solution:
99 Textbook Example 5.24 Solution (cont'd)
100 Important Take-Aways. High correlation merely means that there is a strong linear relationship, not that Y1 causes Y2 or Y2 causes Y1. E.g., there could be a third factor that causes both Y1 and Y2 to move in the same direction, giving high correlation but no causal connection between Y1 and Y2. Remember: high correlation does not imply causation, and zero correlation does not mean there is no relationship between Y1 and Y2.
101 Section 5.7 Homework. Do problems 5.89, 5.91, 5.92, ...
102 The Expected Value and Variance of Linear Functions of R.V.s (Section 5.8). In OA3102 you will frequently encounter estimators that are linear combinations of random variables. For example, U1 = a1Y1 + a2Y2 + ... + anYn = Σ_{i=1}^{n} aiYi, where the Yi's are r.v.s and the ai's are constants. Thus, for example, we might want to know the expected value of U1, or the variance of U1, or perhaps the covariance between U1 and U2.
103 Expected Value, Variance, and Covariance of Linear Combinations of R.V.s. Theorem 5.12: Let Y1, Y2, ..., Yn and X1, X2, ..., Xm be r.v.s with E(Yi) = µi and E(Xj) = ξj. Define U1 = Σ_{i=1}^{n} aiYi and U2 = Σ_{j=1}^{m} bjXj for constants a1, ..., an and b1, ..., bm. Then: a. E(U1) = Σ_{i=1}^{n} aiµi and, similarly, E(U2) = Σ_{j=1}^{m} bjξj; b. V(U1) = Σ_{i=1}^{n} ai^2 V(Yi) + 2 ΣΣ_{1≤i<j≤n} aiaj Cov(Yi, Yj), with a similar expression for V(U2).
104 Expected Value, Variance, and Covariance of Linear Combinations of R.V.s (cont'd). c. And Cov(U1, U2) = Σ_{i=1}^{n} Σ_{j=1}^{m} aibj Cov(Yi, Xj). How will this be relevant? In OA3102 you will need to estimate the population mean µ using data. You will use the sample average as an estimator, which is defined as Ȳ = (1/n) Σ_{i=1}^{n} Yi. This is a linear combination of r.v.s with constants ai = 1/n for all i.
105 Textbook Example 5.25. Let Y1, Y2, and Y3 be r.v.s, where: E(Y1) = 1, E(Y2) = 2, E(Y3) = −1; V(Y1) = 1, V(Y2) = 3, V(Y3) = 5; Cov(Y1, Y2) = −0.4, Cov(Y1, Y3) = 0.5, Cov(Y2, Y3) = 2. Find the expected value and variance of U = Y1 − 2Y2 + Y3. If W = 3Y1 + Y2, find Cov(U, W). Solution:
106 Textbook Example 5.25 Solution (cont'd)
107 Textbook Example 5.25 Solution (cont'd)
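Theorem 5.12 makes Example 5.25 a matrix computation: with coefficient vectors a and b and covariance matrix Σ (variances on the diagonal), E(U) = a'µ, V(U) = a'Σa, and Cov(U, W) = a'Σb. A NumPy sketch, assuming the signs E(Y3) = −1 and Cov(Y1, Y2) = −0.4 (minus signs were garbled in the slide text, so treat these values as an assumption):

```python
import numpy as np

# Moments from Example 5.25 (signs assumed as noted above).
mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[ 1.0, -0.4, 0.5],
                  [-0.4,  3.0, 2.0],
                  [ 0.5,  2.0, 5.0]])
a = np.array([1.0, -2.0, 1.0])   # U = Y1 - 2*Y2 + Y3
b = np.array([3.0,  1.0, 0.0])   # W = 3*Y1 + Y2

E_U = a @ mu              # Theorem 5.12a
V_U = a @ Sigma @ a       # Theorem 5.12b (the quadratic form covers both sums)
cov_UW = a @ Sigma @ b    # Theorem 5.12c

assert abs(E_U - (-4.0)) < 1e-9
assert abs(V_U - 12.6) < 1e-9
assert abs(cov_UW - 2.5) < 1e-9
```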
108 Proof of Theorem 5.12. Now, let's prove Theorem 5.12, including parts a and b.
109 Proof of Theorem 5.12 Continued
110 Proof of Theorem 5.12 Continued
111 Textbook Example 5.26. Referring back to Examples 5.4 and 5.20, for the joint distribution f(y1, y2) = 3y1 for 0 ≤ y2 ≤ y1 ≤ 1, and 0 elsewhere, find the variance of Y1 − Y2. Solution:
112 Textbook Example 5.26 Solution (cont'd)
113 Textbook Example 5.26 Solution (cont'd)
114 Textbook Example 5.27. Let Y1, Y2, ..., Yn be independent random variables with E(Yi) = µ and V(Yi) = σ^2. Show that E(Ȳ) = µ and V(Ȳ) = σ^2/n for Ȳ = (1/n) Σ_{i=1}^{n} Yi. Solution:
115 Textbook Example 5.27 Solution (cont'd)
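The result in Example 5.27 is easy to illustrate by simulation: average n iid draws, repeat many times, and the sample means should center on µ with variance near σ²/n. A NumPy sketch (all parameter values and the seed here are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, mu = 25, 2.0, 10.0

# 20,000 replications of the sample mean of n iid Normal(mu, sigma) draws.
ybars = rng.normal(mu, sigma, size=(20000, n)).mean(axis=1)

assert abs(ybars.mean() - mu) < 0.05            # E(Ybar) = mu
assert abs(ybars.var() - sigma**2 / n) < 0.02   # V(Ybar) = sigma^2 / n
```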
116 Textbook Example 5.28. The number of defectives Y in a sample of n = 10 items sampled from a manufacturing process follows a binomial distribution. An estimator for the fraction defective in the lot is the random variable p̂ = Y/n. Find the expected value and variance of p̂. Solution:
117 Textbook Example 5.28 Solution (cont'd)
118 Textbook Example 5.29. Suppose that an urn contains r red balls and (N − r) black balls. A random sample of n balls is drawn without replacement and Y, the number of red balls in the sample, is observed. From Chapter 3 we know that Y has a hypergeometric distribution. Find E(Y) and V(Y). Solution:
119 Textbook Example 5.29 Solution (cont'd)
120 Textbook Example 5.29 Solution (cont'd)
121 Section 5.8 Homework. Do problems 5.102, 5.103, 5.106, ...
122 Section 5.9: The Multinomial Distribution. In a binomial experiment there are n identical trials, each with the same two possible outcomes. But there are many situations in which the number of possible outcomes per trial is greater than two. When testing a missile, the outcomes may be miss, partial kill, and complete kill. When randomly sampling students at NPS, their affiliation may be Army, Navy, Marine Corps, US civilian, non-US military, non-US civilian, or other.
123 Defining a Multinomial Experiment. Definition 5.11: A multinomial experiment has the following properties: 1. The experiment consists of n identical trials. 2. The outcome of each trial falls into one of k classes or cells. 3. The probability that the outcome of a single trial falls into cell i, pi, i = 1, ..., k, remains the same from trial to trial, and p1 + p2 + ... + pk = 1. 4. The trials are independent. 5. The random variables of interest are Y1, Y2, ..., Yk, the numbers of outcomes that fall in each of the cells, where Y1 + Y2 + ... + Yk = n.
124 Defining the Multinomial Distribution. Definition 5.12: Assume that pi > 0, i = 1, ..., k, are such that p1 + p2 + ... + pk = 1. The random variables Y1, Y2, ..., Yk have a multinomial distribution with parameters p1, p2, ..., pk if their joint distribution is p(y1, y2, ..., yk) = [n!/(y1! y2! ... yk!)] p1^y1 p2^y2 ... pk^yk, where, for each i, yi = 0, 1, 2, ..., n and Σ_{i=1}^{k} yi = n. Note that the binomial is just a special case of the multinomial with k = 2.
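Definition 5.12 translates directly into code. The sketch below builds the pmf from the formula and checks it against the k = 2 binomial special case noted above (Python used for illustration; the R demos later in the section use dmultinom()):

```python
from math import factorial

# Multinomial pmf, computed directly from Definition 5.12.
def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for y in counts:
        coef //= factorial(y)        # multinomial coefficient n!/(y1!...yk!)
    p = float(coef)
    for y, pi in zip(counts, probs):
        p *= pi ** y                 # p1^y1 * p2^y2 * ... * pk^yk
    return p

# k = 2 reduces to the binomial: n = 4 trials, P(Y1 = 3) with p = 0.5
# is C(4,3) * 0.5^4 = 0.25.
assert abs(multinomial_pmf([3, 1], [0.5, 0.5]) - 0.25) < 1e-12
```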
125 Textbook Example 5.30. According to recent Census figures, the table below gives the proportion of US adults that fall into each age category. [Table: Age (years) vs. Proportion] If 5 adults are randomly sampled from the population, what's the probability the sample contains one person between 18 and 24 years, two between 25 and 34, and two between 45 and 64 years?
126 Textbook Example 5.30 Solution. Solution:
127 Expected Value, Variance, and Covariance of the Multinomial Distribution. Theorem 5.13: If Y1, Y2, ..., Yk have a multinomial distribution with parameters n and p1, p2, ..., pk, then: 1. E(Yi) = npi; 2. V(Yi) = npiqi, where, as always, qi = 1 − pi; 3. Cov(Ys, Yt) = −npspt, if s ≠ t. Proof:
Proof of Theorem 5.13 Continued
Application Example
Under certain conditions against a particular type of target, the Tomahawk Land Attack Missile - Conventional (TLAM-C) has a 50% chance of a complete target kill, a 30% chance of a partial kill, and a 20% chance of no kill/miss. In 10 independent shots, each at a different target, what's the probability that:
- There are 8 complete kills, 2 partial kills, and 0 no kills/misses?
- There are at least 9 complete or partial kills?
Application Example Solution
Application Example Solution in R
The first question via brute force:
Now, using the dmultinom() function:
For the second question, reduce it to a binomial where the probability of a complete or partial kill is 0.8
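The slide's R code is not reproduced in this transcription. An equivalent computation (sketched here in Python rather than R, mirroring the brute-force/dmultinom approach and the binomial reduction):

```python
from math import comb, factorial

pc, pp, pm = 0.5, 0.3, 0.2  # complete kill, partial kill, no kill/miss
n = 10

# Question 1, brute force: the multinomial pmf evaluated at (8, 2, 0).
q1 = (factorial(n) // (factorial(8) * factorial(2) * factorial(0))) \
     * pc**8 * pp**2 * pm**0
print(round(q1, 4))  # 0.0158

# Question 2: collapse complete/partial into one "hit" cell with probability
# 0.8, so the hit count is Binomial(10, 0.8); sum P(X = 9) + P(X = 10).
ph = pc + pp
q2 = comb(n, 9) * ph**9 * (1 - ph) + ph**10
print(round(q2, 4))  # 0.3758
```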
Section 5.9 Homework
Do problems 5.122, ...
Section 5.10: The Bivariate Normal Distribution
The bivariate normal distribution is an important one, both in terms of statistics (you'll need it in OA3103) and probability modeling. The distribution has 5 parameters:
- µ_1, the mean of variable Y_1
- µ_2, the mean of variable Y_2
- σ_1^2, the variance of variable Y_1
- σ_2^2, the variance of variable Y_2
- ρ, the correlation between Y_1 and Y_2
The Bivariate Normal PDF
The bivariate normal pdf is a bit complicated. It's
f(y_1, y_2) = e^{-Q/2} / (2π σ_1 σ_2 √(1 - ρ^2))
where
Q = [1 / (1 - ρ^2)] [ (y_1 - µ_1)^2 / σ_1^2 - 2ρ (y_1 - µ_1)(y_2 - µ_2) / (σ_1 σ_2) + (y_2 - µ_2)^2 / σ_2^2 ]
for -∞ < y_1 < ∞, -∞ < y_2 < ∞.
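For concreteness (my own sketch, not from the slides), the pdf above can be coded directly and checked against the product of the two normal marginals when ρ = 0:

```python
from math import exp, pi, sqrt

def bvn_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density f(y1, y2) with the Q given above."""
    z1, z2 = (y1 - mu1) / s1, (y2 - mu2) / s2
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho**2))

def norm_pdf(y, mu, s):
    """Univariate normal density."""
    return exp(-((y - mu) / s) ** 2 / 2) / (s * sqrt(2 * pi))

# With rho = 0 the joint density factors into the product of the marginals,
# consistent with independence.
lhs = bvn_pdf(1.0, -0.5, 0.0, 0.0, 1.0, 1.0, 0.0)
rhs = norm_pdf(1.0, 0.0, 1.0) * norm_pdf(-0.5, 0.0, 1.0)
print(abs(lhs - rhs) < 1e-15)  # True
```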
Interactive Bivariate Normal Distribution Demo
From R (with the shiny package installed), run:
shiny::runGitHub('bivariatenormdemo', 'rdfricker')
Some Facts About the Bivariate Normal
If Y_1 and Y_2 are jointly distributed according to a bivariate normal distribution, then:
- Y_1 has a (univariate) normal distribution with mean µ_1 and variance σ_1^2
- Y_2 has a (univariate) normal distribution with mean µ_2 and variance σ_2^2
- Cov(Y_1, Y_2) = ρ σ_1 σ_2, and if Cov(Y_1, Y_2) = 0 (equivalently ρ = 0) then Y_1 and Y_2 are independent
Note that this result is not true in general: zero correlation between two random variables does not imply independence. But it does for two jointly normally distributed random variables!
Illustrative Bivariate Normal Application
From R (with the shiny package installed), run:
shiny::runGitHub('bivariatenormexample', 'rdfricker')
The Multivariate Normal Distribution
The multivariate normal distribution is a joint distribution for k > 1 random variables. The pdf for a multivariate normal is
f(y) = [1 / ((2π)^{k/2} |Σ|^{1/2})] exp( -(1/2) (y - µ)' Σ^{-1} (y - µ) )
where
- y and µ are k-dimensional vectors giving the location at which to evaluate the pdf and the means, respectively
- Σ is the variance-covariance matrix
For the bivariate normal: y = (y_1, y_2), µ = (µ_1, µ_2), and
Σ = [ σ_1^2      ρ σ_1 σ_2
      ρ σ_1 σ_2  σ_2^2    ]
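As a check (my own sketch, not from the slides), evaluating the k = 2 matrix form with Σ inverted by hand reproduces the bivariate formula from the earlier slide; the parameter values below are arbitrary illustrations:

```python
from math import exp, pi, sqrt

# Illustrative parameter values (assumptions, not from the slides).
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
y1, y2 = 0.5, -1.5
v1, v2 = y1 - mu1, y2 - mu2

# Matrix form: Sigma = [[a, b], [b, d]] with |Sigma| = a*d - b^2, and the
# quadratic form (y - mu)' Sigma^{-1} (y - mu) expanded for the 2x2 case.
a, b, d = s1**2, rho * s1 * s2, s2**2
det = a * d - b * b
quad = (d * v1 * v1 - 2 * b * v1 * v2 + a * v2 * v2) / det
f_matrix = exp(-quad / 2) / ((2 * pi) ** (2 / 2) * sqrt(det))

# Bivariate formula from the previous slide, for comparison:
z1, z2 = v1 / s1, v2 / s2
q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
f_bvn = exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho**2))

print(abs(f_matrix - f_bvn) < 1e-12)  # True
```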
Section 5.11: Conditional Expectations
In Section 5.3 we learned about conditional pmfs and pdfs. The idea: a conditional distribution is the distribution of one random variable given information about another jointly distributed random variable. Conditional expectations are similar: a conditional expectation is the expectation of one random variable given information about another jointly distributed random variable. They're defined just like univariate expectations, except that the conditional pmfs and pdfs are used.
Defining the Conditional Expectation
Definition 5.13: If Y_1 and Y_2 are any random variables, the conditional expectation of g(Y_1) given that Y_2 = y_2 is defined as
E(g(Y_1) | Y_2 = y_2) = ∫_{-∞}^{∞} g(y_1) f(y_1 | y_2) dy_1
if Y_1 and Y_2 are jointly continuous, and
E(g(Y_1) | Y_2 = y_2) = Σ_{all y_1} g(y_1) p(y_1 | y_2)
if Y_1 and Y_2 are jointly discrete.
Textbook Example 5.31
For r.v.s Y_1 and Y_2 from Example 5.8, with joint pdf
f(y_1, y_2) = 1/2 for 0 ≤ y_1 ≤ y_2 ≤ 2, and 0 elsewhere
Find the conditional expectation of Y_1 given Y_2 = 1.5.
Solution:
Textbook Example 5.31 Solution (cont'd)
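To make Example 5.31 concrete (my own sketch): the marginal of Y_2 is f(y_2) = y_2/2, so the conditional density is f(y_1 | y_2) = (1/2)/(y_2/2) = 1/y_2, i.e. Uniform(0, y_2). A midpoint-rule integration confirms E(Y_1 | Y_2 = 1.5) = 0.75:

```python
# Conditional density f(y1 | y2) = 1/y2 on (0, y2): Uniform(0, y2).
y2 = 1.5
n = 100_000
h = y2 / n  # grid spacing for the midpoint rule

# E(Y1 | Y2 = 1.5) = integral over (0, y2) of y1 * (1/y2) dy1 = y2/2.
e_cond = sum((i + 0.5) * h * (1.0 / y2) * h for i in range(n))
print(round(e_cond, 4))  # 0.75
```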
A Bit About Conditional Expectations
Note that the conditional expectation of Y_1 given Y_2 = y_2 is a function of y_2. For example, in Example 5.31 we obtained E(Y_1 | Y_2 = y_2) = y_2 / 2. It then follows that E(Y_1 | Y_2) = Y_2 / 2. That is, the conditional expectation is a function of a random variable, and so it's a random variable itself. Thus, it has a mean and a variance as well.
The Expected Value of a Conditional Expectation
Theorem 5.14: Let Y_1 and Y_2 denote random variables. Then
E[E(Y_1 | Y_2)] = E(Y_1)
where the inside expectation is with respect to the conditional distribution and the outside expectation is with respect to the pdf of Y_2.
Proof:
Proof of Theorem 5.14 Continued
Note: This theorem is often referred to as the Law of Iterated Expectations, and it's frequently written in the other direction:
E(Y_1) = E[E(Y_1 | Y_2)]
Textbook Example 5.32
A QC plan requires sampling n = 10 items and counting the number of defectives, Y. Assume Y has a binomial distribution with p, the probability of observing a defective. But p varies from day to day according to a U(0, 0.25) distribution. Find the expected value of Y.
Solution:
Textbook Example 5.32 Solution (cont'd)
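A simulation check of Example 5.32 (my own sketch): by Theorem 5.14, E(Y) = E[E(Y | p)] = E(10p) = 10(0.125) = 1.25, since a U(0, 0.25) random variable has mean 0.125:

```python
import random

random.seed(42)
reps = 100_000
total = 0
for _ in range(reps):
    p = random.uniform(0.0, 0.25)  # today's defect probability
    total += sum(1 for _ in range(10) if random.random() < p)  # Y | p ~ Bin(10, p)
print(round(total / reps, 2))  # should be close to E(Y) = 1.25
```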
The Conditional Variance
The conditional variance of Y_1 given Y_2 = y_2 is defined by analogy with the ordinary variance:
V(Y_1 | Y_2 = y_2) = E(Y_1^2 | Y_2 = y_2) - [E(Y_1 | Y_2 = y_2)]^2
Note that the calculations are basically the same, except that the expectations on the right-hand side of the equality are taken with respect to the conditional distribution. Just like the conditional expectation, the conditional variance is a function of y_2.
Another Way to Calculate the Variance
Theorem 5.15: Let Y_1 and Y_2 denote random variables. Then
V(Y_1) = E[V(Y_1 | Y_2)] + V[E(Y_1 | Y_2)]
Proof:
Proof of Theorem 5.15 Continued
Textbook Example 5.33
From Example 5.32, find the variance of Y.
Solution:
Textbook Example 5.33 Solution (cont'd)
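The two pieces of Theorem 5.15 for Example 5.33 can be verified with exact fraction arithmetic (my own sketch): E[V(Y|p)] = E[10 p(1 - p)] and V[E(Y|p)] = V(10p), with p ~ U(0, 1/4):

```python
from fractions import Fraction

n = 10
Ep = Fraction(1, 8)            # E(p) for p ~ U(0, 1/4)
Vp = Fraction(1, 4) ** 2 / 12  # V(p) = (b - a)^2 / 12 = 1/192
Ep2 = Vp + Ep ** 2             # E(p^2) = 1/48

EV = n * (Ep - Ep2)  # E[V(Y|p)] = E[n p (1 - p)] = 25/24
VE = n ** 2 * Vp     # V[E(Y|p)] = V(n p) = 25/48
VY = EV + VE         # Theorem 5.15: V(Y) = 25/16 = 1.5625
print(EV, VE, VY)  # 25/24 25/48 25/16
```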
Note How Conditioning Can Help
Examples 5.32 and 5.33 could have been solved by first finding the unconditional distribution of Y and then calculating E(Y) and V(Y). But sometimes that can get pretty complicated, requiring:
- First finding the joint distribution of Y and the parameter (p in the previous examples)
- Then solving for the marginal distribution of Y
- And then, finally, calculating E(Y) and V(Y) in the usual way
Note How Conditioning Can Help
However, sometimes, as here, it can be easier to work with conditional distributions and the results of Theorems 5.14 and 5.15. In the examples:
- The distribution conditioned on a fixed value of the parameter was clear
- Putting a distribution on the parameter itself then accounted for the day-to-day differences
So E(Y) and V(Y) can be found without having to first solve for the marginal distribution of Y.
Section 5.11 Homework
Do problems 5.133, 5.139, ...
Section 5.12: Summary
Whew, this was an intense chapter! But the material is important and will come up in future classes. For example:
- Linear functions of random variables: OA3102, OA3103
- Covariance, correlation, and the bivariate normal distribution: OA3103, OA3602
- Conditional expectation and variance, the Law of Iterated Expectations, etc.: OA3301, OA4301
What We Have Just Learned
- Bivariate and multivariate probability distributions
- Bivariate normal distribution
- Multinomial distribution
- Marginal and conditional distributions
- Independence, covariance, and correlation
- Expected value & variance of a function of r.v.s
- Linear functions of r.v.s in particular
- Conditional expectation and variance
More informationMath 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationDimensionality Reduction: Principal Components Analysis
Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationUNIT I: RANDOM VARIABLES PART- A -TWO MARKS
UNIT I: RANDOM VARIABLES PART- A -TWO MARKS 1. Given the probability density function of a continuous random variable X as follows f(x) = 6x (1-x) 0
More informationA.2. Exponents and Radicals. Integer Exponents. What you should learn. Exponential Notation. Why you should learn it. Properties of Exponents
Appendix A. Exponents and Radicals A11 A. Exponents and Radicals What you should learn Use properties of exponents. Use scientific notation to represent real numbers. Use properties of radicals. Simplify
More information