3. Joint and Conditional Distributions, Stochastic Independence

Aims of this section:
- Multidimensional random variables (random vectors): joint and marginal distributions
- Stochastic (in)dependence and conditional distributions
- The multivariate normal distribution (definition, properties)

Literature:
- Mood, Graybill, Boes (1974), Chapter IV, pp. 129-174
- Wilfling (2011), Chapter 4
3.1 Joint and Marginal Distribution

Now: Consider several random variables simultaneously.

Applications:
- Several economic applications
- Statistical inference
Definition 3.1: (Random vector)
Let X_1, \dots, X_n be a set of n random variables, each defined on the same random experiment, i.e. X_i: \Omega \to \mathbb{R} for i = 1, \dots, n. Then X = (X_1, \dots, X_n) is called an n-dimensional random variable or an n-dimensional random vector.

Remark:
In the literature, random vectors are often denoted by X = (X_1, \dots, X_n) or, more simply, by X_1, \dots, X_n.
For n = 2 it is common practice to write X = (X, Y) or (X, Y) or X, Y.

Realizations are denoted by small letters:

x = (x_1, \dots, x_n) \in \mathbb{R}^n or x = (x, y) \in \mathbb{R}^2

Now: Characterization of the probability distribution of the random vector X.
Definition 3.2: (Joint cumulative distribution function)
Let X = (X_1, \dots, X_n) be an n-dimensional random vector. The function F_{X_1,\dots,X_n}: \mathbb{R}^n \to [0, 1] defined by

F_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n)

is called the joint cumulative distribution function of X.

Remark:
Definition 3.2 applies to discrete as well as to continuous random variables X_1, \dots, X_n.
Some properties of the bivariate cdf (n = 2):

- F_{X,Y}(x, y) is monotone increasing in x and y
- \lim_{x \to -\infty} F_{X,Y}(x, y) = 0
- \lim_{y \to -\infty} F_{X,Y}(x, y) = 0
- \lim_{x \to +\infty, \, y \to +\infty} F_{X,Y}(x, y) = 1

Remark:
Analogous properties hold for the n-dimensional cdf F_{X_1,\dots,X_n}(x_1, \dots, x_n).
Now: Joint discrete versus joint continuous random vectors.

Definition 3.3: (Joint discrete random vector)
The random vector X = (X_1, \dots, X_n) is defined to be a joint discrete random vector if it can assume only a finite (or a countably infinite) number of realizations x = (x_1, \dots, x_n) such that

P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) > 0

and

\sum P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = 1,

where the summation is over all possible realizations of X.
Definition 3.4: (Joint continuous random vector)
The random vector X = (X_1, \dots, X_n) is defined to be a joint continuous random vector if and only if there exists a nonnegative function f_{X_1,\dots,X_n}(x_1, \dots, x_n) such that

F_{X_1,\dots,X_n}(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_{X_1,\dots,X_n}(u_1, \dots, u_n) \, du_1 \dots du_n

for all (x_1, \dots, x_n). The function f_{X_1,\dots,X_n} is defined to be a joint probability density function of X.

Example:
Consider X = (X, Y) with joint pdf

f_{X,Y}(x, y) = \begin{cases} x + y, & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0, & \text{elsewise} \end{cases}
[Figure: surface plot of the joint pdf f_{X,Y}(x, y) = x + y over [0,1] \times [0,1]]
The joint cdf can be obtained by

F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v) \, du \, dv = \int_0^y \int_0^x (u + v) \, du \, dv = \dots

= \begin{cases} 0.5 (x^2 y + x y^2), & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0.5 (x^2 + x), & \text{for } (x, y) \in [0,1] \times [1, \infty) \\ 0.5 (y^2 + y), & \text{for } (x, y) \in [1, \infty) \times [0,1] \\ 1, & \text{for } (x, y) \in [1, \infty) \times [1, \infty) \end{cases}

(Proof: Class)
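The piecewise formula is easy to cross-check numerically. A minimal sketch, assuming SciPy is available (the helper name F is ours):

```python
# Numerical check of F_{X,Y}: integrate the pdf and compare with the
# closed-form piecewise expression above (a sketch, assuming SciPy).
from scipy.integrate import dblquad

def F(x, y):
    # The pdf vanishes outside [0,1]^2, so it suffices to integrate
    # u over [0, min(x,1)] and v over [0, min(y,1)].
    if x <= 0 or y <= 0:
        return 0.0
    return dblquad(lambda v, u: u + v, 0, min(x, 1), 0, min(y, 1))[0]

print(F(0.5, 0.5))  # closed form: 0.5*(0.25*0.5 + 0.5*0.25) = 0.125
print(F(0.5, 2.0))  # closed form: 0.5*(0.25 + 0.5)          = 0.375
print(F(2.0, 2.0))  # closed form: 1
```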
Remarks:
- If X = (X_1, \dots, X_n) is a joint continuous random vector, then

\frac{\partial^n F_{X_1,\dots,X_n}(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n} = f_{X_1,\dots,X_n}(x_1, \dots, x_n)

- The volume under the joint pdf represents probabilities: for lower bounds a_1, \dots, a_n and upper bounds b_1, \dots, b_n,

P(a_1 < X_1 \le b_1, \dots, a_n < X_n \le b_n) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f_{X_1,\dots,X_n}(u_1, \dots, u_n) \, du_1 \dots du_n
In this course:
- Emphasis on joint continuous random vectors
- Analogous results hold for joint discrete random vectors (see Mood, Graybill, Boes (1974), Chapter IV)

Now: Determination of the distribution of a single random variable X_i from the joint distribution of the random vector (X_1, \dots, X_n): the marginal distribution.
Definition 3.5: (Marginal distribution)
Let X = (X_1, \dots, X_n) be a continuous random vector with joint cdf F_{X_1,\dots,X_n} and joint pdf f_{X_1,\dots,X_n}. Then

F_{X_1}(x_1) = F_{X_1,\dots,X_n}(x_1, +\infty, +\infty, \dots, +\infty, +\infty)
F_{X_2}(x_2) = F_{X_1,\dots,X_n}(+\infty, x_2, +\infty, \dots, +\infty, +\infty)
\vdots
F_{X_n}(x_n) = F_{X_1,\dots,X_n}(+\infty, +\infty, +\infty, \dots, +\infty, x_n)

are called marginal cdfs, while
f_{X_1}(x_1) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{X_1,\dots,X_n}(x_1, x_2, \dots, x_n) \, dx_2 \dots dx_n
f_{X_2}(x_2) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{X_1,\dots,X_n}(x_1, x_2, \dots, x_n) \, dx_1 \, dx_3 \dots dx_n
\vdots
f_{X_n}(x_n) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{X_1,\dots,X_n}(x_1, x_2, \dots, x_n) \, dx_1 \dots dx_{n-1}

are called marginal pdfs of the one-dimensional (univariate) random variables X_1, \dots, X_n.
Example:
Consider the bivariate pdf

f_{X,Y}(x, y) = \begin{cases} 40 (x - 0.5)^2 y^3 (3 - 2x - y), & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0, & \text{elsewise} \end{cases}
[Figure: surface plot of the bivariate pdf f_{X,Y}(x, y) over [0,1] \times [0,1]]
The marginal pdf of X obtains as

f_X(x) = \int_0^1 40 (x - 0.5)^2 y^3 (3 - 2x - y) \, dy
= 40 (x - 0.5)^2 \int_0^1 (3 y^3 - 2x y^3 - y^4) \, dy
= 40 (x - 0.5)^2 \left[ \frac{3}{4} y^4 - \frac{2x}{4} y^4 - \frac{1}{5} y^5 \right]_0^1
= 40 (x - 0.5)^2 \left( \frac{3}{4} - \frac{2x}{4} - \frac{1}{5} \right)
= -20 x^3 + 42 x^2 - 27 x + 5.5
[Figure: marginal pdf f_X(x) for x \in [0, 1]]
The marginal pdf of Y obtains as

f_Y(y) = \int_0^1 40 (x - 0.5)^2 y^3 (3 - 2x - y) \, dx
= 40 y^3 \int_0^1 (x - 0.5)^2 (3 - 2x - y) \, dx
= -\frac{10}{3} y^3 (y - 2)
[Figure: marginal pdf f_Y(y) for y \in [0, 1]]
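Both closed-form marginals can be double-checked against the defining integrals. A minimal numerical sketch, assuming SciPy:

```python
# Integrate the joint pdf over one coordinate and compare with the
# closed-form marginals derived above (a sketch, assuming SciPy).
from scipy.integrate import quad

def f_xy(x, y):
    return 40 * (x - 0.5)**2 * y**3 * (3 - 2*x - y)

for x in (0.2, 0.7):
    num = quad(lambda y: f_xy(x, y), 0, 1)[0]       # integrate y out
    print(num, -20*x**3 + 42*x**2 - 27*x + 5.5)     # should agree

for y in (0.3, 0.9):
    num = quad(lambda x: f_xy(x, y), 0, 1)[0]       # integrate x out
    print(num, -10/3 * y**3 * (y - 2))              # should agree
```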
Remarks:
- When considering the marginal instead of the joint distributions, we incur a loss of information: the joint distribution uniquely determines all marginal distributions, but the converse does not hold in general.
- Besides the respective univariate marginal distributions, there are also multivariate marginal distributions which can be obtained from the joint distribution of X = (X_1, \dots, X_n).
Example:
For n = 5 consider X = (X_1, \dots, X_5) with joint pdf f_{X_1,\dots,X_5}. Then the marginal pdf of Z = (X_1, X_3, X_5) obtains as

f_{X_1,X_3,X_5}(x_1, x_3, x_5) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X_1,\dots,X_5}(x_1, x_2, x_3, x_4, x_5) \, dx_2 \, dx_4

(integrate out the irrelevant components; see the symbolic sketch below)
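The "integrating out" step can also be done symbolically. Here is a minimal SymPy sketch with a made-up three-dimensional density (a hypothetical example of ours, not the five-dimensional setting above):

```python
# Marginalizing a joint pdf symbolically (a sketch with a hypothetical
# separable pdf f(x1, x2, x3) = 8*x1*x2*x3 on the unit cube [0,1]^3).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = 8 * x1 * x2 * x3

# marginal pdf of (X1, X3): integrate the irrelevant component x2 out
f_13 = sp.integrate(f, (x2, 0, 1))
print(f_13)   # 4*x1*x3
```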
3.2 Conditional Distribution and Stochastic Independence

Now: Distribution of a random variable X under the condition that another random variable Y has already taken on the realization y (the conditional distribution of X given Y = y).
Definition 3.6: (Conditional distribution)
Let X = (X, Y) be a bivariate continuous random vector with joint pdf f_{X,Y}(x, y). The conditional density of X given Y = y is defined to be

f_{X|Y=y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)} (for f_Y(y) > 0).

Analogously, the conditional density of Y given X = x is defined to be

f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)} (for f_X(x) > 0).
Remark:
Conditional densities of random vectors are defined analogously, e.g.

f_{X_1,X_2,X_4|X_3=x_3,X_5=x_5}(x_1, x_2, x_4) = \frac{f_{X_1,X_2,X_3,X_4,X_5}(x_1, x_2, x_3, x_4, x_5)}{f_{X_3,X_5}(x_3, x_5)}
Example:
Consider the bivariate pdf

f_{X,Y}(x, y) = \begin{cases} 40 (x - 0.5)^2 y^3 (3 - 2x - y), & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0, & \text{elsewise} \end{cases}

with marginal pdf f_Y(y) = -\frac{10}{3} y^3 (y - 2) (cf. the example in Section 3.1).
It follows that

f_{X|Y=y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{40 (x - 0.5)^2 y^3 (3 - 2x - y)}{-\frac{10}{3} y^3 (y - 2)} = \frac{12 (x - 0.5)^2 (3 - 2x - y)}{2 - y}
[Figure: conditional pdf f_{X|Y=0.01}(x) of X given Y = 0.01]
[Figure: conditional pdf f_{X|Y=0.95}(x) of X given Y = 0.95]
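As a sanity check, f_{X|Y=y} should integrate to one over x for every fixed y. A minimal numerical sketch, assuming SciPy:

```python
# For each fixed y, the conditional density must integrate to 1 in x.
from scipy.integrate import quad

def f_cond(x, y):
    # f_{X|Y=y}(x) = 12 (x - 0.5)^2 (3 - 2x - y) / (2 - y)
    return 12 * (x - 0.5)**2 * (3 - 2*x - y) / (2 - y)

for y in (0.01, 0.5, 0.95):
    print(quad(lambda x: f_cond(x, y), 0, 1)[0])   # ~1.0 in each case
```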
Now: Combine the concepts of joint distribution and conditional distribution to define the notion of stochastic independence (for two random variables first).

Definition 3.7: (Stochastic independence [I])
Let (X, Y) be a bivariate continuous random vector with joint pdf f_{X,Y}(x, y). X and Y are defined to be stochastically independent if and only if

f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y) for all x, y \in \mathbb{R}.
Remarks:
- Alternatively, stochastic independence can be defined via the cdfs: X and Y are stochastically independent if and only if

F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y) for all x, y \in \mathbb{R}.

- If X and Y are independent, we have

f_{X|Y=y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_X(x) \cdot f_Y(y)}{f_Y(y)} = f_X(x)

f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{f_X(x) \cdot f_Y(y)}{f_X(x)} = f_Y(y)

- If X and Y are independent and g and h are two continuous functions, then g(X) and h(Y) are also independent.
Now: Extension to n random variables.

Definition 3.8: (Stochastic independence [II])
Let (X_1, \dots, X_n) be a continuous random vector with joint pdf f_{X_1,\dots,X_n}(x_1, \dots, x_n) and joint cdf F_{X_1,\dots,X_n}(x_1, \dots, x_n). X_1, \dots, X_n are defined to be stochastically independent if and only if

f_{X_1,\dots,X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdot \dots \cdot f_{X_n}(x_n)

or

F_{X_1,\dots,X_n}(x_1, \dots, x_n) = F_{X_1}(x_1) \cdot \dots \cdot F_{X_n}(x_n)

for all (x_1, \dots, x_n) \in \mathbb{R}^n.
Remarks:
- For discrete random vectors we define: X_1, \dots, X_n are stochastically independent if and only if

P(X_1 = x_1, \dots, X_n = x_n) = P(X_1 = x_1) \cdot \dots \cdot P(X_n = x_n)

or

F_{X_1,\dots,X_n}(x_1, \dots, x_n) = F_{X_1}(x_1) \cdot \dots \cdot F_{X_n}(x_n)

for all (x_1, \dots, x_n) \in \mathbb{R}^n.

- In the case of independence, the joint distribution results from the marginal distributions (see the check below).
- If X_1, \dots, X_n are stochastically independent and g_1, \dots, g_n are continuous functions, then Y_1 = g_1(X_1), \dots, Y_n = g_n(X_n) are also stochastically independent.
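For the earlier example f_{X,Y}(x, y) = x + y the factorization fails, so X and Y are not independent. A minimal check:

```python
# f(x,y) = x + y on [0,1]^2 has marginals f_X(x) = x + 0.5 and
# f_Y(y) = y + 0.5; the product of marginals differs from the joint pdf.
def f_joint(x, y): return x + y
def f_X(x):        return x + 0.5
def f_Y(y):        return y + 0.5

x, y = 0.2, 0.8
print(f_joint(x, y))      # 1.00
print(f_X(x) * f_Y(y))    # 0.91  -> X and Y are NOT independent
```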
3.3 Expectation and Joint Moment Generating Functions

Now: Definition of the expectation of a function

g: \mathbb{R}^n \to \mathbb{R}, (x_1, \dots, x_n) \mapsto g(x_1, \dots, x_n)

of a continuous random vector X = (X_1, \dots, X_n).
Definition 3.9: (Expectation of a function)
Let (X_1, \dots, X_n) be a continuous random vector with joint pdf f_{X_1,\dots,X_n}(x_1, \dots, x_n) and g: \mathbb{R}^n \to \mathbb{R} a real-valued continuous function. The expectation of the function g of the random vector is defined to be

E[g(X_1, \dots, X_n)] = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} g(x_1, \dots, x_n) \cdot f_{X_1,\dots,X_n}(x_1, \dots, x_n) \, dx_1 \dots dx_n.
Remarks:
- For a discrete random vector (X_1, \dots, X_n) the analogous definition is

E[g(X_1, \dots, X_n)] = \sum g(x_1, \dots, x_n) \cdot P(X_1 = x_1, \dots, X_n = x_n),

where the summation is over all realizations of the vector.

- Definition 3.9 includes the expectation of a univariate random variable X: set n = 1 and g(x) = x to obtain

E(X_1) \equiv E(X) = \int_{-\infty}^{+\infty} x f_X(x) \, dx

- Definition 3.9 includes the variance of X: set n = 1 and g(x) = [x - E(X)]^2 to obtain

Var(X_1) \equiv Var(X) = \int_{-\infty}^{+\infty} [x - E(X)]^2 f_X(x) \, dx
- Definition 3.9 includes the covariance of two variables: set n = 2 and g(x_1, x_2) = [x_1 - E(X_1)] \cdot [x_2 - E(X_2)] to obtain

Cov(X_1, X_2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} [x_1 - E(X_1)][x_2 - E(X_2)] f_{X_1,X_2}(x_1, x_2) \, dx_1 \, dx_2

- Via the covariance we define the correlation coefficient:

Corr(X_1, X_2) = \frac{Cov(X_1, X_2)}{\sqrt{Var(X_1) \cdot Var(X_2)}}

- General properties of expected values, variances, covariances and the correlation coefficient: class (see also the numerical sketch below)
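For the running example f_{X,Y}(x, y) = x + y these moments can be computed by numerical integration. A sketch assuming SciPy; the exact values work out to E(X) = E(Y) = 7/12, Var(X) = Var(Y) = 11/144, Cov(X, Y) = -1/144 and Corr(X, Y) = -1/11:

```python
# Moments of (X, Y) with joint pdf f(x,y) = x + y on [0,1]^2
# via numerical integration (a sketch, assuming SciPy).
from scipy.integrate import dblquad

def E(g):
    # E[g(X,Y)]; dblquad integrates func(y, x) with x outer, y inner
    return dblquad(lambda y, x: g(x, y) * (x + y), 0, 1, 0, 1)[0]

EX, EY = E(lambda x, y: x), E(lambda x, y: y)      # 7/12 each
VX = E(lambda x, y: (x - EX)**2)                   # 11/144
VY = E(lambda x, y: (y - EY)**2)                   # 11/144
cov = E(lambda x, y: (x - EX) * (y - EY))          # -1/144
print(cov / (VX * VY)**0.5)                        # Corr = -1/11
```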
Now: Expected vectors and variances of random vectors.

Definition 3.10: (Expected vector, covariance matrix)
Let X = (X_1, \dots, X_n) be a random vector. The expected vector of X is defined to be

E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix}.

The covariance matrix of X is defined to be

Cov(X) = \begin{pmatrix} Var(X_1) & Cov(X_1, X_2) & \dots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Var(X_2) & \dots & Cov(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \dots & Var(X_n) \end{pmatrix}.
Remark:
Obviously, the covariance matrix is symmetric by definition.

Now: Expected vectors and covariance matrices under linear transformations of random vectors. Let
- X = (X_1, \dots, X_n) be an n-dimensional random vector,
- A be an (m \times n) matrix of real numbers,
- b be an (m \times 1) column vector of real numbers.
Obviously, Y = AX + b is an (m \times 1) random vector:

Y = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} = \begin{pmatrix} a_{11} X_1 + a_{12} X_2 + \dots + a_{1n} X_n + b_1 \\ a_{21} X_1 + a_{22} X_2 + \dots + a_{2n} X_n + b_2 \\ \vdots \\ a_{m1} X_1 + a_{m2} X_2 + \dots + a_{mn} X_n + b_m \end{pmatrix}
The expected vector of Y is given by

E(Y) = \begin{pmatrix} a_{11} E(X_1) + a_{12} E(X_2) + \dots + a_{1n} E(X_n) + b_1 \\ a_{21} E(X_1) + a_{22} E(X_2) + \dots + a_{2n} E(X_n) + b_2 \\ \vdots \\ a_{m1} E(X_1) + a_{m2} E(X_2) + \dots + a_{mn} E(X_n) + b_m \end{pmatrix} = A E(X) + b.

The covariance matrix of Y is given by

Cov(Y) = \begin{pmatrix} Var(Y_1) & Cov(Y_1, Y_2) & \dots & Cov(Y_1, Y_m) \\ Cov(Y_2, Y_1) & Var(Y_2) & \dots & Cov(Y_2, Y_m) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(Y_m, Y_1) & Cov(Y_m, Y_2) & \dots & Var(Y_m) \end{pmatrix} = A \, Cov(X) \, A'.

(Proof: Class)
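Both identities are easy to illustrate by simulation. A minimal NumPy sketch with arbitrarily chosen mu, Sigma, A and b (here m = 3, n = 2; all values are our own hypothetical choices):

```python
# Empirical check that E(Y) = A E(X) + b and Cov(Y) = A Cov(X) A'.
import numpy as np

rng = np.random.default_rng(0)
mu    = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
A = np.array([[1.0,  1.0],
              [0.0,  2.0],
              [1.0, -1.0]])          # (3 x 2) matrix
b = np.array([0.0, 1.0, 2.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)  # rows = draws
Y = X @ A.T + b

print(Y.mean(axis=0), A @ mu + b)   # empirical mean vs. A mu + b
print(np.cov(Y.T))                  # empirical covariance ...
print(A @ Sigma @ A.T)              # ... vs. A Sigma A'
```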
Remark:
Cf. the analogous results for univariate variables:

E(aX + b) = a E(X) + b
Var(aX + b) = a^2 Var(X)

Up to now: Expected values for unconditional distributions.
Now: Expected values for conditional distributions (cf. Definition 3.6).
Definition 3.11: (Conditional expected value of a function)
Let (X, Y) be a continuous random vector with joint pdf f_{X,Y}(x, y) and let g: \mathbb{R}^2 \to \mathbb{R} be a real-valued function. The conditional expected value of the function g given X = x is defined to be

E[g(X, Y) | X = x] = \int_{-\infty}^{+\infty} g(x, y) \cdot f_{Y|X=x}(y) \, dy.
Remarks:
- An analogous definition applies to a discrete random vector (X, Y).
- Definition 3.11 naturally extends to higher-dimensional distributions.
- For g(x, y) = y we obtain the special case E[g(X, Y) | X = x] = E(Y | X = x).
- Note that E[g(X, Y) | X = x] is a function of x.
Example:
Consider the joint pdf

f_{X,Y}(x, y) = \begin{cases} x + y, & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0, & \text{elsewise} \end{cases}

The conditional distribution of Y given X = x is given by

f_{Y|X=x}(y) = \begin{cases} \frac{x + y}{x + 0.5}, & \text{for } (x, y) \in [0,1] \times [0,1] \\ 0, & \text{elsewise} \end{cases}

For g(x, y) = y the conditional expectation is given as

E(Y | X = x) = \int_0^1 y \cdot \frac{x + y}{x + 0.5} \, dy = \frac{1}{x + 0.5} \left( \frac{x}{2} + \frac{1}{3} \right)
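A quick numerical confirmation of this conditional expectation (a sketch, assuming SciPy):

```python
# E(Y | X = x) for f(x,y) = x + y: numerical integral vs. closed form.
from scipy.integrate import quad

def E_Y_given_x(x):
    return quad(lambda y: y * (x + y) / (x + 0.5), 0, 1)[0]

for x in (0.0, 0.5, 1.0):
    print(E_Y_given_x(x), (x/2 + 1/3) / (x + 0.5))   # should agree
```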
Remarks:
- Consider the function g(x, y) = g(y), i.e. g does not depend on x.
- Denote h(x) = E[g(Y) | X = x].
- We calculate the unconditional expectation of the transformed variable h(X).
- We have
E\{E[g(Y) | X = x]\} = E[h(X)] = \int_{-\infty}^{+\infty} h(x) f_X(x) \, dx
= \int_{-\infty}^{+\infty} E[g(Y) | X = x] \cdot f_X(x) \, dx
= \int_{-\infty}^{+\infty} \left[ \int_{-\infty}^{+\infty} g(y) f_{Y|X=x}(y) \, dy \right] f_X(x) \, dx
= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(y) f_{Y|X=x}(y) f_X(x) \, dy \, dx
= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} g(y) f_{X,Y}(x, y) \, dy \, dx
= E[g(Y)]
Theorem 3.12:
Let (X, Y) be an arbitrary discrete or continuous random vector. Then

E[g(Y)] = E\{E[g(Y) | X = x]\}

and, in particular,

E[Y] = E\{E[Y | X = x]\}.

Now: Three important rules for conditional and unconditional expected values.
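For the example above, both sides of E[Y] = E{E[Y | X]} can be evaluated explicitly; a minimal sketch assuming SciPy (both integrals equal 7/12):

```python
# Law of iterated expectations for f(x,y) = x + y on [0,1]^2.
from scipy.integrate import quad

E_Y = quad(lambda y: y * (y + 0.5), 0, 1)[0]   # direct, with f_Y(y) = y + 0.5

def E_Y_given_x(x):
    return (x/2 + 1/3) / (x + 0.5)             # from the example above

# average the conditional mean over the marginal f_X(x) = x + 0.5
E_EY = quad(lambda x: E_Y_given_x(x) * (x + 0.5), 0, 1)[0]

print(E_Y, E_EY)                               # both ~0.58333 (= 7/12)
```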
Theorem 3.13:
Let (X, Y) be an arbitrary discrete or continuous random vector and g_1(\cdot), g_2(\cdot) two unidimensional functions. Then

1. E[g_1(Y) + g_2(Y) | X = x] = E[g_1(Y) | X = x] + E[g_2(Y) | X = x],
2. E[g_1(Y) \cdot g_2(X) | X = x] = g_2(x) \cdot E[g_1(Y) | X = x],
3. if X and Y are stochastically independent, then E[g_1(X) \cdot g_2(Y)] = E[g_1(X)] \cdot E[g_2(Y)].
Finally: Moment generating functions for random vectors.

Definition 3.14: (Joint moment generating function)
Let X = (X_1, \dots, X_n) be an arbitrary discrete or continuous random vector. The joint moment generating function of X is defined to be

m_{X_1,\dots,X_n}(t_1, \dots, t_n) = E\left[ e^{t_1 X_1 + \dots + t_n X_n} \right],

if this expectation exists for all t_1, \dots, t_n with -h < t_j < h for an arbitrary value h > 0 and for all j = 1, \dots, n.
Remarks:
Via the joint moment generating function m_{X_1,\dots,X_n}(t_1, \dots, t_n) we can derive the following mathematical objects:
- the marginal moment generating functions m_{X_1}(t_1), \dots, m_{X_n}(t_n)
- the moments of the marginal distributions
- the so-called joint moments

(see the symbolic sketch below)
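As an illustration of the first two items, here is a SymPy sketch for two independent standard normal variables, whose joint mgf exp((t_1^2 + t_2^2)/2) is known in closed form (our choice of example, not from the slides):

```python
# Marginal mgf and marginal moments from a joint mgf (SymPy sketch).
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
m_joint = sp.exp((t1**2 + t2**2) / 2)      # joint mgf of X1, X2 iid N(0,1)

m_X1  = m_joint.subs(t2, 0)                # marginal mgf: exp(t1**2/2)
EX1   = sp.diff(m_X1, t1).subs(t1, 0)      # E(X1)   = 0
EX1_2 = sp.diff(m_X1, t1, 2).subs(t1, 0)   # E(X1^2) = 1
print(m_X1, EX1, EX1_2)
```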
Important result: (cf. Theorem 2.23)
For any given joint moment generating function m_{X_1,\dots,X_n}(t_1, \dots, t_n) there exists a unique joint cdf F_{X_1,\dots,X_n}(x_1, \dots, x_n).
3.4 The Multivariate Normal Distribution

Now: Extension of the univariate normal distribution.

Definition 3.15: (Multivariate normal distribution)
Let X = (X_1, \dots, X_n) be a continuous random vector. X is defined to have a multivariate normal distribution with parameters

\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} and \Sigma = \begin{pmatrix} \sigma_1^2 & \dots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \dots & \sigma_n^2 \end{pmatrix},

if for x = (x_1, \dots, x_n) \in \mathbb{R}^n its joint pdf is given by

f_X(x) = (2\pi)^{-n/2} [\det(\Sigma)]^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}.
Remarks:
- See Chang (1984, p. 92) for a definition and the properties of the determinant \det(A) of the matrix A.
- Notation: X \sim N(\mu, \Sigma)
- \mu is a column vector with \mu_1, \dots, \mu_n \in \mathbb{R}
- \Sigma is a regular, positive definite, symmetric (n \times n) matrix
- Role of the parameters: E(X) = \mu and Cov(X) = \Sigma
Joint pdf of the multivariate standard normal distribution N(0, I_n):

\phi(x) = (2\pi)^{-n/2} \exp\left\{ -\frac{1}{2} x' x \right\}

(cf. the analogy to the univariate pdf in Definition 2.24)

Properties of the N(\mu, \Sigma) distribution:
Partial vectors (marginal distributions) of X also have multivariate normal distributions, i.e. if

X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right),

then

X_1 \sim N(\mu_1, \Sigma_{11}) and X_2 \sim N(\mu_2, \Sigma_{22}).
Thus, all univariate variables of X = (X_1, \dots, X_n) have univariate normal distributions:

X_1 \sim N(\mu_1, \sigma_1^2), X_2 \sim N(\mu_2, \sigma_2^2), \dots, X_n \sim N(\mu_n, \sigma_n^2)

The conditional distributions are also (univariately or multivariately) normal:

X_1 | X_2 = x_2 \sim N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)

Linear transformations: Let A be an (m \times n) matrix, b an (m \times 1) vector of real numbers and X = (X_1, \dots, X_n) \sim N(\mu, \Sigma). Then

AX + b \sim N(A\mu + b, A \Sigma A')
Example:
Consider

X \sim N(\mu, \Sigma) = N\left( \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix} \right).

Find the distribution of Y = AX + b, where

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.

It follows that Y \sim N(A\mu + b, A \Sigma A'). In particular,

A\mu + b = \begin{pmatrix} 3 \\ 6 \end{pmatrix} and A \Sigma A' = \begin{pmatrix} 11 & 24 \\ 24 & 53 \end{pmatrix}

(note: since Var(Y_1) = Var(X_1 + 2 X_2) = 1 + 4 \cdot 2 + 4 \cdot 0.5 = 11, the (1,1) element is 11; see also the NumPy check below).
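The matrix algebra can be reproduced in a few lines (a NumPy sketch):

```python
# A mu + b and A Sigma A' for the example above.
import numpy as np

mu    = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 2.0])

print(A @ mu + b)        # [3. 6.]
print(A @ Sigma @ A.T)   # [[11. 24.]
                         #  [24. 53.]]
```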
Now: Consider the bivariate case (n = 2), i.e. X = (X, Y) with

E(X) = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{YX} & \sigma_Y^2 \end{pmatrix}.

We have

\sigma_{XY} = \sigma_{YX} = Cov(X, Y) = \sigma_X \sigma_Y \cdot Corr(X, Y) = \sigma_X \sigma_Y \rho.

The joint pdf follows from Definition 3.15 with n = 2:

f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2 (1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2 \rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right] \right\}

(Derivation: Class)
[Figure: surface plot of f_{X,Y}(x, y) for \mu_X = \mu_Y = 0, \sigma_X = \sigma_Y = 1 and \rho = 0]
[Figure: surface plot of f_{X,Y}(x, y) for \mu_X = \mu_Y = 0, \sigma_X = \sigma_Y = 1 and \rho = 0.9]
Remarks:
- The marginal distributions are given by X \sim N(\mu_X, \sigma_X^2) and Y \sim N(\mu_Y, \sigma_Y^2).
- An interesting result for the normal distribution: if (X, Y) has a bivariate normal distribution, then X and Y are independent if and only if \rho = Corr(X, Y) = 0.
- The conditional distributions are given by

X | Y = y \sim N\left( \mu_X + \rho \frac{\sigma_X}{\sigma_Y} (y - \mu_Y), \; \sigma_X^2 (1 - \rho^2) \right)

Y | X = x \sim N\left( \mu_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - \mu_X), \; \sigma_Y^2 (1 - \rho^2) \right)

(Proof: Class; a simulation check follows below)
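A simulation sketch of the conditional-moment formulas, assuming NumPy; conditioning on X = x is approximated by a thin bin around x (bin width and parameter values are our own arbitrary choices):

```python
# Empirical conditional mean and variance of Y given X ~ x for a
# bivariate normal (a sketch; the bin width 0.01 is arbitrary).
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
s_X, s_Y = np.sqrt(Sigma[0, 0]), np.sqrt(Sigma[1, 1])
rho = Sigma[0, 1] / (s_X * s_Y)

draws = rng.multivariate_normal(mu, Sigma, size=2_000_000)
x = 0.8
y_cond = draws[np.abs(draws[:, 0] - x) < 0.01, 1]    # condition on X ~ x

print(y_cond.mean(), mu[1] + rho * (s_Y / s_X) * (x - mu[0]))  # ~1.4
print(y_cond.var(),  s_Y**2 * (1 - rho**2))                    # ~1.75
```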