Some probability and statistics

Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y, Z, etc, and their probability distributions, defined by the cumulative distribution function CDF) F X x)=px x), etc To a random experiment we define a sample space Ω, that contains all the outcomes that can occur in the experiment A random variable, X, is a function defined on a sample space Ω To each outcome ω Ω it defines a real number Xω), that represents the value of a numeric quantity that can be measured in the experiment For X to be called a random variable, the probability PX x) has to be defined for all real x Distributions and moments A probability distribution with cumulative distribution function F X x) can be discrete or continuous with probability function p X x) and probability density function f X x), respectively, such that F X x)=px x)= k x p X k), if X takes only integer values, x f Xy)dy A distribution can be of mixed type The distribution function is then an integral plus discrete jumps; see Appendix B The expectation of a random variable X is defined as the center of gravity in the distribution, k kp X k), E[X]=m X = x= x f Xx)dx The variance is a simple measure of the spreading of the distribution and is 26

262 SOME PROBABILITY AND STATISTICS defined as V[X]=E[X m X ) 2 ]=E[X 2 ] m 2 X = Chebyshev s inequality states that, for all ε > 0, k k m X ) 2 p X k), P X m X >ε) E[X m X) 2 ] ε 2 x= x m X) 2 f X x)dx In order to describe the statistical properties of a random function one needs the notion of multivariate distributions If the result of a experiment is described by two different quantities, denoted by X and Y, eg length and height of randomly chosen individual in a population, or the value of a random function at two different time points, one has to deal with a two-dimensional random variable This is described by its two-dimensional distribution function F X,Y x,y)= PX x,y y), or the corresponding two-dimensional probability or density function, f X,Y x,y)= 2 F X,Y x,y) x y Two random variables X,Y are independent if, for all x, y, F X,Y x,y)=f X x)f Y y) An important concept is the covariance between two random variables X and Y, defined as C[X,Y]=E[X m X )Y m Y )]=E[XY] m X m Y The correlation coefficient is equal to the dimensionless, normalized covariance, ρ[x,y]= C[X,Y] V[X]V[Y] Two random variable with zero correlation, ρ[x,y] = 0, are called uncorrelated Note that if two random variables X and Y are independent, then they are also uncorrelated, but the reverse does not hold It can happen that two uncorrelated variables are dependent Speaking about the correlation between two random quantities, one often means the degree of covariation between the two However, one has to remember that the correlation only measures the degree of linear covariation Another meaning of the term correlation is used in connection with two data series, x,,x n ) and y,,y n ) Then sometimes the sum of products, n x ky k, can be called correlation, and a device that produces this sum is called a correlator

MULTIDIMENSIONAL NORMAL DISTRIBUTION 263 Conditional distributions If X and Y are two random variables with bivariate density function f X,Y x,y), we can define the conditional distribution for X given Y = y, by the conditional density, f X Y=y x)= f X,Yx,y), f Y y) for every y where the marginal density f Y y) is non-zero The expectation in this distribution, the conditional expectation, is a function of y, and is denoted and defined as E[X Y = y]= x f X Y=y x)dx=my) The conditional variance is defined as x V[X Y = y]= x my)) 2 f X Y=y x)dx= σ X Y y) x The unconditional expectation of X can be obtained from the law of total probability, and computed as } E[X]= x f X Y=y x)dx f Y y)dy=e[e[x Y]] y x The unconditional variance of X is given by V[X]=E[V[X Y]]+V[E[X Y]] A2 Multidimensional normal distribution A one-dimensional normal random variable X with expectation m and variance σ 2 has probability density function f X x)= exp 2πσ 2 2 ) } x m 2, σ and we write X Nm,σ 2 ) If m=0 and σ =, the normal distribution is standardized If X N0,), then σx+ m Nm,σ 2 ), and if X Nm,σ 2 ) then X m)/σ N0, ) We accept a constant random variable, X m, as a normal variable, X Nm,0)

264 SOME PROBABILITY AND STATISTICS Now, let X,,X n have expectation m k = E[X k ] and covariances σ jk = C[X j,x k ], and define, with for transpose), µ = m,,m n ), Σ = σ jk )=the covariance matrix for X,,X n It is a characteristic property of the normal distributions that all linear combinations of a multivariate normal variable also has a normal distribution To formulate the definition, write a=a,,a n ) and X=X,,X n ), with a X=a X + +a n X n Then, E[a X]=a µ = V[a X]=a Σa= n j= n j,k a j m j, a j a k σ jk A) Definition A The random variables X,,X n, are said to have an n-dimensional normal distribution is every linear combination a X + +a n X n has a normal distribution From A) we have that X=X,,X n ) is n-dimensional normal, if and only if a X Na µ,a Σa) for all a=a,,a n ) Obviously, X k = 0 X + + X k + +0 X n, is normal, ie, all marginal distributions in an n-dimensional normal distribution are onedimensional normal However, the reverse is not necessarily true; there are variables X,,X n, each of which is one-dimensional normal, but the vector X,,X n ) is not n-dimensional normal It is an important consequence of the definition that sums and differences of n-dimensional normal variables have a normal distribution If the covariance matrix Σ is non-singular, the n-dimensional normal distribution has a probability density function with x=x,,x n ) ) 2π) n/2 det Σ exp } 2 x µ) Σ x µ) A2) The distribution is said to be non-singular The density A2) is constant on every ellipsoid x µ) Σ x µ)= C in R n

MULTIDIMENSIONAL NORMAL DISTRIBUTION 265 Note: the density function of an n-dimensional normal distribution is uniquely determined by the expectations and covariances Example A Suppose X,X 2 have a two-dimensional normal distribution If then Σ is non-singular, and det Σ=σ σ 22 σ 2 2 > 0, Σ = det Σ With Qx,x 2 )=x µ) Σ x µ), σ22 σ 2 σ 2 σ Qx,x 2 )= = σ σ 22 σ2 2 x m ) 2 σ 22 2x m )x 2 m 2 )σ 2 +x 2 m 2 ) 2 } σ = ) = ρ 2 x m σ ) 2 2ρ x m σ ) x2 m 2 σ22 )+ x2 m 2 σ22 ) 2 }, where we also used the correlation coefficient ρ = σ 2 σ σ 22, and, f X,X 2 x,x 2 )= 2π σ σ 22 ρ 2 ) exp ) 2 Qx,x 2 ) A3) For variables with m = m 2 = 0 and σ = σ 22 = σ 2, the bivariate density is ) f X,X 2 x,x 2 )= 2πσ 2 ρ exp 2 2σ 2 ρ 2 ) x2 2ρx x 2 + x 2 2 ) We see that, if ρ = 0, this is the density of two independent normal variables Figure A shows the density function f X,X 2 x,x 2 ) and level curves for a bivariate normal distribution with expectation µ = 0, 0), and covariance matrix ) 05 Σ= 05 The correlation coefficient is ρ = 05 Remark A If the covariance matrix Σ is singular and non-invertible, then there exists at least one set of constants a,,a n, not all equal to 0, such that a Σa=0 From A) it follows thatv[a X]=0, which means that a X is constant equal to a µ The distribution of X is concentrated to a hyper plane a x = constant in R n The distribution is said to be singular and it has no density function inr n

266 SOME PROBABILITY AND STATISTICS 02 0 0 2 0 2 2 0 2 2 0 2 2 0 2 Figure A Two-dimensional normal density Left: density function; Right: elliptic level curves at levels 00, 002, 005, 0, 05 Remark A2 Formula A) implies that every covariance matrix Σ is positive definite or positive semi-definite, ie, j,k a j a k σ jk 0 for all a,,a n Conversely, if Σ is a symmetric, positive definite matrix of size n n, ie, if j,k a j a k σ jk > 0 for all a,,a n 0,,0, then A2) defines the density function for an n-dimensional normal distribution with expectation m k and covariances σ jk Every symmetric, positive definite matrix is a covariance matrix for a non-singular distribution Furthermore, for every symmetric, positive semi-definite matrix, Σ, ie, such that a j a k σ jk 0 j,k for all a,,a n with equality holding for some choice of a,,a n 0,,0, there exists an n-dimensional normal distribution that has Σ as its covariance matrix For n-dimensional normal variables, uncorrelated and independent are equivalent Theorem A If the random variables X,,X n are n- dimensional normal and uncorrelated, then they are independent Proof We show the theorem only for non-singular variables with density function It is true also for singular normal variables If X,,X n are uncorrelated, σ jk = 0 for j k, then Σ, and also Σ are

MULTIDIMENSIONAL NORMAL DISTRIBUTION 267 diagonal matrices, ie, note that σ j j =V[X j ]), σ 0 det Σ= σ j j, Σ = j 0 σ nn This means that x µ) Σ x µ) = j x j µ j ) 2 /σ j j, and the density A2) is exp x j m j ) 2 } 2πσ j j j 2σ j j Hence, the joint density function for X,,X n is a product of the marginal densities, which says that the variables are independent A2 Conditional normal distribution This section deals with partial observations in a multivariate normal distribution It is a special property of this distribution, that conditioning on observed values of a subset of variables, leads to a conditional distribution for the unobserved variables that is also normal Furthermore, the expectation in the conditional distribution is linear in the observations, and the covariance matrix does not depend on the observed values This property is particularly useful in prediction of Gaussian time series, as formulated by the Kalman filter Conditioning in the bivariate normal distribution Let X and Y have a bivariate normal distribution with expectations m X and m Y, variances σx 2 and σ Y 2, respectively, and with correlation coefficient ρ = C[X,Y]/σ X σ Y ) The simultaneous density function is given by A3) The conditional density function for X given that Y = y is f X Y=y x)= f X,Yx,y) f Y y) = σ X ρ 2 2π exp x m X+ σ X ρy m Y )/σ Y )) 2 2σ 2 X ρ2 ) Hence, the conditional distribution of X given Y = y is normal with expectation and variance m X Y=y = m X + σ X ρy m Y )/σ Y, σ 2 X Y=y = σ 2 X ρ2 ) Note: the conditional expectation depends linearly on the observed y-value, and the conditional variance is constant, independent of Y = y }

268 SOME PROBABILITY AND STATISTICS Conditioning in the multivariate normal distribution Let X=X,,X n ) and Y=Y,,Y m ) be two multivariate normal variables, of size n and m, respectively, such that Z=X,,X n,y,,y m ) is n + m)-dimensional normal Denote the expectations E[X]=m X, E[Y]=m Y, and partition the covariance matrix for Z with Σ XY = Σ YX), [ ) )] ) X X ΣXX Σ Σ=Cov ; = XY A4) Y Y Σ YX Σ YY If the covariance matrix Σ is positive definite, the distribution of X, Y) has the density function f XY x,y)= 2π) m+n)/2 det Σ e 2 x m X,y m Y )Σ x m X,y m Y ), while the m-dimensional density of Y is f Y y)= 2π) m/2 e 2 y m Y) Σ det Σ YY To find the conditional density of X given that Y = y, we need the following matrix property YY y my) f X Y x y)= f YXy,x), A5) f Y y) Theorem A2 Matrix inversions lemma ) Let B be a p p-matrix p = n+m): ) B B B= 2, B 2 B 22 where the sub-matrices have dimension n n, n m, etc Suppose B,B,B 22 are non-singular, and partition the inverse in the same way as B, ) A=B A A = 2 A 2 A 22 Then A= B B 2 B22 B 2) B B 2 B 22 B ) 2) B 2 B 22 B 22 B 2 B B 2) B 2 B B 22 B 2 B B 2)

MULTIDIMENSIONAL NORMAL DISTRIBUTION 269 Proof For the proof, see a matrix theory textbook, for example, [22] Theorem A3 Conditional normal distribution ) The conditional normal distribution for X, given that Y = y, is n-dimensional normal with expectation and covariance matrix E[X Y=y] = m X Y=y = m X + Σ XY Σ YY y m Y), A6) C[X Y=y] = Σ XX Y = Σ XX Σ XY Σ YY Σ YX A7) These formulas are easy to remember: the dimension of the submatrices, for example in the covariance matrix Σ XX Y, are the only possible for the matrix multiplications in the right hand side to be meaningful Proof To simplify calculations, we start with m X = m Y = 0, and add the expectations afterwards The conditional distribution of X given that Y = y is, according to A5), the ratio between two multivariate normal densities, and hence it is of the form, c exp 2 x,y ) Σ x y ) + 2 y Σ YY y }=c exp 2 Qx,y)}, where c is a normalization constant, independent of x and y The matrix Σ can be partitioned as in A4), and if we use the matrix inversion lemma, we find that ) Σ A A = A= 2 A8) A 2 A 22 = Σ XX Σ XY Σ YY Σ YX ) Σ YY Σ YX Σ XX Σ XY ) Σ YX Σ XX We also see that Σ XX Σ XY Σ YY Σ YX ) Σ XY Σ YY Σ YY Σ YX Σ XX Σ XY ) Qx,y) = x A x+x A 2 y+y A 2 x+y A 22 y y Σ YY y = x y C )A x Cy)+ Qy) = x A x x A Cy y C A x+y C A Cy+ Qy), for some matrix C and quadratic form Qy) in y )

270 SOME PROBABILITY AND STATISTICS Here A = Σ solving XX Y, according to A7) and A8), while we can find C by A C=A 2, ie C= A A 2 = Σ XY Σ YY, according to A8) This is precisely the matrix in A6) If we reinstall the deleted m X and m Y, we get the conditional density for X given Y=y to be of the form c exp 2 Qx,y)}=c exp 2 x m X Y=y )Σ XX Y x m X Y=y)}, which is the normal density we were looking for A22 Complex normal variables In most of the book, we have assumed all random variables to be real valued In many applications, and also in the mathematical background, it is advantageous to consider complex variables, simply defined as Z = X+ iy, where X and Y have a bivariate distribution The mean value of a complex random variable is simply E[Z]=E[RZ]+iE[IZ], while the variance and covariances are defined with complex conjugate on the second variable, C[Z,Z 2 ]=E[Z Z 2 ] m Z m Z2, V[Z]=C[Z,Z]=E[ Z 2 ] m Z 2 Note, that for a complex Z = X+ iy, withv[x]= V[Y]=σ 2, C[Z,Z]=V[X]+V[Y]=2σ 2, C[Z,Z]=V[X] V[Y]+2iC[X,Y]=2iC[X,Y] Hence, if the real and imaginary parts are uncorrelated with the same variance, then the complex variable Z is uncorrelated with its own complex conjugate, Z Often, one uses the term orthogonal, instead of uncorrelated for complex variables