
Appendix A: Some probability and statistics

A.1 Probabilities, random variables and their distributions

We summarize a few of the basic concepts of random variables, usually denoted by capital letters, $X, Y, Z$, etc., and their probability distributions, defined by the cumulative distribution function (CDF) $F_X(x) = P(X \le x)$, etc. To a random experiment we associate a sample space $\Omega$ that contains all the outcomes that can occur in the experiment. A random variable $X$ is a function defined on the sample space $\Omega$: to each outcome $\omega \in \Omega$ it assigns a real number $X(\omega)$, which represents the value of a numeric quantity that can be measured in the experiment. For $X$ to be called a random variable, the probability $P(X \le x)$ has to be defined for all real $x$.

Distributions and moments

A probability distribution with cumulative distribution function $F_X(x)$ can be discrete or continuous, with probability function $p_X(k)$ or probability density function $f_X(x)$, respectively, such that

$$F_X(x) = P(X \le x) = \begin{cases} \sum_{k \le x} p_X(k), & \text{if } X \text{ takes only integer values,} \\[4pt] \int_{-\infty}^{x} f_X(y)\,dy, & \text{if } X \text{ is continuous.} \end{cases}$$

A distribution can also be of mixed type; the distribution function is then an integral plus discrete jumps; see Appendix B.

The expectation of a random variable $X$ is defined as the center of gravity of the distribution,

$$E[X] = m_X = \begin{cases} \sum_k k\, p_X(k), & \text{in the discrete case,} \\[4pt] \int_{-\infty}^{\infty} x f_X(x)\,dx, & \text{in the continuous case.} \end{cases}$$

The variance is a simple measure of the spread of the distribution and is

defined as

$$V[X] = E[(X - m_X)^2] = E[X^2] - m_X^2 = \begin{cases} \sum_k (k - m_X)^2 p_X(k), & \text{in the discrete case,} \\[4pt] \int_{-\infty}^{\infty} (x - m_X)^2 f_X(x)\,dx, & \text{in the continuous case.} \end{cases}$$

Chebyshev's inequality states that, for all $\varepsilon > 0$,

$$P(|X - m_X| > \varepsilon) \le \frac{E[(X - m_X)^2]}{\varepsilon^2}.$$

In order to describe the statistical properties of a random function one needs the notion of multivariate distributions. If the result of an experiment is described by two different quantities, denoted by $X$ and $Y$ (e.g., length and height of a randomly chosen individual in a population, or the values of a random function at two different time points), one has to deal with a two-dimensional random variable. This is described by its two-dimensional distribution function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$, or the corresponding two-dimensional probability or density function,

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.$$

Two random variables $X, Y$ are independent if, for all $x, y$,

$$F_{X,Y}(x,y) = F_X(x) F_Y(y).$$

An important concept is the covariance between two random variables $X$ and $Y$, defined as

$$C[X,Y] = E[(X - m_X)(Y - m_Y)] = E[XY] - m_X m_Y.$$

The correlation coefficient is the dimensionless, normalized covariance,

$$\rho[X,Y] = \frac{C[X,Y]}{\sqrt{V[X]\,V[Y]}}.$$

Two random variables with zero correlation, $\rho[X,Y] = 0$, are called uncorrelated. Note that if two random variables $X$ and $Y$ are independent, then they are also uncorrelated, but the reverse does not hold: it can happen that two uncorrelated variables are dependent. When speaking about the correlation between two random quantities, one often means the degree of covariation between the two. However, one has to remember that the correlation only measures the degree of linear covariation. Another meaning of the term correlation is used in connection with two data series, $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$: the sum of products, $\sum_{k=1}^{n} x_k y_k$, is then sometimes called their correlation, and a device that produces this sum is called a correlator.
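The claim that uncorrelated variables can still be dependent is easy to check exactly on a small discrete example. The following NumPy sketch (the distribution is chosen purely for illustration) takes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$:

```python
import numpy as np

# X uniform on {-1, 0, 1}, and Y = X^2: uncorrelated but dependent.
xs = np.array([-1.0, 0.0, 1.0])
p = np.array([1/3, 1/3, 1/3])
ys = xs**2

m_X = np.sum(p * xs)                          # E[X] = 0
m_Y = np.sum(p * ys)                          # E[Y] = 2/3
cov_XY = np.sum(p * (xs - m_X) * (ys - m_Y))  # C[X, Y]

print(cov_XY)        # exactly zero here, so rho[X, Y] = 0

# ... yet X and Y are clearly dependent:
# P(X = 1, Y = 1) = 1/3, while P(X = 1) * P(Y = 1) = (1/3)(2/3).
p_joint = 1/3
p_prod = (1/3) * (2/3)
print(p_joint == p_prod)   # False
```

The dependence here is purely nonlinear ($Y$ is a deterministic function of $X$), which is exactly what the covariance cannot detect.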

Conditional distributions

If $X$ and $Y$ are two random variables with bivariate density function $f_{X,Y}(x,y)$, we can define the conditional distribution of $X$ given $Y = y$ by the conditional density,

$$f_{X \mid Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)},$$

for every $y$ where the marginal density $f_Y(y)$ is non-zero. The expectation in this distribution, the conditional expectation, is a function of $y$, and is denoted and defined as

$$E[X \mid Y = y] = \int_x x\, f_{X \mid Y=y}(x)\,dx = m(y).$$

The conditional variance is defined as

$$V[X \mid Y = y] = \int_x (x - m(y))^2 f_{X \mid Y=y}(x)\,dx = \sigma^2_{X \mid Y}(y).$$

The unconditional expectation of $X$ can be obtained from the law of total probability, and computed as

$$E[X] = \int_y \left\{ \int_x x\, f_{X \mid Y=y}(x)\,dx \right\} f_Y(y)\,dy = E[E[X \mid Y]].$$

The unconditional variance of $X$ is given by

$$V[X] = E[V[X \mid Y]] + V[E[X \mid Y]].$$

A.2 Multidimensional normal distribution

A one-dimensional normal random variable $X$ with expectation $m$ and variance $\sigma^2$ has probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{x - m}{\sigma} \right)^2 \right\},$$

and we write $X \in N(m, \sigma^2)$. If $m = 0$ and $\sigma = 1$, the normal distribution is standardized. If $X \in N(0,1)$, then $\sigma X + m \in N(m, \sigma^2)$, and if $X \in N(m, \sigma^2)$ then $(X - m)/\sigma \in N(0,1)$. We accept a constant random variable, $X \equiv m$, as a normal variable, $X \in N(m, 0)$.
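The two decompositions $E[X] = E[E[X \mid Y]]$ and $V[X] = E[V[X \mid Y]] + V[E[X \mid Y]]$ stated above can be verified exactly on a small discrete mixture; the numbers in this NumPy sketch are illustrative only:

```python
import numpy as np

# Y takes two values; given Y, X is uniform on a small finite support.
p_Y = np.array([0.4, 0.6])                 # P(Y = 0), P(Y = 1)
supports = [np.array([0.0, 2.0]),          # X | Y = 0
            np.array([4.0, 8.0])]          # X | Y = 1

cond_mean = np.array([s.mean() for s in supports])   # m(y) = E[X | Y = y]
cond_var = np.array([s.var() for s in supports])     # V[X | Y = y]

# Law of total expectation and law of total variance:
EX = np.sum(p_Y * cond_mean)
VX = np.sum(p_Y * cond_var) + np.sum(p_Y * (cond_mean - EX) ** 2)

# Direct computation from the unconditional distribution of X:
vals = np.concatenate(supports)
probs = np.concatenate([p_Y[y] * np.full(len(supports[y]), 1 / len(supports[y]))
                        for y in range(2)])
EX_direct = np.sum(probs * vals)
VX_direct = np.sum(probs * (vals - EX_direct) ** 2)

print(EX, EX_direct)   # both approximately 4.0
print(VX, VX_direct)   # both approximately 8.8
```

Note how the total variance splits into a "within-group" part, $E[V[X \mid Y]]$, and a "between-group" part, $V[E[X \mid Y]]$.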

Now, let $X_1, \ldots, X_n$ have expectations $m_k = E[X_k]$ and covariances $\sigma_{jk} = C[X_j, X_k]$, and define, with $'$ for transpose,

$$\mu = (m_1, \ldots, m_n)', \qquad \Sigma = (\sigma_{jk}) = \text{the covariance matrix for } X_1, \ldots, X_n.$$

It is a characteristic property of the normal distributions that every linear combination of a multivariate normal variable also has a normal distribution. To formulate the definition, write $a = (a_1, \ldots, a_n)'$ and $X = (X_1, \ldots, X_n)'$, with $a'X = a_1 X_1 + \cdots + a_n X_n$. Then

$$E[a'X] = a'\mu = \sum_{j=1}^{n} a_j m_j, \qquad V[a'X] = a'\Sigma a = \sum_{j,k} a_j a_k \sigma_{jk}. \tag{A.1}$$

Definition A.1. The random variables $X_1, \ldots, X_n$ are said to have an $n$-dimensional normal distribution if every linear combination $a_1 X_1 + \cdots + a_n X_n$ has a normal distribution.

From (A.1) we have that $X = (X_1, \ldots, X_n)'$ is $n$-dimensional normal if and only if $a'X \in N(a'\mu, a'\Sigma a)$ for all $a = (a_1, \ldots, a_n)'$. Obviously, $X_k = 0 \cdot X_1 + \cdots + 1 \cdot X_k + \cdots + 0 \cdot X_n$ is normal, i.e., all marginal distributions in an $n$-dimensional normal distribution are one-dimensional normal. However, the reverse is not necessarily true: there are variables $X_1, \ldots, X_n$, each of which is one-dimensional normal, but such that the vector $(X_1, \ldots, X_n)$ is not $n$-dimensional normal. It is an important consequence of the definition that sums and differences of $n$-dimensional normal variables have a normal distribution.

If the covariance matrix $\Sigma$ is non-singular, the $n$-dimensional normal distribution has a probability density function, with $x = (x_1, \ldots, x_n)'$,

$$\frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}. \tag{A.2}$$

The distribution is then said to be non-singular. The density (A.2) is constant on every ellipsoid $(x - \mu)' \Sigma^{-1} (x - \mu) = C$ in $\mathbb{R}^n$.
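Formula (A.1) can be illustrated by simulation: for $X \in N(\mu, \Sigma)$, the sample mean and variance of $a'X$ should be close to $a'\mu$ and $a'\Sigma a$. A minimal NumPy sketch, with $\mu$, $\Sigma$, and $a$ chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],     # a symmetric, positive definite
                  [0.6, 1.0, 0.2],     # covariance matrix
                  [0.3, 0.2, 0.5]])
a = np.array([1.0, 2.0, -1.0])

# Draw many samples of X ~ N(mu, Sigma) and form the scalar a'X.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
aX = X @ a

print(aX.mean(), a @ mu)           # sample mean vs a'mu = -3.5
print(aX.var(), a @ Sigma @ a)     # sample variance vs a'Sigma a = 7.5
```

A histogram of `aX` would also look normal, as Definition A.1 requires for every choice of $a$.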

Note: the density function of an $n$-dimensional normal distribution is uniquely determined by the expectations and covariances.

Example A.1. Suppose $X_1, X_2$ have a two-dimensional normal distribution. If

$$\det \Sigma = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0,$$

then $\Sigma$ is non-singular, and

$$\Sigma^{-1} = \frac{1}{\det \Sigma} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}.$$

With $Q(x_1, x_2) = (x - \mu)' \Sigma^{-1} (x - \mu)$,

$$Q(x_1, x_2) = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left\{ (x_1 - m_1)^2 \sigma_{22} - 2 (x_1 - m_1)(x_2 - m_2) \sigma_{12} + (x_2 - m_2)^2 \sigma_{11} \right\}$$

$$= \frac{1}{1 - \rho^2} \left\{ \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right)^{\!2} - 2\rho \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right) \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right) + \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right)^{\!2} \right\},$$

where we also used the correlation coefficient $\rho = \sigma_{12} / \sqrt{\sigma_{11}\sigma_{22}}$, and

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}} \exp\left\{ -\frac{1}{2} Q(x_1, x_2) \right\}. \tag{A.3}$$

For variables with $m_1 = m_2 = 0$ and $\sigma_{11} = \sigma_{22} = \sigma^2$, the bivariate density is

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma^2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2\sigma^2 (1 - \rho^2)} \left( x_1^2 - 2\rho x_1 x_2 + x_2^2 \right) \right\}.$$

We see that, if $\rho = 0$, this is the density of two independent normal variables. Figure A.1 shows the density function $f_{X_1,X_2}(x_1, x_2)$ and level curves for a bivariate normal distribution with expectation $\mu = (0, 0)'$ and covariance matrix

$$\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$

The correlation coefficient is $\rho = 0.5$.

Remark A.1. If the covariance matrix $\Sigma$ is singular, hence non-invertible, then there exists at least one set of constants $a_1, \ldots, a_n$, not all equal to 0, such that $a'\Sigma a = 0$. From (A.1) it follows that $V[a'X] = 0$, which means that $a'X$ is constant, equal to $a'\mu$. The distribution of $X$ is then concentrated on a hyperplane $a'x = \text{constant}$ in $\mathbb{R}^n$. The distribution is said to be singular, and it has no density function in $\mathbb{R}^n$.
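The expanded standardized density at the end of Example A.1 (with $\sigma^2 = 1$ and $\rho = 0.5$, the case of the figure) can be checked numerically against the general matrix form (A.2); a short NumPy sketch:

```python
import numpy as np

rho = 0.5
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)
det = np.linalg.det(Sigma)                 # det Sigma = 1 - rho^2

def f_matrix(x):
    # General n-dimensional form (A.2) with n = 2, mu = (0, 0)'.
    x = np.asarray(x)
    return np.exp(-0.5 * x @ Sigma_inv @ x) / (2 * np.pi * np.sqrt(det))

def f_expanded(x1, x2):
    # Expanded standardized bivariate form with sigma^2 = 1.
    c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    q = (x1**2 - 2 * rho * x1 * x2 + x2**2) / (2 * (1 - rho**2))
    return c * np.exp(-q)

for pt in [(0.0, 0.0), (1.0, -0.5), (2.0, 1.5)]:
    print(f_matrix(pt), f_expanded(*pt))   # identical up to round-off
```

The peak value `f_matrix((0, 0))` is about 0.18, which matches the vertical scale of the surface in Figure A.1.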

Figure A.1: Two-dimensional normal density. Left: density function; right: elliptic level curves at the levels 0.01, 0.02, 0.05, 0.1, 0.5.

Remark A.2. Formula (A.1) implies that every covariance matrix $\Sigma$ is positive definite or positive semi-definite, i.e., $\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$ for all $a_1, \ldots, a_n$. Conversely, if $\Sigma$ is a symmetric, positive definite matrix of size $n \times n$, i.e., if

$$\sum_{j,k} a_j a_k \sigma_{jk} > 0 \quad \text{for all } (a_1, \ldots, a_n) \ne (0, \ldots, 0),$$

then (A.2) defines the density function of an $n$-dimensional normal distribution with expectations $m_k$ and covariances $\sigma_{jk}$. Thus, every symmetric, positive definite matrix is the covariance matrix of a non-singular distribution. Furthermore, for every symmetric, positive semi-definite matrix $\Sigma$, i.e., such that

$$\sum_{j,k} a_j a_k \sigma_{jk} \ge 0 \quad \text{for all } a_1, \ldots, a_n,$$

with equality holding for some choice of $(a_1, \ldots, a_n) \ne (0, \ldots, 0)$, there exists an $n$-dimensional normal distribution that has $\Sigma$ as its covariance matrix.

For $n$-dimensional normal variables, uncorrelated and independent are equivalent.

Theorem A.1. If the random variables $X_1, \ldots, X_n$ are $n$-dimensional normal and uncorrelated, then they are independent.

Proof. We show the theorem only for non-singular variables with a density function; it is true also for singular normal variables. If $X_1, \ldots, X_n$ are uncorrelated, $\sigma_{jk} = 0$ for $j \ne k$, then $\Sigma$, and also $\Sigma^{-1}$, are

diagonal matrices, i.e. (note that $\sigma_{jj} = V[X_j]$),

$$\det \Sigma = \prod_j \sigma_{jj}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & & 0 \\ & \ddots & \\ 0 & & \sigma_{nn} \end{pmatrix}.$$

This means that $(x - \mu)' \Sigma^{-1} (x - \mu) = \sum_j (x_j - \mu_j)^2 / \sigma_{jj}$, and the density (A.2) is

$$\prod_j \frac{1}{\sqrt{2\pi\sigma_{jj}}} \exp\left\{ -\frac{(x_j - m_j)^2}{2\sigma_{jj}} \right\}.$$

Hence, the joint density function for $X_1, \ldots, X_n$ is a product of the marginal densities, which says that the variables are independent.

A.2.1 Conditional normal distribution

This section deals with partial observations in a multivariate normal distribution. It is a special property of this distribution that conditioning on the observed values of a subset of the variables leads to a conditional distribution for the unobserved variables that is also normal. Furthermore, the expectation in the conditional distribution is linear in the observations, and the covariance matrix does not depend on the observed values. This property is particularly useful in prediction of Gaussian time series, as formulated by the Kalman filter.

Conditioning in the bivariate normal distribution

Let $X$ and $Y$ have a bivariate normal distribution with expectations $m_X$ and $m_Y$, variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, and with correlation coefficient $\rho = C[X,Y]/(\sigma_X \sigma_Y)$. The simultaneous density function is given by (A.3). The conditional density function for $X$, given that $Y = y$, is

$$f_{X \mid Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X \sqrt{2\pi(1 - \rho^2)}} \exp\left\{ -\frac{\bigl( x - m_X - \sigma_X \rho (y - m_Y)/\sigma_Y \bigr)^2}{2 \sigma_X^2 (1 - \rho^2)} \right\}.$$

Hence, the conditional distribution of $X$ given $Y = y$ is normal with expectation and variance

$$m_{X \mid Y=y} = m_X + \sigma_X \rho (y - m_Y)/\sigma_Y, \qquad \sigma^2_{X \mid Y=y} = \sigma_X^2 (1 - \rho^2).$$

Note: the conditional expectation depends linearly on the observed $y$-value, and the conditional variance is constant, independent of the observed value $y$.
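The bivariate conditioning formulas can be verified numerically: the ratio $f_{X,Y}(x,y)/f_Y(y)$ should coincide with the $N\bigl(m_{X \mid Y=y},\, \sigma_X^2(1-\rho^2)\bigr)$ density. A NumPy sketch with illustrative parameter values:

```python
import numpy as np

m_X, m_Y = 1.0, -2.0
s_X, s_Y = 2.0, 1.5          # standard deviations sigma_X, sigma_Y
rho = 0.6

def phi(x, m, s):
    # One-dimensional normal density with mean m, standard deviation s.
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def f_XY(x, y):
    # Bivariate normal density (A.3) with general means and variances.
    u = (x - m_X) / s_X
    v = (y - m_Y) / s_Y
    c = 1.0 / (2 * np.pi * s_X * s_Y * np.sqrt(1 - rho**2))
    return c * np.exp(-(u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2)))

y = 0.5
m_cond = m_X + s_X * rho * (y - m_Y) / s_Y   # conditional mean, linear in y
v_cond = s_X**2 * (1 - rho**2)               # conditional variance, free of y

for x in [-1.0, 0.0, 2.0]:
    lhs = f_XY(x, y) / phi(y, m_Y, s_Y)      # f_{X|Y=y}(x) by definition
    rhs = phi(x, m_cond, np.sqrt(v_cond))    # claimed normal density
    print(lhs, rhs)                          # equal up to round-off
```

Changing `y` shifts `m_cond` linearly but leaves `v_cond` untouched, exactly as the note above states.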

Conditioning in the multivariate normal distribution

Let $X = (X_1, \ldots, X_n)'$ and $Y = (Y_1, \ldots, Y_m)'$ be two multivariate normal variables, of size $n$ and $m$, respectively, such that $Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_m)'$ is $(n + m)$-dimensional normal. Denote the expectations $E[X] = m_X$, $E[Y] = m_Y$, and partition the covariance matrix for $Z$, with $\Sigma_{XY} = \Sigma_{YX}'$, as

$$\Sigma = \mathrm{Cov}\left[ \begin{pmatrix} X \\ Y \end{pmatrix} ; \begin{pmatrix} X \\ Y \end{pmatrix} \right] = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}. \tag{A.4}$$

If the covariance matrix $\Sigma$ is positive definite, the distribution of $(X, Y)$ has the density function

$$f_{X,Y}(x,y) = \frac{1}{(2\pi)^{(m+n)/2} \sqrt{\det \Sigma}} \, \exp\left\{ -\tfrac{1}{2} \begin{pmatrix} x - m_X \\ y - m_Y \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} x - m_X \\ y - m_Y \end{pmatrix} \right\},$$

while the $m$-dimensional density of $Y$ is

$$f_Y(y) = \frac{1}{(2\pi)^{m/2} \sqrt{\det \Sigma_{YY}}} \, \exp\left\{ -\tfrac{1}{2} (y - m_Y)' \Sigma_{YY}^{-1} (y - m_Y) \right\}.$$

To find the conditional density of $X$ given that $Y = y$,

$$f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \tag{A.5}$$

we need the following matrix property.

Theorem A.2 (Matrix inversion lemma). Let $B$ be a $p \times p$ matrix, $p = n + m$,

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$

where the sub-matrices have dimensions $n \times n$, $n \times m$, etc. Suppose $B$, $B_{11}$, $B_{22}$ are non-singular, and partition the inverse in the same way as $B$,

$$A = B^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

Then

$$A = \begin{pmatrix} (B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} & -B_{11}^{-1} B_{12} (B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1} \\ -B_{22}^{-1} B_{21} (B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} & (B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1} \end{pmatrix}.$$
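The block formula of Theorem A.2 is easy to verify numerically against a general matrix inverse. In this NumPy sketch, $B$ is a randomly generated symmetric positive definite matrix (so all the required sub-matrices are invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 3
M = rng.standard_normal((n + m, n + m))
B = M @ M.T + (n + m) * np.eye(n + m)   # symmetric positive definite

B11, B12 = B[:n, :n], B[:n, n:]
B21, B22 = B[n:, :n], B[n:, n:]
inv = np.linalg.inv

A11 = inv(B11 - B12 @ inv(B22) @ B21)   # Schur complement of B22, inverted
A22 = inv(B22 - B21 @ inv(B11) @ B12)   # Schur complement of B11, inverted
A12 = -inv(B11) @ B12 @ A22
A21 = -inv(B22) @ B21 @ A11

A = np.block([[A11, A12], [A21, A22]])
err = np.max(np.abs(A - inv(B)))
print(err)                              # numerically zero
```

The matrices $B_{11} - B_{12}B_{22}^{-1}B_{21}$ and $B_{22} - B_{21}B_{11}^{-1}B_{12}$ are the Schur complements of $B_{22}$ and $B_{11}$, which is exactly what turns into $\Sigma_{XX \cdot Y}$ in the conditioning theorem that follows.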

Proof. For the proof, see a matrix theory textbook, for example, [22].

Theorem A.3 (Conditional normal distribution). The conditional normal distribution for $X$, given that $Y = y$, is $n$-dimensional normal with expectation and covariance matrix

$$E[X \mid Y = y] = m_{X \mid Y=y} = m_X + \Sigma_{XY} \Sigma_{YY}^{-1} (y - m_Y), \tag{A.6}$$

$$C[X \mid Y = y] = \Sigma_{XX \cdot Y} = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}. \tag{A.7}$$

These formulas are easy to remember: the dimensions of the sub-matrices, for example in the covariance matrix $\Sigma_{XX \cdot Y}$, are the only ones possible for the matrix multiplications in the right-hand side to be meaningful.

Proof. To simplify calculations, we start with $m_X = m_Y = 0$, and add the expectations afterwards. The conditional distribution of $X$ given that $Y = y$ is, according to (A.5), the ratio between two multivariate normal densities, and hence it is of the form

$$c \exp\left\{ -\tfrac{1}{2} (x', y') \, \Sigma^{-1} \begin{pmatrix} x \\ y \end{pmatrix} + \tfrac{1}{2} y' \Sigma_{YY}^{-1} y \right\} = c \exp\left\{ -\tfrac{1}{2} Q(x, y) \right\},$$

where $c$ is a normalization constant, independent of $x$ and $y$. The matrix $\Sigma$ can be partitioned as in (A.4), and if we use the matrix inversion lemma, we find that

$$\Sigma^{-1} = A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \tag{A.8}$$

with

$$A_{11} = (\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})^{-1}, \qquad A_{12} = -(\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})^{-1} \Sigma_{XY} \Sigma_{YY}^{-1},$$

$$A_{21} = -\Sigma_{YY}^{-1} \Sigma_{YX} (\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})^{-1}, \qquad A_{22} = (\Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY})^{-1}.$$

We also see that

$$Q(x,y) = x' A_{11} x + x' A_{12} y + y' A_{21} x + y' A_{22} y - y' \Sigma_{YY}^{-1} y = (x - Cy)' A_{11} (x - Cy) + \tilde{Q}(y)$$

$$= x' A_{11} x - x' A_{11} C y - y' C' A_{11} x + y' C' A_{11} C y + \tilde{Q}(y),$$

for some matrix $C$ and quadratic form $\tilde{Q}(y)$ in $y$.

Here $A_{11} = \Sigma_{XX \cdot Y}^{-1}$, according to (A.7) and (A.8), while we can find $C$ by solving $A_{11} C = -A_{12}$, i.e.,

$$C = -A_{11}^{-1} A_{12} = \Sigma_{XY} \Sigma_{YY}^{-1},$$

according to (A.8). This is precisely the matrix in (A.6). If we reinstall the deleted $m_X$ and $m_Y$, we get the conditional density for $X$ given $Y = y$ to be of the form

$$c \exp\left\{ -\tfrac{1}{2} Q(x,y) \right\} = c \exp\left\{ -\tfrac{1}{2} (x - m_{X \mid Y=y})' \, \Sigma_{XX \cdot Y}^{-1} \, (x - m_{X \mid Y=y}) \right\},$$

which is the normal density we were looking for.

A.2.2 Complex normal variables

In most of the book, we have assumed all random variables to be real valued. In many applications, and also in the mathematical background, it is advantageous to consider complex random variables, simply defined as $Z = X + iY$, where $X$ and $Y$ have a bivariate distribution. The mean value of a complex random variable is simply $E[Z] = E[\Re Z] + i\,E[\Im Z]$, while the variance and covariances are defined with complex conjugation on the second variable,

$$C[Z_1, Z_2] = E[Z_1 \overline{Z_2}] - m_{Z_1} \overline{m_{Z_2}}, \qquad V[Z] = C[Z, Z] = E[|Z|^2] - |m_Z|^2.$$

Note that, for a complex $Z = X + iY$ with $V[X] = V[Y] = \sigma^2$,

$$C[Z, Z] = V[X] + V[Y] = 2\sigma^2, \qquad C[Z, \overline{Z}] = V[X] - V[Y] + 2i\,C[X,Y] = 2i\,C[X,Y].$$

Hence, if the real and imaginary parts are uncorrelated and have the same variance, then the complex variable $Z$ is uncorrelated with its own complex conjugate, $\overline{Z}$. For complex variables, one often uses the term orthogonal instead of uncorrelated.
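The complex-covariance identities above can be checked exactly on a small discrete joint distribution for $(X, Y)$; in this NumPy sketch the probabilities are chosen (purely for illustration) so that $E[X] = E[Y] = 0$, $V[X] = V[Y] = 1$, and $C[X,Y] = 0.6$:

```python
import numpy as np

# Joint distribution of (X, Y) on four points, with zero means.
pts = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
p = np.array([0.4, 0.4, 0.1, 0.1])

X, Y = pts[:, 0], pts[:, 1]
Z = X + 1j * Y

var_X = np.sum(p * X**2)             # sigma^2 = 1
cov_XY = np.sum(p * X * Y)           # C[X, Y] = 0.6

# C[Z1, Z2] = E[Z1 conj(Z2)] when the means are zero:
C_Z_Z = np.sum(p * Z * np.conj(Z))   # C[Z, Z] = V[X] + V[Y]
C_Z_Zbar = np.sum(p * Z * Z)         # C[Z, conj(Z)] = V[X] - V[Y] + 2i C[X,Y]

print(C_Z_Z)        # approximately 2 = 2 sigma^2
print(C_Z_Zbar)     # approximately 1.2i = 2i C[X, Y]
```

Setting the off-diagonal weights to make $C[X,Y] = 0$ would give $C[Z, \overline{Z}] = 0$, i.e., $Z$ orthogonal to its own conjugate, as stated above.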