Statistics 203: Introduction to Regression and Analysis of Variance

Multiple Linear Regression + Multivariate Normal

Jonathan Taylor
Today

- Multiple linear regression.
- Some proofs: the multivariate normal distribution.
Multiple linear regression

- Specifying the model.
- Fitting the model: least squares.
- Interpretation of the coefficients.
Model

Rather than a single predictor, we now have more than one predictor, say $p - 1$ of them:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_{p-1} X_{i,p-1} + \varepsilon_i.$$
The errors $(\varepsilon_i)_{1 \le i \le n}$ are assumed independent $N(0, \sigma^2)$, as in simple linear regression. The coefficients are called (partial) regression coefficients because they allow for the (partial) effect of the other variables.
Design matrix

Define the $n \times p$ matrix
$$X = \begin{pmatrix} 1 & X_{11} & X_{12} & \dots & X_{1,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{n,p-1} \end{pmatrix}$$
and the column vectors $X_j = (X_{1j}, \dots, X_{nj})^t$. The model can be expressed as
$$Y = X\beta + \varepsilon.$$
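To make the setup concrete, here is a minimal numpy sketch of building such a design matrix and generating responses from the model. The library choice, dimensions, coefficients, and random data are all illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4                    # n observations, p columns (intercept + p-1 predictors)

# p-1 predictor columns X_1, ..., X_{p-1} (hypothetical data)
predictors = rng.normal(size=(n, p - 1))

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), predictors])

# Generate responses from Y = X beta + eps, with eps_i iid N(0, sigma^2).
beta = np.array([1.0, 2.0, -1.0, 0.5])   # hypothetical true coefficients
sigma = 0.7
Y = X @ beta + rng.normal(scale=sigma, size=n)
```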
Fitting the model: SSE

Just as in simple linear regression, the model is fit by minimizing
$$SSE(\beta_0, \dots, \beta_{p-1}) = \sum_{i=1}^n \Bigl( Y_i - \bigl( \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij} \bigr) \Bigr)^2.$$
The minimizers $\widehat{\beta} = (\widehat{\beta}_0, \dots, \widehat{\beta}_{p-1})$ are the least squares estimates, and they are also normally distributed, as in simple linear regression. When $X$ has full rank there is an explicit expression (next slide):
$$\widehat{\beta} = (X^t X)^{-1} X^t Y.$$
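A short sketch of computing the least squares estimates both from the closed form and with a library routine, as a numerical cross-check; the synthetic data below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.7, size=n)

# Closed form (X full rank): beta_hat = (X^t X)^{-1} X^t Y.
# Solving the linear system is preferred over forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against the library least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```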
Solving for $\widehat{\beta}$

Setting the partial derivatives of SSE to zero gives the normal equations
$$\frac{\partial SSE}{\partial \beta_j}\Big|_{\widehat{\beta}} = -2\,(Y - X\widehat{\beta})^t X_j = 0, \qquad 0 \le j \le p-1.$$
Equivalently,
$$(Y - X\widehat{\beta})^t X = 0 \;\Longrightarrow\; Y^t X = \widehat{\beta}^t (X^t X) \;\Longrightarrow\; X^t Y = (X^t X)\,\widehat{\beta} \;\Longrightarrow\; \widehat{\beta} = (X^t X)^{-1} X^t Y.$$
Properties of $\widehat{\beta}$ follow after some facts about multivariate normal random vectors.
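The normal equations say the residual vector is orthogonal to every column of $X$. The sketch below verifies this numerically on arbitrary synthetic data (an illustration, not part of the slides).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residual = Y - X @ beta_hat

# Normal equations: X^t (Y - X beta_hat) = 0,
# i.e. the residual is orthogonal to every column of X.
print(X.T @ residual)                        # numerically ~ zero vector
assert np.allclose(X.T @ residual, 0, atol=1e-8)
```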
Multivariate normal

$Z = (Z_1, \dots, Z_n) \in \mathbb{R}^n$ is multivariate Gaussian if, for every $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$, the linear combination $\langle \alpha, Z \rangle = \sum_{i=1}^n \alpha_i Z_i$ is Gaussian.

- Mean vector: $\mu \in \mathbb{R}^n$ has components $\mu_i = E(Z_i)$.
- Covariance matrix: $\Sigma$, a non-negative definite $n \times n$ matrix with $\Sigma_{ij} = \mathrm{Cov}(Z_i, Z_j)$.
- Non-negative (positive) definite: for any $\alpha \in \mathbb{R}^n$, $\alpha^t \Sigma \alpha \ge 0$ ($> 0$).

We write $Z \sim N(\mu, \Sigma)$.
Multivariate normal

For any $m \times n$ matrix $A$,
$$AZ \sim N(A\mu, A \Sigma A^t).$$
If $\Sigma$ is positive definite, then the density of $Z$ is
$$f_Z(z) = (2\pi)^{-n/2} |\Sigma|^{-1/2} e^{-(z - \mu)^t \Sigma^{-1} (z - \mu)/2}.$$
If $\Sigma$ is only non-negative definite (i.e. $\mathrm{rank}(\Sigma) < n$), then $Z$ lives on a lower-dimensional subspace and has no density on $\mathbb{R}^n$.
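The transformation rule $AZ \sim N(A\mu, A\Sigma A^t)$ can be checked by simulation; the mean, covariance, and matrix $A$ below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical mean vector and covariance matrix in R^3.
mu = np.array([1.0, -2.0, 0.5])
B = rng.normal(size=(3, 3))
Sigma = B @ B.T                      # positive definite by construction

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])     # any 2x3 matrix

# Sample Z ~ N(mu, Sigma); AZ should be N(A mu, A Sigma A^t).
Z = rng.multivariate_normal(mu, Sigma, size=200_000)
AZ = Z @ A.T

print(AZ.mean(axis=0))               # ~ A @ mu
print(np.cov(AZ, rowvar=False))      # ~ A @ Sigma @ A.T
print(A @ mu, A @ Sigma @ A.T, sep="\n")
```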
Projections

If an $n \times n$ matrix $P$ satisfies

- $P^2 = P$ (idempotent)
- $P = P^t$ (symmetric)

then $P$ is a projection matrix. That is, there exists a subspace $L \subset \mathbb{R}^n$ of dimension $r \le n$ such that, for any $z \in \mathbb{R}^n$, $Pz$ is the projection of $z$ onto $L$. We write $P_L$ for the matrix that projects onto the subspace $L$. Given any orthonormal basis $\{w_1, \dots, w_r\}$ of $L$,
$$P_L z = \sum_{j=1}^r \langle z, w_j \rangle \, w_j.$$
If $P_L$ is a projection matrix, then $I - P_L = P_{L^\perp}$ is also a projection matrix, and it projects onto $L^\perp$, the orthogonal complement of $L$ in $\mathbb{R}^n$.
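A small sketch of these facts, assuming numpy's QR factorization to supply the orthonormal basis: it checks the basis formula for $P_L$, idempotence, symmetry, and the complementary projection $I - P_L$. All dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2

# An orthonormal basis {w_1, ..., w_r} of a subspace L, via QR.
W, _ = np.linalg.qr(rng.normal(size=(n, r)))   # columns are orthonormal

# P_L z = sum_j <z, w_j> w_j, which in matrix form is P_L = W W^t.
P = W @ W.T
z = rng.normal(size=n)
assert np.allclose(P @ z, sum((z @ W[:, j]) * W[:, j] for j in range(r)))

# P_L is idempotent and symmetric; I - P_L projects onto L-perp.
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
Q = np.eye(n) - P
assert np.allclose((P @ z) @ (Q @ z), 0.0)     # the two pieces are orthogonal
```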
Projections

Let $\{X_1, \dots, X_r\}$ be a set of linearly independent vectors in $\mathbb{R}^n$, and let
$$X = \begin{pmatrix} X_1 & X_2 & \dots & X_r \end{pmatrix}$$
be the $n \times r$ matrix made by concatenating the $X_i$'s. If $L = \mathrm{span}(X_1, \dots, X_r)$ is the subspace of $\mathbb{R}^n$ spanned by $\{X_1, \dots, X_r\}$, then
$$P_L = X (X^t X)^{-1} X^t.$$
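The same projection properties can be checked for this formula; in the sketch below the matrix is random, so its columns are linearly independent with probability one.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 8, 3
X = rng.normal(size=(n, r))          # linearly independent columns (a.s.)

# P_L = X (X^t X)^{-1} X^t, the projection onto the column span of X.
P = X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # projection matrix
assert np.allclose(P @ X, X)                            # fixes the columns of X
print(np.trace(P))                                      # = r = dim(L)
```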
Identity covariance

If $\Sigma = \sigma^2 I$ and $L$ is a subspace of $\mathbb{R}^n$, then
$$P_L Z \sim N(P_L \mu, \sigma^2 P_L),$$
where $P_L$ is the projection matrix onto $L$.

- If $P_L \mu = 0$, then $\|P_L Z\|^2 / \sigma^2 \sim \chi^2_{\dim(L)}$, and $\dim(L) = \mathrm{Tr}(P_L)$.
- If $P_L \mu \ne 0$, then $\|P_L Z\|^2 / \sigma^2 \sim \chi^2_{\dim(L),\, \|P_L \mu\|^2/\sigma^2}$, a non-central $\chi^2$ distribution with non-centrality parameter $\|P_L \mu\|^2 / \sigma^2$.
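A simulation sketch of the central case $P_L \mu = 0$ (taking $\mu = 0$ for simplicity): the scaled squared norm of the projection should follow $\chi^2_{\dim(L)}$ with $\dim(L) = \mathrm{Tr}(P_L)$. The scipy comparison and all dimensions here are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, r, sigma = 10, 3, 2.0
W, _ = np.linalg.qr(rng.normal(size=(n, r)))
P = W @ W.T                           # projection onto an r-dimensional L

# Z ~ N(0, sigma^2 I), so mu = 0 and P_L mu = 0.
Z = rng.normal(scale=sigma, size=(100_000, n))
stat = np.sum((Z @ P) ** 2, axis=1) / sigma**2   # ||P_L Z||^2 / sigma^2

# dim(L) = Tr(P_L) = r; compare the sample against chi^2_r.
print(np.trace(P))                               # ~ 3.0
print(stat.mean(), stats.chi2(df=r).mean())      # both ~ r
print(stats.kstest(stat, stats.chi2(df=r).cdf).statistic)   # small
```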
Properties of the multiple regression $\widehat{\beta}$

$$\widehat{\beta} \sim N\bigl(\beta, \sigma^2 (X^t X)^{-1}\bigr).$$
As in simple regression,
$$\widehat{\sigma}^2 = MSE = \frac{SSE}{n - p} \sim \sigma^2 \, \frac{\chi^2_{n-p}}{n - p},$$
and $\widehat{\sigma}^2$ is independent of $\widehat{\beta}$.

Gauss-Markov theorem: the least squares estimates are the minimum variance linear unbiased estimators.
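Finally, a Monte Carlo sketch of these sampling properties: over repeated fits, $\widehat{\beta}$ should be unbiased with covariance $\sigma^2 (X^t X)^{-1}$, and $\widehat{\sigma}^2 = SSE/(n-p)$ should be unbiased for $\sigma^2$. All numbers below are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma = 40, 3, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([2.0, -1.0, 0.5])

reps = 20_000
beta_hats = np.empty((reps, p))
sigma2_hats = np.empty(reps)
for k in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    beta_hats[k] = b
    sigma2_hats[k] = np.sum((Y - X @ b) ** 2) / (n - p)   # MSE = SSE/(n-p)

print(beta_hats.mean(axis=0))                  # ~ beta (unbiased)
print(np.cov(beta_hats, rowvar=False))         # ~ sigma^2 (X^t X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
print(sigma2_hats.mean())                      # ~ sigma^2 (unbiased)
```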