Multiple Linear Regression + Multivariate Normal

Statistics 203: Introduction to Regression and Analysis of Variance
Multiple Linear Regression + Multivariate Normal
Jonathan Taylor

Today
Multiple linear regression.
Some proofs: multivariate normal distribution.

Multiple linear regression
Specifying the model.
Fitting the model: least squares.
Interpretation of the coefficients.

Model
Basically, rather than one predictor, we have more than one predictor, say p − 1.
Y_i = β_0 + β_1 X_{i1} + · · · + β_{p−1} X_{i,p−1} + ε_i
Errors (ε_i)_{1 ≤ i ≤ n} are assumed independent N(0, σ²), as in simple linear regression.
Coefficients are called (partial) regression coefficients because they allow for the (partial) effect of other variables.

Design matrix
Define the n × p matrix
X = [ 1  X_{11}  X_{12}  ...  X_{1,p−1}
      ⋮    ⋮      ⋮            ⋮
      1  X_{n1}  X_{n2}  ...  X_{n,p−1} ]
and the column vectors X_j = (X_{1j}, ..., X_{nj}).
Model can be expressed as Y = Xβ + ε.
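As a concrete illustration of the matrix form Y = Xβ + ε, here is a minimal numpy sketch that builds a design matrix with an intercept column and simulates responses; the sample size, coefficients, and noise level are made-up values chosen only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    n, p = 100, 3                        # n observations, p columns (intercept + 2 predictors)
    beta = np.array([1.0, 2.0, -0.5])    # made-up true coefficients (beta_0, beta_1, beta_2)
    sigma = 0.3                          # made-up error standard deviation

    # Design matrix: a leading column of ones, then the p - 1 predictors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

    # Y = X beta + eps with independent N(0, sigma^2) errors.
    eps = rng.normal(scale=sigma, size=n)
    Y = X @ beta + eps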

Fitting the model: SSE
Just as in simple linear regression, the model is fit by minimizing
SSE(β_0, ..., β_{p−1}) = Σ_{i=1}^n ( Y_i − (β_0 + Σ_{j=1}^{p−1} β_j X_{ij}) )².
Minimizers: β̂ = (β̂_0, ..., β̂_{p−1}) are the least squares estimates and are also normally distributed as in simple linear regression.
Explicit expression when X is full rank (next slide):
β̂ = (X^t X)^{−1} X^t Y.
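The closed-form estimate can be checked numerically. A small sketch, again with made-up data, that computes β̂ = (X^t X)^{−1} X^t Y via a linear solve, compares it to numpy's generic least squares routine, and confirms that perturbing β̂ does not lower the SSE:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

    def sse(b):
        """Residual sum of squares SSE(b) = ||Y - X b||^2."""
        resid = Y - X @ b
        return resid @ resid

    # Closed form when X has full rank: beta_hat = (X^t X)^{-1} X^t Y.
    # Solving the linear system is preferred to forming the inverse explicitly.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

    # Cross-check against numpy's generic least squares routine.
    beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
    assert np.allclose(beta_hat, beta_lstsq)

    # No perturbation of beta_hat should decrease the SSE.
    assert sse(beta_hat) <= sse(beta_hat + 0.01)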

Solving for β̂
Normal equations:
∂SSE/∂β_j |_{β = β̂} = −2 (Y − Xβ̂)^t X_j = 0,  0 ≤ j ≤ p − 1.
Equivalent to
(Y − Xβ̂)^t X = 0
Y^t X = β̂^t (X^t X)
X^t Y = (X^t X) β̂
β̂ = (X^t X)^{−1} X^t Y.
Properties: after some facts about multivariate normal random vectors.
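Numerically, the normal equations say the residual vector Y − Xβ̂ is orthogonal to every column of X. A brief check on simulated data (all values made up):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat

    # Normal equations: (Y - X beta_hat)^t X_j = 0 for every column X_j.
    print(X.T @ resid)                     # numerically ~ the zero vector
    assert np.allclose(X.T @ resid, 0.0, atol=1e-8)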

Multivariate normal
Z = (Z_1, ..., Z_n) ∈ R^n is multivariate Gaussian if, for every α = (α_1, ..., α_n) ∈ R^n, ⟨α, Z⟩ = Σ_{i=1}^n α_i Z_i is Gaussian.
Mean vector: µ ∈ R^n has components µ_i = E(Z_i).
Covariance matrix: Σ, a non-negative definite n × n matrix with Σ_{ij} = Cov(Z_i, Z_j).
Non-negative (positive) definite: for any α ∈ R^n, α^t Σ α ≥ 0 (> 0).
We write Z ∼ N(µ, Σ).
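The defining property can be illustrated by simulation: for any fixed α, ⟨α, Z⟩ should be univariate Gaussian with mean α^t µ and variance α^t Σ α. A rough Monte Carlo sketch with made-up µ and Σ:

    import numpy as np

    rng = np.random.default_rng(2)

    mu = np.array([1.0, -1.0, 0.5])                  # made-up mean vector
    A = rng.normal(size=(3, 3))
    Sigma = A @ A.T                                  # a positive definite covariance matrix

    Z = rng.multivariate_normal(mu, Sigma, size=100_000)   # rows are draws of Z

    alpha = np.array([0.3, -2.0, 1.0])               # an arbitrary fixed direction
    proj = Z @ alpha                                 # <alpha, Z> for each draw

    print(proj.mean(), alpha @ mu)                   # ~ alpha^t mu
    print(proj.var(), alpha @ Sigma @ alpha)         # ~ alpha^t Sigma alpha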

Multivariate normal
For any m × n matrix A, AZ ∼ N(Aµ, AΣA^t).
If Σ is positive definite then the density of Z is
f_Z(z) = (2π)^{−n/2} |Σ|^{−1/2} e^{−(z−µ)^t Σ^{−1} (z−µ)/2}.
If Σ is only non-negative definite (i.e. rank of Σ < n) then Z lives on a lower-dimensional space and has no density on R^n.
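Assuming scipy is available, the explicit density formula can be checked against scipy.stats.multivariate_normal for a positive definite Σ; the numbers below are made up:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 1.0])                        # made-up mean
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])                   # positive definite covariance
    z = np.array([0.3, 0.7])                         # point at which to evaluate the density

    n = len(mu)
    diff = z - mu
    quad = diff @ np.linalg.solve(Sigma, diff)       # (z - mu)^t Sigma^{-1} (z - mu)
    density = (2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-quad / 2)

    print(density)
    print(multivariate_normal(mean=mu, cov=Sigma).pdf(z))   # the two values should agree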

Projections
If an n × n matrix P satisfies
P² = P (idempotent)
P = P^t (symmetric)
then P is a projection matrix. That is, there exists a subspace L ⊂ R^n of dimension r ≤ n such that for any z ∈ R^n, Pz is the projection of z onto L.
We write P_L to indicate the subspace L that P projects onto.
Given any orthonormal basis {w_1, ..., w_r} of L,
P_L z = Σ_{j=1}^r ⟨z, w_j⟩ w_j.
If P_L is a projection matrix then I − P_L = P_{L^⊥} is also a projection matrix, which projects onto L^⊥, the orthogonal complement of L in R^n.
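A quick numerical check of the two defining properties, using an orthonormal basis obtained from a QR decomposition (dimensions chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(4)

    n, r = 6, 2
    W, _ = np.linalg.qr(rng.normal(size=(n, r)))     # columns w_1, ..., w_r: orthonormal basis of L

    # P_L z = sum_j <z, w_j> w_j, i.e. P_L = W W^t.
    P = W @ W.T

    assert np.allclose(P @ P, P)                     # idempotent: P^2 = P
    assert np.allclose(P, P.T)                       # symmetric: P = P^t

    # I - P_L projects onto the orthogonal complement of L.
    z = rng.normal(size=n)
    assert np.allclose((P @ z) @ ((np.eye(n) - P) @ z), 0.0)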

Projections
Let {X_1, ..., X_r} be a set of linearly independent vectors in R^n and let
X = (X_1 X_2 ... X_r)
be the n × r matrix made by concatenating the X_i's.
If L = span(X_1, ..., X_r) is the subspace of R^n spanned by {X_1, ..., X_r}, then
P_L = X (X^t X)^{−1} X^t.
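For a matrix X with linearly independent columns, the formula X(X^t X)^{−1} X^t should agree with the projection built from an orthonormal basis of span(X); a short sketch with a random made-up X:

    import numpy as np

    rng = np.random.default_rng(5)

    n, r = 8, 3
    X = rng.normal(size=(n, r))                      # almost surely linearly independent columns

    P_formula = X @ np.linalg.solve(X.T @ X, X.T)    # X (X^t X)^{-1} X^t without forming the inverse
    Q, _ = np.linalg.qr(X)
    P_qr = Q @ Q.T                                   # projection from an orthonormal basis of span(X)

    assert np.allclose(P_formula, P_qr)

    # Projecting a vector already in span(X) leaves it unchanged.
    v = X @ rng.normal(size=r)
    assert np.allclose(P_formula @ v, v)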

Identity covariance
If Σ = σ² I and L is a subspace of R^n then
P_L Z ∼ N(P_L µ, σ² P_L)
where P_L is the projection matrix onto L.
If P_L µ = 0 then ‖P_L Z‖²/σ² ∼ χ²_{dim(L)}, and dim(L) = Tr(P_L).
If P_L µ ≠ 0 then ‖P_L Z‖²/σ² ∼ χ²_{dim(L), ‖P_L µ‖²/σ²}, a non-central χ² distribution.
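A Monte Carlo sanity check of the central case: with Σ = σ² I and µ chosen in the orthogonal complement of L (so P_L µ = 0), the statistic ‖P_L Z‖²/σ² should have mean dim(L) = Tr(P_L) and variance 2 dim(L), matching a χ²_{dim(L)} distribution. Dimensions and σ are made up:

    import numpy as np

    rng = np.random.default_rng(6)

    n, r, sigma = 10, 3, 2.0
    W, _ = np.linalg.qr(rng.normal(size=(n, r)))     # orthonormal basis of L
    P = W @ W.T                                      # P_L, with Tr(P_L) = dim(L) = r

    mu = (np.eye(n) - P) @ rng.normal(size=n)        # mu in the orthogonal complement, so P_L mu = 0

    Z = mu + sigma * rng.normal(size=(100_000, n))   # draws of Z ~ N(mu, sigma^2 I)
    stat = np.sum((Z @ P.T) ** 2, axis=1) / sigma**2 # ||P_L Z||^2 / sigma^2 for each draw

    print(np.trace(P))                               # dim(L) = 3
    print(stat.mean())                               # ~ dim(L), the chi^2_{dim(L)} mean
    print(stat.var())                                # ~ 2 dim(L), the chi^2_{dim(L)} variance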

Properties of β̂
β̂ ∼ N(β, σ² (X^t X)^{−1}).
As in simple regression,
σ̂² = MSE = SSE/(n − p) ∼ σ² χ²_{n−p}/(n − p),
independent of β̂.
The least squares estimates are minimum variance linear unbiased estimators (Gauss-Markov theorem).
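A simulation sketch of these properties: refitting on fresh errors, the sampling covariance of β̂ should be close to σ²(X^t X)^{−1} and the average of σ̂² close to σ². All constants are made up.

    import numpy as np

    rng = np.random.default_rng(7)

    n, p, sigma = 60, 3, 0.5
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # fixed design
    beta = np.array([1.0, 2.0, -0.5])

    XtX_inv = np.linalg.inv(X.T @ X)

    beta_hats, sigma2_hats = [], []
    for _ in range(10_000):
        Y = X @ beta + sigma * rng.normal(size=n)
        b = np.linalg.solve(X.T @ X, X.T @ Y)
        resid = Y - X @ b
        beta_hats.append(b)
        sigma2_hats.append(resid @ resid / (n - p))    # sigma_hat^2 = SSE / (n - p)

    beta_hats = np.array(beta_hats)

    print(np.cov(beta_hats, rowvar=False))             # ~ sigma^2 (X^t X)^{-1}
    print(sigma**2 * XtX_inv)
    print(np.mean(sigma2_hats), sigma**2)              # sigma_hat^2 is unbiased for sigma^2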