Quadratic forms, Cochran's theorem, degrees of freedom, and all that




Quadratic forms, Cochran's theorem, degrees of freedom, and all that. Dr. Frank Wood, fwood@stat.columbia.edu. Linear Regression Models.

Why We Care: Cochran's theorem tells us about the distributions of partitioned sums of squares of normally distributed random variables. Traditional linear regression analysis relies on statistical claims about the distributions of sums of squares of normally distributed random variables (and of ratios between them); e.g., in the simple normal regression model, SSE/σ² = Σ(Y_i − Ŷ_i)²/σ² ~ χ²(n − 2). Where does this come from?

Outline: review some properties of multivariate Gaussian distributions and sums of squares; establish that the multivariate Gaussian sum of squares is χ²(n) distributed; provide intuition for Cochran's theorem; prove a lemma in support of Cochran's theorem; prove Cochran's theorem; connect Cochran's theorem back to matrix linear regression.

Preliminaries: Let Y_1, Y_2, ..., Y_n be random variables with Y_i ~ N(µ_i, σ_i²). As usual, define Z_i = (Y_i − µ_i)/σ_i. Then we know that each Z_i ~ N(0, 1). (From Wackerly et al., p. 306.)

Theorem 0, Statement: The sum of squares of n independent N(0, 1) random variables is χ² distributed with n degrees of freedom: Σ_{i=1}^n Z_i² ~ χ²(n).

Theorem 0, Givens: The proof requires knowing two facts. 1. Z_i² ~ χ²(ν) with ν = 1, or equivalently Z_i² ~ Γ(ν/2, 2) with ν = 1 (homework; midterm?). 2. If Y_1, Y_2, ..., Y_n are independent random variables with moment generating functions m_{Y_1}(t), m_{Y_2}(t), ..., m_{Y_n}(t), then for U = Y_1 + Y_2 + ... + Y_n we have m_U(t) = m_{Y_1}(t) · m_{Y_2}(t) ··· m_{Y_n}(t), and, by the uniqueness of moment generating functions, m_U(t) fully characterizes the distribution of U.

Theorem 0, Proof: The moment generating function of a χ²(ν) distribution is (Wackerly et al., back cover) m_{Z_i²}(t) = (1 − 2t)^{−ν/2}, where here ν = 1. The moment generating function of V = Σ_{i=1}^n Z_i² is (by the given prerequisite) m_V(t) = m_{Z_1²}(t) · m_{Z_2²}(t) ··· m_{Z_n²}(t).

Theorem 0, Proof (cont.): But m_V(t) = m_{Z_1²}(t) · m_{Z_2²}(t) ··· m_{Z_n²}(t) is just m_V(t) = (1 − 2t)^{−1/2} · (1 − 2t)^{−1/2} ··· (1 − 2t)^{−1/2} = (1 − 2t)^{−n/2}, which is, by inspection, the moment generating function of a χ²(n) random variable. By uniqueness, this implies that V = Σ_{i=1}^n Z_i² ~ χ²(n).
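As a quick sanity check (my addition, not part of the original slides), the following Python sketch draws sums of n squared standard normals and compares their empirical distribution with χ²(n):

```python
# A minimal Monte Carlo check of Theorem 0 (illustration only):
# sums of n squared N(0,1) draws should follow a chi-square(n) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

Z = rng.standard_normal((reps, n))   # reps independent vectors of n N(0,1) variables
V = (Z ** 2).sum(axis=1)             # V = sum_i Z_i^2 for each replicate

# Compare the empirical distribution of V with chi-square(n).
ks = stats.kstest(V, "chi2", args=(n,))
print(f"KS statistic vs chi2({n}): {ks.statistic:.4f}")   # should be small
print(f"mean {V.mean():.3f} (expect {n}), var {V.var():.3f} (expect {2*n})")
```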

Quadratic Forms and Cochran's Theorem: Quadratic forms of normal random variables are of great importance in many branches of statistics, e.g. least squares, ANOVA, regression analysis, and so on. The general idea: split the sum of the squares of the observations into a number of quadratic forms, each corresponding to some cause of variation.

Quadratic Forms and Cochran's Theorem: The conclusion of Cochran's theorem is that, under the assumption of normality, the various quadratic forms are independent and χ² distributed. This fact is the foundation upon which many statistical tests rest.

Preliminaries, A Common Quadratic Form: Let x ~ N(µ, Λ). Consider the (important) quadratic form that appears in the exponent of the normal density, (x − µ)'Λ⁻¹(x − µ). In the special case µ = 0 and Λ = I, this reduces to x'x, which, by what we just proved, is χ²(n) distributed. Let's prove that this holds in the general case.

Lemma 1: Suppose x ~ N(µ, Λ) with Λ > 0. Then, where n is the dimension of x, (x − µ)'Λ⁻¹(x − µ) ~ χ²(n). Proof: Set y = Λ^{−1/2}(x − µ); then E(y) = 0 and Cov(y) = Λ^{−1/2} Λ Λ^{−1/2} = I. That is, y ~ N(0, I), and thus (x − µ)'Λ⁻¹(x − µ) = y'y ~ χ²(n). Note: this transformation is sometimes called sphering the data.
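To make the sphering step concrete, here is a hedged Python sketch (the covariance matrix and mean below are arbitrary choices of mine, not from the slides) that builds Λ^{−1/2} from an eigendecomposition and checks that the quadratic form is χ²(n) distributed:

```python
# A small sketch of the "sphering" step in Lemma 1 (illustration only).
# Lambda^{-1/2} is built from the eigendecomposition of a positive definite Lambda.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 3
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((n, n))
Lam = A @ A.T + n * np.eye(n)        # a positive definite covariance matrix

w, U = np.linalg.eigh(Lam)           # Lam = U diag(w) U'
Lam_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T

x = rng.multivariate_normal(mu, Lam, size=50_000)
y = (x - mu) @ Lam_inv_sqrt          # rows of y ~ N(0, I), the sphered data
q = (y ** 2).sum(axis=1)             # (x - mu)' Lam^{-1} (x - mu) for each row

print(stats.kstest(q, "chi2", args=(n,)))   # should not reject chi2(n)
```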

The Path. What do we have? (x − µ)'Λ⁻¹(x − µ) = y'y ~ χ²(n). Where are we going? (Cochran's theorem.) Let X_1, X_2, ..., X_n be independent N(0, σ²)-distributed random variables, and suppose that Σ_{i=1}^n X_i² = Q_1 + Q_2 + ... + Q_k, where Q_1, Q_2, ..., Q_k are positive semi-definite quadratic forms in the random variables X_1, X_2, ..., X_n, that is, Q_i = X'A_iX, i = 1, 2, ..., k.

Cochran's Theorem, Statement: Set rank(A_i) = r_i, i = 1, 2, ..., k. If r_1 + r_2 + ... + r_k = n, then 1. Q_1, Q_2, ..., Q_k are independent, and 2. Q_i ~ σ²χ²(r_i). Reminder: the rank of a matrix is the number of linearly independent rows/columns in the matrix or, equivalently (for the symmetric matrices used here), the number of its non-zero eigenvalues.

Closing the Gap: We start with a lemma that will help us prove Cochran's theorem; this lemma is a linear algebra result. We also need a couple of results regarding linear transformations of normal vectors, and we attend to those first.

Linear Transformations. Theorem 1: Let X be a normal random vector. The components of X are independent iff they are uncorrelated. (Demonstrated in class by setting Cov(X_i, X_j) = 0 and then deriving the product form of the joint density.)

Linear Transformations. Theorem 2: Let X ~ N(µ, Λ) and set Y = C'X, where the orthogonal matrix C is such that C'ΛC = D. Then Y ~ N(C'µ, D); the components of Y are independent; and Var(Y_k) = λ_k, k = 1, ..., n, where λ_1, λ_2, ..., λ_n are the eigenvalues of Λ. (Look up the singular value decomposition or, for symmetric Λ, the spectral decomposition.)

Orthogonal Transforms of iid N(0, σ²) Variables: Let X ~ N(µ, σ²I) where σ² > 0, and set Y = CX, where C is an orthogonal matrix. Then Cov(Y) = Cσ²IC' = σ²I. This leads to the following theorem: Let X ~ N(µ, σ²I) where σ² > 0, let C be an arbitrary orthogonal matrix, and set Y = CX. Then Y ~ N(Cµ, σ²I); in particular, Y_1, Y_2, ..., Y_n are independent normal random variables with the same variance σ².
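A quick numerical illustration of this fact (my own sketch; the random orthogonal matrix is generated via a QR factorization, an assumption not mentioned in the slides): applying an orthogonal C to rows drawn from N(0, σ²I) leaves the covariance essentially unchanged.

```python
# An orthogonal C leaves the covariance sigma^2 I unchanged, so CX has
# independent components with the same variance (illustration only).
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 4, 2.5

C, _ = np.linalg.qr(rng.standard_normal((n, n)))          # a random orthogonal matrix
X = np.sqrt(sigma2) * rng.standard_normal((100_000, n))   # rows ~ N(0, sigma^2 I)
Y = X @ C.T                                               # Y = C X for each row

print(np.round(np.cov(Y, rowvar=False), 2))               # approximately sigma^2 * I
```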

Where We Are: Now we can transform N(µ, Λ) random variables into N(0, D) random variables. We know that an orthogonal transformation of a random vector X ~ N(µ, σ²I) results in a transformed vector whose elements are still independent. The preliminaries are over; now we proceed to prove a lemma that forms the backbone of Cochran's theorem.

Lemma 2: Let x_1, x_2, ..., x_n be real numbers. Suppose that Σ x_i² can be split into a sum of positive semi-definite quadratic forms, that is, Σ_{i=1}^n x_i² = Q_1 + Q_2 + ... + Q_k, where Q_i = x'A_ix and rank(Q_i) = rank(A_i) = r_i, i = 1, 2, ..., k. If Σ_{i=1}^k r_i = n, then there exists an orthogonal matrix C such that, with x = Cy, we have:

Lemma 2 (cont.): Q_1 = y_1² + y_2² + ... + y_{r_1}², Q_2 = y_{r_1+1}² + y_{r_1+2}² + ... + y_{r_1+r_2}², Q_3 = y_{r_1+r_2+1}² + y_{r_1+r_2+2}² + ... + y_{r_1+r_2+r_3}², ..., Q_k = y_{n−r_k+1}² + y_{n−r_k+2}² + ... + y_n². Remark: note that different quadratic forms contain different y-variables, and that the number of terms in each Q_i equals the rank, r_i, of Q_i.

What's the Point? We won't construct this matrix C; it's just useful for proving Cochran's theorem. What we care about is that the y_i²'s end up in different sums: we'll use this to prove independence of the different quadratic forms.

Proof: We prove the k = 2 case; the general case is obtained by induction [Gut 95]. For k = 2 we have Q = Σ_{i=1}^n x_i² = x'A_1x + x'A_2x (= Q_1 + Q_2), where A_1 and A_2 are positive semi-definite matrices with ranks r_1 and r_2 respectively and r_1 + r_2 = n.

Proof (cont.): Since A_1 is symmetric (it is positive semi-definite by assumption), there exists an orthogonal matrix C such that C'A_1C = D, where D is a diagonal matrix whose diagonal elements are the eigenvalues of A_1: λ_1, λ_2, ..., λ_n. Since rank(A_1) = r_1, r_1 of the eigenvalues are positive and n − r_1 eigenvalues equal zero. Suppose, without loss of generality, that the first r_1 eigenvalues are positive and the rest are zero.

Proof (cont.): Set x = Cy and remember that, when C is an orthogonal matrix, x'x = (Cy)'(Cy) = y'C'Cy = y'y. Then Q = Σ y_i² = Σ_{i=1}^{r_1} λ_i y_i² + y'C'A_2Cy.

Proof (cont.): Rearranging terms slightly and expanding the second matrix product gives Σ_{i=1}^{r_1} (1 − λ_i) y_i² + Σ_{i=r_1+1}^{n} y_i² = y'C'A_2Cy. Since the rank of the matrix A_2 equals r_2 (= n − r_1), and the second sum on the left already accounts for n − r_1 of the y-variables, the remaining coefficients must vanish, so we can conclude that λ_1 = λ_2 = ... = λ_{r_1} = 1. Hence Q_1 = Σ_{i=1}^{r_1} y_i² and Q_2 = Σ_{i=r_1+1}^{n} y_i², which proves the lemma for the case k = 2.

What Does This Mean Again? This lemma is a statement about real numbers only, not random variables. It says that if Σ x_i² can be split into a sum of positive semi-definite quadratic forms, then there is an orthogonal change of variables x = Cy (or C'x = y) under which the quadratic forms have some very nice properties, foremost of which is that each y_i appears in only one of the resulting sums of squares.

Cochran's Theorem: Let X_1, X_2, ..., X_n be independent N(0, σ²)-distributed random variables, and suppose that Σ_{i=1}^n X_i² = Q_1 + Q_2 + ... + Q_k, where Q_1, Q_2, ..., Q_k are positive semi-definite quadratic forms in the random variables X_1, X_2, ..., X_n, that is, Q_i = X'A_iX, i = 1, 2, ..., k. Set rank(A_i) = r_i, i = 1, 2, ..., k. If r_1 + r_2 + ... + r_k = n, then 1. Q_1, Q_2, ..., Q_k are independent, and 2. Q_i ~ σ²χ²(r_i).

Proof [from Gut 95]: From the previous lemma we know that there exists an orthogonal matrix C such that the transformation X = CY yields Q_1 = Y_1² + Y_2² + ... + Y_{r_1}², Q_2 = Y_{r_1+1}² + Y_{r_1+2}² + ... + Y_{r_1+r_2}², Q_3 = Y_{r_1+r_2+1}² + Y_{r_1+r_2+2}² + ... + Y_{r_1+r_2+r_3}², ..., Q_k = Y_{n−r_k+1}² + Y_{n−r_k+2}² + ... + Y_n². But since every Y_i² occurs in exactly one Q_j, and the Y_i's are all independent N(0, σ²) random variables (because C is an orthogonal matrix), Cochran's theorem follows.

Huh? It is best to work an example to understand why this is important. Let's consider the distribution of a sample variance (not a regression model yet). Let Y_i, i = 1, ..., n, be samples from Y ~ N(0, σ²). We can use Cochran's theorem to establish the distribution of the sample variance (and its independence from the sample mean).

Example: Recall the form of SSTO for the regression model, and note that SSTO = (n − 1)s²{Y}: SSTO = Σ(Y_i − Ȳ)² = ΣY_i² − (ΣY_i)²/n. Recognize that this can be rearranged and then re-expressed in matrix form: ΣY_i² = Σ(Y_i − Ȳ)² + (ΣY_i)²/n, i.e. Y'IY = Y'(I − (1/n)J)Y + Y'((1/n)J)Y.
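A short numerical check of this identity (my own toy vector, not data from the slides), in both scalar and matrix form:

```python
# Verify sum Y_i^2 = sum (Y_i - Ybar)^2 + (sum Y_i)^2 / n, and its matrix form.
import numpy as np

Y = np.array([2.0, -1.0, 3.5, 0.5, 1.0])
n = len(Y)
J = np.ones((n, n))

lhs = (Y ** 2).sum()
rhs = ((Y - Y.mean()) ** 2).sum() + Y.sum() ** 2 / n
matrix_rhs = Y @ (np.eye(n) - J / n) @ Y + Y @ (J / n) @ Y

print(np.isclose(lhs, rhs), np.isclose(lhs, matrix_rhs))   # both True
```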

Example (cont.): From earlier we know that Y'IY ~ σ²χ²(n), and we can read off the rank of that quadratic form as well (rank(I) = n). The ranks of the remaining quadratic forms can be read off too (with some linear algebra reminders): Y'IY = Y'(I − (1/n)J)Y + Y'((1/n)J)Y.

Linear Algebra Reminders: For a symmetric and idempotent matrix A, rank(A) = trace(A), the number of non-zero eigenvalues of A. Is (1/n)J symmetric and idempotent? How about I − (1/n)J? Also, trace(A + B) = trace(A) + trace(B). Assuming these matrices are symmetric and idempotent, we can read off the ranks of each quadratic form: Y'IY = Y'(I − (1/n)J)Y + Y'((1/n)J)Y, with ranks n, n − 1, and 1 respectively.
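The symmetry/idempotency questions posed above are easy to answer numerically; a minimal sketch (my addition, assuming n = 6 for illustration):

```python
# Check that (1/n)J and I - (1/n)J are symmetric and idempotent,
# with traces 1 and n - 1 respectively.
import numpy as np

n = 6
J = np.ones((n, n))
P = J / n                # averaging matrix: projects onto the constant vector
M = np.eye(n) - P        # projects onto its orthogonal complement

for name, A in [("(1/n)J", P), ("I-(1/n)J", M)]:
    sym = np.allclose(A, A.T)
    idem = np.allclose(A @ A, A)
    print(f"{name}: symmetric={sym}, idempotent={idem}, trace={np.trace(A):.0f}")
```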

Cochran's Theorem Usage: Y'IY = Y'(I − (1/n)J)Y + Y'((1/n)J)Y, with ranks n, n − 1, and 1. Cochran's theorem tells us, immediately, that ΣY_i² ~ σ²χ²(n), Σ(Y_i − Ȳ)² ~ σ²χ²(n − 1), and (ΣY_i)²/n ~ σ²χ²(1), because each quadratic form is χ² distributed with degrees of freedom given by the rank of the corresponding quadratic form, and each sum of squares is independent of the others.
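A hedged Monte Carlo check of this decomposition (my own sketch; the sample size and σ below are arbitrary): the two quadratic forms should be σ²χ²(n − 1) and σ²χ²(1), and approximately uncorrelated.

```python
# For Y_i iid N(0, sigma^2): sum (Y_i - Ybar)^2 ~ sigma^2 chi2(n-1),
# (sum Y_i)^2 / n ~ sigma^2 chi2(1), and the two are independent.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma, reps = 8, 1.7, 100_000

Y = sigma * rng.standard_normal((reps, n))
Q2 = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum (Y_i - Ybar)^2
Q3 = Y.sum(axis=1) ** 2 / n                                  # (sum Y_i)^2 / n

print(stats.kstest(Q2 / sigma**2, "chi2", args=(n - 1,)))    # ~ chi2(n-1)
print(stats.kstest(Q3 / sigma**2, "chi2", args=(1,)))        # ~ chi2(1)
print(f"corr(Q2, Q3) = {np.corrcoef(Q2, Q3)[0, 1]:.4f}")     # near 0
```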

What About Regression? Quick comment: in the preceding, one can think of having modeled the population with a single-parameter model, the parameter being the mean. The number of degrees of freedom in the sample variance sum of squares is reduced by the number of parameters fit in the linear model (one, the mean). Now, regression.

Rank of ANOVA Sums of Squares:
SSTO = Y'[I − (1/n)J]Y, rank n − 1
SSE = Y'(I − H)Y, rank n − p
SSR = Y'[H − (1/n)J]Y, rank p − 1
A slightly stronger version of Cochran's theorem is needed (we will assume it exists) to prove the following claims. (Good midterm question.)

Distribution of General Multiple Regression ANOVA Sums of Squares: From Cochran's theorem, knowing the ranks of SSTO = Y'[I − (1/n)J]Y, SSE = Y'(I − H)Y, and SSR = Y'[H − (1/n)J]Y immediately gives SSTO ~ σ²χ²(n − 1), SSE ~ σ²χ²(n − p), and SSR ~ σ²χ²(p − 1).
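As an illustration (not from the slides; the design matrix, coefficients, and noise below are hypothetical), this Python sketch builds the three quadratic forms from the hat matrix H = X(X'X)⁻¹X' and checks that SSTO = SSE + SSR and that the traces of the defining matrices give the ranks n − 1, n − p, and p − 1:

```python
# Compute SSTO, SSE, SSR as quadratic forms in Y and check the rank/trace bookkeeping.
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3                                   # p includes the intercept column
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.standard_normal(n)

H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
J = np.ones((n, n))

SSTO = Y @ (np.eye(n) - J / n) @ Y
SSE  = Y @ (np.eye(n) - H) @ Y
SSR  = Y @ (H - J / n) @ Y

print(np.isclose(SSTO, SSE + SSR))             # the decomposition holds exactly
print(np.trace(np.eye(n) - J / n),             # n - 1 = 29
      np.trace(np.eye(n) - H),                 # n - p = 27
      np.trace(H - J / n))                     # p - 1 = 2
```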

F Test for Regression Relation: Now the test of whether there is a regression relation between the response variable Y and the set of X variables X_1, ..., X_{p−1} makes more sense. The F distribution is defined as the ratio of two independent χ²-distributed random variables, each normalized by its number of degrees of freedom.

F Test Hypotheses: If we want to choose between the alternatives H_0: β_1 = β_2 = ... = β_{p−1} = 0 and H_a: not all β_k, k = 1, ..., p − 1, equal zero, we can use the test statistic F* = MSR/MSE, which under H_0 is distributed as [σ²χ²(p − 1)/(p − 1)] / [σ²χ²(n − p)/(n − p)], i.e. F(p − 1, n − p). The decision rule that controls the Type I error at α is: if F* ≤ F(1 − α; p − 1, n − p), conclude H_0; if F* > F(1 − α; p − 1, n − p), conclude H_a.
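A minimal sketch of this decision rule (the sums of squares, n, p, and α below are made-up numbers for illustration only, not results from the slides):

```python
# Compute F* = MSR/MSE and compare it to the F(1 - alpha; p - 1, n - p) critical value.
from scipy import stats

n, p = 30, 3
SSR, SSE = 42.0, 25.0                          # hypothetical sums of squares
alpha = 0.05

F_star = (SSR / (p - 1)) / (SSE / (n - p))     # F* = MSR / MSE
F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)  # F(1 - alpha; p - 1, n - p)

print(f"F* = {F_star:.2f}, critical value = {F_crit:.2f}")
print("conclude H_a" if F_star > F_crit else "conclude H_0")
```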