Quadratic forms Cochran s theorem, degrees of freedom, and all that
|
|
|
- Evangeline Weaver
- 9 years ago
- Views:
Transcription
1 Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 1
2 Why We Care Cochran s theorem tells us about the distributions of partitioned sums of squares of normally distributed random variables. Traditional linear regression analysis relies upon making statistical claims about the distribution of sums of squares of normally distributed random variables (and ratios between them) i.e. in the simple normal regression model SSE/σ 2 = (Y i Ŷi) 2 χ 2 (n 2) Where does this come from? Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 2
3 Outline Review some properties of multivariate Gaussian distributions and sums of squares Establish the fact that the multivariate Gaussian sum of squares is χ 2 (n) distributed Provide intuition for Cochran s theorem Prove a lemma in support of Cochran s theorem Prove Cochran s theorem Connect Cochran s theorem back to matrix linear regression Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 3
4 Preliminaries Let Y 1, Y 2,, Y n be N(µ i,σ i2 ) random variables. As usual define Z i = Y i µ i σ i Then we know that each Z i ~ N(0,1) From Wackerly et al, 306 Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 4
5 Theorem 0 : Statement The sum of squares of n N(0,1) random variables is χ 2 distributed with n degrees of freedom ( n i=1 Z2 i ) χ2 (n) Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 5
6 Theorem 0: Givens Proof requires knowing both 1. Z 2 i Zi 2 χ2 (ν), ν=1orequivalently Γ(ν/2,2), ν=1 Homework, midterm? 2.If Y 1, Y 2,., Y n are independent random variables with moment generating functions m Y1 (t), m Y2 (t), m Yn (t), then when U = Y 1 + Y 2 + Y n m U (t)=m Y1 (t) m Y2 (t)... m Yn (t) and from the uniqueness of moment generating functions that m U (t) fully characterizes the distribution of U Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 6
7 Theorem 0: Proof The moment generating function for a χ 2 (ν) distribution is (Wackerley et al, back cover) m Z 2 i (t)=(1 2t) ν/2, wherehere ν=1 The moment generating function for V =( n i=1 Z2 i ) is (by given prerequisite) m V (t)=m Z 2 1 (t) m Z 2 2 (t) m Z 2 n (t) Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 7
8 But is just Theorem 0: Proof m V (t)=m Z 2 1 (t) m Z 2 2 (t) m Z 2 n (t) m V (t)=(1 2t) 1/2 (1 2t) 1/2 (1 2t) 1/2 Which is itself, by inspection, just the moment generating function for a χ 2 (n) random variable m V (t)=(1 2t) n/2 which implies (by uniqueness) that V =( n i=1 Z2 i ) χ2 (n) Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 8
9 Quadratic Forms and Cochran s Theorem Quadratic forms of normal random variables are of great importance in many branches of statistics Least squares ANOVA Regression analysis etc. General idea Split the sum of the squares of observations into a number of quadratic forms where each corresponds to some cause of variation Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 9
10 Quadratic Forms and Cochran s Theorem The conclusion of Cochran s theorem is that, under the assumption of normality, the various quadratic forms are independent and χ 2 distributed. This fact is the foundation upon which many statistical tests rest. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 10
11 Preliminaries: A Common Quadratic Form Let x N(µ,Λ) Consider the (important) quadratic form that appears in the exponent of the normal density (x µ) Λ 1 (x µ) In the special case of µ = 0 and Λ = I this reduces to x x which by what we just proved we know is χ 2 (n) distributed Let s prove that this holds in the general case Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 11
12 Lemma 1 Suppose that x~n(µ, Λ) with Λ > 0 then (where n is the dimension of x) (x µ) Λ 1 (x µ) χ 2 (n) Proof: Set y = Λ -1/2 (x-µ) then E(y) = 0 Cov(y) = Λ -1/2 Λ Λ -1/2 = I That is y ~N(0,I) and thus (x µ) Λ 1 (x µ)=y y χ 2 (n) Note: this is sometimes called sphering data Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 12
13 What do we have? The Path (x µ) Λ 1 (x µ)=y y χ 2 (n) Where are we going? (Cochran s Theorem) Let X 1, X 2,, X n be independent N(0,σ 2 )-distributed random variables, and suppose that n i=1 X2 i = Q 1+ Q Q k Where Q 1, Q 2,, Q k are positive semi-definite quadratic forms in the random variables X 1, X 2,, X m, that is, Q i =X A i X, i=1,2,..., k Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 13
14 Cochran s Theorem Statement Set Rank A i = r i, i=1,2,, k. If r 1 + r r k = n Then 1. Q 1, Q 2,, Q k are independent 2. Q i ~ σ 2 χ 2 (r i ) Reminder: the rank of a matrix is the number of linearly independent rows / columns in the matrix, or, equivalently, the number of its non-zero eigenvalues Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 14
15 Closing the Gap We start with a lemma that will help us prove Cochran s theorem This lemma is a linear algebra result We also need to know a couple results regarding linear transformations of normal vectors We attend to those first. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 15
16 Linear transformations Theorem 1: Let X be a normal random vector. The components of X are independent iff they are uncorrelated. Demonstrated in class by setting Cov(X i, X j ) = 0 and then deriving product form of joint density Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 16
17 Linear transformations Theorem 2: Let X ~ N(µ, Λ) and set Y = C X where the orthogonal matrix C is such that C Λ C = D. Then Y ~ N(C µ, D); the components of Y are independent; and Var Y k = λ k, k =1 n, where λ 1, λ 2,, λ n are the eigenvalues of Λ Look up singular value decomposition. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 17
18 Orthogonal transforms of iid N(0,σ 2 ) variables Let X ~ N(µ, σ 2 I ) where σ 2 > 0 and set Y = CX where C is an orthogonal matrix. Then Cov{Y} = Cσ 2 IC = σ 2 I This leads to Theorem 2: Let X ~ N(µ, σ 2 I ) where σ 2 > 0, let C be an arbitrary orthogonal matrix, and set Y=CX. The Y N(Cµ, σ 2 I); in particular, Y 1, Y 2,, Y n are independent normal random variables with the same variance σ 2. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 18
19 Where we are Now we can transform N(µ, ) random variables into N(0,D) random variables. We know that orthogonal transformations of a random vector X ~ N(µ,σ 2 I) results in a transformed vector whose elements are still independent The preliminaries are over, now we proceed to proving a lemma that forms the backbone of Cochran s theorem. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 19
20 Lemma 1 Let x 1, x 2,, x n be real numbers. Suppose that x i 2 can be split into a sum of positive semidefinite quadratic forms, that is, n i=1 x2 i = Q 1+ Q Q k where Q i = x A i x and (rank Q i = ) rank A i = r i, i=1,2,,k. If r i = n then there exists an orthogonal matrix C such that, with x = Cy we have Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 20
21 Lemma 2 cont. Q 1 = y 2 1+ y y 2 r 1 Q 2 = y 2 r y 2 r y 2 r 1 +r 2 Q 3 = y 2 r 1 +r y 2 r 1 +r y 2 r 1 +r 2 +r 3. Q k = y 2 n r k +1+ y 2 n r k y 2 n Remark: Note that different quadratic forms contain different y-variables and that the number of terms in each Q i equals the rank, r i, of Q i Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 21
22 What s the point? We won t construct this matrix C, it s just useful for proving Cochran s theorem. We care that The y i2 s end up in different sums we ll use this to prove independence of the different quadratic forms. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 22
23 Proof We prove the n=2 case. The general case is obtained by induction. [Gut 95] For n=2 we have Q= n i=1 x2 i =x A 1 x+x A 2 x (=Q 1 +Q 2 ) where A 1 and A 2 are positive semi-definite matrices with ranks r 1 and r 2 respectively and r 1 + r 2 = n Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 23
24 Proof: Cont. By assumption there exists an orthogonal matrix C such that C A 1 C=D where D is a diagonal matrix, the diagonal elements of which are the eigenvalues of A 1 ; λ 1, λ 2,, λ n. Since Rank(A 1 ) =r 1 then r 1 eigenvalues are positive and n-r 1 eigenvalues equal zero. Suppose without restriction that the first r 1 eigenvalues are positive and the rest are zero. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 24
25 Set Proof : Cont x=cy and remember that when C is an orthogonal matrix that x x=(cy) Cy=y C Cy=y y then Q= y 2 i = r 1 i=1 λ iy 2 i +y C A 2 Cy Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 25
26 Proof : Cont Or, rearranging terms slightly and expanding the second matrix product ri i=1 (1 λ i)y 2 i + n i=r 1 +1 y2 i =y C A 2 Cy Since the rank of the matrix A 2 equals r 2 ( = n-r 1 ) we can conclude that λ 1 = λ 2 =...=λ r1 =1 Q 1 = r 1 i=1 y2 i and Q 2= n i=r 1 +1 y2 i which proves the lemma for the case n=2. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 26
27 What does this mean again? This lemma only has to do with real numbers, not random variables. It says that if x i 2 can be split into a sum of positive semi-definite quadratic forms then there is a orthogonal (projection) matrix x=cy (or C x = y) that makes each of the quadratic forms have some very nice properties, foremost of which is that Each y i appears in only one resulting sum of squares. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 27
28 Cochran s Theorem Let X 1, X 2,, X n be independent N(0,σ 2 )-distributed random variables, and suppose that n i=1 X2 i = Q 1+ Q Q k Where Q 1, Q 2,, Q k are positive semi-definite quadratic forms in the random variables X 1, X 2,, X m, that is, Q i =X A i X, i=1,2,..., k Set Rank A i = r i, i=1,2,, k. If r 1 + r r k = n then 1. Q 1, Q 2,, Q k are independent 2. Q i ~ σ 2 χ 2 (r i ) Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 28
29 Proof [from Gut 95] From the previous lemma we know that there exists an orthogonal matrix C such that the transformation X=CY yields Q 1 = Y1 2 + Y Yr 2 1 Q 2 = Yr Yr Yr 2 1 +r 2 Q 3 = Yr 2 1 +r Yr 2 1 +r Yr 2 1 +r 2 +r 3. Q k = Y 2 n r k +1+ Y 2 n r k Y 2 n But since every Y 2 occurs in exactly one Q j and the Y i s are all independent N(0, σ 2 ) RV s (because C is an orthogonal matrix) Cochran s theorem follows. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 29
30 Huh? Best to work an example to understand why this is important Let s consider the distribution of a sample variance (not regression model yet). Let Y i, i=1 n be samples from Y ~ N(0, σ 2 ). We can use Cochran s theorem to establish the distribution of the sample variance (and it s independence from the sample mean). Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 30
31 Example Recall form of SSTO for regression model and note that the form of SSTO = (n-1) s 2 {Y} SSTO= (Y i Ȳ)2 = Y 2 i ( Y i ) 2 n Recognize that this can be rearranged and the re-expressed in matrix form Y 2 i = (Y i Ȳ)2 + ( Y i ) 2 n Y IY=Y (I 1 n J)Y+Y ( 1 n J)Y Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 31
32 Example cont. From earlier we know that Y IY σ 2 χ 2 (n) but we can read off the rank of the quadratic form as well (rank(i) = n) The ranks of the remaining quadratic forms can be read off too (with some linear algebra reminders) Y Y=Y (I 1 n J)Y+Y ( 1 n J)Y Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 32
33 Linear Algebra Reminders For a symmetric and idempotent matrix A, rank(a) = trace(a), the number of non-zero eigenvalues of A. Is (1/n)J symmetric and idempotent? How about (I-(1/n)J)? trace(a + B) = trace(a) + trace(b) Assuming they are we can read off the ranks of each quadratic form Y IY=Y (I 1 n J)Y+Y ( 1 n J)Y rank: n rank: n-1 rank: 1 Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 33
34 Cochran s Theorem Usage Y IY=Y (I 1 n J)Y+Y ( 1 n J)Y rank: n rank: n-1 rank: 1 Cochran s theorem tells us, immediately, that Y 2 i σ 2 χ 2 (n), (Y i Ȳ)2 σ 2 χ 2 (n 1), ( Y i ) 2 n σ 2 χ 2 (1) because each of the quadratic forms is χ 2 distributed with degrees of freedom given by the rank of the corresponding quadratic form and each sum of squares is independent of the others. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 34
35 What about regression? Quick comment: in the preceding, one can think about having modeled the population with a single parameter model the parameter being the mean. The number of degrees of freedom in the sample variance sum of squares is reduced by the number of parameters fit in the linear model (one, the mean) Now regression. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 35
36 Rank of ANOVA Sums of Squares SSTO = Y [I 1 n J]Y Rank n-1 SSE SSR = Y (I H)Y = Y [H 1 n J]Y n-p p-1 Slightly stronger version of Cochran s theorem needed (will assume it exists) to prove the following claim(s). good midterm question Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 36
37 Distribution of General Multiple Regression ANOVA Sums of Squares From Cochran s theorem, knowing the ranks of SSTO = Y [I 1 n J]Y SSE = Y (I H)Y SSR = Y [H 1 n J]Y gives you this immediately SSTO SSE SSR σ 2 χ 2 (n 1) σ 2 χ 2 (n p) σ 2 χ 2 (p 1) Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 37
38 F Test for Regression Relation Now the test of whether there is a regression relation between the response variable Y and the set of X variables X 1,, X p-1 makes more sense The F distribution is defined to be the ratio of χ 2 distributions that have themselves been normalized by their number of degrees of freedom. Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 38
39 F Test Hypotheses If we want to choose between the alternatives H 0 : β 1 = β 2 = β 3 = β p-1 = 0 H 1 : not all β k k=1 n equal zero We can use the defined test statistic F = MSR σ2χ2(p 1) MSE p 1 σ 2 χ 2 (n p) n p The decision rule to control the Type I error at α is If F F(1 α; p 1, n p),conclude H 0 If F > F(1 α; p 1, n p),conclude H a Frank Wood, [email protected] Linear Regression Models Lecture 1, Slide 39
Multivariate normal distribution and testing for means (see MKB Ch 3)
Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Similarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
Linear Algebra Review. Vectors
Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka [email protected] http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa Cogsci 8F Linear Algebra review UCSD Vectors The length
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution
SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Orthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
Solution to Homework 2
Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if
October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix
Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components
Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they
MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column
Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013
Notes on Orthogonal and Symmetric Matrices MENU, Winter 201 These notes summarize the main properties and uses of orthogonal and symmetric matrices. We covered quite a bit of material regarding these topics,
Au = = = 3u. Aw = = = 2w. so the action of A on u and w is very easy to picture: it simply amounts to a stretching by 3 and 2, respectively.
Chapter 7 Eigenvalues and Eigenvectors In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigenvectors of matrices, concepts that were already introduced in Geometry
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +
Section 6.1 - Inner Products and Norms
Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,
Some probability and statistics
Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
Notes on Applied Linear Regression
Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:
Chapter 6. Orthogonality
6.3 Orthogonal Matrices 1 Chapter 6. Orthogonality 6.3 Orthogonal Matrices Definition 6.4. An n n matrix A is orthogonal if A T A = I. Note. We will see that the columns of an orthogonal matrix must be
1 Introduction to Matrices
1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0
Inner product. Definition of inner product
Math 20F Linear Algebra Lecture 25 1 Inner product Review: Definition of inner product. Slide 1 Norm and distance. Orthogonal vectors. Orthogonal complement. Orthogonal basis. Definition of inner product
Introduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,
Econometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
Covariance and Correlation
Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such
Lecture 21. The Multivariate Normal Distribution
Lecture. The Multivariate Normal Distribution. Definitions and Comments The joint moment-generating function of X,...,X n [also called the moment-generating function of the random vector (X,...,X n )]
Solving Linear Systems, Continued and The Inverse of a Matrix
, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing
SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA
SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the
Linear Algebra Notes
Linear Algebra Notes Chapter 19 KERNEL AND IMAGE OF A MATRIX Take an n m matrix a 11 a 12 a 1m a 21 a 22 a 2m a n1 a n2 a nm and think of it as a function A : R m R n The kernel of A is defined as Note
1 Determinants and the Solvability of Linear Systems
1 Determinants and the Solvability of Linear Systems In the last section we learned how to use Gaussian elimination to solve linear systems of n equations in n unknowns The section completely side-stepped
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8
Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e
Similar matrices and Jordan form
Similar matrices and Jordan form We ve nearly covered the entire heart of linear algebra once we ve finished singular value decompositions we ll have seen all the most central topics. A T A is positive
A note on companion matrices
Linear Algebra and its Applications 372 (2003) 325 33 www.elsevier.com/locate/laa A note on companion matrices Miroslav Fiedler Academy of Sciences of the Czech Republic Institute of Computer Science Pod
Solving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
More than you wanted to know about quadratic forms
CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences More than you wanted to know about quadratic forms KC Border Contents 1 Quadratic forms 1 1.1 Quadratic forms on the unit
Least-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
Linear Regression. Guy Lebanon
Linear Regression Guy Lebanon Linear Regression Model and Least Squares Estimation Linear regression is probably the most popular model for predicting a RV Y R based on multiple RVs X 1,..., X d R. It
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Continued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
Methods for Finding Bases
Methods for Finding Bases Bases for the subspaces of a matrix Row-reduction methods can be used to find bases. Let us now look at an example illustrating how to obtain bases for the row space, null space,
Sections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
Lecture 5: Singular Value Decomposition SVD (1)
EEM3L1: Numerical and Analytical Techniques Lecture 5: Singular Value Decomposition SVD (1) EE3L1, slide 1, Version 4: 25-Sep-02 Motivation for SVD (1) SVD = Singular Value Decomposition Consider the system
[1] Diagonal factorization
8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:
STAT 830 Convergence in Distribution
STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31
by the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that
Chapter 17. Orthogonal Matrices and Symmetries of Space
Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length
Topic 8. Chi Square Tests
BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test
Elasticity Theory Basics
G22.3033-002: Topics in Computer Graphics: Lecture #7 Geometric Modeling New York University Elasticity Theory Basics Lecture #7: 20 October 2003 Lecturer: Denis Zorin Scribe: Adrian Secord, Yotam Gingold
Chapter 3: The Multiple Linear Regression Model
Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics
Lecture 8. Confidence intervals and the central limit theorem
Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of
160 CHAPTER 4. VECTOR SPACES
160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of
LINEAR ALGEBRA. September 23, 2010
LINEAR ALGEBRA September 3, 00 Contents 0. LU-decomposition.................................... 0. Inverses and Transposes................................. 0.3 Column Spaces and NullSpaces.............................
Linearly Independent Sets and Linearly Dependent Sets
These notes closely follow the presentation of the material given in David C. Lay s textbook Linear Algebra and its Applications (3rd edition). These notes are intended primarily for in-class presentation
4.5 Linear Dependence and Linear Independence
4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then
Math 115A HW4 Solutions University of California, Los Angeles. 5 2i 6 + 4i. (5 2i)7i (6 + 4i)( 3 + i) = 35i + 14 ( 22 6i) = 36 + 41i.
Math 5A HW4 Solutions September 5, 202 University of California, Los Angeles Problem 4..3b Calculate the determinant, 5 2i 6 + 4i 3 + i 7i Solution: The textbook s instructions give us, (5 2i)7i (6 + 4i)(
The Bivariate Normal Distribution
The Bivariate Normal Distribution This is Section 4.7 of the st edition (2002) of the book Introduction to Probability, by D. P. Bertsekas and J. N. Tsitsiklis. The material in this section was not included
Vector Math Computer Graphics Scott D. Anderson
Vector Math Computer Graphics Scott D. Anderson 1 Dot Product The notation v w means the dot product or scalar product or inner product of two vectors, v and w. In abstract mathematics, we can talk about
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
minimal polyonomial Example
Minimal Polynomials Definition Let α be an element in GF(p e ). We call the monic polynomial of smallest degree which has coefficients in GF(p) and α as a root, the minimal polyonomial of α. Example: We
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Inner products on R n, and more
Inner products on R n, and more Peyam Ryan Tabrizian Friday, April 12th, 2013 1 Introduction You might be wondering: Are there inner products on R n that are not the usual dot product x y = x 1 y 1 + +
α = u v. In other words, Orthogonal Projection
Orthogonal Projection Given any nonzero vector v, it is possible to decompose an arbitrary vector u into a component that points in the direction of v and one that points in a direction orthogonal to v
Conductance, the Normalized Laplacian, and Cheeger s Inequality
Spectral Graph Theory Lecture 6 Conductance, the Normalized Laplacian, and Cheeger s Inequality Daniel A. Spielman September 21, 2015 Disclaimer These notes are not necessarily an accurate representation
Lecture 5 Principal Minors and the Hessian
Lecture 5 Principal Minors and the Hessian Eivind Eriksen BI Norwegian School of Management Department of Economics October 01, 2010 Eivind Eriksen (BI Dept of Economics) Lecture 5 Principal Minors and
Notes on Symmetric Matrices
CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.
1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
Inner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
3.1 Least squares in matrix form
118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Applied Linear Algebra I Review page 1
Applied Linear Algebra Review 1 I. Determinants A. Definition of a determinant 1. Using sum a. Permutations i. Sign of a permutation ii. Cycle 2. Uniqueness of the determinant function in terms of properties
Linear Models for Continuous Data
Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear
Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013
Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear
Credit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
1 Homework 1. [p 0 q i+j +... + p i 1 q j+1 ] + [p i q j ] + [p i+1 q j 1 +... + p i+j q 0 ]
1 Homework 1 (1) Prove the ideal (3,x) is a maximal ideal in Z[x]. SOLUTION: Suppose we expand this ideal by including another generator polynomial, P / (3, x). Write P = n + x Q with n an integer not
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Recall that two vectors in are perpendicular or orthogonal provided that their dot
Orthogonal Complements and Projections Recall that two vectors in are perpendicular or orthogonal provided that their dot product vanishes That is, if and only if Example 1 The vectors in are orthogonal
The Determinant: a Means to Calculate Volume
The Determinant: a Means to Calculate Volume Bo Peng August 20, 2007 Abstract This paper gives a definition of the determinant and lists many of its well-known properties Volumes of parallelepipeds are
Factor Analysis. Factor Analysis
Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
T ( a i x i ) = a i T (x i ).
Chapter 2 Defn 1. (p. 65) Let V and W be vector spaces (over F ). We call a function T : V W a linear transformation form V to W if, for all x, y V and c F, we have (a) T (x + y) = T (x) + T (y) and (b)
LS.6 Solution Matrices
LS.6 Solution Matrices In the literature, solutions to linear systems often are expressed using square matrices rather than vectors. You need to get used to the terminology. As before, we state the definitions
problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random
Week 3&4: Z tables and the Sampling Distribution of X
Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal
Topic 3. Chapter 5: Linear Regression in Matrix Form
Topic Overview Statistics 512: Applied Linear Models Topic 3 This topic will cover thinking in terms of matrices regression on multiple predictor variables case study: CS majors Text Example (NKNW 241)
1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
Factor analysis. Angela Montanari
Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number
Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
Linear Algebra: Determinants, Inverses, Rank
D Linear Algebra: Determinants, Inverses, Rank D 1 Appendix D: LINEAR ALGEBRA: DETERMINANTS, INVERSES, RANK TABLE OF CONTENTS Page D.1. Introduction D 3 D.2. Determinants D 3 D.2.1. Some Properties of
MATH 551 - APPLIED MATRIX THEORY
MATH 55 - APPLIED MATRIX THEORY FINAL TEST: SAMPLE with SOLUTIONS (25 points NAME: PROBLEM (3 points A web of 5 pages is described by a directed graph whose matrix is given by A Do the following ( points
For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )
Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll
