Multivariate normal distribution and testing for means (see MKB Ch 3)




Where are we going? 2
  One-sample t-test (univariate) 3
  Two-sample t-test (univariate) 4
  One-sample T²-test (multivariate) 5
  Two-sample T²-test (multivariate) 6
Multivariate normal distribution 7
  Definition 8
  Why is it important in multivariate statistics? 9
  Why is it important in multivariate statistics? 10
  Characterization 11
  Properties 12
  Properties 13
  Data matrices 14
Wishart distribution 15
  Quadratic forms of normal data matrices 16
  Definition 17
  Properties 18
  Properties 19
  Properties 20
Hotelling's T² distribution 21
  Definition 22
  One-sample T² test 23
  Relationship with F distribution 24
  Mahalanobis distance 25
  Two-sample T² test 26
Final remarks 27
  Assumptions for one-sample T² test 28
  Assumptions for two-sample T² test 29
  Multivariate test versus univariate tests 30

Where are we going? 2 / 30

One-sample t-test (univariate)
Suppose that $x_1, \dots, x_n$ are i.i.d. $N(\mu, \sigma^2)$. Then
- $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \sim N(\mu, \sigma^2/n)$
- $ns^2 = \sum_{i=1}^n (x_i - \bar{x})^2 \sim \sigma^2 \chi^2_{n-1}$
- $\bar{x}$ and $s^2$ are independent.
Suppose that the mean $\mu$ is unknown. Then we can do a one-sample t-test: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.
Test statistic: $t = \frac{\bar{x} - \mu_0}{s_u/\sqrt{n}}$, where $s_u$ is the sample standard deviation with $n-1$ in the denominator. Under $H_0$, $t \sim t_{n-1}$, i.e., it has a Student t-distribution with $n-1$ degrees of freedom. Compute the p-value; if the p-value is below 0.05, we reject the null hypothesis. 3 / 30

Two-sample t-test (univariate)
Suppose that $x_1, \dots, x_n$ are i.i.d. $N(\mu_X, \sigma_X^2)$, and let $y_1, \dots, y_m$ be i.i.d. $N(\mu_Y, \sigma_Y^2)$ (independent of $x_1, \dots, x_n$). Suppose we want to test whether $\mu_X = \mu_Y$, under the assumption that $\sigma_X = \sigma_Y$. Then we can do a two-sample t-test: $H_0: \mu_X = \mu_Y$, $H_a: \mu_X \neq \mu_Y$.
Test statistic: $t = \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}}$, where $s_p^2 = \frac{1}{n+m-2}\left(ns_X^2 + ms_Y^2\right)$. Under $H_0$, $t \sim t_{n+m-2}$. 4 / 30

One-sample T²-test (multivariate)
Suppose that $x_1, \dots, x_n$ are i.i.d. $N_p(\mu, \Sigma)$. Then
- $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \sim N_p(\mu, \Sigma/n)$
- $nS \sim W_p(\Sigma, n-1)$, where $S$ is the sample covariance matrix and $W_p$ is a Wishart distribution
- $\bar{x}$ and $S$ are independent.
Suppose that the mean $\mu$ is unknown. Then we can do a one-sample T²-test: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.
Test statistic: $T^2 = n(\bar{x} - \mu_0)' S_u^{-1} (\bar{x} - \mu_0)$, where $S_u$ is the sample covariance matrix with $n-1$ in the denominator. Under $H_0$, $T^2 \sim T^2(p, n-1)$, i.e., it has Hotelling's T² distribution with parameters $p$ and $n-1$. 5 / 30
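To make slide 5 concrete, here is a minimal R sketch (my illustration, not part of MKB) that computes the one-sample T² statistic from these formulas on simulated data; the p-value uses the T²–F relationship stated later on slide 24. All data, dimensions, and seeds are invented for the example.

```r
## One-sample Hotelling T^2 test, computed directly from the slide's formulas.
set.seed(1)
n <- 30; p <- 3
X <- matrix(rnorm(n * p, mean = 1), n, p)  # hypothetical data, rows = observations
mu0 <- rep(0, p)                           # hypothesized mean under H0

xbar <- colMeans(X)
Su   <- cov(X)                             # unbiased covariance (n - 1 in denominator)
T2   <- drop(n * t(xbar - mu0) %*% solve(Su) %*% (xbar - mu0))

## Convert to an F statistic via Cor 3.5.2.1 (slide 24): (n-p)/((n-1)p) T^2 ~ F_{p, n-p}
Fstat <- (n - p) / ((n - 1) * p) * T2
pval  <- pf(Fstat, df1 = p, df2 = n - p, lower.tail = FALSE)
c(T2 = T2, F = Fstat, p.value = pval)
```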

Two-sample T²-test (multivariate)
Suppose that $x_1, \dots, x_n$ are i.i.d. $N_p(\mu_X, \Sigma_X)$, and let $y_1, \dots, y_m$ be i.i.d. $N_p(\mu_Y, \Sigma_Y)$ (independent of $x_1, \dots, x_n$). Suppose we want to test whether $\mu_X = \mu_Y$, under the assumption that $\Sigma_X = \Sigma_Y$. Then we can do a two-sample T²-test: $H_0: \mu_X = \mu_Y$, $H_a: \mu_X \neq \mu_Y$.
Test statistic: $T^2 = \frac{nm}{n+m}(\bar{x} - \bar{y})' S_u^{-1} (\bar{x} - \bar{y})$, where $S_u = \frac{1}{n+m-2}\left(nS_1 + mS_2\right)$. Under $H_0$, $T^2 \sim T^2(p, n+m-2)$. 6 / 30

Multivariate normal distribution 7 / 30

Definition
A random variable $x \in \mathbb{R}$ has a univariate normal distribution with mean $\mu$ and variance $\sigma^2$ (we write $x \sim N(\mu, \sigma^2)$) iff its density can be written as
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right) = \{2\pi\sigma^2\}^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)\{\sigma^2\}^{-1}(x-\mu)\right).$$
A random vector $x \in \mathbb{R}^p$ has a p-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ (we write $x \sim N_p(\mu, \Sigma)$) iff its density can be written as
$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right).$$ 8 / 30

Why is it important in multivariate statistics?
- It is an easy generalization of the univariate normal distribution. Such generalizations are not obvious for all univariate distributions; sometimes there are several plausible ways to generalize them.
- The multivariate normal distribution is entirely defined by its first two moments. Hence, it has a sparse parametrization using only $p(p+3)/2$ parameters (see board).
- In the case of normal variables, zero correlation implies independence, and pairwise independence implies mutual independence. These properties do not hold for many other distributions. 9 / 30
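As a quick illustration of the definition on slide 8 (my addition, not from the slides), the R sketch below evaluates the p-variate density directly from the formula; the commented comparison line assumes the mvtnorm package is available. The values of $\mu$, $\Sigma$, and $x$ are arbitrary.

```r
## Direct evaluation of the p-variate normal density f(x) = |2 pi Sigma|^{-1/2} exp(...).
dmvn <- function(x, mu, Sigma) {
  d <- x - mu
  drop(det(2 * pi * Sigma)^(-1/2) * exp(-0.5 * t(d) %*% solve(Sigma) %*% d))
}

mu    <- c(0, 1)                          # illustrative parameter values
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
x     <- c(0.3, 0.7)
dmvn(x, mu, Sigma)
## mvtnorm::dmvnorm(x, mean = mu, sigma = Sigma)  # should agree, if mvtnorm is installed
```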

Why is it important in multivariate statistics?
- Linear functions of a multivariate normal are univariate normal. This yields simple derivations.
- Even when the original data are not multivariate normal, certain functions like the sample mean will be approximately multivariate normal due to the central limit theorem.
- The multivariate normal distribution has a simple geometry. Its equiprobability contours are ellipsoids (see picture on slide). 10 / 30

Characterization
$x$ is p-variate normal iff $a'x$ is univariate normal for all fixed vectors $a \in \mathbb{R}^p$. (To allow for $a = 0$, we regard constants as degenerate forms of the normal distribution.)
Geometric interpretation: $x$ is p-variate normal iff its projection on any univariate subspace is normal.
This characterization will allow us to derive many properties of the multivariate normal without writing down densities. 11 / 30

Properties
(Th 3.1.1 of MKB) If $x$ is p-variate normal, and if $y = Ax + c$ where $A$ is any $q \times p$ matrix and $c$ is any $q$-vector, then $y$ is q-variate normal (see proof on board).
(Cor 3.1.1.1 of MKB) Any subset of elements of a multivariate normal vector is multivariate normal. In particular, all individual elements are univariate normal (see proof on board). 12 / 30

Properties
(Th 3.2.1 of MKB) If $x \sim N_p(\mu, \Sigma)$ and $y = Ax + c$, then $y \sim N_q(A\mu + c, A\Sigma A')$ (see proof on board).
(Cor 3.2.1.1 of MKB) If $x \sim N_p(\mu, \Sigma)$ with $\Sigma > 0$, then $y = \Sigma^{-1/2}(x - \mu) \sim N_p(0, I)$ and $(x - \mu)'\Sigma^{-1}(x - \mu) = \sum_{i=1}^p y_i^2 \sim \chi^2_p$ (see proof on board). 13 / 30
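A small simulation sketch (my addition) of Cor 3.2.1.1: draws from $N_p(\mu, \Sigma)$ are constructed via the Cholesky factor, and the quadratic form is compared against the $\chi^2_p$ distribution. All parameter values are arbitrary.

```r
## Simulation check: (x - mu)' Sigma^{-1} (x - mu) ~ chi^2_p when x ~ N_p(mu, Sigma).
set.seed(2)
p     <- 3
mu    <- c(1, -1, 0)
A     <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)     # an arbitrary positive definite covariance

R <- chol(Sigma)                    # Sigma = R'R, so z %*% R has covariance Sigma
nsim <- 10000
Z <- matrix(rnorm(nsim * p), nsim, p)
X <- sweep(Z %*% R, 2, mu, "+")     # nsim draws from N_p(mu, Sigma)

d2 <- mahalanobis(X, center = mu, cov = Sigma)  # (x - mu)' Sigma^{-1} (x - mu)
ks.test(d2, "pchisq", df = p)       # should be consistent with chi^2_p
```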

Data matrices
Let $x_1, \dots, x_n$ be a random sample from $N_p(\mu, \Sigma)$. Then we call $X = (x_1, \dots, x_n)'$ a data matrix from $N_p(\mu, \Sigma)$, or a normal data matrix.
(Th 3.3.2 of MKB, without proof) If $X$ ($n \times p$) is a normal data matrix from $N_p(\mu, \Sigma)$ and if $Y$ ($m \times q$) satisfies $Y = AXB$, then $Y$ is a normal data matrix iff the following two properties hold:
- $A1 = \alpha 1$ for some scalar $\alpha$, or $B'\mu = 0$;
- $AA' = \beta I$ for some scalar $\beta$, or $B'\Sigma B = 0$.
When both these conditions are satisfied, then $Y$ is a normal data matrix from $N_q(\alpha B'\mu, \beta B'\Sigma B)$.
Note: Pre-multiplication with $A$ means that we take linear combinations of the rows. The conditions on $A$ ensure that the new rows are independent. Post-multiplication with $B$ means that we take linear combinations of the columns (variables). 14 / 30

Wishart distribution 15 / 30

Quadratic forms of normal data matrices
We now consider quadratic forms of normal data matrices, i.e., functions of the form $X'CX$ for some symmetric matrix $C$. A special case of such a quadratic form is the covariance matrix, which we obtain when $C = n^{-1}(I_n - n^{-1}11')$ (see proof on board; $H = I_n - n^{-1}11'$ is called the centering matrix). 16 / 30

Definition
If $M$ ($p \times p$) can be written as $X'X$ where $X$ ($m \times p$) is a data matrix from $N_p(0, \Sigma)$, then $M$ is said to have a p-variate Wishart distribution with scale matrix $\Sigma$ and $m$ degrees of freedom. We write $M \sim W_p(\Sigma, m)$. When $\Sigma = I_p$, the distribution is said to be in standard form.
Note: The Wishart distribution is a multivariate generalization of the $\chi^2$ distribution: when $p = 1$, the $W_1(\sigma^2, m)$ distribution is given by $x'x$, where $x \in \mathbb{R}^m$ contains i.i.d. $N_1(0, \sigma^2)$ variables. Hence, $W_1(\sigma^2, m) = \sigma^2 \chi^2_m$. 17 / 30
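To connect this definition to software (my illustration, not MKB's), the sketch below builds one $W_1(\sigma^2, m)$ draw as $x'x$ and uses base R's rWishart() for the general case; the particular $\Sigma$, $m$, and $\sigma^2$ are arbitrary.

```r
## The Wishart as X'X, and the p = 1 special case W_1(sigma^2, m) = sigma^2 chi^2_m.
set.seed(3)
m <- 20; sigma2 <- 4
x <- rnorm(m, sd = sqrt(sigma2))
drop(crossprod(x))                  # one draw from W_1(sigma^2, m)

## General p: base R's rWishart() draws W_p(Sigma, df) matrices directly.
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
M <- rWishart(1, df = 10, Sigma = Sigma)[, , 1]   # one draw from W_2(Sigma, 10)
M
```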

Properties
(Th 3.4.1 of MKB) If $M \sim W_p(\Sigma, m)$ and $B$ is a $p \times q$ matrix, then $B'MB \sim W_q(B'\Sigma B, m)$ (see proof on board).
(Cor 3.4.1.1 of MKB) Diagonal submatrices of $M$ (square submatrices of $M$ whose diagonal corresponds to the diagonal of $M$) have a Wishart distribution.
(Cor 3.4.1.2 of MKB) $\Sigma^{-1/2} M \Sigma^{-1/2} \sim W_p(I, m)$.
(Cor 3.4.1.3 of MKB) If $M \sim W_p(I, m)$ and $B$ ($p \times q$) satisfies $B'B = I_q$, then $B'MB \sim W_q(I, m)$.
(Cor 3.4.2.1 of MKB) The $i$th diagonal element of $M$, $m_{ii}$, has a $\sigma_i^2 \chi^2_m$ distribution (where $\sigma_i^2$ is the $i$th diagonal element of $\Sigma$).
All these corollaries follow by choosing particular values of $B$ in Th 3.4.1. 18 / 30

Properties
(Th 3.4.3 of MKB) If $M_1 \sim W_p(\Sigma, m_1)$ and $M_2 \sim W_p(\Sigma, m_2)$, and if $M_1$ and $M_2$ are independent, then $M_1 + M_2 \sim W_p(\Sigma, m_1 + m_2)$ (see proof on board). 19 / 30

Properties
(Th 3.4.4 of MKB) If $X$ ($n \times p$) is a data matrix from $N_p(0, \Sigma)$ and $C$ ($n \times n$) is a symmetric matrix, then
- $X'CX$ has the same distribution as a weighted sum of independent $W_p(\Sigma, 1)$ matrices, where the weights are the eigenvalues of $C$;
- $X'CX$ has a Wishart distribution if $C$ is idempotent, in which case $X'CX \sim W_p(\Sigma, r)$, where $r = \mathrm{tr}(C) = \mathrm{rank}(C)$;
- if $S = n^{-1} X'HX$ is the sample covariance matrix, then $nS \sim W_p(\Sigma, n-1)$.
(See proof on board.) 20 / 30
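A quick numerical check (my addition) of the ingredients of Th 3.4.4: the centering matrix $H$ is idempotent with $\mathrm{tr}(H) = n - 1$, and $X'HX$ equals $nS$. Dimensions are arbitrary.

```r
## H = I - n^{-1} 1 1' is symmetric and idempotent with tr(H) = n - 1, and nS = X'HX.
n <- 6; p <- 2
H <- diag(n) - matrix(1, n, n) / n   # centering matrix
all.equal(H %*% H, H)                # H is idempotent
sum(diag(H))                         # tr(H) = rank(H) = n - 1

set.seed(4)
X <- matrix(rnorm(n * p), n, p)
nS <- t(X) %*% H %*% X               # n S, with S the biased sample covariance
all.equal(nS, (n - 1) * cov(X))      # cov() divides by n - 1, so n S = (n-1) S_u
```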

Hotelling's T² distribution 21 / 30

Definition
If $\alpha$ can be written as $m\, d' M^{-1} d$ where $d$ and $M$ are independently distributed as $N_p(0, I)$ and $W_p(I, m)$, then we say that $\alpha$ has the Hotelling T² distribution with parameters $p$ and $m$. We write $\alpha \sim T^2(p, m)$.
Hotelling's T² distribution is a generalization of the Student t-distribution. If $x \sim t_m$, then $x^2 \sim T^2(1, m)$. 22 / 30

One-sample T² test
(Th 3.5.1 of MKB) If $x$ and $M$ are independently distributed as $N_p(\mu, \Sigma)$ and $W_p(\Sigma, m)$, then $m(x - \mu)' M^{-1} (x - \mu) \sim T^2(p, m)$ (see proof on board).
(Cor 3.5.1.1 of MKB) If $\bar{x}$ and $S$ are the mean vector and covariance matrix of a sample of size $n$ from $N_p(\mu, \Sigma)$, and $S_u = \frac{n}{n-1} S$, then $T_1^2 = (n-1)(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) = n(\bar{x} - \mu)' S_u^{-1} (\bar{x} - \mu)$ has a $T^2(p, n-1)$ distribution (see proof on board). 23 / 30

Relationship with F distribution
The T² distribution is not readily available in R. But the T² distribution is closely related to the F distribution:
(Th 3.5.2 of MKB, without proof) $T^2(p, m) = \frac{mp}{m-p+1} F_{p,\, m-p+1}$.
(Cor 3.5.2.1 of MKB) If $\bar{x}$ and $S$ are the mean and covariance of a sample of size $n$ from $N_p(\mu, \Sigma)$, then $\frac{n-p}{(n-1)p} T_1^2$ has an $F_{p,\, n-p}$ distribution. 24 / 30

Mahalanobis distance
The so-called Mahalanobis distance between two populations with means $\mu_1$ and $\mu_2$ and common covariance matrix $\Sigma$ is given by $\Delta$, where
$$\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2).$$
In other words, $\Delta$ is the Euclidean distance between the re-scaled vectors $\Sigma^{-1/2}\mu_1$ and $\Sigma^{-1/2}\mu_2$.
The sample version of the Mahalanobis distance, $D$, is defined by $D^2 = (\bar{x}_1 - \bar{x}_2)' S_u^{-1} (\bar{x}_1 - \bar{x}_2)$, where $S_u = (n_1 S_1 + n_2 S_2)/(n - 2)$, $\bar{x}_i$ is the sample mean of sample $i$, $n_i$ is the sample size of sample $i$, and $n = n_1 + n_2$. 25 / 30
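Before moving on, a simulation sketch (mine, not from MKB) of the T²–F relationship on slide 24, built directly from the definition of $T^2(p, m)$ on slide 22; the values of $p$, $m$, and the simulation size are arbitrary.

```r
## Check Th 3.5.2: T^2(p, m) = {mp/(m - p + 1)} F_{p, m-p+1}, by simulation.
set.seed(5)
p <- 3; m <- 12; nsim <- 10000
alpha <- replicate(nsim, {
  d <- rnorm(p)                                     # d ~ N_p(0, I)
  M <- rWishart(1, df = m, Sigma = diag(p))[, , 1]  # M ~ W_p(I, m), independent of d
  m * drop(t(d) %*% solve(M) %*% d)                 # alpha = m d' M^{-1} d ~ T^2(p, m)
})
f <- alpha * (m - p + 1) / (m * p)                  # rescale as in Th 3.5.2
ks.test(f, "pf", df1 = p, df2 = m - p + 1)          # should be consistent with F
```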

Two-sample T² test
(Th 3.6.1 of MKB, without proof) If $X_1$ and $X_2$ are independent data matrices, and if the $n_i$ rows of $X_i$ are i.i.d. $N_p(\mu_i, \Sigma_i)$, $i = 1, 2$, then when $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2$,
$$T_2^2 = \frac{n_1 n_2}{n} D^2$$
has a $T^2(p, n-2)$ distribution.
Corollary: $\frac{n-1-p}{(n-2)p} T_2^2$ has an $F_{p,\, n-1-p}$ distribution. 26 / 30

Final remarks 27 / 30

Assumptions for one-sample T² test
- We have a simple random sample from the population.
- The population has a multivariate normal distribution. 28 / 30

Assumptions for two-sample T² test
- We have a simple random sample from each population.
- In each population the variables have a multivariate normal distribution.
- The two populations have the same covariance matrix. 29 / 30

Multivariate test versus univariate tests
See board. 30 / 30
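To close, a self-contained R sketch (my illustration, not part of MKB) of the two-sample test from Th 3.6.1 on slide 26, combining the sample Mahalanobis distance from slide 25 with the F rescaling from the corollary; the simulated data and the group mean shift are invented.

```r
## Two-sample Hotelling T^2 test on simulated data.
set.seed(6)
p <- 2; n1 <- 25; n2 <- 30; n <- n1 + n2
X1 <- matrix(rnorm(n1 * p), n1, p)               # group 1: mean 0
X2 <- matrix(rnorm(n2 * p, mean = 0.8), n2, p)   # group 2: shifted mean

## Pooled covariance S_u = (n1 S_1 + n2 S_2)/(n - 2), using n_i S_i = (n_i - 1) cov(X_i).
Su <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n - 2)
d  <- colMeans(X1) - colMeans(X2)
D2 <- drop(t(d) %*% solve(Su) %*% d)             # squared sample Mahalanobis distance

T2    <- (n1 * n2 / n) * D2                      # T_2^2 from Th 3.6.1
Fstat <- (n - 1 - p) / ((n - 2) * p) * T2        # corollary rescaling
pval  <- pf(Fstat, df1 = p, df2 = n - 1 - p, lower.tail = FALSE)
c(T2 = T2, F = Fstat, p.value = pval)
```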