Sections 2.11 and 5.8
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I
1/25
Gesell data

Let X be the age in months at which a child speaks his/her first word and let Y be the Gesell adaptive score, a measure of the child's aptitude (observed later on). Are X and Y related? How does the child's aptitude change with how long it takes them to speak?

Here are the Gesell scores y_i and ages at first word in months x_i, i = 1, …, 21:

  x_i:  15  26  10   9  15  20  18  11   8  20   7   9  10  11  11  10  12  42  17  11  10
  y_i:  95  71  83  91 102  87  93 100 104  94 113  96  83  84 102 100 105  57 121  86 100

In R, we compute r = −0.640, a moderately strong negative relationship between age at first word spoken and Gesell score.

> age=c(15,26,10,9,15,20,18,11,8,20,7,9,10,11,11,10,12,42,17,11,10)
> Gesell=c(95,71,83,91,102,87,93,100,104,94,113,96,83,84,102,100,105,57,121,86,100)
> plot(age,Gesell)
> cor(age,Gesell)
[1] -0.64029
2/25
Scatterplot of (x_1,y_1), …, (x_21,y_21)

[Scatterplot: Gesell score (60–120) on the vertical axis versus age at first word (10–40 months) on the horizontal axis.]
3/25
Random vectors

A random vector X = (X_1, X_2, …, X_k)′ is made up of, say, k random variables. A random vector has a joint distribution, e.g. a density f(x), that gives probabilities

  P(X ∈ A) = ∫_A f(x) dx.

Just as a random variable X has a mean E(X) and variance var(X), a random vector also has a mean vector E(X) and a covariance matrix cov(X).
4/25
Mean vector & covariance matrix

Let X = (X_1, …, X_k)′ be a random vector with density f(x_1, …, x_k). The mean of X is the vector of marginal means:

  E(X) = E (X_1, X_2, …, X_k)′ = (E(X_1), E(X_2), …, E(X_k))′   (5.38)

The covariance matrix of X is given by

  cov(X) = [ cov(X_1,X_1)  cov(X_1,X_2)  …  cov(X_1,X_k)
             cov(X_2,X_1)  cov(X_2,X_2)  …  cov(X_2,X_k)
                 ⋮              ⋮                ⋮
             cov(X_k,X_1)  cov(X_k,X_2)  …  cov(X_k,X_k) ]   (5.42)

5/25
Multivariate normal distribution

The normal distribution generalizes to multiple dimensions. We'll first look at two jointly distributed normal random variables, then discuss three or more. The bivariate normal density for (X_1, X_2) is given by

  f(x_1,x_2) = 1/(2π σ_1 σ_2 √(1−ρ²)) · exp{ −1/(2(1−ρ²)) [ ((x_1−µ_1)/σ_1)² − 2ρ((x_1−µ_1)/σ_1)((x_2−µ_2)/σ_2) + ((x_2−µ_2)/σ_2)² ] }

There are 5 parameters: (µ_1, µ_2, σ_1, σ_2, ρ). Besides 5.8, also see 2.11, pp. 78–83.
6/25
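As a quick numerical sanity check, here is a Python/numpy sketch that types in the five-parameter density above and verifies on a fine grid that it integrates to 1 and that integrating x_2 out gives the N(µ_1, σ_1²) marginal. The parameter values are made up for illustration.

```python
import numpy as np

def bvn_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    # The five-parameter bivariate normal density, typed straight from the formula
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = (z1**2 - 2*rho*z1*z2 + z2**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2*np.pi*s1*s2*np.sqrt(1 - rho**2))

# Illustrative parameter values (assumed, not from the slides)
mu1, mu2, s1, s2, rho = 1.0, -0.5, 1.2, 0.8, 0.6

# Evaluate on a grid covering essentially all of the probability mass
g1 = np.linspace(mu1 - 8*s1, mu1 + 8*s1, 1201)
g2 = np.linspace(mu2 - 8*s2, mu2 + 8*s2, 1201)
dx1, dx2 = g1[1] - g1[0], g2[1] - g2[0]
F = bvn_pdf(g1[:, None], g2[None, :], mu1, mu2, s1, s2, rho)

total = F.sum() * dx1 * dx2             # numerical double integral, ~ 1
marg1 = F.sum(axis=1) * dx2             # integrate x2 out
normal1 = np.exp(-0.5*((g1 - mu1)/s1)**2) / (np.sqrt(2*np.pi)*s1)
print(total)                            # ~ 1
print(np.max(np.abs(marg1 - normal1)))  # ~ 0: the marginal is N(mu1, s1^2)
```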
Bivariate normal distribution

This density jointly defines X_1 and X_2, which live in R² = (−∞,∞) × (−∞,∞). Marginally, X_1 ~ N(µ_1, σ_1²) and X_2 ~ N(µ_2, σ_2²) (p. 79). The correlation between X_1 and X_2 is given by corr(X_1, X_2) = ρ (p. 80). For jointly normal random variables, if the correlation is zero then they are independent. This is not true in general for jointly defined random variables.

  E(X) = (µ_1, µ_2)′,   cov(X) = [ σ_1²      σ_1σ_2ρ
                                   σ_1σ_2ρ   σ_2²    ]

Next slide: µ_1 = 0, 1; µ_2 = 0, 2; σ_1² = σ_2² = 1; ρ = 0, 0.9, 0.6.
7/25
Bivariate normal PDF level curves

[Four contour plots, each with axes running from −4 to 4, for the parameter settings listed on the previous slide.]
8/25
Proof that X_1 is independent of X_2 when ρ = 0

When ρ = 0 the joint density for (X_1, X_2) simplifies to

  f(x_1,x_2) = 1/(2π σ_1 σ_2) · exp{ −(1/2) [ ((x_1−µ_1)/σ_1)² + ((x_2−µ_2)/σ_2)² ] }
             = [ 1/(√(2π) σ_1) · e^{−0.5((x_1−µ_1)/σ_1)²} ] [ 1/(√(2π) σ_2) · e^{−0.5((x_2−µ_2)/σ_2)²} ]

Since these are respectively functions of x_1 and x_2 only, and the range of (X_1, X_2) factors into the product of two sets, X_1 and X_2 are independent, and in fact X_1 ~ N(µ_1, σ_1²) and X_2 ~ N(µ_2, σ_2²).
9/25
Conditional distributions [X_1|X_2 = x_2] and [X_2|X_1 = x_1] (pp. 80–81)

The conditional distribution of X_1 given X_2 = x_2 is

  [X_1 | X_2 = x_2] ~ N( µ_1 + (σ_1/σ_2)ρ(x_2 − µ_2),  σ_1²(1 − ρ²) ).

Similarly,

  [X_2 | X_1 = x_1] ~ N( µ_2 + (σ_2/σ_1)ρ(x_1 − µ_1),  σ_2²(1 − ρ²) ).

This ties directly to linear regression: to predict X_2 from X_1 = x_1, we have

  E(X_2 | X_1 = x_1) = [ µ_2 − (σ_2/σ_1)ρµ_1 ] + [ (σ_2/σ_1)ρ ] x_1 = β_0 + β_1 x_1.

10/25
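The conditional mean and variance formulas can be checked by simulation: draw many bivariate normal pairs, keep those with X_1 near a chosen value, and compare the empirical mean and variance of X_2 among those draws with the formulas. All parameter values below are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.0, 0.0, 1.0, 1.0, 0.8   # assumed illustrative values
n = 200_000

# Construct correlated normals from two independent N(0,1) streams
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x1 = mu1 + s1 * z1
x2 = mu2 + s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2)

# Condition (approximately) on X1 = 1 by keeping draws in a narrow window
keep = np.abs(x1 - 1.0) < 0.05
cond_mean = mu2 + (s2 / s1) * rho * (1.0 - mu1)   # formula: = 0.8 here
cond_var = s2**2 * (1 - rho**2)                   # formula: = 0.36 here

print(x2[keep].mean(), cond_mean)   # empirical vs formula conditional mean
print(x2[keep].var(), cond_var)     # empirical vs formula conditional variance
```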
Bivariate normal distribution as data model

Here we assume

  (X_i1, X_i2)′ ~iid N_2( (µ_1, µ_2)′, [ σ_11  σ_12
                                          σ_21  σ_22 ] ),

or succinctly, X_i ~iid N_2(µ, Σ). If the bivariate normal model is appropriate for paired outcomes, it provides a convenient probability model with some nice properties. The sample mean X̄ = (1/n) Σ_{i=1}^n X_i is the MLE of µ, and the sample covariance matrix S = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)(X_i − X̄)′ is unbiased for Σ. It can be shown that X̄ ~ N_2(µ, Σ/n). The matrix (n−1)S has a Wishart distribution (generalizes χ²).
11/25
Sample mean vector & covariance matrix

Say n outcome pairs are to be recorded: {(X_11, X_12), (X_21, X_22), …, (X_n1, X_n2)}. The i-th pair is (X_i1, X_i2). The sample mean vector is given elementwise by

  X̄ = (X̄_1, X̄_2)′ = ( (1/n) Σ_{i=1}^n X_i1,  (1/n) Σ_{i=1}^n X_i2 )′,

and the sample covariance matrix is given elementwise by

  S = [ (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)²                 (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)(X_i2 − X̄_2)
        (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)(X_i2 − X̄_2)     (1/(n−1)) Σ_{i=1}^n (X_i2 − X̄_2)²             ].

12/25
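The elementwise formulas above can be written compactly with matrices: center the n × 2 data matrix and form D′D/(n−1). A small numpy sketch, on a made-up dataset, confirms this agrees with numpy's own sample covariance (which also divides by n−1):

```python
import numpy as np

# Small made-up dataset of n = 5 pairs, rows are (X_i1, X_i2)
X = np.array([[1.0, 2.0],
              [2.0, 1.5],
              [3.0, 4.0],
              [2.5, 3.5],
              [4.0, 3.0]])
n = X.shape[0]

xbar = X.mean(axis=0)        # sample mean vector (Xbar_1, Xbar_2)
D = X - xbar                 # centered data
S = (D.T @ D) / (n - 1)      # sample covariance matrix, divisor n-1

print(xbar)
print(S)
print(np.cov(X, rowvar=False))   # numpy's version agrees (default ddof=1)
```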
Estimation

The sample mean vector X̄ estimates µ = (µ_1, µ_2)′ and the sample covariance matrix S estimates

  Σ = [ σ_1²      ρσ_1σ_2
        ρσ_1σ_2   σ_2²    ] = [ σ_11  σ_12
                                σ_21  σ_22 ].

We will place hats on parameter estimators based on the data. So

  µ̂_1 = X̄_1,  µ̂_2 = X̄_2,
  σ̂_1² = s_1² = (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)²,
  σ̂_2² = s_2² = (1/(n−1)) Σ_{i=1}^n (X_i2 − X̄_2)².

Also,

  ĉov(X_1, X_2) = (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)(X_i2 − X̄_2).

13/25
Correlation coefficient r

So a natural estimate of ρ is then

  ρ̂ = ĉov(X_1, X_2) / (σ̂_1 σ̂_2)
     = [ (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)(X_i2 − X̄_2) ] / √{ [ (1/(n−1)) Σ_{i=1}^n (X_i1 − X̄_1)² ] [ (1/(n−1)) Σ_{i=1}^n (X_i2 − X̄_2)² ] }.

This is in fact the MLE based on the bivariate normal model. It is also a plug-in estimator based on the method of moments. It is commonly referred to as the Pearson correlation coefficient. You can get it as, e.g., cor(age,Gesell) in R. This estimate of correlation can be unduly influenced by outliers in the sample. An alternative measure of linear association is the Spearman correlation based on ranks, discussed a few lectures ago.

> cor(age,Gesell)
[1] -0.64029
> cor(age,Gesell,method="spearman")
[1] -0.3166224
14/25
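The two R calls above can be mirrored in Python with numpy alone: Pearson is `np.corrcoef`, and Spearman is just Pearson applied to ranks. The rank helper below assigns tied values the average of their ranks, which is assumed to match R's default tie handling.

```python
import numpy as np

# Gesell data, transcribed from the earlier slide
age = np.array([15,26,10,9,15,20,18,11,8,20,7,9,10,11,11,10,12,42,17,11,10], float)
gesell = np.array([95,71,83,91,102,87,93,100,104,94,113,96,83,84,102,100,105,57,121,86,100], float)

pearson = np.corrcoef(age, gesell)[0, 1]

def average_ranks(a):
    # Ranks 1..n, with tied values sharing the average of their ranks
    a = np.asarray(a, float)
    order = np.argsort(a)
    sa = a[order]
    r = np.arange(1, len(a) + 1, dtype=float)
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sa[j + 1] == sa[i]:
            j += 1
        r[i:j + 1] = (i + 1 + j + 1) / 2   # average of ranks i+1 .. j+1
        i = j + 1
    ranks = np.empty(len(a))
    ranks[order] = r
    return ranks

# Spearman correlation = Pearson correlation of the ranks
spearman = np.corrcoef(average_ranks(age), average_ranks(gesell))[0, 1]
print(pearson)    # ~ -0.640, matching the R output
print(spearman)   # ~ -0.317, matching the R output
```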
Gesell data

Recall: X is the age in months at which a child speaks his/her first word and Y is the Gesell adaptive score, a measure of the child's aptitude. Question: how does the child's aptitude change with how long it takes them to speak? Here, n = 21. In R we find

  X̄ = (14.38, 93.67)′,   S = [ 60.14    −67.78
                                −67.78   186.32 ].

Assuming a bivariate normal model, we plug in the estimates and obtain the estimated PDF for (X, Y):

  f̂(x,y) = exp( −60.22 + 1.3006x − 0.0134x² + 0.9520y − 0.0098xy − 0.0043y² ).

We can further find, from Y ~ N(93.67, 186.32),

  f̂_Y(y) = exp( −3.557 − 0.00256(y − 93.67)² ).

15/25
3D plot of f̂(x,y) for (X, Y) estimated from data

[Surface plot of the fitted bivariate density over ages 0–30 and scores 60–120; heights up to about 0.0015.]
16/25
Gesell conditional distribution

[Plot of three densities over scores 80–140.] Solid is f̂_Y(y); the left dashed curve is f̂_{Y|X}(y|25); the right dashed curve is f̂_{Y|X}(y|10). As the age in months at first word X = x increases, the distribution of Gesell adaptive scores Y shifts toward lower scores.
17/25
Density estimate with actual data

[Contours of the fitted bivariate normal density overlaid on the scatterplot of Gesell score (40–120) versus age (0–50 months).]
18/25
Multivariate normal distribution

In general, a k-variate normal is defined through the mean vector and covariance matrix:

  (X_1, X_2, …, X_k)′ ~ N_k( (µ_1, µ_2, …, µ_k)′, [ σ_11  σ_12  …  σ_1k
                                                    σ_21  σ_22  …  σ_2k
                                                      ⋮     ⋮         ⋮
                                                    σ_k1  σ_k2  …  σ_kk ] ).

Succinctly, X ~ N_k(µ, Σ). Recall that if Z ~ N(0,1), then X = µ + σZ ~ N(µ, σ²). The definition of the multivariate normal distribution just extends this idea.
19/25
Multivariate normal made from independent normals

Instead of one standard normal, we have a list of k independent standard normals Z = (Z_1, …, Z_k)′, and consider the same sort of transformation in the multivariate case using matrices and vectors. Let Z_1, …, Z_k ~iid N(0,1). The joint pdf of (Z_1, …, Z_k) is given by

  f(z_1, …, z_k) = ∏_{i=1}^k exp(−0.5 z_i²)/√(2π).

Let µ = (µ_1, µ_2, …, µ_k)′ and

  Σ = [ σ_11  σ_12  …  σ_1k
        σ_21  σ_22  …  σ_2k
          ⋮     ⋮         ⋮
        σ_k1  σ_k2  …  σ_kk ],

where Σ is symmetric (i.e. Σ′ = Σ, which implies σ_ij = σ_ji for all 1 ≤ i, j ≤ k).
20/25
Multivariate normal made from independent normals

Let Σ^{1/2} be any matrix such that Σ^{1/2}(Σ^{1/2})′ = Σ. Then X = µ + Σ^{1/2} Z is said to have a multivariate normal distribution with mean vector µ and covariance matrix Σ, written X ~ N_k(µ, Σ). Written in terms of matrices,

  X = (X_1, X_2, …, X_k)′ = (µ_1, µ_2, …, µ_k)′ + [ σ_11  …  σ_1k
                                                      ⋮         ⋮
                                                    σ_k1  …  σ_kk ]^{1/2} (Z_1, Z_2, …, Z_k)′.

21/25
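This construction is exactly how multivariate normal draws are simulated in practice: take a matrix square root of Σ (the Cholesky factor L, with Σ = LL′, is one valid choice of Σ^{1/2}), and set X = µ + LZ. A numpy sketch with assumed example values of µ and Σ:

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0, 0.5])              # assumed example mean vector
Sigma = np.array([[2.0, 0.6, 0.3],           # assumed example covariance matrix
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

L = np.linalg.cholesky(Sigma)                # lower-triangular L with Sigma = L L'
Z = rng.standard_normal((3, 100_000))        # each column: 3 independent N(0,1) draws
X = mu[:, None] + L @ Z                      # X = mu + Sigma^{1/2} Z, columnwise

print(X.mean(axis=1))    # ~ mu
print(np.cov(X))         # ~ Sigma
```

The sample mean vector and sample covariance of the simulated columns should be close to µ and Σ, confirming the transformation has the claimed mean and covariance.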
Joint PDF

Using some math, it can be shown that the pdf of the new vector X = (X_1, …, X_k)′ is given by

  f(x_1, …, x_k | µ, Σ) = |2πΣ|^{−1/2} exp{ −0.5 (x − µ)′ Σ^{−1} (x − µ) }.

In the one-dimensional case, this simplifies to our old friend

  f(x_1 | µ, σ²) = (2πσ²)^{−1/2} exp{ −0.5 (x − µ)(σ²)^{−1}(x − µ) },

the pdf of a N(µ, σ²) random variable. |A| is the determinant of the matrix A; it is a function of the elements of A, but beyond this course.
22/25
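The one-dimensional simplification can be verified directly: implement the matrix formula with numpy's determinant and linear solver, feed it a 1 × 1 "matrix" Σ = (σ²), and compare to the familiar scalar density. The numbers below are arbitrary illustrative values.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    # |2*pi*Sigma|^{-1/2} exp{ -(x-mu)' Sigma^{-1} (x-mu) / 2 }
    x = np.atleast_1d(np.asarray(x, float))
    mu = np.atleast_1d(np.asarray(mu, float))
    Sigma = np.atleast_2d(np.asarray(Sigma, float))
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)     # (x-mu)' Sigma^{-1} (x-mu)
    return float(np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2*np.pi*Sigma)))

# k = 1: the matrix formula collapses to the N(mu, sigma^2) density
mu, s2, x = 1.5, 4.0, 0.7                    # arbitrary illustrative values
scalar = np.exp(-0.5*(x - mu)**2 / s2) / np.sqrt(2*np.pi*s2)
print(mvn_pdf(x, mu, s2), scalar)            # the two should agree
```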
Properties of multivariate normal vectors

Let X ~ N_k(µ, Σ). Then:

1. For each X_i in X = (X_1, …, X_k)′, E(X_i) = µ_i and var(X_i) = σ_ii. That is, marginally, X_i ~ N(µ_i, σ_ii).
2. For any r × k matrix M, MX ~ N_r(Mµ, MΣM′).
3. For any two (X_i, X_j) where 1 ≤ i < j ≤ k, cov(X_i, X_j) = σ_ij. The off-diagonal elements of Σ give the covariances between pairs of elements of (X_1, …, X_k). Note then ρ(X_i, X_j) = σ_ij / √(σ_ii σ_jj).
4. For any k × 1 vector m = (m_1, …, m_k)′ and Y ~ N_k(µ, Σ), m + Y ~ N_k(m + µ, Σ).

23/25
Example

Let

  X = (X_1, X_2, X_3)′ ~ N_3( (2, 5, 0)′, [ 2   1   1
                                            1   3  −1
                                            1  −1   4 ] ).

Define

  M = [ 1    0    −1
        1/3  1/3  1/3 ]   and   Y = (Y_1, Y_2)′ = MX.

Then X_2 ~ N(5, 3), cov(X_2, X_3) = −1, and

  Y = MX ~ N_2( Mµ, MΣM′ ),

or simplifying,

  Y = (Y_1, Y_2)′ ~ N_2( (2, 7/3)′, [ 4    0
                                      0  11/9 ] ).

Note that for the transformed vector Y = (Y_1, Y_2), cov(Y_1, Y_2) = 0 and therefore Y_1 and Y_2 are uncorrelated, i.e. ρ(Y_1, Y_2) = 0.
24/25
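The simplification Y ~ N_2(Mµ, MΣM′) is two matrix products, easy to check in numpy. The sketch below assumes the example's values µ = (2, 5, 0)′ and Σ = [[2, 1, 1], [1, 3, −1], [1, −1, 4]]:

```python
import numpy as np

# Assumed example values: mean vector and covariance matrix of X
mu = np.array([2.0, 5.0, 0.0])
Sigma = np.array([[2.0,  1.0,  1.0],
                  [1.0,  3.0, -1.0],
                  [1.0, -1.0,  4.0]])
M = np.array([[1.0, 0.0, -1.0],
              [1/3, 1/3,  1/3]])

mean_Y = M @ mu               # mean of Y = MX
cov_Y = M @ Sigma @ M.T       # covariance of Y = MX
print(mean_Y)                 # (2, 7/3)
print(cov_Y)                  # diag(4, 11/9), with zero off-diagonals
```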
Simple linear regression

For the linear model (e.g. simple linear regression or the two-sample model) Y = Xβ + ε, the error vector is assumed (pp. 222–223) to satisfy

  ε ~ N_n(0, σ² I_{n×n}).

Then the least squares estimator has a multivariate normal distribution:

  β̂ ~ N_p(β, (X′X)^{−1} σ²),

where p = 2 is the number of mean parameters for simple linear regression. (The MSE has a gamma distribution.) We'll discuss this shortly!
25/25
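As a preview, here is a numpy sketch fitting simple linear regression to the Gesell data from earlier in these notes: β̂ = (X′X)^{−1}X′Y, with the MSE plugged in for σ² to estimate cov(β̂). This is an illustrative computation, not the lecture's own code.

```python
import numpy as np

# Gesell data, transcribed from the earlier slide
age = np.array([15,26,10,9,15,20,18,11,8,20,7,9,10,11,11,10,12,42,17,11,10], float)
gesell = np.array([95,71,83,91,102,87,93,100,104,94,113,96,83,84,102,100,105,57,121,86,100], float)

X = np.column_stack([np.ones_like(age), age])       # n x 2 design matrix (intercept, age)
beta_hat = np.linalg.solve(X.T @ X, X.T @ gesell)   # (X'X)^{-1} X'Y

n, p = X.shape
resid = gesell - X @ beta_hat
mse = resid @ resid / (n - p)                       # estimates sigma^2
cov_beta = mse * np.linalg.inv(X.T @ X)             # estimated cov(beta_hat)

print(beta_hat)                     # ~ (109.87, -1.127): fitted intercept and slope
print(np.sqrt(np.diag(cov_beta)))   # standard errors of the two estimates
```

The negative fitted slope matches the negative correlation r = −0.640 computed earlier: each additional month before the first word is associated with a drop of about 1.13 points in Gesell score.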