THE REGRESSION MODEL IN MATRIX FORM



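We will consider the linear regression model in matrix form. For simple linear regression, meaning one predictor, the model is

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } i = 1, 2, 3, \ldots, n.$$

This model includes the assumption that the $\varepsilon_i$'s are a sample from a population with mean zero and standard deviation $\sigma$. In most cases we also assume that this population is normally distributed.

The multiple linear regression model is

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + \varepsilon_i \quad \text{for } i = 1, 2, 3, \ldots, n.$$

This model includes the assumption about the $\varepsilon_i$'s stated just above.

Writing these models in matrix form requires building up our symbols into vectors. Thus

$$\underset{n \times 1}{\mathbf{Y}} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix}$$

captures the entire dependent variable in a single symbol. The "$n \times 1$" part of the notation is just a shape reminder. These get dropped once the context is clear.

For simple linear regression, we will capture the independent variable through this $n \times 2$ matrix:

$$\underset{n \times 2}{\mathbf{X}} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}.$$

The coefficient vector will be

$$\underset{2 \times 1}{\boldsymbol{\beta}} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$

and the noise vector will be

$$\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$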
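As a small illustration of these shapes, the sketch below assembles $\mathbf{Y}$ and $\mathbf{X}$ for simple linear regression from raw data. The data values and the helper `shape` are hypothetical, chosen only to make the "$n \times 1$" and "$n \times 2$" shape reminders concrete.

```python
# Hypothetical toy data (not from the notes): n = 4 observations, one predictor.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.6, 2.8, 3.5, 4.1]

n = len(x)

# Y as an n x 1 column vector, stored as a list of one-element rows.
Y = [[yi] for yi in y]

# X as the n x 2 matrix: a column of ones, then the predictor values.
X = [[1.0, xi] for xi in x]

def shape(M):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return (len(M), len(M[0]))

print(shape(Y))  # the n x 1 shape reminder
print(shape(X))  # the n x 2 shape reminder
```

The column of ones is what lets the intercept $\beta_0$ enter through the same matrix product as the slope.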

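The simple linear regression model is then written as

$$\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}} \; \underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}.$$

The product part, meaning $\mathbf{X}\boldsymbol{\beta}$, is found through the usual rule for matrix multiplication as

$$\mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \beta_0 + \beta_1 x_3 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix}.$$

We usually write the model without the shape reminders as $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. This is a shorthand notation for

$$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \beta_0 + \beta_1 x_2 + \varepsilon_2 \\ \beta_0 + \beta_1 x_3 + \varepsilon_3 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{pmatrix}.$$

It is helpful that the multiple regression story with $K \ge 2$ predictors leads to the same model expression $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ (just with different shapes). As a notational convenience, let $p = K + 1$. In the multiple regression case, we have

$$\underset{n \times p}{\mathbf{X}} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1K} \\ 1 & x_{21} & x_{22} & \cdots & x_{2K} \\ 1 & x_{31} & x_{32} & \cdots & x_{3K} \\ 1 & x_{41} & x_{42} & \cdots & x_{4K} \\ 1 & x_{51} & x_{52} & \cdots & x_{5K} \\ 1 & x_{61} & x_{62} & \cdots & x_{6K} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nK} \end{pmatrix} \quad \text{and} \quad \underset{p \times 1}{\boldsymbol{\beta}} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}.$$

The detail shown here is to suggest that $\mathbf{X}$ is a tall, skinny matrix. We formally require $n \ge p$. In most applications, $n$ is much, much larger than $p$. The ratio $n/p$ is often in the hundreds.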
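The matrix product can be checked against the row-by-row formula with a short sketch. The predictor values, the coefficient values in `beta`, and the helper `matvec` are assumptions made for illustration, here with $K = 2$ predictors so that $p = 3$.

```python
def matvec(A, v):
    """Matrix-vector product: entry i is row i of A dotted with v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical predictors: n = 5 observations, K = 2 predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [0.0, 1.0, 0.0, 1.0, 0.0]
beta = [1.0, 2.0, -3.0]           # assumed (beta_0, beta_1, beta_2)

# n x p design matrix: a column of ones followed by the K predictor columns.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
n, p = len(X), len(X[0])          # p = K + 1
assert n >= p                     # the formal requirement on the shape of X

# X beta through the usual matrix multiplication rule ...
Xbeta = matvec(X, beta)

# ... and the same quantity written out one row at a time.
by_hand = [beta[0] + beta[1] * a + beta[2] * b for a, b in zip(x1, x2)]

print(Xbeta == by_hand)
print(n / p)                      # the ratio n/p discussed above
```

With only five rows the ratio $n/p$ is far below what real applications would want; the toy size is only for readability.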

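If it happens that $n/p$ is as small as 5, we will worry that we don't have enough data (reflected in $n$) to estimate the number of parameters in $\boldsymbol{\beta}$ (reflected in $p$).

The multiple regression model is now

$$\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}} \; \underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}},$$

and this is a shorthand for

$$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \beta_3 x_{13} + \cdots + \beta_K x_{1K} + \varepsilon_1 \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \beta_3 x_{23} + \cdots + \beta_K x_{2K} + \varepsilon_2 \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} + \beta_3 x_{33} + \cdots + \beta_K x_{3K} + \varepsilon_3 \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \beta_3 x_{n3} + \cdots + \beta_K x_{nK} + \varepsilon_n \end{pmatrix}.$$

The model form $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ is thus completely general.

The assumptions on the noise terms can be written as $E\,\boldsymbol{\varepsilon} = \mathbf{0}$ and $\operatorname{Var}\boldsymbol{\varepsilon} = \sigma^2 \mathbf{I}$. The $\mathbf{I}$ here is the $n \times n$ identity matrix. That is,

$$\mathbf{I} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

The variance assumption can be written as

$$\operatorname{Var}\boldsymbol{\varepsilon} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}.$$

You may see this expressed as $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 \delta_{ij}$, where

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$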
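The two ways of writing the variance assumption describe the same matrix, which a short sketch can confirm. The values `sigma = 2.0` and `n = 3` are arbitrary assumptions for illustration.

```python
def delta(i, j):
    """Kronecker delta: 1 if i == j, else 0."""
    return 1 if i == j else 0

sigma = 2.0   # assumed noise standard deviation
n = 3         # assumed number of observations

# Var(eps) built entry by entry from Cov(eps_i, eps_j) = sigma^2 * delta_ij.
var_eps = [[sigma ** 2 * delta(i, j) for j in range(n)] for i in range(n)]

# The same matrix written as sigma^2 times the n x n identity matrix.
identity = [[delta(i, j) for j in range(n)] for i in range(n)]
sigma2_I = [[sigma ** 2 * entry for entry in row] for row in identity]

print(var_eps == sigma2_I)
for row in var_eps:
    print(row)
```

The zeros off the diagonal say that distinct noise terms are uncorrelated, and the constant diagonal says that they all share the variance $\sigma^2$.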

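We will call $\mathbf{b}$ the estimate of the unknown parameter vector $\boldsymbol{\beta}$. You will also find the notation $\hat{\boldsymbol{\beta}}$ for the estimate. Once we get $\mathbf{b}$, we can compute the fitted vector $\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}$. This fitted value represents an ex-post guess at the expected value of $\mathbf{Y}$.

The estimate $\mathbf{b}$ is found so that the fitted vector $\hat{\mathbf{Y}}$ is close to the actual data vector $\mathbf{Y}$. Closeness is defined in the least squares sense, meaning that we want to minimize the criterion $Q$, where

$$Q = \sum_{i=1}^{n} \left( Y_i - [\mathbf{X}\mathbf{b}]_i \right)^2$$

and $[\mathbf{X}\mathbf{b}]_i$ denotes the $i$th entry of $\mathbf{X}\mathbf{b}$. This can be done by differentiating this quantity $p = K + 1$ times, once with respect to $b_0$, once with respect to $b_1$, ..., and once with respect to $b_K$. This is routine in simple regression ($K = 1$), and it's possible with a lot of messy work in general.

It happens that $Q$ is the squared length of the vector difference $\mathbf{Y} - \mathbf{X}\mathbf{b}$. This means that we can write

$$Q = (\mathbf{Y} - \mathbf{X}\mathbf{b})' (\mathbf{Y} - \mathbf{X}\mathbf{b}).$$

This represents $Q$ as a $1 \times 1$ matrix, and so we can think of $Q$ as an ordinary number.

There are several ways to find the $\mathbf{b}$ that minimizes $Q$. The simple solution we'll show here (alas) requires knowing the answer and working backward.

Define the matrix

$$\underset{n \times n}{\mathbf{H}} = \underset{n \times p}{\mathbf{X}} \left( \underset{p \times n}{\mathbf{X}'} \; \underset{n \times p}{\mathbf{X}} \right)^{-1} \underset{p \times n}{\mathbf{X}'}.$$

We will call $\mathbf{H}$ the "hat matrix," and it has some important uses. There are several technical comments about $\mathbf{H}$:

(1) Finding $\mathbf{H}$ requires the ability to get the $p \times p$ matrix $(\mathbf{X}'\mathbf{X})^{-1}$. This matrix inversion is possible if and only if $\mathbf{X}$ has full rank $p$. Things get very interesting when $\mathbf{X}$ almost has full rank $p$; that's a longer story for another time.

(2) The matrix $\mathbf{H}$ is idempotent. The defining condition for idempotence is this: the matrix $\mathbf{C}$ is idempotent $\iff \mathbf{C}\mathbf{C} = \mathbf{C}$. Only square matrices can be idempotent. Since $\mathbf{H}$ is square (it's $n \times n$), it can be checked for idempotence. You will indeed find that $\mathbf{H}\mathbf{H} = \mathbf{H}$.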

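(3) The $i$th diagonal entry, that in position $(i, i)$, will be identified for later use as the $i$th leverage value. The notation is usually $h_i$, but you'll also see $h_{ii}$.

Now write $\mathbf{Y}$ in the form $\mathbf{H}\mathbf{Y} + (\mathbf{I} - \mathbf{H})\mathbf{Y}$.

Now let's develop $Q$. This will require using the fact that $\mathbf{H}$ is symmetric, meaning $\mathbf{H}' = \mathbf{H}$. This will also require using the transpose of a matrix product. Specifically, the property will be $(\mathbf{X}\mathbf{b})' = \mathbf{b}'\mathbf{X}'$.

$$\begin{aligned} Q &= (\mathbf{Y} - \mathbf{X}\mathbf{b})' (\mathbf{Y} - \mathbf{X}\mathbf{b}) \\ &= \left( \{\mathbf{H}\mathbf{Y} + (\mathbf{I} - \mathbf{H})\mathbf{Y}\} - \mathbf{X}\mathbf{b} \right)' \left( \{\mathbf{H}\mathbf{Y} + (\mathbf{I} - \mathbf{H})\mathbf{Y}\} - \mathbf{X}\mathbf{b} \right) \\ &= \left( \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\} + (\mathbf{I} - \mathbf{H})\mathbf{Y} \right)' \left( \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\} + (\mathbf{I} - \mathbf{H})\mathbf{Y} \right) \\ &= \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\}' \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\} + \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\}' (\mathbf{I} - \mathbf{H})\mathbf{Y} \\ &\qquad + \left( (\mathbf{I} - \mathbf{H})\mathbf{Y} \right)' \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\} + \left( (\mathbf{I} - \mathbf{H})\mathbf{Y} \right)' (\mathbf{I} - \mathbf{H})\mathbf{Y}. \end{aligned}$$

The second and third summands above are zero, as a consequence of

$$(\mathbf{I} - \mathbf{H})\mathbf{X} = \mathbf{X} - \mathbf{H}\mathbf{X} = \mathbf{X} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{X} - \mathbf{X} = \mathbf{0}.$$

To see why this is enough, note that $\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b} = \mathbf{X}\{(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} - \mathbf{b}\}$ is $\mathbf{X}$ times a vector, and that $\mathbf{I} - \mathbf{H}$ is symmetric, so each cross term contains the factor $(\mathbf{I} - \mathbf{H})\mathbf{X}$ or its transpose. Thus

$$Q = \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\}' \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\} + \left( (\mathbf{I} - \mathbf{H})\mathbf{Y} \right)' \left( (\mathbf{I} - \mathbf{H})\mathbf{Y} \right).$$

If this is to be minimized over choices of $\mathbf{b}$, then the minimization can only be done with regard to the first summand $\{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\}' \{\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}\}$, since the second summand does not involve $\mathbf{b}$. It is possible to make the vector $\mathbf{H}\mathbf{Y} - \mathbf{X}\mathbf{b}$ equal to $\mathbf{0}$ by selecting $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. This is very easy to see, as $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$.

This $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ is known as the least squares estimate of $\boldsymbol{\beta}$.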
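The properties of $\mathbf{H}$ and the resulting estimate can be checked numerically on a tiny example. The sketch below is only an illustration under stated assumptions: the data are hypothetical, the helpers `transpose`, `matmul`, `inverse_2x2`, `max_abs_diff` and `Q` are names introduced here, and the explicit $2 \times 2$ inverse covers only the simple regression case $p = 2$.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Usual matrix product of A (m x k) and B (k x q)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix; fails when the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular, so X does not have full rank")
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - c) for ra, rc in zip(A, B) for a, c in zip(ra, rc))

# Hypothetical toy data: n = 4 observations, K = 1 predictor, so p = 2.
x = [1.0, 2.0, 3.0, 4.0]
Y = [[2.0], [3.0], [5.0], [4.0]]
X = [[1.0, xi] for xi in x]
Xt = transpose(X)

XtX_inv = inverse_2x2(matmul(Xt, X))
H = matmul(matmul(X, XtX_inv), Xt)    # hat matrix, n x n
b = matmul(matmul(XtX_inv, Xt), Y)    # least squares estimate, p x 1

def Q(b_vec):
    """Least squares criterion: squared length of Y - X b."""
    fitted = matmul(X, b_vec)
    return sum((yi[0] - fi[0]) ** 2 for yi, fi in zip(Y, fitted))

leverages = [H[i][i] for i in range(len(x))]   # diagonal entries h_ii

print(max_abs_diff(matmul(H, H), H))             # near 0: H is idempotent
print(max_abs_diff(transpose(H), H))             # near 0: H is symmetric
print(max_abs_diff(matmul(H, Y), matmul(X, b)))  # near 0: HY = Xb
print(b, leverages, Q(b))
```

Forming $(\mathbf{X}'\mathbf{X})^{-1}$ explicitly mirrors the algebra in the text. Statistical software typically uses a more numerically stable factorization, which matters most when $\mathbf{X}$ almost fails to have full rank.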

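For the simple linear regression case $K = 1$, the estimate $\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$ can be found with relative ease. The slope estimate is

$$b_1 = \frac{S_{xy}}{S_{xx}},$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) = \sum_{i=1}^{n} x_i Y_i - n\,\bar{x}\,\bar{Y}$$

and where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\,(\bar{x})^2.$$

For the multiple regression case $K \ge 2$, the calculation involves the inversion of the $p \times p$ matrix $\mathbf{X}'\mathbf{X}$. This task is best left to computer software. There is a computational trick, called "mean-centering," that converts the problem to a simpler one of inverting a $K \times K$ matrix.

The matrix notation will allow the proof of two very helpful facts:

* $E\,\mathbf{b} = \boldsymbol{\beta}$. This means that $\mathbf{b}$ is an unbiased estimate of $\boldsymbol{\beta}$. This is a good thing, but there are circumstances in which biased estimates will work a little bit better.

* $\operatorname{Var}\mathbf{b} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$. This identifies the variances and covariances of the estimated coefficients. It's critical to note that the separate entries of $\mathbf{b}$ are not statistically independent.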
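The simple regression formulas can be worked through by hand on a small data set, as in the sketch below. The data and `sigma = 1.0` are hypothetical assumptions. The intercept formula $b_0 = \bar{Y} - b_1 \bar{x}$ is the standard companion to the slope formula and is not stated in the text above. The entries of $(\mathbf{X}'\mathbf{X})^{-1}$ are written out from the $2 \times 2$ inverse of $\mathbf{X}'\mathbf{X}$, whose rows are $(n, \sum x_i)$ and $(\sum x_i, \sum x_i^2)$.

```python
# Hypothetical toy data: n = 4 observations, one predictor.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 4.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional forms of S_xy and S_xx.
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_xx = sum((xi - x_bar) ** 2 for xi in x)

# Shortcut forms; they should agree with the definitional forms.
S_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
S_xx_short = sum(xi * xi for xi in x) - n * x_bar ** 2

b1 = S_xy / S_xx            # slope estimate
b0 = y_bar - b1 * x_bar     # intercept (standard formula, assumed here)

# Var b = sigma^2 (X'X)^{-1} for the n x 2 design matrix, with an assumed sigma.
sigma = 1.0
sum_x = sum(x)
sum_xx = sum(xi * xi for xi in x)
det = n * sum_xx - sum_x ** 2          # determinant of X'X
var_b = [[sigma ** 2 * sum_xx / det, -sigma ** 2 * sum_x / det],
         [-sigma ** 2 * sum_x / det, sigma ** 2 * n / det]]

print(b0, b1)
print(var_b)                # nonzero off-diagonal: b0 and b1 are correlated
```

In this example the off-diagonal entry of `var_b` is nonzero, which illustrates the remark that the separate entries of $\mathbf{b}$ are not statistically independent. The lower-right entry equals $\sigma^2 / S_{xx}$, the familiar variance of the slope.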