Inference in Regression Analysis. Dr. Frank Wood

Inference in the Normal Error Regression Model

Y_i = β_0 + β_1 X_i + ε_i,   i = 1, ..., n

- Y_i is the value of the response variable in the i-th trial
- β_0 and β_1 are parameters
- X_i is a known constant, the value of the predictor variable in the i-th trial
- ε_i iid N(0, σ²)

Inference concerning β_1

Tests concerning β_1 (the slope) are often of interest, particularly

H_0: β_1 = 0
H_a: β_1 ≠ 0

The null hypothesis model Y_i = β_0 + (0)X_i + ε_i implies that there is no linear relationship between Y and X. Note that under H_0 the means of the Y_i are equal at all levels of X_i.

Quick Review: Hypothesis Testing

Elements of a statistical test:
- Null hypothesis, H_0
- Alternative hypothesis, H_a
- Test statistic
- Rejection region

Quick Review: Hypothesis Testing - Errors

- A type I error is made if H_0 is rejected when H_0 is true. The probability of a type I error is denoted by α; the value of α is called the level of the test.
- A type II error is made if H_0 is accepted when H_a is true. The probability of a type II error is denoted by β.

P-value

The p-value, or attained significance level, is the smallest level of significance α at which the observed data indicate that the null hypothesis should be rejected.

Null Hypothesis

If the null hypothesis is that β_1 = 0, then b_1 should fall in a range around zero. The further b_1 is from 0, the less likely the null hypothesis is to hold.

Alternative Hypothesis: Least Squares Fit

If our estimate b_1 deviates from 0, we have to determine whether that deviation would be surprising given the model and the sampling distribution of the estimator. If it is sufficiently different (where "sufficient" is defined by a confidence level), then we reject the null hypothesis.

Testing This Hypothesis

- We only have a finite sample.
- Different finite samples (from the same population / source) will (almost always) produce different point estimates b_0, b_1 of β_0 and β_1, even given the same estimation procedure.
- Key point: b_0 and b_1 are random variables whose sampling distributions can be statistically characterized.
- Hypothesis tests about β_0 and β_1 can be constructed using these distributions.
- The same techniques for deriving the sampling distribution of b = [b_0, b_1] are used in multiple regression.

Sampling Dist. of b_1

The point estimator for β_1 is

b_1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²

The sampling distribution of b_1 is the distribution of values b_1 takes when the predictor values X_i are held fixed and the errors are repeatedly sampled. Note that the sampling distribution we derive for b_1 will be highly dependent on our modeling assumptions.
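This notion of a sampling distribution can be checked numerically. The sketch below is not from the lecture: the values of β_0, β_1, σ, and the grid of X_i are made-up. It holds the X_i fixed, resamples the errors many times, and refits the slope each time.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.5, 1.0      # assumed "true" parameters (illustrative)
X = np.linspace(0.0, 10.0, 30)           # predictor values, held fixed across samples
Sxx = np.sum((X - X.mean()) ** 2)

def slope(Y):
    # b_1 = sum((X_i - Xbar)(Y_i - Ybar)) / sum((X_i - Xbar)^2)
    return np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx

# Repeatedly sample new errors with X fixed and record the fitted slope.
b1_draws = np.array([
    slope(beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size))
    for _ in range(20_000)
])

emp_mean = b1_draws.mean()   # should be close to beta1
emp_var = b1_draws.var()     # should be close to sigma^2 / Sxx
```

The empirical mean and variance of the draws match the formulas derived on the following slides.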

Sampling Dist. of b_1 in the Normal Regr. Model

For a normal error regression model, the sampling distribution of b_1 is normal, with mean and variance given by

E{b_1} = β_1
σ²{b_1} = σ² / Σ(X_i - X̄)²

To show this we need to go through a number of algebraic steps.

First Step

To show

Σ(X_i - X̄)(Y_i - Ȳ) = Σ(X_i - X̄)Y_i

we observe

Σ(X_i - X̄)(Y_i - Ȳ) = Σ(X_i - X̄)Y_i - Σ(X_i - X̄)Ȳ
                     = Σ(X_i - X̄)Y_i - Ȳ Σ(X_i - X̄)
                     = Σ(X_i - X̄)Y_i - Ȳ ΣX_i + n Ȳ X̄
                     = Σ(X_i - X̄)Y_i        (since n X̄ = ΣX_i)

This will be useful because the sampling distribution of the estimators will be expressed in terms of the distribution of the Y_i, which are assumed to be equal to the regression function plus a random error term.

b_1 as a linear combination of the Y_i

b_1 can be expressed as a linear combination of the Y_i:

b_1 = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²
    = Σ(X_i - X̄)Y_i / Σ(X_i - X̄)²    (from previous slide)
    = Σ k_i Y_i

where

k_i = (X_i - X̄) / Σ(X_i - X̄)²

The estimator is simply a weighted sum of the Y_i, which makes computing its analytic sampling distribution simple.
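One can verify numerically that the k_i weights reproduce the least-squares slope exactly. A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=25)                 # made-up predictor values
Y = 1.0 + 2.0 * X + rng.normal(0.0, 1.0, size=25)   # made-up responses

Sxx = np.sum((X - X.mean()) ** 2)
k = (X - X.mean()) / Sxx            # k_i = (X_i - Xbar) / sum((X_i - Xbar)^2)

b1_direct = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx   # standard formula
b1_linear = np.sum(k * Y)                                   # linear combination of the Y_i
```

Both expressions give the same slope, as the algebra on this slide shows.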

Properties of the k_i

It can be shown (using simple algebraic operations) that

Σ k_i = 0
Σ k_i X_i = 1
Σ k_i² = 1 / Σ(X_i - X̄)²

(possible homework). We will use these properties to prove various properties of the sampling distributions of b_1 and b_0.
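These three identities hold for any fixed set of predictor values, which is easy to confirm numerically (the X values below are arbitrary):

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # any fixed predictor values work
Sxx = np.sum((X - X.mean()) ** 2)
k = (X - X.mean()) / Sxx

sum_k = k.sum()           # property 1: equals 0
sum_kX = (k * X).sum()    # property 2: equals 1
sum_k2 = (k ** 2).sum()   # property 3: equals 1 / Sxx
```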

Normality of b_1's Sampling Distribution

Reminder of a useful fact: a linear combination of independent normal random variables is normally distributed.

More formally: when Y_1, ..., Y_n are independent normal random variables, the linear combination a_1 Y_1 + a_2 Y_2 + ... + a_n Y_n is normally distributed, with mean Σ a_i E{Y_i} and variance Σ a_i² σ²{Y_i}.

Normality of b_1's Sampling Distribution

Since b_1 is a linear combination of the Y_i, and each Y_i = β_0 + β_1 X_i + ε_i is (conditioned on X_i, β_0, and β_1) an independent normal random variable, the distribution of b_1 under sampling of the errors is normal as well:

b_1 = Σ k_i Y_i,   k_i = (X_i - X̄) / Σ(X_i - X̄)²

From the previous slide,

E{b_1} = Σ k_i E{Y_i},   σ²{b_1} = Σ k_i² σ²{Y_i}

This means b_1 ~ N(E{b_1}, σ²{b_1}). To use this we must know E{b_1} and σ²{b_1}.

b_1 is an Unbiased Estimator

This can be seen using two of the k_i properties:

E{b_1} = E{Σ k_i Y_i}
       = Σ k_i E{Y_i}
       = Σ k_i (β_0 + β_1 X_i)
       = β_0 Σ k_i + β_1 Σ k_i X_i
       = β_0 (0) + β_1 (1)
       = β_1

So now we know the mean of the sampling distribution of b_1, and conveniently (importantly) it is centered on the true value of the unknown quantity β_1 (the slope of the linear relationship).

Variance of b_1

Since the Y_i are independent random variables with variance σ² and the k_i are constants, we get

σ²{b_1} = σ²{Σ k_i Y_i}
        = Σ k_i² σ²{Y_i}
        = Σ k_i² σ²
        = σ² Σ k_i²
        = σ² / Σ(X_i - X̄)²

and now we know the variance of the sampling distribution of b_1. This means that we can write

b_1 ~ N(β_1, σ² / Σ(X_i - X̄)²)

How does this behave as a function of σ² and the spread of the X_i? Is this intuitive? Note: this assumes that we know σ². Can we?

Estimated Variance of b_1

When we don't know σ², one thing we can do is replace it with the MSE estimate. Let

s² = MSE = SSE / (n - 2)

where

SSE = Σ e_i²   and   e_i = Y_i - Ŷ_i

Plugging in, the estimate of σ²{b_1} = σ² / Σ(X_i - X̄)² is

s²{b_1} = s² / Σ(X_i - X̄)²
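The estimated standard error s{b_1} can be computed directly from the residuals and cross-checked against scipy.stats.linregress, which reports this same quantity as its stderr attribute. The data values here are made-up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
X = rng.uniform(0.0, 5.0, size=n)
Y = 1.0 + 0.8 * X + rng.normal(0.0, 0.5, size=n)   # made-up data

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()

resid = Y - (b0 + b1 * X)            # e_i = Y_i - Yhat_i
mse = np.sum(resid ** 2) / (n - 2)   # s^2 = SSE / (n - 2)
s_b1 = np.sqrt(mse / Sxx)            # s{b_1} = sqrt(s^2 / sum((X_i - Xbar)^2))

res = stats.linregress(X, Y)         # res.stderr is the slope's standard error
```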

Recap

We now have an expression for the sampling distribution of b_1 when σ² is known:

b_1 ~ N(β_1, σ² / Σ(X_i - X̄)²)    (1)

When σ² is unknown we have an unbiased point estimator s² of σ², giving

s²{b_1} = s² / Σ(X_i - X̄)²

As n → ∞ (i.e. as the number of observations grows large), s²{b_1} → σ²{b_1} and we can use Eqn. 1.

Questions: When is n big enough? What if n isn't big enough?

Sampling Distribution of (b_1 - β_1)/s{b_1}

b_1 is normally distributed, so (b_1 - β_1)/σ{b_1} is a standard normal variable. Why? But we don't know σ²{b_1}, because we don't know σ², so it must be estimated from data; we have already denoted its estimate s²{b_1}. Using this estimate, we will show that

(b_1 - β_1) / s{b_1} ~ t(n - 2)

where s{b_1} = √(s²{b_1}).

Where does this come from?

For now we need to rely upon the following theorem.

Cochran's Theorem (simple version): for the normal error regression model,

SSE / σ² = Σ(Y_i - Ŷ_i)² / σ² ~ χ²(n - 2)

and is independent of b_0 and b_1.

Intuitively this follows the standard result for the sum of squared standard normal random variables; here there are two linear constraints imposed by the regression parameter estimation, each reducing the number of degrees of freedom by one. We will revisit this subject soon.
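The χ²(n - 2) claim can be checked by simulation: with made-up parameters, SSE/σ² computed across many resampled error vectors should have mean n - 2 and variance 2(n - 2), the moments of a χ²(n - 2) variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 15, 2.0
X = np.linspace(0.0, 1.0, n)   # fixed predictors (made-up)
Xc = X - X.mean()
Sxx = np.sum(Xc ** 2)

def sse_over_sigma2():
    # Sample fresh errors, fit the line, and return SSE / sigma^2.
    Y = 1.0 + 3.0 * X + rng.normal(0.0, sigma, size=n)   # made-up true line
    b1 = np.sum(Xc * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)
    return np.sum(resid ** 2) / sigma ** 2

draws = np.array([sse_over_sigma2() for _ in range(50_000)])
emp_mean = draws.mean()   # should be close to n - 2
emp_var = draws.var()     # should be close to 2(n - 2)
```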

Another useful fact: Student-t distribution

A definition: let z and χ²(ν) be independent random variables (standard normal and χ², respectively). Then the random variable

t(ν) = z / √(χ²(ν)/ν)

is defined to be a t-distributed random variable. This version of the t distribution has one parameter, the degrees of freedom ν.
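This definition can be checked by drawing z and χ²(ν) independently and comparing quantiles of z/√(χ²(ν)/ν) against scipy's t distribution (ν = 5 below is an arbitrary choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu = 5                                     # arbitrary degrees of freedom
z = rng.standard_normal(200_000)           # standard normal draws
chi2 = rng.chisquare(nu, size=200_000)     # independent chi-square draws
t_draws = z / np.sqrt(chi2 / nu)           # t(nu) by the definition above

emp_q95 = np.quantile(t_draws, 0.95)       # empirical 95th percentile
ref_q95 = stats.t.ppf(0.95, df=nu)         # scipy's t(nu) 95th percentile
```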

Distribution of the studentized statistic

The quantity

(b_1 - β_1) / s{b_1}

is a so-called studentized statistic; we claim it is distributed t(n - 2). To derive the distribution of this statistic, first we do the following rewrite:

(b_1 - β_1) / s{b_1} = [(b_1 - β_1) / σ{b_1}] / [s{b_1} / σ{b_1}]

where

s{b_1} / σ{b_1} = √(s²{b_1} / σ²{b_1})

Studentized statistic cont.

And note the following:

s²{b_1} / σ²{b_1} = [MSE / Σ(X_i - X̄)²] / [σ² / Σ(X_i - X̄)²]
                  = MSE / σ²
                  = SSE / (σ²(n - 2))

where we know (by the given simple version of Cochran's theorem) that the last term is distributed as a χ² variable divided by its degrees of freedom, independently of b_0 and b_1:

SSE / (σ²(n - 2)) ~ χ²(n - 2) / (n - 2)

Studentized statistic final

But by the given definition of the t distribution we have our result,

(b_1 - β_1) / s{b_1} ~ t(n - 2)

because, putting everything together, we can see that

(b_1 - β_1) / s{b_1} ~ z / √(χ²(n - 2)/(n - 2))

Confidence Intervals and Hypothesis Tests

Now that we know the sampling distribution of b_1 (t with n - 2 degrees of freedom) we can construct confidence intervals and hypothesis tests easily.

Things to think about:
- What does the t-distribution look like?
- Why is the estimator distributed according to a t-distribution rather than a normal distribution?
- When performing tests, why does this matter?
- When is it safe to cheat and use a normal approximation?
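As a closing sketch (with made-up data), a 95% confidence interval for β_1 and the two-sided t-test of H_0: β_1 = 0 both follow directly from the t(n - 2) sampling distribution; the p-value can be cross-checked against scipy.stats.linregress, which performs the same test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 30
X = rng.uniform(0.0, 10.0, size=n)
Y = 2.0 + 0.4 * X + rng.normal(0.0, 1.0, size=n)   # made-up data

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
mse = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)
s_b1 = np.sqrt(mse / Sxx)                # s{b_1}

# Two-sided t-test of H0: beta_1 = 0
t_stat = b1 / s_b1
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)

# 95% confidence interval: b_1 +/- t(0.975; n-2) * s{b_1}
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

ref = stats.linregress(X, Y)             # cross-check against scipy
```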