Statistics in Geophysics: Linear Regression II

Statistics in Geophysics: Linear Regression II
Steffen Unkel, Department of Statistics, Ludwig-Maximilians-University Munich, Germany
Winter Term 2013/14

Model definition
Suppose we have the following model under consideration:
$$y = X\beta + \varepsilon,$$
where $y = (y_1, \ldots, y_n)'$ is an $n \times 1$ vector of observations on the response, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$ is a $(k+1) \times 1$ vector of parameters, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors, and $X$ is the $n \times (k+1)$ design matrix
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}.$$

Model assumptions
The following assumptions are made:
1. $E(\varepsilon) = 0$.
2. $\mathrm{Cov}(\varepsilon) = \sigma^2 I$.
3. The design matrix has full column rank, that is, $\mathrm{rank}(X) = k + 1 = p$.
4. The normal regression model is obtained if additionally $\varepsilon \sim N(0, \sigma^2 I)$.
For stochastic covariates these assumptions are to be understood conditionally on $X$. It follows that $E(y) = X\beta$ and $\mathrm{Cov}(y) = \sigma^2 I$. Under assumption 4, we have $y \sim N(X\beta, \sigma^2 I)$.

Modelling the effects of continuous covariates
We can fit nonlinear relationships between continuous covariates and the response within the scope of linear models. Two simple methods for dealing with nonlinearity are:
1. variable transformation;
2. polynomial regression.
It can also be useful to transform the continuous response itself.
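
As an illustration of polynomial regression, the following R sketch uses simulated data (the variables x and y and the quadratic trend are hypothetical, not taken from the lecture); a transformation such as log(x) can be used in the same way.

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x - 0.1 * x^2 + rnorm(50, sd = 0.5)    # simulated quadratic trend

fit_poly <- lm(y ~ poly(x, 2))    # polynomial regression of degree 2
fit_log  <- lm(y ~ log(x))        # alternative: variable transformation
summary(fit_poly)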

Modelling the effects of categorical covariates
We might want to include a categorical variable with two or more distinct levels. In such a case we cannot set up a continuous scale for the categorical variable. A remedy is to define new covariates, so-called dummy variables, and to estimate a separate effect for each category of the original covariate. We can deal with $c$ categories by introducing $c - 1$ dummy variables.

Example: Turkey data
Turkey weights in pounds (weightp), ages in weeks (agew), and origin (Georgia (G), Virginia (V), Wisconsin (W)) of 13 Thanksgiving turkeys:

   weightp  agew  origin
1     13.3    28       G
2      8.9    20       G
3     15.1    32       G
4     10.4    22       G
5     13.1    29       V
6     12.4    27       V
7     13.2    28       V
8     11.8    26       V
9     11.5    21       W
10    14.2    27       W
11    15.4    29       W
12    13.1    23       W
13    13.8    25       W
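
The data can be entered directly in R; the data frame below simply reproduces the values in the table above and is reused in the sketches on the following slides.

turkey <- data.frame(
  weightp = c(13.3, 8.9, 15.1, 10.4, 13.1, 12.4, 13.2, 11.8,
              11.5, 14.2, 15.4, 13.1, 13.8),
  agew    = c(28, 20, 32, 22, 29, 27, 28, 26, 21, 27, 29, 23, 25),
  origin  = factor(c("G", "G", "G", "G", "V", "V", "V", "V",
                     "W", "W", "W", "W", "W"))
)
str(turkey)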

Dummy coding for categorical covariates
Given a covariate $x_i \in \{1, \ldots, c\}$ with $c$ categories, we define the $c - 1$ dummy variables
$$x_{i1} = \begin{cases} 1 & x_i = 1 \\ 0 & \text{otherwise} \end{cases}, \quad \ldots, \quad x_{i,c-1} = \begin{cases} 1 & x_i = c-1 \\ 0 & \text{otherwise} \end{cases},$$
for $i = 1, \ldots, n$, and include them as explanatory variables in the regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{c-1} x_{i,c-1} + \ldots + \varepsilon_i.$$
For reasons of identifiability, we omit the dummy variable for category $c$, which serves as the reference category.

Design matrix for the turkey data using dummy coding
> X
   (Intercept) agew originV originW
1            1   28       0       0
2            1   20       0       0
3            1   32       0       0
4            1   22       0       0
5            1   29       1       0
6            1   27       1       0
7            1   28       1       0
8            1   26       1       0
9            1   21       0       1
10           1   27       0       1
11           1   29       0       1
12           1   23       0       1
13           1   25       0       1
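
A design matrix of this form is produced automatically in R once origin is coded as a factor; a minimal sketch, assuming the turkey data frame constructed above (the first factor level, G, becomes the reference category):

X <- model.matrix(~ agew + origin, data = turkey)
X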

Interactions between covariates
An interaction between predictor variables exists if the effect of one covariate depends on the value of at least one other covariate. Consider the following model for a response $y$ and two covariates $x_1$ and $x_2$:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,$$
where the term $\beta_3 x_1 x_2$ is called an interaction between $x_1$ and $x_2$. The terms $\beta_1 x_1$ and $\beta_2 x_2$ each depend on only one variable and are called main effects.
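
In R's formula notation, x1:x2 denotes the interaction term, and x1 * x2 expands to both main effects plus the interaction. A minimal sketch with simulated data (the names dat, x1, x2, and y are hypothetical):

set.seed(2)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + 0.5 * dat$x1 * dat$x2 + rnorm(100)

fit <- lm(y ~ x1 * x2, data = dat)    # main effects plus interaction
coef(fit)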

Least squares estimation of regression coefficients
The error sum of squares is
$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta.$$
The least squares estimator of $\beta$ is the value $\hat{\beta}$ that minimizes $\varepsilon'\varepsilon$. Minimizing $\varepsilon'\varepsilon$ with respect to $\beta$ yields
$$\hat{\beta} = (X'X)^{-1}X'y,$$
which is obtained irrespective of any distributional properties of the errors.
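
The closed-form estimator can be evaluated directly; a sketch for the turkey data, assuming the data frame constructed earlier, with the result checked against lm():

y <- turkey$weightp
X <- model.matrix(~ agew + origin, data = turkey)

beta_hat <- solve(crossprod(X), crossprod(X, y))    # (X'X)^{-1} X'y
beta_hat
coef(lm(weightp ~ agew + origin, data = turkey))    # same estimates via lm()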

Maximum likelihood estimation of regression coefficients
Assuming normally distributed errors yields the likelihood
$$L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right).$$
The log-likelihood is given by
$$l(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$
Maximizing $l(\beta, \sigma^2)$ with respect to $\beta$ is equivalent to minimizing $(y - X\beta)'(y - X\beta)$, which is the least squares criterion. The MLE $\hat{\beta}_{ML}$ is therefore identical to the least squares estimator.

Fitted values and residuals
Based on $\hat{\beta} = (X'X)^{-1}X'y$, we can estimate the (conditional) mean of $y$ by $\hat{E}(y) = \hat{y} = X\hat{\beta}$. Substituting the least squares estimator further yields
$$\hat{y} = X(X'X)^{-1}X'y = Hy,$$
where the $n \times n$ matrix $H$ is called the hat matrix. The residuals are
$$\hat{\varepsilon} = y - \hat{y} = y - Hy = (I - H)y.$$
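
Continuing the turkey sketch, the hat matrix, fitted values, and residuals follow directly from these formulas:

y <- turkey$weightp
X <- model.matrix(~ agew + origin, data = turkey)
H <- X %*% solve(crossprod(X)) %*% t(X)    # hat matrix
y_hat   <- drop(H %*% y)                   # fitted values
eps_hat <- y - y_hat                       # residuals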

Estimation of the error variance
Maximizing the log-likelihood with respect to $\sigma^2$ yields
$$\hat{\sigma}^2_{ML} = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}) = \frac{1}{n}(y - \hat{y})'(y - \hat{y}) = \frac{1}{n}\hat{\varepsilon}'\hat{\varepsilon}.$$
Since
$$E(\hat{\sigma}^2_{ML}) = \frac{n-p}{n}\,\sigma^2,$$
the MLE of $\sigma^2$ is biased. An unbiased estimator is
$$\hat{\sigma}^2 = \frac{1}{n-p}\,\hat{\varepsilon}'\hat{\varepsilon}.$$
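
A sketch of the two estimators for the turkey model (here n = 13 and p = k + 1 = 4); the square root of the unbiased version is what summary(lm(...)) reports as the residual standard error:

fit <- lm(weightp ~ agew + origin, data = turkey)
eps_hat <- resid(fit)
n <- nobs(fit); p <- length(coef(fit))
sigma2_ml  <- sum(eps_hat^2) / n          # biased ML estimator
sigma2_hat <- sum(eps_hat^2) / (n - p)    # unbiased estimator
sqrt(sigma2_hat)                          # residual standard error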

Properties of the least squares estimator
For the least squares estimator we have $E(\hat{\beta}) = \beta$ and $\mathrm{Cov}(\hat{\beta}) = \sigma^2(X'X)^{-1}$.
Gauss-Markov theorem: among all linear unbiased estimators $\hat{\beta}_L$, the least squares estimator has minimal variance, implying $\mathrm{Var}(\hat{\beta}_j) \le \mathrm{Var}(\hat{\beta}_{L,j})$ for $j = 0, \ldots, k$.
If $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$.
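
The estimated covariance matrix can be checked against vcov(); a sketch for the turkey model:

fit <- lm(weightp ~ agew + origin, data = turkey)
X <- model.matrix(fit)
sigma2_hat <- sum(resid(fit)^2) / df.residual(fit)
all.equal(sigma2_hat * solve(crossprod(X)), vcov(fit))    # should be TRUE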

ANOVA table for multiple linear regression and R²
Source of variation | df                   | SS  | MS                 | F-value
Regression          | k                    | SSR | MSR = SSR/k        | MSR / σ̂²
Residual            | n − p = n − (k + 1)  | SSE | σ̂² = SSE/(n − p)   |
Total               | n − 1                | SST |                    |
The multiple coefficient of determination is still computed as $R^2 = SSR/SST$, but it is no longer the square of the Pearson correlation between the response and any single predictor variable.
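
The sums of squares, R², and the overall F-value can be computed by hand and compared with summary(); a sketch for the turkey model:

fit <- lm(weightp ~ agew + origin, data = turkey)
SST <- sum((turkey$weightp - mean(turkey$weightp))^2)
SSE <- sum(resid(fit)^2)
SSR <- SST - SSE
n <- nobs(fit); k <- length(coef(fit)) - 1
c(R2 = SSR / SST, F = (SSR / k) / (SSE / (n - k - 1)))    # compare with summary(fit)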

Interval estimation and tests
We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression parameters $\beta$. A requirement for the construction of exact tests and confidence intervals is the assumption of normally distributed errors. However, tests and confidence intervals are relatively robust to mild departures from the normality assumption. Moreover, tests and confidence intervals derived under the assumption of normality remain approximately valid in large samples even with non-normal errors.

Testing linear hypotheses
Hypotheses:
1. General linear hypothesis: $H_0: C\beta = d$ against $H_1: C\beta \ne d$, where $C$ is an $r \times p$ matrix with $\mathrm{rank}(C) = r \le p$ ($r$ linearly independent restrictions) and $d$ is an $r \times 1$ vector.
2. Test of significance (t-test): $H_0: \beta_j = 0$ against $H_1: \beta_j \ne 0$.

Testing linear hypotheses
Hypotheses:
3. Composite test of a subvector: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$, where $\beta_1$ denotes a subvector of $\beta$.
4. Test for significance of regression: $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$ against $H_1: \beta_j \ne 0$ for at least one $j \in \{1, \ldots, k\}$.

Testing linear hypotheses
Test statistics (distributions hold under the respective $H_0$):
1. $F = \frac{1}{r}(C\hat{\beta} - d)'\left(\hat{\sigma}^2 C(X'X)^{-1}C'\right)^{-1}(C\hat{\beta} - d) \sim F_{r,n-p}$.
2. $t_j = \frac{\hat{\beta}_j}{\hat{\sigma}_{\hat{\beta}_j}} \sim t_{n-p}$, where $\hat{\sigma}_{\hat{\beta}_j}$ denotes the standard error of $\hat{\beta}_j$.
3. $F = \frac{1}{r}\,\hat{\beta}_1'\,\widehat{\mathrm{Cov}}(\hat{\beta}_1)^{-1}\,\hat{\beta}_1 \sim F_{r,n-p}$.
4. $F = \frac{n-p}{k}\,\frac{SSR}{SSE} = \frac{n-p}{k}\,\frac{R^2}{1-R^2} \sim F_{k,n-p}$.
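
Test statistics 2 and 4 are what summary() reports for a fitted lm object; a sketch for the turkey model, computing the overall F statistic from R² as in formula 4:

fit  <- lm(weightp ~ agew + origin, data = turkey)
smry <- summary(fit)
n <- nobs(fit); k <- length(coef(fit)) - 1
F_by_hand <- ((n - k - 1) / k) * smry$r.squared / (1 - smry$r.squared)
c(by_hand = F_by_hand, from_summary = unname(smry$fstatistic["value"]))
smry$coefficients[, "t value"]    # t statistics for the individual coefficients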

Testing linear hypotheses
Critical values: reject $H_0$ in the case of:
1. $F > F_{r,n-p}(1-\alpha)$.
2. $|t_j| > t_{n-p}(1-\alpha/2)$.
3. $F > F_{r,n-p}(1-\alpha)$.
4. $F > F_{k,n-p}(1-\alpha)$.

Confidence intervals and regions for regression coefficients
Confidence interval for $\beta_j$: a confidence interval for $\beta_j$ with level $1-\alpha$ is given by
$$\left[\hat{\beta}_j - t_{n-p}(1-\alpha/2)\,\hat{\sigma}_{\hat{\beta}_j},\; \hat{\beta}_j + t_{n-p}(1-\alpha/2)\,\hat{\sigma}_{\hat{\beta}_j}\right].$$
Confidence region for the subvector $\beta_1$: a confidence ellipsoid for $\beta_1 = (\beta_1, \ldots, \beta_r)'$ with level $1-\alpha$ is given by
$$\left\{\beta_1 : \frac{1}{r}(\hat{\beta}_1 - \beta_1)'\,\widehat{\mathrm{Cov}}(\hat{\beta}_1)^{-1}(\hat{\beta}_1 - \beta_1) \le F_{r,n-p}(1-\alpha)\right\}.$$
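
For the turkey model, the interval for each coefficient can be computed from this formula and compared with confint(); a sketch:

fit <- lm(weightp ~ agew + origin, data = turkey)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
df  <- df.residual(fit)
cbind(lower = est - qt(0.975, df) * se,
      upper = est + qt(0.975, df) * se)
confint(fit, level = 0.95)    # same intervals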

Prediction intervals
Confidence interval for the (conditional) mean of a future observation: a confidence interval for $E(y_0)$ of a future observation $y_0$ at location $x_0$ with level $1-\alpha$ is given by
$$x_0'\hat{\beta} \pm t_{n-p}(1-\alpha/2)\,\hat{\sigma}\left(x_0'(X'X)^{-1}x_0\right)^{1/2}.$$
Prediction interval for a future observation: a prediction interval for a future observation $y_0$ at location $x_0$ with level $1-\alpha$ is given by
$$x_0'\hat{\beta} \pm t_{n-p}(1-\alpha/2)\,\hat{\sigma}\left(1 + x_0'(X'X)^{-1}x_0\right)^{1/2}.$$
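
In R, both intervals are available through predict(); a sketch for a hypothetical new observation, a 26-week-old turkey from Virginia:

fit <- lm(weightp ~ agew + origin, data = turkey)
new_x <- data.frame(agew = 26, origin = "V")
predict(fit, newdata = new_x, interval = "confidence", level = 0.95)    # for E(y0)
predict(fit, newdata = new_x, interval = "prediction", level = 0.95)    # for y0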

The corrected coefficient of determination
We already defined the coefficient of determination, $R^2$, as a measure of the goodness of fit to the data. The use of $R^2$ is limited, since it never decreases when a new covariate is added to the model. The corrected (adjusted) coefficient of determination, $R^2_{adj}$, addresses this problem by including a correction term for the number of parameters. It is defined by
$$R^2_{adj} = 1 - \frac{n-1}{n-p}\,(1 - R^2).$$
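
A short check of the formula against the value reported by summary() for the turkey model:

fit  <- lm(weightp ~ agew + origin, data = turkey)
smry <- summary(fit)
n <- nobs(fit); p <- length(coef(fit))
c(by_hand = 1 - (n - 1) / (n - p) * (1 - smry$r.squared),
  from_summary = smry$adj.r.squared)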

Akaike information criterion
The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference. The AIC is defined by
$$AIC = -2\,l(\hat{\beta}_{ML}, \hat{\sigma}^2_{ML}) + 2(p+1),$$
where $l(\hat{\beta}_{ML}, \hat{\sigma}^2_{ML})$ is the maximum value of the log-likelihood. Smaller values of the AIC correspond to a better model fit.

Bayesian information criterion
The Bayesian information criterion (BIC) is defined by
$$BIC = -2\,l(\hat{\beta}_{ML}, \hat{\sigma}^2_{ML}) + \log(n)(p+1).$$
The BIC multiplied by -1/2 is also known as the Schwarz criterion. Smaller values of the BIC correspond to a better model fit. The BIC penalizes complex models much more heavily than the AIC.
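
For lm objects, R's AIC() and BIC() are based on the Gaussian log-likelihood and count p + 1 parameters, which should match the definitions above; a sketch comparing the turkey model with and without the origin effect:

fit_full <- lm(weightp ~ agew + origin, data = turkey)
fit_red  <- lm(weightp ~ agew, data = turkey)
AIC(fit_full, fit_red)    # smaller is better
BIC(fit_full, fit_red)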

Practical use of model choice criteria
To select the most promising models among the candidates, we first obtain a preselection of potential models. All potential models can then be assessed with the aid of one of the model choice criteria (AIC, BIC). This approach is not always practical, since the number of regressor variables and modelling variants can be very large in many applications. In such cases, we can use the following partially heuristic methods.

Practical use of model choice criteria II
Complete model selection: if the number of predictor variables is not too large, we can determine the best model with the leaps-and-bounds algorithm.
Forward selection:
1. Starting from an initial model, forward selection includes one additional variable in every iteration of the algorithm.
2. The variable that offers the greatest reduction of a preselected model choice criterion is chosen.
3. The algorithm terminates when no further reduction is possible.

Practical use of model choice criteria III
Backward elimination:
1. Backward elimination starts with the full model containing all predictor variables.
2. In every iteration, the covariate whose removal provides the greatest reduction of the model choice criterion is eliminated from the model.
3. The algorithm terminates when no further reduction is possible.
Stepwise selection: stepwise selection combines forward selection and backward elimination. In every iteration of the algorithm, a predictor variable may be added to or removed from the model.
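
All three heuristic strategies are available in R through step(), which by default uses the AIC as the model choice criterion; a sketch for the turkey data (for complete model selection, regsubsets() in the leaps package implements the leaps-and-bounds algorithm):

full <- lm(weightp ~ agew + origin, data = turkey)
null <- lm(weightp ~ 1, data = turkey)

step(null, scope = formula(full), direction = "forward")    # forward selection
step(full, direction = "backward")                          # backward elimination
step(null, scope = formula(full), direction = "both")       # stepwise selection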