# Statistics in Geophysics: Linear Regression II


1 Statistics in Geophysics: Linear Regression II

Steffen Unkel, Department of Statistics, Ludwig-Maximilians-University Munich, Germany. Winter Term 2013/14.

2 Model definition

Suppose we have the following model under consideration: $y = X\beta + \varepsilon$, where $y = (y_1, \ldots, y_n)'$ is an $n \times 1$ vector of observations on the response, $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$ is a $(k+1) \times 1$ vector of parameters, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors, and $X$ is the $n \times (k+1)$ design matrix

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}.$$

3 Model assumptions

The following assumptions are made:

1. $E(\varepsilon) = 0$.
2. $\mathrm{Cov}(\varepsilon) = \sigma^2 I$.
3. The design matrix has full column rank, that is, $\mathrm{rank}(X) = k + 1 = p$.
4. The normal regression model is obtained if additionally $\varepsilon \sim N(0, \sigma^2 I)$.

For stochastic covariates these assumptions are to be understood conditionally on $X$. It follows that $E(y) = X\beta$ and $\mathrm{Cov}(y) = \sigma^2 I$. Under assumption 4, we have $y \sim N(X\beta, \sigma^2 I)$.

4 Modelling the effects of continuous covariates

We can fit nonlinear relationships between continuous covariates and the response within the scope of linear models. Two simple methods for dealing with nonlinearity are:

1. variable transformation;
2. polynomial regression.

It is sometimes useful to transform the continuous response as well.
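Polynomial regression can be sketched in a few lines of NumPy (Python rather than the R used on later slides; the data are a hypothetical toy example). The point is that the model stays linear in $\beta$: the design matrix simply gains power-of-$x$ columns.

```python
import numpy as np

# Hypothetical data: an exact quadratic, so the fit recovers the
# coefficients perfectly and the idea is easy to verify.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x + 0.25 * x**2

# Polynomial regression is still a *linear* model: the design matrix
# contains powers of x as extra columns.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))  # recovers [2.0, 0.5, 0.25]
```

The same mechanism covers variable transformations: replace the power columns by, say, $\log x$ or $\sqrt{x}$.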

5 Modelling the effects of categorical covariates

We might want to include a categorical variable with two or more distinct levels. In such a case we cannot set up a continuous scale for the categorical variable. A remedy is to define new covariates, so-called dummy variables, and estimate a separate effect for each category of the original covariate. We can deal with $c$ categories by introducing $c - 1$ dummy variables.

6 Example: Turkey data

Turkey weights in pounds (weightp), ages in weeks (agew), and origin (Georgia (G); Virginia (V); Wisconsin (W)) of 13 Thanksgiving turkeys: four from Georgia, four from Virginia, and five from Wisconsin. [The individual weight and age values did not survive extraction.]

7 Dummy coding for categorical covariates

Given a covariate $x_i \in \{1, \ldots, c\}$ with $c$ categories, we define the $c - 1$ dummy variables

$$x_{i1} = \begin{cases} 1 & x_i = 1 \\ 0 & \text{otherwise,} \end{cases} \quad \ldots \quad x_{i,c-1} = \begin{cases} 1 & x_i = c - 1 \\ 0 & \text{otherwise,} \end{cases}$$

for $i = 1, \ldots, n$, and include them as explanatory variables in the regression model

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{c-1} x_{i,c-1} + \varepsilon_i.$$

For reasons of identifiability, we omit the dummy variable for category $c$, which serves as the reference category.
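The dummy-coding scheme can be sketched in NumPy (a Python sketch, not the course's R; the six origin labels are hypothetical). One indicator column is built per non-reference category:

```python
import numpy as np

# Hypothetical origin labels for 6 observations; 'W' is the reference category.
origin = np.array(["G", "V", "W", "G", "W", "V"])
categories = ["G", "V", "W"]   # c = 3 categories
reference = categories[-1]     # omit the dummy for the reference category

# c - 1 dummy columns: one 0/1 indicator per non-reference category.
dummies = np.column_stack(
    [(origin == cat).astype(float) for cat in categories if cat != reference]
)
print(dummies)
```

An observation from the reference category is coded as all zeros, so its effect is absorbed into the intercept.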

8 Design matrix for the turkey data using dummy coding

In R, the design matrix `X` has the columns `(Intercept)`, `agew`, `originv`, and `originw`, with Georgia as the reference category. [The printed matrix entries did not survive extraction.]

9 Interactions between covariates

An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate. Consider the following model relating a response $y$ to two covariates $x_1$ and $x_2$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,$$

where the term $\beta_3 x_1 x_2$ is called an interaction between $x_1$ and $x_2$. The terms $\beta_1 x_1$ and $\beta_2 x_2$ each depend on only one variable and are called main effects.

10 Least squares estimation of regression coefficients

The error sum of squares is

$$\varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta.$$

The least squares estimator of $\beta$ is the value $\hat{\beta}$ that minimizes $\varepsilon'\varepsilon$. Minimizing $\varepsilon'\varepsilon$ with respect to $\beta$ yields

$$\hat{\beta} = (X'X)^{-1}X'y,$$

which is obtained irrespective of any distributional properties of the errors.
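The normal-equations formula translates directly into NumPy (a minimal sketch on simulated data; the coefficients and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: intercept plus two standard-normal covariates, n = 50.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y.
# Solving the linear system is numerically preferable to forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta_hat, 2))  # close to beta_true = [1.0, 2.0, -0.5]
```

With small error variance the estimate sits close to the true coefficient vector, as the theory predicts.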

11 Maximum likelihood estimation of regression coefficients

Assuming normally distributed errors yields the likelihood

$$L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right).$$

The log-likelihood is given by

$$l(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Maximizing $l(\beta, \sigma^2)$ with respect to $\beta$ is equivalent to minimizing $(y - X\beta)'(y - X\beta)$, which is the least squares criterion. The MLE, $\hat{\beta}_{\mathrm{ML}}$, is therefore identical to the least squares estimator.

12 Fitted values and residuals

Based on $\hat{\beta} = (X'X)^{-1}X'y$, we can estimate the (conditional) mean of $y$ by $\hat{E}(y) = \hat{y} = X\hat{\beta}$. Substituting the least squares estimator further results in

$$\hat{y} = X(X'X)^{-1}X'y = Hy,$$

where the $n \times n$ matrix $H$ is called the hat matrix. The residuals are

$$\hat{\varepsilon} = y - \hat{y} = y - Hy = (I - H)y.$$
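A short NumPy check (simulated toy data, not from the lecture) makes the hat-matrix properties concrete: $H$ is symmetric and idempotent, and the residuals are orthogonal to the columns of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X' maps the observations to the fitted values.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
residuals = (np.eye(n) - H) @ y  # equivalently y - y_hat

# H is idempotent (H @ H == H), and the residuals are orthogonal to the
# columns of X -- hence they sum to zero when an intercept is present.
print(np.allclose(H, H @ H), np.allclose(X.T @ residuals, 0.0))
```

Orthogonality of residuals and fitted values is what makes the ANOVA decomposition on a later slide work.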

13 Estimation of the error variance

Maximization of the log-likelihood with respect to $\sigma^2$ yields

$$\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}) = \frac{1}{n}(y - \hat{y})'(y - \hat{y}) = \frac{1}{n}\hat{\varepsilon}'\hat{\varepsilon}.$$

Since

$$E(\hat{\sigma}^2_{\mathrm{ML}}) = \frac{n - p}{n}\,\sigma^2,$$

the MLE of $\sigma^2$ is biased. An unbiased estimator is

$$\hat{\sigma}^2 = \frac{1}{n - p}\,\hat{\varepsilon}'\hat{\varepsilon}.$$
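The bias factor $(n-p)/n$ can be seen in a small simulation (a sketch with hypothetical parameter values, $n = 10$, $p = 2$, $\sigma^2 = 4$): averaging the ML estimate over many replications lands near $3.2$, not $4.0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2 = 10, 2, 4.0
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])

ml_estimates = []
for _ in range(20000):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    ml_estimates.append(resid @ resid / n)  # ML estimator: divides by n

mean_ml = np.mean(ml_estimates)
print(round(mean_ml, 2))                 # close to (n - p)/n * sigma2 = 3.2
print(round(mean_ml * n / (n - p), 2))   # rescaling by n/(n - p) removes the bias
```

Dividing the residual sum of squares by $n - p$ instead of $n$ is exactly this rescaling.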

14 Properties of the least squares estimator

For the least squares estimator we have $E(\hat{\beta}) = \beta$ and $\mathrm{Cov}(\hat{\beta}) = \sigma^2(X'X)^{-1}$. Gauss-Markov theorem: among all linear and unbiased estimators $\hat{\beta}_L$, the least squares estimator has minimal variances, implying $\mathrm{Var}(\hat{\beta}_j) \leq \mathrm{Var}(\hat{\beta}_{L,j})$ for $j = 0, \ldots, k$. If $\varepsilon \sim N(0, \sigma^2 I)$, then $\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$.

15 ANOVA table for multiple linear regression and $R^2$

| Source of variation | Degrees of freedom (df) | Sum of squares (SS) | Mean square (MS) | $F$-value |
|---|---|---|---|---|
| Regression | $k$ | SSR | $\mathrm{MSR} = \mathrm{SSR}/k$ | $\mathrm{MSR}/\hat{\sigma}^2$ |
| Residual | $n - p = n - (k+1)$ | SSE | $\hat{\sigma}^2 = \mathrm{SSE}/(n-p)$ | |
| Total | $n - 1$ | SST | | |

The multiple coefficient of determination is still computed as $R^2 = \mathrm{SSR}/\mathrm{SST}$, but it is no longer the square of the Pearson correlation between the response and any of the predictor variables.
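The ANOVA decomposition is easy to reproduce numerically (a NumPy sketch on simulated data; coefficients are hypothetical). With an intercept in the model, SST splits exactly into SSR + SSE:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares

msr = ssr / k
mse = sse / (n - p)                    # unbiased sigma^2 estimate
r2 = ssr / sst
f_value = msr / mse
print(round(r2, 3), round(f_value, 1))
```

`r2` and `f_value` are exactly the quantities in the last column of the table above.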

16 Interval estimation and tests

We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression parameters $\beta$. A requirement for the construction of tests and confidence intervals is the assumption of normally distributed errors. However, tests and confidence intervals are relatively robust to mild departures from the normality assumption. Moreover, tests and confidence intervals derived under the assumption of normality remain valid for large sample sizes even with non-normal errors.

17 Testing linear hypotheses

Hypotheses:

1. General linear hypothesis: $H_0\colon C\beta = d$ against $H_1\colon C\beta \neq d$, where $C$ is an $r \times p$ matrix with $\mathrm{rank}(C) = r \leq p$ ($r$ linearly independent restrictions) and $d$ is an $r \times 1$ vector.
2. Test of significance ($t$-test): $H_0\colon \beta_j = 0$ against $H_1\colon \beta_j \neq 0$.

18 Testing linear hypotheses

Hypotheses:

3. Composite test of a subvector: $H_0\colon \beta_1 = 0$ against $H_1\colon \beta_1 \neq 0$.
4. Test for significance of regression: $H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k = 0$ against $H_1\colon \beta_j \neq 0$ for at least one $j \in \{1, \ldots, k\}$.

19 Testing linear hypotheses

Test statistics:

1. $F = \frac{1}{r}(C\hat{\beta} - d)'\,(\hat{\sigma}^2 C(X'X)^{-1}C')^{-1}(C\hat{\beta} - d) \sim F_{r,n-p}$.
2. $t_j = \hat{\beta}_j / \hat{\sigma}_{\hat{\beta}_j} \sim t_{n-p}$, where $\hat{\sigma}_{\hat{\beta}_j}$ denotes the standard error of $\hat{\beta}_j$.
3. $F = \frac{1}{r}\,\hat{\beta}_1'\,\widehat{\mathrm{Cov}}(\hat{\beta}_1)^{-1}\hat{\beta}_1 \sim F_{r,n-p}$.
4. $F = \frac{n-p}{k}\,\frac{\mathrm{SSR}}{\mathrm{SSE}} = \frac{n-p}{k}\,\frac{R^2}{1 - R^2} \sim F_{k,n-p}$.
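For a single restriction ($r = 1$), the general $F$-statistic reduces to the square of the $t$-statistic. A NumPy sketch on simulated data (coefficients hypothetical) verifies this identity:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
p = X.shape[1]
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)   # unbiased error-variance estimate

# t-statistic for H0: beta_2 = 0
se = np.sqrt(sigma2_hat * XtX_inv[2, 2])
t2 = beta_hat[2] / se

# General linear hypothesis H0: C beta = d with the single (r = 1)
# restriction picking out beta_2, so the F-statistic must equal t2 squared.
C = np.array([[0.0, 0.0, 1.0]])
d = np.zeros(1)
r = C.shape[0]
diff = C @ beta_hat - d
F = (diff @ np.linalg.solve(sigma2_hat * (C @ XtX_inv @ C.T), diff)) / r
print(np.isclose(F, t2 ** 2))  # True
```

This is why a two-sided $t$-test and the corresponding one-restriction $F$-test always give the same p-value.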

20 Testing linear hypotheses

Critical values: reject $H_0$ in the case of:

1. $F > F_{r,n-p}(1 - \alpha)$.
2. $|t_j| > t_{n-p}(1 - \alpha/2)$.
3. $F > F_{r,n-p}(1 - \alpha)$.
4. $F > F_{k,n-p}(1 - \alpha)$.

21 Confidence intervals and regions for regression coefficients

Confidence interval for $\beta_j$: a confidence interval for $\beta_j$ with level $1 - \alpha$ is given by

$$\left[\hat{\beta}_j - t_{n-p}(1 - \alpha/2)\,\hat{\sigma}_{\hat{\beta}_j},\; \hat{\beta}_j + t_{n-p}(1 - \alpha/2)\,\hat{\sigma}_{\hat{\beta}_j}\right].$$

Confidence region for the subvector $\beta_1$: a confidence ellipsoid for $\beta_1 = (\beta_1, \ldots, \beta_r)'$ with level $1 - \alpha$ is given by

$$\left\{\beta_1 : \frac{1}{r}(\hat{\beta}_1 - \beta_1)'\,\widehat{\mathrm{Cov}}(\hat{\beta}_1)^{-1}(\hat{\beta}_1 - \beta_1) \leq F_{r,n-p}(1 - \alpha)\right\}.$$

22 Prediction intervals

Confidence interval for the (conditional) mean of a future observation: a confidence interval for $E(y_0)$ of a future observation $y_0$ at location $x_0$ with level $1 - \alpha$ is given by

$$x_0'\hat{\beta} \pm t_{n-p}(1 - \alpha/2)\,\hat{\sigma}\,\bigl(x_0'(X'X)^{-1}x_0\bigr)^{1/2}.$$

Prediction interval for a future observation: a prediction interval for a future observation $y_0$ at location $x_0$ with level $1 - \alpha$ is given by

$$x_0'\hat{\beta} \pm t_{n-p}(1 - \alpha/2)\,\hat{\sigma}\,\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)^{1/2}.$$

23 The corrected coefficient of determination

We already defined the coefficient of determination, $R^2$, as a measure of goodness-of-fit to the data. The use of $R^2$ is limited, since it never decreases when a new covariate is added to the model. The corrected coefficient of determination, $R^2_{\mathrm{adj}}$, adjusts for this problem by including a correction term for the number of parameters. It is defined as

$$R^2_{\mathrm{adj}} = 1 - \frac{n-1}{n-p}(1 - R^2).$$
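The correction is a one-line formula; the sketch below (the $R^2$ values are hypothetical) shows how a tiny gain in $R^2$ from an extra parameter can still lower $R^2_{\mathrm{adj}}$:

```python
import numpy as np  # only for consistency with the other sketches

def adjusted_r2(r2, n, p):
    """Corrected coefficient of determination with p fitted parameters."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)

# Adding a barely useful covariate raises R^2 slightly but lowers R^2_adj:
print(adjusted_r2(0.80, n=30, p=3))   # about 0.785
print(adjusted_r2(0.805, n=30, p=4))  # about 0.7825 -- worse despite higher R^2
```

Unlike $R^2$, the corrected version can therefore be used to compare models with different numbers of covariates.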

24 Akaike information criterion

The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference. The AIC is defined as

$$\mathrm{AIC} = -2\,l(\hat{\beta}_{\mathrm{ML}}, \hat{\sigma}^2_{\mathrm{ML}}) + 2(p + 1),$$

where $l(\hat{\beta}_{\mathrm{ML}}, \hat{\sigma}^2_{\mathrm{ML}})$ is the maximum value of the log-likelihood. Smaller values of the AIC correspond to a better model fit.

25 Bayesian information criterion

The Bayesian information criterion (BIC) is defined as

$$\mathrm{BIC} = -2\,l(\hat{\beta}_{\mathrm{ML}}, \hat{\sigma}^2_{\mathrm{ML}}) + \log(n)(p + 1).$$

The BIC multiplied by $-1/2$ is also known as the Schwarz criterion. Smaller values of the BIC correspond to a better model fit. The BIC penalizes complex models much more strongly than the AIC.
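For the normal linear model, both criteria can be computed from the residual sum of squares alone, since the maximized log-likelihood has a closed form once $\hat{\sigma}^2_{\mathrm{ML}} = \mathrm{SSE}/n$ is plugged in. A sketch (function names and the $n$, SSE values are hypothetical):

```python
import numpy as np

def gaussian_loglik_max(n, sse):
    """Maximized Gaussian log-likelihood of a linear model,
    with the ML variance estimate sigma2 = sse / n plugged in."""
    sigma2_ml = sse / n
    return -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2_ml) + 1.0)

def aic(n, sse, p):
    return -2.0 * gaussian_loglik_max(n, sse) + 2.0 * (p + 1)

def bic(n, sse, p):
    return -2.0 * gaussian_loglik_max(n, sse) + np.log(n) * (p + 1)

# Whenever log(n) > 2 (i.e. n >= 8), BIC charges more per parameter than AIC,
# which is why it favours smaller models.
n, sse = 100, 50.0
print(aic(n, sse, p=3), bic(n, sse, p=3))
```

Note that $p + 1$ counts the regression coefficients plus the variance parameter $\sigma^2$, matching the definitions above.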

26 Practical use of model choice criteria

To select the most promising models from candidate models, we first obtain a preselection of potential models. All potential models can then be assessed with the aid of one of the various model choice criteria (AIC, BIC). This approach is not always practical, since the number of regressor variables and modelling variants can be very large in many applications. In such cases, we can use the following partially heuristic methods.

27 Practical use of model choice criteria II

Complete model selection: in case the number of predictor variables is not too large, we can determine the best model with the leaps-and-bounds algorithm.

Forward selection:

1. Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm.
2. The variable which offers the greatest reduction of a preselected model choice criterion is chosen.
3. The algorithm terminates if no further reduction is possible.

28 Practical use of model choice criteria III

Backward elimination:

1. Backward elimination starts with the full model containing all predictor variables.
2. Subsequently, in every iteration, the covariate whose removal provides the greatest reduction of the model choice criterion is eliminated from the model.
3. The algorithm terminates if no further reduction is possible.

Stepwise selection: stepwise selection is a combination of forward selection and backward elimination. In every iteration of the algorithm, a predictor variable may be added to the model or removed from the model.
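The forward-selection loop from the previous slide can be sketched end to end in NumPy (a hypothetical implementation on simulated data — the helper names `aic_of` and `forward_selection` are illustrative, not from the lecture), using AIC as the model choice criterion:

```python
import numpy as np

def sse_of(X, y):
    """Residual sum of squares of the least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def aic_of(X, y):
    """AIC of a Gaussian linear model with design matrix X."""
    n = len(y)
    sigma2_ml = sse_of(X, y) / n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2_ml) + 1.0)
    return -2.0 * loglik + 2.0 * (X.shape[1] + 1)

def forward_selection(covariates, y):
    """Greedy forward selection by AIC, starting from the intercept-only model."""
    n = len(y)
    selected, remaining = [], list(covariates)
    current = aic_of(np.ones((n, 1)), y)
    while remaining:
        trial = {
            name: aic_of(np.column_stack(
                [np.ones(n)] + [covariates[s] for s in selected + [name]]), y)
            for name in remaining
        }
        best = min(trial, key=trial.get)
        if trial[best] >= current:  # no further reduction possible: stop
            break
        current = trial[best]
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(5)
n = 200
covs = {f"x{j}": rng.normal(size=n) for j in range(4)}
# y depends on x0 and x2 only (hypothetical data).
y = 1.0 + 2.0 * covs["x0"] - 1.5 * covs["x2"] + rng.normal(size=n)

selected = forward_selection(covs, y)
print(sorted(selected))  # x0 and x2 enter; a noise covariate may occasionally join
```

Backward elimination and stepwise selection follow the same pattern, with candidate models generated by removing (or removing/adding) one covariate per iteration.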


### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Sections 2.11 and 5.8

Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

### 1. χ 2 minimization 2. Fits in case of of systematic errors

Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first

### The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

### Lectures 8, 9 & 10. Multiple Regression Analysis

Lectures 8, 9 & 0. Multiple Regression Analysis In which you learn how to apply the principles and tests outlined in earlier lectures to more realistic models involving more than explanatory variable and

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation

Multiple Regression - Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent

### 3.1 Least squares in matrix form

118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS. Abstract

TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS GARY G. VENTER Abstract The use of age-to-age factors applied to cumulative losses has been shown to produce least-squares optimal reserve estimates when certain

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty