Statistics in Geophysics: Linear Regression II

Size: px
Start display at page:

Download "Statistics in Geophysics: Linear Regression II"

Transcription

1 Statistics in Geophysics: Linear Regression II Steffen Unkel Department of Statistics Ludwig-Maximilians-University Munich, Germany Winter Term 2013/14 1/28

2 Model definition Suppose we have the following model under consideration: y = Xβ + ɛ, where y = (y 1..., y n ) is an n 1 vector of observations on the response, β = (β 0, β 1,..., β k ) is a (k + 1) 1 vector of parameters, ɛ = (ɛ 1,..., ɛ n ) is an n 1 vector of random errors, and X is the n (k + 1) design matrix with X = 1 x 11 x 1k... 1 x n1 x nk. Winter Term 2013/14 2/28

3 Model assumptions The following assumptions are made: 1 E(ɛ) = 0. 2 Cov(ɛ) = σ 2 I. 3 The design matrix has full column rank, that is, rank(x) = k + 1 = p. 4 The normal regression model is obtained if additionally ɛ N (0, σ 2 I). For stochastic covariates these assumptions are to be understood conditionally on X. It follows that E(y) = Xβ and Cov(y) = σ 2 I. In case of assumption 4, we have y N (Xβ, σ 2 I). Winter Term 2013/14 3/28

4 Modelling the effects of continuous covariates We can fit nonlinear relationships between continuous covariates and the response within the scope of linear models. Two simple methods for dealing with nonlinearity: 1 Variable transformation; 2 Polynomial regression. Sometimes it is customary to transform the continuous response as well. Winter Term 2013/14 4/28

5 Modelling the effects of categorical covariates We might want to include a categorical variable with two or more distinct levels. In such a case we cannot set up a continuous scale for the categorical variable. A remedy is to define new covariates, so-called dummy variables, and estimate a separate effect for each category of the original covariate. We can deal with c categories by the introduction of c 1 dummy variables. Winter Term 2013/14 5/28

6 Example: Turkey data weightp agew origin G G G G V V V V W W W W W Turkey weights in pounds, ages in weeks, and origin (Georgia (G); Virgina (V); Wisconsin (W)), of 13 Thanksgiving turkeys. Winter Term 2013/14 6/28

7 Dummy coding for categorical covariates Given a covariate x {1,..., c} with c categories, we define the c 1 dummy variables x i1 = { 1 xi = 1 0 otherwise,... x i,c 1 = { 1 xi = c 1 0 otherwise, for i = 1..., n, and include them as explanatory variables in the regression model y i = β 0 + β 1 x i β i,c 1 x i,c ɛ i. For reasons of identifiability, we omit the dummy variable for category c, where c is the reference category. Winter Term 2013/14 7/28

8 Design matrix for the turkey data using dummy coding > X (Intercept) agew originv originw Winter Term 2013/14 8/28

9 Interactions between covariates An interaction between predictor variables exists if the effect of a covariate depends on the value of at least one other covariate. Consider the following model between a response y and two covariates x 1 and x 2 : y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 1 x 2 + ɛ, where the term β 3 x 1 x 2 is called an interaction between x 1 and x 2. The terms β 1 x 1 and β 2 x 2 depend on only one variable and are called main effects. Winter Term 2013/14 9/28

10 Least squares estimation of regression coefficients The error sum of squares is ɛ ɛ = (y Xβ) (y Xβ) = y y β X y y Xβ + β X Xβ = y y 2β X y + β X Xβ. The least squares estimator of β is the value ˆβ, which minimizes ɛ ɛ. Minimizing ɛ ɛ with respect to β yields ˆβ = (X X) 1 X y, which is obtained irrespective of any distribution properties of the errors. Winter Term 2013/14 10/28

11 Maximum likelihood estimation of regression coefficients Assuming normally distributed errors yields the likelihood ( L(β, σ 2 1 ) = exp 1 ) (2πσ 2 n/2 ) 2σ 2 (y Xβ) (y Xβ). The log-likelihood is given by l(β, σ 2 ) = n 2 log(2π) n 2 log(σ2 ) 1 2σ 2 (y Xβ) (y Xβ). Maximizing l(β, σ 2 ) with respect to β is equivalent to minimizing (y Xβ) (y Xβ), which is the least squares criterion. The MLE, ˆβ ML, therefore is identical to the least squares estimator. Winter Term 2013/14 11/28

12 Fitted values and residuals Based on ˆβ = (X X) 1 X y, we can estimate the (conditional) mean of y by Ê(y) = ŷ = Xˆβ. Substituting the least squares estimator further results in ŷ = X(X X) 1 X y = Hy, where the n n matrix H is called the hat matrix. The residuals are ˆɛ = y ŷ = y Hy = (I H)y. Winter Term 2013/14 12/28

13 Estimation of the error variance Maximization of the log-likelihood with respect to σ 2 yields ˆσ 2 ML = 1 n (y Xˆβ) (y Xˆβ) = 1 n (y ŷ) (y ŷ) = 1 n ˆɛ ˆɛ. Since E(ˆσ ML) 2 = n p n the MLE of σ 2 is biased. σ 2, An unbiased estimator is ˆσ 2 = 1 n p ˆɛ ˆɛ. Winter Term 2013/14 13/28

14 Properties of the least squares estimator For the least squares estimator we have E(ˆβ) = β. Cov(ˆβ) = σ 2 (X X) 1. Gauss-Markov Theorem: Among all linear and unbiased estimators ˆβ L, the least squares estimator has minimal variances, implying Var( ˆβ j ) Var( ˆβ L j ), j = 0,..., k. If ɛ N (0, σ 2 I), then ˆβ N (β, σ 2 (X X) 1 ). Winter Term 2013/14 14/28

15 ANOVA table for multiple linear regression and R 2 Source Degrees of freedom Sum of squares Mean square F -value of variation (df) (SS) (MS) Regression k SSR MSR = SSR/k Residual n p = n (k + 1) SSE ˆσ 2 = SSE n p Total n 1 SST The multiple coefficient of determination is still computed as R 2 = SST SSR, but is no longer the square of the Pearson correlation between the response and any of the predictor variables. MSR ˆσ 2 Winter Term 2013/14 15/28

16 Interval estimation and tests We would like to construct confidence intervals and statistical tests for hypotheses regarding the unknown regression parameters β. A requirement for the construction of tests and confidence intervals is the assumption of normally distributed errors. However, tests and confidence intervals are relatively robust to mild departures from the normality assumption. Moreover, tests and confidence intervals, derived under the assumption of normality, remain valid for large sample size even with non-normal errors. Winter Term 2013/14 16/28

17 Testing linear hypotheses Hypotheses: 1 General linear hypothesis: H 0 : Cβ = d against H 1 : Cβ d, where C is a r p matrix with rank(c) = r p (r linear independent restrictions) and d is a p 1 vector. 2 Test of significance (t-test): H 0 : β j = 0 against H 1 : β j 0. Winter Term 2013/14 17/28

18 Testing linear hypotheses Hypotheses: 3 Composite test of subvector: H 0 : β 1 = 0 against H 1 : β Test for significance of regression: H 0 : β 1 = β 2 = = β k = 0 against H 1 : β j 0 for at least one j {1,..., k}. Winter Term 2013/14 18/28

19 Testing linear hypotheses Test statistics: 1 F = 1 r (Cˆβ d) (ˆσ 2 C(X X) 1 C ) 1 (Cˆβ d) F r,n p. 2 t j = ˆβ j ˆσ ˆβj t n p, where ˆσ ˆβ j denotes the standard error of ˆβ j. 3 F = 1 r (ˆβ 1 ) Ĉov(ˆβ 1 ) 1 (ˆβ 1 ) F r,n p. 4 F = n p k SSR SSE = n p k R 2 1 R 2 F k,n p. Winter Term 2013/14 19/28

20 Testing linear hypotheses Critical values: Reject H 0 in the case of: 1 F > F r,n p (1 α). 2 t > t n p (1 α/2). 3 F > F r,n p (1 α). 4 F > F k,n p (1 α) Winter Term 2013/14 20/28

21 Confidence intervals and regions for regression coefficients Confidence interval for β j : A confidence interval for β j with level 1 α is given by [ ˆβ j t n p (1 α/2)ˆσ ˆβ j, ˆβ j + t n p (1 α/2)ˆσ ˆβj ]. Confidence region for subvector β 1 : A confidence ellipsoid for β 1 = (β 1,..., β r ) with level 1 α is given by { β 1 : 1 r (ˆβ 1 β 1 ) Ĉov(ˆβ } 1 ) 1 (ˆβ 1 β 1 ) F r,n p (1 α). Winter Term 2013/14 21/28

22 Prediction intervals Confidence interval for the (conditional) mean of a future observation: A confidence interval for E(y 0 ) of a future observation y 0 at location x 0 with level 1 α is given by x 0 ˆβ ± t n p (1 α/2)ˆσ(x 0 (X X) 1 x 0 ) 1/2. Prediction interval for a future observation: A prediction interval for a future observation y 0 at location x 0 with level 1 α is given by x 0 ˆβ ± t n p (1 α/2)ˆσ(1 + x 0 (X X) 1 x 0 ) 1/2. Winter Term 2013/14 22/28

23 The corrected coefficient of determination We already defined the coefficient of determination, R 2, as a measure for the goodness-of-fit to the data. The use of R 2 is limited, since it will never decrease with the addition of a new covariate into the model. The corrected coefficient of determination, Radj 2, adjusts for this problem, by including a correction term for the number of parameters. It is defined by R 2 adj = 1 n 1 n p (1 R2 ). Winter Term 2013/14 23/28

24 Akaike information criterion The Akaike information criterion (AIC) is one of the most widely used criteria for model choice within the scope of likelihood-based inference. The AIC is defined by AIC = 2l(ˆβ ML, σ 2 ML) + 2(p + 1), where l(ˆβ ML, σml 2 ) is the maximum value of the log-likelihood. Smaller values of the AIC correspond to a better model fit. Winter Term 2013/14 24/28

25 Bayesian information criterion The Bayesian information criterion (BIC) is defined by BIC = 2l(ˆβ ML, σ 2 ML) + log(n)(p + 1). The BIC multiplied by 1/2 is also known as Schwartz criterion. Smaller values of the BIC correspond to a better model fit. The BIC penalizes complex models much more than the AIC. Winter Term 2013/14 25/28

26 Practical use of model choice criteria To select the most promising models from candidate models, we first obtain a preselection of potential models. All potential models can now be assessed with the aid of one of the various model choice criteria (AIC, BIC). This method is not always practical, since the number of regressor variables and modelling variants can be very large in many applications. In this case, we can use the following partially heuristic methods. Winter Term 2013/14 26/28

27 Practical use of model choice criteria II Complete model selection: In case that the number of predictor variables is not too large, we can determine the best model with the leaps-and-bounds algorithm. Forward selection: 1 Based on a starting model, forward selection includes one additional variable in every iteration of the algorithm. 2 The variable which offers the greatest reduction of a preselected model choice criterion is chosen. 3 The algorithm terminates if no further reduction is possible. Winter Term 2013/14 27/28

28 Practical use of model choice criteria III Backward elimination: 1 Backward elimination starts with the full model containing all predictor variables. 2 Subsequently, in every iteration, the covariate which provides the greatest reduction of the model choice criterion is eliminated from the model. 3 The algorithm terminates if no further reduction is possible. Stepwise selection: Stepwise selection is a combination of forward selection and backward elimination. In every iteration of the algorithm, a predictor variable may be added to the model or removed from the model. Winter Term 2013/14 28/28

Lecture 5 Hypothesis Testing in Multiple Linear Regression

Lecture 5 Hypothesis Testing in Multiple Linear Regression Lecture 5 Hypothesis Testing in Multiple Linear Regression BIOST 515 January 20, 2004 Types of tests 1 Overall test Test for addition of a single variable Test for addition of a group of variables Overall

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

Yiming Peng, Department of Statistics. February 12, 2013

Yiming Peng, Department of Statistics. February 12, 2013 Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

More information

Multiple Hypothesis Testing: The F-test

Multiple Hypothesis Testing: The F-test Multiple Hypothesis Testing: The F-test Matt Blackwell December 3, 2008 1 A bit of review When moving into the matrix version of linear regression, it is easy to lose sight of the big picture and get lost

More information

Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212. Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

More information

Regression, least squares

Regression, least squares Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

t-tests and F-tests in regression

t-tests and F-tests in regression t-tests and F-tests in regression Johan A. Elkink University College Dublin 5 April 2012 Johan A. Elkink (UCD) t and F-tests 5 April 2012 1 / 25 Outline 1 Simple linear regression Model Variance and R

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Fixed vs. Random Effects

Fixed vs. Random Effects Statistics 203: Introduction to Regression and Analysis of Variance Fixed vs. Random Effects Jonathan Taylor - p. 1/19 Today s class Implications for Random effects. One-way random effects ANOVA. Two-way

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014. University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

More information

ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Statistiek (WISB361)

Statistiek (WISB361) Statistiek (WISB361) Final exam June 29, 2015 Schrijf uw naam op elk in te leveren vel. Schrijf ook uw studentnummer op blad 1. The maximum number of points is 100. Points distribution: 23 20 20 20 17

More information

INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

More information

Instrumental Variables & 2SLS

Instrumental Variables & 2SLS Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

ANOVA. February 12, 2015

ANOVA. February 12, 2015 ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s.

Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s. Instrumental Variables Regression Instrumental Variables (IV) estimation is used when the model has endogenous s. IV can thus be used to address the following important threats to internal validity: Omitted

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information

Lecture 2: Simple Linear Regression

Lecture 2: Simple Linear Regression DMBA: Statistics Lecture 2: Simple Linear Regression Least Squares, SLR properties, Inference, and Forecasting Carlos Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching

More information

Part II. Multiple Linear Regression

Part II. Multiple Linear Regression Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a y-variable relates to two or more xvariables (or transformations

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

More information

Multiple Linear Regression. Multiple linear regression is the extension of simple linear regression to the case of two or more independent variables.

Multiple Linear Regression. Multiple linear regression is the extension of simple linear regression to the case of two or more independent variables. 1 Multiple Linear Regression Basic Concepts Multiple linear regression is the extension of simple linear regression to the case of two or more independent variables. In simple linear regression, we had

More information

Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated.

Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated. Collinearity of independent variables Collinearity is a condition in which some of the independent variables are highly correlated. Why is this a problem? Collinearity tends to inflate the variance of

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ 1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni 1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed

More information

Using JMP with a Specific

Using JMP with a Specific 1 Using JMP with a Specific Example of Regression Ying Liu 10/21/ 2009 Objectives 2 Exploratory data analysis Simple liner regression Polynomial regression How to fit a multiple regression model How to

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

S6: Spatial regression models: : OLS estimation and testing

S6: Spatial regression models: : OLS estimation and testing : Spatial regression models: OLS estimation and testing Course on Spatial Econometrics with Applications Profesora: Coro Chasco Yrigoyen Universidad Autónoma de Madrid Lugar: Universidad Politécnica de

More information

Inferences About Differences Between Means Edpsy 580

Inferences About Differences Between Means Edpsy 580 Inferences About Differences Between Means Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Inferences About Differences Between Means Slide

More information

The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

More information

General Regression Formulae ) (N-2) (1 - r 2 YX

General Regression Formulae ) (N-2) (1 - r 2 YX General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard

More information

L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Lesson Lesson Outline Outline

Lesson Lesson Outline Outline Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

More information

Hedonism example. Our questions in the last session. Our questions in this session

Hedonism example. Our questions in the last session. Our questions in this session Random Slope Models Hedonism example Our questions in the last session Do differences between countries in hedonism remain after controlling for individual age? How much of the variation in hedonism is

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

1 Theory: The General Linear Model

1 Theory: The General Linear Model QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the

More information

REGRESSION LINES IN STATA

REGRESSION LINES IN STATA REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

More information

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013 Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear

More information

1 Another method of estimation: least squares

1 Another method of estimation: least squares 1 Another method of estimation: least squares erm: -estim.tex, Dec8, 009: 6 p.m. (draft - typos/writos likely exist) Corrections, comments, suggestions welcome. 1.1 Least squares in general Assume Y i

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p. Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

More information

Instrumental Variables & 2SLS

Instrumental Variables & 2SLS Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

1. χ 2 minimization 2. Fits in case of of systematic errors

1. χ 2 minimization 2. Fits in case of of systematic errors Data fitting Volker Blobel University of Hamburg March 2005 1. χ 2 minimization 2. Fits in case of of systematic errors Keys during display: enter = next page; = next page; = previous page; home = first

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

Lectures 8, 9 & 10. Multiple Regression Analysis

Lectures 8, 9 & 10. Multiple Regression Analysis Lectures 8, 9 & 0. Multiple Regression Analysis In which you learn how to apply the principles and tests outlined in earlier lectures to more realistic models involving more than explanatory variable and

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation

Multiple Regression - Selecting the Best Equation An Example Techniques for Selecting the Best Regression Equation Multiple Regression - Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent

More information

3.1 Least squares in matrix form

3.1 Least squares in matrix form 118 3 Multiple Regression 3.1 Least squares in matrix form E Uses Appendix A.2 A.4, A.6, A.7. 3.1.1 Introduction More than one explanatory variable In the foregoing chapter we considered the simple regression

More information

Inferential Statistics

Inferential Statistics Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

More information

TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS. Abstract

TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS. Abstract TESTING THE ASSUMPTIONS OF AGE-TO-AGE FACTORS GARY G. VENTER Abstract The use of age-to-age factors applied to cumulative losses has been shown to produce least-squares optimal reserve estimates when certain

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

More information

Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

More information