IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results




How important is R-squared?

  R-squared   Published in Agricultural Economics
  0.45        Best article of the year, 2008
  ???         Best article of the year, 2009
  0.2         Best article of the year, 2010

Session 3 Topics
- Multiple regression analysis
  - What does it mean?
  - Why is it important?
  - How is it done and how are results interpreted?
  - What are the hazards?

Multiple Regression Analysis
- What does it mean?
  - Multivariate analysis/statistics
  - Ceteris paribus / all else equal / controlling for

Multiple Regression Analysis
- Why does it matter? Suppose y = α + β1x1 + u
- If E(u|x1) = E(u) = 0, implying Corr(u, x1) = 0, OLS estimates β1 without bias
- What if the true model is y = α + β1x1 + β2x2 + ε?
  - If we estimate y = α + β1x1 + u anyway, then u = β2x2 + ε
  - If Corr(x1, x2) ≠ 0, then Corr(u, x1) ≠ 0 and results are biased
- If E(u|x1, x2) = 0 (and other conditions hold), we can estimate without bias using multiple regressors

Multiple Regression Analysis
- Consider maize yield (mzyield) and basal fertilizer (basaprate), both kg/ha:
  mzyield = α + β·basaprate + u

. reg mzyield basaprate

      Source |       SS       df       MS              Number of obs =    8648
-------------+------------------------------           F(  1,  8646) = 1526.38
       Model |  2.1590e+09     1  2.1590e+09           Prob > F      =  0.0000
    Residual |  1.2229e+10  8646   1414446.5           R-squared     =  0.1501
-------------+------------------------------           Adj R-squared =  0.1500
       Total |  1.4388e+10  8647  1663962.69           Root MSE      =  1189.3

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   5.254685   .1344979    39.07   0.000     4.991037    5.518333
       _cons |    1335.84    14.5786    91.63   0.000     1307.262    1364.417
------------------------------------------------------------------------------

Multiple Regression Analysis
- Top dressing (topaprate) determines yield and is correlated with basaprate, both kg/ha:
  mzyield = α + β1·basaprate + β2·topaprate + ε

. reg mzyield basaprate topaprate

      Source |       SS       df       MS              Number of obs =    8647
-------------+------------------------------           F(  2,  8644) =  840.22
       Model |  2.3418e+09     2  1.1709e+09           Prob > F      =  0.0000
    Residual |  1.2046e+10  8644  1393535.34           R-squared     =  0.1628
-------------+------------------------------           Adj R-squared =  0.1626
       Total |  1.4387e+10  8646  1664061.58           Root MSE      =  1180.5

------------------------------------------------------------------------------
     mzyield |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   basaprate |   1.897807    .321747     5.90   0.000     1.267106    2.528508
   topaprate |   3.620441   .3157663    11.47   0.000     3.001463    4.239418
       _cons |    1314.93    14.5870    90.14   0.000     1286.336    1343.524
------------------------------------------------------------------------------

Multiple Regression Analysis
  y = α + β1x1 + β2x2 + ... + βkxk + u
- α is the intercept
- β1 ... βk are slope parameters (usually)

[Figure: a line y = α + βx, with slope β and intercept α]

Multiple Regression Analysis
  y = α + β1x1 + β2x2 + ... + βkxk + u
- α is the intercept
- β1 ... βk are slope parameters (usually)
- u is the unobserved error or disturbance term
- y is the dependent, explained, response, or predicted variable
- x1 ... xk are the independent, explanatory, control, or predictor variables, or regressors

How is it done?
- OLS finds the β parameters that minimize:
  Σ (i = 1 to n) of (yi − α − β1xi1 − β2xi2 − ... − βkxik)²
- Minimize the noise
- Squared, so residuals don't offset
- Gives us β̂1 ... β̂k and predicted values ŷ
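In Stata, a minimal sketch using the slide's variables (yhat and uhat are illustrative names):

. reg mzyield basaprate topaprate
. predict yhat, xb          // fitted values: a + b1*basaprate + b2*topaprate
. predict uhat, residuals   // residuals: mzyield - yhat, whose squared sum OLS minimized
. summarize uhat            // mean is essentially zero by construction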

Ceteris Paribus Interpretation
  y = α + β1x1 + β2x2 + ... + βkxk + u
- β̂1 is the partial effect, or ceteris paribus effect, of x1
- Change x1 only: Δŷ = β̂1Δx1
- Change x2 only: Δŷ = β̂2Δx2
- Change both: Δŷ = β̂1Δx1 + β̂2Δx2
- Share of the total change attributable to x1: β̂1Δx1 / Δŷ
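For example, a sketch of the calculation in Stata (the 50 kg/ha increment is arbitrary):

. reg mzyield basaprate topaprate
. display _b[basaprate]*50   // Δŷ from 50 kg/ha more basal fertilizer, topaprate held fixed
. lincom 50*basaprate        // same quantity, with a standard error and 95% CI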

Ceteris Paribus Interpretation
- Now, how do we interpret the coefficient estimate for basaprate?
  mzyield = α + β1·basaprate + β2·topaprate + u

. reg mzyield basaprate topaprate
  (same output as above: basaprate coefficient 1.897807, topaprate coefficient 3.620441)

Ceteris Paribus Interpretation
- According to these results, a one unit change in x1 will result in a β̂1 unit change in y, all else equal.
- The ceteris paribus effect of a one unit change in x1 is a β̂1 unit change in y.
- Holding x2 constant, a one unit change in x1 results in a β̂1 unit change in y.
- Here: holding topaprate constant, one more kg/ha of basal fertilizer is associated with about 1.9 kg/ha more maize yield.

Key Assumptions
- Linear in parameters
- Random sample
- Zero conditional mean
- No perfect collinearity (variation in data)
- Homoskedastic errors


Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear w/ intercept)

[Figure: can't estimate the slope parameter if there is no variation in x. Source: Wooldridge (2002)]

Perfect Collinearity
- Variable is a linear function of one or more others
- No variation in one variable (collinear w/ intercept)
- Perfect correlation between 2 binary variables
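A quick Stata illustration (totaprate is a hypothetical variable built for this example):

. gen totaprate = basaprate + topaprate    // an exact linear function of the other two
. reg mzyield basaprate topaprate totaprate

Stata notes the collinearity and omits totaprate rather than trying to estimate all three slopes.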

Other hazards
- Multi-collinearity
- Including irrelevant variables
- Omitting relevant variables

Multi-Collinearity
- Highly correlated variables
- Variable is a nonlinear function of others
- What's the problem?
  - Efficiency losses (inflated standard errors)
  - Schmidt's rule of thumb
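One common diagnostic is the variance inflation factor, available after any regression (a sketch):

. reg mzyield basaprate topaprate
. estat vif    // VIFs far above 10 are a common rule-of-thumb warning sign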

Including Irrelevant Variables
  y = α + β1x1 + β2x2 + β3x3 + u
- Suppose x3 has no effect on y, but the key assumptions are satisfied (the model is overspecified)
- OLS is an unbiased estimator of β3, even if β3 is zero
- Estimates of β1 and β2 will be less efficient

Omitting Relevant Variables
  y = α + β1x1 + β2x2 + u
- Suppose we omit x2 (underspecifying)
- OLS is generally biased

Omitting Relevant Variables
  True model: y = α + β1x1 + β2x2 + u
- Estimate ỹ = α̃ + β̃1x1 instead
- And let x̃2 = δ̃0 + δ̃1x1 (the regression of x2 on x1)
- It can be shown that:
  E(β̃1) = β1 + β2δ̃1
  where β2δ̃1 is the omitted variable bias

Multiple Regression Analysis

            Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0    Positive bias       Negative bias
  β2 < 0    Negative bias       Positive bias

Source: Wooldridge, 2002, page 92
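The bias formula can be checked directly in Stata (a sketch treating topaprate as the omitted x2; the two samples differ by one observation, so the identity holds only approximately here):

. reg mzyield basaprate topaprate    // long regression: β̂1 ≈ 1.90, β̂2 ≈ 3.62
. reg mzyield basaprate              // short regression: β̃1 ≈ 5.25
. reg topaprate basaprate            // auxiliary regression: δ̃1

Plugging in the slide's estimates, 5.25 ≈ 1.90 + 3.62·δ̃1 implies δ̃1 ≈ 0.93: top dressing raises yield (β2 > 0) and is applied alongside basal fertilizer (Corr(x1, x2) > 0), so the short regression's basaprate coefficient is biased upward, the top-left cell of the table above.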

Omitting Relevant Variables
- More generally, all OLS estimates will be biased, even if just one explanatory variable is correlated with the omitted variables
- The direction of bias is less clear

Multiple Regression Analysis
- Goodness of fit
  - R² is the share of explained variance
  - R² never decreases when we add variables; usually it will increase, regardless of relevance
  - Adjusted R² accounts for this
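A sketch of this in Stata (noise is a hypothetical, deliberately irrelevant regressor):

. reg mzyield basaprate topaprate
. gen noise = runiform()                   // pure noise, unrelated to yield
. reg mzyield basaprate topaprate noise    // R-squared (e(r2)) ticks up anyway; adjusted R-squared (e(r2_a)) typically falls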

Next time: Interpreting results
- Binary regressors
- Other categorical regressors
- Categorical regressors as a series of binary regressors
- Quadratic terms
- Other interactions
- Average Partial Effects

Session materials developed by Bill Burke with input from Nicole Mason. January 2012. burkewi2@stanford.edu