4.2 Tests of Structural Changes

This section focuses on the Chow test and leaves the general discussion of dummy variable models to another section.

4.2.1 Chow Test: Simple Example

The Chow test examines whether the parameters estimated from one group of the data are equal to those of other groups. Simply put, the test checks whether the data can be pooled. If only the intercepts differ across groups, this is a fixed effect model, which is simple to handle.

Let us consider two groups:

   y = α + βx + ε      for all observations (pooled)
   y = α1 + β1x + ε1   for the n1 observations of group 1
   y = α2 + β2x + ε2   for the n2 observations of group 2

The null hypothesis is α1 = α2 and β1 = β2. If the null hypothesis is rejected, the two groups have different slopes and intercepts; the data are not poolable. The test statistic is

   F = [(e'e - e1'e1 - e2'e2) / J] / [(e1'e1 + e2'e2) / (n1 + n2 - 2K)]
     = [(SSE - SSE1 - SSE2) / J] / [(SSE1 + SSE2) / (n1 + n2 - 2K)]  ~  F(J, n1 + n2 - 2K),

where e'e is the SSE of the pooled model and J is the number of restrictions, often equal to K, the number of parameters (Greene 2003; see http://www.stata.com/support/faqs/stat/chow.html). If we instead want to test the null hypothesis that only the intercepts differ, J will be K - 1 (all the slopes are equal). In order to conduct the Chow test:

1. Run the pooled OLS to get SSE.
2. Run a separate OLS for each group to get SSE1 and SSE2.
3. Apply the formula.

In the following example, we assume that the two groups have different slopes of cost as well as different intercepts, so there are two restrictions, J = 2. The first group consists of airlines 1 and 2 (d1 = 1, n = 30) and the second of airlines 3 through 6 (d0 = 1, n = 60); d2 and d3 will be used later to compare three groups.

. use http://www.indiana.edu/~statmath/stat/all/panel/airline.dta, clear
(Cost of U.S. Airlines (Greene 2003))
. gen d1 = (airline <= 2)
. gen d0 = (d1 == 0)
. gen d2 = (airline > 2 & airline <= 4)
. gen d3 = (airline >= 5)
. gen cost0 = cost*d0
. gen cost1 = cost*d1
. gen cost2 = cost*d2
. gen cost3 = cost*d3
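These dummies are meant to partition the 90 observations: d0 and d1 split them 60/30, while d1, d2, and d3 split them into three groups of 30. A quick sanity check is cheap (a sketch; assert aborts with an error if its condition fails for any observation):

. tab airline d1
. assert d0 + d1 == 1
. assert d1 + d2 + d3 == 1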

First, fit the pooled model.

. regress output cost   // pooled model

The pooled regression over all 90 observations leaves SSE = e'e = 10.703439 with 88 residual degrees of freedom. Then fit separate models using the two subsets of the data.

. regress output cost if d0 == 1   // airlines 3-6 (n = 60)
. regress output cost if d1 == 1   // airlines 1-2 (n = 30)

The subset regressions leave SSE = 5.6746959 with 58 degrees of freedom for airlines 3-6 and SSE ≈ .4725 with 28 degrees of freedom for airlines 1-2. Plugging the three sums of squares into the formula,

   F = [(10.703439 - .4725 - 5.6746959) / 2] / [(.4725 + 5.6746959) / (30 + 60 - 2*2)] ≈ 31.87

. di Ftail(2, 86, 31.8745)
4.393e-11

The large F(2, 86) = 31.87 rejects the null hypothesis of equal slope and intercept (p < .0001); the two groups should not be pooled.
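Retyping sums of squares by hand invites transcription errors. The same arithmetic can be scripted with the results that regress leaves in memory; the following is a minimal sketch using the stored residual sum of squares e(rss) and residual degrees of freedom e(df_r):

. quietly regress output cost
. scalar sse_p = e(rss)                    // pooled e'e
. quietly regress output cost if d0 == 1
. scalar sse_a = e(rss)                    // airlines 3-6
. scalar df_a = e(df_r)
. quietly regress output cost if d1 == 1
. scalar sse_b = e(rss)                    // airlines 1-2
. scalar df_b = e(df_r)
. scalar chow = ((sse_p - sse_a - sse_b)/2) / ((sse_a + sse_b)/(df_a + df_b))
. di chow, Ftail(2, df_a + df_b, chow)

Note that df_a + df_b = 58 + 28 = 86 = n1 + n2 - 2K, so the denominator degrees of freedom come out automatically.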

Alternatively, you may regress y on the two dummies and the two interaction terms with the intercept suppressed. The parameter estimates are identical to those of the separate regressions, while the standard errors differ because they rest on the pooled residual variance; with the intercept suppressed, the reported R-squared is not reliable either. This estimation is handy since the parameter estimates are the slopes and intercepts of the individual groups; no further computation is needed at all.

. regress output cost0 d0 cost1 d1, noconstant
. test _b[cost0] = _b[cost1]
. test _b[d0] = _b[d1], accum

The first test compares the two slopes, F(1, 86) = 13.03 (p = 0.0005); accumulating the intercept restriction reproduces the Chow test, F(2, 86) = 31.87 (p < .0001).

A more convenient way is to regress y on the regressor of interest, cost, one interaction term, and one dummy, with the intercept included. The intercept is then the intercept of the baseline group (airlines 3-6), and the dummy coefficient is the deviation of the compared group's intercept from it; likewise, the coefficient of cost is the baseline slope, and the coefficient of the interaction term is the deviation from that slope. That is, the intercept of the compared group (airlines 1-2) is _b[_cons] + _b[d1], and its slope is _b[cost] + _b[cost1].

. regress output cost cost1 d1
. test _b[cost1] = 0
. test _b[d1] = 0, accum

In this parameterization the Chow test reduces to testing that the interaction and dummy coefficients are zero, and the results are identical: F(1, 86) = 13.03 for the slope restriction alone and F(2, 86) = 31.87 for both restrictions.
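When the group-specific slope and intercept themselves are needed after this second parameterization, lincom adds the coefficients and supplies the correct standard errors; a brief sketch following the last regression:

. lincom _b[cost] + _b[cost1]   // slope for airlines 1-2
. lincom _b[_cons] + _b[d1]     // intercept for airlines 1-2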

4.2.2 Chow Test: Comparing Three Groups

What if we want to compare three groups? Let us fit the two remaining models for groups 2 and 3. The restrictions here are 1) slope2 is equal to the baseline slope (slope1), 2) slope3 is equal to the baseline slope, 3) intercept2 is equal to the baseline intercept, and 4) intercept3 is equal to the baseline intercept, so J = 4. The denominator degrees of freedom are 84 = N - g*K = 90 - 3*2.

. regress output cost if d2 == 1   // airlines 3-4
. regress output cost if d3 == 1   // airlines 5-6

Each subset regression leaves its own residual sum of squares with 28 degrees of freedom, and the Chow formula generalizes directly:

   F = [(e'e - e1'e1 - e2'e2 - e3'e3) / J] / [(e1'e1 + e2'e2 + e3'e3) / (n1 + n2 + n3 - gK)]

where g = 3 is the number of groups. Plugging in the pooled SSE and the three group SSEs, with J = 4 and n1 + n2 + n3 - gK = 30 + 30 + 30 - 3*2 = 84, gives F(4, 84) ≈ 39.4. The p-value, Ftail(4, 84, 39.4), is on the order of 1e-18, so the three groups clearly cannot be pooled.
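With more groups, the scalar bookkeeping from the previous section extends naturally with a loop; a sketch for a do-file, reusing the d1-d3 dummies:

quietly regress output cost
scalar sse_p = e(rss)               // pooled e'e
scalar sse_g = 0
forvalues i = 1/3 {
    quietly regress output cost if d`i' == 1
    scalar sse_g = sse_g + e(rss)   // accumulate the group SSEs
}
scalar chow = ((sse_p - sse_g)/4) / (sse_g/84)
di chow, Ftail(4, 84, chow)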

Now, include all the interaction terms and dummies, without the regressor of interest and with the intercept suppressed. The model reports all six parameter estimates directly, and they are identical to those of the three separate regressions.

. regress output cost1 d1 cost2 d2 cost3 d3, noconstant
. test _b[cost1] = _b[cost2] = _b[cost3], notest
. test _b[d1] = _b[d2] = _b[d3], accum

The accumulated test imposes the four restrictions, equal cost slopes and equal intercepts across the three groups, and reports F(4, 84) ≈ 39.4 (Prob > F = 0.0000), matching the hand computation above.

Finally, include the regressor of interest, two interaction terms, and two dummies, excluding the baseline interaction term and dummy. The coefficient of cost is then the baseline (group 3) slope and the intercept is the baseline intercept, while the coefficients of the interaction terms and dummies are deviations from the baseline slope and intercept. As a result, the slope of group 1 is _b[cost] + _b[cost1] and its intercept is _b[_cons] + _b[d1].

. regress output cost cost1 d1 cost2 d2
. test _b[cost1] = 0, notest
. test _b[cost2] = 0, accum notest
. test _b[d1] = 0, accum notest
. test _b[d2] = 0, accum

Hypothesis testing is identical to the above: the accumulated test again reports F(4, 84) ≈ 39.4.
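In current Stata, factor variables compress all of the above into a single specification; a sketch, assuming a group identifier grp built from the dummies (Stata 11 or later):

. gen grp = 1*d1 + 2*d2 + 3*d3
. regress output i.grp##c.cost   // baseline slope and intercept plus group deviations
. testparm i.grp i.grp#c.cost    // the same F(4, 84)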

4.2.3 Chow Test: Including Covariates

Now, suppose we need to include some covariates as controls, load and fuel. We may regress the dependent variable on all the interactions, dummies, and covariates with the intercept suppressed. Again, R-squared is not reliable here. The coefficient of each interaction term is that group's slope of cost, just as each dummy coefficient is that group's intercept.

. regress output cost1 d1 cost2 d2 cost3 d3 load fuel, noconstant

This model is written as

   output = α1 + β1*cost + γ*load + δ*fuel   (group 1)
   output = α2 + β2*cost + γ*load + δ*fuel   (group 2)
   output = α3 + β3*cost + γ*load + δ*fuel   (group 3)

so the intercepts and cost slopes are group specific while the load and fuel coefficients are common to all groups. Let us conduct the hypothesis test of the four restrictions.

. test _b[cost1] = _b[cost2] = _b[cost3], notest
. test _b[d1] = _b[d2] = _b[d3], accum

The accumulated test reports F(4, 82) = 0.86 (Prob > F ≈ 0.49): once load and fuel are controlled for, there is no evidence that the cost slopes or the intercepts differ across the three groups.

We may also fit the same model including the regressor of interest and excluding the baseline interaction term and dummy; the covariate estimates remain unchanged. Note that the slope of group 2 is _b[cost] + _b[cost2] and the intercept of group 2 is _b[_cons] + _b[d2].

. regress output cost cost1 d1 cost2 d2 load fuel
. test _b[cost1] = 0, notest
. test _b[cost2] = 0, accum notest
. test _b[d1] = 0, accum notest
. test _b[d2] = 0, accum

The accumulated test reproduces F(4, 82) = 0.86.
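The factor-variable sketch from the previous section carries over directly; covariates simply enter outside the interaction (again assuming the constructed grp):

. regress output i.grp##c.cost load fuel
. testparm i.grp i.grp#c.cost   // the same F(4, 82)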

4.2.4 Chow Test: Different Slopes of Multiple Regressors

If we assume that more than one regressor has a different slope across groups, the model becomes more complicated, but the underlying logic remains the same (see http://www.stata.com/support/faqs/stat/chow3.html). Let us create a set of interaction terms for the regressor load.

. gen load1 = load*d1
. gen load2 = load*d2
. gen load3 = load*d3

First, include all the interactions, the dummies, and the covariate fuel. Do not forget to suppress the intercept.

. regress output cost1 load1 d1 cost2 load2 d2 cost3 load3 d3 fuel, noconstant
. test _b[cost1] = _b[cost2] = _b[cost3], notest
. test _b[load1] = _b[load2] = _b[load3], accum notest
. test _b[d1] = _b[d2] = _b[d3], accum

There are now six restrictions: two equalities for the cost slopes, two for the load slopes, and two for the intercepts. The accumulated test reports F(6, 80) = 0.74 (Prob > F > .6), so with fuel controlled there is again no evidence that the slopes or intercepts differ across the three groups.

Now, include the two regressors of interest, the two sets of interaction terms, two dummies, and the covariate, excluding the baseline interaction terms and dummy. Make sure you include both regressors of interest, cost and load.

. regress output cost load cost1 load1 d1 cost2 load2 d2 fuel
. test _b[cost1] = 0, notest
. test _b[cost2] = 0, accum notest
. test _b[load1] = 0, accum notest
. test _b[load2] = 0, accum notest
. test _b[d1] = 0, accum notest
. test _b[d2] = 0, accum

As before, the coefficients of cost, load, and the intercept are the baseline (group 3) estimates, the remaining coefficients are deviations from them, and the accumulated test reproduces F(6, 80) = 0.74.
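In factor-variable notation, interacting the group with both regressors is one more application of the operator; a closing sketch (grp as above):

. regress output i.grp##(c.cost c.load) fuel
. testparm i.grp i.grp#c.cost i.grp#c.load   // the same F(6, 80)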