University of Essex, Department of Economics
Session 2012/2013, Spring Term
Dr Gordon Kemp

EC352 Econometric Methods
Solutions to Exercises from Week 10

1 Problem 13.7

This exercise refers back to Equation (13.12) from Wooldridge:

   log(durat) = 1.126 + 0.0077 afchange + 0.256 highearn + 0.191 afchange × highearn      (1)
               (0.031)  (0.0447)           (0.047)           (0.069)
   n = 5626, R² = 0.021,

which is part of Example 13.4, "Effect of Worker Compensation Laws on Weeks out of Work". This refers to a study by Meyer, Viscusi and Durbin (1995) which examined the impact of a change in the cap on the weekly earnings covered by workers' compensation (for injuries etc.). Here the control group is low-income workers while the treatment group is high-income workers. This is because low-income workers had earnings below the original cap, so raising the cap did not raise the amount of benefit they could receive. In contrast, high-income workers had earnings at or above the cap, so raising the cap did increase the amount of benefit they could receive. Raising the amount of benefit a worker can receive makes it more attractive for the worker to remain on benefit for longer (in the event of an injury). The data used in the example are given in injury.dta, and the estimates were obtained using the observations for Kentucky.

1. Using the data in injury.dta for Kentucky, the estimated equation when afchange is dropped from Equation (13.12) is:

   log(durat) = 1.129 + 0.253 highearn + 0.198 afchange × highearn      (2)
               (0.022)  (0.042)           (0.052)
   n = 5626, R² = 0.021.

Is it surprising that the estimate on the interaction term is fairly close to that in Equation (13.12)? Explain.

The equation of interest is (see Equation (13.10) in Wooldridge):

   log(durat) = β0 + δ0 afchange + β1 highearn + δ1 afchange × highearn.
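
For reference, the two estimated equations above could be reproduced with a few Stata commands. The following is a minimal sketch only: it assumes the data set contains the log duration ldurat, the policy and group dummies (written afchange and highearn here, as in the text), and a Kentucky indicator ky; the variable names in your copy of injury.dta may be spelled differently.

* Sketch: reproducing Equations (13.12) and (2) from injury.dta
* (variable names assumed; adjust to match the data set).
use injury.dta, clear
keep if ky == 1                             // Kentucky observations only
generate afhigh = afchange * highearn       // interaction term
regress ldurat afchange highearn afhigh     // Equation (13.12)
regress ldurat highearn afhigh              // Equation (2): afchange dropped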

Table 1 sets out the group means implied by Equation (13.10) before and after the policy change:

                         Before       After                    After − Before
   Control               β0           β0 + δ0                  δ0
   Treatment             β0 + β1      β0 + δ0 + β1 + δ1        δ0 + δ1
   Treatment − Control   β1           β1 + δ1                  δ1

   Table 1: Illustration of the Difference-in-Differences Estimator

As we can see, the estimated coefficient on afchange in Equation (13.12) is very small in magnitude (both in absolute terms and when compared with the estimated coefficients on highearn and on the interaction term afchange × highearn) and is statistically very insignificant. Consequently it is not a surprise that the results change very little when afchange is dropped from Equation (13.12) while highearn and the interaction term are kept: the change is easily explained as the result of sampling variability.

2. When afchange is included but highearn is dropped, the result is:

   log(durat) = 1.233 − 0.100 afchange + 0.447 afchange × highearn      (3)
               (0.023)  (0.040)           (0.050)
   n = 5626, R² = 0.016.

Why is the coefficient on the interaction term now so much larger than in Equation (13.12)? Explain. [Hint: In Equation (13.10), what assumption is being made about the treatment and control groups if β1 = 0?]

The coefficient on afchange measures the change in log(durat) for low earners after the policy change. Raising the cap should have no effect on these workers, since their income was below the cap anyway, and hence we would expect δ0 = 0. This is consistent with the estimates in Equation (13.12) and with the results in part 1. In contrast, dropping highearn from Equation (13.12) means that we are assuming β1 = 0, i.e. that there was no difference in average pre-policy-change log durations between the high- and low-income groups. This is not very plausible since, for example, the two groups may tend to do different jobs for which the effects of injuries may be quite different. In the results for Equation (13.12) we see that the coefficient on highearn, namely β1, is strongly significantly different from zero (its t-statistic exceeds 5), so we would expect dropping this variable to have quite a strong impact.

2 Exercise C14.7

This question is about analyzing the impact of execution rates and unemployment on murder rates at the US state level, using panel data to control for unobserved state-specific factors. The data are contained in the file murder.dta and consist of a panel of all 50 states plus the District of Columbia with three waves: 1987, 1990 and 1993. This gives 153 observations.
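
The fixed-effects estimation with xtreg in part (7) below requires the panel structure of murder.dta to be declared. A minimal sketch of the setup (the identifiers id and year are those that appear in the Stata output below; the three waves are coded 87, 90 and 93 in the data, hence the delta(3) option):

* Sketch: declare the murder.dta panel before the analysis below.
use murder.dta, clear
xtset id year, delta(3)     // 51 states/DC, three waves three years apart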

1. Consider the unobserved effects model:

   mrdrte_it = θ_t + β1 exec_it + β2 unem_it + a_i + u_it,   i = 1, ..., 51,  t = 1987, 1990, 1993,

where: θ_t denotes year-specific intercepts; a_i denotes the unobserved state-specific effect; mrdrte_it is the murder rate (per 100,000 population) in state i in year t; exec_it is the total number of executions over the past three years in state i as of year t; and unem_it is the unemployment rate in state i in year t. If past executions have a deterrent effect, what should be the sign of β1? What sign do you think β2 should have? Explain.

β1 should be negative if, ceteris paribus, past executions have a deterrent effect. β2 is likely to be positive, since unemployment tends to reflect relative socio-economic deprivation, which is generally thought to be positively related to crime rates, including rates of violent crime such as murder.

2. Using just the years 1990 and 1993, estimate the equation from part (1) by pooled OLS. Ignore the serial correlation problem in the composite errors. Do you find evidence for a deterrent effect?

Running pooled OLS on the data for just the years 1990 and 1993 gives:

. regress mrdrte exec unem if year > 87

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------           F(  2,    99) =    5.08
       Model |  1061.37023     2  530.685114           Prob > F      =  0.0079
    Residual |  10339.8451    99   104.44288           R-squared     =  0.0931
-------------+------------------------------           Adj R-squared =  0.0748
       Total |  11401.2153   101   112.88332           Root MSE      =   10.22

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |   .1149087   .2628052     0.44   0.663    -.4065538    .6363713
        unem |   2.287502   .7402678     3.09   0.003     .8186501    3.756354
       _cons |  -4.889065    4.40781    -1.11   0.270    -13.63512    3.856986
------------------------------------------------------------------------------

Ignoring any problems in calculating standard errors that arise from the unobserved effects, we do not find a deterrent effect: the coefficient on exec is positive (not negative) and is in any case insignificant (p-value of 0.663).
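
The serial correlation in the composite errors that the question tells us to ignore affects inference rather than the point estimates. If one did want to allow for it, the usual remedy would be to cluster the standard errors by state; a sketch (the coefficient estimates are unchanged, only the standard errors differ):

regress mrdrte exec unem if year > 87, vce(cluster id)    // cluster-robust inference by state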

3. Now, using 1990 and 1993, estimate the equation by fixed effects. You may use first differencing since you are only using two years of data. Now, is there any evidence of a deterrent effect?

The first-differencing approach is easy to implement here because the differenced variables have already been created in the data set: cmrdrte is mrdrte minus its first lag (the change in the murder rate); cexec is exec minus its first lag (the change in the total number of executions); and cunem is unem minus its first lag (the change in the unemployment rate). Running the first-difference regression using just 1990 and 1993 gives:

. regress cmrdrte cexec cunem if year == 93

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  2,    48) =    2.96
       Model |   6.8879023     2  3.44395115           Prob > F      =  0.0614
    Residual |  55.8724857    48  1.16401012           R-squared     =  0.1097
-------------+------------------------------           Adj R-squared =  0.0727
       Total |   62.760388    50  1.25520776           Root MSE      =  1.0789

------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |  -.1038396   .0434139    -2.39   0.021    -.1911292     -.01655
       cunem |  -.0665914   .1586859    -0.42   0.677    -.3856509     .252468
       _cons |   .4132665   .2093848     1.97   0.054    -.0077298    .8342628
------------------------------------------------------------------------------

There is now evidence of a deterrent effect, since the coefficient on cexec is negative: each additional execution in a state reduces the murder rate in that state over the next three-year period by about 0.10 per 100,000 population (roughly 1 per million). Furthermore, this effect is fairly statistically significant (p-value of 0.021).

4. Compute the heteroskedasticity-robust standard error for the estimation in part (3). (It will be easiest to use first differencing.)

Different states differ in many ways, so we may worry about heteroskedasticity. Re-running the regression from part (3) with heteroskedasticity-robust standard errors gives:

. regress cmrdrte cexec cunem if year == 93, vce(robust)

Linear regression                                      Number of obs =      51
                                                       F(  2,    48) =   18.92
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1097
                                                       Root MSE      =  1.0789

------------------------------------------------------------------------------
             |               Robust
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |  -.1038396   .0169995    -6.11   0.000    -.1380194   -.0696598
       cunem |  -.0665914     .14693    -0.45   0.652    -.3620141    .2288312
       _cons |   .4132665   .2000057     2.07   0.044     .0111281    .8154049
------------------------------------------------------------------------------

This indicates that the deterrent effect is now highly statistically significant (the p-value is 0.000 to three decimal places).
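
Incidentally, the fixed-effects route mentioned in part (3) leads to the same answer: with exactly two time periods the within (fixed-effects) estimator and the first-difference estimator give identical slope estimates, so the regression in part (3) could equally have been run along the following lines (a sketch, assuming the xtset declaration given earlier):

* Sketch: two-period fixed effects, equivalent to the first-difference
* regression above (the i.year term plays the role of the FD intercept).
xtreg mrdrte exec unem i.year if year >= 90, fe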

5. Find the state that has the largest value of the execution variable in 1993. How much bigger is this value than the next highest value?

Browsing through the data in Stata reveals that the highest observed value of the execution variable in 1993 is 34, for Texas (id 44). The next largest value in 1993 is 11, for Virginia (id 47), so the value for Texas is 23 executions higher than the next highest and is indeed very extreme.
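
For completeness, the extreme observation could also be located directly rather than by browsing; a sketch (id is the state identifier used above, with Texas coded 44):

* Sketch: locate the largest execution counts in 1993.
summarize exec if year == 93, detail
list id exec if year == 93 & exec >= 10, clean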

6. Estimate the equation using first differences, dropping Texas from the analysis. Compute the usual and heteroskedasticity-robust standard errors. Now what do you find? What is going on?

Dropping Texas and re-running the first-difference regression for 1993 with the usual OLS standard errors gives:

. regress cmrdrte cexec cunem if year == 93 & id != 44

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  2,    47) =    0.32
       Model |  .755191109     2  .377595555           Prob > F      =  0.7287
    Residual |  55.7000012    47  1.18510641           R-squared     =  0.0134
-------------+------------------------------           Adj R-squared = -0.0286
       Total |  56.4551923    49  1.15214678           Root MSE      =  1.0886

------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |   -.067471    .104913    -0.64   0.523    -.2785288    .1435868
       cunem |  -.0700316   .1603712    -0.44   0.664    -.3926569    .2525936
       _cons |   .4125226   .2112827     1.95   0.057    -.0125233    .8375686
------------------------------------------------------------------------------

so the deterrent effect is rather smaller and is no longer statistically significant (p-value of 0.523). Using heteroskedasticity-robust standard errors gives:

. regress cmrdrte cexec cunem if year == 93 & id != 44, vce(robust)

Linear regression                                      Number of obs =      50
                                                       F(  2,    47) =    0.54
                                                       Prob > F      =  0.5846
                                                       R-squared     =  0.0134
                                                       Root MSE      =  1.0886

------------------------------------------------------------------------------
             |               Robust
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |   -.067471   .0790993    -0.85   0.398    -.2265983    .0916563
       cunem |  -.0700316   .1462091    -0.48   0.634    -.3641663    .2241031
       _cons |   .4125226   .2004375     2.06   0.045     .0092943     .815751
------------------------------------------------------------------------------

which implies the deterrent effect is still insignificant (p-value of 0.398). What we are seeing is that dropping Texas both reduces the magnitude of the estimated deterrent effect quite substantially (the estimate shifts from -0.104 to -0.067) and increases the standard errors (the usual OLS standard error rises from 0.043 to 0.105, while the heteroskedasticity-robust one rises from 0.017 to 0.079). Clearly, including or excluding Texas has a dramatic effect on the results, which suggests that Texas is an outlier.

7. Use all three years of data and estimate the model by fixed effects. Include Texas in the analysis. Discuss the size and statistical significance of the deterrent effect compared with only using 1990 and 1993.

To run this regression in Stata we use the xtreg command with the fe option. Doing so gives:

. xtreg mrdrte exec unem, fe

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0047                         Obs per group: min =         3
       between = 0.0007                                        avg =       3.0
       overall = 0.0002                                        max =         3

                                                F(2,100)           =      0.24
corr(u_i, Xb)  = -0.0635                        Prob > F           =    0.7909

------------------------------------------------------------------------------
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.1140743   .1800836    -0.63   0.528    -.4713551    .2432065
        unem |    .095914   .2800721     0.34   0.733    -.4597411    .6515692
       _cons |   7.637844   1.684436     4.53   0.000     4.295971    10.97972
-------------+----------------------------------------------------------------
     sigma_u |  8.788124
     sigma_e |  3.612922
         rho |  .85542114   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 100) =    16.46             Prob > F = 0.0000

The estimated deterrent effect is similar in size to the one obtained with the first-differencing approach using only the 1990 and 1993 data; however, it is no longer statistically significant (p-value of 0.528).
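
With three waves the within estimator and the first-difference estimator are no longer numerically identical, so a useful cross-check is the corresponding three-wave first-difference regression. A sketch (this again assumes the xtset declaration with delta(3) given earlier; D. is Stata's difference operator and i.year supplies the period effects):

* Sketch: first differences across both 1987-1990 and 1990-1993.
regress D.mrdrte D.exec D.unem i.year if year > 87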

We could also compute standard errors clustered by state (id):

. xtreg mrdrte exec unem, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51

R-sq:  within  = 0.0047                         Obs per group: min =         3
       between = 0.0007                                        avg =       3.0
       overall = 0.0002                                        max =         3

                                                F(2,50)            =      1.23
corr(u_i, Xb)  = -0.0635                        Prob > F           =    0.3017

                                  (Std. Err. adjusted for 51 clusters in id)
------------------------------------------------------------------------------
             |               Robust
      mrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        exec |  -.1140743   .0811107    -1.41   0.166     -.27699     .0488414
        unem |    .095914   .2652027     0.36   0.719    -.4367612    .6285893
       _cons |   7.637844   1.609143     4.75   0.000     4.405786     10.8699
-------------+----------------------------------------------------------------
     sigma_u |  8.788124
     sigma_e |  3.612922
         rho |  .85542114   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Even though the standard errors change quite a lot, the estimated deterrent effect is still statistically insignificant.
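
To see the default and cluster-robust fixed-effects results side by side, the estimates could be stored and tabulated; a sketch:

* Sketch: compare the two sets of part-(7) results in one table.
quietly xtreg mrdrte exec unem, fe
estimates store fe_default
quietly xtreg mrdrte exec unem, fe vce(cluster id)
estimates store fe_cluster
estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)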