Panel Data Analysis in Stata




Anton Parlow
Lab session, Econ710, UWM Econ Department
??/??/2010, or in an S-Bahn in Berlin, you never know...

Our plan
- Introduction to Panel data
- Fixed vs. random effects
- Testing for fixed effects
- Testing for random effects
- Fixed or random effect?
- Example: Gravity model
- Exercises

Introduction to Panel data
Panel data are cross-sectional data observed over time, e.g. we observe the same households, firms or countries over a couple of years. Panel data are also known as longitudinal data. In general:
    $y_{it} = \alpha + \beta x_{it} + \epsilon_{it}$
where $i = 1, \dots, N$ indexes the cross-sectional observations and $t = 1, \dots, T$ the years.
Panel data have the following advantages over pooled data (Baltagi 2004). They:
(1) account for heterogeneity across individual units, which is assumed away in pooled data
(2) deal with time-invariant omitted variables, which remain a problem in pooled data
(3) are less likely to have problems with autocorrelation and multicollinearity than time series data
Arellano (2003) emphasizes (1) as the main advantage of using panel data.
There are basically two types of panel models, the fixed effects and the random effects model. They differ in their assumptions about how the heterogeneity is captured and in their estimation techniques (fixed = OLS, random = GLS).

Fixed vs. random effects
The fixed effects model assumes that individual heterogeneity is captured by the intercept term. Every individual gets its own intercept $\alpha_i$, while the slope coefficients are the same for all individuals. The heterogeneity is allowed to be correlated with the regressors on the right-hand side. The fixed effects model is also known as the least squares dummy variable (LSDV) estimator, because we effectively assign a dummy to every individual.
The random effects model assumes, in some sense, that the individual effects are captured by the intercept and a random component $\mu_i$. This random component is not correlated with the regressors on the right-hand side and is part of the error term. The intercept becomes $\alpha + \mu_i$. This is why some textbooks write that both models capture the heterogeneity through the intercept term.
The random effects assumption that the individual effects are not correlated with the explanatory variables is a big one! But it allows us to estimate the effect of time-invariant variables, which cancel out in a fixed effects estimation.
Baltagi (and Hsiao) introduce both estimators as a one-way error component model. For both estimators the error term $\epsilon_{it}$ equals $\mu_i + v_{it}$, where $\mu_i$ captures the individual effect. In the fixed effects model $\mu_i$ is assumed to be fixed; in the random effects model it is stochastic and follows a distribution. In other words, in the fixed effects model the individual effects may be correlated with the regressors but not with the remaining error $v_{it}$; in the random effects model they are part of the error term and must not be correlated with the regressors.
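To see why time-invariant variables cancel out under fixed effects, it can help to write out the within transformation. The following is only a sketch for a single regressor; $z_i$ is a hypothetical time-invariant variable that does not appear in the slides, and bars denote individual means over time:

```latex
\begin{align*}
y_{it} &= \alpha + \beta x_{it} + \gamma z_i + \mu_i + v_{it} \\
\bar{y}_i &= \alpha + \beta \bar{x}_i + \gamma z_i + \mu_i + \bar{v}_i \\
y_{it} - \bar{y}_i &= \beta (x_{it} - \bar{x}_i) + (v_{it} - \bar{v}_i)
\end{align*}
```

Subtracting the individual means removes $\mu_i$, but it removes $\gamma z_i$ as well, so $\gamma$ cannot be estimated from the within variation alone.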

Fixed vs. random effects continued
The regression equations come down to, for the fixed effects model:
    $y_{it} = \alpha_i + \beta_1 X_{it} + \epsilon_{it}$, where $\epsilon_{it} = \mu_i + v_{it}$ and $\mu_i = 0$ (the individual effect is absorbed into $\alpha_i$)
and for the random effects model:
    $y_{it} = \alpha + \beta_1 X_{it} + \epsilon_{it}$, where $\epsilon_{it} = \mu_i + v_{it}$
For the random effects model you need to use a GLS estimator, which is a weighted average of the between and within estimators. The weights tell you where the variation comes from, i.e. from within the individuals or from between the individuals. The LSDV estimator uses only the within variation, i.e. the variation over time within each individual. If you assume all the variation comes from between the groups, you have the between estimator, which still uses OLS. Let the random effects estimator be:
    $\hat{\beta}_{GLS} = \frac{W_{xy} + \Phi^2 B_{xy}}{W_{xx} + \Phi^2 B_{xx}}$
where $W$ and $B$ denote the within and between sums of squares and cross-products, and $\Phi^2$ is the weight on the between variation. The Stata output will tell you where the variation came from.

Testing for fixed effects
Testing for fixed effects involves an F-test comparing the pooled OLS results with the results from the LSDV estimation. The pooled OLS is the restricted model, and if we reject $H_0$, fixed effects are present. The F-test has the following form:
    $F = \frac{(RSS - URSS)/(N-1)}{URSS/(NT - N - K)} \sim F_{N-1,\, N(T-1)-K}$
where $RSS$ is the residual sum of squares of the restricted (pooled OLS) model and $URSS$ that of the unrestricted (LSDV) model.
Don't worry, this is part of the fixed effects output, although it is always a nice exercise to do this by hand.

Testing for random effects
This involves a Lagrange multiplier test developed by Breusch and Pagan. After a random effects regression, it tests for the presence of random effects in the underlying pooled OLS. The null hypothesis is $H_0: \mathrm{var}(\mu) = 0$. Following Baltagi (2004):
    $\lambda_{LM} = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} \epsilon_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} \epsilon_{it}^2} - 1 \right]^2 \sim \chi^2_1$
where $\epsilon_{it}$ are the pooled OLS residuals.
If we can reject the null, random effects are present (remember: p < 0.05 and you can reject the null!).
Does this mean random effects are more efficient than fixed effects if random effects are present? Not necessarily, but the Hausman specification test helps a bit to decide.

Fixed or random effect?
The Hausman specification test is a very general test and can be used whenever two models could be used for the same question. In our example we have the fixed and the random effects model. Under the null hypothesis both estimators are consistent, but the random effects estimator is more efficient, e.g. it uses fewer degrees of freedom. Under the alternative only the fixed effects estimator is consistent. If we reject the null, we cannot use the random effects model. The problem is that the Hausman test rejects the random effects model very often and does not work very well in small samples (Baum 2006). It comes down to which model you think is more appropriate given your data and your question. In general the Hausman test looks like this (Hosny 2009):
    $H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \, [\mathrm{Var}(\hat{\beta}_{FE} - \hat{\beta}_{RE})]^{-1} \, (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2_{k-1}$

Example: Gravity model
Imagine someone gives you data on trade and conflicts between countries. He is generous enough to also give you GDP, per capita income and the actual distance between country pairs. You want to know whether conflicts affect trade negatively or, in other terms, whether trade somehow promotes peace. This is a big question, and there is a big debate about it in political science. Someone tells you a gravity model is similar to the one in physics and, using your variables, could look like this:
    $\ln(trade_{ij}) = \beta_0 + \beta_1 \ln(gdp_i) + \beta_2 \ln(gdp_j) + \beta_3 distance_{ij} + \beta_4 \ln(pci_i) + \beta_5 \ln(pci_j) + \sum_{i=1}^{n} \gamma_i at_i + \sum_{j=1}^{n} \gamma_j at_j + \epsilon_{ij}$
Who knows, maybe you should also add country attributes $at_i$. What would you use?
Now imagine there are many papers out there that just estimate pooled regression models in various forms. If we observe countries trading with each other over time, do you think these papers miss something by assuming countries stay the same over time? You would likely say yes and want to use a panel estimation.

Example: Gravity model continued
Usually you have your cross-sections in annual form, meaning one data file for every year. If you want to use them together, you have to merge them into one data set. Before you can merge, every observation needs a unique identifier id, and every annual data set also needs a variable indicating which year it is.
Example: You observe trade between the US and Germany over 2 years. This is the same trade relationship, so give it the id number 1, i.e. id = 1 for both years. Imagine you observe them in 1988 and 1989. The year variable takes the value 1988 in 1988 (!) and of course 1989 in 1989 (!). The identifier variable allows you to follow this trade relationship over time in a panel.
Before you can merge data sets, they have to be sorted individually. Open every data set, sort it and save it, e.g. for your 1989 data set use:
    sort id year
Do the same for every following year. Open the first data set and use the following command to merge another one to it:
    merge id year using "location and name of the other data-set.dta"
Sort again, drop the _merge variable that Stata created, and merge the next data set to it. Repeat until you have all years merged into one data set (= your panel).
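The `merge id year using` form above is the older merge syntax; newer versions of Stata expect `merge 1:1 id year using`. Because each annual file only adds new rows (new id-year pairs), stacking the files with `append` is a common alternative. The following is a minimal sketch of that alternative, not the procedure from the slides: the two tiny data sets and the variable `trade` are made up for illustration.

```stata
* Sketch only: two made-up annual files with the same trade relationships
clear
input id trade
1 100
2 50
end
generate year = 1988
tempfile y1988
save `y1988'

clear
input id trade
1 110
2 55
end
generate year = 1989
tempfile y1989
save `y1989'

* Stack the annual files: every file contributes new id-year rows
use `y1988', clear
append using `y1989'
sort id year
list, sepby(id)
```

With `append` there is no `_merge` variable to drop between steps, and the files do not have to be sorted beforehand.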

Example: Gravity model continued
Tell Stata you want to use the data as a panel:
    xtset id year
If Stata reports that your panel is strongly balanced, you don't have to worry about unbalanced panels.
Let us do a simple pooled OLS:
    reg ldyt conflict_ij lrgdp_i lrgdp_j lpci_i lpci_j ldist1 distance2
and compare it to a fixed effects estimation (= LSDV):
    xtreg ldyt conflict_ij lrgdp_i lrgdp_j lpci_i lpci_j ldist1 distance2, fe
Stata uses xt commands for panel models, and the option fe tells Stata to estimate a fixed effects model. Look at the attached output! At the bottom you see the F-test for pooled OLS vs. fixed effects. You should be able to reject the null, so we can conclude that fixed effects are present.
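If you do not have the gravity data at hand, the same steps can be tried on a data set that ships with Stata. This sketch assumes internet access for `webuse` and uses the Grunfeld investment data (variables `company`, `year`, `invest`, `mvalue`, `kstock`), which are not part of the slides:

```stata
* Sketch only: uses Stata's Grunfeld example data, not the gravity data set
webuse grunfeld, clear
xtset company year

* Pooled OLS: the restricted model with one common intercept
regress invest mvalue kstock

* Fixed effects (within) estimation: one intercept per company
xtreg invest mvalue kstock, fe

* The F-test printed at the bottom of the fe output is also stored in e()
display "F(" e(N_g) - 1 ", " e(df_r) ") = " e(F_f)
display "p-value = " Ftail(e(N_g) - 1, e(df_r), e(F_f))
```

The last two lines only repeat what the `xtreg, fe` output already prints, but they show where the test statistic can be picked up for later use.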

Example: Gravity model continued
Now let us estimate a random effects model:
    xtreg ldyt conflict_ij lrgdp_i lrgdp_j lpci_i lpci_j ldist1 distance2, re
Only the option changed, to re. If you don't specify an option, Stata assumes a random effects model anyway.
Let us test for random effects in the underlying pooled OLS using the random effects regression results. The Breusch-Pagan test has the following command:
    xttest0
You should find random effects, i.e. you can reject the null!
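As a quick check of the command sequence, the random effects regression and the Breusch-Pagan test can be run on Stata's Grunfeld example data. This is a sketch under the assumption that `webuse` can reach the internet; the data set and its variables are stand-ins, not the gravity data:

```stata
* Sketch only: Grunfeld example data stand in for the gravity data
webuse grunfeld, clear
xtset company year

* Random effects (GLS) estimation; re is also the default for xtreg
xtreg invest mvalue kstock, re

* Breusch-Pagan LM test of H0: var(u) = 0, run right after xtreg, re
xttest0
```

`xttest0` uses the results of the preceding `xtreg, re`, so it has to be run directly afterwards.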

Example: Gravity model continued
Finally, let us do a Hausman specification test of fixed against random effects. We have to use the estimates of both the fixed and the random effects models.
    xtreg ldyt conflict_ij lrgdp_i lrgdp_j lpci_i lpci_j ldist1 distance2, fe
    est store fixed
which saves the results in fixed,
    xtreg ldyt conflict_ij lrgdp_i lrgdp_j lpci_i lpci_j ldist1 distance2, re
    est store random
which saves the results in random, and finally
    hausman fixed random
where the second model is the one you think is more efficient. We should be able to reject the null and conclude that the random effects estimator is not consistent here, so the fixed effects model is the one to use.
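The store-and-compare pattern can be rehearsed on Stata's Grunfeld example data. Again this is only a sketch: it assumes `webuse` can reach the internet, and the data set is not the gravity data, so the test result says nothing about the trade example:

```stata
* Sketch only: Grunfeld example data stand in for the gravity data
webuse grunfeld, clear
xtset company year

* Fixed effects: consistent under both the null and the alternative
quietly xtreg invest mvalue kstock, fe
estimates store fixed

* Random effects: efficient under the null only
quietly xtreg invest mvalue kstock, re
estimates store random

* First argument: the always-consistent model; second: the efficient one
hausman fixed random
```

The order of the two names matters: swapping them flips the roles of the two estimators in the test.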

Exercises
Estimate the above pooled regression. Estimate the fixed effects model again. Use the RSS from the pooled and the fixed effects regressions to compute the F-test by hand.
Hint: use the help for the xtreg command to figure out how to find the RSS from the fixed effects model (okay: use display e(rss) after the regression).
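For orientation, here is how such a by-hand calculation could look on a different data set, so the exercise itself stays open. This sketch assumes internet access for `webuse` and uses Stata's Grunfeld data with two regressors (K = 2); the scalar names are made up for illustration:

```stata
* Sketch only: by-hand F-test on the Grunfeld data, not the gravity data
webuse grunfeld, clear
xtset company year

* Restricted model: pooled OLS
quietly regress invest mvalue kstock
scalar rss_r = e(rss)

* Unrestricted model: fixed effects
quietly xtreg invest mvalue kstock, fe
scalar rss_u = e(rss)

* Degrees of freedom: N - 1 and NT - N - K, with K = 2 slope coefficients
scalar df_num = e(N_g) - 1
scalar df_den = e(N) - e(N_g) - 2

scalar F_hand = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
display "By hand: F(" df_num ", " df_den ") = " F_hand
display "p-value = " Ftail(df_num, df_den, F_hand)
display "Reported by xtreg, fe: F = " e(F_f)
```

The hand-computed value should agree with the F-test at the bottom of the fixed effects output.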