Econometrics of Panel Data


1. Basics and examples
2. The generalized least squares estimator
3. Fixed effects model
4. Random effects model

1 Basics and examples

We observe variables for $N$ units, called the cross-sections, over $T$ consecutive periods: $(Y_{it}, X_{it})$, with $i = 1,\dots,N$, where $N$ is the cross-sectional dimension, and $t = 1,\dots,T$, where $T$ is the temporal dimension. This gives a panel of size $N \times T$.

Examples:
- $Y_{it}$ is the income of family $i$ during year $t$, for $1 \le i \le 1000$, observed in the years 2000, 2001 and 2002, so $T = 3$.
- $Y_{it}$ is the unemployment rate of EU country $i$ ($1 \le i \le 15$), observed monthly from 1998:01 up to 2001:12, so $T = 48$.

Note that:
- $T$ large, $N$ small: multiple time series.
- $T$ small, $N$ large: survey data on individuals/firms for a small number of waves.

Example 1: South American countries

For 8 South American countries we want to model real GDP per capita in 1985 prices (Rgdl) as a function of the following explanatory variables:
- Population in 1000s (Pop)
- Real investment share of GDP, in % (I)
- Real government share of GDP, in % (G)
- Exchange rate with the U.S. dollar (XR)
- Measure of openness of the economy (Open)

You find the data in the file penn.wmf, already in EViews format. We are particularly interested in the effect of openness on economic growth.

1. Create a pool object in EViews (Object/New Object). Give it a name and define the cross-section identifiers. These identifiers are the parts of the series names that identify the cross-section.
2. Open the XR variables as a group and plot them. Compute their log-differences using the PoolGenr menu of the pool object with logdifxr?=dlog(xr?). The ? is substituted by every cross-section identifier. Plot the transformed variables.
3. Compute the medians of the variable I? for the different countries (use View/Descriptive Statistics within the pool object). Now compute the average value of I? for every year.
4. Estimate the regression model for Brazil, using Quick/Estimate Equation and specifying in EViews the equation
   dlog(rgdp_bra) c dlog(pop_bra) i_bra g_bra dlog(xr_bra) open_bra
5. Now we pool the data of all countries to increase the sample size. Within the pool object, use Estimate and specify: dependent variable = dlog(rgdp?); common coefficients = c dlog(pop?) i? g? dlog(xr?) open?. This is a pooled regression model.

6. Pooling the data ignores the fact that the data originate from different countries. Dummy variables for the different countries need to be added. This can be done by specifying the constant term as a cross-section specific coefficient. We obtain a fixed effects panel data model. Discuss the regression output.
7. The fixed effects panel data model assumes that the effect of openness is the same for all countries. How could you relax this assumption?
8. Using a Wald test, test whether all country effects are equal (to see how EViews labels the coefficients, use View/Representation). The country effects are called the fixed effects; if they are significantly different, there is unobserved heterogeneity.
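The pooled regression of step 5 stacks the observations of all countries and fits one OLS with common coefficients. The following Python/NumPy sketch illustrates that idea outside EViews. The function names `dlog` and `pooled_ols` and the two-country data are hypothetical, and the code is not how EViews implements pools.

```python
import numpy as np

def dlog(series):
    """Log-difference log(x_t) - log(x_{t-1}); the first period is lost."""
    return np.diff(np.log(np.asarray(series, dtype=float)))

def pooled_ols(y_by_unit, X_by_unit):
    """Stack all cross-sections and estimate one set of common coefficients by OLS.

    y_by_unit: list of 1-D arrays, one per cross-section.
    X_by_unit: list of 2-D arrays with matching rows (include a constant column).
    """
    y = np.concatenate(y_by_unit)
    X = np.vstack(X_by_unit)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical data for two countries, generated exactly as y = 1 + 2 x.
x_a, x_b = np.array([0.0, 1.0, 2.0]), np.array([3.0, 4.0])
X_a = np.column_stack([np.ones_like(x_a), x_a])
X_b = np.column_stack([np.ones_like(x_b), x_b])
beta = pooled_ols([1 + 2 * x_a, 1 + 2 * x_b], [X_a, X_b])
```

Appending one dummy column per country to the stacked regressors, and dropping the common constant, turns this into the fixed effects regression of step 6.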

2 The Generalized Least Squares estimator

Standard linear regression model:
$$Y_i = X_i'\beta + \varepsilon_i, \quad i = 1,\dots,n,$$
with
- $\mathrm{Var}(\varepsilon_i) = \sigma^2$ constant: homoscedastic errors;
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$: uncorrelated errors.

In the standard model, the Ordinary Least Squares (OLS) estimator
- is consistent, meaning that $\hat\beta \to \beta$ as $n$ tends to infinity;
- has the smallest variance among all linear unbiased estimators and, for normal errors, the smallest variance among all unbiased estimators.

It is given by
$$\hat\beta_{OLS} = \left(\sum_{i=1}^n X_i X_i'\right)^{-1} \left(\sum_{i=1}^n X_i Y_i\right).$$
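The OLS formula can be coded literally as the two sums. This is a minimal NumPy sketch with a hypothetical function name `ols_from_sums`, in which row $i$ of `X` holds $X_i'$.

```python
import numpy as np

def ols_from_sums(X, y):
    """beta_OLS = (sum_i X_i X_i')^{-1} (sum_i X_i Y_i); row i of X is X_i'."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    XtX = np.zeros((k, k))
    Xty = np.zeros(k)
    for x_i, y_i in zip(X, y):
        XtX += np.outer(x_i, x_i)   # accumulate X_i X_i'
        Xty += x_i * y_i            # accumulate X_i Y_i
    return np.linalg.solve(XtX, Xty)
```

In practice `X.T @ X` and `X.T @ y` compute the same sums in one call.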

What if the errors are not homoscedastic and uncorrelated? For panel data, for example, we may have
- cross-sectional heteroscedasticity;
- correlation among cross-sections;
- serial correlation within and across cross-sections, ...

The OLS estimator is still consistent, but no longer optimal.

General linear regression model:
$$Y_i = X_i'\beta + \varepsilon_i, \quad i = 1,\dots,n,$$
with
- $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$: heteroscedastic errors;
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = \sigma_{ij}$ for $i \neq j$: correlated errors.

One can still use OLS (not even a bad idea), provided one uses
- White standard errors (under heteroscedasticity), or
- Newey-West standard errors (under correlated errors plus heteroscedasticity).
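Both robust covariance estimators share the sandwich form $(X'X)^{-1} S (X'X)^{-1}$ and differ only in the middle matrix $S$. The sketch below makes three assumptions: observations are in time order for the Newey-West part, Bartlett weights $1 - l/(L+1)$ are used, and no small-sample degrees-of-freedom correction is applied. Results may therefore differ slightly from those of EViews or other packages. The function name `ols_robust_se` is hypothetical.

```python
import numpy as np

def ols_robust_se(X, y, nw_lags=0):
    """OLS estimates with White (nw_lags=0) or Newey-West (nw_lags=L > 0) standard errors.

    Rows of X must be in time order for the Newey-West correction to make sense.
    No small-sample degrees-of-freedom correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    g = X * u[:, None]                    # row i: X_i * u_i
    S = g.T @ g                           # White: sum_i u_i^2 X_i X_i'
    for lag in range(1, nw_lags + 1):
        w = 1.0 - lag / (nw_lags + 1.0)   # Bartlett kernel weight
        gamma = g[lag:].T @ g[:-lag]      # sum_t g_t g_{t-lag}'
        S += w * (gamma + gamma.T)
    cov = XtX_inv @ S @ XtX_inv           # sandwich covariance
    return beta, np.sqrt(np.diag(cov))
```

With `nw_lags=0` the loop is skipped and the result is the White standard error. A larger `nw_lags` also allows for serial correlation up to that lag.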

The Generalized Least Squares (GLS) estimator is consistent and optimal, and is given by
$$\hat\beta_{GLS} = \left(\sum_{i=1}^n \sum_{j=1}^n w_{ij} X_i X_j'\right)^{-1} \left(\sum_{i=1}^n \sum_{j=1}^n w_{ij} X_i Y_j\right),$$
where the weights depend on the values of $\sigma_{ij}$. More precisely: let $\Sigma$ be the $n \times n$ matrix with elements $\sigma_{ij}$ (with $\sigma_{ii} = \sigma_i^2$); then $w_{ij} = (\Sigma^{-1})_{ij}$. Unfortunately, the values in $\Sigma$ are unknown.
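The double-sum formula is ordinary matrix GLS, $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$. The sketch below checks this by computing both versions for a known $\Sigma$. The function names are hypothetical.

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS in matrix form: (X' W X)^{-1} X' W y with W = Sigma^{-1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    W = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def gls_double_sum(X, y, Sigma):
    """The same estimator written as the double sums with w_ij = (Sigma^{-1})_ij."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    W = np.linalg.inv(Sigma)
    n, k = X.shape
    A = np.zeros((k, k))
    b = np.zeros(k)
    for i in range(n):
        for j in range(n):
            A += W[i, j] * np.outer(X[i], X[j])   # w_ij X_i X_j'
            b += W[i, j] * X[i] * y[j]            # w_ij X_i Y_j
    return np.linalg.solve(A, b)
```

With $\Sigma = \sigma^2 I$, GLS reduces to OLS.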

The Feasible Generalized Least Squares (FGLS) estimator proceeds in 2 steps:
1. Compute $\hat\beta_{OLS}$ and the residuals $r_i^{OLS} = Y_i - X_i'\hat\beta_{OLS}$.
2. Use these residuals to estimate the $\sigma_{ij}$. [This requires some additional assumptions on the structure of $\Sigma$.] Then compute the GLS estimator with the estimated weights $\hat w_{ij}$.

The above scheme can be iterated, which gives the fully iterated GLS estimator.

Theoretical example

Our sample of size $n = 20$ consists of two groups of equal size (e.g. men and women). There is no correlation among the observations, but we think that the variances of the error terms for men and women might differ. [The error terms contain the omitted and unobserved variables. We might indeed think that their size is different for women than for men, e.g. when regressing salary on individual characteristics.]
$$\sigma_i^2 = \sigma_{ii} = \sigma_M^2 \ \text{for } i = 1,\dots,10, \qquad \sigma_i^2 = \sigma_{ii} = \sigma_F^2 \ \text{for } i = 11,\dots,20, \qquad \sigma_{ij} = 0 \ \text{for } i \neq j.$$

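Computation of the (feasible) GLS estimator:
1. Compute the OLS estimator and the residuals $r_i^{OLS}$.
2. Estimate
$$\hat\sigma_M^2 = \frac{1}{10}\sum_{i=1}^{10} (r_i^{OLS})^2 \quad\text{and}\quad \hat\sigma_F^2 = \frac{1}{10}\sum_{i=11}^{20} (r_i^{OLS})^2.$$

Due to the simple structure of the matrix $\Sigma$, we have $\hat w_i = 1/\hat\sigma_M^2$ for $i = 1,\dots,10$ and $\hat w_i = 1/\hat\sigma_F^2$ for $i = 11,\dots,20$, and
$$\hat\beta_{GLS} = \left(\sum_{i=1}^n \hat w_i X_i X_i'\right)^{-1} \left(\sum_{i=1}^n \hat w_i X_i Y_i\right).$$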
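The two-group example translates directly into code. This is a sketch with a hypothetical function `fgls_two_groups`. Setting `n_iter` above 1 repeats the variance estimation and weighting with the new residuals, which gives the iterated estimator.

```python
import numpy as np

def fgls_two_groups(X, y, in_group_m, n_iter=1):
    """Feasible GLS with one error variance per group (sigma_M^2 and sigma_F^2).

    in_group_m: boolean array, True for observations in group M.
    n_iter=1 is the two-step estimator; larger values iterate the scheme.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(in_group_m, dtype=bool)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # step 1: OLS
    for _ in range(max(1, n_iter)):
        r = y - X @ beta                              # residuals of the current fit
        s2_m = np.mean(r[m] ** 2)                     # sigma_M^2 hat
        s2_f = np.mean(r[~m] ** 2)                    # sigma_F^2 hat
        w = np.where(m, 1.0 / s2_m, 1.0 / s2_f)       # w_i hat
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)    # (sum w_i X_i X_i')^{-1} sum w_i X_i Y_i
    return beta, s2_m, s2_f
```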

Application to panel data regression

Let $\varepsilon_{it}$ be the error term of a panel data regression model, with $1 \le i \le N$ and $1 \le t \le T$. Three specifications are common:
1. $\mathrm{Var}(\varepsilon_{it}) = \sigma^2$ and all covariances between error terms are zero. OLS can be applied (no weighting).
2. $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$ and all covariances between error terms are zero. We have cross-sectional heteroscedasticity. GLS can be applied (cross-section weights).
3. $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$, $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij}$, all other covariances zero. We now allow for contemporaneous correlation between cross-sections. GLS can be applied (SUR weights).
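For specifications 2 and 3, the weights come from the residuals of a first-stage fit, arranged as an $N \times T$ matrix. The sketch below assumes the divisor $T$; packages may use a degrees-of-freedom-corrected divisor instead. The function names are hypothetical.

```python
import numpy as np

def residual_covariance(R):
    """R: N x T residual matrix (row i = cross-section i, column t = period t).

    Returns the estimated sigma_i^2 (cross-section weights) and the N x N matrix
    of contemporaneous covariances sigma_ij (SUR weights), both with divisor T.
    """
    R = np.asarray(R, dtype=float)
    T = R.shape[1]
    Sigma_hat = R @ R.T / T                 # sigma_ij hat = (1/T) sum_t r_it r_jt
    return np.diag(Sigma_hat).copy(), Sigma_hat

def sur_error_covariance(Sigma_hat, T):
    """NT x NT error covariance for data stacked unit by unit (row index i*T + t):
    Cov(e_it, e_jt) = sigma_ij, and zero across different periods."""
    return np.kron(Sigma_hat, np.eye(T))
```

The estimated $\hat\Sigma$ has rank at most $T$, so SUR weights require $T$ to be at least $N$, and preferably much larger.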

Example: South American countries (continued)
1. Have a look at the residuals (View/Residuals/Graphs within the pool object). Compute the covariance and the correlation matrix of the residuals.
   (i) Is there cross-sectional heteroscedasticity?
   (ii) Is there contemporaneous correlation?
2. Now estimate the model with the appropriate GLS estimator. Do the results depend a lot on the weighting scheme?
3. Is there still serial correlation present in the residuals, i.e. (cross-)correlation at leads and lags? Hence, does the model capture the dynamics in the data?

3 The Fixed Effects regression model

Fixed effects model:
$$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it},$$
with $t = 1,\dots,T$ time periods and $i = 1,\dots,N$ cross-sectional units.
- The $\alpha_i$ contain the omitted variables that are constant over time, for every unit $i$. The $\alpha_i$ are called the fixed effects, and induce unobserved heterogeneity in the model.
- The $X_{it}$ are the observed part of the heterogeneity.
- The $\varepsilon_{it}$ contain the remaining omitted variables.

Testing for unobserved heterogeneity:
$$H_0: \alpha_1 = \dots = \alpha_N =: \alpha \quad \text{(test for redundant fixed effects)}$$
If $H_0$ holds, there is no unobserved heterogeneity, and the model reduces to the pooled regression model
$$Y_{it} = X_{it}'\beta + \alpha + \varepsilon_{it}.$$
Ignoring unobserved heterogeneity may lead to severe bias in the estimated $\beta$; see the figure:

[Figure: y against x for Cross Section 1, Cross Section 2 and Cross Section 3, together with the Pooled Regression.]
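A standard way to test $H_0$ is the F-test comparing the residual sums of squares of the pooled regression (restricted) and the regression with one dummy per unit (unrestricted):
$$F = \frac{(RSS_{pooled} - RSS_{LSDV})/(N-1)}{RSS_{LSDV}/(NT - N - K)},$$
with $K$ the number of slope coefficients. Under $H_0$ and normal errors, this statistic is $F(N-1, NT-N-K)$ distributed. The sketch below uses hypothetical function names and the number of observations in place of $NT$, so it also covers unbalanced panels. It returns the statistic and its degrees of freedom; a p-value could be obtained from any F-distribution routine.

```python
import numpy as np

def _rss(Z, y):
    """Residual sum of squares of the OLS regression of y on Z."""
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    r = y - Z @ coef
    return float(r @ r)

def redundant_fe_F(y, X, unit):
    """F-test of H0: alpha_1 = ... = alpha_N.

    y: (n,), X: (n, K) slope regressors without constant, unit: (n,) cross-section ids.
    Returns the F statistic and its (numerator, denominator) degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    unit = np.asarray(unit)
    n, K = X.shape
    ids = np.unique(unit)
    N = len(ids)
    D = (unit[:, None] == ids[None, :]).astype(float)      # one dummy per unit
    rss_lsdv = _rss(np.hstack([D, X]), y)                   # unrestricted
    rss_pooled = _rss(np.hstack([np.ones((n, 1)), X]), y)   # restricted: common intercept
    df1, df2 = N - 1, n - N - K
    F = ((rss_pooled - rss_lsdv) / df1) / (rss_lsdv / df2)
    return F, (df1, df2)
```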

LSDV estimation

LSDV = Least Squares Dummy Variable estimation. Rewrite the model as
$$Y_{it} = \alpha_1 D_i^1 + \dots + \alpha_N D_i^N + X_{it}'\beta + \varepsilon_{it},$$
with $D_i^j = 1$ if $i = j$ and $D_i^j = 0$ if $i \neq j$. Estimate the model by OLS or GLS (weighting). If necessary, use White/Newey-West type standard errors (also if GLS is used, see later).
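This is a sketch of LSDV with NumPy, using a hypothetical function name `lsdv`. One dummy column per cross-section replaces the constant, and OLS does the rest.

```python
import numpy as np

def lsdv(y, X, unit):
    """LSDV: OLS of y on one dummy per cross-section plus the regressors X (no separate constant).

    Returns (ids, alpha, beta): sorted unit ids, their fixed effects, and the common slopes.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    unit = np.asarray(unit)
    ids = np.unique(unit)
    D = (unit[:, None] == ids[None, :]).astype(float)   # D[., j] = 1 if the observation belongs to unit j
    coef = np.linalg.lstsq(np.hstack([D, X]), y, rcond=None)[0]
    return ids, coef[:len(ids)], coef[len(ids):]
```

The common-intercept form of the model follows as `alpha.mean()` for $\alpha$ and `alpha - alpha.mean()` for the $\mu_i$.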

Within groups estimator

Compute the averages $\bar X_{i.}$ and $\bar Y_{i.}$ of $X_{it}$ and $Y_{it}$ within each cross-sectional unit $i$:
$$Y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}$$
$$\bar Y_{i.} = \bar X_{i.}'\beta + \alpha_i + \bar\varepsilon_{i.}$$
$$(Y_{it} - \bar Y_{i.}) = (X_{it} - \bar X_{i.})'\beta + (\varepsilon_{it} - \bar\varepsilon_{i.})$$
Regress the centered $Y_{it}$ on the centered $X_{it}$ by OLS. By centering, the fixed effects are eliminated! One can show that the within groups estimator of $\beta$ is identical to the LSDV estimator.
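The within transformation can be sketched as follows, using a hypothetical function name `within_estimator`. The fixed effects are recovered afterwards as $\hat\alpha_i = \bar Y_{i.} - \bar X_{i.}'\hat\beta$, which is also what LSDV delivers. The panel may be unbalanced.

```python
import numpy as np

def within_estimator(y, X, unit):
    """Fixed effects via the within transformation.

    y: (n,) outcomes, X: (n, K) regressors without constant, unit: (n,) cross-section ids.
    Returns (alpha, beta) with alpha ordered like np.unique(unit).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    unit = np.asarray(unit)
    ids = np.unique(unit)
    y_c, X_c = y.copy(), X.copy()
    for g in ids:
        m = unit == g
        y_c[m] -= y[m].mean()            # Y_it - Ybar_i.
        X_c[m] -= X[m].mean(axis=0)      # X_it - Xbar_i.
    beta = np.linalg.lstsq(X_c, y_c, rcond=None)[0]
    alpha = np.array([y[unit == g].mean() - X[unit == g].mean(axis=0) @ beta for g in ids])
    return alpha, beta
```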

Comments
1. If a variable $X_{it}$ is constant over time within every cross-section, the FE model including this variable cannot be estimated. Why?
2. The fixed effects model can be rewritten with a common intercept included as
$$Y_{it} = X_{it}'\beta + \alpha + \mu_i + \varepsilon_{it}, \quad \text{with } \mu_1 + \mu_2 + \dots + \mu_N = 0.$$
Obviously, $\alpha_i = \alpha + \mu_i$, and $\alpha$ is the average of the fixed effects.

3. One can add time effects (or period effects) to the model:
$$Y_{it} = X_{it}'\beta + \alpha_i + \delta_t + \varepsilon_{it}.$$
The $\delta_t$ contain the omitted variables that are constant over cross-sections, at every time point $t$. The time effects capture the business cycle.

4. If we think that the cross-sectional units are an i.i.d. sample (typical for micro applications), but serial correlation or period heteroscedasticity is present (within each unit), then the OLS estimates can be made more precise (efficient) by GLS:
   (a) $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$ and all covariances between error terms are zero. We have period heteroscedasticity. GLS can be applied (period weights).
   (b) $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$, $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_{ts}$, all other covariances zero. We allow for serial correlation. GLS can be applied (period SUR weights).
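For comment 3, in a balanced panel both sets of effects can be removed at once by the two-way transformation $Y_{it} - \bar Y_{i.} - \bar Y_{.t} + \bar Y_{..}$, applied in the same way to each regressor. The sketch below assumes a balanced panel stored as $N \times T$ arrays. For unbalanced panels, this simple transformation does not in general reproduce the dummy-variable estimator. The function name is hypothetical.

```python
import numpy as np

def two_way_within(Y, Xs):
    """Two-way fixed effects slopes for a balanced panel.

    Y: N x T array; Xs: list of N x T arrays, one per regressor.
    Each array is transformed as a_it - a_i. - a_.t + a_.. before OLS.
    """
    def demean(A):
        A = np.asarray(A, dtype=float)
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

    y = demean(Y).ravel()
    X = np.column_stack([demean(A).ravel() for A in Xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]
```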

Example: Grunfeld data

We consider investment data for 10 American firms from 1935 to 1954, and the model
$$INV_{it} = \beta_1 VAL_{it} + \beta_2 CAP_{it} + \alpha_i + \varepsilon_{it},$$
for $1 \le i \le N = 10$ and $1 \le t \le T = 20$. The variables are
- gross investment of the firm (INV);
- value of the firm (VAL);
- real value of the capital stock (plant and equipment) (CAP).

The data are in the Excel file grunfeld2.xls.

1. Have a look at the data in the Excel file. Write down the number of observations, the number of variables, and the upper-left cell of the data matrix. Close the Excel file, create an unstructured workfile and read in the data (Proc/Import/Read Text-Lotus-Excel).
2. To apply a panel structure, double-click on the Range: line at the top of the workfile window, or select Proc/Structure/Resize Current Page. Select Dated Panel, and enter the appropriate variables as the Date series and as the Cross-section ID series.
3. Open the investment series. Explore the Descriptive Statistics and Tests menu.

4. Use View/Graph to
   (i) make a line plot of the time series for every cross-section;
   (ii) make boxplots of the distribution of investment over the different cross-sections and over time.
5. Use Quick/Estimate Equation to estimate the fixed effects model. Specify the equation inv c cap value and use Panel Options to indicate that you use fixed effects.
6. Interpret your outcome. Would it be useful to add period effects? Test whether this is necessary with View/Fixed/Random Effects Testing.
7. Select an appropriate weighting scheme within Panel Options. Interpret your outcome.

4 Random Effects model

Model:
$$Y_{it} = c + X_{it}'\beta + \varepsilon_{it},$$
where the error term is decomposed as $\varepsilon_{it} = \alpha_i + v_{it}$.
- $\alpha_i$ is a random effect, $N(0, \sigma_\alpha^2)$. It is the permanent component of the error term.
- $v_{it}$ is a noise term, $N(0, \sigma_v^2)$. It is the idiosyncratic component of the error term.

(The $v_{it}$ are uncorrelated among cross-sections and serially uncorrelated at all leads and lags, within and across cross-sections. The random effects are uncorrelated among cross-sections and uncorrelated with the $v_{it}$.)

At the price of one extra parameter $\sigma_\alpha^2$, the random effects model allows for correlation within cross-section units: for every $i$ and $t \neq s$,
$$\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \mathrm{Cov}(\alpha_i + v_{it}, \alpha_i + v_{is}) = \sigma_\alpha^2.$$
The following variance decomposition holds:
$$\mathrm{Var}(\varepsilon_{it}) = \mathrm{Var}(\alpha_i + v_{it}) = \sigma_\alpha^2 + \sigma_v^2.$$

Within groups (cross-sections) correlation:
$$\rho = \mathrm{Corr}(\varepsilon_{it}, \varepsilon_{is}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_v^2}, \quad t \neq s.$$
The larger the value of $\rho$, the more unobserved heterogeneity. One estimates $\beta$ by Generalized Least Squares, which gives the RE estimator. Different methods exist to make GLS feasible.
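To make the GLS step concrete, the sketch below assumes the variance components $\sigma_\alpha^2$ and $\sigma_v^2$ are known and the panel is balanced. GLS then reduces to OLS on quasi-demeaned data $Y_{it} - \theta\bar Y_{i.}$ with $\theta = 1 - \sqrt{\sigma_v^2/(\sigma_v^2 + T\sigma_\alpha^2)}$. Feasible versions plug in estimates of the two variances; which estimates EViews uses is not covered here. The function names are hypothetical.

```python
import numpy as np

def re_rho(s2_alpha, s2_v):
    """Within-group correlation rho = sigma_alpha^2 / (sigma_alpha^2 + sigma_v^2)."""
    return s2_alpha / (s2_alpha + s2_v)

def re_gls(Y, Xs, s2_alpha, s2_v):
    """RE-GLS for a balanced panel with known variance components.

    Y: N x T array; Xs: list of N x T regressors. Returns [c, beta_1, ..., beta_K].
    """
    Y = np.asarray(Y, dtype=float)
    N, T = Y.shape
    theta = 1.0 - np.sqrt(s2_v / (s2_v + T * s2_alpha))

    def quasi_demean(A):
        A = np.asarray(A, dtype=float)
        return (A - theta * A.mean(axis=1, keepdims=True)).ravel()

    # The constant column becomes 1 - theta after quasi-demeaning.
    X = np.column_stack([np.full(N * T, 1.0 - theta)] + [quasi_demean(A) for A in Xs])
    return np.linalg.lstsq(X, quasi_demean(Y), rcond=None)[0]
```

With $\sigma_\alpha^2 = 0$ we get $\theta = 0$ and pooled OLS. As $T\sigma_\alpha^2$ grows, $\theta$ approaches 1 and the estimator approaches the within estimator.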

Testing for correlated random effects: the random effect $\alpha_i$ needs to be uncorrelated with the X variables. This is a strong assumption. If it fails, there is an endogeneity problem, and the RE estimator is inconsistent.
$$H_0: \mathrm{Corr}(\alpha_i, X_{it}) = 0$$
The Hausman test compares two estimators: the FE estimator (always consistent) and the RE estimator (consistent under $H_0$). One rejects $H_0$ if the difference between the two estimators is large.
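The Hausman statistic is
$$H = (\hat\beta_{FE} - \hat\beta_{RE})'\,[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})]^{-1}\,(\hat\beta_{FE} - \hat\beta_{RE}),$$
which is approximately $\chi^2$ under $H_0$. The sketch below takes the estimates and covariance matrices as given, for example from the FE and RE outputs, and compares only the slope coefficients. The function name is hypothetical. The pseudo-inverse is a choice made here because the difference of covariance matrices need not be positive definite in finite samples.

```python
import numpy as np

def hausman(b_fe, V_fe, b_re, V_re):
    """Hausman statistic H = d' (V_FE - V_RE)^+ d with d = b_FE - b_RE.

    Pass only the slope coefficients that both estimators estimate.
    Returns H and the degrees of freedom (rank of V_FE - V_RE) for the chi-square reference.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    dV = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    H = float(d @ np.linalg.pinv(dV) @ d)
    return H, int(np.linalg.matrix_rank(dV))

H, df = hausman([1.0, 2.0], np.diag([1.0, 3.0]), [0.0, 0.0], np.diag([0.5, 1.0]))
```

In this made-up example $H = 4$ with 2 degrees of freedom. That is below the 5% critical value of $\chi^2(2)$, about 5.99, so $H_0$ would not be rejected.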

Using fixed or random effects?
- In econometrics, the fixed effects model seems to be the most appropriate ($H_0$ not needed).
- If $N$ is large, $T$ is small, and the cross-sectional units are a random sample from a population, then the random effects model becomes attractive: it is a parsimonious model that captures within-group correlation. (For $N$ large, FE requires the estimation of many parameters.)
- Random effects is popular for modeling grouped data, e.g.
  (i) a sample of 1000 children coming from 30 different schools;
  (ii) a sample of 1000 persons from 20 different villages, ...

Robust standard errors: for RE, no weighted versions are available. Using robust standard errors (or coefficient covariance) might be appropriate. This only affects the standard errors, not the estimators.
1. White cross-section: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$ and $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij}$. [Robust to cross-section heteroscedasticity and contemporaneous correlation among cross-sections; appropriate if $N \ll T$.]
2. White period: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_t^2$ and $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_{ts}$. [Robust to serial correlation within cross-sections and changing variances over time; appropriate if the cross-sections are a random sample and $T \ll N$.]
3. White diagonal: robust to $\mathrm{Var}(\varepsilon_{it}) = \sigma_{it}^2$. [Robust to all forms of heteroscedasticity, but not robust to any type of correlation over time or across cross-sections.]

These can also be used for FE.
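All three options have the sandwich form $(X'X)^{-1}\left[\sum_g X_g'u_g u_g'X_g\right](X'X)^{-1}$ and differ in how observations are grouped. The sketch below maps "White cross-section" to grouping by period and "White period" to grouping by cross-section. This mapping is an assumption chosen to match the robustness properties listed above. No small-sample correction is applied, so EViews' exact numbers may differ. The function name is hypothetical, and `u` are the residuals of whatever fit (FE or RE) is being assessed.

```python
import numpy as np

def sandwich_cov(X, u, clusters=None):
    """(X'X)^{-1} [sum_g X_g' u_g u_g' X_g] (X'X)^{-1}, without small-sample correction.

    clusters=None               -> each observation its own group ("White diagonal"-type)
    clusters=period ids         -> correlation across units within a period ("White cross-section"-type)
    clusters=cross-section ids  -> serial correlation within a unit ("White period"-type)
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    clusters = np.arange(len(u)) if clusters is None else np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ u[clusters == g]   # summed score X_g' u_g of group g
        meat += np.outer(s, s)
    return bread @ meat @ bread
```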

Exercise

Consider the Grunfeld data in grundfeld2.wf1. The model was
$$INV_{it} = \beta_1 VAL_{it} + \beta_2 CAP_{it} + \alpha_i + \varepsilon_{it}.$$
1. Estimate the model as a random effects model.
2. What is the within-group correlation?
3. Perform the Hausman test (View/Fixed/Random Effects Testing/Correlated Random Effects).
4. Compute different types of robust standard errors. How does this affect the results?