Panel Data Econometrics




Panel Data Econometrics
Master of Science in Economics - University of Geneva
Christophe Hurlin, Université d'Orléans
January 2010

Definition: A longitudinal, or panel, data set is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample (Hsiao, 2003, page 2).

Terminology and notation:
- Individual or cross-section unit: country, region, state, firm, consumer, individual, pair of individuals or countries (gravity models), etc.
- Double index: i (for the cross-section unit) and t (for time), so that an observation is written $y_{it}$ for $i = 1, \dots, N$ and $t = 1, \dots, T$.

A micro-panel data set is a panel in which the time dimension T is much smaller than the individual dimension N (example: the University of Michigan's Panel Study of Income Dynamics, PSID, with 15,000 individuals observed since 1968): $T \ll N$.
A macro-panel data set is a panel in which the time dimension T is similar to the individual dimension N (example: a panel of 100 countries with quarterly data since World War II): $T \simeq N$.

Definition: A panel is said to be balanced if we have the same time periods, $t = 1, \dots, T$, for each cross-section unit. For an unbalanced panel, the time dimension, denoted $T_i$, is specific to each individual.
Remark: While the mechanics of the unbalanced case are similar to those of the balanced case, a careful treatment of the unbalanced case requires a formal description of why the panel is unbalanced, and the resulting issues of sample selection and attrition can be somewhat subtle.
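The definition can be checked mechanically on a list of (i, t) observations. The sketch below is only an illustration: the function names `panel_periods` and `is_balanced` and the toy data are assumptions, not part of the lecture.

```python
from collections import defaultdict

def panel_periods(records):
    """Map each individual i to the sorted list of periods t observed for it."""
    periods = defaultdict(set)
    for i, t in records:
        periods[i].add(t)
    return {i: sorted(ts) for i, ts in periods.items()}

def is_balanced(records):
    """A panel is balanced when every individual is observed in the same periods."""
    observed = panel_periods(records)
    period_sets = {tuple(ts) for ts in observed.values()}
    return len(period_sets) <= 1

# Hypothetical toy panel: 3 individuals, 3 periods.
balanced = [(i, t) for i in ("A", "B", "C") for t in (1, 2, 3)]
# Individual B drops out in t = 3 (attrition), so T_B = 2.
unbalanced = [r for r in balanced if r != ("B", 3)]

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
print({i: len(ts) for i, ts in panel_periods(unbalanced).items()})  # {'A': 3, 'B': 2, 'C': 3}
```

The check says nothing about why individual B is missing, which is precisely the selection and attrition question raised in the remark.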

What are the main advantages of panel data sets and panel data models?

Panel data sets for economic research possess several major advantages over conventional cross-sectional or time-series data sets (see Hsiao, 2003).
Hsiao, C. (2003), Analysis of Panel Data, second edition, Cambridge University Press.
Wooldridge, J.M. (2001), Econometric Analysis of Cross Section and Panel Data, The MIT Press.

1. Panel data usually give the researcher a large number of data points (N × T), increasing the degrees of freedom and reducing the collinearity among explanatory variables, hence improving the efficiency of econometric estimates. => But this is partly an illusion: more data points do not necessarily imply more information (heterogeneity bias)!
2. More importantly, longitudinal data allow a researcher to analyze a number of important economic questions that cannot be addressed using cross-sectional or time-series data sets.

The oft-touted power of panel data derives from their theoretical ability to isolate the effects of specific actions, treatments, or more general policies.

Example (Ben-Porath (1973), in Hsiao (2003)): Suppose that a cross-sectional sample of married women is found to have an average yearly labor-force participation rate of 50%.
1) It might be interpreted as implying that each woman in a homogeneous population has a 50 percent chance of being in the labor force in any given year.
2) It might imply that 50 percent of the women in a heterogeneous population always work and 50 percent never work.
To discriminate between these two models, we need to use individual labor-force histories (the time dimension) to estimate the probability of participation in different subintervals of the life cycle.
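A small simulation can make the identification problem concrete. The sketch below is an illustration under assumed values (2,000 women, 10 years, a fixed seed); it is not taken from Ben-Porath or Hsiao. Both populations show the same cross-sectional participation rate, and only the individual histories tell them apart.

```python
import random

rng = random.Random(0)
N, T = 2000, 10  # hypothetical sample size and number of years

# Model 1: homogeneous population, each woman works with probability 0.5 each year.
homog = [[1 if rng.random() < 0.5 else 0 for _ in range(T)] for _ in range(N)]

# Model 2: heterogeneous population, half always work and half never work.
heter = [[1] * T if i < N // 2 else [0] * T for i in range(N)]

def cross_section_rate(panel, t):
    """Participation rate in a single year t (all that a cross-section observes)."""
    return sum(row[t] for row in panel) / len(panel)

def share_with_constant_status(panel):
    """Share of individuals whose status never changes over the T years."""
    return sum(1 for row in panel if len(set(row)) == 1) / len(panel)

# Same picture in any single year ...
print(cross_section_rate(homog, 0), cross_section_rate(heter, 0))
# ... but very different individual histories.
print(share_with_constant_status(homog), share_with_constant_status(heter))
```

Under model 1 almost nobody keeps the same status for ten years (the probability is 2 × 0.5^10, about 0.2%), whereas under model 2 everybody does.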

3. Panel data provide a means of reducing the magnitude of a key econometric problem that often arises in empirical studies, namely the often-heard assertion that the real reason one finds (or does not find) certain effects is the presence of omitted (mismeasured or unobserved) variables that are correlated with the explanatory variables. Panel data make it possible to control for omitted (unobserved or mismeasured) variables.

Example: Let us consider a simple regression model:
$$y_{it} = \alpha + \beta' x_{it} + \rho' z_{it} + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T$$
where
- $x_{it}$ and $z_{it}$ are $k_1 \times 1$ and $k_2 \times 1$ vectors of exogenous variables;
- $\alpha$ is a constant, and $\beta$ and $\rho$ are $k_1 \times 1$ and $k_2 \times 1$ vectors of parameters;
- $\varepsilon_{it}$ is i.i.d. over i and t, with $V(\varepsilon_{it}) = \sigma_\varepsilon^2$.
Let us assume that the $z_{it}$ variables are unobservable and correlated with $x_{it}$:
$$\mathrm{cov}(x_{it}, z_{it}) \neq 0$$

Example (II): It is well known that, in this case, the least-squares regression coefficients of $y_{it}$ on $x_{it}$ are biased. Let us assume that $z_{it} = z_i$ (the z values stay constant through time for a given individual but vary across individuals):
$$y_{it} = \alpha + \beta' x_{it} + \rho' z_i + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T$$
Then we can take the first difference of individual observations over time:
$$y_{it} - y_{i,t-1} = \beta' (x_{it} - x_{i,t-1}) + \varepsilon_{it} - \varepsilon_{i,t-1}$$
Both $\alpha$ and $\rho' z_i$ cancel, so least-squares regression on the differenced data now provides unbiased and consistent estimates of $\beta$.
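The effect of first differencing can be illustrated by simulation. The following is a minimal sketch under assumed values (a scalar x, a scalar z_i, β = 1, ρ = 2, Gaussian draws, N = 500, T = 5); none of these numbers come from the lecture, and `ols_slope` is a helper written for this illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single regressor x (intercept included)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
N, T = 500, 5
alpha, beta, rho = 0.5, 1.0, 2.0  # hypothetical parameter values

x, y = [], []
for i in range(N):
    z_i = rng.gauss(0.0, 1.0)                              # unobserved, constant over time
    x_i = [z_i + rng.gauss(0.0, 1.0) for _ in range(T)]    # x is correlated with z
    y_i = [alpha + beta * v + rho * z_i + rng.gauss(0.0, 1.0) for v in x_i]
    x.append(x_i)
    y.append(y_i)

# Pooled least squares of y on x, ignoring the omitted z_i.
b_pooled = ols_slope([v for row in x for v in row], [v for row in y for v in row])

# Least squares on first differences: alpha and rho * z_i drop out.
dx = [x[i][t] - x[i][t - 1] for i in range(N) for t in range(1, T)]
dy = [y[i][t] - y[i][t - 1] for i in range(N) for t in range(1, T)]
b_fd = ols_slope(dx, dy)

print(round(b_pooled, 2), round(b_fd, 2))
```

With these assumed values the pooled slope converges to β + ρ cov(x, z) / var(x) = 1 + 2 × 1/2 = 2, while the first-difference slope should be close to the true β = 1.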

Example (III): Let us assume that $z_{it} = z_t$ (the z values are common to all individuals but vary across time: common factors):
$$y_{it} = \alpha + \beta' x_{it} + \rho' z_t + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T$$
Then we can take deviations from the mean across individuals at a given time:
$$y_{it} - \bar{y}_t = \beta' (x_{it} - \bar{x}_t) + \varepsilon_{it} - \bar{\varepsilon}_t$$
where
$$\bar{y}_t = \frac{1}{N} \sum_{i=1}^{N} y_{it}, \qquad \bar{x}_t = \frac{1}{N} \sum_{i=1}^{N} x_{it}, \qquad \bar{\varepsilon}_t = \frac{1}{N} \sum_{i=1}^{N} \varepsilon_{it}$$
Least-squares regression now provides unbiased and consistent estimates of $\beta$.
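The intermediate step behind this transformation is not written out on the slide. A short derivation, added here as a sketch that uses only the symbols defined above, shows why the common factor disappears: averaging the model over individuals at a given date and subtracting the result from the original equation removes every term that does not vary with i.

```latex
\begin{align*}
\bar{y}_t &= \frac{1}{N}\sum_{i=1}^{N}\left(\alpha + \beta' x_{it} + \rho' z_t + \varepsilon_{it}\right)
           = \alpha + \beta' \bar{x}_t + \rho' z_t + \bar{\varepsilon}_t \\
y_{it} - \bar{y}_t &= \left(\alpha - \alpha\right) + \beta'\left(x_{it} - \bar{x}_t\right)
           + \rho'\left(z_t - z_t\right) + \left(\varepsilon_{it} - \bar{\varepsilon}_t\right) \\
           &= \beta'\left(x_{it} - \bar{x}_t\right) + \varepsilon_{it} - \bar{\varepsilon}_t
\end{align*}
```

The same logic, applied to averages over time rather than over individuals, removes an individual-specific term $z_i$.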

4. Panel data involve two dimensions: a cross-sectional dimension N and a time-series dimension T. We would expect the computation of panel data estimators to be more complicated than the analysis of cross-section data alone (where T = 1) or time-series data alone (where N = 1). However, in certain cases the availability of panel data can actually simplify computation and inference.

Example (time-series analysis of nonstationary data): Let us consider a simple AR(1) model:
$$x_t = \rho x_{t-1} + \varepsilon_t$$
where the innovation $\varepsilon_t$ is i.i.d. $(0, \sigma_\varepsilon^2)$. Under the nonstationarity assumption $\rho = 1$, it is well known that the asymptotic distribution of the OLS estimator $\hat{\rho}$ is given by:
$$T(\hat{\rho} - 1) \xrightarrow[T \to \infty]{d} \frac{\frac{1}{2}\left(W(1)^2 - 1\right)}{\int_0^1 W(r)^2 \, dr}$$
where $W(\cdot)$ denotes a standard Brownian motion.
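Because this limit is a functional of Brownian motion rather than a normal variable, its shape is usually explored by simulation. The sketch below is an illustration only: the sample length T = 200, the 2,000 replications, the Gaussian innovations and the name `df_statistic` are all assumptions.

```python
import random

def df_statistic(T, rng):
    """Simulate a driftless random walk of length T and return T * (rho_hat - 1).

    With rho = 1, rho_hat - 1 = sum(x_{t-1} * eps_t) / sum(x_{t-1}^2).
    """
    x_prev = 0.0
    num = 0.0  # running sum of x_{t-1} * eps_t
    den = 0.0  # running sum of x_{t-1}^2
    for _ in range(T):
        eps = rng.gauss(0.0, 1.0)
        num += x_prev * eps
        den += x_prev ** 2
        x_prev += eps
    return T * num / den

rng = random.Random(2)
T, reps = 200, 2000
draws = [df_statistic(T, rng) for _ in range(reps)]

mean_stat = sum(draws) / reps
share_negative = sum(1 for s in draws if s < 0) / reps
print(round(mean_stat, 2), round(share_negative, 2))
```

The simulated distribution is not centred on zero and is skewed to the left: the numerator of the limit is negative whenever W(1)² < 1, which happens with probability of about 0.68, so roughly two thirds of the draws should be negative.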

Hence, the behavior of the usual test statistics will often have to be inferred through computer simulations. But if panel data are available, and observations across cross-sectional units are independent, then one can invoke the central limit theorem across cross-sectional units to show that the limiting distributions of many estimators remain asymptotically normal and that the Wald-type test statistics are asymptotically chi-square distributed (e.g., Levin and Lin (1993); Im, Pesaran and Shin (1999); Phillips and Moon (1999, 2000); Quah (1994), etc.).

Let us consider the panel data model
$$x_{i,t} = \rho x_{i,t-1} + \varepsilon_{i,t}$$
where the innovation $\varepsilon_{i,t}$ is i.i.d. $(0, \sigma_\varepsilon^2)$ over i and t. Then, under $\rho = 1$:
$$T\sqrt{N}\,(\hat{\rho} - 1) \xrightarrow[N, T \to \infty]{d} N(0, 2)$$
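The normal limit can be checked roughly by simulation. The sketch below assumes N = 40 independent random walks of length T = 40, Gaussian innovations, 300 replications and a pooled least-squares estimator without intercept; these choices and the name `pooled_statistic` are illustrative assumptions, not the lecture's design.

```python
import random

def pooled_statistic(N, T, rng):
    """Return T * sqrt(N) * (rho_hat - 1) for a pooled AR(1) fit on N independent random walks."""
    num = 0.0  # sum over i and t of x_{i,t-1} * eps_{i,t}
    den = 0.0  # sum over i and t of x_{i,t-1}^2
    for _ in range(N):
        x_prev = 0.0
        for _ in range(T):
            eps = rng.gauss(0.0, 1.0)
            num += x_prev * eps
            den += x_prev ** 2
            x_prev += eps
    return T * (N ** 0.5) * num / den

rng = random.Random(3)
N, T, reps = 40, 40, 300
draws = [pooled_statistic(N, T, rng) for _ in range(reps)]

mean_stat = sum(draws) / reps
var_stat = sum((d - mean_stat) ** 2 for d in draws) / (reps - 1)
print(round(mean_stat, 2), round(var_stat, 2))
```

With finite N and T the match is only approximate: the sample variance should be in the neighbourhood of 2, and a small negative mean can remain because the bias of the pooled estimator vanishes only as N grows.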

There are three main issues in utilizing panel data:
1. Heterogeneity bias => Chapter 1
2. Dynamic panel data models (Nickell, 1981, bias) => Chapter 2
3. Selectivity bias (not specific to panel data models)

The heterogeneity bias. When important factors peculiar to a given individual are left out, the typical assumption that the economic variable y is generated by a parametric probability distribution function $P(Y \mid \theta)$, where $\theta$ is an m-dimensional real vector, identical for all individuals at all times, may not be a realistic one.

Ignoring the individual or time-specific effects that exist among cross-sectional or time-series units but are not captured by the included explanatory variables can lead to parameter heterogeneity in the model specification.

Example: Let us consider a simple production function (Cobb-Douglas) with two factors (labor and capital). We have N countries and T periods. Let us denote:
- $y_{i,t}$ the log of GDP for country i at time t;
- $n_{i,t}$ the log of employment for country i at time t;
- $k_{i,t}$ the log of the capital stock for country i at time t.
$$y_{i,t} = \alpha_i + \beta_i k_{i,t} + \gamma_i n_{i,t} + \varepsilon_{i,t}$$
with $\varepsilon_{i,t}$ i.i.d. $(0, \sigma_\varepsilon^2)$, $\forall i$, $\forall t$.

In this specification, the intercept $\alpha_i$ and the elasticities $\beta_i$ and $\gamma_i$ are specific to each country:
$$y_{i,t} = \alpha_i + \beta_i k_{i,t} + \gamma_i n_{i,t} + \varepsilon_{i,t}$$
with $\varepsilon_{i,t}$ i.i.d. $(0, \sigma_\varepsilon^2)$, $\forall i$, $\forall t$. Several alternative specifications can be considered. First, we can assume that the production function is the same for all countries; in this case we have a homogeneous specification:
$$y_{i,t} = \alpha + \beta k_{i,t} + \gamma n_{i,t} + \varepsilon_{i,t}, \qquad \alpha_i = \alpha, \quad \beta_i = \beta, \quad \gamma_i = \gamma$$

However, a homogeneous specification of the production function for macro aggregated data is meaningless. We can introduce heterogeneity in Total Factor Productivity: more precisely, we can assume that the mean of TFP (given by $E(\alpha_i + \varepsilon_{i,t}) = \alpha_i$) differs across countries (due to institutional or organisational factors, etc.). Then we can use a specification with individual effects $\alpha_i$ and common slope parameters (the elasticities $\beta$ and $\gamma$):
$$y_{i,t} = \alpha_i + \beta k_{i,t} + \gamma n_{i,t} + \varepsilon_{i,t}, \qquad \beta_i = \beta, \quad \gamma_i = \gamma$$

Finally, we can assume that the labor and/or capital elasticities differ across countries. In this case, we have a heterogeneous specification of the panel data model (heterogeneous panel):
$$y_{i,t} = \alpha_i + \beta_i k_{i,t} + \gamma_i n_{i,t} + \varepsilon_{i,t}$$

In the heterogeneous case, there are two solutions:
1) The first solution consists in estimating N time-series models and averaging them to produce group-mean estimates of the elasticities.
2) The second solution consists in using a model with random (slope) parameters => random coefficient model. In this case, we assume that the parameters $\beta_i$ and $\gamma_i$ are randomly distributed but follow the same distribution across countries:
$$\beta_i \ \text{i.i.d.}\ (\beta, \sigma_\beta^2), \qquad \gamma_i \ \text{i.i.d.}\ (\gamma, \sigma_\gamma^2)$$
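The first solution can be sketched in a few lines. To keep the example short, the simulation below assumes a single regressor k (the labor term is dropped), slopes drawn as β_i i.i.d. (0.3, 0.1²), and N = 200 countries observed over T = 30 periods; all of these values and the helper `ols_slope` are hypothetical. The average of the N individual estimates is often called the mean group estimator.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single regressor x (intercept included)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)
N, T = 200, 30
beta_mean, beta_sd = 0.3, 0.1  # hypothetical mean and dispersion of the slopes

slopes = []
for i in range(N):
    alpha_i = rng.gauss(0.0, 1.0)            # country-specific intercept
    beta_i = rng.gauss(beta_mean, beta_sd)   # country-specific elasticity
    k = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y = [alpha_i + beta_i * kv + rng.gauss(0.0, 0.5) for kv in k]
    slopes.append(ols_slope(k, y))           # one time-series regression per country

# Group-mean estimate: simple average of the N individual slope estimates.
beta_mg = sum(slopes) / N
# Its standard error, computed from the cross-country dispersion of the estimates.
se_mg = (sum((b - beta_mg) ** 2 for b in slopes) / (N * (N - 1))) ** 0.5

print(round(beta_mg, 3), round(se_mg, 3))
```

The estimate recovers the mean of the slope distribution, not any single country's elasticity; the dispersion of the individual estimates reflects both true heterogeneity and time-series estimation error.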

Fact: Ignoring such heterogeneity (in the slopes and/or the constant) could lead to inconsistent or meaningless estimates of the parameters of interest.

Let us consider as the DGP a simple linear model with individual effects and only one explanatory variable $x_{it}$ (common slope):
$$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$$
Let us assume that all NT observations $\{x_{it}, y_{it}\}$ are used to estimate the homogeneous model:
$$y_{it} = \alpha + \beta x_{it} + \varepsilon_{it}$$

[Figures from Hsiao (2003). Legend: broken ellipses = point scatter for an individual over time; broken straight lines = individual regressions; solid lines = least-squares regression using all NT observations.]

All of these figures depict situations in which biases (in $\hat{\beta}$) arise in pooled least-squares estimates because of heterogeneous intercepts. Obviously, in these cases, a pooled regression that ignores heterogeneous intercepts should never be used. Moreover, the direction of the bias of the pooled slope estimate cannot be identified a priori; it can go either way.
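That the bias can go either way is easy to reproduce numerically. In the sketch below, the intercepts are assumed to be proportional to the individual mean of x, α_i = c · m_i, with c = +2 or c = -2, and the true slope is β = 1; these values, the function `pooled_and_within`, and the use of deviations from individual means as the comparison estimator are assumptions of the illustration, not something stated in this passage.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single regressor x (intercept included)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pooled_and_within(c, N=200, T=10, beta=1.0, seed=5):
    """Return (pooled slope, within slope) when alpha_i = c * m_i (hypothetical design)."""
    rng = random.Random(seed)
    xs, ys, xw, yw = [], [], [], []
    for _ in range(N):
        m_i = rng.gauss(0.0, 2.0)                 # individual mean of x
        alpha_i = c * m_i                         # intercept correlated with x
        x_i = [m_i + rng.gauss(0.0, 1.0) for _ in range(T)]
        y_i = [alpha_i + beta * v + rng.gauss(0.0, 1.0) for v in x_i]
        xbar, ybar = sum(x_i) / T, sum(y_i) / T
        xs += x_i
        ys += y_i
        xw += [v - xbar for v in x_i]             # deviations from individual means
        yw += [v - ybar for v in y_i]
    return ols_slope(xs, ys), ols_slope(xw, yw)

upward = pooled_and_within(c=2.0)     # intercepts rise with x: pooled slope too large
downward = pooled_and_within(c=-2.0)  # intercepts fall with x: pooled slope too small
print([round(v, 2) for v in upward], [round(v, 2) for v in downward])
```

Under these assumed values the pooled slope converges to 1 + c × 4/5, that is 2.6 or -0.6, so the pooled regression can even reverse the sign of the relation, while the regression on deviations from individual means stays close to 1 in both cases.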

Another example: the true DGP is heterogeneous,
$$y_{it} = \alpha_i + \beta_i x_{it} + \varepsilon_{it}$$
and we use all NT observations $\{x_{it}, y_{it}\}$ to estimate the homogeneous model:
$$y_{it} = \alpha + \beta x_{it} + \varepsilon_{it}$$

In this case, it is clear that pooling all NT observations, assuming identical parameters for all cross-sectional units, would lead to nonsensical results because the pooled estimate would represent an average of coefficients that differ greatly across individuals (the illusion of the NT observations).

In this case, pooling gives rise to the false inference that the pooled relation is curvilinear.
Fact: In both cases, the classic paradigm of the representative agent simply does not hold, and pooling the data under the homogeneity assumption makes no sense.

The lecture is organized as follows:
1. General introduction to panel data econometrics
2. Chapter 1. Heterogeneity and the linear model: from the unobserved effect model to heterogeneous panel data models
3. Chapter 2. Dynamic panel data models

References
Articles (Chapter 2):
Baltagi, B.H. and Kao, C. (2000), Nonstationary panels, cointegration in panels and dynamic panels: a survey, in Advances in Econometrics, 15, edited by B. Baltagi and C. Kao, pp. 7-51, Elsevier Science.
Hurlin, C. and Mignon, V. (2005), Une synthèse des tests de racine unitaire sur données de panel, Economie et Prévision, 169-170-171, pp. 253-294.
http://www.univ-orleans.fr/deg/masters/esa/ch/churlin_e.htm