Econometrics. Week 5. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Similar documents
Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

Panel Data Econometrics

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS

Panel Data: Linear Models

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

Clustering in the Linear Model

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

Lecture 15. Endogeneity & Instrumental Variable Estimation

SYSTEMS OF REGRESSION EQUATIONS

Redistributional impact of the National Health Insurance System

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

2. Linear regression with multiple regressors

Marketing Mix Modelling and Big Data P. M Cain

Mgmt 469. Fixed Effects Models. Suppose you want to learn the effect of price on the demand for back massages. You

Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods October 7, Professor Patricia A.

Mgmt 469. Model Specification: Choosing the Right Variables for the Right Hand Side

Example: Boats and Manatees

Identifying Non-linearities In Fixed Effects Models

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?*

An Introduction to Regression Analysis

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

Correlated Random Effects Panel Data Models

Financial Risk Management Exam Sample Questions/Answers

Solución del Examen Tipo: 1

Lecture 3: Differences-in-Differences

Econometrics Simple Linear Regression

Employer-Provided Health Insurance and Labor Supply of Married Women

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

ECON 523 Applied Econometrics I /Masters Level American University, Spring Description of the course

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week (0.052)

An Investigation of the Statistical Modelling Approaches for MelC

Introduction to Regression and Data Analysis

Intuit Small Business Employment Index. White Paper

Premaster Statistics Tutorial 4 Full solutions

On Marginal Effects in Semiparametric Censored Regression Models

FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Introduction to Fixed Effects Methods

Examining the effects of exchange rates on Australian domestic tourism demand: A panel generalized least squares approach

The Effects of Unemployment on Crime Rates in the U.S.

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Is the Forward Exchange Rate a Useful Indicator of the Future Exchange Rate?

Mortgage Lending Discrimination and Racial Differences in Loan Default

Are Chinese Growth and Inflation Too Smooth? Evidence from Engel Curves

Chapter 7: Dummy variable regression

Chapter 9: Univariate Time Series Analysis

Module 5: Multiple Regression Analysis

Note 2 to Computer class: Standard mis-specification tests

17. SIMPLE LINEAR REGRESSION II

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

The Effect of Housing on Portfolio Choice. July 2009

Module 3: Correlation and Covariance

Labor Economics, Lecture 3: Education, Selection, and Signaling

Topic 1 - Introduction to Labour Economics. Professor H.J. Schuetze Economics 370. What is Labour Economics?

Introduction to Longitudinal Data Analysis

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Chapter 4: Vector Autoregressive Models

Premium Copayments and the Trade-off between Wages and Employer-Provided Health Insurance. February 2011

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Cost implications of no-fault automobile insurance. By: Joseph E. Johnson, George B. Flanigan, and Daniel T. Winkler

The Life-Cycle Motive and Money Demand: Further Evidence. Abstract

A Basic Introduction to Missing Data

Centre for Central Banking Studies

Econometric analysis of the Belgian car market

FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits

COURSES: 1. Short Course in Econometrics for the Practitioner (P000500) 2. Short Course in Econometric Analysis of Cointegration (P000537)

16 : Demand Forecasting

Integrated Resource Plan

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

Review Jeopardy. Blue vs. Orange. Review Jeopardy

1 Short Introduction to Time Series

LOGIT AND PROBIT ANALYSIS

Poor identification and estimation problems in panel data models with random effects and autocorrelated errors

Binomial Sampling and the Binomial Distribution

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Part 2: Analysis of Relationship Between Two Variables

The Wage Return to Education: What Hides Behind the Least Squares Bias?

Who Pays the Social Security Tax? *

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

The Effect of R&D Expenditures on Stock Returns, Price and Volatility

Not Your Dad s Magic Eight Ball

Multilevel Models for Longitudinal Data. Fiona Steele

EFFECT OF INVENTORY MANAGEMENT EFFICIENCY ON PROFITABILITY: CURRENT EVIDENCE FROM THE U.S. MANUFACTURING INDUSTRY

Chapter 3: The Multiple Linear Regression Model

Hypothesis Testing for Beginners

Multivariate Analysis of Health Survey Data

Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Fixed Effects Bias in Panel Data Estimators

Regression Analysis of the Relationship between Income and Work Hours

UNIVERSITY OF WAIKATO. Hamilton New Zealand

Do broker/analyst conflicts matter? Detecting evidence from internet trading platforms

MODELS FOR PANEL DATA Q

Regression III: Advanced Methods

Transcription:

Econometrics Week 5 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 21

Recommended Reading For the today Pooling cross sections across time. Simple panel data methods. Chapter 13 (pp. 408 434). For the next week Advanced Panel Data Methods. Chapter 14 (pp. 441 460). 2 / 21

Panel Data vs. Pooled Cross Sections Up until now, we covered multiple regression analysis using: Pure cross-sectional data each observation represents an individual, firm, etc. Pure time series data each observation represents a separate period. In an economic applications, we often have data which have both these dimensions we may have cross sections for different time periods. We will talk about 2 types of pooled data: Independently pooled cross sections Panel data (sometimes called longitudinal data) Panel data are not independently distributed across time as pooled cross sections! We will cover basics to introduce the methods. 3 / 21

Panel Data vs. Pooled Cross Sections Pooled Cross Sections Population surveys - each period, Statistical Bureau independently samples the population At each period, the sample is different. these are independent cross-sections, but we can take time into account. Panel Data Each year, the European Community Household Panel surveys the same individuals on a range of questions: income, health, education, employment, etc. These are cross-sections with time order. We have observations for each individual with temporal ordering. Sample does not change. 4 / 21

What is the basic intuition? Time Series vs. Panel Data y it = β 0 + β 1 x it1 + u it u it N(0, Σ) t = 1, 2,..., n i = 1, 2,..., N cross-sections But let as introduce the intuition step by step first... 5 / 21

Pooling Independent Cross Sections Across Time If a random sample is drawn at each time period, resulting data are independently pooled cross sections. Reasons for pooling cross sections: To increase sample size more precise estimators. To investigate the effect of time (simply add dummy variable) We may assume that parameters remain constant wages it = β 0 + β 1 educ it + u it Alternatively, we may want to investigate the effect of time wages it = β 0t + β 1 educ it + u it Or, we may want to investigate whether relations change in time wages it = β 0 + β 1t educ it + u it 6 / 21

Pooling Independent Cross Sections Across Time cont. Let s find out if an hourly wage pooled across the years 1978 and 1985 was dependent on an education and gender gap. Example: Changes in the Return to Education log(wage) = β 0 + δ 0 y85 + β 1 educ + δ 1 y85educ + β 2 female + u y85 is a dummy equal to 1 if observation is from 1985 and zero if it comes from 1978 (the number of observations is different + different people in the sample). Thus the intercept for 1978 is β 0 and intercept for 1985 is β 0 + δ 0. The return to education in 1978 is β 1 and the return to education in 1985 is β 1 + δ 1. δ 1 measures the 7-year change. We can test the null hypothesis that nothing has changed over the period H 0 : δ 1 = 0 to alternative that the effect has been reduced, H 0 : δ 1 > 0. 7 / 21

The Chow Test for Structural Change We may want to determine if regression function differs across 2 groups 8 6 y 4 2 0 1 2 3 4 5 6 7 y = α 1 + β 1 x + u 1 vs. y = α 2 + β 2 x + u 2 Clearly, α 1 α 2 and β 1 β 2 Resulting y = α 0 + β 0 x + u 0 in the worst fit. 8 / 21

The Chow Test for Structural Change cont. We may want to test if there are 2 (or more) periods in a regression: y it = β 0t + β 1t x it + u it It may not be as obvious as from the example on previous slide. H 0 : β 01 = β 02, β 11 = β 12. Compute simple F test. Alternatively, use pooled cross-sections as in previous example with wages. 9 / 21

Policy Analysis with Pooled Cross Sections Example: What effect will building a garbage incinerator (in Czech spalovna ) have on housing prices? (Wooldridge book, Ex.13.3.) Consider only simple one-period case of 1981 data: prices = β 0 + β 1 near + u where near is dummy variable (1 for houses near incinerator, 0 otherwise) The hypothesis is that prices of houses near the incinerator fall when it is built: H 0 : β 1 = 0 vs. H A : β 1 < 0 Unfortunately, this test does not really imply that building incinerator is causing the lower prices. ˆβ 1 is inconsistent as cov(near, u) 0. 10 / 21

Policy Analysis with Pooled Cross Sections cont. Let s include also another year into the analysis so we can really study the impact. t = {1978, 1981} now: prices it = β 0 + β 1 near it + β 3 Dit 1981 + β 4 near it Dit 1981 The hypothesis is the same, that location of incinerator will decrease the prices, but: H 0 : β 4 = 0 vs. H A : β 4 < 0 ˆβ 4 is called difference-in-differences estimator. u it 11 / 21

Policy Analysis with Pooled Cross Sections cont. Let s look at the logics of the difference-in-differences estimator. E[prices near, 1981] = β 0 + β 1 + β 3 + β 4 E[prices near, 1978] = β 0 + β 1 The difference: E[ prices near ] = E[prices 1981 prices 1987 near] = β 3 + β 4 E[prices far, 1981] = β 0 + β 3 E[prices far, 1978] = β 0 The difference: E[ prices far ] = E[prices 1981 prices 1987 far] = β 3 The difference: E[ prices near prices far ] = β 4 β 4 reflects the policy effect if the incinerator is built with no different inflation. 12 / 21

Two-period Panel Data For a cross-section of individuals, schools, firms, cities, etc., we have two time periods of data. Data are not independent as in pooled cross-sections, so they can be more helpful. It can be used to address some kinds of omitted variable bias. Fixed Effects Model (Unobserved Effects Model) y it = β 0 + βx it + a i + u it a i is time-invariant, individual specific, unobserved effect on the level of y it. a i is reffered to as fixed effect fixed over time. a i is reffered to as unobserved heterogeneity, or individual heterogeneity. u it is idiosyncratic error. 13 / 21

Two-period Panel Data cont. We can rewrite the model as: y it = β 0 + βx it + ν it, where ν it = a i + u it ν it is also called composite error. We can simply estimate this model by pooled OLS. But it will be biased and inconsistent if a i and x it are correlated: cov(a i, x it ) 0 heterogeneity bias. In real-world applications, the main reason for collecting the panel data is to allow for the unobserved effect a i to be corralated with explanatory variables. I.e. we want to explain crime, and allow unmeasured city factors a i affecting the crime to be correlated with i.e. unemployment rate. Simple solution follows... 14 / 21

Two-period Panel Data cont. This is simple to solve as a i is constant over time. Solution: first - differenced estimator Let s write y i2 = β 0 + βx i2 + a i + u i2, (t = 2) y i1 = β 0 + βx i1 + a i + u i1, (t = 1) Subtracting second equation from the first one gives: y i = β x i + u i Here, a i is differenced away. And we have standard cross-sectional equation. If cov( u i, x i ) = 0, β F ˆ D is consistent (strict exogeneity). 15 / 21

Two-period Panel Data cont. Differencing two years is powerful way to control unobserved effects. If we use standard cross-sections instead, it may suffer from omitted variables. But, it is often very difficult to collect panel data (i.e. for individuals), as we have to have the same sample. Moreover, a i can greatly reduce the variation in the explanatory variables. Still, this is solution only when we have 2 data periods (more general next lecture). Organization of Panel Data is crucial (more during the seminars). 16 / 21

Differencing with More than Two Periods We can extend FD to more than two periods. We simply difference adjacent periods A general fixed effects model for N individuals and T=3. y it = δ 1 + δ 2 d2 t + δ 3 d 3 + β 1 x it1 +... + β k x itk + a i + u it, The total number of observations is 3N. The key assumption is that idiosyncratic errors are uncorrelated with explanatory variables: cov(x itj, u is ) = 0 for all t, s and j strict exogeneity. How to estimate? Simply difference equation for t = 1 from t = 2 and t = 2 from t = 3. It will result in 2 equations which can be estimated by pooled OLS consistently under the CLM assumptions. We can simply further extend to T periods. Correlation and heteroskedasticity are treated in the same way as in time series data. 17 / 21

Assumptions for Pooled OLS Using First Differences Assumptions revisited Assumption FD1 For each i, the model is y it = β 1 x it1 +... + β k x itk + a i + u it, t = 1,..., T, where parameters β j are to be estimated and a i is the unobserved effect. Assumption FD2 We have a random sample from the cross section. Assumption FD3 Let X i denote x itj, t = 1,..., T, j = 1,..., k. For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(u it X i, a i ) = 0., 18 / 21

Assumptions for Pooled OLS Using First Differences cont. An important implication of Ass. FD3 is that E( u it X i ) = 0, t = 2,..., T. Once we control for a i, there is no correlation between the x isj and remaining error u it for all s and t. x itj is strictly exogenous conditional on the unobserved effect. Assumption FD4: No Perfect Collinearity Each explanatory variable changes over time (for at least some i), and no perfect linear relationship exist among the explanatory variables. Assumption FD5: Homoskedastictity The variance of the differenced error, conditional on all explanatory variables, is constant: V ar( u it X i ) = σ 2, for all t = 2,..., T. 19 / 21

Assumptions for Pooled OLS Using First Differences cont. Assumption FD6: No Serial Correlation For all t s, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): Cov( u it, u is X i ) = 0, t s. Under the Ass. FD1 FD4, the first-difference estimators are unbiased. Under the Ass. FD1 FD6, the first-difference estimators are BLUE. Assumption FD7: Normality Conditional on X i, the u it are independent and identically distributed normal random variables. Last assumptions assures us that FD estimator is normally distributed, t and F statistics from the pooled OLS on the differenced data have exact t and F distributions. 20 / 21

Thank you Thank you very much for your attention! 21 / 21