Comments on The Effect of College Curriculum on Earnings: Accounting for Non-Ignorable Non-Response Bias by Daniel Hamermesh and Stephen Donald

Similar documents
Race and Gender Differences in College Major Choice

LOGIT AND PROBIT ANALYSIS

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

Module 3: Correlation and Covariance

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS

Does It Pay to Attend an Elite Private College? Evidence on the Effects of Undergraduate College Quality on Graduate School Attendance

Lecture 15. Endogeneity & Instrumental Variable Estimation

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

6/15/2005 7:54 PM. Affirmative Action s Affirmative Actions: A Reply to Sander

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Models for Longitudinal and Clustered Data

Chapter 3 Local Marketing in Practice

ECON 523 Applied Econometrics I /Masters Level American University, Spring Description of the course

Total Credits: 32 credits are required for master s program graduates and 53 credits for undergraduate program.

MULTIPLE REGRESSION WITH CATEGORICAL DATA

Solución del Examen Tipo: 1

Fairfield Public Schools

Lecture 3: Differences-in-Differences

The earnings and employment returns to A levels. A report to the Department for Education

11. Analysis of Case-control Studies Logistic Regression

2. THE ECONOMIC BENEFITS OF EDUCATION

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Earnings in private jobs after participation to post-doctoral programs : an assessment using a treatment effect model. Isabelle Recotillet

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Research Brief. Can Applying to More Colleges Increase Enrollment Rates? College Board Advocacy & Policy Center. October 2011

Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

What Role Do Peer Effects Play in Predicting College Math Courses Taking andSTEM persistence?

UNDERSTANDING THE TWO-WAY ANOVA

Econometrics Simple Linear Regression

The Effects of Focused Area of Study on Self-Reported Skill Gains and Satisfaction of MBA Graduates

Is Economics a Good Major for Future Lawyers? Evidence from Earnings Data

Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

Changing Distributions: How Online College Classes Alter Student and Professor Performance

Internal Quality Assurance Arrangements

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Why Do Students Withdraw From Courses?

A Static Version of The Macroeconomics of Child Labor Regulation

STATISTICAL ANALYSIS OF UBC FACULTY SALARIES: INVESTIGATION OF

Markups and Firm-Level Export Status: Appendix

When You Are Born Matters: The Impact of Date of Birth on Child Cognitive Outcomes in England

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW. Elizabeth Comino Centre fo Primary Health Care and Equity 12-Aug-2015

Unraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets

Submitted December Accepted May Elaine Worzala. College of Charleston. Charles C. Tu. University of San Diego

Public and Private Sector Earnings - March 2014

Multilevel Models for Longitudinal Data. Fiona Steele

Chapter 9 Assessing Studies Based on Multiple Regression

ON THE ROBUSTNESS OF FIXED EFFECTS AND RELATED ESTIMATORS IN CORRELATED RANDOM COEFFICIENT PANEL DATA MODELS

Expansion of Higher Education, Employment and Wages: Evidence from the Russian Transition

Lectures, 2 ECONOMIES OF SCALE

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

EXPLORING THE RELATIONSHIP BETWEEN LEARNING STYLE AND CRITICAL THINKING IN AN ONLINE COURSE. Simone Conceição. Abstract

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis

Linear and quadratic Taylor polynomials for functions of several variables.

Breakthrough White Paper: Four Year Colleges vs. Community Colleges

Do Supplemental Online Recorded Lectures Help Students Learn Microeconomics?*

USES OF CONSUMER PRICE INDICES

IBM SPSS Direct Marketing 23

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

Marketing Mix Modelling and Big Data P. M Cain

The Effect of Housing on Portfolio Choice. July 2009

2. Linear regression with multiple regressors

Multinomial and Ordinal Logistic Regression

The primary goal of this thesis was to understand how the spatial dependence of

IBM SPSS Direct Marketing 22

AN INTRODUCTION TO MATCHING METHODS FOR CAUSAL INFERENCE

The Binomial Distribution

Determining Future Success of College Students

HOW EFFECTIVE IS TARGETED ADVERTISING?

Topic 1 - Introduction to Labour Economics. Professor H.J. Schuetze Economics 370. What is Labour Economics?

Intermediate Macroeconomics: The Real Business Cycle Model

CAPM, Arbitrage, and Linear Factor Models

The Probit Link Function in Generalized Linear Models for Data Mining Applications

MOBILE MARITIME LEADERSHIP SCHOLARSHIP INSTRUCTIONS TO APPLICANTS

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

Determining Future Success of College Students

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

Introduction to Regression and Data Analysis

Validity, Fairness, and Testing

Relative income concerns and the rise in married women s employment

How To Study The Academic Performance Of An Mba

Simultaneous or Sequential? Search Strategies in the U.S. Auto. Insurance Industry

Cost implications of no-fault automobile insurance. By: Joseph E. Johnson, George B. Flanigan, and Daniel T. Winkler

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Who Goes to Graduate School in Taiwan? Evidence from the 2005 College Graduate Survey and Follow- Up Surveys in 2006 and 2008

Marginal Treatment Effects and the External Validity of the Oregon Health Insurance Experiment

Statistical tests for SPSS

Standard errors of marginal effects in the heteroskedastic probit model

Mortgage Loan Approvals and Government Intervention Policy


Elementary Statistics

NOTES ON HLM TERMINOLOGY

The Opportunity Cost of Study Abroad Programs: An Economics-Based Analysis

Predicting the Performance of a First Year Graduate Student

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Transcription:

Comments on The Effect of College Curriculum on Earnings: Accounting for Non-Ignorable Non-Response Bias by Daniel Hamermesh and Stephen Donald Manuel Arellano Quinta do Lago, May 21, 2005 1. Summary This paper looks at the relationship between college major and earnings using a new dataset, which contains, among other virtues, a richer set of ability-related controls than other papers in the literature. Earlier studies have found large effects of college major on earnings (e.g. an economics major would earn 48 percent more than a philosophy & theology major, but 21 percent less than a chemical engineering major, according to Black, Sanders, and Taylor, 2003). The question this paper helps to answer is whether the observed differentials in returns to college by major are actual monetary premia or spurious reflections of differing abilities of individuals choosing different majors. The authors collect earnings survey data from the population of graduates of their own university. By matching this information with University administrative records, they get detailed measures on college careers, and pre-college background and ability.

Regrettably the response rate was only 25 percent. Thus, although the administrative data are available for all graduates, the earnings data are only available for one in four individuals in the target sample. The problem is that respondents may be an endogenously self-selected group. Another selection problem is that earnings are only observed for employed respondents. To address these problems, the authors consider an econometric model with two selection rules and exclusion restrictions in the earnings equation. Both parametric and semiparametric versions are considered. The determinants of employment excluded from the wage equation are indicators of the presence of young children. The determinant of response excluded from the wage equation is an indicator of membership of the University s alumni association. The idea of using such a kind of affinity measures for adjusting for non-response bias is another nice contribution of the paper that may have wider applicability. The main conclusion is that about half of the earnings differences across majors are accounted for by differences in endowments and post-college activities. The qualitative effect of accounting for non-response on the results turns out to be small. 2

2. Comments on data and variable construction The first issue raised by this kind of data is the extent of its specificity. (a) How specific is the Texas-Austin population of graduates? Some majors in Austin may enjoy a particularly high (low) consideration, leading to abnormally high (low) returns. (b) The distribution of SAT scores for Austin graduates may be more compressed than the distribution from a mix of colleges (given the prevalence of college sorting in the US). This might explain the small effect of SAT results on earnings. (c) Related to this: why not separate math scores from verbal scores? (d) Could graduates from the years 1994-95 and 1999-2000 be affected by the top 10 percent rule in Texas (top 10 percent students of their high school classes are guaranteed admission to leading campuses). Other aspects that merit discussion are the classification by major and major switches. (a) Why not separate natural sciences between life sciences and math-physics? (b) Why not separate economics (and psychology perhaps) from the other social sciences (Turner and Bowen found them to be very different). (c) Do the authors know who switched majors? I understand transferring majors is not uncommon. Is there are switched major effect? 3

3. Does the exclusion of a dummy variable help identification? The paper emphasizes the role of exclusion restrictions for identification in a selection model, and suggests one restriction in the context of survey non-response. A nice aspect of the results is the marked contrast between estimates that rely exclusively on the nonlinearity and those that use exclusion restrictions. The point I discuss here is that with an excluded dummy variable (like membership of the alumni association), functional form assumptions play a more fundamental role in securing identification than in the case of an excluded continuous variable. The object of interest is E (Y X, Z) =g 0 (X), but we observe E (Y X, Z, D =1) = µ (X, Z) =g 0 (X)+λ 0 [p (X, Z)] E (D X, Z) =p (X, Z) The question is whether g 0 (.) and λ 0 (.) are identified from knowledge of µ (X, Z) and p (X, Z). 4

Consider first the case of continuous X and Z. If there is another solution (g, λ ) then g 0 (X) g (X)+λ 0 (p) λ (p) =0 Differentiating (λ 0 λ ) p p Z =0 (g 0 g ) + (λ 0 λ ) p X p X =0 Under the assumption that p/ Z 6= 0,wehave (λ 0 λ ) (g 0 g ) =0, =0 p X so that λ 0 λ and g 0 g are constant (i.e. g 0 is identified up to an unknown constant). This is the identification result of Das, Newey, and Vella (2003). Suppose X is continuous but Z is a dummy variable. In general g 0 (X) is not identified. To see this, consider µ (X, 1) = g 0 (X) + λ 0 [p (X, 1)] µ (X, 0) = g 0 (X)+λ 0 [p (X, 0)] so that we can identify the difference µ (X) =λ 0 [p (X, 1)] λ 0 [p (X, 0)] but this does not suffice to determine λ 0 up to a constant. 5

Take as an example a simple logit or probit model of the form p (X, Z) =F (βx + γz), so that, letting h 0 (.) =λ 0 [F (.)], µ (X) =h 0 (βx + γ) h 0 (βx). Any other solution h should satisfy h 0 (βx + γ) h 0 (βx) =h (βx + γ) h (βx), which holds for a multiplicity of periodic functions. If X is also discrete, there is clearly lack of identification. For example, suppose X and Z are dummy variables: µ (1,j)=g 0 (1) + λ 0 [p (1,j)] µ (0,j)=g 0 (0) + λ 0 [p (0,j)] (j =0, 1) Since λ 0 (.) is unknown g 0 (1) g 0 (0) is not identified. Only λ 0 [p (1, 1)] λ 0 [p (1, 0)] and λ 0 [p (0, 1)] λ 0 [p (0, 0)] are identified. The conclusion is that excluding a binary variable does not guarantee identification by itself, although it may imply tighter bounds, and more credible identification in conjunction with functional form assumptions. Perhaps the idea of using affinity measures can be pursued to using donations from former graduates, or length of membership of the alumni association, which would be variables with a larger support. 6

4. Discussion of results and extensions (Observed) heterogeneity in the effects of major (a) Within each major, a substantial fraction of college graduates pursue advanced degrees. Are there interactions with returns to majors? Others have found interactions of this sort with fewer controls. (b) Black et al. found male-female differences in premia. Do these differences remain after controlling for the background measures available in the present dataset? Issues relating to the affinity measure and its effects (a) Why not showing effects of affinity on the probability of response, to see how good the instrument is given controls? (b) I missed an analysis of determinants of membership of the alumni association. Presumably membership varies with year of class. (c) It is suspicious that the estimates imposing ρ =0are so similar to unrestricted estimates. Is there evidence of lack of identification? Estimated bρ is not shown. (d) An interesting conclusion is the small effect of lack of adjustment for non-response bias. Subject to the exclusion restriction, a test of H 0 : E (Y X, Z, D =1)=E (Y X, D =1) is a test of self-selection, whereas subject to ruling out self-selection it is a test of the validity of the exclusion restriction. Either way it seems a useful diagnostic. 7

Possible extensions Distributional effects of major choice (e.g. effects on the variance of earnings). (a) From Table 2 we see large differences in unconditional standard deviations of earnings by major, but how much of this is accounted by controls we do not know. (b) The issue is of interest because there could be a mean-variance trade-off in the earning consequences of different major choices. Endogeneity of major choice (a) No attempt is made of accounting for endogeneity or unobserved heterogeneity in major choice. (b) I share the authors pessimism on the possibilities of achieving identification of this kind using data on graduates from a single university. (c) Such an exercise would require multi-college data to be able to exploit variation in proximity measures, tuition fees, etc. 8

5. Heterogeneity of treatments vs. heterogeneity of gains from treatment The results in this paper have implications for the interpretation of the heterogeneity in returns to college found in the literature. The nice aspect of the sample used is its high homogeneity and the wealth of background controls available. Yet the authors find significant differences in returns to different majors. Suppose for the sake of the argument that there are two majors with indicators (D A,D B ) and potential outcomes (Y 0,Y 1A,Y 1B ) corresponding to high-school, college with major A, and college with major B. Observed earnings are Y = Y 0 +(Y 1A Y 0 ) D A +(Y 1B Y 0 ) D B and the college high school indicator is D = D A + D B. Suppose we observe Y and D. Under exogeneity (Y 0,Y 1A,Y 1B ) (D A,D B ),the OLS impact is a linear combination of the two average returns: β = E (Y D =1) E (Y D =0)=πE (Y 1A Y 0 )+(1 π) E (Y 1B Y 0 ) where π =Pr(D A =1 D =1). Since π is the result of choice, β does not measure any meaningful causal effect. Note that also Y = Y 0 +(Y 1B Y 0 ) D +(Y 1A Y 1B ) D A, so that under exogeneity, OLS of Y on D A from data for graduates gives E (Y 1A Y 1B ). 9

Now suppose non-exogeneity but that the IV assumption holds (Y 0,Y 1A,Y 1B ) Z, with Z binary. Let the potential indicators ½ of major choice be D1A if Z =1 D A = D D 0A if Z =0 B = and assume absence of defiers in both groups. ½ D1B if Z =1 D 0B if Z =0 It turns out that the IV impact is a linear combination of average treatment effects for two different groups of compliers: Cov (Z, Y ) Cov (Z, D) = E (Y 1A Y 0 D 1A D 0A =1)(1 λ) +E (Y 1B Y 0 D 1B D 0B =1)λ where λ is given by the odds of compliers in one college major relative to the other. Specifically, λ =1/ (1 + ϕ) and ϕ =Pr(D 1A D 0A =1)/ Pr (D 1B D 0B =1). If Y 1A Y 0 and Y 1B Y 0 are constant, the IV impact is just a weighted combination of the two returns. In neither case the IV parameter provides a meaningful causal effect. These expressions can be easily generalized to more than two majors. 10