The Importance of Reproducible Research

Christian Kleiber
Universität Basel

Berne, 2014-11-07
Workshop "Improving Data Access and Research Transparency (DART) in Switzerland"

Outline

1 Introduction
2 Reproducibility (in economics)
3 Case studies in forensic econometrics
    Confidence intervals for breakpoints in time series
    Data problems
    Complete separation in a binary response model
    Complete separation in a regression model for count data
4 Some suggestions
5 References

Christian Kleiber (Universität Basel), The Importance of Reproducible Research, DART Workshop, Berne, 2014-11-07

Introduction

"Computation-based science publication is currently a doubtful enterprise because there is not enough support for identifying and rooting out sources of error in computational work."
Donoho (Biostatistics 2010)

"We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail."
Ince, Hatton, Graham-Cumming (Nature 2012)

Introduction

Q1: What is replication, reproduction, etc.?
- Old definition (replication in the wide sense): getting similar results using different data, different methods, ...
- New definition (replication in the narrow sense): getting the exact same (!) tables, figures, etc. as the original publication.
- Emerging terminology: computational reproducibility

Q2: Why work reproducibly?
- more impact, citations, feedback, ...
- better computing environments
- more effective advising

An emerging community standard in various fields.
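Replication in the narrow sense can be checked mechanically by comparing a fingerprint of the regenerated output with a fingerprint of the published output. The Python sketch below is an illustration only: the table values and the helper name `fingerprint` are hypothetical, and Python is used for convenience rather than because the talk uses it. Because the numbers are written with six decimals before hashing, the check tests agreement to that precision, not bit-for-bit identity.

```python
import hashlib

def fingerprint(rows):
    """Return a SHA-256 digest of a numeric table given as a list of rows."""
    # Fixed formatting (six decimals) so equal numbers always serialise identically.
    text = "\n".join(",".join(f"{x:.6f}" for x in row) for row in rows)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

published = [[0.512, 1.250], [2.000, 3.141]]   # hypothetical published table
rerun     = [[0.512, 1.250], [2.000, 3.141]]   # hypothetical re-computation
perturbed = [[0.512, 1.250], [2.000, 3.142]]   # differs in the third decimal

print(fingerprint(published) == fingerprint(rerun))       # same table, same digest
print(fingerprint(published) == fingerprint(perturbed))   # any change alters the digest
```

Replication in the wide sense (similar results from different data or methods) cannot be checked this way and needs a substantive comparison instead.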

Introduction

Some recent papers from various fields:
- Donoho DL, Maleki A, Shahram M, Rahman I, Stodden V (2009). Reproducible research in computational harmonic analysis. Computing in Science & Engineering, 11(1), 8-18.
- Donoho D (2010). An invitation to reproducible computational research. Biostatistics, 11(3), 385-388.
- Ince DC, Hatton L, Graham-Cumming J (2012). The case for open computer programs. Nature, 482, 485-488.
- Peng RD, Dominici F, Zeger SL (2006). Reproducible epidemiologic research. American J Epidemiology, 163, 783-789.
- Vandewalle P, Kovacevic J, Vetterli M (2009). Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3), 27-47.

And a recent book:
- Stodden V, Leisch F, Peng RD (eds) (2014). Implementing Reproducible Research. Chapman & Hall.

Introduction

Traditional issues: Why are some publications not reproducible?
- Data are not available.
- Data are available, but code is not.
- Data and code are available, but there are data problems, numerical problems, software problems, ...

Recent threats to reproducibility:
- Data explosion, big data.
- Rise of computational science (simulation-based inference, etc.).

Reproducibility in economics

1982: J Money, Credit and Banking (JMCB) Data Storage and Evaluation Project, funded by NSF.
  Dewald, Thursby and Anderson (AER 1986) find: only 2 out of 54 works replicable.
  "Our findings suggest that inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence. ... we recommend that journals require the submission of programs and data at the time empirical papers are submitted."
1986: Replication policy at American Economic Review: data and code.
- New JMCB study (McCullough, McGeary and Harrison, JMCB 2006): now 14 out of 62 replicable.
- McCullough and Vinod (AER 2003) attempt replication of an entire issue of AER.
Since 2004: mandatory (?) data and code archives at American Economic Review, Econometrica, Review of Economic Studies, J Political Economy, Review of Economics and Statistics.

Reproducibility in economics

Neglected issue: Simulations (Kleiber and Zeileis 2013).

                                          JAE    JoE
Freq of manuscripts
  in total                                 40     15
  with simulation                          33     14
Freq of data availability
  in archive                               31      0
  proprietary                               6      0
  not available                             0     12
  none used                                 3      3
Freq of simulation types
  Monte Carlo                              17     11
  Resampling                               15      3
  Simulation-based estimation              13      3
  Nonstandard distributions                 2      0
Prop of all manuscripts (in %)
  with simulation                        82.5   93.3
  indicating software used               65.0   26.7
  providing code                         45.0    6.7
  with code available upon request       17.5    0.0
Prop of simulation manuscripts (in %)
  with replication files                 30.3    7.1
  with random seed                       15.2    7.1

Case studies in forensic econometrics

Examples (mainly taken from my own work):
- Evaluation of a nonstandard distribution in time series econometrics
- A classical panel data set with (too) many versions
- Non-existing estimates in a binary response model
- Non-existing estimates in a count data regression model

Confidence intervals for breaks in time series

Example: Breaks in the US real interest rate, 1961-1986.

Bai and Perron (J Applied Econometrics 2003): Regression on a constant, standard errors for break points via HAC methods with automated bandwidth selection.

Point estimates of break dates are fully reproducible ... but only 2 out of 3 confidence intervals.

Computational task: confidence intervals require quantiles from a non-standard distribution.

Issues:
- coding error
- software fault

Details: Zeileis and Kleiber (J Applied Econometrics 2005).
Data and computational tools available in R package strucchange.

Confidence intervals for breaks in time series

[Figure: time series plot of RealInt against Time, 1960 to 1985.]

Confidence intervals for breaks in time series

Asymptotics of break points: the limiting distribution is the distribution of argmax_s V(s), where

\[
V(s) =
\begin{cases}
W_1(-s) - |s|/2 & \text{for } s \le 0, \\
\sqrt{\xi}\,(\phi_2/\phi_1)\,W_2(s) - \xi s/2 & \text{for } s > 0.
\end{cases}
\]

A two-sided Brownian motion with different scales and linear drifts.

Right branch of limiting distribution:

\[
G(x) = 1 + \frac{\xi}{\sqrt{\phi}} \sqrt{\frac{x}{2\pi}} \exp\left(-\frac{\xi^2 x}{8\phi}\right)
+ c \exp(ax)\,\Phi\left(-b\sqrt{x}\right)
+ \left(-d + 2 - \frac{\xi^2}{2\phi}\,x\right) \Phi\left(-\frac{\xi}{2\sqrt{\phi}}\,\sqrt{x}\right)
\qquad (x > 0)
\]
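Quantiles of argmax V(s) can also be approximated by brute-force simulation, which gives a rough cross-check on any closed-form evaluation. The Python sketch below is not the method used in the cited papers. It assumes the right-branch scale is sqrt(xi) * (phi2/phi1), discretises the Brownian motions on a grid, and truncates the search to a finite window. The function names, grid settings and seed are hypothetical choices for illustration.

```python
import math
import random

def draw_argmax(rng, xi=1.0, phi_ratio=1.0, horizon=20.0, step=0.02):
    """One draw of argmax_s V(s), searched on a grid over [-horizon, horizon]."""
    n = int(round(horizon / step))
    sd = math.sqrt(step)                  # std. dev. of one Brownian increment
    best_v, best_s = 0.0, 0.0             # V(0) = 0
    # Left branch: V(s) = W1(-s) - |s|/2 for s <= 0.
    w = 0.0
    for i in range(1, n + 1):
        w += rng.gauss(0.0, sd)
        v = w - (i * step) / 2.0
        if v > best_v:
            best_v, best_s = v, -i * step
    # Right branch (assumed form): V(s) = sqrt(xi) * (phi2/phi1) * W2(s) - xi*s/2.
    w = 0.0
    scale = math.sqrt(xi) * phi_ratio
    for i in range(1, n + 1):
        w += rng.gauss(0.0, sd)
        v = scale * w - xi * (i * step) / 2.0
        if v > best_v:
            best_v, best_s = v, i * step
    return best_s

def simulate_quantiles(n_draws=200, seed=1, probs=(0.025, 0.5, 0.975), **kwargs):
    """Empirical quantiles of argmax V(s) from seeded Monte Carlo draws."""
    rng = random.Random(seed)             # fixed seed keeps the run reproducible
    draws = sorted(draw_argmax(rng, **kwargs) for _ in range(n_draws))
    return [draws[min(n_draws - 1, int(p * n_draws))] for p in probs]

print(simulate_quantiles())
```

With so few draws and a coarse grid the simulated quantiles are noisy, so they are useful only for spotting gross errors such as a distribution function that leaves the unit interval.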

Confidence intervals for breaks in time series

[Figure: P(argmax V ≤ x) plotted against x (from -10 to 10) for the three estimated breaks T^1, T^2, T^3.]

Confidence intervals for breaks in time series

[Figure: P(argmax V ≤ x) plotted against x (from 0 to 200) as computed by GAUSS and by R; the vertical axis runs from 0.0 to 1.2.]

Data problems: Grunfeld data

Grunfeld Y (1958). The Determinants of Corporate Investment. Unpublished Ph.D. Dissertation, University of Chicago.

Originally, an empirical study of corporate investment, with a panel of large US firms over a period of 20 years (1935-1954):

     [1] "General Motors"    "US Steel"          "General Electric"
     [4] "Chrysler"          "Atlantic Refining" "IBM"
     [7] "Union Oil"         "Westinghouse"      "Goodyear"
    [10] "Diamond Match"

Later used for illustrations in econometric methodology, notably panel and SUR models.

Used in numerous textbooks, including
- Maddala (1977): Econometrics (10 firms)
- Greene (2003): Econometric Analysis, 5e (5 firms)
- Greene (2008): Econometric Analysis, 6e (10 firms)
- Baltagi (2008): Econometric Analysis of Panel Data, 4e (10 firms)

In fact, there are 11 firms ... and also more data (years 1955-56 for some firms).

Complete and correct data available in R package AER, accompanying Kleiber and Zeileis (2008): Applied Econometrics with R.

Data problems: Grunfeld data

[Figure: timeline (1960 to 2010) of published versions of the Grunfeld data. Sources shown: Grunfeld; Boot, de Wit; Theil; Maddala; Vinod, Ullah; Fomby, Hill, Johnson; Griffiths, Hill, Judge; Hill, Griffiths, Lim; Greene/1st; Greene/5th; Greene/6th; Baltagi/Econ; Baltagi/Panel; Kleiber, Zeileis. Annotations on the links between sources: "select 10 (all but AS)", "selects 2 (GE, WH)", "select 3 (WH, GE, GM)", "2 errors", "2 errors + selects 5 (GM, US, GE, CH, WH)", "1 error?".]

Complete separation

Example: Data from Maddala GS (2001). Introduction to Econometrics, 3rd ed, J. Wiley.

Data on 44 US states for 1950. Variables are

  rate         Murder rate per 100,000 (FBI estimate, 1950).
  convictions  No. of convictions divided by no. of murders in 1950.
  executions   Average number of executions during 1946-1950 divided by convictions in 1950.
  time         Median time served (in months) of convicted murderers released in 1951.
  income       Median family income in 1949 (in 1,000 USD).
  lfp          Labor force participation rate in 1950 (in percent).
  noncauc      Proportion of population that is non-caucasian in 1950.
  southern     Region (factor).

Stokes H (2004). On the advantage of using two or more econometric software systems to solve the same problem. J Economic and Social Measurement, 29, 307-320.

Complete separation

Problem: Coefficient on southern somewhat unusual ...

Logit model estimated using defaults:

       Estimate     Std. Error        z value
       17.33126     2872.17069        0.00603

Change of convergence controls:

       Estimate           Std. Error        z value
    31.33110940    17327434.17699462     0.00000181

Reason: quasi-complete separation

            no  yes
    FALSE    9    0
    TRUE    20   15

Hence MLE does not exist ...
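The mechanism can be sketched with a stripped-down logit that has only an intercept and the southern dummy. The Python sketch below is not a reproduction of the slide's output, which comes from a model with further regressors. It assumes the cross-table has the binary response in rows and southern in columns, and `newton_logit` is a hypothetical helper written for this illustration. Because one cell is empty, each extra Newton-Raphson step pushes the dummy's coefficient further out and inflates its standard error, so the reported estimate depends on when the iterations stop.

```python
import math

# Cross-table from the slide, read as: binary response (FALSE/TRUE) in rows,
# southern (no/yes) in columns. This reading is an assumption.
COUNTS = {0: (9, 20), 1: (0, 15)}         # southern -> (n_false, n_true)

def newton_logit(n_iter):
    """Run n_iter Newton-Raphson steps for P(TRUE) = logistic(a + b*southern)."""
    a = b = 0.0
    se_b = float("nan")
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, (n_false, n_true) in COUNTS.items():
            n = n_false + n_true
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            resid = n_true - n * p        # score contribution
            w = n * p * (1.0 - p)         # information contribution
            g_a += resid
            g_b += resid * s
            h_aa += w
            h_ab += w * s
            h_bb += w * s * s
        det = h_aa * h_bb - h_ab * h_ab
        # Standard error of b from the information matrix at the current point,
        # i.e. before this iteration's update is applied.
        se_b = math.sqrt(h_aa / det)
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b, se_b

for n_iter in (8, 16, 24):
    a, b, se_b = newton_logit(n_iter)
    print(f"{n_iter:2d} iterations: a = {a:.4f}, b = {b:.2f}, se(b) = {se_b:.1f}")
```

The intercept settles at log(20/9), while b never converges. With many more iterations the fitted probability rounds to one in floating point and the information matrix becomes singular, which is why the iteration counts are kept small here.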

Count data regression

Example: Recreation demand

Cross-sectional data (n = 659) on the number of recreational boating trips to Lake Somerville, TX, in 1980, based on a survey administered to 2,000 registered leisure boat owners in 23 counties in eastern Texas.

  Variable  Description
  trips     Number of recreational boating trips.
  quality   Facility's subjective quality ranking on scale 1-5.
  ski       Was the individual engaged in water-skiing?
  income    Annual household income (in 1,000 USD).
  userfee   Did the owner pay an annual user fee at Lake Somerville?
  costc     Expenditure when visiting Lake Conroe (in USD).
  costs     Expenditure when visiting Lake Somerville (in USD).
  costh     Expenditure when visiting Lake Houston (in USD).

Data are used in various publications, among them
- Sellar, Stoll and Chavas, Land Economics 1985
- Ozuna and Gomez, Empirical Economics 1995
- Gurmu and Trivedi, J Business and Economic Statistics 1996
- Cameron and Trivedi, Regression Analysis of Count Data, CUP 2013

Count data regression

[Figure: frequency distribution of trips; frequencies range from 0 to 400, observed values of trips from 0 to 88.]

Count data regression

Source: Ozuna and Gomez, Specification and Testing of Count Data Recreation Demand Functions, Empirical Economics 1995.

Methodology: Poisson and negative binomial regression.

Footnote says:

"It should be noted that one of the anonymous referees re-estimated the models used in this study using the same data set and he obtained different parameter estimates. The referee and the authors of this article agreed that the problem was in the software used to estimate the models. The referee used LIMDEP 6.0 for the Poisson and MICROFIT 3.0 for the NLS models whereas the authors used GAUSS 3.0. This is an important observation since the parameter estimates affect consumer surplus. Researchers should thus be cautious of the software they use to estimate the models."

Summary

Summary of problems:
- Breakpoint estimation: closed source, coding errors
- Grunfeld: ancient data
- Complete separation in binary response: statistical and software issues
- Complete separation in count data: statistical and computational issues

Implicit issues:
- In econometrics (and other social sciences?), software development is often considered a subsidiary activity.
- By implication, the established econometrics journals currently do not publish papers on software development.

Some suggestions: Authors

How to improve on the current situation?

Authors:
- release data and code
- if the journal does not have an archive: use e.g. RePEc code archives
- publish case studies that document problems

Journals:
- mandatory archives for data and code (ideally, an editorial function)
- require data and code already at submission
- publish case studies, replications, etc.

Instructors:
- use archives in teaching

All of this also applies to computational economics and computational social science.

Some suggestions: Technology

Technology for econom(etr)ics:
- Version control system (svn, Dropbox, ...)
- Data in .txt/.csv (no proprietary formats please)
- LaTeX
- Statistical software
- ...

Most of these tools were unavailable 50 (or 40, 30, 20, 10, ...) years ago, but now the technology is available and we should use it.

Some suggestions: Technology

Computational tools for reproducible research:

Desirable: more than data and code, namely fully replicable analyses.

One solution: literate programming

Example: R function Sweave() combines R and LaTeX.

See Leisch (2002) for more information.

Some suggestions: Technology

Sweave() example: (source code from Zeileis and Kleiber, 2005)

    ...
    Confidence intervals for the breakpoints can be computed from the
    fitted \texttt{bp.ri} object for any number of breaks (smaller than the
    maximal number of breaks admissible) using the \texttt{confint} method
    from \texttt{strucchange}. A function for estimating the covariance
    matrix, here \texttt{kernhac}, may again be supplied.

    <<eval=true, echo=false, results=hide>>=
    library("strucchange")
    data("realint")
    bp.ri <- breakpoints(realint ~ 1, h = 15)
    cis <- confint(bp.ri, breaks = 3, vcov = kernhac)
    @

    This returns the breakpoints and corresponding confidence intervals (at
    the default 95\% level) coded by
    ...

Some suggestions: Computational tools

Computational tools section from Kleiber and Zeileis (2013):

"Our results were obtained using R 2.15.0 with the packages strucchange 1.4-6, and lattice 0.20-6 and were identical on various platforms including PCs running Debian GNU/Linux (with a 3.2.0-1-amd64 kernel) and Mac OS X, version 10.6.8. Normal random variables were generated from uniform random numbers obtained by the Mersenne Twister (currently R's default generator) by means of the inversion method. The random seed and further technical details are available in the code supplementing this paper."
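The same ingredients (recorded software environment, named generator, fixed seed, inversion method) can be sketched outside R. The following Python illustration rests on stated assumptions: the seed value and the helper names `normal_draws` and `environment_record` are hypothetical. Python's `random` module also uses the Mersenne Twister, but its streams are not expected to match R's for the same seed.

```python
import platform
import random
import sys
from statistics import NormalDist

def normal_draws(n, seed):
    """Standard normal draws by inversion of seeded uniform random numbers."""
    rng = random.Random(seed)             # Mersenne Twister with an explicit seed
    std_normal = NormalDist(0.0, 1.0)
    draws = []
    while len(draws) < n:
        u = rng.random()                  # uniform on [0, 1)
        if u > 0.0:                       # inv_cdf is only defined on (0, 1)
            draws.append(std_normal.inv_cdf(u))
    return draws

def environment_record(seed):
    """Collect the details a 'computational tools' paragraph would report."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "generator": "Mersenne Twister (random.Random)",
        "method": "inversion",
        "seed": seed,
    }

SEED = 20141107                           # hypothetical seed, chosen for illustration
print(environment_record(SEED))
print(normal_draws(3, SEED))
```

Rerunning with the same seed on the same software versions returns the same draws, which is what makes a simulation table reproducible in the narrow sense.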

References

Econom(etr)ics:
- Anderson RD, Greene WH, McCullough BD, Vinod HD (2008). The role of data/code archives in the future of economic research. J Economic Methodology, 15(1), 99-119.
- Kleiber C, Zeileis A (2010). The Grunfeld data at 50. German Economic Review, 11(4), 404-417.
- Kleiber C, Zeileis A (2013). Reproducible econometric simulations. J Econometric Methods, 2(1), 89-99.
- Lovell MC, Selover DD (1994). Econometric software accidents. Economic J, 104, 713-725.
- McCullough BD, Vinod HD (1999). The numerical reliability of econometric software. J Economic Literature, 37, 633-665.
- McCullough BD, Vinod HD (2003). Verifying the solution from a nonlinear solver: A case study. American Economic Review, 93, 873-892.
- Newbold P, Agiakloglou C, Miller J (1994). Adventures with ARIMA software. International J Forecasting, 10, 573-581.
- Zeileis A, Kleiber C (2005). Validating multiple structural change models - A case study. J Applied Econometrics, 20, 485-490.

References

Other fields:
- Buckheit JB, Donoho DL (1995). WaveLab and reproducible research. Dept. of Statistics, Stanford University, Tech. Rep. 474.
- Donoho DL, Maleki A, Shahram M, Rahman I, Stodden V (2009). Reproducible research in computational harmonic analysis. Computing in Science & Engineering, 11(1), 8-18.
- Donoho D (2010). An invitation to reproducible computational research. Biostatistics, 11(3), 385-388.
- Ince DC, Hatton L, Graham-Cumming J (2012). The case for open computer programs. Nature, 482, 485-488.
- Leisch F (2002). Sweave: Dynamic generation of statistical reports using literate data analysis. In Härdle W, Rönz B (eds.), Compstat 2002 - Proc. in Computational Statistics, pp. 575-580. Physica Verlag, Heidelberg.
- Peng RD, Dominici F, Zeger SL (2006). Reproducible epidemiologic research. American J Epidemiology, 163, 783-789.
- Vandewalle P, Kovacevic J, Vetterli M (2009). Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3), 27-47.