The Importance of Reproducible Research
1 The Importance of Reproducible Research Christian Kleiber Universität Basel Berne, Workshop Improving Data Access and Research Transparency (DART) in Switzerland
2 Outline
1 Introduction
2 Reproducibility (in economics)
3 Case studies in forensic econometrics
  - Confidence intervals for breakpoints in time series
  - Data problems
  - Complete separation in a binary response model
  - Complete separation in a regression model for count data
4 Some suggestions
5 References
Christian Kleiber (Universität Basel) The Importance of Reproducible Research DART Workshop, Berne
3 Introduction
"Computation-based science publication is currently a doubtful enterprise because there is not enough support for identifying and rooting out sources of error in computational work." Donoho (Biostatistics 2010)
"We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail." Ince, Hatton, Graham-Cumming (Nature 2012)
4 Introduction
Q1: What is replication, reproduction, etc.?
- Old definition (replication in the wide sense): getting similar results using different data, different methods, ...
- New definition (replication in the narrow sense): getting the exact same (!) tables, figures, etc. as the original publication.
- Emerging terminology: computational reproducibility.
Q2: Why work reproducibly?
- more impact, citations, feedback, ...
- better computing environments
- more effective advising
An emerging community standard in various fields.
5 Introduction
Some recent papers from various fields:
- Donoho DL, Maleki A, Shahram M, Rahman I, Stodden V (2009). Reproducible research in computational harmonic analysis. Computing in Science & Engineering, 11(1).
- Donoho D (2010). An invitation to reproducible computational research. Biostatistics, 11(3).
- Ince DC, Hatton L, Graham-Cumming J (2012). The case for open computer programs. Nature, 482.
- Peng RD, Dominici F, Zeger SL (2006). Reproducible epidemiologic research. American J Epidemiology, 163.
- Vandewalle P, Kovacevic J, Vetterli M (2009). Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3).
And a recent book:
- Stodden V, Leisch F, Peng RD (eds) (2014). Implementing Reproducible Research. Chapman & Hall.
6 Introduction
Traditional issues: Why are some publications not reproducible?
- Data are not available.
- Data are available, but code is not.
- Data and code are available, but there are data problems, numerical problems, software problems, ...
Recent threats to reproducibility:
- Data explosion, big data.
- Rise of computational science (simulation-based inference, etc.).
7 Reproducibility in economics
- 1982: J Money, Credit and Banking (JMCB) Data Storage and Evaluation Project, funded by NSF.
- Dewald, Thursby and Anderson (AER 1986) find: only 2 out of 54 works replicable.
  "Our findings suggest that inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence. ... we recommend that journals require the submission of programs and data at the time empirical papers are submitted."
- Replication policy at American Economic Review: Data and code
- New JMCB study (McCullough, McGeary and Harrison, JMCB 2006): now 14 out of 62 replicable.
- McCullough and Vinod (AER 2003) attempt replication of an entire issue of AER.
- Since 2004: mandatory (?) data and code archives at American Economic Review, Econometrica, Review of Economic Studies, J Political Economy, Review of Economics and Statistics.
8 Reproducibility in economics
Neglected issue: Simulations (Kleiber and Zeileis 2013).
                                        JAE   JoE
Freq of manuscripts
  in total
  with simulation
Freq of data availability
  in archive                             31     0
  proprietary                             6     0
  not available                           0    12
  none used                               3     3
Freq of simulation types
  Monte Carlo
  Resampling                             15     3
  Simulation-based estimation            13     3
  Nonstandard distributions               2     0
Prop of all manuscripts with simulation
  indicating software used
  providing code
  with code available upon request
Prop of simulation manuscripts with replication files
  with random seed
9 Case studies in forensic econometrics
Examples (mainly taken from my own work):
- Evaluation of a nonstandard distribution in time series econometrics
- A classical panel data set with (too) many versions
- Non-existing estimates in a binary response model
- Non-existing estimates in a count data regression model
10 Confidence intervals for breaks in time series
Example: Breaks in the US real interest rate, Bai and Perron (J Applied Econometrics 2003): Regression on a constant, standard errors for break points via HAC methods with automated bandwidth selection.
Point estimates of break dates are fully reproducible... but only 2 out of 3 confidence intervals.
Computational task: confidence intervals require quantiles from a non-standard distribution.
Issues:
- coding error
- software fault
Details: Zeileis and Kleiber (J Applied Econometrics 2005). Data and computational tools available in R package strucchange.
11 Confidence intervals for breaks in time series
[Figure: time series plot of RealInt against Time]
12 Confidence intervals for breaks in time series
Asymptotics of break points: the limiting distribution is the distribution of $\operatorname{argmax}_s V(s)$, where
$$V(s) = \begin{cases} W_1(-s) - |s|/2 & \text{for } s \le 0, \\ \sqrt{\xi}\,(\phi_2/\phi_1)\, W_2(s) - \xi |s|/2 & \text{for } s > 0. \end{cases}$$
A two-sided Brownian motion with different scales and linear drifts.
Right branch of limiting distribution:
$$G(x) = 1 + \xi \sqrt{\frac{x}{2\pi\phi}} \exp\left\{-\frac{\xi^2 x}{8\phi}\right\} + c \exp(ax)\,\Phi(-b\sqrt{x}) + \left(-d + 2 - \frac{\xi^2 x}{2\phi}\right) \Phi\left(-\frac{\xi}{2}\sqrt{\frac{x}{\phi}}\right) \qquad (x > 0)$$
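The computational task on this slide, getting quantiles out of a non-standard distribution, can be sketched for the simplest special case. The block below is a minimal illustration in Python, not the GAUSS or R code discussed in the talk. It assumes the symmetric case (ξ = 1 and φ1 = φ2), for which the limiting variable is the argmax of W(s) - |s|/2. The closed-form CDF used here is the standard expression for that case and is an assumption in the sense that it does not appear on the slide. The helper names `std_normal_cdf`, `argmax_cdf_symmetric` and `argmax_quantile` are hypothetical.

```python
import math


def std_normal_cdf(z):
    # Standard normal CDF via erfc, which stays accurate in the far lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def argmax_cdf_symmetric(x):
    """CDF of argmax_s {W(s) - |s|/2}: symmetric case only (assumed closed form)."""
    if x < 0:
        # The distribution is symmetric around zero in this special case.
        return 1.0 - argmax_cdf_symmetric(-x)
    root = math.sqrt(x)
    return (1.0
            + math.sqrt(x / (2.0 * math.pi)) * math.exp(-x / 8.0)
            + 1.5 * math.exp(x) * std_normal_cdf(-1.5 * root)
            - 0.5 * (x + 5.0) * std_normal_cdf(-0.5 * root))


def argmax_quantile(p, lo=0.0, hi=50.0, tol=1e-10):
    """Invert the CDF by bisection; intended for 0.5 <= p < 1 (hypothetical helper)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if argmax_cdf_symmetric(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for level in (0.95, 0.975):
        print(level, round(argmax_quantile(level), 2))
```

The product exp(x)·Φ(-1.5√x) multiplies a very large number by a very small one, so the tail probability is computed with `erfc` rather than as 1 + erf. This is the kind of numerical detail that can make two implementations of the same formula disagree.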
13 Confidence intervals for breaks in time series
[Figure: P(argmax V ≤ x) plotted against x, one curve per estimated break date]
14 Confidence intervals for breaks in time series
[Figure: P(argmax V ≤ x) plotted against x, comparing the GAUSS and R evaluations]
15 Data problems: Grunfeld data
Grunfeld Y (1958). The Determinants of Corporate Investment. Unpublished Ph.D. Dissertation, University of Chicago.
Originally, an empirical study of corporate investment, with a panel of large US firms over a period of 20 years:
 [1] "General Motors"   "US Steel"          "General Electric"
 [4] "Chrysler"         "Atlantic Refining" "IBM"
 [7] "Union Oil"        "Westinghouse"      "Goodyear"
[10] "Diamond Match"
Later used for illustrations in econometric methodology, notably panel and SUR models. Used in numerous textbooks, including
- Maddala (1977): Econometrics (10 firms)
- Greene (2003): Econometric Analysis, 5e (5 firms)
- Greene (2008): Econometric Analysis, 6e (10 firms)
- Baltagi (2008): Econometric Analysis of Panel Data, 4e (10 firms)
In fact, there are 11 firms... and also more data (years for some firms).
Complete and correct data available in R package AER, accompanying Kleiber and Zeileis (2008): Applied Econometrics with R.
16 Data problems: Grunfeld data
[Figure: genealogy of the Grunfeld data versions across textbooks. Nodes: Grunfeld; Boot, de Wit; Theil; Fomby, Hill, Johnson; Maddala; Vinod, Ullah; Griffiths, Hill, Judge; Hill, Griffiths, Lim; Greene/1st; Greene/5th; Greene/6th; Baltagi/Econ; Baltagi/Panel; Kleiber, Zeileis. Edge labels: "select 10 (all but AS)"; "selects 2 (GE, WH)"; "select 3 (WH, GE, GM)"; "2 errors"; "2 errors + selects 5 (GM, US, GE, CH, WH)"; "1 error?"]
17 Complete separation
Example: Data from Maddala GS (2001). Introduction to Econometrics, 3rd ed, J. Wiley.
Data on 44 US states for
Variables are
- rate: Murder rate per 100,000 (FBI estimate, 1950).
- convictions: No. of convictions divided by no. of murders in
- executions: Average number of executions during divided by convictions in
- time: Median time served (in months) of convicted murderers released in
- income: Median family income in 1949 (in 1,000 USD).
- lfp: Labor force participation rate in 1950 (in percent).
- noncauc: Proportion of population that is non-caucasian in
- southern: Region (factor).
Stokes H (2004). On the advantage of using two or more econometric software systems to solve the same problem. J Economic and Social Measurement, 29.
18 Complete separation
Problem: Coefficient on southern somewhat unusual...
Logit model estimated using defaults:
  Estimate  Std. Error  z value
Change of convergence controls:
  Estimate  Std. Error  z value
Reason: quasi-complete separation
          no  yes
  FALSE    9    0
  TRUE
Hence the MLE does not exist...
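Why the reported coefficient depends on the convergence controls can be seen in a small experiment. The following Python sketch uses a hypothetical 2×2 table, not the Maddala data. One cell is empty, as in the cross-tabulation above, so the likelihood keeps increasing as the slope grows and Newton-Raphson never settles. The function name `fit_logit` and all counts are illustrative assumptions.

```python
import math


def fit_logit(xs, ys, iterations):
    """Run a fixed number of Newton-Raphson steps for logit P(y=1) = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the (negative) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g by Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts keyed by (x, y): no observation has x = 1 and y = 0,
# so x = 1 predicts y = 1 perfectly (quasi-complete separation).
counts = {(0, 0): 9, (0, 1): 10, (1, 0): 0, (1, 1): 5}
xs = [x for (x, y), n in counts.items() for _ in range(n)]
ys = [y for (x, y), n in counts.items() for _ in range(n)]

if __name__ == "__main__":
    # Keep the iteration count moderate: the Hessian becomes numerically
    # singular once the fitted probability for x = 1 rounds to exactly 1.
    for iterations in (5, 10, 20):
        b0, b1 = fit_logit(xs, ys, iterations)
        print(iterations, round(b0, 4), round(b1, 4))
```

In this sketch the intercept stabilises quickly while the slope keeps growing with every additional iteration, so whatever value a program reports is set by its stopping rule rather than by the data.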
19 Count data regression
Example: Recreation demand
Cross-sectional data (n = 659) on the number of recreational boating trips to Lake Somerville, TX, in 1980, based on a survey administered to 2,000 registered leisure boat owners in 23 counties in eastern Texas.
Variable  Description
trips     Number of recreational boating trips.
quality   Facility's subjective quality ranking on scale 1-5.
ski       Was the individual engaged in water-skiing?
income    Annual household income (in 1,000 USD).
userfee   Did the owner pay an annual user fee at Lake Somerville?
costc     Expenditure when visiting Lake Conroe (in USD).
costs     Expenditure when visiting Lake Somerville (in USD).
costh     Expenditure when visiting Lake Houston (in USD).
Data are used in various publications, among them
- Sellar, Stoll and Chavas, Land Economics 1985
- Ozuna and Gomez, Empirical Economics 1995
- Gurmu and Trivedi, J Business and Economic Statistics 1996
- Cameron and Trivedi, Regression Models for Count Data, CUP 2013
20 Count data regression
[Figure: plots of the trips variable]
21 Count data regression
Source: Ozuna and Gomez, Specification and Testing of Count Data Recreation Demand Functions, Empirical Economics
Methodology: Poisson and negative binomial regression.
Footnote says:
"It should be noted that one of the anonymous referees re-estimated the models used in this study using the same data set and he obtained different parameter estimates. The referee and the authors of this article agreed that the problem was in the software used to estimate the models. The referee used LIMDEP 6.0 for the Poisson and MICROFIT 3.0 for the NLS models whereas the authors used GAUSS 3.0. This is an important observation since the parameter estimates affect consumer surplus. Researchers should thus be cautious of the software they use to estimate the models."
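One inexpensive safeguard against the situation described in the footnote is to check whether a reported estimate actually satisfies the first-order conditions of the likelihood. The Python sketch below does this for a Poisson regression with a single regressor. The data are made up for illustration, and the names `poisson_score` and `fit_poisson` are hypothetical. It is not the recreation demand model itself.

```python
import math


def poisson_score(beta, xs, ys):
    """Gradient of the Poisson log-likelihood for log(mu) = b0 + b1*x."""
    b0, b1 = beta
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(b0 + b1 * x)
        g0 += y - mu
        g1 += (y - mu) * x
    return g0, g1


def fit_poisson(xs, ys, iterations=25):
    """Newton-Raphson for the two-parameter Poisson regression."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        g0, g1 = poisson_score((b0, b1), xs, ys)
        h00 = h01 = h11 = 0.0
        for x in xs:
            mu = math.exp(b0 + b1 * x)
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
ys = [0, 1, 1, 3, 4, 1, 0, 2, 2, 6]

if __name__ == "__main__":
    estimate = fit_poisson(xs, ys)
    print("estimate:", estimate)
    print("score at estimate:", poisson_score(estimate, xs, ys))
    # A slightly different "reported" value fails the same check.
    reported = (estimate[0] + 0.05, estimate[1])
    print("score at perturbed value:", poisson_score(reported, xs, ys))
```

A score far from zero at the published coefficients signals that a solver stopped early or that different data were used. The check needs only the data and the coefficients, not the original software.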
22 Summary
Summary of problems:
- Breakpoint estimation: closed source, coding errors
- Grunfeld: ancient data
- Complete separation in binary response: statistical and software issues
- Complete separation in count data: statistical and computational issues
Implicit issues:
- In econometrics (and other social sciences?), software development is often considered a subsidiary activity.
- By implication, the established econometrics journals currently do not publish papers on software development.
23 Some suggestions: Authors
How to improve on the current situation?
Authors:
- release data and code. If the journal does not have an archive: use e.g. RePEc code archives.
- publish case studies that document problems
Journals:
- mandatory archives for data and code (ideally, an editorial function)
- require data and code already at submission
- publish case studies, replications, etc.
Instructors:
- use archives in teaching
All of this also applies to computational economics and computational social science.
24 Some suggestions: Technology
Technology for econom(etr)ics:
- Version control system (svn, Dropbox, ...)
- Data in .txt/.csv (no proprietary formats please)
- LaTeX
- Statistical software
- ...
Most of these tools were unavailable 50 (or 40, 30, 20, 10...) years ago, but now the technology is available and we should use it.
25 Some suggestions: Technology
Computational tools for reproducible research:
Desirable: more than data and code, namely fully replicable analyses.
One solution: literate programming.
Example: the R function Sweave() combines R and LaTeX. See Leisch (2002) for more information.
26 Some suggestions: Technology
Sweave() example: (source code from Zeileis and Kleiber, 2005)
...
Confidence intervals for the breakpoints can be computed from the fitted \texttt{bp.ri} object for any number of breaks (smaller than the maximal number of breaks admissible) using the \texttt{confint} method from \texttt{strucchange}. A function for estimating the covariance matrix, here \texttt{kernhac}, may again be supplied.
<<eval=true, echo=false, results=hide>>=
library("strucchange")
data("realint")
bp.ri <- breakpoints(realint ~ 1, h = 15)
cis <- confint(bp.ri, breaks = 3, vcov =
This returns the breakpoints and corresponding confidence intervals (at the default 95\% level) coded by...
27 Some suggestions: Computational tools
Computational tools section from Kleiber and Zeileis (2013):
"Our results were obtained using R with the packages strucchange 1.4-6, and lattice and were identical on various platforms including PCs running Debian GNU/Linux (with an amd64 kernel) and Mac OS X, version
Normal random variables were generated from uniform random numbers obtained by the Mersenne Twister (currently R's default generator) by means of the inversion method. The random seed and further technical details are available in the code supplementing this paper."
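The ingredients named in that section (a documented generator, the inversion method, a recorded seed, and a note of the computing environment) can be sketched in a few lines. The example below is in Python rather than R. Python's `random` module is also based on the Mersenne Twister, but its output stream is not the same as R's, so this is only an analogue of the setup described. The function name `normal_draws` and the seed value are arbitrary choices for illustration.

```python
import platform
import random
from statistics import NormalDist


def normal_draws(seed, n):
    """Generate n standard normal draws by inversion from seeded uniform draws."""
    rng = random.Random(seed)          # Mersenne Twister, explicitly seeded
    std_normal = NormalDist()          # mean 0, standard deviation 1
    draws = []
    while len(draws) < n:
        u = rng.random()               # uniform on [0, 1)
        if u > 0.0:                    # inv_cdf is only defined on (0, 1)
            draws.append(std_normal.inv_cdf(u))
    return draws


if __name__ == "__main__":
    seed = 20130101                    # arbitrary; record it with the results
    first = normal_draws(seed, 5)
    second = normal_draws(seed, 5)
    print("identical across runs:", first == second)
    # Record the environment alongside the seed.
    print("Python", platform.python_version(), "on", platform.platform())
```

Passing the seed as an argument, instead of seeding a global generator once at the top of a script, keeps each simulation experiment reproducible on its own.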
28 References
Econom(etr)ics:
- Anderson RD, Greene WH, McCullough BD, Vinod HD (2008). The role of data/code archives in the future of economic research. J Economic Methodology, 15(1).
- Kleiber C, Zeileis A (2010). The Grunfeld data at 50. German Economic Review, 11(4).
- Kleiber C, Zeileis A (2013). Reproducible econometric simulations. J Econometric Methods, 2(1).
- Lovell MC, Selover DD (1994). Econometric software accidents. Economic J, 104.
- McCullough BD, Vinod HD (1999). The numerical reliability of econometric software. J Economic Literature, 37.
- McCullough BD, Vinod HD (2003). Verifying the solution from a nonlinear solver: A case study. American Economic Review, 93.
- Newbold P, Agiakloglou C, Miller J (1994). Adventures with ARIMA software. International J Forecasting, 10.
- Zeileis A, Kleiber C (2005). Validating multiple structural change models: A case study. J Applied Econometrics, 20.
29 References
Other fields:
- Buckheit JB, Donoho DL (1995). WaveLab and reproducible research. Dept. of Statistics, Stanford University, Tech. Rep.
- Donoho DL, Maleki A, Shahram M, Rahman I, Stodden V (2009). Reproducible research in computational harmonic analysis. Computing in Science & Engineering, 11(1).
- Donoho D (2010). An invitation to reproducible computational research. Biostatistics, 11(3).
- Ince DC, Hatton L, Graham-Cumming J (2012). The case for open computer programs. Nature, 482.
- Leisch F (2002). Sweave: Dynamic generation of statistical reports using literate data analysis. In Härdle W, Rönz B (eds.), Compstat 2002, Proc. in Computational Statistics. Physica Verlag, Heidelberg.
- Peng RD, Dominici F, Zeger SL (2006). Reproducible epidemiologic research. American J Epidemiology, 163.
- Vandewalle P, Kovacevic J, Vetterli M (2009). Reproducible research in signal processing. IEEE Signal Processing Magazine, 26(3).
On Reproducible Econometric Research
On Reproducible Econometric Research Achim Zeileis http://eeecon.uibk.ac.at/~zeileis/ Overview Joint work with Roger Koenker (University of Urbana-Champaign). Koenker R, Zeileis A (2009). On Reproducible
Monitoring Structural Change in Dynamic Econometric Models
Monitoring Structural Change in Dynamic Econometric Models Achim Zeileis Friedrich Leisch Christian Kleiber Kurt Hornik http://www.ci.tuwien.ac.at/~zeileis/ Contents Model frame Generalized fluctuation
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
Interacting with local and remote data repositories using the stashr package
Computational Statistics DOI 10.1007/s00180-008-0124-x ORIGINAL PAPER Interacting with local and remote data repositories using the stashr package Sandrah P. Eckel Roger D. Peng Received: 14 March 2007
for an appointment, e-mail [email protected]
M.Sc. in Economics Department of Economics, University College London Econometric Theory and Methods (G023) 1 Autumn term 2007/2008: weeks 2-8 Jérôme Adda for an appointment, e-mail [email protected] Introduction
How To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Implementing a Class of Structural Change Tests: An Econometric Computing Approach
Implementing a Class of Structural Change Tests: An Econometric Computing Approach Achim Zeileis http://www.ci.tuwien.ac.at/~zeileis/ Contents Why should we want to do tests for structural change, econometric
Maximum likelihood estimation of mean reverting processes
Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. [email protected] Abstract Mean reverting processes are frequently used models in real options. For
ECON 523 Applied Econometrics I /Masters Level American University, Spring 2008. Description of the course
ECON 523 Applied Econometrics I /Masters Level American University, Spring 2008 Instructor: Maria Heracleous Lectures: M 8:10-10:40 p.m. WARD 202 Office: 221 Roper Phone: 202-885-3758 Office Hours: M W
Hailong Qian. Department of Economics John Cook School of Business Saint Louis University 3674 Lindell Blvd, St. Louis, MO 63108, USA qianh@slu.
Hailong Qian Department of Economics John Cook School of Business Saint Louis University 3674 Lindell Blvd, St. Louis, MO 63108, USA [email protected] FIELDS OF INTEREST Theoretical and Applied Econometrics,
Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs
Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs Andrew Gelman Guido Imbens 2 Aug 2014 Abstract It is common in regression discontinuity analysis to control for high order
Data Availability Policies & Author Responsibility Policies Time of Evaluation: May 2014
Data policies found in a sample of 346 journals in economic sciences Data Availability Policies & Author Responsibility Policies Time of Evaluation: May 2014 Table of Contents: Data Availability Policies:...
Automatic Generation of Simple (Statistical) Exams
Automatic Generation of Simple (Statistical) Exams Bettina Grün, Achim Zeileis http://statmath.wu-wien.ac.at/ Overview Introduction Challenges Solution implemented in the R package exams Exercises Combining
Master programme in Statistics
Master programme in Statistics Björn Holmquist 1 1 Department of Statistics Lund University Cramérsällskapets årskonferens, 2010-03-25 Master programme Vad är ett Master programme? Breddmaster vs Djupmaster
VI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
Department of Economics
Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional
Comparison of resampling method applied to censored data
International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison
Statistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
Curriculum Vitae Richard A. L. Carter
Curriculum Vitae Richard A. L. Carter January 25, 2011 Personal Office Addresses: Department of Economics University of Western Ontario London, Ontario N6A 5C2 Department of Economics University of Calgary
Nicholas J. Gonedes. 1971/1972: Graduate School of Industrial Administration, Carnegie-Mellon University.
Nicholas J. Gonedes Positions Assistant Professor of Accounting, Graduate School of Business, University of Chicago; September 1969 August 1974. Associate Professor of Accounting, Graduate School of Business,
ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL DATA FROM NORTH CAROLINA BADI H. BALTAGI*
JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 21: 543 547 (2006) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jae.861 ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Statistical Methods for research in International Relations and Comparative Politics
James Raymond Vreeland Dept. of Political Science Assistant Professor Yale University E-Mail: [email protected] Room 300 Tel: 203-432-5252 124 Prospect Avenue Office hours: Wed. 10am to 12pm New
RUNNING HEAD: FAFSA lists 1
RUNNING HEAD: FAFSA lists 1 Strategic use of FAFSA list information by colleges Stephen R. Porter Department of Leadership, Policy, and Adult and Higher Education North Carolina State University Raleigh,
A Case Study in Software Enhancements as Six Sigma Process Improvements: Simulating Productivity Savings
A Case Study in Software Enhancements as Six Sigma Process Improvements: Simulating Productivity Savings Dan Houston, Ph.D. Automation and Control Solutions Honeywell, Inc. [email protected] Abstract
Minimum LM Unit Root Test with One Structural Break. Junsoo Lee Department of Economics University of Alabama
Minimum LM Unit Root Test with One Structural Break Junsoo Lee Department of Economics University of Alabama Mark C. Strazicich Department of Economics Appalachian State University December 16, 2004 Abstract
Testing for Granger causality between stock prices and economic growth
MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Health Policy and Administration PhD Track in Health Services and Policy Research
Health Policy and Administration PhD Track in Health Services and Policy INTRODUCTION The Health Policy and Administration (HPA) Division of the UIC School of Public Health offers a PhD track in Health
Organizing Your Approach to a Data Analysis
Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize
Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:
1. COURSE DESCRIPTION Degree: Double Degree: Derecho y Finanzas y Contabilidad (English teaching) Course: STATISTICAL AND ECONOMETRIC METHODS FOR FINANCE (Métodos Estadísticos y Econométricos en Finanzas
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Keep It Simple: Easy Ways To Estimate Choice Models For Single Consumers
Keep It Simple: Easy Ways To Estimate Choice Models For Single Consumers Christine Ebling, University of Technology Sydney, [email protected] Bart Frischknecht, University of Technology Sydney,
CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University
Practical I conometrics data collection, analysis, and application Christiana E. Hilmer Michael J. Hilmer San Diego State University Mi Table of Contents PART ONE THE BASICS 1 Chapter 1 An Introduction
EDMS 769L: Statistical Analysis of Longitudinal Data 1809 PAC, Th 4:15-7:00pm 2009 Spring Semester
Instructor Dr. Jeffrey Harring 1230E Benjamin Building Phone: (301) 405-3630 Email: [email protected] Office Hours Tuesday 2:00-3:00pm, or by appointment Course Objectives, Description and Prerequisites
[This document contains corrections to a few typos that were found on the version available through the journal s web page]
Online supplement to Hayes, A. F., & Preacher, K. J. (2014). Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, 67,
F nest. Monte Carlo and Bootstrap using Stata. Financial Intermediation Network of European Studies
F nest Financial Intermediation Network of European Studies S U M M E R S C H O O L Monte Carlo and Bootstrap using Stata Dr. Giovanni Cerulli 8-10 October 2015 University of Rome III, Italy Lecturer Dr.
The frequency of visiting a doctor: is the decision to go independent of the frequency?
Discussion Paper: 2009/04 The frequency of visiting a doctor: is the decision to go independent of the frequency? Hans van Ophem www.feb.uva.nl/ke/uva-econometrics Amsterdam School of Economics Department
Machine Learning Methods for Causal Effects. Susan Athey, Stanford University Guido Imbens, Stanford University
Machine Learning Methods for Causal Effects Susan Athey, Stanford University Guido Imbens, Stanford University Introduction Supervised Machine Learning v. Econometrics/Statistics Lit. on Causality Supervised
How To Close The Loop On A Fully Differential Op Amp
Application Report SLOA099 - May 2002 Fully Differential Op Amps Made Easy Bruce Carter High Performance Linear ABSTRACT Fully differential op amps may be unfamiliar to some designers. This application
U.S DEPARTMENT OF COMMERCE
Alaska Fisheries Science Center National Marine Fisheries Service U.S DEPARTMENT OF COMMERCE AFSC PROCESSED REPORT 2013-01 RMark: An R Interface for Analysis of Capture-Recapture Data with MARK March 2013
Calculating the Probability of Returning a Loan with Binary Probability Models
Calculating the Probability of Returning a Loan with Binary Probability Models Associate Professor PhD Julian VASILEV (e-mail: [email protected]) Varna University of Economics, Bulgaria ABSTRACT The
