Starting values for the iterative maximum likelihood estimator in survival analysis


To cite this article: Kenneth Carling (1995) Starting values for the iterative maximum likelihood estimator in survival analysis, Journal of Applied Statistics, 22:4, 531-535, DOI: 10.1080/757584789

Journal of Applied Statistics, Vol. 22, No. 4, 1995

KENNETH CARLING, Department of Statistics, Uppsala University

SUMMARY Maximum likelihood estimation of parametric or semi-parametric proportional hazard models requires an iterative procedure, since closed-form solutions are difficult to come by because of non-linearities. Here, I propose an approximate maximum likelihood (AML) estimator, in closed form, for obtaining a vector of starting values. Program packages such as GAUSS and SAS typically use the ordinary least-squares estimate as the starting vector. Unfortunately, this estimator is biased under censoring, so the starting values can yield slow convergence and even a local rather than a global maximum solution. The AML estimates, however, are excellent starting values and are just as easily calculated.

1 Introduction

Micro-econometric studies of unemployment duration make frequent use of an underlying model formulated in terms of a conditional proportional hazard model (see, for example, Kiefer, 1988). As a result of the non-linearity of the likelihood function, iterative procedures are used to obtain the maximum likelihood (ML) estimator. The chance of finding the global maximum solution hinges on the appropriateness of the starting vector. A bad starting vector, such as the naive vector of zeros, will at best yield the global maximum, but only after slow convergence. At worst, the iterative procedure will not converge or will converge to a local maximum. A starting vector may be obtained from the ordinary least-squares (OLS) method; program packages such as GAUSS (EXPON, EXPGAM, etc.) and SAS (LIFEREG) typically use OLS estimates as the starting vector. However, in the presence of censored observations (which is almost always the case), the OLS estimator is biased and, in practice, often severely so, as a result of a large proportion of censored spells.

Hence, the OLS method may not provide an acceptable starting vector. Moreover, for problems that involve semi-parametric models and models with time-varying covariates, packages such as those mentioned above may be of limited usefulness. Under such circumstances, the investigator may rely on pre-programmed routines for maximizing the likelihood function, such as MAXLIK in GAUSS. MAXLIK requires the log-likelihood function, while it is (for faster convergence) optional to supply analytical gradients. However, analytical gradients can be intractable and the investigator may have to resort to some numerical method. It is then critical for fast convergence that the starting values are such that the likelihood function is close to its maximum. For this reason, I suggest an easy-to-calculate closed-form approximate ML (AML) estimator as an alternative to the inadequate OLS estimator.

The next section concerns the notation used and a presentation of the AML estimator. An example that demonstrates how the AML method outperforms the OLS method when estimating two unemployment duration models closes the paper.

2 An AML estimator

Let the random variable T_i denote the time to employment for an unemployed individual i, and let x_i (1 × k) be his/her covariate vector and b (1 × k) the vector of unknown parameters to be estimated. I assume the continuous-time proportional hazard model, i.e.

λ_i(t; x_i, b) = exp(x_i b') λ_0(t)    (1)

where λ_0(t) is the baseline hazard, common to all individuals. The sample consists of n individuals, of which, without loss of generality, the first n_1 individuals are not censored. The log-likelihood function can then be written as

ln L(b) = Σ_{i=1}^{n_1} [x_i b' + ln λ_0(t_i)] − Σ_{i=1}^{n} exp(x_i b') Λ_0(t_i)    (2)

where Λ_0(t) = ∫_0^t λ_0(s) ds is the integrated baseline hazard and t_i is the observed (or censored) duration of individual i.

The investigator may choose to assert some parametric model, such as a Weibull model with λ_0(t) = αt^(α−1). However, there has been recent interest in semi-parametric models. For instance, Narendranathan and Stewart (1993) outlined a method to estimate the baseline hazard non-parametrically. In essence, they assumed a piecewise constant hazard and estimated λ_0(t) along with b. Interestingly, Bergström and Edin (1992) gave results that indicate that the estimate of b is rather robust to the functional form of the baseline hazard. Therefore, when estimating starting values, it is sufficient and convenient to assume a constant hazard, i.e. to assume that T follows the exponential distribution, with the implication that λ_0(t) = 1. It follows that equation (1) can be written as a regression model in ln(T_i), with an error term ε that follows the extreme value distribution (see, for example, Kiefer, 1988). Now, the OLS estimator, not adjusting for censoring, is easily obtained as

b_OLS = [(X'X)^{-1} X'(ln(t) − E[ε])]'    (3)

where X is the (n × k) covariate matrix, ln(t) is an (n × 1) vector of log durations in time (observed or censored) and E[ε] = −0.5772. This is the estimator used by default in GAUSS and SAS for obtaining starting values. The shortcoming of the OLS estimator under censoring has long been recognized and various modifications have been suggested, often by applying EM algorithms (for a survey, see Brännäs (1992)). However, the EM algorithm in itself requires iterations, so it is not suitable for calculating a starting vector. An alternative is to consider an approximation of the likelihood function, i.e. the AML estimator.

The non-linearity of the likelihood function in equation (2) is due to the term exp(xb'). A second-order Taylor expansion yields

exp(xb') ≈ 1 + xb' + ½(xb')²

The choice of expanding only up to the second-order term is justified by the small absolute values of the parameters and the small variability of the covariates usually at hand. The approximation is improved by scaling t by a constant, so that Σ_i x_i b' = 0. Substituting this approximation into the likelihood function and maximizing with respect to b yields the AML estimator in closed form. This estimator takes explicit account of the censored durations, so it will perform better than the corresponding OLS estimator. It should be noted that the idea of calculating starting values from a second-order approximation of the likelihood function resembles the idea of the Newton-Raphson method.
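To make the closed form concrete, the following is a minimal numerical sketch (in Python with NumPy, not the GAUSS code used in the paper). It assumes the exponential specialisation λ_0(t) = 1, under which the approximate log-likelihood Σ_{i≤n_1} x_i b' − Σ_i t_i (1 + x_i b' + ½(x_i b')²) is maximized by b' = (Σ_i t_i x_i'x_i)^{-1}(Σ_{i≤n_1} x_i' − Σ_i t_i x_i'). The particular scaling of t and the function and variable names below are illustrative assumptions rather than the paper's own choices.

```python
import numpy as np

EULER_GAMMA = 0.5772  # E[eps] = -0.5772 for the extreme-value error in equation (3)

def ols_start(X, t):
    """OLS starting vector in the spirit of equation (3): regress ln(t) - E[eps]
    on X, ignoring censoring. Depending on the sign convention used for the
    log-duration regression, the result may need to be negated before use as
    starting values for the hazard coefficients b."""
    y = np.log(t) + EULER_GAMMA
    return np.linalg.solve(X.T @ X, X.T @ y)

def aml_start(X, t, d):
    """Closed-form AML-style starting vector under a constant baseline hazard.

    X : (n, k) covariate matrix
    t : (n,) observed or censored durations
    d : (n,) indicator, 1 = uncensored, 0 = censored

    Follows from the second-order expansion exp(xb') ~ 1 + xb' + (xb')**2 / 2.
    """
    # Illustrative scaling of t so the expansion is taken near x b' = 0;
    # the paper's exact scaling constant is not reproduced here.
    t = t / t.mean()
    A = X.T @ (t[:, None] * X)            # sum_i t_i x_i' x_i
    g = X[d == 1].sum(axis=0) - X.T @ t   # sum_{uncensored} x_i' - sum_i t_i x_i'
    return np.linalg.solve(A, g)
```

Given a design matrix X, durations t and a censoring indicator d, aml_start(X, t, d) returns a k-vector that can be passed directly as the starting vector to an iterative maximizer.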

3 Example

To illustrate the usefulness of the AML estimator, I compare, with respect to the number of iterations and system computing time, starting vectors obtained from the AML method, the OLS method and a naive vector of zeros. The starting values are applied to two models previously estimated in a study of unemployment duration (see Carling et al., 1994). The data set is a flow sample of 12098 unemployed people, of which 54% are censored spells.

The first model is an exponential model without time-varying covariates. A program is written in GAUSS that makes use of the MAXLIK routine. In this program, no analytical gradients are supplied and the BFGS algorithm is applied, since it is expected to yield the fastest convergence (see Aptech, 1992). From this, 12 parameters are estimated and the results are presented in Table 1.

The second model is a semi-parametric model, in line with Narendranathan and Stewart (1993), including time-varying covariates. As noted earlier, the parameter vector b is estimated simultaneously with λ_0(t). It is assumed that λ_0(t) is piecewise constant, i.e. within each small time-interval the hazard is constant, while it varies across time-intervals. The time unit used in this example is 4 weeks and spells exceeding 72 weeks are treated as being right censored. The explicit form of the likelihood function being maximized has been given by Narendranathan and Stewart (1993). In total, 31 parameters are estimated. In this program, which also makes use of MAXLIK, analytical gradients are supplied. For this reason, the BHHH algorithm is chosen. For both models, STEPBT is used as the line search method.
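As a rough illustration of this workflow (though not of the actual GAUSS/MAXLIK programs), the sketch below maximizes the exponential-model log-likelihood with a quasi-Newton routine from SciPy, once from the naive zero vector and once from a closed-form starting vector, and compares the number of iterations. The simulated data, names and settings are assumptions for illustration only; the semi-parametric model would be handled in the same way, with additional parameters for the piecewise-constant baseline hazard.

```python
import numpy as np
from scipy.optimize import minimize

def exp_loglik(b, X, t, d):
    """Log-likelihood of the exponential proportional hazard model with
    right censoring: sum_{uncensored} x_i b' - sum_i t_i exp(x_i b')."""
    xb = X @ b
    return xb[d == 1].sum() - (t * np.exp(xb)).sum()

def fit_ml(X, t, d, b_start):
    """Maximize the log-likelihood by BFGS (numerical gradients) from b_start."""
    res = minimize(lambda b: -exp_loglik(b, X, t, d), b_start, method="BFGS")
    return res.x, res.nit

# Simulated data for illustration (not the unemployment data used in the paper).
rng = np.random.default_rng(0)
n, k = 2000, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
b_true = np.array([-1.0, 0.3, -0.2, 0.1, 0.05])
t_full = rng.exponential(np.exp(-X @ b_true))    # hazard exp(x b') => scale exp(-x b')
c = rng.exponential(np.median(t_full), size=n)   # independent censoring times
t, d = np.minimum(t_full, c), (t_full <= c).astype(int)

# AML-style closed-form start (cf. the sketch in Section 2); scaling of t omitted here.
b0 = np.linalg.solve(X.T @ (t[:, None] * X), X[d == 1].sum(axis=0) - X.T @ t)

_, it_naive = fit_ml(X, t, d, np.zeros(k))
_, it_aml = fit_ml(X, t, d, b0)
print("iterations:", it_naive, "vs", it_aml)  # a better start typically needs fewer iterations
```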

In Table 1, the naive, OLS and AML starting vectors are shown for the exponential and the semi-parametric models, together with the ML estimates, for the covariates: constant, age 16-24, age 25-34, age 35-44, female, foreign citizen, some work experience, long work experience, high school, university, unemployment income, cash benefits, local unemployment rate and local programme offer rate. Table 1 also reports the number of iterations and the system time required with each starting vector:

TABLE 1. ML estimates and starting values (convergence summary)

                        Exponential model        Semi-parametric model
                        Naive    OLS     AML     Naive    OLS     AML
Number of iterations      66      60      43       24       6       4
System time (min)      13.08   11.90    8.52    17.35   11.83    7.92

The AML starting vector does a very good job, while the OLS starting vector does so to a lesser extent. The number of iterations and system time required for estimation of the models make it clear that it is worthwhile to use a more refined starting vector.

4 Conclusions

In this paper, an AML estimator is proposed. It yields better starting values for solving the non-linear likelihood function that arises from the proportional hazard model. The virtue of this estimator is that it is easy to calculate and takes explicit account of censored observations. In an example, this estimator outperforms the OLS and naive starting vectors.

Correspondence: Kenneth Carling, Department of Statistics, Uppsala University, PO Box 513, S-751 20 Uppsala, Sweden.

REFERENCES

Aptech (1992) GAUSS 3.0 Applications: MAXLIK (Maple Valley, WA, Aptech Systems).
Bergström, R. & Edin, P.-A. (1992) Time aggregation and the distributional shape of unemployment duration, Journal of Applied Econometrics, 7, pp. 5-30.
Brännäs, K. (1992) Econometrics of the accelerated duration model, Umeå Economic Studies, 269, University of Umeå.
Carling, K., Edin, P.-A., Harkman, A. & Holmlund, B. (1994) Unemployment duration, unemployment benefits, and labor market programs in Sweden, Journal of Public Economics (in press).
Kiefer, N. M. (1988) Economic duration data and the hazard functions, Journal of Economic Literature, 26, pp. 646-679.
Narendranathan, W. & Stewart, M. B. (1993) Modelling the probability of leaving unemployment: competing risks models with flexible base-line hazards, Applied Statistics, 42, pp. 63-83.
SAS (1990) SAS/STAT User's Guide, Version 6, 4th edn, Vol. 2 (Cary, NC, SAS Institute).