Starting values for the iterative maximum likelihood estimator in survival analysis

Kenneth Carling (1995) Starting values for the iterative maximum likelihood estimator in survival analysis, Journal of Applied Statistics, 22:4. Author affiliation: Department of Statistics, Uppsala University. Publisher: Taylor & Francis. Published online: 05 Jun. Downloaded by Hogskolan Dalarna on 15 October 2014.

Journal of Applied Statistics, Vol. 22, No. 4, 1995

KENNETH CARLING, Department of Statistics, Uppsala University

SUMMARY Maximum likelihood estimation of parametric or semi-parametric proportional hazard models requires an iterative procedure, since closed-form solutions are difficult to come by, because of non-linearities. Here, I propose an approximate maximum likelihood (AML) estimator, in closed form, for obtaining a vector of starting values. Program packages such as GAUSS and SAS typically use the ordinary least-squares estimate as the starting vector. Unfortunately, this estimator is biased under censoring, so the starting values can yield slow convergence and even a local rather than a global maximum solution. The AML estimates, however, are excellent starting values and are just as easily calculated.

1 Introduction

Micro-econometric studies of unemployment duration make frequent use of an underlying model formulated in terms of a conditional proportional hazard model (see, for example, Kiefer, 1988). As a result of the non-linearity of the likelihood function, iterative procedures are used to obtain the maximum likelihood (ML) estimator. The chance of finding the global maximum solution hinges on the appropriateness of the starting vector. A bad starting vector, such as the naive vector of zeros, will at best yield the global maximum, but only after slow convergence. At worst, the iterative procedure will not converge or will converge to a local maximum.

A starting vector may be obtained from the ordinary least-squares (OLS) method; program packages such as GAUSS (EXPON, EXPGAM, etc.) and SAS (LIFEREG) typically use OLS estimates as the starting vector. However, in the presence of censored observations (which is almost always the case), the OLS estimator is biased and, in practice, often severely so, as a result of a large proportion of censored spells. Hence, the OLS method may not provide an acceptable starting vector.

Moreover, for problems that involve semi-parametric models and models with time-varying covariates, packages such as those mentioned above may be of limited usefulness. Under such circumstances, the investigator may rely on pre-programmed routines for maximizing the likelihood function, such as MAXLIK in GAUSS. MAXLIK requires the log-likelihood function, while it is (for faster convergence) optional to supply analytical gradients. However, analytical gradients can be intractable and the investigator may have to resort to some numerical method. It is then critical for fast convergence that the starting values are such that the likelihood function is close to its maximum. For this reason, I suggest an easy-to-calculate closed-form approximate ML (AML) estimator as an alternative to the inadequate OLS estimator.

The next section concerns the notation used and a presentation of the AML estimator. An example that demonstrates how the AML method outperforms the OLS method when estimating two unemployment duration models closes the paper.

2 An AML estimator

Let the random variable $T_i$ denote the time to employment for an unemployed individual $i$, and let $x_i$ $(1 \times k)$ be his/her covariate vector and $b$ $(1 \times k)$ the vector of unknown parameters to be estimated. I assume the continuous time proportional hazard model, i.e.

$\lambda_i(t; x_i, b) = \exp(x_i b')\,\lambda_0(t)$    (1)

where $\lambda_0(t)$ is the baseline hazard, common to all individuals. The sample consists of $n$ individuals, of which, without loss of generality, the first $n_u$ individuals are not censored. The log-likelihood function can then be written as

[Equation (2) is missing from the transcription.]

The investigator may choose to assert some parametric model, such as a Weibull model with $\lambda_0(t) = \alpha t^{\alpha - 1}$.
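Since the displayed log-likelihood did not survive here, the following is a hedged reconstruction from equation (1) and standard duration-model theory under independent right censoring. It is not text recovered from the paper: the subscript $n_u$ for the number of uncensored spells and the symbol $\Lambda_0$ for the integrated baseline hazard are notational assumptions. The second line gives the special case of a constant baseline hazard, $\lambda_0(t) = 1$.

```latex
\[
\ln L(b) = \sum_{i=1}^{n_u} \left[ x_i b' + \ln \lambda_0(t_i) \right]
           - \sum_{i=1}^{n} \exp(x_i b')\, \Lambda_0(t_i),
\qquad \Lambda_0(t) = \int_0^t \lambda_0(s)\, ds
\]
\[
\lambda_0(t) = 1 \;\Rightarrow\;
\ln L(b) = \sum_{i=1}^{n_u} x_i b' - \sum_{i=1}^{n} t_i \exp(x_i b')
\]
```

The first sum runs over uncensored spells only, while every spell, censored or not, contributes its exposure through the second sum. The term $\exp(x_i b')$ in that second sum is the source of the non-linearity in $b$.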
However, there has been recent interest in semi-parametric models. For instance, Narendranathan and Stewart (1993) outlined a method to estimate the baseline hazard non-parametrically. In essence, they assumed a piecewise constant hazard and estimated $\lambda_0(t)$ along with $b$. Interestingly, Bergström and Edin (1992) gave results that indicate that the estimate of $b$ is rather robust to the functional form of the baseline hazard. Therefore, when estimating starting values, it is sufficient and convenient to assume a constant hazard, i.e. to assume that $T$ follows the exponential distribution, with the implication that $\lambda_0(t) = 1$. It follows that equation (1) can be written as a regression model, i.e.

[The regression equation is missing from the transcription.]

where $\varepsilon$ follows the extreme value distribution (see, for example, Kiefer, 1988). Now, the OLS estimator, not adjusting for censoring, is easily obtained as

$b_{OLS} = [(X'X)^{-1} X'(\ln(t) - E[\varepsilon])]'$    (3)

where $X$ is the $(n \times k)$ covariate matrix, $\ln(t)$ is an $(n \times 1)$ vector of log durations in time (observed or censored) and $E[\varepsilon]$ is the expectation of the error term (its numerical value is missing from the transcription). This is the estimator used by default in GAUSS and SAS for obtaining starting values.

The shortcoming of the OLS estimator under censoring has long been recognized and various modifications have been suggested, often by applying EM algorithms (for a survey, see Brännäs (1992)). However, the EM algorithm in itself requires iterations, so it is not suitable for calculating a starting vector. An alternative is to consider an approximation of the likelihood function, i.e. the AML estimator. First, some useful notation is

[The notation definitions are missing from the transcription.]

The non-linearity of the likelihood function in equation (2) is due to the term $\exp(xb')$. A second-order Taylor expansion yields

$\exp(xb') \approx 1 + xb' + \tfrac{1}{2}(xb')^2$

The choice of expanding only up to a second-order term is justified by the small absolute values of the parameters and the small variability of the covariates usually at hand. The approximation is improved by scaling $t$ by a constant, so that $\sum_i x_i b' = 0$. Substituting this approximation into the likelihood function and maximizing with respect to $b$ yields the AML estimator as

[The AML formula is missing from the transcription.]

This estimator takes explicit account of the censored durations, so will perform better than the corresponding OLS estimator. It should be noted that the idea of calculating starting values from a second-order approximation of the likelihood function resembles the idea of the Newton-Raphson method.
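The AML formula itself did not survive in this transcription, so the Python sketch below is a reconstruction under stated assumptions rather than the author's exact estimator. It assumes the exponential log-likelihood sum(d_i x_i b') - sum(t_i exp(x_i b')), with d_i = 1 for an uncensored spell, replaces exp(x_i b') by its second-order expansion, and solves the resulting linear first-order condition, which gives b = (X' diag(t) X)^{-1} X'(d - t). The particular rescaling of t (so that total scaled exposure equals the number of events), the function names `ols_start`, `aml_start` and `exp_loglik`, the simulated data, and the sign convention ln(t) = -x b' + eps with E[eps] = -0.5772 used for the OLS comparison are all assumptions of this sketch.

```python
import numpy as np

def ols_start(X, t):
    """OLS starting vector that ignores censoring.

    Assumed convention: ln(t) = -X b + eps, where eps is the log of a unit
    exponential variable, so E[eps] = -0.5772... (minus Euler's constant).
    """
    coef, *_ = np.linalg.lstsq(X, np.log(t) + np.euler_gamma, rcond=None)
    return -coef

def aml_start(X, t, d):
    """Closed-form approximate ML vector (a reconstruction, see lead-in).

    X: (n, k) covariates with a constant in column 0 (assumed),
    t: (n,) observed or censored durations, d: (n,) 1 if uncensored.
    """
    X, t, d = (np.asarray(a, dtype=float) for a in (X, t, d))
    c = t.sum() / d.sum()      # assumed scaling: exposure per event
    s = t / c                  # scaled durations, so sum(s) == sum(d)
    # First-order condition of the quadratic approximation:
    # (X' diag(s) X) b = X' (d - s)
    b = np.linalg.solve(X.T @ (X * s[:, None]), X.T @ (d - s))
    b[0] -= np.log(c)          # undo the time scaling in the constant
    return b

def exp_loglik(b, X, t, d):
    """Exponential proportional hazards log-likelihood, hazard exp(x b)."""
    eta = X @ b
    return float(np.sum(d * eta - t * np.exp(eta)))

# Simulated, right-censored data (purely illustrative values).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(0.0, 0.5, n), rng.integers(0, 2, n)])
b_true = np.array([-1.0, 0.3, -0.2])
latent = rng.exponential(1.0 / np.exp(X @ b_true))   # true durations
cens = rng.exponential(3.0, n)                       # censoring times
t = np.minimum(latent, cens)
d = (latent <= cens).astype(float)

starts = [("naive", np.zeros(3)), ("OLS", ols_start(X, t)), ("AML", aml_start(X, t, d))]
for name, b0 in starts:
    print(name, np.round(b0, 3), round(exp_loglik(b0, X, t, d), 1))
```

Under these assumptions the reconstructed AML vector coincides with a single Newton-Raphson step from the zero vector taken in rescaled time, which is one way to read the closing remark of the section above.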

3 Example

To illustrate the usefulness of the AML estimator, I compare, with respect to number of iterations and system computing time, starting vectors from methods using AML, OLS and a naive vector of zeros. The starting values are applied to two models previously estimated in a study of unemployment duration (see Carling et al., 1994). The data set is a flow sample of unemployed people, of which 54% are censored spells.

The first model is an exponential model without time-varying covariates. A program is written in GAUSS that makes use of the MAXLIK routine. In this program, no analytical gradients are supplied and the BFGS algorithm is applied, since it is expected to yield the fastest convergence (see Aptech, 1992). From this, 12 parameters are estimated and the results are presented in Table 1.

The second model is a semi-parametric model, in line with Narendranathan and Stewart (1993), including time-varying covariates. As noted earlier, the parameter vector $b$ is estimated simultaneously with $\lambda_0(t)$. It is assumed that $\lambda_0(t)$ is piecewise constant, i.e. within each small time-interval, the hazard is constant, while it varies across time-intervals. The time unit used in this example is 4 weeks and spells exceeding 72 weeks are treated as being right censored. The explicit form of the likelihood function being maximized has been given by Narendranathan and Stewart (1993). In total, 31 parameters are estimated. In this program, which also makes use of MAXLIK, analytical gradients are supplied. For this reason, the BHHH algorithm is chosen. For both models, STEPBT is used as the line search method.

In Table 1, the AML and OLS starting vectors are shown for the exponential and the semi-parametric models, together with the ML estimates.
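The kind of comparison described above can be imitated on simulated data. The sketch below is not the GAUSS/MAXLIK setup of the paper: it swaps in a damped Newton-Raphson iteration written in NumPy for the BFGS and BHHH algorithms, uses an exponential model only, and builds the OLS and AML starting vectors from reconstructed formulas (the AML expression and the OLS sign convention are assumptions, as are all names and data values). Its iteration counts are therefore illustrative and not comparable with Table 1.

```python
import numpy as np

def loglik(b, X, t, d):
    """Exponential proportional hazards log-likelihood, hazard exp(x b)."""
    eta = X @ b
    return float(np.sum(d * eta - t * np.exp(eta)))

def newton(b0, X, t, d, tol=1e-6, max_iter=200):
    """Damped Newton-Raphson; returns (estimate, iterations used)."""
    b = np.asarray(b0, dtype=float).copy()
    for it in range(max_iter):
        w = t * np.exp(X @ b)
        grad = X.T @ (d - w)
        if np.max(np.abs(grad)) < tol * len(t):
            return b, it
        step = np.linalg.solve(X.T @ (X * w[:, None]), grad)
        ll_old, scale = loglik(b, X, t, d), 1.0
        # Halve the step while it lowers the log-likelihood noticeably.
        while loglik(b + scale * step, X, t, d) < ll_old - 1e-9 and scale > 1e-10:
            scale /= 2.0
        b = b + scale * step
    return b, max_iter

# Simulated, right-censored durations (illustrative values only).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(0.0, 0.5, n), rng.integers(0, 2, n)])
latent = rng.exponential(1.0 / np.exp(X @ np.array([-1.0, 0.3, -0.2])))
cens = rng.exponential(3.0, n)
t = np.minimum(latent, cens)
d = (latent <= cens).astype(float)

# Starting vectors: zeros, OLS ignoring censoring, and the reconstructed AML.
c = t.sum() / d.sum()
s = t / c
aml = np.linalg.solve(X.T @ (X * s[:, None]), X.T @ (d - s))
aml[0] -= np.log(c)
ols = -np.linalg.lstsq(X, np.log(t) + np.euler_gamma, rcond=None)[0]

results = {}
for name, b0 in [("naive", np.zeros(3)), ("OLS", ols), ("AML", aml)]:
    results[name] = newton(b0, X, t, d)
    print(name, "iterations:", results[name][1])
```

Because the exponential log-likelihood is concave in b, all three starts should reach the same maximum here; any difference shows up only in the number of iterations, which is the quantity the example in the text reports.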
TABLE 1. ML estimates and starting values
[The numeric entries of the table are missing from the transcription.]
Rows (variables): Constant; Age (three rows, age ranges missing); Female; Foreign citizen; Some work experience; Long work experience; High school; University; Unemployment income; Cash benefits; Local unemployment rate; Local programme offer rate.
Columns: starting vectors and ML estimates for the exponential model and the semi-parametric model.
Lower panel: number of iterations and system time (min) for the naive, OLS and AML starting vectors, for each model.

The AML starting vector does a very good job, while the OLS starting vector does less so. The number of iterations and system time required for estimation of the models make it clear that it is worthwhile to use a more refined starting vector.

4 Conclusions

In this paper, an AML estimator is proposed. It yields better starting values for solving the non-linear likelihood function that arises from the proportional hazard model. The virtue of this estimator is that it is easy to calculate and takes explicit account of censored observations. In an example, this estimator outperforms the OLS and naive starting vectors.

Correspondence: Kenneth Carling, Department of Statistics, Uppsala University, PO Box 513, S Uppsala, Sweden.

REFERENCES

APTECH (1992) GAUSS 3.0 Applications, MAXLIK (Maple Valley, WA, Aptech Systems).
BERGSTRÖM, R. & EDIN, P.-A. (1992) Time aggregation and the distributional shape of unemployment duration, Journal of Applied Econometrics, 7.
BRÄNNÄS, K. (1992) Econometrics of the accelerated duration model, Umeå Economic Studies, 269, University of Umeå.
CARLING, K., EDIN, P.-A., HARKMAN, A. & HOLMLUND, B. (1994) Unemployment duration, unemployment benefits, and labor market programs in Sweden, Journal of Public Economics (in press).
KIEFER, N. M. (1988) Economic duration data and the hazard functions, Journal of Economic Literature, 26.
NARENDRANATHAN, W. & STEWART, M. B. (1993) Modelling the probability of leaving unemployment: competing risks models with flexible base-line hazards, Applied Statistics, 42.
SAS (1990) SAS/STAT User's Guide, Version 6, 4th edn, Vol. 2 (Cary, NC, SAS Institute).


More information

Numerical Analysis An Introduction

Numerical Analysis An Introduction Walter Gautschi Numerical Analysis An Introduction 1997 Birkhauser Boston Basel Berlin CONTENTS PREFACE xi CHAPTER 0. PROLOGUE 1 0.1. Overview 1 0.2. Numerical analysis software 3 0.3. Textbooks and monographs

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Probabilistic user behavior models in online stores for recommender systems

Probabilistic user behavior models in online stores for recommender systems Probabilistic user behavior models in online stores for recommender systems Tomoharu Iwata Abstract Recommender systems are widely used in online stores because they are expected to improve both user

More information

Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring. Jie-Men Mok Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

More information

Maximum likelihood estimation of mean reverting processes

Maximum likelihood estimation of mean reverting processes Maximum likelihood estimation of mean reverting processes José Carlos García Franco Onward, Inc. jcpollo@onwardinc.com Abstract Mean reverting processes are frequently used models in real options. For

More information

Cameron M. Weber a a New School for Social Research, New York, USA. Available online: 25 Oct 2011

Cameron M. Weber a a New School for Social Research, New York, USA. Available online: 25 Oct 2011 This article was downloaded by: [The New School] On: 25 October 2011, At: 09:15 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,

More information

Outline. Generalize Simple Example

Outline. Generalize Simple Example Solving Simultaneous Nonlinear Algebraic Equations Larry Caretto Mechanical Engineering 309 Numerical Analysis of Engineering Systems March 5, 014 Outline Problem Definition of solving simultaneous nonlinear

More information

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions This article was downloaded by: [Lanzhou Institute of Geology] On: 27 February 2013, At: 01:00 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Firm and Product Life Cycles and Firm Survival

Firm and Product Life Cycles and Firm Survival TECHNOLOGICAL CHANGE Firm and Product Life Cycles and Firm Survival By RAJSHREE AGARWAL AND MICHAEL GORT* On average, roughly 5 10 percent of the firms in a given market leave that market over the span

More information

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components An Application of Weibull Analysis to Determine Failure Rates in Automotive Components Jingshu Wu, PhD, PE, Stephen McHenry, Jeffrey Quandt National Highway Traffic Safety Administration (NHTSA) U.S. Department

More information

LONG TERM FOREIGN CURRENCY EXCHANGE RATE PREDICTIONS

LONG TERM FOREIGN CURRENCY EXCHANGE RATE PREDICTIONS LONG TERM FOREIGN CURRENCY EXCHANGE RATE PREDICTIONS The motivation of this work is to predict foreign currency exchange rates between countries using the long term economic performance of the respective

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

The Method of Least Squares

The Method of Least Squares Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this

More information

Online publication date: 20 November 2009

Online publication date: 20 November 2009 This article was downloaded by: [Michigan State University] On: 17 December 2009 Access details: Access Details: [subscription number 908199210] Publisher Routledge Informa Ltd Registered in England and

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS

STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS STATISTICAL ANALYSIS OF SAFETY DATA IN LONG-TERM CLINICAL TRIALS Tailiang Xie, Ping Zhao and Joel Waksman, Wyeth Consumer Healthcare Five Giralda Farms, Madison, NJ 794 KEY WORDS: Safety Data, Adverse

More information

Causes of Inflation in the Iranian Economy

Causes of Inflation in the Iranian Economy Causes of Inflation in the Iranian Economy Hamed Armesh* and Abas Alavi Rad** It is clear that in the nearly last four decades inflation is one of the important problems of Iranian economy. In this study,

More information

Lecture 6: Logistic Regression

Lecture 6: Logistic Regression Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

A random point process model for the score in sport matches

A random point process model for the score in sport matches IMA Journal of Management Mathematics (2009) 20, 121 131 doi:10.1093/imaman/dpn027 Advance Access publication on October 30, 2008 A random point process model for the score in sport matches PETR VOLF Institute

More information

Stress Test Modeling in Cards Portfolios

Stress Test Modeling in Cards Portfolios Stress Test Modeling in Cards Portfolios José J. Canals-Cerdá The views expressed here are my own and not necessarily those of the Federal Reserve or its staff. Credit Card Performance: Charge-off Rate

More information

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

More information

Comparison of resampling method applied to censored data

Comparison of resampling method applied to censored data International Journal of Advanced Statistics and Probability, 2 (2) (2014) 48-55 c Science Publishing Corporation www.sciencepubco.com/index.php/ijasp doi: 10.14419/ijasp.v2i2.2291 Research Paper Comparison

More information

5 Numerical Differentiation

5 Numerical Differentiation D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives

More information

Package depend.truncation

Package depend.truncation Type Package Package depend.truncation May 28, 2015 Title Statistical Inference for Parametric and Semiparametric Models Based on Dependently Truncated Data Version 2.4 Date 2015-05-28 Author Takeshi Emura

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

A Log-Robust Optimization Approach to Portfolio Management

A Log-Robust Optimization Approach to Portfolio Management A Log-Robust Optimization Approach to Portfolio Management Dr. Aurélie Thiele Lehigh University Joint work with Ban Kawas Research partially supported by the National Science Foundation Grant CMMI-0757983

More information

Christfried Webers. Canberra February June 2015

Christfried Webers. Canberra February June 2015 c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

More information

Comparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function

Comparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 89 106. Comparison of sales forecasting models for an innovative

More information

Wes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }].

Wes, Delaram, and Emily MA751. Exercise 4.5. 1 p(x; β) = [1 p(xi ; β)] = 1 p(x. y i [βx i ] log [1 + exp {βx i }]. Wes, Delaram, and Emily MA75 Exercise 4.5 Consider a two-class logistic regression problem with x R. Characterize the maximum-likelihood estimates of the slope and intercept parameter if the sample for

More information

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method 578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after

More information

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models Abstract This paper considers the modeling of claim durations for existing claimants under income

More information

Problem of Missing Data

Problem of Missing Data VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

More information