Lecture 15 Introduction to Survival Analysis



Similar documents
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

SUMAN DUVVURU STAT 567 PROJECT REPORT

Gamma Distribution Fitting

Distribution (Weibull) Fitting

Statistical Analysis of Life Insurance Policy Termination and Survivorship

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Statistics in Retail Finance. Chapter 6: Behavioural models

Parametric Survival Models

7.1 The Hazard and Survival Functions

Introduction. Survival Analysis. Censoring. Plan of Talk

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Duration Analysis. Econometric Analysis. Dr. Keshab Bhattarai. April 4, Hull Univ. Business School

Properties of Future Lifetime Distributions and Estimation

2 Right Censoring and Kaplan-Meier Estimator

Nonparametric adaptive age replacement with a one-cycle criterion

ATV - Lifetime Data Analysis

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Confidence Intervals for Exponential Reliability

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

Survival Analysis: Introduction

5 Modeling Survival Data with Parametric Regression

Measuring prepayment risk: an application to UniCredit Family Financing

Introduction to Survival Analysis

Checking proportionality for Cox s regression model

Competing-risks regression

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Nominal and ordinal logistic regression

R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Reliability estimators for the components of series and parallel systems: The Weibull model

Predicting Customer Default Times using Survival Analysis Methods in SAS

MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables

Lecture 3: Linear methods for classification

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

Introduction to Basic Reliability Statistics. James Wheeler, CMRP

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

More details on the inputs, functionality, and output can be found below.

Life Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD

Introduction to mixed model and missing data issues in longitudinal studies

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

Survival analysis methods in Insurance Applications in car insurance contracts

Course 4 Examination Questions And Illustrative Solutions. November 2000

Goodness of Fit Tests for the Gumbel Distribution with Type II right Censored data

Vignette for survrm2 package: Comparing two survival curves using the restricted mean survival time

Interpretation of Somers D under four simple models

On Marginal Effects in Semiparametric Censored Regression Models

Maximum Likelihood Estimation

BS3b Statistical Lifetime-Models

Statistical Machine Learning

Comparison of resampling method applied to censored data

Operational Risk Modeling Analytics

Survival Analysis Approaches and New Developments using SAS. Jane Lu, AstraZeneca Pharmaceuticals, Wilmington, DE David Shen, Independent Consultant

Regression Modeling Strategies

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

Interaction between quantitative predictors

SAS Software to Fit the Generalized Linear Model

Confidence Intervals for the Difference Between Two Means

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Applied Reliability Page 1 APPLIED RELIABILITY. Techniques for Reliability Analysis

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Detection of changes in variance using binary segmentation and optimal partitioning

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Time series Forecasting using Holt-Winters Exponential Smoothing

Statistics Graduate Courses

Recursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1

Military Reliability Modeling William P. Fox, Steven B. Horton

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

MAINTAINED SYSTEMS. Harry G. Kwatny. Department of Mechanical Engineering & Mechanics Drexel University ENGINEERING RELIABILITY INTRODUCTION

Kaplan-Meier Plot. Time to Event Analysis Diagnostic Plots. Outline. Simulating time to event. The Kaplan-Meier Plot. Visual predictive checks

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March Due:-March 25, 2015.

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Logistic Regression (1/24/13)

Determining distribution parameters from quantiles

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Contributions to extreme-value analysis

Lecture 12: The Black-Scholes Model Steven Skiena. skiena


Principles of Hypothesis Testing for Public Health

1 The Black-Scholes model: extensions and hedging

Poisson Models for Count Data

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Instabilities using Cox PH for forecasting or stress testing loan portfolios

Principle of Data Reduction

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

DURATION ANALYSIS OF FLEET DYNAMICS

Transcription:

Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15

Background In logistic regression, we were interested in studying how risk factors were associated with presence or absence of disease. Sometimes, though, we are interested in how a risk factor or treatment affects time to disease or some other event. Or we may have study dropout, and therefore subjects who we are not sure if they had disease or not. In these cases, logistic regression is not appropriate. Survival analysis is used to analyze data in which the time until the event is of interest. The response is often referred to as a failure time, survival time, or event time. BIOST 515, Lecture 15 1

Examples Time until tumor recurrence Time until cardiovascular death after some treatment intervention Time until AIDS for HIV patients Time until a machine part fails BIOST 515, Lecture 15 2

The survival time response Usually continuous May be incompletely determined for some subjects i.e.- For some subjects we may know that their survival time was at least equal to some time t. Whereas, for other subjects, we will know their exact time of event. Incompletely observed responses are censored Is always 0. BIOST 515, Lecture 15 3

Analysis issues If there is no censoring, standard regression procedures could be used. However, these may be inadequate because Time to event is restricted to be positive and has a skewed distribution. The probability of surviving past a certain point in time may be of more interest than the expected time of event. The hazard function, used for regression in survival analysis, can lend more insight into the failure mechanism than linear regression. BIOST 515, Lecture 15 4

Censoring Censoring is present when we have some information about a subject s event time, but we don t know the exact event time. For the analysis methods we will discuss to be valid, censoring mechanism must be independent of the survival mechanism. There are generally three reasons why censoring might occur: A subject does not experience the event before the study ends A person is lost to follow-up during the study period A person withdraws from the study These are all examples of right-censoring. BIOST 515, Lecture 15 5

Types of right-censoring Fixed type I censoring occurs when a study is designed to end after C years of follow-up. In this case, everyone who does not have an event observed during the course of the study is censored at C years. In random type I censoring, the study is designed to end after C years, but censored subjects do not all have the same censoring time. This is the main type of right-censoring we will be concerned with. In type II censoring, a study ends when there is a prespecified number of events. BIOST 515, Lecture 15 6

Regardless of the type of censoring, we must assume that it is non-informative about the event; that is, the censoring is caused by something other than the impending failure. BIOST 515, Lecture 15 7

Terminology and notation T denotes the response variable, T 0. The survival function is S(t) = P r(t > t) = 1 F (t). S(t) 0.4 0.7 1.0 0.0 0.5 1.0 1.5 2.0 t BIOST 515, Lecture 15 8

The survival function gives the probability that a subject will survive past time t. As t ranges from 0 to, the survival function has the following properties It is non-increasing At time t = 0, S(t) = 1. In other words, the probability of surviving past time 0 is 1. At time t =, S(t) = S( ) = 0. As time goes to infinity, the survival curve goes to 0. In theory, the survival function is smooth. In practice, we observe events on a discrete time scale (days, weeks, etc.). BIOST 515, Lecture 15 9

The hazard function, h(t), is the instantaneous rate at which events occur, given no previous events. h(t) = lim t 0 P r(t < T t + t T > t) t = f(t) S(t). The cumulative hazard describes the accumulated risk up to time t, H(t) = t 0 h(u)du. H(t) 0.5 1.5 2.5 H(t) 0.0 0.4 0.8 0.0 0.5 1.0 1.5 2.0 t 0.0 0.5 1.0 1.5 2.0 t BIOST 515, Lecture 15 10

If we know any one of the functions S(t), H(t), or h(t), we can derive the other two functions. h(t) = log(s(t)) t H(t) = log(s(t)) S(t) = exp( H(t)) BIOST 515, Lecture 15 11

Survival data How do we record and represent survival data with censoring? T i denotes the response for the ith subject. Let C i denote the censoring time for the ith subject Let δ i denote the event indicator δ i = { 1 if the event was observed (Ti C i ) 0 if the response was censored (T i > C i ). The observed response is Y i = min(t i, C i ). BIOST 515, Lecture 15 12

Example T i C i Y i δ i 80 100 80 1 40 80 40 1 74+ 74 74 0 85+ 85 85 0 40 95 40 1 Termination of study BIOST 515, Lecture 15 13

Estimating S(t) and H(t) If we are assuming that every subject follows the same survival function (no covariates or other individual differences), we can easily estimate S(t). We can use nonparametric estimators like the Kaplan-Meier estimator We can estimate the survival distribution by making parametric assumptions exponential Weibull Gamma log-normal BIOST 515, Lecture 15 14

Non-parametric estimation of S When no event times are censored, a non-parametric estimator of S(T ) is 1 F n (t), where F n (t) is the empirical cumulative distribution function. When some observations are censored, we can estimate S(t) using the Kaplan-Meier product-limit estimator. BIOST 515, Lecture 15 15

t No. subjects Deaths Censored Cumulative at risk survival 59 26 1 0 25/26 = 0.962 115 25 1 0 24/25 0.962 = 0.923 156 24 1 0 23/24 0.923 = 0.885 268 23 1 0 22/23 0.885 = 0.846 329 22 1 0 21/23 0.846 = 0.808 353 21 1 0 20/21 0.808 = 0.769 365 20 0 1 20/20 0.769 = 0.769 377 19 0 1 19/19 0.769 = 0.769 421 18 0 1 18/18 0.769 = 0.769 431. 17 1 0 16/17 0.769 = 0.688... BIOST 515, Lecture 15 16

How can we get this in R? > library(survival) > data(ovarian) > S1=Surv(ovarian$futime,ovarian$fustat) > S1 [1] 59 115 156 421+ 431 448+ 464 475 477+ 563 638 744+ [13] 769+ 770+ 803+ 855+ 1040+ 1106+ 1129+ 1206+ 1227+ 268 329 353 [25] 365 377+ BIOST 515, Lecture 15 17

> fit1=survfit(s1) > summary(fit1) Call: survfit(formula = S1) time n.risk n.event survival std.err lower 95% CI upper 95% CI 59 26 1 0.962 0.0377 0.890 1.000 115 25 1 0.923 0.0523 0.826 1.000 156 24 1 0.885 0.0627 0.770 1.000 268 23 1 0.846 0.0708 0.718 0.997 329 22 1 0.808 0.0773 0.670 0.974 353 21 1 0.769 0.0826 0.623 0.949 365 20 1 0.731 0.0870 0.579 0.923 431 17 1 0.688 0.0919 0.529 0.894 464 15 1 0.642 0.0965 0.478 0.862 475 14 1 0.596 0.0999 0.429 0.828 563 12 1 0.546 0.1032 0.377 0.791 638 11 1 0.497 0.1051 0.328 0.752 BIOST 515, Lecture 15 18

>plot(fit1,xlab="t",ylab=expression(hat(s)*"(t)")) S^(t) 0.0 0.2 0.4 0.6 0.8 1.0 0 200 400 600 800 1000 1200 t BIOST 515, Lecture 15 19

Parametric survival functions The Kaplan-Meier estimator is a very useful tool for estimating survival functions. Sometimes, we may want to make more assumptions that allow us to model the data in more detail. By specifying a parametric form for S(t), we can easily compute selected quantiles of the distribution estimate the expected failure time derive a concise equation and smooth function for estimating S(t), H(t) and h(t) estimate S(t) more precisely than KM assuming the parametric form is correct! BIOST 515, Lecture 15 20

Appropriate distributions Some popular distributions for estimating survival curves are Weibull exponential log-normal (log(t ) has a normal distribution) log-logistic BIOST 515, Lecture 15 21

Estimation for parametric S(t) We will use maximum likelihood estimation to estimate the unknown parameters of the parametric distributions. If Y i is uncensored, the ith subject contributes f(y i ) to the likelihood If Y i is censored, the ith subject contributes P r(y > Y i ) to the likelihood. The joint likelihood for all n subjects is L = n f(y i ) n S(Y i ). i:δ i =1 i:δ i =0 BIOST 515, Lecture 15 22

The log-likelihood can be written as log L = n log(h(y i )) n H(Y i ). i:δ i =1 i=1 BIOST 515, Lecture 15 23

Example Let s look at the ovarian data set in the survival library in R. Suppose we assume the time-to-event follows an distribution, where h(t) = λ and S(t) = exp( λt). > s2=survreg(surv(futime, fustat)~1, ovarian, dist= exponential ) > summary(s2) Call: survreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist = "exponential" Value Std. Error z p (Intercept) 7.17 0.289 24.8 3.72e-136 Scale fixed at 1 BIOST 515, Lecture 15 24

Exponential distribution Loglik(model)= -98 Loglik(intercept only)= -98 Number of Newton-Raphson Iterations: 4 n= 26 In the R output, λ = exp( (Intercept)) = exp( 7.17) Therefore, S(t) = exp( exp( 7.17)t). BIOST 515, Lecture 15 25

plot(t,1-pexp(t,exp(-7.169)),xlab="t",ylab=expression(hat(s)*"(t)")) S^(t) 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0 200 400 600 800 1000 1200 t BIOST 515, Lecture 15 26

Example Let s look at the ovarian data set in the survival library in R. Suppose we assume the time-to-event follows a Weibull distribution, where h(t) = αγt γ 1 and S(t) = exp( αt γ ). > s1=survreg(surv(futime, fustat)~1, ovarian, dist= weibull,scale=0) > summary(s1) Call: survreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist = "weibull", scale = 0) Value Std. Error z p (Intercept) 7.111 0.293 24.292 2.36e-130 Log(scale) -0.103 0.254-0.405 6.86e-01 Scale= 0.902 BIOST 515, Lecture 15 27

Weibull distribution Loglik(model)= -98 Loglik(intercept only)= -98 Number of Newton-Raphson Iterations: 5 n= 26 To match the notation above, γ = 1/Scale and α = exp( (Intercept)γ). This gives us the following survival function, S(t) = exp( exp[ 7.111/.902])t 1/.902 ). BIOST 515, Lecture 15 28

plot(t,1-pweibull(t,1/0.902,exp(7.111))) S^(t) 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0 200 400 600 800 1000 1200 t BIOST 515, Lecture 15 29