Improved Survival Modeling Using A Piecewise Exponential Approach

Similar documents
SUMAN DUVVURU STAT 567 PROJECT REPORT

More details on the inputs, functionality, and output can be found below.

ATV - Lifetime Data Analysis

Gamma Distribution Fitting

Lecture 15 Introduction to Survival Analysis

Adjuvant Therapy Non Small Cell Lung Cancer. Sunil Nagpal MD Director, Thoracic Oncology Jan 30, 2015

Permutation Tests for Comparing Two Populations

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Basics of Statistical Machine Learning

Predictive Modeling Techniques in Insurance

Bayesian Phylogeny and Measures of Branch Support

Statistical Analysis of Life Insurance Policy Termination and Survivorship

P (B) In statistics, the Bayes theorem is often used in the following way: P (Data Unknown)P (Unknown) P (Data)

Simulation and Lean Six Sigma

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Non Parametric Inference

Interpretation of Somers D under four simple models

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

3.4 Statistical inference for 2 populations based on two samples

Measures of Prognosis. Sukon Kanchanaraksa, PhD Johns Hopkins University

The NCPE has issued a recommendation regarding the use of pertuzumab for this indication. The NCPE does not recommend reimbursement of pertuzumab.

Pricing a Motor Extended Warranty With Limited Usage Cover

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides:

Lecture 19: Conditional Logistic Regression

Statistics in Retail Finance. Chapter 6: Behavioural models

Specification of the Bayesian CRM: Model and Sample Size. Ken Cheung Department of Biostatistics, Columbia University

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Data Analysis, Research Study Design and the IRB

Chapter 3 RANDOM VARIATE GENERATION

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Regression Modeling Strategies

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Statistics Graduate Courses

Generalized Linear Models

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan , Fall 2010

The Normal distribution

LOGNORMAL MODEL FOR STOCK PRICES

Update of a projection software to represent a stock-recruitment relationship using flexible assumptions

Temporal Trends in Demographics and Overall Survival of Non Small-Cell Lung Cancer Patients at Moffitt Cancer Center From 1986 to 2008

CHAPTER 2 Estimating Probabilities

LDA at Work: Deutsche Bank s Approach to Quantifying Operational Risk

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Normality Testing in Excel

Introduction. Survival Analysis. Censoring. Plan of Talk

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

Two-sample inference: Continuous data

Operational Risk Modeling Analytics

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Comparison of resampling method applied to censored data

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

Quantitative Methods for Finance

The Assumption(s) of Normality

Fairfield Public Schools

Parametric Survival Models

Inference of Probability Distributions for Trust and Security applications

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Distribution (Weibull) Fitting

Nonparametric tests these test hypotheses that are not statements about population parameters (e.g.,

Vignette for survrm2 package: Comparing two survival curves using the restricted mean survival time

Java Modules for Time Series Analysis

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

WEIBULL ANALYSIS USING R, IN A NUTSHELL

Probabilistic Methods for Time-Series Analysis

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Sample Size and Power in Clinical Trials

Statistical Rules of Thumb

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Agility, Uncertainty, and Software Project Estimation Todd Little, Landmark Graphics

Randomized trials versus observational studies

Operational Risk Management: Added Value of Advanced Methodologies

Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

The Kaplan-Meier Plot. Olaf M. Glück

CALCULATIONS & STATISTICS

Section 6.2 Definition of Probability

ACTUARIAL MODELING FOR INSURANCE CLAIM SEVERITY IN MOTOR COMPREHENSIVE POLICY USING INDUSTRIAL STATISTICAL DISTRIBUTIONS

Topic 8. Chi Square Tests

Pricing Alternative forms of Commercial Insurance cover

A linear algebraic method for pricing temporary life annuities

Examining credit card consumption pattern

RISKY LOSS DISTRIBUTIONS AND MODELING THE LOSS RESERVE PAY-OUT TAIL

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Bayesian Statistical Analysis in Medical Research

Cool Tools for PROC LOGISTIC

Organizing Your Approach to a Data Analysis

Reliability estimators for the components of series and parallel systems: The Weibull model

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Introduction to Hypothesis Testing OPRE 6301

Extreme Value Modeling for Detection and Attribution of Climate Extremes

ACTUARIAL MATHEMATICS FOR LIFE CONTINGENT RISKS

Inference on the parameters of the Weibull distribution using records

Transcription:

Improved Survival Modeling Using A Piecewise Exponential Approach Gang Han, Michael J. Schell, and Jongphil Kim Biostatistics Core H.Lee Moffitt Cancer Center & Research Institute Joint Statistical Meetings Washington, DC August 03, 2009

Outline of Topics 1 Introduction 2 A motivating example 3 4 Summary and future research

Introduction Models for time-to-event data Non-parametric models, e.g., Product-Limits estimates, Life-Table analysis. Strength: Easy to understand and apply, distribution-free. Weakness: Generally the uncertainty is high, less efficient than proper parametric models. Parametric models, e.g., Exponential, Weibull, Lognormal, Gamma. Strength: Parsimonious, makes use of all the data. Weakness: Use of the parametric models is restricted by the distribution assumption; more effort is needed to ensure parametric modeling.

Introduction Models for time-to-event data Middle ground: Piecewise parametric modeling. Gains most of strengths of both while minimizing the weaknesses.

Introduction Piecewise exponential distribution is a simple and flexible distribution for modeling time to event data. (Kim and Proschan 1991, Qiuo et al., 1999, Demarqui et al. 2008) Challenge determination of the pieces.

A motivating example A non-small cell lung cancer (NSCLC) example Lung cancer has the highest mortality among all types of cancer. Non-small cell lung cancer (NSCLC) is the most common and most severe lung cancer. (NSCLC median overall survival is about 8 to 10 months.) We model a survival data set in a clinical trial at Moffitt Cancer Center for NSCLC patients from recent phase II trials. Sample size N = 178.

A motivating example Results and comparison Using Weibull survival model can result in large bias Figure 1. Plot of KME (blue step function) and Weibull estimates (the continuous curve). The label C indicates censored cases. The estimated survival function from Weibull distribution has the form S(t) = e (t/16.27)0.85.

A motivating example Results and comparison The proposed piecewise exponential model fits the data much better. Figure 2. Plot of KME (the step function) and the proposed piecewise estimate (the continuous curve).

The idea is to integrate three major components: 1 A likelihood ratio (UMPU) test; 2 A backward elimination procedure; 3 An optional pool-adjacent-violators algorithm.

The 1st component of the proposed method A likelihood ratio test (LRT) For exponentially distributed survival times, the total time on test (TTOT) between two events follows Gamma(α, λ); α : the number of events (known); λ : unknown but its MLE is TTOT/α. We attempt to test the equivalence of the scale parameters (λ) of two Gamma distributions.

A likelihood ratio test (LRT) Two independent random variables X 1 Γ(α 1, λ 1 ), X 2 Γ(α 2, λ 2 ); α 1, α 2 are known. Test: H 0 : λ 1 = λ 2 vs. H 1 : λ 1 λ 2. ( ) α1 ( ) α2 The LRT is to reject H 0 if Φ(X 1, X 2 ) = X1 X2 X 1 +X 2 X 1 +X 2 is small. Theorem The above test is the uniformly most powerful unbiased (UMPU) test under certain conditions.

Quantifying the P-value and power Let Y = X 1 X 1 +X 2, then Φ(X 1, X 2 ) = Y α 1(1 Y ) α 2 = h(y ). Facts 1 Y is distributed as Beta(α 1, α 2 ) under H 0. [ 2 function h(y ) is increasing if Y 0, [ ] decreasing if Y α1 α 1 +α 2, 1. So h(y ) < C α iff Y (0, a 1 ) (a 2, 1) where a 1 a 2 and h(a 1 ) = h(a 2 ) = C α. ] α 1 α 1 +α 2 and is 3 p-value: Prob(Y (0, a 1 ) (a 2, 1)) under H 0. 4 power: Prob(Y (0, a 1 ) (a 2, 1)) under H 1.

An example If α 1 = α 2 = 1, then Y uniform(0, 1) under H 0. Size 5% UMPU test rejects H 0 if x 1 /(x 1 + x 2 ) (0, 0.025) (0.975, 1), where (x 1, x 2 ) is a realization of (X 1, X 2 ). 0.025 = 1 39 40 and 0.975 = 40. So we reject λ 1 = λ 2 if one survival is at least 39 times the other one.

The 2nd component of the proposed method Backward elimination (BE) 1 Start with hazard rate changes for each distinct failure time. 2 Combine the two adjacent hazards with the highest p-value from the LRT. 3 Repeat until the p-value is below some critical naive p-value α. 4 α can be found using a Monte Carlo approach (Schell and Singh, 1997) in order to achieve an α level test.

The 3rd component of the proposed method An optional preprocessing: The pool-adjacent-violators algorithm (PAVA) for decreasing failure rate functions Robertson et al. (1988), Order restricted statistical inference. If the hazard function is known to be non-increasing, one can pool the adjacent cases where the non-increasing assumption is violated (Robertson et al.). Then the BE approach can be used. PAVA might lead to a different solution.

Estimation and confidence interval(ci) Suppose t has a piecewise exponential distribution with pieces divided by times 0 = t 0 < t 1 < t 2 < < t p and model parameters are λ = (λ 1, λ 2,..., λ p+1 ). Define E 0 = 1 and E j = jk=1 e (t k t k 1 )/λ k. S(t λ) = E i 1 e (t t i 1 )/λ i for all t [t i 1, t i ); E(t λ) = λ 1 + p i=1 E i (λ i+1 λ i ); Median(t λ) = t m + λ m+1 log(2e m) for E m 0.5 and E m+1 0.5. Assuming that the t i s are known, we let λ MLE of λ; CIs of λ have been derived in (Epstein and Sobel, 1953); Ŝ(t λ) S(t λ); we estimate E(t λ) and Median(t λ) in the same way. Similarly, CIs for S(t λ), E(t λ), and Median(t λ) build on the CIs of λ.

Demo of our method in the NSCLC example We use PAVA with non-increasing hazard rate assumption because the risk of NSCLC cancer is believed to decrease over time. There were 141 distinct failure times in total; 6 times left after PAVA: (12.7, 18.6, 23.5, 24.7, 28.0, 47.9).

Demo of our method in the NSCLC example BE 1st iteration (p = 6): t 0,..., t p 0 12.7 18.6 23.5 24.7 28.0 47.9 λ 1,..., λ p+1 14.8 15.7 19.9 20.6 24.9 61.7 82.5 n 1,..., n p+1 112 22 10 2 4 6 1 p-value for t 1,..., t p 0.809 0.529 0.966 0.840 0.191 0.798 BE 2nd iteration (p = 5): t 0,..., t p 0 12.7 18.6 24.7 28.0 47.9 λ 1,..., λ p+1 14.8 15.7 20.0 24.9 61.7 82.7 p-value for t 1,..., t p 0.809 0.493 0.707 0.191 0.798 BE 3rd iteration (p = 4): t 0,..., t p 0 18.6 24.7 28.0 47.9 λ 1,..., λ p+1 15.0 20.0 24.9 61.7 82.5 p-value for t 1,..., t p 0.317 0.707 0.191 0.798

Demo of our method in the NSCLC example BE 4th iteration (p = 3): t 0,..., t p 0 18.6 24.7 28.0 λ 1,..., λ p+1 15.0 20.0 24.9 64.6 p-value for t 1,..., t p 0.317 0.707 0.162 BE 5th iteration (p = 2): BE 6th iteration (p = 1): t 0,..., t p 0 18.6 28.0 λ 1,..., λ p+1 15.0 21.3 64.6 p-value for t 1,..., t p 0.167 0.011 t 0,..., t p 0 28.0 λ 1,..., λ p+1 15.6 64.6 n 1,..., n p+1 150 7 p-value for t 1,..., t p 0.000

Demo of our method in the NSCLC example ( λ 1, λ 2 ) = (15.6, 64.6); t 1 = 28.0. S(t 1 λ 1 ) 0.17 < 1/2. Thus the estimated median survival: 10.8 = log2 λ 1 ; 95% CI of the median survival: (9.6, 12.4).

Summary and future research Summary and future research By integrating the three components, the proposed method can greatly improve survival modeling. Four of the future research topics are More generalized piecewise models (e.g., piecewise Gompertz model); Instead of PAVA, using umbrella alternatives for certain data sets (e.g., breast cancer data); Obtain α formulae for different censoring rates; Goodness-of-fit quantitative assessments of models.