Competing Risks in Survival Analysis

So far, we've assumed that there is only one survival endpoint of interest, and that censoring is independent of the event of interest. However, in many contexts there can be several different types of failure (death, relapse, opportunistic infection, etc.) that are of interest to us, and the occurrence of one type of failure may (or may not) prevent us from observing the other types of failure. These are the so-called competing risks.

Examples:
- In cardiovascular studies, deaths from other causes (such as cancer) are considered competing risks.
- After a bone marrow transplantation, patients are followed to evaluate leukemia-free survival, so the endpoint is time to leukemia relapse or death, whichever occurs first. This endpoint consists of two types of failures (competing risks):
  - leukemia relapse
  - non-relapse deaths
Another example (competing risks and dependent censoring): In an AIDS study, if the endpoint is time to opportunistic infection (relapse), we may treat both death and loss to follow-up as censoring. But death and infection may be viewed as competing risks; moreover, these two types of events may not be independent (as they would be under independent censoring). So:

T = time to the event of interest (relapse)
U = time to other events (death, loss to follow-up)
$X = \min(T, U)$
$\delta = I(T \le U)$

Observable data: $(X, \delta)$

Possibilities:
(1) Failure T and censoring U are independent
(2) Failure T and censoring U are dependent
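A minimal Python sketch of how the observable data arise from the two latent times. The function name `observe` and the convention that a tie $T = U$ counts as an observed event are illustrative assumptions, not part of the notes.

```python
from typing import List, Tuple


def observe(T: List[float], U: List[float]) -> List[Tuple[float, int]]:
    """Map latent times to observable pairs (X, delta).

    T: times to the event of interest (e.g. relapse)
    U: times to the other events (death, loss to follow-up)
    X = min(T, U), delta = I(T <= U); a tie counts as an observed event
    (an assumed convention for this sketch).
    """
    return [(min(t, u), int(t <= u)) for t, u in zip(T, U)]


# Hypothetical latent times for three subjects: only (X, delta) is ever seen.
print(observe([2.0, 5.0, 1.5], [3.0, 4.0, 1.5]))
```

The second subject's latent relapse time 5.0 never appears in the output: the data only record that no relapse was seen by time 4.0.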
Case (1): Independent failure types (this includes the case of independent censoring)

NO PROBLEM, i.e. we can apply what we have learned so far to each type of event, and treat the other types of events as independent censoring. Let T denote the time to the type of event under consideration (e.g. relapse).
- Nonparametric estimation: In this case, we can use the Kaplan-Meier estimator to estimate $S_T(t) = P(T > t)$.
- Parametric estimation: If we know the distribution has a certain parametric form (exponential, Weibull, log-logistic), then we can use the likelihood for $(X, \delta)$ to get parameter estimates of the marginal survival function $S_T(t)$.
- Semi-parametric estimation: We can apply the Cox regression model to assess the effects of covariates on the marginal hazard.

Here, "marginal" is relative to the joint distribution of the different types of failures.
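Under independence, the nonparametric step is just the ordinary Kaplan-Meier estimator applied to $(X, \delta)$, with the other failure types coded as censored. A minimal pure-Python sketch; the function name and the usual convention that subjects censored at an event time are still counted at risk at that time are assumptions of this illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S_T(t) = P(T > t).

    events[i] = 1 if the event of interest was observed at times[i],
    0 if the subject was censored (including by another failure type
    that is assumed independent). Returns (event time, S_hat) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0        # events of interest at time t
        leaving = 0  # everyone (events + censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4, 5], [1, 0, 1, 1, 0, 1])` steps down to 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5.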
Case (2): Dependent failure types

BIG PROBLEM: this needs different concepts and methodology. Tsiatis (1975) showed that under dependent censoring, $S_T(t) = P(T > t)$ cannot be identified from the data, i.e. there can be two different distributions, $S_1(t)$ and $S_2(t)$, and the data won't be able to tell us which one is the true underlying distribution. See also the Kalbfleisch and Prentice book.

In fact, observing $(X, \delta)$ does not provide enough information to estimate the joint distribution of $(T, U)$, so we can't even check whether the assumption of independence is valid. This is a so-called untestable assumption.

When is it reasonable to assume independent censoring? When censoring occurs because the study ends, or because the subject moves to a different state, and there is no trend over time in the health status of enrolling patients.

In the case of the relapse study, the fact that someone dies may reflect that, had they not died, they would have been at greater risk of relapse than someone else who remained alive at that point.
What is the impact of dependent competing risks?

Slud and Byar (1988) show that dependent causes of death can potentially make risk factors appear protective. That is, suppose we have a single binary covariate, with $Z = 1$ if the risk factor is present and $Z = 0$ otherwise.
True ordering between survival distributions: $S_1(t) < S_0(t)$ for all t.
Kaplan-Meier estimates of survival distributions: $\hat S_1(t) > \hat S_0(t)$ for all t.
General Case of Multiple Failure Types

In general, suppose we have m different types of failure, and the respective times to failure are $T_1, T_2, T_3, \ldots, T_m$. But we observe only $T = \min(T_1, T_2, \ldots, T_m)$, together with the cause J of that failure. $T_1, T_2, \ldots, T_m$ are sometimes called the latent variables.

The approach we describe below is mainly based on cause-specific hazard functions, which focus on the observed failures due to a certain cause, while acknowledging that other types of failure are operating at the same time.

Some of the competing risks methods we describe here require no new software package, but the interpretation of the results is different from the previous cases (e.g. coxph()).
The cause-specific hazard function for the j-th failure type is:
$$\lambda_j(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P(t \le T < t + \Delta t,\, J = j \mid T \ge t)$$
Notice that this is NOT a hazard function for any random variable; it is a new concept specific to competing risks.

The overall hazard of death is the sum over the failure types:
$$\lambda(t) = \sum_{j=1}^m \lambda_j(t), \quad \text{where } \lambda(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P(t \le T < t + \Delta t \mid T \ge t).$$
Compare also with the marginal hazard function (for $T_j$):
$$\tilde\lambda_j(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} P(t \le T_j < t + \Delta t \mid T_j \ge t).$$
The latter two are proper hazard functions.
We can estimate the cause-specific hazard functions, because we observe $T = \min(T_1, T_2, \ldots, T_m)$ and we observe whether the failure is due to a certain cause. We can also estimate $S(t) = S_T(t) = P(T > t)$ for the above T even if the failure times are dependent. But, as already noted, we can't estimate the marginal distributions when the risks are dependent, i.e. we can't estimate $S_j(t) = P(T_j > t)$. (In other words, we can't tell when a person would have died from cancer if he died from a heart attack.) This is the main tricky issue of competing risks analyses.

Prentice (1978) shows that probabilities that can be expressed as functions of the cause-specific hazards can be estimated. Furthermore, these are the only quantities that can be estimated without further assumptions.
For example, estimable quantities include:

(a) The overall survival probability:
$$S(t) = P(T > t) = \exp\left[-\int_0^t \lambda(u)\,du\right] = \exp\left[-\int_0^t \sum_j \lambda_j(u)\,du\right]$$
(b) Conditional probability of failing from cause j in a small interval $(\tau_{i-1}, \tau_i]$:
$$q_{ij} = [S(\tau_{i-1})]^{-1} \int_{\tau_{i-1}}^{\tau_i} \lambda_j(u)\, S(u)\,du$$
(c) Conditional probability of surviving the i-th interval:
$$p_i = 1 - \sum_{j=1}^m q_{ij}$$
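As a worked check of (a) to (c), suppose (purely as an assumption for illustration) that both cause-specific hazards are constant, $\lambda_R = 0.3$ and $\lambda_D = 0.1$ per year. Then $S(t) = e^{-\lambda t}$ with $\lambda = \lambda_R + \lambda_D$, the integral in (b) has a closed form, and $p_i$ should come out equal to $S(\tau_i)/S(\tau_{i-1})$. A small Python sketch with hypothetical function names:

```python
import math
from typing import Dict


def overall_survival(t: float, rates: Dict[str, float]) -> float:
    """S(t) = exp(-int_0^t sum_j lambda_j(u) du) for constant lambda_j."""
    return math.exp(-sum(rates.values()) * t)


def q_interval(j: str, a: float, b: float, rates: Dict[str, float]) -> float:
    """q_j = S(a)^{-1} int_a^b lambda_j(u) S(u) du for constant hazards.

    With constant hazards the integral equals
    (lambda_j / lambda) * (S(a) - S(b)), where lambda = sum_j lambda_j.
    """
    lam = sum(rates.values())
    s_a = overall_survival(a, rates)
    s_b = overall_survival(b, rates)
    return (rates[j] / lam) * (s_a - s_b) / s_a


rates = {"R": 0.3, "D": 0.1}  # hypothetical cause-specific hazards per year
q_R = q_interval("R", 1.0, 2.0, rates)
q_D = q_interval("D", 1.0, 2.0, rates)
p = 1.0 - q_R - q_D  # (c): probability of surviving the interval (1, 2]
```

With these rates $q_R/q_D = \lambda_R/\lambda_D = 3$ in every interval, and $p = e^{-0.4} \approx 0.670$ for the interval (1, 2].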
Estimators:

(a) The MLE of $q_{ij}$ is simply $\hat q_{ij} = d_{ij}/r_i$, i.e. the number of failures (deaths) due to cause j during the i-th interval, divided by the number $r_i$ of subjects at risk of failure at the beginning of the interval.
(b) The MLE of $p_i$ is:
$$\hat p_i = 1 - \sum_{j=1}^m \frac{d_{ij}}{r_i}$$
(c) The MLE of S(t) is based on the $\hat p_i$:
$$\hat S(\tau_i) = \prod_{k=1}^i \hat p_k$$
When there are no ties, this is the same as the KM estimate for T.
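The three estimators translate directly into code. A minimal sketch, assuming interval-level counts are already tabulated; the function name and the counts below are hypothetical. Note that $r_i$ must be supplied for each interval, since the risk set shrinks through both failures and censoring.

```python
from typing import List, Tuple


def life_table(r: List[int], d: List[List[int]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Interval-based MLEs for competing risks.

    r[i]    : number at risk at the start of interval i
    d[i][j] : number of failures from cause j during interval i
    Returns (q_hat, p_hat, S_hat) with
      q_hat[i][j] = d[i][j] / r[i]
      p_hat[i]    = 1 - sum_j d[i][j] / r[i]
      S_hat[i]    = prod_{k <= i} p_hat[k]
    """
    q_hat = [[dij / ri for dij in di] for ri, di in zip(r, d)]
    p_hat = [1.0 - sum(qi) for qi in q_hat]
    s_hat, s = [], 1.0
    for pk in p_hat:
        s *= pk
        s_hat.append(s)
    return q_hat, p_hat, s_hat


# Hypothetical counts: 3 intervals, causes (relapse, death).
q, p, S = life_table([100, 80, 60], [[10, 5], [8, 4], [6, 6]])
```

With these counts, $\hat S$ is 0.85, 0.7225 and 0.578 at the ends of the three intervals.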
Main approaches to competing risks analysis:
1. Cause-specific hazards;
2. Cumulative incidence functions.
They are used in ways similar to the Cox regression model and the KM curve, respectively.

The cause-specific hazard can be estimated discretely in time interval i by $\hat q_{ij} = d_{ij}/r_i$. However, such hazard estimates tend to be highly variable, depending on the grouping intervals. Often one needs to do some smoothing, which we do not go into in this course.

An example of leukemia data from Pepe and Mori (1993): Denote the type of failure by $j = R$ if the event is relapse, and $j = D$ if the event is death without relapse. Graft-versus-host disease (GVHD) can occur after bone marrow transplant.
[Figure from Pepe and Mori for leukemia data: kernel estimates of cause-specific hazards, in two panels (Relapse; Non-relapse death), each comparing No GVHD and GVHD; x-axis: years after transplant.]
Cumulative Incidence Function (CIF; sometimes called crude incidence curve or subdistribution function): This estimates the probability of an event in the setting where other competing risks are acknowledged to exist. The probability that failure type j occurs by time t in this setting is:
$$F_j(t) = P(T \le t,\, J = j) = \int_0^t S(u)\,\lambda_j(u)\,du.$$
This is called a subdistribution function. Note that typically $F_j(\infty) < 1$; in fact, $\sum_{j=1}^m F_j(\infty) = 1$.

The overall survival function (no relapse or death) is then $S(t) = 1 - F_R(t) - F_D(t)$, and $\lambda_j(t) = f_j(t)/S(t)$, where $f_j(t) = F_j'(t)$.

Cumulative incidence curves reflect what proportion of the total study population has had the particular event (e.g. relapse) by time t.

Estimate:
$$\hat F_j(t) = \sum_{i:\, t_i \le t} \frac{d_{ij}}{r_i}\, \hat S(t_{i-1}),$$
where $\hat S(t_{i-1})$ is the overall KM estimate just before the i-th event time $t_i$.
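A minimal pure-Python sketch of the estimator $\hat F_j(t)$ from individual records (time, cause), with cause code 0 meaning censored. The coding scheme and the function name are assumptions for illustration; $\hat S$ is the overall Kaplan-Meier estimate in which a failure of any type counts as an event.

```python
from typing import Dict, List, Tuple


def cumulative_incidence(times: List[float], causes: List[int]) -> Dict[int, List[Tuple[float, float]]]:
    """F_hat_j(t) = sum_{t_i <= t} (d_ij / r_i) * S_hat(just before t_i).

    causes[k] = 0 for a censored observation, otherwise the failure type j.
    Returns, for every cause j, the list of (event time, F_hat_j) steps.
    """
    data = sorted(zip(times, causes))
    kinds = sorted({c for c in causes if c != 0})
    cif: Dict[int, List[Tuple[float, float]]] = {j: [] for j in kinds}
    F = {j: 0.0 for j in kinds}
    n_at_risk, s_prev, i = len(data), 1.0, 0  # s_prev = overall KM before t
    while i < len(data):
        t = data[i][0]
        d = {j: 0 for j in kinds}  # failures by cause at time t
        leaving = 0                # failures plus censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] != 0:
                d[data[i][1]] += 1
            leaving += 1
            i += 1
        d_all = sum(d.values())
        if d_all > 0:
            for j in kinds:
                F[j] += s_prev * d[j] / n_at_risk
                cif[j].append((t, F[j]))
            s_prev *= 1.0 - d_all / n_at_risk  # update overall KM (all causes)
        n_at_risk -= leaving
    return cif
```

On the toy data `times = [1, 2, 3, 4, 5, 6]`, `causes = [1, 2, 0, 1, 2, 1]`, the final estimates are $\hat F_1 = 11/18$ and $\hat F_2 = 7/18$, which add up to $1 - \hat S(6) = 1$.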
[Figure from Pepe and Mori for leukemia data: cumulative incidence curves, in two panels (Relapse: $P_R(t)$; Non-relapse death: $P_{DR}(t)$), each comparing No GVHD and GVHD; x-axis: years after transplant (0 to 6). Original caption: "Figure 3. Marginal probability functions. Cumulative incidence estimates were used."]
There are other approaches used in practice, or proposed in the past, but it has become clear that these are inappropriate, and they are not recommended:
- Use the Kaplan-Meier estimate anyway, $\hat S_{T_1}(t)$
- Report the complement of the KM, $1 - \hat S_{T_1}(t)$
- Use some conditional probability function
- Give upper and lower bounds for the true marginal survival function, in the absence of the competing risk.
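Why the complement of the KM is misleading can be seen in a simple case. Assume (for illustration only) constant cause-specific hazards $\lambda_1$ and $\lambda_2$. Treating cause-2 failures as censored, $1 - \hat S_{T_1}(t)$ estimates $1 - e^{-\lambda_1 t}$, whereas the cumulative incidence is $F_1(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\left(1 - e^{-(\lambda_1 + \lambda_2) t}\right)$. A short Python comparison with hypothetical function names:

```python
import math


def one_minus_km_limit(t: float, lam1: float) -> float:
    """What 1 - KM estimates for cause 1 when cause-2 failures are treated
    as censored: 1 - exp(-int_0^t lambda_1(u) du)."""
    return 1.0 - math.exp(-lam1 * t)


def cif_limit(t: float, lam1: float, lam2: float) -> float:
    """Cumulative incidence F_1(t) = int_0^t S(u) lambda_1 du,
    with S(u) = exp(-(lam1 + lam2) u)."""
    lam = lam1 + lam2
    return (lam1 / lam) * (1.0 - math.exp(-lam * t))


lam1, lam2 = 0.3, 0.1  # hypothetical constant cause-specific hazards
for t in (1.0, 5.0, 20.0):
    print(t, round(one_minus_km_limit(t, lam1), 3), round(cif_limit(t, lam1, lam2), 3))
```

Because $S(u) \le e^{-\lambda_1 u}$, the first quantity is never smaller than the second; as t grows, $1 - \hat S_{T_1}$ heads towards 1 while $F_1(t)$ levels off at $\lambda_1/(\lambda_1 + \lambda_2)$ (0.75 for the rates above).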
[Figure from Pepe and Mori for leukemia data: complement Kaplan-Meier functions, in two panels (Relapse: $1 - \hat S_R(t)$; Non-relapse death), each comparing No GVHD and GVHD; x-axis: years after transplant (0 to 6). Original caption: "Figure 1. Complement Kaplan-Meier functions."]
Regression models

It is straightforward to relate the cause-specific hazard functions to covariates Z(t), using the regression models we have seen before. For example, using the PH model we have
$$\lambda_j(t \mid Z(t)) = \lambda_{0j}(t) \exp\{\beta_j' Z(t)\}, \quad j = 1, \ldots, m.$$
In fitting the model, the (partial) likelihood for all event types factors into a separate likelihood function for each event type. The likelihood function for each event type treats all other types of events as if censored. So we can fit the model with coxph() in R (or any other software for the regular Cox model), one type of event at a time.

A complication arises when we try to estimate (or predict) the corresponding CIF, since
$$F_j(t \mid Z) = \int_0^t S(u \mid Z)\,\lambda_j(u \mid Z)\,du,$$
and $S(t \mid Z)$ depends on the cause-specific hazards of all event types. We therefore need to fit models for all types of events first if we want to estimate $S(t \mid Z)$.
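The CIF complication can be made concrete. Assume, purely for illustration, constant baseline hazards, so that everything has a closed form: $\lambda_k(t \mid z) = \lambda_{0k} e^{\beta_k z}$. Then $S(t \mid z) = e^{-\lambda(z) t}$ with $\lambda(z) = \sum_k \lambda_k(z)$, and $F_j(t \mid z) = \frac{\lambda_j(z)}{\lambda(z)}\left(1 - e^{-\lambda(z) t}\right)$. The hypothetical sketch below shows that a covariate with no effect on the relapse hazard ($\beta_R = 0$) still changes the relapse CIF, because it changes $S(t \mid z)$ through the death hazard.

```python
import math
from typing import Sequence


def cif_constant_cox(t: float, z: float, base: Sequence[float],
                     beta: Sequence[float], j: int) -> float:
    """F_j(t | z) under cause-specific PH models with constant baselines.

    lambda_k(z) = base[k] * exp(beta[k] * z),  lam(z) = sum_k lambda_k(z),
    F_j(t | z) = int_0^t S(u | z) lambda_j(z) du
               = lambda_j(z) / lam(z) * (1 - exp(-lam(z) * t)).
    """
    rates = [b * math.exp(bk * z) for b, bk in zip(base, beta)]
    lam = sum(rates)  # overall hazard = sum of cause-specific hazards
    return rates[j] / lam * (1.0 - math.exp(-lam * t))


base = [0.3, 0.1]  # hypothetical baselines: index 0 = relapse, 1 = death
beta = [0.0, 1.0]  # covariate raises only the death hazard
f_z0 = cif_constant_cox(2.0, 0.0, base, beta, j=0)  # relapse CIF, z = 0
f_z1 = cif_constant_cox(2.0, 1.0, base, beta, j=0)  # relapse CIF, z = 1
print(round(f_z0, 3), round(f_z1, 3))
```

The relapse CIF at t = 2 drops from about 0.41 to about 0.36, although the relapse hazard itself is unchanged. In R, each cause-specific fit is commonly written as `coxph(Surv(time, status == j) ~ z)`, with all other causes coded as non-events; the variable names there are placeholders.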
An alternative model relates the CIF more directly to the covariates (Fine and Gray, 1999). This is referred to as the subdistribution function approach, or the Fine-Gray model. Define the subdistribution hazard function as
$$\bar\lambda_j(t) = -\frac{d}{dt} \log\{1 - F_j(t)\}.$$
The Fine-Gray model puts the PH assumption on the subdistribution hazard:
$$\bar\lambda_j(t \mid Z(t)) = \bar\lambda_{0j}(t) \exp\{\beta_j' Z(t)\}, \quad j = 1, \ldots, m.$$
This approach has gained popularity in recent years, but you cannot use the regular Cox model software to fit these models directly.
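To see how the subdistribution hazard differs from the cause-specific hazard, assume again (for illustration only) constant cause-specific hazards $\lambda_1, \lambda_2$. Writing $f_1 = F_1' = S\lambda_1$, the definition gives $\bar\lambda_1(t) = f_1(t)/\{1 - F_1(t)\}$, which simplifies to $\lambda\lambda_1/(\lambda_2 e^{\lambda t} + \lambda_1)$ with $\lambda = \lambda_1 + \lambda_2$. It starts at $\lambda_1$ and decays towards 0, even though $\lambda_1(t)$ is constant, because subjects who have already failed from cause 2 remain in $1 - F_1(t)$. A Python sketch with a hypothetical function name:

```python
import math


def subdistribution_hazard(t: float, lam1: float, lam2: float) -> float:
    """bar-lambda_1(t) = -d/dt log{1 - F_1(t)} = f_1(t) / (1 - F_1(t))
    when both cause-specific hazards are constant."""
    lam = lam1 + lam2
    F1 = (lam1 / lam) * (1.0 - math.exp(-lam * t))  # CIF of cause 1
    f1 = lam1 * math.exp(-lam * t)                  # density S(t) * lam1
    return f1 / (1.0 - F1)


lam1, lam2 = 0.3, 0.1  # hypothetical constant cause-specific hazards
for t in (0.0, 1.0, 5.0, 20.0):
    # the cause-specific hazard stays at lam1; the subdistribution hazard decays
    print(t, lam1, round(subdistribution_hazard(t, lam1, lam2), 4))
```

So a Fine-Gray coefficient $\beta_j$ describes effects on a different quantity than a cause-specific Cox coefficient, and the two should not be read interchangeably.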
Further points
- Competing risks analyses traditionally are descriptive and treated as secondary analyses.
- Individual competing risks cannot be interpreted in isolation.
- In clinical trials, they can provide useful supplementary information to overall comparisons: a patient deciding among treatment options might be interested in the probabilities of different types of events (subdistributions).
- There is currently debate about whether competing risks analyses should be used as the primary analysis. This depends on the primary endpoint, i.e. whether it should be relapse-free survival (no competing risk) or relapse itself (competing risk), and on their implications in practice.
- Cause-specific hazard and cumulative incidence comparisons give complementary information.