Lecture 4: Bivariate and Multivariate Failure Times

In many survival studies, failure times may be clustered, matched, or repeatedly measured, and are therefore likely to be correlated. The choice of a statistical model depends on the correlation structure of the failure time measurements, and there are numerous ways to model the dependence in correlated survival data.

Example. Several individuals in one cluster share observed or unobserved risk factors, e.g., the onset ages of a certain disease (cancer) for siblings.

Example. One individual experiences multiple failure events, e.g., failure times of the left and right eyes, or failure times of the two kidneys.

4.1 Bivariate failure times: Several dependence measures

In studies involving bivariate events, interest may focus on the marginal effects of treatment (exposure) or on the dependence structure between the two failure times. For example, in familial studies, investigators may be interested in the strength of disease aggregation, say, hypothesizing greater heritability for early-onset than for later-onset disease.

Dependence measures

Correlation coefficient. The traditional way of evaluating dependence between two random variables is the (Pearson) correlation coefficient:

$$\mathrm{Corr}[T_1, T_2] = \frac{\mathrm{Cov}[T_1, T_2]}{\sqrt{\mathrm{Var}[T_1]\,\mathrm{Var}[T_2]}} = \frac{\mathrm{Cov}[T_1, T_2]}{\mathrm{SD}[T_1]\,\mathrm{SD}[T_2]},$$

where $\mathrm{Cov}[T_1, T_2] = E[(T_1 - E[T_1])(T_2 - E[T_2])]$. Note that $-1 \le \mathrm{Corr}[T_1, T_2] \le 1$.

Kendall's tau (coefficient of concordance). For failure time data, it is natural to demand a rank-based measure that is invariant under both linear and nonlinear monotone changes of the time scale. One easily interpreted measure with this rank-invariance property is Kendall's tau (coefficient of concordance),

$$\tau = E[\mathrm{sign}\{(T_{11} - T_{21})(T_{12} - T_{22})\}],$$

where $(T_{11}, T_{12})$ and $(T_{21}, T_{22})$ are two independent copies of the pair $(T_1, T_2)$. A more transparent formulation for continuous failure times is

$$\tau = P\{(T_{11} - T_{21})(T_{12} - T_{22}) > 0\} - P\{(T_{11} - T_{21})(T_{12} - T_{22}) < 0\} = 2P\{(T_{11} - T_{21})(T_{12} - T_{22}) > 0\} - 1.$$

Kendall's tau is a measure of rank correlation for bivariate dependence. If the agreement between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value 1. If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value -1. If $T_1$ and $T_2$ are independent, then $\tau = 0$.
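The definition of Kendall's tau can be illustrated with a small sample version that counts concordant and discordant pairs. This is only a sketch: the function name `kendall_tau` and the data values are made up for illustration, and the sample is assumed to have no ties and no censoring.

```python
import math
from itertools import combinations

def kendall_tau(pairs):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs.

    `pairs` is a list of (t1, t2) tuples, assumed free of ties.
    """
    concordant = discordant = 0
    for (a1, a2), (b1, b2) in combinations(pairs, 2):
        product = (a1 - b1) * (a2 - b2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical paired failure times (illustrative values only).
times = [(1.2, 0.8), (2.5, 3.1), (0.7, 1.9), (4.0, 3.6), (3.3, 2.2)]
tau_raw = kendall_tau(times)
# A nonlinear monotone change of scale keeps the ranks, so tau is unchanged.
tau_transformed = kendall_tau([(math.log(t1), t2 ** 3) for t1, t2 in times])
print(tau_raw, tau_transformed)
```

Both calls return the same value, which is the rank-invariance property that the Pearson correlation does not share.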

Cross Ratio

In familial examples, researchers tend to believe that genetic influences may exist only at early ages. Global measures, such as Kendall's tau, are not ideal for addressing the concepts of early versus late dependence. To address the question of local dependence, we need measures that evaluate dependence at a single time point, such as the cross ratio. For continuous $(T_1, T_2)$, define the bivariate hazard function $\lambda(t_1, t_2) = f(t_1, t_2)/S(t_1, t_2)$. The cross ratio at $(t_1, t_2)$ is defined as

$$\theta(t_1, t_2) = \frac{\lambda(t_1 \mid T_2 = t_2)}{\lambda(t_1 \mid T_2 > t_2)} = \frac{f(t_1, t_2) \Big/ \left\{-\frac{\partial}{\partial s_2} S(t_1, s_2)\big|_{s_2 = t_2}\right\}}{\left\{-\frac{\partial}{\partial s_1} S(s_1, t_2)\big|_{s_1 = t_1}\right\} \Big/ S(t_1, t_2)} = \frac{S(t_1, t_2)\, f(t_1, t_2)}{\frac{\partial}{\partial s_1} S(s_1, t_2)\big|_{s_1 = t_1} \, \frac{\partial}{\partial s_2} S(t_1, s_2)\big|_{s_2 = t_2}}.$$

The cross ratio $\theta(t_1, t_2)$ is interpreted as the ratio of one's failure risk at time $t_1$ if his/her partner is known to have failed at time $t_2$ versus survived beyond time $t_2$. The cross ratio measures the degree of dependence between $T_1$ and $T_2$, where independence is implied by $\theta(t_1, t_2) = 1$. When two failure times are exchangeable, such as the failure times from (identical) twins, the cross ratio is symmetric with respect to the two components; that is, the cross ratio for $(T_1, T_2)$ is the same as the cross ratio for $(T_2, T_1)$.
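As a numerical illustration of the definition, the cross ratio can be approximated from any smooth joint survival function by finite differences. The sketch below is an assumption-laden example, not part of the lecture: the helper `cross_ratio`, the step size `h`, and the exponential rates are all hypothetical choices. It checks the independence case, where the cross ratio should equal 1.

```python
import math

def cross_ratio(S, t1, t2, h=1e-4):
    """Approximate theta(t1, t2) = S * f / (dS/dt1 * dS/dt2) by central differences."""
    s = S(t1, t2)
    d1 = (S(t1 + h, t2) - S(t1 - h, t2)) / (2 * h)
    d2 = (S(t1, t2 + h) - S(t1, t2 - h)) / (2 * h)
    # The mixed second derivative of S equals the joint density f(t1, t2).
    f = (S(t1 + h, t2 + h) - S(t1 + h, t2 - h)
         - S(t1 - h, t2 + h) + S(t1 - h, t2 - h)) / (4 * h * h)
    return s * f / (d1 * d2)

def independent_exponentials(t1, t2):
    # Hypothetical example: independent exponential times with rates 0.5 and 2.
    return math.exp(-0.5 * t1) * math.exp(-2.0 * t2)

theta = cross_ratio(independent_exponentials, 1.0, 0.3)
print(theta)  # close to 1 under independence
```

The same function can be evaluated at several points $(t_1, t_2)$ to see whether dependence is stronger at early or late times for a given model.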

4.2 Nonparametric Estimation for Bivariate Distribution

Nonparametric estimation of a bivariate distribution is mathematically more complicated than the Kaplan-Meier estimator for univariate survival data. Popular approaches include those of Dabrowska (1988) and of Prentice and Cai, among others. Dabrowska extended the univariate Kaplan-Meier approach by defining a bivariate hazard function and estimating the joint distribution via that hazard function. Prentice and Cai did not define a joint hazard. Instead, they gave a representation of the bivariate survival function in terms of the marginal survival functions and the covariance between the counting process martingales of the two components.

Remark: The Dabrowska estimator is discussed in the appendix (optional reading).

4.3 Copula Model for Bivariate Failure Times

One of the earliest families of distributions for correlated bivariate measurements is the copula family, in which the marginal distributions are uniform on the unit interval. The copula family includes many popular bivariate failure time models and has gained considerable attention in the statistical literature because of its flexibility in modelling.

In statistics, a copula is used as a general way of formulating a multivariate distribution so that various general types of dependence can be represented. The approach is based on the idea that a simple transformation can be made of each marginal variable so that each transformed marginal variable has a uniform distribution. Once this is done, the dependence structure can be expressed as a multivariate distribution on the resulting uniforms, and a copula is precisely a multivariate distribution on marginally uniform random variables.

Suppose that $C(u_1, u_2)$ is a joint survival function on $[0, 1]^2$ with uniform marginals and density $c(u_1, u_2)$, so that $C : [0, 1] \times [0, 1] \to [0, 1]$. Let $(T_1, T_2)$ denote the paired failure times, and let $(S_1, S_2)$ and $(f_1, f_2)$ denote the corresponding marginal survival and density functions. Then the joint survival function of $(T_1, T_2)$ in the copula family is given by

$$S(t_1, t_2) = C\{S_1(t_1), S_2(t_2)\}.$$

Archimedean copula (AC) model. The survival function in this subclass has the form

$$S(t_1, t_2) = \phi\left[\phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\}\right],$$

where $0 \le \phi \le 1$, $\phi(0) = 1$, $\phi' < 0$, $\phi'' > 0$ (a convex decreasing function). If $\phi$ is the Laplace transform of some distribution (of $W$), $\phi(t) = E(e^{-tW})$, the AC model reduces to the proportional frailty model.

Copula models are specified by the marginal distributions together with the copula. This two-step approach to modelling is convenient because many tractable models are readily available for the marginal distributions. Copula models are also a natural way to describe dependence. Commonly used copula models include Clayton's family, Frank's family, and the positive stable copula (Clayton, 1978; Hougaard, 1986; Frank, 1979).
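The two-step construction (marginals first, then the generator $\phi$) can be sketched in a few lines. The helper name `archimedean_survival` and the two marginal survival functions below are hypothetical choices made for illustration. The generator used here, $\phi(u) = e^{-u}$, is the Laplace transform of a frailty fixed at $W = 1$, so the resulting joint survival function should reduce to the product of the marginals.

```python
import math

def archimedean_survival(phi, phi_inv, S1, S2):
    """Return S(t1, t2) = phi(phi_inv(S1(t1)) + phi_inv(S2(t2)))."""
    def S(t1, t2):
        return phi(phi_inv(S1(t1)) + phi_inv(S2(t2)))
    return S

# Hypothetical marginals: exponential with rate 0.5 and Weibull with shape 2.
S1 = lambda t: math.exp(-0.5 * t)
S2 = lambda t: math.exp(-t ** 2)

# phi(u) = exp(-u): Laplace transform of a degenerate frailty W = 1 (independence).
S_indep = archimedean_survival(lambda u: math.exp(-u),
                               lambda s: -math.log(s), S1, S2)
print(S_indep(1.0, 0.5), S1(1.0) * S2(0.5))  # the two values agree
```

Swapping in a different pair `phi`, `phi_inv` changes the dependence structure while leaving the marginals `S1` and `S2` untouched.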

4.4 Frailty models for multivariate failure times

A commonly used approach to modelling multivariate failure times, the frailty model, specifies independence among the failure times conditional on an unobserved positive-valued variable $W$, called the frailty. Assume that the hazard function of $T_{ij}$ given $W_i = w$ (frailty) is

$$\lambda_j(t_j \mid W_i = w) = w\,\lambda_{0j}(t_j),$$

which is a proportional frailty model with baseline hazard function $\lambda_{0j}(\cdot)$. Let $B_j(\cdot)$ be the survival function corresponding to $\lambda_{0j}(\cdot)$.

Univariate inference. Conditioning on $W_i = w$, the survival function of $T_{ij}$ is $\{B_j(t_j)\}^w$, and the survival function of $(T_{i1}, \ldots, T_{im})$ is

$$S(\mathbf{t} \mid W_i = w) = \prod_{j=1}^{m} \{B_j(t_j)\}^w, \quad \mathbf{t} = (t_1, \ldots, t_m).$$

The unconditional survival function of $T_{ij}$ is

$$S_j(t_j) = \phi\{-\log B_j(t_j)\},$$

where $\phi(\cdot)$ is the Laplace transform of the random variable $W_i$, i.e., $\phi(t) = E(e^{-tW_i})$.

Conditioning on $W_i = w$, the hazard function of $T_{ij}$ is $\lambda_j(t_j \mid W_i = w) = w\,\lambda_{0j}(t_j)$. By extending the proportional hazards model, the more general proportional frailty model can be expressed as, for $j = 1, \ldots, m$,

$$\lambda_j(t_j; x_{ij}, w_i) = w_i\,\lambda_{0j}(t_j)\exp(\beta x_{ij}).$$

Bivariate inference. The bivariate survival function satisfies

$$S(t_1, t_2) = \int \{B_1(t_1) B_2(t_2)\}^w \, dF_W(w),$$

where $F_W$ denotes the distribution function of the frailty $W$. It follows that

$$S(t_1, t_2) = \phi\{-\log B_1(t_1) - \log B_2(t_2)\},$$

where $\phi(\cdot)$ is the Laplace transform of the random variable $W$. Inferential procedures for frailty models have focused exclusively on likelihood-based approaches.
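The Laplace transform formula for the bivariate survival function can be checked by simulation. The sketch below makes several assumptions that are not in the text above: the frailty is taken to be gamma with mean 1 and variance `alpha` (so that $\phi(u) = (1 + \alpha u)^{-1/\alpha}$), the baseline hazards are constant, and all numerical values and function names are illustrative.

```python
import math
import random

def simulate_joint_survival(alpha, rate1, rate2, t1, t2, n=100_000, seed=1):
    """Monte Carlo estimate of P(T1 > t1, T2 > t2) under a shared gamma frailty."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        # Gamma frailty with mean 1 and variance alpha: shape 1/alpha, scale alpha.
        w = rng.gammavariate(1.0 / alpha, alpha)
        # Given W = w, the times are independent exponentials with hazards w * rate_j.
        x1 = rng.expovariate(w * rate1)
        x2 = rng.expovariate(w * rate2)
        if x1 > t1 and x2 > t2:
            count += 1
    return count / n

def laplace_gamma(u, alpha):
    """Laplace transform E[exp(-u W)] of that gamma frailty."""
    return (1.0 + alpha * u) ** (-1.0 / alpha)

alpha, rate1, rate2, t1, t2 = 0.8, 0.5, 1.0, 1.0, 0.7
# With constant baseline hazards, -log B_j(t_j) = rate_j * t_j.
exact = laplace_gamma(rate1 * t1 + rate2 * t2, alpha)
approx = simulate_joint_survival(alpha, rate1, rate2, t1, t2)
print(exact, approx)
```

The simulated proportion should agree with the closed form up to Monte Carlo error, which is one way to sanity-check a frailty specification before fitting it.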

Bivariate distributions generated by frailty models are a subclass of the Archimedean distributions (Genest and MacKay, 1989, American Statistician), provided that $\phi(u)$ is a Laplace transform. With $B_j(t_j) = \exp[-\phi^{-1}\{S_j(t_j)\}]$, the bivariate distribution can be written as

$$S(t_1, t_2) = \int \prod_{j=1}^{2} \exp[-w\,\phi^{-1}\{S_j(t_j)\}] \, dF_W(w) = \phi\left[\phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\}\right].$$

However, the converse is false: an Archimedean distribution need not arise from a frailty model, because there is no guarantee that its generator $\phi(u)$ is a Laplace transform.

Example 1: Gamma frailty models (Clayton model)

In this model, the frailty $W$ follows a gamma distribution with expectation 1 and variance $\alpha > 0$. The corresponding Laplace transform is $\phi(u) = (1 + \alpha u)^{-1/\alpha}$. The failure times $(T_1, T_2)$ are positively correlated when $\alpha > 0$ and become independent in the limit $\alpha \to 0$. The joint survival function can be written as

$$S(t_1, t_2) = \left[S_1(t_1)^{-\alpha} + S_2(t_2)^{-\alpha} - 1\right]^{-1/\alpha}.$$
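For readers who want the intermediate step, the Clayton form follows directly from the Archimedean representation. The short derivation below takes the Laplace transform of a gamma frailty with mean 1 and variance $\alpha$ to be $\phi(u) = (1 + \alpha u)^{-1/\alpha}$ and inverts it.

```latex
\begin{align*}
\phi(u) = (1 + \alpha u)^{-1/\alpha}
  \quad &\Longrightarrow \quad
  \phi^{-1}(s) = \frac{s^{-\alpha} - 1}{\alpha}, \\
\phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\}
  &= \frac{S_1(t_1)^{-\alpha} + S_2(t_2)^{-\alpha} - 2}{\alpha}, \\
S(t_1, t_2)
  = \phi\left[\phi^{-1}\{S_1(t_1)\} + \phi^{-1}\{S_2(t_2)\}\right]
  &= \left[S_1(t_1)^{-\alpha} + S_2(t_2)^{-\alpha} - 1\right]^{-1/\alpha}.
\end{align*}
```

For this model the dependence measures of Section 4.1 take simple forms: the cross ratio is constant, $\theta(t_1, t_2) = 1 + \alpha$, and Kendall's tau is $\tau = \alpha/(\alpha + 2)$.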

Example 2: Stable frailty models

Hougaard (1986) proposed a class of multivariate models in which the frailty $W$ follows a positive stable distribution with parameter $a$, so that the Laplace transform is

$$\phi(u) = \exp(-u^{a}), \quad 0 < a < 1.$$

The corresponding joint survival function is

$$S(t_1, t_2) = \exp\left(-\left[\{-\log S_1(t_1)\}^{1/a} + \{-\log S_2(t_2)\}^{1/a}\right]^{a}\right).$$

A notable property of the stable frailty model is that if the conditional hazards are proportional, then the hazards in the marginal distributions are also proportional, but with different baseline hazards and regression coefficients.
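The proportionality property can be made explicit with a short calculation. The derivation below is a sketch for a single component with a scalar covariate $x$, writing $\Lambda_0(t)$ for the cumulative baseline hazard; these symbols are introduced here for illustration.

```latex
\begin{align*}
\lambda(t \mid x, w) &= w\,\lambda_0(t)\,e^{\beta x}
  \quad \Longrightarrow \quad
  S(t \mid x, w) = \exp\{-w\,\Lambda_0(t)\,e^{\beta x}\}, \\
S(t \mid x) &= E\left[\exp\{-W \Lambda_0(t) e^{\beta x}\}\right]
  = \phi\{\Lambda_0(t) e^{\beta x}\}
  = \exp\{-\Lambda_0(t)^{a}\, e^{a \beta x}\}, \\
\lambda(t \mid x) &= -\frac{\partial}{\partial t} \log S(t \mid x)
  = a\,\Lambda_0(t)^{a - 1} \lambda_0(t)\, e^{a \beta x}.
\end{align*}
```

The marginal hazard is again of proportional hazards form, with baseline cumulative hazard $\Lambda_0(t)^{a}$ and regression coefficient $a\beta$, which is attenuated toward zero because $0 < a < 1$.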

4.5 Regression models and methods

Two regression models are commonly used for clustered survival data to account for intra-cluster association: frailty models and marginal models. In frailty models, the dependence structure is explicitly specified through unobserved random quantities, the frailties, that are common to observations from the same cluster. Marginal models, in contrast, model the marginal failure time distribution and leave the structure of the intra-cluster association unspecified, but adjust for it in the inference.

Marginal regression models. The marginal model allows different baseline hazards and different regression parameters for different strata, given the covariates (Wei et al., JASA 1989). The marginal proportional hazards model, in which the event history $N(t)$ is not part of the conditioning information, is

$$\lambda(t \mid X(t)) = \lambda_0(t)\exp\{X(t)\beta\}.$$

The marginal model is generally well suited to identifying treatment effects and risk factors. Wei et al. adopt the usual estimation procedure to estimate the regression parameter $\beta$ under the independent censoring assumption; that is, the usual risk-set-based estimating equations are used to estimate $\beta$. A robust sandwich estimator derived from the estimating equations is used to estimate the variance-covariance matrix of $\hat{\beta}$.

Frailty regression models. A frailty model typically assumes that, conditioning on possibly time-dependent covariates $X = x$ and frailty $Z = z$, $T_j$ is distributed with pdf $f_j(y \mid x, z, \theta)$, $\theta \in \Theta$. Denote the observed data in the sample by $\{(Y_{ij}, \delta_{ij}, X_{ij}) : j = 1, 2, \ldots, m_i,\ i = 1, 2, \ldots, n\}$, where $\delta_{ij}$ is a censoring indicator, $Y_{ij} = T_{ij}$ if $\delta_{ij} = 1$, and $Y_{ij} = C_{ij}$ if $\delta_{ij} = 0$. Assume that the cluster-specific latent variable $Z_i$ is distributed with pdf $h(z; \gamma)$, $\gamma \in \Gamma$, and that the censoring time $C_{ij}$ is independent of $(T_{ij}, Z_i)$ conditioning on $X_{ij} = x_{ij}$. Further assume that the cluster size $M_i$ is independent of $(T_{ij}, C_{ij}, X_{ij}, Z_i)$. The likelihood function can be expressed as

$$L = \prod_{i=1}^{n} \int \prod_{j=1}^{m_i} f_j(y_{ij} \mid x_{ij}, z_i; \theta)^{\delta_{ij}}\, S_j(y_{ij} \mid x_{ij}, z_i; \theta)^{1 - \delta_{ij}}\, h(z_i; \gamma)\, dz_i.$$

Remarks:
- In the literature, the EM algorithm has been used as the tool for identifying the MLE $\hat{\beta}$. Convergence of the EM algorithm may be slow, depending on the amount of information in the survival data, the censoring pattern, and the choice of starting values.
- In some cases, the EM algorithm does not converge at all.

Assuming the frailty is gamma distributed, we can construct the likelihood function for the model. If we assume a parametric form for the baseline hazard function, then directly maximizing that likelihood function yields the maximum likelihood estimates. Inverting the resulting information matrix gives estimates of the variability of the parameter estimates. If the baseline hazard function is not parameterized, however, the EM algorithm gives the semiparametric estimates. There are various problems in obtaining variance estimates for the frailty parameter, the regression coefficients, and the cumulative baseline hazards using the observed nonparametric information matrix from a shared gamma frailty model (Andersen et al., 1997, Biometrics).
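Under a gamma frailty the integral over the frailty in the likelihood has a closed form, which is what makes direct maximization feasible. The sketch below is a minimal illustration under stated assumptions: the frailty has mean 1 and variance `alpha`, the conditional hazard of member $j$ is $w\,h_j$ with cumulative hazard $w\,H_j$ at the observed time, and the function names and numerical inputs are hypothetical. It also shows the posterior mean of the frailty, the quantity an EM E-step would need.

```python
import math

def cluster_likelihood(hazards, cum_hazards, events, alpha):
    """Marginal likelihood contribution of one cluster under a shared gamma frailty.

    hazards[j] and cum_hazards[j] are the hazard and cumulative hazard of member j
    at its observed time with the frailty factored out; events[j] is the
    censoring indicator (1 = failure observed, 0 = censored).
    """
    k = 1.0 / alpha                      # gamma shape; the scale is alpha
    d = sum(events)                      # number of observed failures in the cluster
    H = sum(cum_hazards)                 # total cumulative hazard in the cluster
    hazard_term = math.prod(h for h, e in zip(hazards, events) if e == 1)
    # Integral of w^d * exp(-w * H) against the gamma density, on the log scale.
    log_frailty_term = (math.lgamma(k + d) - math.lgamma(k)
                        + d * math.log(alpha) - (k + d) * math.log1p(alpha * H))
    return hazard_term * math.exp(log_frailty_term)

def posterior_mean_frailty(cum_hazards, events, alpha):
    """E[W | cluster data]: the conditional distribution is again gamma."""
    return (1.0 / alpha + sum(events)) / (1.0 / alpha + sum(cum_hazards))

# Hypothetical cluster: exponential baseline with rate 0.3, one failure at time 1.0
# and one censored observation at time 2.0.
hazards, cum_hazards, events = [0.3, 0.3], [0.3, 0.6], [1, 0]
lik = cluster_likelihood(hazards, cum_hazards, events, alpha=0.5)
posterior = posterior_mean_frailty(cum_hazards, events, alpha=0.5)
print(lik, posterior)
```

The full likelihood is the product of such cluster contributions. With a parametric baseline it can be passed to a general-purpose optimizer, while the semiparametric case is where the EM iterations discussed above come in.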

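Appendix

Dabrowska Estimator (optional reading). The Dabrowska estimator (1988) overcomes some undesirable features of previously developed estimators, such as non-uniqueness, inconsistency, and lack of weak convergence theory. It was found that the bivariate hazard function $\lambda(t_1, t_2) = f(t_1, t_2)/S(t_1, t_2)$ is insufficient to determine the survival function in the bivariate case. The relation between the bivariate survival and hazard functions is

$$\frac{\partial^2}{\partial t_1 \partial t_2} \log S(t_1, t_2) = \lambda(t_1, t_2) - \left\{\frac{\partial}{\partial t_1} \log S(t_1, t_2)\right\}\left\{\frac{\partial}{\partial t_2} \log S(t_1, t_2)\right\},$$

or equivalently, in differential notation,

$$\log S(dt_1, dt_2) = \lambda(t_1, t_2)\,dt_1\,dt_2 - \{\log S(dt_1, t_2)\}\{\log S(t_1, dt_2)\} = \frac{f(t_1, t_2)\,dt_1\,dt_2}{S(t_1, t_2)} - \frac{S(dt_1, t_2)}{S(t_1, t_2)} \cdot \frac{S(t_1, dt_2)}{S(t_1, t_2)},$$

where $\log S(dt_1, dt_2) = \frac{\partial^2}{\partial t_1 \partial t_2} \log S(t_1, t_2)\,dt_1\,dt_2$, $\log S(dt_1, t_2) = \frac{\partial}{\partial t_1} \log S(t_1, t_2)\,dt_1$, $S(dt_1, t_2) = \frac{\partial}{\partial t_1} S(t_1, t_2)\,dt_1$, and similarly for the second argument.

Moreover, since $S(0, 0) = 1$,

$$\begin{aligned}
&\int_0^{t_1}\!\!\int_0^{t_2} \log S(du, dv) + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \int_0^{t_2} \left[\log S(t_1, dv) - \log S(0, dv)\right] + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \left\{(\log S(t_1, t_2) - \log S(t_1, 0)) - (\log S(0, t_2) - \log S(0, 0))\right\} + \log S(t_1, 0) + \log S(0, t_2) \\
&\quad = \log S(t_1, t_2).
\end{aligned}$$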

Thus, by taking the exponential transformation,

$$S(t_1, t_2) = S(t_1, 0)\,S(0, t_2)\exp\left\{\int_0^{t_1}\!\!\int_0^{t_2} \log S(du, dv)\right\}.$$

The observations are paired times $(T_1, T_2)$ with corresponding failure indicators $(D_1, D_2)$. The estimator is constructed as follows. First define the bivariate risk set at time $(t_1, t_2)$,

$$R(t_1, t_2) = \sum_i I(T_{i1} \ge t_1,\ T_{i2} \ge t_2).$$

The number of bivariate events occurring at time $(t_1, t_2)$ is

$$K_{11}(t_1, t_2) = \sum_i D_{i1} D_{i2}\, I(T_{i1} = t_1,\ T_{i2} = t_2),$$

and the number of events for the first component among those whose second component is still at risk at time $t_2$ is

$$K_{10}(t_1, t_2) = \sum_i D_{i1}\, I(T_{i1} = t_1,\ T_{i2} \ge t_2).$$

Similarly to $K_{10}(t_1, t_2)$,

$$K_{01}(t_1, t_2) = \sum_i D_{i2}\, I(T_{i1} \ge t_1,\ T_{i2} = t_2).$$

Then we define the corresponding quantities relative to the risk set:

$$L_{11}(t_1, t_2) = \frac{K_{11}(t_1, t_2)}{R(t_1, t_2)}, \quad L_{10}(t_1, t_2) = \frac{K_{10}(t_1, t_2)}{R(t_1, t_2)}, \quad L_{01}(t_1, t_2) = \frac{K_{01}(t_1, t_2)}{R(t_1, t_2)}.$$

The Dabrowska estimator is built on the two marginal Kaplan-Meier estimators and an association measure $H$:

$$\hat{S}(t_1, t_2) = \prod_{u \le t_1} \{1 - L_{10}(u, 0)\} \prod_{v \le t_2} \{1 - L_{01}(0, v)\} \prod_{0 < u \le t_1,\ 0 < v \le t_2} \{1 - H(u, v)\},$$

where

$$H(t_1, t_2) = \frac{L_{10}(t_1, t_2)\,L_{01}(t_1, t_2) - L_{11}(t_1, t_2)}{\{1 - L_{10}(t_1, t_2)\}\{1 - L_{01}(t_1, t_2)\}}.$$

The estimated survival function may not be monotone. This can be explained by the complicated censoring and by the fact that three different portions of the data set are involved in the estimation.
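The product formula can be turned into a direct, unoptimized computation. The sketch below is an illustration under stated assumptions rather than a reference implementation: the function name `dabrowska` and the data layout are hypothetical, observed times are assumed positive, the risk set uses the "at or after" convention, and factors whose denominator is zero are simply skipped.

```python
def dabrowska(data, t1, t2):
    """Dabrowska estimate of S(t1, t2) from rows (y1, y2, d1, d2).

    y1, y2 are observed (positive) times and d1, d2 failure indicators (1 = failure).
    """
    def R(u, v):
        return sum(1 for y1, y2, _, _ in data if y1 >= u and y2 >= v)

    def L10(u, v):
        r = R(u, v)
        return sum(d1 for y1, y2, d1, _ in data if y1 == u and y2 >= v) / r if r else 0.0

    def L01(u, v):
        r = R(u, v)
        return sum(d2 for y1, y2, _, d2 in data if y1 >= u and y2 == v) / r if r else 0.0

    def L11(u, v):
        r = R(u, v)
        return sum(d1 * d2 for y1, y2, d1, d2 in data if y1 == u and y2 == v) / r if r else 0.0

    # Only observed time points can contribute factors different from 1.
    us = sorted({y1 for y1, _, _, _ in data if y1 <= t1})
    vs = sorted({y2 for _, y2, _, _ in data if y2 <= t2})

    estimate = 1.0
    for u in us:                       # Kaplan-Meier product for the first margin
        estimate *= 1.0 - L10(u, 0.0)
    for v in vs:                       # Kaplan-Meier product for the second margin
        estimate *= 1.0 - L01(0.0, v)
    for u in us:                       # association factors 1 - H(u, v)
        for v in vs:
            l10, l01, l11 = L10(u, v), L01(u, v), L11(u, v)
            denom = (1.0 - l10) * (1.0 - l01)
            if denom > 0:              # assumed convention: skip undefined factors
                estimate *= 1.0 - (l10 * l01 - l11) / denom
    return estimate

# Hypothetical uncensored sample of three pairs.
sample = [(1, 2, 1, 1), (2, 1, 1, 1), (3, 3, 1, 1)]
print(dabrowska(sample, 1, 1), dabrowska(sample, 2, 2))
```

In this small uncensored example only the pair $(3, 3)$ survives beyond $(1, 1)$ and beyond $(2, 2)$, so both estimates equal the empirical proportion $1/3$. With censored data the individual factors $1 - H(u, v)$ can exceed 1, which is one way to see why the estimate need not be monotone.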