R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models
Faculty of Health Sciences
R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models
Inference & application to prediction of kidney graft failure
Paul Blanche, joint work with M-C. Fournier & E. Dantan (Nantes, France)
July 2015, Slide 1/29
Context & Motivation
- Medical researchers hope to improve patient management using earlier diagnoses.
- Statisticians can help by fitting prediction models.
- The making of so-called "dynamic" predictions has recently received a lot of attention.
- To be useful for medical practice, predictions should be "accurate".
- How should we evaluate dynamic prediction accuracy?
Data & Clinical goal
- Data: DIVAT cohort data of kidney transplant recipients (subsample, n = 4,119)
- Clinical goal: dynamic prediction of the risk of kidney graft failure (death or return to dialysis), using repeated measurements of serum creatinine
DIVAT data sample (n=4,119)
- French cohort, 6 centers
- Adult recipients
- Transplanted after 2000
- Creatinine measured every year
Statistical challenges discussed
How to evaluate and/or compare dynamic predictions?
Using the concepts of:
- Discrimination
- Calibration
Accounting for:
- the dynamic setting
- the censoring issue
Basic idea & concepts for evaluating predictions
Basic idea: comparing predictions and observations (simple!)
Concepts:
- Discrimination: a model has high discriminative power if the range of predicted risks is wide and subjects with low (high) predicted risk are less (more) likely to experience the event.
- Calibration: a model is calibrated if we can expect that x subjects out of 100 experience the event among all subjects that receive a predicted risk of x% (weak definition).
Dynamic prediction
- s: landmark time at which predictions are made (varies)
- t: prediction horizon (fixed)
[Figure: creatinine (µmol/l) trajectory over follow-up time (years), with landmark time s = 4 years and horizon s + t = 9 years; the dynamic prediction π_i(s, t) = Pr(graft failure in (s, s+t]) = 64% is shown on a 0-100% scale]
Right censoring issue
D_i(s, t) = 1{event occurs in (s, s + t]}
[Figure: timeline from the landmark time s (when predictions are made) to time s + t (end of the prediction window), with uncensored and censored subjects marked]
For a subject i censored within (s, s + t], the status D_i(s, t) is unknown.
Notations for population parameters
Indicator of event in (s, s + t]:
D_i(s, t) = 1{s < T_i ≤ s + t}, where T_i is the time-to-event.
Dynamic predictions:
π_i(s, t) = P_ξ̂( D_i(s, t) = 1 | T_i > s, Y_i(s), X_i )
- ξ̂: previously estimated parameters (from independent training data)
- Y_i(s): marker measurements observed before time s
- X_i: baseline covariates
Predictive accuracy
How close are the predicted risks π_i(s, t) to the true underlying risk P(event occurs in (s, s+t] | information at s)?
Prediction Error:
PE_π(s, t) = E[ {D(s, t) − π(s, t)}² | T > s ]
- the lower, the better
- PE = Bias² + Variance: it evaluates both Calibration and Discrimination
- it depends on P(event occurs in (s, s+t] | at risk at s)
- often called the "Expected Brier Score"
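With fully observed outcomes (no censoring yet), the empirical version of PE_π(s, t) is just the mean squared difference between outcomes and predicted risks. A minimal sketch with hypothetical toy numbers:

```python
import numpy as np

# Hypothetical toy data: event indicators D_i(s, t) and predicted risks
# pi_i(s, t) for five subjects at risk at s, with no censoring yet.
D  = np.array([1, 0, 1, 0, 0])            # 1 = event occurred in (s, s+t]
pi = np.array([0.8, 0.2, 0.6, 0.1, 0.3])  # predicted risks pi_i(s, t)

# Empirical prediction error (Brier score): mean squared difference
pe = np.mean((D - pi) ** 2)
```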
How does the PE relate to calibration and discrimination?
Let H_π(s) = {X_π(s), T > s} denote the subject-specific history at time s. Adding and subtracting the true underlying risk E[D(s, t) | H_π(s)] inside the square gives
PE_π(s, t) = E[ {D(s, t) − E[D(s, t) | H_π(s)] + E[D(s, t) | H_π(s)] − π(s, t)}² | T > s ]
           = E[ Var{D(s, t) | H_π(s)} | T > s ] + E[ {E[D(s, t) | H_π(s)] − π(s, t)}² | T > s ]
The first term (the "inseparability" term) measures Discrimination and does NOT depend on π(s, t); the second (the "bias" term) measures Calibration and depends on π(s, t).
- The more discriminating H_π(s) is, the smaller Var{D(s, t) | H_π(s)}.
- E[D(s, t) | H_π(s)] − π(s, t) ≡ 0 defines "strong" calibration.
R²_π-type criterion
Benchmark PE_0: the best "null" prediction tool gives the same (marginal) predicted risk
1 − S(s + t | s) = E[ D(s, t) | H_0(s) ], with H_0(s) = {T > s},
to all subjects, and leads to
PE_0(s, t) = Var{D(s, t) | T > s} = S(s + t | s) {1 − S(s + t | s)}.
Simple idea:
R²_π(s, t) = 1 − PE_π(s, t) / PE_0(s, t)
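Continuing the same hypothetical toy numbers, the benchmark PE_0 and the resulting R²_π can be sketched as follows; note that the empirical PE_0 equals the Bernoulli variance p(1 − p) of the marginal risk:

```python
import numpy as np

# Same hypothetical toy data as before (five uncensored subjects at risk at s)
D  = np.array([1, 0, 1, 0, 0])
pi = np.array([0.8, 0.2, 0.6, 0.1, 0.3])

pe  = np.mean((D - pi) ** 2)     # the model's prediction error
p0  = D.mean()                   # marginal risk = the "null" prediction
pe0 = np.mean((D - p0) ** 2)     # null prediction error, equals p0 * (1 - p0)
r2  = 1 - pe / pe0               # R^2-type criterion
```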
Why bother?
R²_π(s, t) aims to circumvent the difficult interpretation of:
- the scale on which PE(s, t) is measured
- the trend of PE(s, t) vs s
Because the meaning of the scale on which PE(s, t) is measured changes with s, an increasing/decreasing trend can be due to changes in:
- the quality of the predictions, and/or
- the at-risk population
Changes in the quality of the predictions
"Essentially, all models are wrong, but some are useful." (G. Box)
The prediction model from which we have obtained the predictions can be more wrong for some s than for others.
The calibration term of PE(s, t) therefore changes with s. We can work on it!
Changes in the at-risk population
An example: patients with cardiovascular history (CV) all die early; only those without CV remain at risk for late s. Then:
- the later s, the more homogeneous the at-risk population
- CV is useful for prediction at early s but useless at late s
The available information can be more informative for some s than for others.
The discrimination term of PE(s, t) therefore changes with s. This is just how it is; there is nothing we can do (we can only work with the data we have).
R²_π(s, t) interpretation
Always true: a measure of how the prediction tool π(s, t) performs compared to the benchmark null prediction tool, which gives the same predicted risk (the marginal risk) to all subjects.
When predictions are (strongly) calibrated, after a little algebra:
R²_π(s, t) = Var{π(s, t) | T > s} / Var{D(s, t) | T > s}   (explained variation)
           = Corr²{D(s, t), π(s, t) | T > s}   (correlation)
           = E{π(s, t) | D(s, t) = 1, T > s} − E{π(s, t) | D(s, t) = 0, T > s}   (mean risk difference)
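The equivalence of the three interpretations under strong calibration can be checked numerically. The sketch below simulates a strongly calibrated tool by drawing D ~ Bernoulli(π), so that E[D | π] = π holds by construction (the Beta(2, 5) distribution for π is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated strongly calibrated predictions among subjects at risk:
# outcomes are drawn with exactly the predicted risks.
n = 200_000
pi = rng.beta(2, 5, size=n)     # predicted risks (arbitrary distribution)
D = rng.binomial(1, pi)         # D | pi ~ Bernoulli(pi)

r2_expl = pi.var() / D.var()                     # explained variation
r2_corr = np.corrcoef(D, pi)[0, 1] ** 2          # squared correlation
r2_diff = pi[D == 1].mean() - pi[D == 0].mean()  # mean risk difference
```

All three quantities estimate the same population value, so they agree up to Monte Carlo error.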
Observations & IPCW PE estimator
Observations (i.i.d.): {(T̃_i, Δ_i, π_i(·, ·)), i = 1, ..., n}, where T̃_i = T_i ∧ C_i and Δ_i = 1{T_i ≤ C_i}.
Indicator of observed event occurrence in (s, s + t]:
D̃_i(s, t) = 1{s < T̃_i ≤ s + t, Δ_i = 1}
  = 1 if the event occurred, 0 if the event did not occur or the observation was censored.
Inverse Probability of Censoring Weighting (IPCW) estimator:
PE_π(s, t) is estimated by (1/n) Σ_{i=1}^n Ŵ_i(s, t) {D̃_i(s, t) − π_i(s, t)}²,
and R²_π(s, t) by 1 minus the ratio of the estimated PE_π(s, t) to the estimated PE_0(s, t).
Inverse Probability of Censoring Weights
Ŵ_i(s, t) = 1{s < T̃_i ≤ s + t} Δ_i / Ĝ(T̃_i | s)   (observed event within the window)
          + 1{T̃_i > s + t} / Ĝ(s + t | s)          (known event-free at s + t)
          + 0                                        (censored within (s, s + t])
with Ĝ(u | s) the Kaplan-Meier estimator of P(C > u | C > s).
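A sketch of the IPCW estimator on hypothetical toy data, following the weight definition above, with a minimal Kaplan-Meier estimator for the censoring distribution (all subjects here are already at risk at s; a real implementation would restrict to T̃_i > s):

```python
import numpy as np

# Hypothetical toy data: observed times T~ = min(T, C), delta = 1{T <= C},
# and predicted risks pi_i(s, t).
time  = np.array([2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
delta = np.array([1,   0,   1,   1,   0,   1])
pi    = np.array([0.7, 0.5, 0.6, 0.2, 0.3, 0.1])
s, t = 1.0, 4.0                     # landmark and horizon: window (1, 5]

def G_hat(u, s):
    """Kaplan-Meier estimate of P(C > u | C > s): censoring (delta == 0)
    is treated as the event of interest, among subjects with time > s."""
    mask = time > s
    tt, dd = time[mask], delta[mask]
    surv = 1.0
    for tc in np.unique(tt[(dd == 0) & (tt <= u)]):
        n_risk = np.sum(tt >= tc)
        n_cens = np.sum((tt == tc) & (dd == 0))
        surv *= 1.0 - n_cens / n_risk
    return surv

# IPCW weights: observed events in (s, s+t] get 1/G(T~_i | s), subjects
# known event-free at s+t get 1/G(s+t | s), censored-in-window ones get 0.
W = np.zeros(len(time))
for i in range(len(time)):
    if s < time[i] <= s + t and delta[i] == 1:
        W[i] = 1.0 / G_hat(time[i], s)
    elif time[i] > s + t:
        W[i] = 1.0 / G_hat(s + t, s)

D = ((time > s) & (time <= s + t) & (delta == 1)).astype(float)
pe_ipcw = np.mean(W * (D - pi) ** 2)   # IPCW estimate of PE_pi(s, t)
```

The weights of the non-censored contributions average to one, which is what makes the weighted mean consistent under independent censoring.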
Asymptotic i.i.d. representation
Lemma: Assume that the censoring time C is independent of (T, η, π(·, ·)), and let θ denote either PE_π, R²_π, or a difference in PE or R²_π. Then
√n (θ̂(s, t) − θ(s, t)) = (1/√n) Σ_{i=1}^n IF_θ(T̃_i, Δ_i, π_i(s, t), s, t) + o_p(1)
where the IF_θ(T̃_i, Δ_i, π_i(s, t), s, t) are:
- zero-mean i.i.d. terms
- easy to estimate (using Nelson-Aalen & Kaplan-Meier)
Pointwise confidence interval (fixed s)
Asymptotic normality:
√n (θ̂(s, t) − θ(s, t)) →_D N(0, σ²_{s,t})
95% confidence interval:
θ̂(s, t) ± z_{1−α/2} σ̂_{s,t} / √n
where z_{1−α/2} is the 1 − α/2 quantile of N(0, 1).
Variance estimator:
σ̂²_{s,t} = (1/n) Σ_{i=1}^n { ÎF_θ(T̃_i, Δ_i, π_i(s, t), s, t) }²
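Given estimated influence-function values, the pointwise interval is a one-liner. A sketch with hypothetical IF values and an assumed point estimate:

```python
import numpy as np

# Hypothetical estimated influence-function values IF_theta for n subjects
# (in practice obtained via Nelson-Aalen / Kaplan-Meier plug-ins).
if_hat = np.array([0.10, -0.05, 0.20, -0.15, 0.02, -0.12])
n = len(if_hat)
theta_hat = 0.25                 # assumed point estimate, e.g. R^2(s, t)

sigma_hat = np.sqrt(np.mean(if_hat ** 2))   # sqrt of (1/n) sum IF_i^2
z = 1.959964                                # 97.5% standard normal quantile
half = z * sigma_hat / np.sqrt(n)
ci = (theta_hat - half, theta_hat + half)   # 95% pointwise CI
```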
Simultaneous confidence band over s ∈ S
{ θ̂(s, t) ± q^(S,t)_{1−α} σ̂_{s,t} / √n, s ∈ S }
Computation of q^(S,t)_{1−α} by a simulation algorithm (a wild bootstrap, justified by the conditional multiplier central limit theorem):
1. For b = 1, ..., B, say B = 4000:
   a. Generate {ω^b_1, ..., ω^b_n} as n i.i.d. N(0, 1) variables.
   b. Using the plug-in estimator ÎF_θ(·), compute:
      Υ^b = sup_{s ∈ S} | (1/√n) Σ_{i=1}^n ω^b_i ÎF_θ(T̃_i, Δ_i, π_i(s, t), s, t) | / σ̂_{s,t}
2. Compute q^(S,t)_{1−α} as the 100(1−α)th percentile of {Υ^1, ..., Υ^B}.
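The simulation algorithm vectorizes naturally: given an n × |S| matrix of estimated influence functions, each bootstrap draw is one matrix product. A sketch with hypothetical (randomly generated) influence functions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical estimated influence functions: one row per subject i,
# one column per landmark time s in the grid S.
n, n_s, B = 50, 6, 4000
IF = rng.normal(size=(n, n_s))
sigma = np.sqrt(np.mean(IF ** 2, axis=0))   # pointwise sd estimates sigma_s

# Wild bootstrap: Upsilon_b = sup_s |n^{-1/2} sum_i omega_i IF_i(s)| / sigma_s
omega = rng.normal(size=(B, n))             # i.i.d. N(0, 1) multipliers
Z = np.abs(omega @ IF) / (np.sqrt(n) * sigma)
q95 = np.quantile(Z.max(axis=1), 0.95)      # band quantile q_{1-alpha}
```

The resulting quantile exceeds the pointwise 1.96, which is exactly the price of simultaneous coverage over the grid S.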
DIVAT sample
- Population-based study of kidney recipients (n=4,119)
- Split the data into training (2/3) and validation (1/3) samples
- T: time from 1 year after transplantation to graft failure, which is death OR return to dialysis
- Censoring due to: delayed entries; end of follow-up (2014)
- Baseline covariates: age, sex, cardiovascular history
- Longitudinal biomarker (yearly): serum creatinine
Descriptive statistics & censoring issue
s ∈ S = {0, 0.5, ..., 5}, t = 5 years
[Figure: for each landmark time s = 0, 1, ..., 5 years, the number of subjects at risk, split into censored in (s, s+5], known as event-free at s+5, and observed failure in (s, s+5]]
Joint model (fitted using R package JM)
Longitudinal submodel:
log[Y_i(t_ij)] = (β_0 + b_0i) + β_{0,age} AGE_i + β_{0,sex} SEX_i + (β_1 + b_1i + β_{1,age} AGE_i) t_ij + ε_ij
             = m_i(t_ij) + ε_ij
Survival submodel (hazard):
h_i(t) = h_0(t) exp{ γ_age AGE_i + γ_CV CV_i + α_1 m_i(t) + α_2 dm_i(t)/dt }
R²_π(s, t) vs s (t = 5 years)
[Figure: estimated R²_π(s, t) curve over landmark times s (roughly 0-30%), with 95% pointwise confidence intervals and the 95% simultaneous confidence band]
Comparing R²_π(s, t) vs s for different π(s, t)
[Figure: R²_π(s, t) curves (roughly 0-30%) over landmark time s for three prediction tools: T ~ Age + CV + m(t) + m'(t) (joint model), T ~ Age + CV + Y(t=0), and T ~ Age + CV]
[Figure: differences in R²_π(s, t) between the prediction tools over landmark time s, with confidence limits (roughly −5% to 20%)]
Calibration plot (example for s = 3 years)
[Figure: predicted vs observed 5-year risks (%) with standard errors, across ten quantile groups of predicted risk; e.g. predicted 2% vs observed 8 (4)% in the lowest group, up to predicted 77% vs observed 74 (8)% in the highest]
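A weak-calibration check like the plot above can be sketched by grouping subjects on predicted risk and comparing the mean predicted risk with the observed event proportion in each group (hypothetical uncensored toy data, and only two groups for brevity; the slide uses ten quantile groups, and with censoring the observed proportions would need IPCW weighting):

```python
import numpy as np

# Hypothetical predicted risks and (uncensored) outcomes for ten subjects
# still at risk at the landmark time s.
pi = np.array([0.05, 0.10, 0.15, 0.20, 0.40, 0.45, 0.70, 0.80, 0.90, 0.95])
D  = np.array([0,    0,    0,    1,    0,    1,    1,    1,    1,    1])

# Split at the median predicted risk: group 0 = low risk, group 1 = high risk
group = (pi > np.median(pi)).astype(int)
predicted = [pi[group == g].mean() for g in (0, 1)]  # mean predicted risk
observed  = [D[group == g].mean() for g in (0, 1)]   # observed proportion
```

Close agreement of `predicted` and `observed` within each group is what the calibration plot displays graphically.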
Area under the ROC(s, t) curve vs s
[Figure: AUC_π(s, t) over landmark time s (roughly 50-100%) for the three prediction tools: T ~ Age + CV + m(t) + m'(t) (JM), T ~ Age + CV + Y(t=0), and T ~ Age + CV]
Summing up
The R²-type curve:
- summarizes calibration and discrimination simultaneously
- has an understandable trend
Simple model-free inference:
- predictions can be obtained from any model
- we do not assume any model to hold
- this allows fair comparisons of different predictions
The method accounts for:
- censoring
- the dynamic setting (the at-risk population changes)
Discussion
The strong calibration assumption allows different interesting interpretations:
- explained variation
- correlation
- mean risk difference
Unfortunately, strong calibration cannot be checked (curse of dimensionality). However, the weak and strong definitions are closely related:
- strong calibration implies weak calibration
- weak calibration can often be seen as a reasonable approximation of strong calibration in practice
- weak calibration can be assessed (plots)
Thank you for your attention!
MATH 10: Elementary Statistics and Probability Chapter 5: Continuous Random Variables Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,
More informationChicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationMissing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
More informationDetermining distribution parameters from quantiles
Determining distribution parameters from quantiles John D. Cook Department of Biostatistics The University of Texas M. D. Anderson Cancer Center P. O. Box 301402 Unit 1409 Houston, TX 77230-1402 USA cook@mderson.org
More informationUniversity of Maryland Fraternity & Sorority Life Spring 2015 Academic Report
University of Maryland Fraternity & Sorority Life Academic Report Academic and Population Statistics Population: # of Students: # of New Members: Avg. Size: Avg. GPA: % of the Undergraduate Population
More informationHandling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationHYPOTHESIS TESTING: POWER OF THE TEST
HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,
More informationFitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,
More informationIs a Brownian motion skew?
Is a Brownian motion skew? Ernesto Mordecki Sesión en honor a Mario Wschebor Universidad de la República, Montevideo, Uruguay XI CLAPEM - November 2009 - Venezuela 1 1 Joint work with Antoine Lejay and
More informationMonte Carlo Simulation
1 Monte Carlo Simulation Stefan Weber Leibniz Universität Hannover email: sweber@stochastik.uni-hannover.de web: www.stochastik.uni-hannover.de/ sweber Monte Carlo Simulation 2 Quantifying and Hedging
More informationChecking proportionality for Cox s regression model
Checking proportionality for Cox s regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and
More informationSIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.
SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationTips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
More informationChapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:
Chapter 7 Notes - Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a
More informationTesting against a Change from Short to Long Memory
Testing against a Change from Short to Long Memory Uwe Hassler and Jan Scheithauer Goethe-University Frankfurt This version: January 2, 2008 Abstract This paper studies some well-known tests for the null
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More information6.2 Permutations continued
6.2 Permutations continued Theorem A permutation on a finite set A is either a cycle or can be expressed as a product (composition of disjoint cycles. Proof is by (strong induction on the number, r, of
More informationLife Tables. Marie Diener-West, PhD Sukon Kanchanaraksa, PhD
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationSOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS
SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society
More informationWeb-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationNonlinear Regression:
Zurich University of Applied Sciences School of Engineering IDP Institute of Data Analysis and Process Design Nonlinear Regression: A Powerful Tool With Considerable Complexity Half-Day : Improved Inference
More informationA General Approach to Variance Estimation under Imputation for Missing Survey Data
A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey
More informationMonte Carlo-based statistical methods (MASM11/FMS091)
Monte Carlo-based statistical methods (MASM11/FMS091) Jimmy Olsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 5, 2013 J. Olsson Monte Carlo-based
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationSample Size Calculation for Longitudinal Studies
Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction
More informationMonte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)
Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February
More informationEstimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia
Estimating the Degree of Activity of jumps in High Frequency Financial Data joint with Yacine Aït-Sahalia Aim and setting An underlying process X = (X t ) t 0, observed at equally spaced discrete times
More informationProperties of Future Lifetime Distributions and Estimation
Properties of Future Lifetime Distributions and Estimation Harmanpreet Singh Kapoor and Kanchan Jain Abstract Distributional properties of continuous future lifetime of an individual aged x have been studied.
More informationHURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationImputation of missing data under missing not at random assumption & sensitivity analysis
Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,
More informationLife Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
More informationModelling spousal mortality dependence: evidence of heterogeneities and implications
1/23 Modelling spousal mortality dependence: evidence of heterogeneities and implications Yang Lu Scor and Aix-Marseille School of Economics Lyon, September 2015 2/23 INTRODUCTION 3/23 Motivation It has
More informationChapter 4: Statistical Hypothesis Testing
Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationNonlinear Regression:
Zurich University of Applied Sciences School of Engineering IDP Institute of Data Analysis and Process Design Nonlinear Regression: A Powerful Tool With Considerable Complexity Half-Day 1: Estimation and
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationA Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails
12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint
More informationGenerating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
More informationTesting against a Change from Short to Long Memory
Testing against a Change from Short to Long Memory Uwe Hassler and Jan Scheithauer Goethe-University Frankfurt This version: December 9, 2007 Abstract This paper studies some well-known tests for the null
More informationUnified Lecture # 4 Vectors
Fall 2005 Unified Lecture # 4 Vectors These notes were written by J. Peraire as a review of vectors for Dynamics 16.07. They have been adapted for Unified Engineering by R. Radovitzky. References [1] Feynmann,
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationLesson19: Comparing Predictive Accuracy of two Forecasts: Th. Diebold-Mariano Test
Lesson19: Comparing Predictive Accuracy of two Forecasts: The Diebold-Mariano Test Dipartimento di Ingegneria e Scienze dell Informazione e Matematica Università dell Aquila, umberto.triacca@univaq.it
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationM1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 1 - Bootstrap
Nathalie Villa-Vialanei Année 2015/2016 M1 in Economics and Economics and Statistics Applied multivariate Analsis - Big data analtics Worksheet 1 - Bootstrap This worksheet illustrates the use of nonparametric
More informationTopic 5: Stochastic Growth and Real Business Cycles
Topic 5: Stochastic Growth and Real Business Cycles Yulei Luo SEF of HKU October 1, 2015 Luo, Y. (SEF of HKU) Macro Theory October 1, 2015 1 / 45 Lag Operators The lag operator (L) is de ned as Similar
More informationTime Series Analysis
Time Series Analysis Autoregressive, MA and ARMA processes Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 212 Alonso and García-Martos
More informationNonparametric adaptive age replacement with a one-cycle criterion
Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk
More informationINTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationStatistical Analysis of Life Insurance Policy Termination and Survivorship
Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial
More informationSurvival Distributions, Hazard Functions, Cumulative Hazards
Week 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution of a
More informationUNTIED WORST-RANK SCORE ANALYSIS
Paper PO16 UNTIED WORST-RANK SCORE ANALYSIS William F. McCarthy, Maryland Medical Research Institute, Baltime, MD Nan Guo, Maryland Medical Research Institute, Baltime, MD ABSTRACT When the non-fatal outcome
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationProbability and Statistics Vocabulary List (Definitions for Middle School Teachers)
Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence
More informationErrata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page
Errata for ASM Exam C/4 Study Manual (Sixteenth Edition) Sorted by Page 1 Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page Practice exam 1:9, 1:22, 1:29, 9:5, and 10:8
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationA LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
More informationHow To Model The Fate Of An Animal
Models Where the Fate of Every Individual is Known This class of models is important because they provide a theory for estimation of survival probability and other parameters from radio-tagged animals.
More informationParabolic Equations. Chapter 5. Contents. 5.1.2 Well-Posed Initial-Boundary Value Problem. 5.1.3 Time Irreversibility of the Heat Equation
7 5.1 Definitions Properties Chapter 5 Parabolic Equations Note that we require the solution u(, t bounded in R n for all t. In particular we assume that the boundedness of the smooth function u at infinity
More informationFinancial Risk Forecasting Chapter 8 Backtesting and stresstesting
Financial Risk Forecasting Chapter 8 Backtesting and stresstesting Jon Danielsson London School of Economics 2015 To accompany Financial Risk Forecasting http://www.financialriskforecasting.com/ Published
More informationTests for Two Survival Curves Using Cox s Proportional Hazards Model
Chapter 730 Tests for Two Survival Curves Using Cox s Proportional Hazards Model Introduction A clinical trial is often employed to test the equality of survival distributions of two treatment groups.
More informationA random point process model for the score in sport matches
IMA Journal of Management Mathematics (2009) 20, 121 131 doi:10.1093/imaman/dpn027 Advance Access publication on October 30, 2008 A random point process model for the score in sport matches PETR VOLF Institute
More information