The Cox Proportional Hazards Model
Mario Chen, PhD
Advanced Biostatistics and RCT Workshop
Office of AIDS Research, NIH
ICSSC, FHI
Goa, India, September 2009
The Model
$h_i(t) = h_0(t)\exp(z_i)$, where $z_i = \beta_1 T_i + \beta_2 X_{i1} + \beta_3 X_{i2} + \dots + \beta_{p+1} X_{ip}$
- $h_0(t)$: baseline hazard
- $i = 1, \dots, N$: individuals
- $t_i$: time variable
- $T_i$: treatment variable (dichotomous)
- $X_{ij}$: predictors, $j = 1, \dots, p$
Characteristics
- Baseline hazard function is left unspecified (nonparametric)
- Partial likelihood estimation
- Effect of covariates is exponential: $h_i(t) = h_0(t)\exp(z_i)$, so $0 \le h_i(t) < \infty$
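To make "partial likelihood estimation" concrete, here is a minimal sketch of the Cox log partial likelihood in the simplest case of no tied event times. The function name and the toy data are hypothetical and are not taken from the workshop data set. Note that the baseline hazard $h_0(t)$ never appears in the calculation, which is why it can be left unspecified.

```python
import math

def log_partial_likelihood(times, events, z):
    """Cox log partial likelihood, assuming no tied event times.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if censored
    z      : linear predictor z_i for each subject
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(z[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += z[i] - math.log(risk_sum)
    return total

# Hypothetical toy data: four subjects, one censored, one binary covariate.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
treat = [1, 0, 1, 0]
beta = -0.5  # illustrative value, not an estimate from the workshop data
z = [beta * x for x in treat]
print(log_partial_likelihood(times, events, z))
```

In practice the coefficients are found by maximizing this quantity over beta; the sketch only evaluates it at one value.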
The Proportional Hazard Assumption
Measure of effect: hazard ratio (HR)
$$\frac{h_i(t)}{h_j(t)} = \frac{h_0(t)\exp(z_i)}{h_0(t)\exp(z_j)} = \frac{\exp(z_i)}{\exp(z_j)} = \exp(z_i - z_j)$$
The baseline hazard cancels, so the HR does not depend on $t$.
Example
Is the intervention effective in reducing the risk of pregnancy?
- Treatment (AP; 0 = Control, 1 = Intervention)
- Need to control for:
  - Site (NEV)
  - Age (AGE)
  - Race (RACE)
  - Marital status (MARRIED)
  - Hormonal contraceptive use at baseline (HEMETH)
  - Ever been pregnant before the study (EVERPREG)
Example
Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Std Error  Chi-Square  Pr > ChiSq
AP          1  -0.03381    0.18867      0.0321      0.8578
NEV         1   0.56274    0.23317      5.8248      0.0158
age         1  -0.05018    0.03977      1.5920      0.2070
race        1   0.52562    0.28002      3.5234      0.0605
married     1   0.21621    0.38229      0.3199      0.5717
hemeth      1  -0.44360    0.20379      4.7381      0.0295
everpreg    1   1.20215    0.21032     32.6720     <.0001

Hazard Ratios with 95% Confidence Limits

Parameter  Hazard Ratio  95% Lower  95% Upper
AP                0.967      0.668      1.399
NEV               1.755      1.112      2.772
age               0.951      0.880      1.028
race              1.692      0.977      2.928
married           1.241      0.587      2.626
hemeth            0.642      0.430      0.957
everpreg          3.327      2.203      5.025
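The hazard ratios and confidence limits in the output follow directly from the estimates and standard errors: HR = exp(estimate), with Wald limits exp(estimate ± z × SE). The sketch below assumes a normal critical value of 1.96; the helper name is hypothetical. Applied to the tabulated rows, it should agree with the reported hazard ratios up to rounding.

```python
import math

def hazard_ratio(estimate, std_error, z_crit=1.96):
    """Return (HR, lower, upper) using a Wald interval on the log-hazard scale."""
    hr = math.exp(estimate)
    lower = math.exp(estimate - z_crit * std_error)
    upper = math.exp(estimate + z_crit * std_error)
    return hr, lower, upper

# Estimates and standard errors copied from the output above.
rows = {
    "AP": (-0.03381, 0.18867),
    "hemeth": (-0.44360, 0.20379),
    "everpreg": (1.20215, 0.21032),
}
for name, (est, se) in rows.items():
    hr, lo, hi = hazard_ratio(est, se)
    print(f"{name}: HR = {hr:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

An interval that contains 1 (as for AP) corresponds to a Wald test that does not reject at the 5% level.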
Evaluating the PH assumption
Graphical approach: S(t) or 1 - S(t) plots
[Figure: probability of pregnancy (0.00 to 0.15) versus days since enrollment (0 to 390), by treatment group (Standard, Advanced)]
Evaluating the PH assumption
Graphical approach: -ln[-ln S(t)] plots
[Figure: -ln[-ln(survival probability)] versus ln(analysis time), for ap = 0 and ap = 1]
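The points on a -ln[-ln S(t)] plot can be computed from a Kaplan-Meier estimate of S(t) in each group. The sketch below uses hypothetical toy data and hypothetical function names; it only produces the coordinates and leaves the plotting to whatever tool is at hand. Under proportional hazards, the curves for the two groups should be roughly parallel.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] evaluated at each distinct event time."""
    surv = 1.0
    curve = []
    event_times = sorted(set(t for t, d in zip(times, events) if d == 1))
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve

def log_log_points(curve):
    """Map (t, S) to (ln t, -ln(-ln S)); skip points where the transform is undefined."""
    return [
        (math.log(t), -math.log(-math.log(s)))
        for t, s in curve
        if t > 0 and 0.0 < s < 1.0
    ]

# Hypothetical toy data: days to event, 1 = event, 0 = censored.
group0 = ([30, 60, 90, 120, 200], [1, 1, 0, 1, 0])
group1 = ([45, 80, 150, 210, 300], [1, 0, 1, 0, 0])
for label, (t, d) in (("ap = 0", group0), ("ap = 1", group1)):
    for x, y in log_log_points(kaplan_meier(t, d)):
        print(label, round(x, 3), round(y, 3))
```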
Evaluating the PH assumption
Time-dependent covariates approach:
- Add a covariate-by-time interaction term
- Use t, log t, or another function of t
Example
Analysis of Maximum Likelihood Estimates

Parameter    DF  Estimate  Std Error  Chi-Square  Pr > ChiSq
AP            1   1.17916    0.88025      1.7945      0.1804
NEV           1   0.99259    1.16493      0.7260      0.3942
age           1  -0.24842    0.18507      1.8017      0.1795
race          1  -0.96664    1.63552      0.3493      0.5545
married       1  -0.89980    2.43640      0.1364      0.7119
hemeth        1  -1.72984    1.07774      2.5762      0.1085
everpreg      1   0.87371    0.94901      0.8476      0.3572
aplogt        1  -0.25049    0.17632      2.0184      0.1554
nevlogt       1  -0.08709    0.23039      0.1429      0.7054
agelogt       1   0.04105    0.03701      1.2303      0.2674
racelogt      1   0.30029    0.31818      0.8907      0.3453
marriedlogt   1   0.22417    0.47203      0.2255      0.6349
hemethlogt    1   0.26080    0.21249      1.5064      0.2197
prlogt        1   0.06870    0.19028      0.1304      0.7181
Likelihood Ratio Test
Likelihood ratio test comparing two nested models, with and without the interaction terms:

Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio       9.534   7      0.2165
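The p-value in this output is the upper tail of a chi-square distribution with 7 degrees of freedom (one per interaction term) evaluated at 9.534. As a rough check, the sketch below computes that tail area with the standard library only, using the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Γ(a + 1) for the regularized upper incomplete gamma function. The function name is hypothetical, and the code assumes integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    # Step a up by 1 until it reaches df / 2.
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Likelihood ratio statistic and degrees of freedom from the output above.
print(round(chi2_sf(9.534, 7), 4))
```

The statistic itself is -2 times the difference in maximized log partial likelihoods between the reduced and full models; those log likelihoods are not shown on the slide, so only the final step is reproduced here.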
Solutions to violations of the PH assumption
- Keep the appropriate interaction terms with time in the model
  - Problematic if the PH assumption is violated for the treatment effect
  - Need to interpret the interaction with time
  - Test the HR at different time points
- Use an approach not based on PH, e.g., stratified logrank test or other models
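When a treatment-by-time interaction is kept in the model, the treatment HR is no longer a single number: with terms ap and ap × g(t), it is exp(β_AP + β_aplogt × g(t)). The sketch below evaluates this at a few time points using the AP and aplogt estimates from the interaction model shown earlier. It assumes g(t) = log(t + 1), as in the SAS code in this deck; whether that output used exactly this transformation is an assumption. The function name is hypothetical, and since the interaction was not significant in this example, the numbers are purely illustrative.

```python
import math

def treatment_hr_at(t, beta_main, beta_interaction):
    """Treatment HR at time t for a model with ap and ap*log(t + 1)."""
    return math.exp(beta_main + beta_interaction * math.log(t + 1))

# Estimates for AP and aplogt from the interaction model output.
b_ap, b_aplogt = 1.17916, -0.25049

for day in (30, 90, 180, 360):
    print(day, round(treatment_hr_at(day, b_ap, b_aplogt), 3))
```

A confidence interval or test for the HR at a given time also needs the covariance between the two estimates, which is not part of the output shown.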
Solutions to violations of the PH assumption
- Use a stratified Cox model
  - Different baseline hazard for each level of the stratification variable: $h_{01}(t), h_{02}(t), \dots$
  - Same covariate model across strata, i.e., same coefficients and covariates
  - Appropriate if the stratification variable is not an effect of interest (i.e., not the treatment variable) and does not interact with the effect of interest
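In terms of estimation, stratification changes only the risk sets: each event is compared with subjects still at risk in the same stratum, while the coefficients are shared across strata. A minimal sketch, assuming no tied event times within a stratum and using hypothetical names and toy data:

```python
import math

def stratified_log_partial_likelihood(times, events, z, strata):
    """Cox log partial likelihood with stratum-specific risk sets (no ties)."""
    total = 0.0
    n = len(times)
    for i in range(n):
        if events[i] == 0:
            continue
        # Risk set is restricted to subjects in the same stratum as subject i.
        risk_sum = sum(
            math.exp(z[j])
            for j in range(n)
            if strata[j] == strata[i] and times[j] >= times[i]
        )
        total += z[i] - math.log(risk_sum)
    return total

# Hypothetical toy data: two sites, shared coefficient for one covariate.
times = [5, 9, 7, 14]
events = [1, 1, 1, 0]
site = [0, 0, 1, 1]
beta = 0.4  # illustrative value only
z = [beta * x for x in [1, 0, 0, 1]]
print(stratified_log_partial_likelihood(times, events, z, site))
```

Because each stratum's baseline hazard cancels within its own risk sets, no coefficient is estimated for the stratification variable, which is why NEV does not appear as a covariate once it is used for stratification.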
Example: Stratification by site (NEV)
Analysis of Maximum Likelihood Estimates

Parameter  DF  Estimate  Std Error  Chi-Square  Pr > ChiSq
AP          1  -0.03055    0.18866      0.0262      0.8714
age         1  -0.05041    0.03978      1.6056      0.2051
race        1   0.52794    0.27983      3.5594      0.0592
married     1   0.21478    0.38237      0.3155      0.5743
hemeth      1  -0.44427    0.20381      4.7514      0.0293
everpreg    1   1.20246    0.21037     32.6715     <.0001

Hazard Ratios with 95% Confidence Limits

Parameter  Hazard Ratio  95% Lower  95% Upper
AP                0.970      0.670      1.404
age               0.951      0.880      1.028
race              1.695      0.980      2.934
married           1.240      0.586      2.623
hemeth            0.641      0.430      0.956
everpreg          3.328      2.204      5.027
Time-dependent covariates
- A time-dependent covariate is one that changes over time:
  - Interactions with time
  - Internal covariates (e.g., contraceptive use, SBP, white blood cell count)
- Assumes that the effect of a time-dependent covariate on the hazard at time t depends on the value of the covariate at the same time t
- May use a lag-time effect
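In practice, this means that each risk set at time t uses the covariate value in effect at t, or at t minus a lag. The sketch below shows one way to look up that value from a subject's measurement history. The data layout (a sorted list of (time, value) pairs), the function name, and the example history are all hypothetical.

```python
import bisect

def covariate_at(history, t, lag=0.0):
    """Value in effect at time t - lag.

    history : list of (time, value) pairs sorted by time; each value holds
              from its time until the next recorded change.
    Returns None if nothing has been recorded by time t - lag.
    """
    change_times = [u for u, _ in history]
    idx = bisect.bisect_right(change_times, t - lag) - 1
    if idx < 0:
        return None
    return history[idx][1]

# Hypothetical history of hormonal contraceptive use (1 = using) by study day.
history = [(0, 0), (60, 1), (150, 0)]
print(covariate_at(history, 90))          # value on day 90
print(covariate_at(history, 90, lag=45))  # value on day 45, i.e., a 45-day lag
```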
A note on ties
- The partial likelihood assumes that no two events occur at the same time (no ties)
- Methods for handling ties:
  - Exact: time consuming
  - Discrete: time consuming
  - Efron: closer to the exact methods
  - Breslow: computationally most efficient
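The Breslow and Efron approximations differ only in the denominators used for a set of tied events. With d events tied at one time, Breslow uses the full risk-set sum d times, whereas Efron removes a fraction k/d of the tied subjects' contribution for the k-th term. The sketch below computes the log partial likelihood contribution of a single event time under each method; the function name and inputs are hypothetical.

```python
import math

def tied_contribution(z_events, z_risk_set, method="efron"):
    """Log partial likelihood contribution of one event time with ties.

    z_events   : linear predictors of the subjects with an event at this time
    z_risk_set : linear predictors of everyone at risk (including z_events)
    """
    d = len(z_events)
    risk = sum(math.exp(v) for v in z_risk_set)
    tied = sum(math.exp(v) for v in z_events)
    if method == "breslow":
        log_denominator = d * math.log(risk)
    elif method == "efron":
        log_denominator = sum(math.log(risk - (k / d) * tied) for k in range(d))
    else:
        raise ValueError("method must be 'breslow' or 'efron'")
    return sum(z_events) - log_denominator

# Two tied events among four subjects at risk, all with z = 0.
print(tied_contribution([0.0, 0.0], [0.0] * 4, method="breslow"))
print(tied_contribution([0.0, 0.0], [0.0] * 4, method="efron"))
```

With a single event (d = 1) the two methods coincide; they diverge as the number of ties grows relative to the size of the risk set.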
Software: SAS
    proc phreg data=sas.survex;
      model timepr*pregevt(0) = ap age race married hemeth everpreg aplogt
            / ties=efron rl;
      strata nev;
      /* time-dependent covariate: treatment by log-time interaction */
      aplogt = ap*log(timepr+1);
    run;
Software: Stata
    stset timepr, failure(pregevt=1)
    stcox ap nev age race married hemeth everpreg, efron schoenfeld(res*)
    stphtest, log
    stphplot, by(ap)
Options for time-dependent covariates: tvc(varlist), texp(exp), e.g., texp(ln(_t))
Software: SPSS
    TIME PROGRAM.
    COMPUTE T_COV_ = LN(T_) * AP.
    COXREG timepr
      /STATUS=pregevt(1)
      /STRATA=NEV
      /METHOD=ENTER T_COV_ AP age race married hemeth everpreg.
Concluding Remarks
- Choose model
- Specify model (no variable selection)
- Check data
- Estimate parameters
- Run model checks
- Improve model (minimal)
- Interpret