Checking Assumptions in the Cox Proportional Hazards Regression Model

Similar documents
SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates

Distance to Event vs. Propensity of Event A Survival Analysis vs. Logistic Regression Approach

SUMAN DUVVURU STAT 567 PROJECT REPORT

Statistics in Retail Finance. Chapter 6: Behavioural models

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Developing Business Failure Prediction Models Using SAS Software Oki Kim, Statistical Analytics

11. Analysis of Case-control Studies Logistic Regression

Survival analysis methods in Insurance Applications in car insurance contracts

Multivariate Logistic Regression

LOGISTIC REGRESSION ANALYSIS

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Introduction. Survival Analysis. Censoring. Plan of Talk

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

ATV - Lifetime Data Analysis

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

ASSIGNMENT 4 PREDICTIVE MODELING AND GAINS CHARTS

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS

Regression Modeling Strategies

Modeling Lifetime Value in the Insurance Industry

Ordinal Regression. Chapter

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Gamma Distribution Fitting

Paper PO06. Randomization in Clinical Trial Studies

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

Modeling the Claim Duration of Income Protection Insurance Policyholders Using Parametric Mixture Models

Multiple Regression: What Is It?

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides:

Simple linear regression

Comparison of Survival Curves

Time varying (or time-dependent) covariates

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survival Analysis And The Application Of Cox's Proportional Hazards Modeling Using SAS

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

Modelling spousal mortality dependence: evidence of heterogeneities and implications

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

Simple Regression Theory II 2010 Samuel L. Baker

Parametric Survival Models

Additional sources Compilation of sources:

Module 5: Multiple Regression Analysis

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Statistical Analysis of Life Insurance Policy Termination and Survivorship

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Comparison of resampling method applied to censored data

The Cox Proportional Hazards Model

Tests for Two Survival Curves Using Cox s Proportional Hazards Model

Competing-risks regression

Survival Distributions, Hazard Functions, Cumulative Hazards

Imputing Missing Data using SAS

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Survival Analysis of the Patients Diagnosed with Non-Small Cell Lung Cancer Using SAS Enterprise Miner 13.1

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Survival Analysis Using Cox Proportional Hazards Modeling For Single And Multiple Event Time Data

II. DISTRIBUTIONS distribution normal distribution. standard scores

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Elementary Statistics

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Score Test of Proportionality Assumption for Cox Models Xiao Chen, Statistical Consulting Group UCLA, Los Angeles, CA

Multivariable survival analysis S10. Michael Hauptmann Netherlands Cancer Institute Amsterdam, The Netherlands

Multinomial and Ordinal Logistic Regression

Module 3: Correlation and Covariance

SAS Software to Fit the Generalized Linear Model

Statistics 2014 Scoring Guidelines

Review Jeopardy. Blue vs. Orange. Review Jeopardy

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

Binary Logistic Regression

Exercise 1.12 (Pg )

Analysis of Survey Data Using the SAS SURVEY Procedures: A Primer

Lecture 15 Introduction to Survival Analysis

An Application of Weibull Analysis to Determine Failure Rates in Automotive Components

Checking proportionality for Cox s regression model

An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

SAS Analyst for Windows Tutorial

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Interpretation of Somers D under four simple models

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Introduction to mixed model and missing data issues in longitudinal studies

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Nominal and ordinal logistic regression

Father s height (inches)

Introduction to Fixed Effects Methods

UNDERSTANDING THE TWO-WAY ANOVA

7.6 Approximation Errors and Simpson's Rule

The first three steps in a logistic regression analysis with examples in IBM SPSS. Steve Simon P.Mean Consulting

Data Analysis, Research Study Design and the IRB

Applying Survival Analysis Techniques to Loan Terminations for HUD s Reverse Mortgage Insurance Program - HECM

Chapter 7 Section 7.1: Inference for the Mean of a Population

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Simple example of collinearity in logistic regression

Transcription:

SD08 Checking Assumptions in the Cox Proportional Hazards Regression Model Brenda Gillespie, Ph.D. University of Michigan Presented at the 2006 Midwest SAS Users Group (MWSUG) Dearborn, Michigan October 22-24, 2006 2006 Center for Statistical Consultation and Research, University of Michigan All rights reserved.

Outline Introduction to the Cox model Overview of residuals for the Cox model Model assumptions Correct model specification Functional form of continuous covariates Covariate interactions Proportional hazards (PH) Graphical checks Tests of PH What to do if non-ph is found Stratification Time-dependent covariates Conclusions

Survival Analysis Methods to analyze time to event data. Useful for many different applications Time to death from disease diagnosis Length of hospital stay Cost of insurance claims. Time origin Event Time origin Censored value

Censoring and Truncation Censoring and truncation describe different forms of incomplete observation of event times: Censoring Right Left Interval Truncation Right Left (delayed entry) Interval (gap times) Here, we assume right-censored data only.

Random Censoring Reminder All standard methods of survival analysis assume that censoring is random: Those censored at time t i should be representative of all subjects still alive at t i (with the same covariate values). This assumption cannot be checked by any statistical test.

Cox Regression Model h(; t x) = ho()exp{ t β1x1l + + βkxk} where h(t ; x) is the hazard function at time t for a subject with covariate values x 1, x k, h 0 (t) is the baseline hazard function, i.e., the hazard function when all covariates equal zero. exp is the exponential function (exp(x)= e x ), x i is the i th covariate in the model, and β i is the regression coefficient for the i th covariate, x i.

Cox Regression (cont d) h(; t x) = ho()exp{ t β1x1+ L+ βkxk} The Cox Model is different from ordinary regression in that the covariates are used to predict the hazard function, and not Y itself. The baseline hazard function can take any form, but it cannot be negative. The exponential function of the covariates is used to insure that the hazard is positive. There is no intercept in the Cox Model. (Any intercept could be absorbed into the baseline hazard.)

Cox Regression (cont d) h(t, x i ) The basic Cox Model assumes that the hazard functions for two different levels of a covariate are proportional for all values of t. For example, if men have twice the risk of heart attack compared to women at age 50, they also have twice the risk of heart attack at age 60, or any other age. The underlying risk of heart attack as a function of age can have any form. t

Proportional Hazards To see the proportional hazards property analytically, take the ratio of h(t;x) for two different covariate values: h o (t) cancels out => the ratio of those hazards is the same at all time points. For a single dichotomous covariate, say with values 0 and 1, the hazard ratio is )} ( ) ( exp{ = } ) exp{ ( } exp{ ) ( ) ; ( ) ; ( 1 1 1 1 1 0 1 1 0 jk ik k j i jk k j ik k i j i x x x x x x t h x x t h x t h x t h + + + + + + = β β β β β β L L L β β β β e e e e t h e t h x t h x t h = = = = = 0 * 0 0 *1 0 ) ( ) ( 0 ) ; ( 1) ; (

Software for Cox Regression: PHREG Syntax for Cox regression using Proc PHREG The time variable is days The censor code is status (1=dead, 0=alive) Underlined items are user-specified proc phreg; model days*status (0) = sex age; output out=temp resmart=mresids resdev=dresids ressch=sresids; id subj group; run;

Overview of Residuals for Cox Regression Cox-Snell residuals range 0 to Martingale residuals a linear transform of Cox-Snell residuals range - to 1 Deviance residuals a transform of Martingale residuals to make symmetric around zero Score residuals (one per subject per covariate) Schoenfeld residuals (one per subject per covariate)

Note: Censoring and categorical covariates can produce banded residual patterns that do not reflect any problem with the model. Common Residual Plots Plot martingale residuals vs continuous covariates to check functional form of covariates Plot deviance residuals vs Observation # to check for outliers Plot Schoenfeld residuals for each covariate, vs Time or log(time) to check proportional hazards (PH)

Martingale Residuals Skewed Near 1 died too soon ; Large negative lived too long Plots of residuals vs. continuous covariates: Patterns may suggest continuous variables not properly fit

Example of Martingale Residuals 1 0 Martingale Residuals -1-2 -3-4 -5-6 -7 0 50 100 150 200 250 300 350 400 450 500 Observation Number

Deviance Residuals Roughly symmetrically distributed around zero, with approximate s.d. = 1.0 Positive values died too soon Negative values lived too long Very large or small values outliers This is the only plot that is useful for checking outliers.

Example of Deviance Residuals 4 3 Deviance Residuals 2 1 0-1 -2-3 0 50 100 150 200 250 300 350 400 450 500 Observation Number

Schoenfeld Residuals Schoenfeld residuals are computed with one per observation per covariate. Only defined at observed event times For the i th subject and k th covariate, the estimated Schoenfeld residual, r ik, is given by (notation from Hosmer and Lemeshow) w k Where x ik is the value of the k th covariate for individual i, and xˆw i k is a weighted mean of covariate values for those in the risk set at the given event time. A positive value of r ik shows an X value that is higher than expected at that death time. rˆ ik = x ik xˆ i

Schoenfeld Residuals Schoenfeld residuals sum to zero. For a dichotomous (0,1) variable, Schoenfeld residuals will be between 1 and 1. In this case, rˆ ik = x ik xˆ w k i = 0 1 for 0 1 The residual plot will have two bands, one above zero for x=1, and one below zero for x=0. xˆ xˆ w k i w k i,, for x x = =

Example of Schoenfeld Residuals for the dichotomous covariate, group, plotted by Observation Number 1.0 0.8 Schoenfeld Residuals 0.6 0.4 0.2 0.0-0.2-0.4-0.6-0.8 0 50 100 150 200 250 300 350 400 450 500 Observation Number gr oup 0 1

Example of Schoenfeld Residuals for the dichotomous covariate, group, Schoenfeld Residuals 1.0 0.8 0.6 0.4 0.2 0.0-0.2-0.4-0.6 plotted by Time -0.8 0 1 2 3 4 5 6 Time The increasing pattern Indicates non-ph. gr oup 0 1

An Example of Martingale and Deviance Residuals with non-ph Outcome: Time to death Covariate: treatment group (labels 0 and 1) The next 3 slides show Kaplan-Meier plot for the two groups Martingale residuals Deviance residuals

Survival Probability Example: KM Plot shows Crossing 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Survival Functions (non-ph) Group=0 has more early deaths, but also more longer lifetimes than Group=1. 0.0 0 1 2 3 4 5 6 Time group 0 1

1 Martingale Residuals 0 Martingale Residuals -1-2 -3-4 -5-6 -7 Group=0 (solid dots) has both the earliest deaths (top) and the longest surviving values (bottom). 0 50 100 150 200 250 300 350 400 450 500 Observation Number gr oup 0 1

Deviance Residuals 4 3 Deviance Residuals 2 1 0-1 -2-3 0 50 100 150 200 250 300 350 400 450 500 Observation Number gr oup 0 1

Observations In both Martingale and Deviance residuals, Group=0 had both the earliest deaths and the longest surviving values (most extreme values top and bottom). Such a pattern would indicate non-proportional hazards (non-ph) Other situations of non-ph may not be so easy to see from these plots. In this example, the Deviance residual plot does not show any outliers.

Assumptions of the Cox Model Structure of the model is assumed correct Model is multiplicative (e.g., vs additive) All relevant covariates have been included We will not consider these assumptions here Functional form Do we have the correct functional form for continuous covariates? Are there any significant interactions? Is the Proportional Hazards assumption met? If not, what are the options?

Assessing Functional Form of Continuous Covariates Often we assume continuous covariates have a linear form. However, this assumption should always be checked. We give 3 ways to check: Method 1 (try X categorical): Categorize X into 4 intervals, say by quantiles. Create dummy variables for the categories and fit a model with these dummy variables. Plot β estimates by X interval midpoints, with β=0 for the reference category. Look at the shape, and model X accordingly (e.g., linear, quadratic, threshold).

Plot of Beta Estimates by Age Category Midpoints 1. 6 1. 5 1. 4 1. 3 1. 2 1. 1 1. 0 0. 9 0. 8 0. 7 0. 6 0. 5 0. 4 0. 3 0. 2 Pattern looks linear 0. 1 0. 0 30 40 50 60 70 Age (years)

Assessing Functional Form (cont d) Method 2 (loess line through martingale residuals): Output martingale residuals from a model WITHOUT X. (proc phreg; model ; output out=temp resmart=resids;) Fit a loess line through the martingale residuals, as a function of X. (ods output ScoreResults=temp2; proc loess data=temp; model resids=x; score; run;) Plot martingale residuals (with loess curve) by X. (proc gplot data=temp2; plot resids*x p_resids*x / overlay; run;) Model X as appropriate (e.g., linear, quadratic, threshold), and re-check.

Plot of Martingale Residuals by Age, with Loess Line (Age not in model) 1 Martingale Residuals 0-1 -2-3 Reference line at 0 Loess line Looks like HR increases with age 30 40 50 60 70 Age (years)

Plot of Martingale Residuals by Age, with Loess Line (Age in model as linear) 1 0-1 -2-3 Linear age provides a good fit. The loess line wiggles around zero no trend. -4 30 40 50 60 70 Age (years)

Assessing Functional Form (cont d) Method 3 (ASSESS option of proc phreg plots cumulative sums of martingale residuals against X (to check functional form) or the observed score process against Time (to check PH): The following code checks Age for functional form. ods html; ods graphics on; /*required!*/ proc phreg data=pbc; assess var=(age_yrs) / npaths=50 CRpanel; model logfuday*status(0) = sex age_yrs hepatom; run; ods graphics off; ods html close;

Assessing the Cumulative Martingale Residual Plot The plot shows the observed curve for Age to be within the distribution of the simulated cumulative martingale residual curves, indicating acceptable fit with linear age. Note that ASSESS cannot check functional form with a variable out of the model. It must be included in the model in some form. To try to illustrate a bad fit, we try log(age), Age 2, and Age 5. Only Age 5 shows poor fit.

The Resample option of ASSESS The Resample option of ASSESS gives a test of the functional form A test of PH Tests are based on a Kolmogorov-type supremum test using 1000 simulated patterns. ASSESS var=(age_yrs) PH / resample;

Supremum Test for Functional Form Maximum Absolute Pr > Variable Value Reps MaxAbsVal age_yrs 6.0767 1000 0.6640 Supremum Test for Proportionals Hazards Assumption Maximum Absolute Pr > Variable Value Reps MaxAbsVal SEX 0.5985 1000 0.9930 HEPATOM 0.5504 1000 0.9920 age_yrs 0.5587 1000 0.9950

Summary of ASSESS Option The ASSESS option is a useful tool, but should be used in conjunction with other checks for functional form and PH. The cumulative martingale residual plots are not very sensitive for fine-tuning functional form. They can show grossly incorrect forms. We recommend martingale residuals (not cumulative), with a loess line to show functional form.

Covariate Interactions In many types of models, covariate interactions can be a challenge to interpret and present. With linear or logistic regression models, interaction plots are useful. With the Cox model, interaction plots, like variable effects, are based on Hazard Ratios.

Two dichotomous covariates: With interaction: h(t;x) = h o (t) exp{β 1 x 1 + β 2 x 2 + β 3 x 1 x 2 } x 1, x 2 h(t;x) A, M 0, 1 h o (t) e β2 log h(t, x) 1, 1 h o (t) eβ1 + β2 + β3 Β, Μ Α, F B, F 0, 0 h o (t) 1, 0 h o (t) e β1

Presenting Covariate Interactions The hypothetical plot above cannot be drawn with data because we don t estimate h o (t). Option 1: Present interactions using hazard ratios separately within each level of one covariate. Let β = -0.3 (trt), β 1 2 = 0.7 (gender), and = -0.2 (interaction) β 3 Males: HR(trt B vs. trt A) = exp(β 1 + β 3 ) = exp(-0.3-0.2) = exp(-0.5) = 0.61 Females: HR(trt B vs. trt A) = exp(β 1 ) = exp(-0.3) = 0.74 Trt B better than A, but larger effect in males.

Presenting Covariate Interactions Option 2: Compare all subgroups to a single baseline group. These hazard ratios can be plotted. The reference group is Females on treatment A. HR HR 2 1 Males A B Females Males A 2.0 = e β2 B 1.2 = eβ1 + β2 + β3 Females A 1.0 B 0.7 = e β1

Presenting Covariate Interactions with Continuous Covariates For an interaction between a continuous and a categorical covariate, plot the HR by the continuous covariate, with separate lines for the levels of the categorical covariate. For an interaction between two continuous covariates, plot the HR by one of the the continuous covariate, with separate lines for selected values of the other covariate.

A striking interaction between age and severe edema in the PBC dataset. 250 Reference category is Age=30, No Edema 200 150 Severe Edema 100 50 No Edema 0 30 40 50 60 70 Age (years)

Checking Proportional Hazards (PH) Graphical methods to check PH Using time-dependent covariates to test PH Other tests for PH

Checking Proportional Hazards Graphical methods Plot ln(-ln(s(t))) vs. t or ln(t) and look for parallelism. Plot Observed and predicted S(t) and look for close fit. Use the PH graph in the ASSESS option of Proc PHREG Plot scaled Schoenfeld residuals vs time (schoen macro) Time-dependent covariates Add time*covariate interactions to the model to fit non-ph. If the coefficient for the time-dependent variable is significantly different from zero, non-ph is present. If significant non-ph is found, this model can be kept to fit and interpret the non-ph. Other tests for PH Test based on resampling using the ASSESS option. Test based on scaled Schoenfeld residuals (schoen macro)

Proportional Hazards: Graphical Check #1 Plot ln(-ln(s(t))) vs. t or ln(t) and look for parallelism. log(-log S(t)) FIN = 0 FIN = 1 Week Parallel curves PH Use Kaplan-Meier estimate for S(t). This plot shows reasonable fit to the PH assumption.

Proportional Hazards: Graphical Check #1 Interpreting plots is subjective. In general, conclude PH unless a distinct pattern of non-parallelism (e.g., crossing) is seen. Intertwined lines with no distinct pattern may simply indicate no difference between groups. Adjusting for other covariates may be needed. Example: To check PH for treatment, adjusted for age: Run a Cox model with age as a covariate, stratified by treatment. Output the estimated survivor functions for each treatment group at the overall mean age. Ŝ Plot ln(-ln( (t))) for each treatment group vs ln(t) and check for parallelism.

SAS Code for log(-log(s(t))) Plots Unadjusted PH check for Treatment: Proc lifetest data=data1 plots=(lls); time days*status(0); strata treat; run; Adjusted PH check for Treatment, adjusted for Age: data covs; age=52; run; /*Overall mean age*/ Proc phreg data=pbc; model days*status(0) = age; strata treat; baseline out=temp covariates=covs loglogs=lls; Proc plot data=temp; plot lls*days=treat; run;

Proportional Hazards: Graphical Check #2 Plot Observed and predicted S(t) and look for close fit. (Only feasible with small number of covariates.) Predicted is from Cox model. Observed is KM. (Figure from Kleinbaum)

Checking PH using ASSESS option The ASSESS option of Proc PHREG plots the cumulative score residuals against time to check PH. This is a tied down Brownian process, or Brownian bridge, meaning that the values always start and end at zero. Random paths are generated under PH. The path from the actual data is compared to the randomly-generated paths under PH. If the actual path is within the cloud of random paths, it indicates PH.

Checking PH using ASSESS option Cloud of random paths Path from the actual data Clear evidence of non-ph. But the form of the non-ph is not clear.

Checking PH using macro SCHOEN The SAS macro, SCHOEN, gives a different graphical check for PH. Consider the possibility that the β coefficient for a given covariate, β k, changes over time, thus giving a non-constant hazard ratio. Macro SCHOEN uses a scaled Schoenfeld residual, multiplying the vector of Schoenfeld residuals by the inverse of their covariance matrix. This scaled residual, r ik*, added to β k, is an estimate of the time-dependent β coefficient: r ik * + β k β k (t i ). r ik * + β k is plotted against time, or a function of time. PH is indicated by a flat pattern around Y=0. Non-PH is indicated by any deviation from a flat line at Y=0.

Which function of time? The Schoenfeld residuals can be plotted against any function of time, such as raw, log-transformed, or rank-transformed. The pattern shown over time indicates the form of non-ph. Different functions show different shapes, and some may be better for highlighting non-ph for a particular variable. Try more than one. Options available in the schoen macro are: Raw time Rank-transformed time Time transformed by (1 - Kaplan-Meier) (Similar to probability integral transformation.)

Checking PH with macro SCHOEN (KM-transformed time scale) Scal ed r esi dual s( Bt ) as a f cn of t i me. 5 4 3 Non-PH shown by increasing trend. Note the top and bottom lines are the same Schoenfeld residuals shown on slide 20. 2 1 0-1 -2-3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1- Overall Kaplan-Mei er schoen macr o: event =cens t i me=t strata= Xvar s= gr oup

Interpreting the SCHOEN Plot The previous plot clearly shows an increasing pattern, suggesting linear. The true hazard ratio is linearly increasing in log(t). The SCHOEN plot is more useful than the ASSESS plot in showing the appropriate functional form for a non-ph relationship.

Macro SCHOEN with raw time scale 5 Scal ed r esi dual s( Bt ) as a f cn of t i me. 4 3 2 1 SCHOEN plots are sensitive to the time scale used. Very few points beyond t=4. 0-1 -2-3 -4 0 1 2 3 4 5 6 7 8 schoen macr o: event =cens t i me=t strata= Xvar s= gr oup t

Macro SCHOEN with rank time scale Scal ed r esi dual s( Bt ) as a f cn of t i me. 5 Increasing trend, similar to that seen in KM time scale. 4 3 2 1 0-1 -2-3 0 100 200 300 400 500 600 700 800 900 1000 Rank f or Var i abl e t schoen macr o: event =cens t i me=t strata= Xvar s= gr oup

Macro SCHOEN time scales SCHOEN plots are sensitive to the time scale used. Try more than one. If data are very skewed, it is often better to use the rank or KM time scale. Note: Virtually all tests for PH are based on the choice of a particular time function, g(t), for the non-ph. A test will be most powerful to detect non-ph based on the particular g(t), and will have less power to detect non-ph of other forms.

Time-dependent covariates: Two types Time-varying covariates: Covariate values change over time. Ex: For time to re-arrest after release from prison, a time-varying covariate would be whether the person is employed (0=no, 1=yes) at a given time. Cox model for x 1 = fixed covariate, x 2 = time-varying covariate: h(; t x) = h0()exp{ t β1x1+ β2x2()} t Time x covariate interactions: used to test or model non-proportional hazards. We focus here on this type. h( t; x) = h0( t)exp{ β1x 1+ β2x1t} The hazard ratio for x 1 =1 vs. x 1 =0 changes (either increases or decreases) as t increases.

Time*Covariate Interactions h(t;x) = h o (t) exp{β 1 x + β 2 x t} β 2 >0 => HR increasing with time β 2 <0 => HR decreasing with time β 2 =0 => HR constant with time => PH Add x* t to the model to test PH (test H 0 : β 2 =0). If β 2 significant, then leave x* t in the model (to model the non-ph). Some authors suggest other interactions, e.g., x*log(t) or x*i[t>c] (heavyside function). Use whatever fits best.

SAS Code for Time*Covariates proc phreg; model week*arrest(0) = age fin TDfin; TDfin = fin*week; **** run; ****or: TDfin = fin*log(week); or: TDfin = fin*(week>25); (for a different hazard ratio before vs. after week 25)

Stratification vs. Time*Covariate Interactions for Handling Non-PH Time*Covariate Interaction Must choose a particular form, such as x*t or x*log(t). If this form is correct, yields more efficient estimates of other βs. (robustness vs. efficiency) The changing HR over time can be presented and interpreted. Stratification Takes less computation time Models any non-ph relationship, not just specific forms No inference is possible for the stratification variable; only makes sense for nuisance variables.

Checking PH with Many Covariates Check PH for each covariate separately. If interactions are present, check PH over all interaction subgroups (e.g., Males, A; Females, A; Males, B; Females, B) If collinearity (confounding, treatment imbalance) is present among covariates: To check PH for x 1, estimate S i (t) for the levels of x 1 based on a Cox model stratified by x 1, with other covariates in the model. Plot log( log Si ˆ ( t)) vs. log( t).

Difficulty of Checking PH In checking each covariate, we assume PH holds for the other covariates. Which covariate do we start with? If PH fails for a covariate, we should go back and re-check the others after adjusting for the non-ph of the first. A wrong functional form or a missing covariate can look like non-ph. Checking PH can be a difficult process. See Kleinbaum for more details.

Summary and Recommendations Check for outliers Deviance residual plot Check for functional form of continuous covariates Martingale residual plots Check for non-ph Use log(-log(s(t))) plots (either unadjusted or adjusted) Test time*covariate interactions Use the schoen macro to plot β k (t i )by time Checking assumptions takes time. Take the time. Checking can be never-ending, so balance is needed. Some checking is better than none.

The Cox Modeler s Blessing May your continuous covariates all be linear, and may all your covariates satisfy the proportional hazards assumption

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. References Therneau TM and Grambsch PM. (2000). Modeling Survival Data: Extending the Cox Model, Springer. Allison PD. (1995), Survival Analysis Using the SAS System: A Practical Guide, SAS Institute Inc. Klein JP and Moeschberger ML. (1997), Survival Analysis: Techniques for Censored and Truncated Data, Springer-Verlag. Marubini E and Valsecchi MG. (1995), Analysing Survival Data from Clinical Trials and Observational Studies, John Wiley & Sons Ltd. Hosmer DW Jr and Lemeshow S. (1999), Applied Survival Analysis, Wiley. Kalbfleisch JD and Prentice RL. (2002), The Statistical Analysis of Failure Time Data, 2 nd Edition, Wiley.