SAS and R calculations for cause specific hazard ratios in a competing risks analysis with time dependent covariates

Martin Wolkewitz, Ralf Peter Vonberg, Hajo Grundmann, Jan Beyersmann, Petra Gastmeier, Sina Bärwolff, Christine Geffers, Michael Behnke, Henning Rüden, Martin Schumacher

Abstract

In this supplement, we demonstrate how to calculate cause specific hazard ratios (CSHR) using the software SAS (SAS Institute, Inc., Cary, North Carolina) and the free software R (1). Special focus is given to the counting process format, which is a convenient data representation for time dependent covariates. As a motivating example from hospital epidemiology, we consider a typical situation: the impact of a time dependent exposure (nosocomial pneumonia) on two competing endpoints (death and discharge).

Introduction

Typically, survival analysis models the time to an event; if several types of events are possible, a competing risks model is needed to take them into account. The standard approach is to model the cause specific hazard function (2). We present SAS and R code for epidemiologists who are familiar with the methodological background of competing risks theory; for that background we refer to the tutorial by Putter et al. (2). The inclusion of binary time dependent covariates is explained explicitly.

Non reversible binary time dependent covariates

The simplest time dependent covariate is a non reversible binary variable whose value changes from 0 to 1 at the time of occurrence of the intermediate event. It is defined as follows:

$$
Z_i(t) =
\begin{cases}
0 & \text{if } t \le \text{time of occurrence of the intermediate event (unexposed)} \\
1 & \text{if } t > \text{time of occurrence of the intermediate event (exposed)}
\end{cases}
$$

Such a variable occurs very frequently, e.g. as an exposure in epidemiology. It describes a subject who enters the study unexposed, becomes exposed at time t and stays exposed until the study endpoint. There are various other types of time dependent covariates. Here, we focus on this special type and refer the interested reader to (3).
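The indicator defined above can be sketched in a few lines of R. This is only an illustration: the function name np_indicator and the convention of coding "never exposed" as an infinite event time are assumptions, not part of the original supplement.

```r
# Non reversible binary time dependent covariate Z_i(t).
# t       : time point(s) at which the covariate is evaluated
# t_event : time of the intermediate event (assumption: Inf if it never occurs)
np_indicator <- function(t, t_event) {
  # 0 up to and including the event time, 1 strictly afterwards
  as.integer(t > t_event)
}

# A patient who acquires the exposure on day 4
z_exposed <- np_indicator(c(2, 4, 5), t_event = 4)

# A patient who is never exposed
z_never <- np_indicator(c(1, 100), t_event = Inf)
```

With this convention the covariate is still 0 on the day of the intermediate event and equals 1 only afterwards, which matches the half-open intervals (start, stop] of the counting process format.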

Regression on cause specific hazards

The cause specific hazard of cause k for a subject with covariate vector $Z(t) = (Z_1(t), \dots, Z_p(t))$, which may contain time dependent as well as time independent covariates, is modeled as

$$
\lambda_k(t \mid Z(t)) = \lambda_{k,0}(t) \exp\bigl(\beta_{k,1} Z_1(t) + \dots + \beta_{k,p} Z_p(t)\bigr)
$$

where $\lambda_{k,0}(t)$ is the baseline cause specific hazard of cause k and $\beta_k = (\beta_{k,1}, \dots, \beta_{k,p})$ are the regression coefficients that represent the covariate effects on cause k.

Counting process format for time dependent covariates

This format can handle time dependent covariates as well as left truncation and right censoring, because it controls which subjects are in the risk set at each time. For each patient, one row represents a time interval (start,stop], a status indicator and the values of the covariates during that interval. Here, we only consider one time dependent covariate (NP = nosocomial pneumonia, defined as above). For each patient, there are two possible outcomes: death or discharge. The status_ variable indicates the status on the stop day: status_=0 indicates no endpoint, status_=1 indicates death and status_=2 indicates discharge. The data might look as follows:

patient  start  stop  status_  NP
1        0      4     0        0
1        4      10    1        1
2        0      12    0        0
2        12     15    2        1
3        0      20    2        0
4        0      5     0        0
5        0      7     0        0
5        7      13    0        1

In our hypothetical example, patient 1 enters the study free of NP, acquires NP on day 4 and dies on day 10. Patient 2 acquires NP on day 12 and is discharged on day 15. Patient 3 stays free of NP until discharge on day 20. Patient 4 is administratively censored on day 5 without having acquired NP, whereas patient 5 acquires NP on day 7 before being administratively censored on day 13.
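As an illustration of how such a data set might be produced, the following base R sketch splits one record per patient into counting process rows. The input layout (columns np_day and end_day) and the function name to_counting_process are assumptions made for this example; real data will usually need additional handling.

```r
# Hypothetical input: one row per patient (column names are assumptions).
wide <- data.frame(
  patient = 1:5,
  np_day  = c(4, 12, NA, NA, 7),   # day NP was acquired, NA = never
  end_day = c(10, 15, 20, 5, 13),  # day of death, discharge or censoring
  status_ = c(1, 2, 2, 0, 0)       # 0 = censored, 1 = death, 2 = discharge
)

to_counting_process <- function(d) {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    r <- d[i, ]
    exposed <- !is.na(r$np_day) && r$np_day < r$end_day
    if (!exposed) {
      # A single unexposed interval (0, end_day]
      return(data.frame(patient = r$patient, start = 0, stop = r$end_day,
                        status_ = r$status_, NP = 0))
    }
    # Split follow-up at the day of NP: unexposed part, then exposed part.
    # The endpoint, if any, belongs to the last interval only.
    data.frame(patient = r$patient,
               start   = c(0, r$np_day),
               stop    = c(r$np_day, r$end_day),
               status_ = c(0, r$status_),
               NP      = c(0, 1))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

cp <- to_counting_process(wide)
print(cp)
```

For the five hypothetical patients this yields the eight rows of the table above.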

Blowing up the data to the long format

In order to fit one stratified analysis according to the competing endpoints, the data must be in the long format: every interval appears once per endpoint. We include the variables endpoint and status and create two new dummy variables NP_death and NP_discharge (type specific covariates).

patient  start  stop  status  endpoint   NP_death  NP_discharge
1        0      4     0       death      0         0
1        4      10    1       death      1         0
1        0      4     0       discharge  0         0
1        4      10    0       discharge  0         1
2        0      12    0       death      0         0
2        12     15    0       death      1         0
2        0      12    0       discharge  0         0
2        12     15    1       discharge  0         1
3        0      20    0       death      0         0
3        0      20    1       discharge  0         0
4        0      5     0       death      0         0
4        0      5     0       discharge  0         0
5        0      7     0       death      0         0
5        7      13    0       death      1         0
5        0      7     0       discharge  0         0
5        7      13    0       discharge  0         1
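The expansion to the long format can be sketched in base R as follows. The helper name expand_endpoint and the data frame names cp and long are assumptions for this example; the counting process data are typed in directly so that the snippet runs on its own.

```r
# Counting process data of the five hypothetical patients
cp <- data.frame(
  patient = c(1, 1, 2, 2, 3, 4, 5, 5),
  start   = c(0, 4, 0, 12, 0, 0, 0, 7),
  stop    = c(4, 10, 12, 15, 20, 5, 7, 13),
  status_ = c(0, 1, 0, 2, 2, 0, 0, 0),
  NP      = c(0, 1, 0, 1, 0, 0, 0, 1)
)

# One copy of the data per endpoint: 'code' is the status_ value of that
# endpoint, 'label' the name of the resulting stratum.
expand_endpoint <- function(d, code, label) {
  data.frame(
    patient      = d$patient,
    start        = d$start,
    stop         = d$stop,
    status       = as.integer(d$status_ == code),  # 1 only for this endpoint
    endpoint     = label,
    NP_death     = d$NP * (label == "death"),      # type specific covariates
    NP_discharge = d$NP * (label == "discharge")
  )
}

long <- rbind(expand_endpoint(cp, 1, "death"),
              expand_endpoint(cp, 2, "discharge"))

# Same row order as in the table: by patient, endpoint, start
long <- long[order(long$patient, long$endpoint, long$start), ]
rownames(long) <- NULL
print(long)
```

Each type specific covariate is zero outside its own stratum, which is what allows a single stratified model to estimate a separate NP effect for each endpoint.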

SAS or R code to yield CSHR

To fit a stratified Cox model assuming proportional hazards, the PHREG procedure in SAS (4,5) or, alternatively, the coxph function from the R package survival may be used. In both programs the model is fitted to the long format data; the exponentiated coefficient of NP_death is the CSHR for death and the exponentiated coefficient of NP_discharge is the CSHR for discharge.

SAS:

    proc phreg covsandwich(aggregate) covm;
      model (start,stop)*status(0) = NP_death NP_discharge;
      strata endpoint;
      id patient;
    run;

R:

    library(survival)
    # 'long' is a placeholder name for the long format data set
    coxph(Surv(start, stop, status == 1) ~ NP_death + NP_discharge +
            strata(endpoint) + cluster(patient), data = long)

Please note that status(0) tells SAS that the value 0 is to be treated as censored; every other value, here only the value 1, is treated as the event of interest. In R, status == 1 marks the event, and all other values are treated as censored.

Robust / Sandwich variance

The data contain several records per patient, so robust estimates of the standard errors are one way to take this correlation into account. In SAS this is done by specifying the option covsandwich(aggregate) covm in the PHREG procedure in combination with the ID statement. In R one has to add cluster(patient) to the model formula to obtain robust estimates.

Remarks

The data often require careful preparation to bring them into the counting process format. We strongly recommend checking the data set before fitting the model. Several time dependent as well as time independent covariates may be represented in this format.
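A few of the checks recommended above can be automated. The following base R sketch is one possible set of plausibility checks, not a complete validation; the function name check_counting_process is an assumption, and the gap check assumes that every patient is followed without interruption (it would have to be relaxed for data with intermittent follow-up).

```r
# Returns a character vector of problems; an empty vector means no problem found.
check_counting_process <- function(d) {
  problems <- character(0)
  if (any(d$start >= d$stop)) {
    problems <- c(problems, "interval with start >= stop")
  }
  for (id in unique(d$patient)) {
    p <- d[d$patient == id, ]
    p <- p[order(p$start), ]
    n <- nrow(p)
    # Consecutive intervals should join without gaps or overlaps
    if (n > 1 && any(p$start[-1] != p$stop[-n])) {
      problems <- c(problems, paste("patient", id, ": gap or overlap"))
    }
    # Only the last interval of a patient may carry an endpoint
    if (n > 1 && any(p$status_[-n] != 0)) {
      problems <- c(problems, paste("patient", id, ": endpoint before last row"))
    }
    # A non reversible covariate must never switch back from 1 to 0
    if (any(diff(p$NP) < 0)) {
      problems <- c(problems, paste("patient", id, ": NP switches back to 0"))
    }
  }
  problems
}

# Counting process data of the five hypothetical patients
cp <- data.frame(
  patient = c(1, 1, 2, 2, 3, 4, 5, 5),
  start   = c(0, 4, 0, 12, 0, 0, 0, 7),
  stop    = c(4, 10, 12, 15, 20, 5, 7, 13),
  status_ = c(0, 1, 0, 2, 2, 0, 0, 0),
  NP      = c(0, 1, 0, 1, 0, 0, 0, 1)
)
ok_result <- check_counting_process(cp)

# A deliberately damaged copy: patient 1 now has a gap between day 4 and day 5
bad <- cp
bad$start[2] <- 5
bad_result <- check_counting_process(bad)
```

Running such a check before PROC PHREG or coxph catches the most common preparation errors, such as overlapping intervals or an endpoint recorded on an intermediate row.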

References

1) Ihaka R, Gentleman R: R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 1996, 5(3):299-314.
2) Putter H, Fiocco M, Geskus RB: Tutorial in biostatistics: competing risks and multi state models. Stat Med 2007, 26:2389-2430.
3) Klein JP, Moeschberger ML: Survival analysis: techniques for censored and truncated data, 2nd ed. Springer, 2003.
4) Allison PD: Survival Analysis Using the SAS System: A Practical Guide. Cary, NC: SAS Institute Inc., 1995. 292 pp.
5) Ake CF, Carpenter AL: Extending the Use of PROC PHREG in Survival Analysis. SAS Conference Proceedings: WUSS 2003, San Francisco, California.