Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD
Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes. For these outcomes, we have a very well structured set of methods/tools of analysis.
Types of Analysis Based on Variable Characteristics
Types of Analysis Based on Measurement Scale
Survival Analysis The Idea One other common outcome is time to event (survival time)
Time-to-event outcome The Idea
What is Survival Analysis? Survival analysis refers to statistical methods for analyzing survival data.
Survival Analysis The Idea Survival Analysis is also known as Reliability theory or reliability analysis in engineering, Duration analysis or duration modeling in economics or Event history analysis in sociology.
What is Survival Analysis? Survival analysis refers to statistical methods for analyzing survival data. Survival data could be derived from laboratory studies of animals or from clinical and epidemiologic studies. Survival data could relate to outcomes for studying acute or chronic diseases.
Survival Analysis The Idea Survival analysis attempts to answer questions such as: What is the fraction of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?
Important Areas of Application and Sources of Survival Data Clinical Trials (example: recovery time after heart surgery); Longitudinal or Cohort Studies (example: time to observing the event of interest); Life Insurance (example: time to file a claim); Quality Control (example: the amount of force needed to damage a part such that it is not usable).
Unique Features of Survival Analysis Event involved. Progression on a dimension (usually time) until the event happens. Length of progression may vary among subjects. Event might not happen for some subjects.
Terminology of Survival Analysis Time-to-event: The time from entry into a study until a subject has a particular outcome Censoring: Subjects are said to be censored if they are lost to follow up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
Examples of Events Death, infection, MI, hospitalization; recurrence of cancer after treatment; marriage, soccer goal; light bulb fails, computer crashes; balloon filling with air bursts.
Structure of Survival Data Two-variable outcome: Time variable: t_i = time at last disease-free observation or time at event. Censoring variable: c_i = 1 if had the event; c_i = 0 if no event by time t_i.
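The two-variable outcome can be illustrated with a minimal Python sketch. The `Observation` class, the `make_observation` helper and the four subjects below are hypothetical and are not taken from the slides; they only show how each subject reduces to a (t_i, c_i) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float  # t_i: time at event, or at last event-free observation
    event: int   # c_i: 1 if the event was observed, 0 if censored


def make_observation(entry, exit_, had_event):
    """Build the (t_i, c_i) pair from entry and exit times on one time scale."""
    if exit_ < entry:
        raise ValueError("exit time must not precede entry time")
    return Observation(time=exit_ - entry, event=1 if had_event else 0)


# Hypothetical subjects: (entry month, exit month, event observed?)
raw = [(0, 6, True), (2, 12, False), (1, 4, True), (3, 12, False)]
data = [make_observation(*row) for row in raw]

for obs in data:
    print(obs)
```

Keeping the time and the event indicator together in one record makes it harder to analyze one without the other by accident.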
Censoring Incomplete observations. Right: incomplete follow-up; common and easy to deal with. Left: event has occurred before observation started (T_0), but exact time is unknown; not easy to deal with.
Right Censoring May be due to: Event had not occurred at termination of the study Event occurred due to a cause that is not the cause of interest Loss to follow-up or drop-out of study. In this situation, we know that subject survived at least to time t.
Left Censoring Examples: Age smoking starts Data from interviews of adults Adult subject reports regular smoking Does not remember when he started smoking regularly Study of incidence of CMV infection in children Two subjects already infected at enrollment
Key Assumption with Censoring Censoring is independent of intervention and event of interest. Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t. This assumption means that the risk of the event occurring can be estimated in a fair/unbiased/valid way.
Censoring with Covariate Effect Censoring must be independent within group Censoring must be independent given X Censoring can depend on X Among those with the same values of X, censored subjects must be at similar risk of subsequent events as subjects with continued follow-up Censoring can be different across groups
Other Concepts Truncation is about entering the study Right: Event has occurred (e.g. cancer registry) Left: staggered entry Remember: Censoring is about leaving the study Right: Incomplete follow-up (common) Left: Observed time > survival time
Left Truncation More in epidemiology than in medical studies. Key Assumption: Those who enter the study at time t are a random sample of those in the population still at risk at t. Example: Observational study of seizures in young children. What is the relation between vaccine immunization and risk of first seizure? Time axis = age. Some children were observed from birth; others moved into the area at a later time but were included at the time of entry into the cohort.
Time Notation Denote observation time by t; t defines the time axis (scale). t = 0 is the time origin or beginning of observation; t_max = end of observation. T: random outcome variable, the time at which the event occurs. Example: (T = 3) denotes event occurrence at time 3 units.
Example I Recurrence of herpes lesions after treatment for a primary episode Event = recurrence Time origin = end of primary episode Time scale = months from end of primary episode T = time from end of primary episode to first recurrence
Example II Occupational exposure at nickel refinery Event = death from lung cancer Origin = first exposure Employment at refinery Scale = years since first exposure T = time: first employed to death from LC
Population Mortality Event = death Time origin = date of birth Time scale = age (years) T = age at death
Analysis of Time-To-Event Data
Remember: Features of Survival Analysis Event involved. Progression on a dimension (usually time) until the event happens. Length of progression may vary among subjects. Event might not happen for some subjects.
Analysis of Time-To-Event Data There are certain aspects of survival analysis data, such as censoring and non-normality, that generate great difficulty when trying to analyze the data using traditional statistical models such as multiple linear regression. The non-normality aspect of the data violates the normality assumption of most commonly used statistical models, such as regression or ANOVA.
Analysis of Time-To-Event Data Why not compare mean time-to-event between your groups using a t-test or linear regression? ignores censoring Why not compare proportion of events in your groups using risk/odds ratios or logistic regression? ignores time
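A small numeric sketch may make the first point concrete. The event times and the study end below are invented for illustration; in a real study the true time of a censored subject is of course not observable.

```python
# Hypothetical true event times (months); not all observable in practice.
true_times = [2, 5, 9, 14]
study_end = 10  # administrative censoring at month 10

# What the study actually records: time under observation and event flag.
observed = [min(t, study_end) for t in true_times]
event = [1 if t <= study_end else 0 for t in true_times]

true_mean = sum(true_times) / len(true_times)

# Naive approach 1: treat censored times as if they were event times.
naive_mean_all = sum(observed) / len(observed)

# Naive approach 2: drop censored subjects and average the rest.
event_times = [t for t, e in zip(observed, event) if e == 1]
naive_mean_events = sum(event_times) / len(event_times)

print(true_mean, naive_mean_all, naive_mean_events)
```

In this toy data set both naive means fall below the true mean, because the longest survival time is exactly the one that is cut short or discarded.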
Analysis of Time-To-Event Data The Right Tool for the Right Job
What is survival analysis? Model time to failure or time to event Unlike linear regression, survival analysis has a dichotomous (binary) outcome Unlike logistic regression, survival analysis analyzes the time to an event Able to account for censoring
Objectives of Survival Analysis To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients. To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial. To assess the relationship of covariates to time-to-event, such as: do weight, insulin resistance, or cholesterol influence survival time of MI patients?
Concepts in Survival Analysis Survival Function: a function describing the proportion of individuals surviving to or beyond a given time. Notation: T = survival time of a randomly selected individual; t = a specific point in time; S(t) = P(T > t) is the survival function; λ(t) = instantaneous failure rate at time t, also known as the hazard function.
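For reference, the two functions are linked by standard identities. The block below assumes T is continuous with density f(t), which the slides do not state explicitly.

```latex
\lambda(t)
  = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
  = \frac{f(t)}{S(t)}
  = -\frac{d}{dt} \log S(t),
\qquad
S(t) = \exp\!\left( -\int_0^t \lambda(u)\, du \right).
```

Knowing either S(t) or λ(t) therefore determines the other.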
Tips for the Analysis of Survival Data In any data analysis it is always a great idea to do some univariate analysis before proceeding to more complicated models. In survival analysis it is highly recommended to look at the Kaplan-Meier curves for all the categorical predictors. This will provide insight into the shape of the survival function for each group and give an idea of whether or not the groups are proportional (i.e. the survival functions are approximately parallel).
Tips for the Analysis of Survival Data We also consider the tests of equality across strata to explore differences in survival probability between levels of the predictor. It is not feasible to calculate a Kaplan-Meier curve for the continuous predictors since there would be a curve for each level of the predictor and a continuous predictor simply has too many different levels. Instead we consider the Cox proportional hazard model with a single continuous predictor.
Estimation of The Survival Function Steps: Identify the observed failure times: t_(1) < ... < t_(k). Number of individuals at risk before t_(i): n_i. Number of individuals with failure time t_(i): d_i. Estimated hazard function at t_(i): d_i / n_i.
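The steps above can be sketched in plain Python as a product-limit (Kaplan-Meier) estimate. The function name `kaplan_meier` and the six observations are hypothetical; the hazard at each failure time is taken as d_i / n_i, and subjects censored at a failure time are counted as still at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return rows (t_(i), n_i, d_i, hazard_i, S(t_(i))) per distinct failure time."""
    # d_i: number of observed events at each distinct failure time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    surv = 1.0
    for t in sorted(deaths):
        # n_i: subjects whose observed time is at least t (still at risk)
        n_i = sum(1 for u in times if u >= t)
        d_i = deaths[t]
        h_i = d_i / n_i
        # Product-limit update: multiply by the conditional survival 1 - h_i
        surv *= 1.0 - h_i
        rows.append((t, n_i, d_i, h_i, surv))
    return rows


# Hypothetical data: event = 1 means failure observed, 0 means censored
times = [6, 10, 4, 9, 4, 12]
events = [1, 0, 1, 1, 0, 0]

for row in kaplan_meier(times, events):
    print(row)
```

The estimate only changes at observed failure times; censored observations affect it through the size of the risk set.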
Estimation of The Survival Function There are two ways to estimate the survival function: the Life-Table Method and the Product-Limit Method, also known as the Kaplan-Meier Method.
Example
Life-Table D = death; C = censored; N = number of individuals who are alive (at risk) at beginning of the interval; N' = N - (C/2) = number of individuals who are at risk during the interval; S(t) = cumulative survival
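A minimal sketch of the life-table calculation follows, assuming the usual actuarial adjustment in which subjects censored during an interval count as at risk for half of it. The function name `life_table`, the interval proportion `q` and the counts are hypothetical and not taken from the slides.

```python
def life_table(n_start, intervals):
    """intervals: list of (D, C) per interval, in time order.

    Returns rows (N, D, C, N_adj, q, S), where N_adj = N - C/2 is the
    adjusted number at risk, q = D / N_adj the interval death proportion,
    and S the cumulative survival at the end of the interval.
    """
    rows = []
    n = n_start
    surv = 1.0
    for deaths, censored in intervals:
        n_adj = n - censored / 2
        q = deaths / n_adj if n_adj > 0 else 0.0
        surv *= 1.0 - q
        rows.append((n, deaths, censored, n_adj, q, surv))
        # Subjects entering the next interval: neither dead nor censored
        n = n - deaths - censored
    return rows


# Hypothetical cohort of 100 subjects followed over three intervals
rows = life_table(100, [(10, 4), (8, 6), (5, 10)])
for row in rows:
    print(row)
```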
Kaplan-Meier Estimate The beginning of each interval is determined by death Each interval contains one death (or more if there are ties) N(t) includes individuals with censored data at t
Assumptions for KM method Survival probabilities are the same for patients entering into the study early or late Actual event time is known Patients who are censored have the same survival probabilities as those who continue to be followed
Comparison Of Two Survival Curves Let S_1(t) and S_2(t) be the survival functions of the two groups. The null hypothesis is H_0: S_1(t) = S_2(t), for all t > 0. The alternative hypothesis is H_1: S_1(t) ≠ S_2(t), for some t > 0.
Log-Rank Test to Compare 2 Survival Functions H_0: Two survival functions are identical. H_A: Two survival functions differ. T.S.: T_MH = (O_1 - E_1) / sqrt(V_1). R.R.: |T_MH| ≥ z_(α/2). P-val: 2P(Z ≥ |T_MH|).
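The test statistic can be sketched in plain Python. This is an illustration under stated assumptions, not the slides' own computation: O_1 is the observed number of events in group 1, E_1 its expected number under H_0, and V_1 the hypergeometric variance summed over the distinct failure times. The function name `logrank` and the sample data are hypothetical.

```python
import math


def logrank(times1, events1, times2, events2):
    """Return (O1, E1, V1, T_MH, two-sided p-value) for two groups."""
    times = list(times1) + list(times2)
    events = list(events1) + list(events2)
    group = [1] * len(times1) + [2] * len(times2)
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})

    o1 = e1 = v1 = 0.0
    for t in failure_times:
        n = sum(1 for u in times if u >= t)  # at risk, both groups
        n1 = sum(1 for u, g in zip(times, group) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, group)
                 if u == t and e == 1 and g == 1)
        o1 += d1
        e1 += d * n1 / n
        if n > 1:
            v1 += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if v1 == 0:
        raise ValueError("variance is zero; the statistic is undefined")
    t_mh = (o1 - e1) / math.sqrt(v1)
    # Two-sided normal p-value: 2 * P(Z >= |T_MH|) = erfc(|T_MH| / sqrt(2))
    p_value = math.erfc(abs(t_mh) / math.sqrt(2))
    return o1, e1, v1, t_mh, p_value


# Hypothetical data: group 1 fails early, group 2 fails late, no censoring
result = logrank([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(result)
```

A positive T_MH means group 1 had more events than expected under the null hypothesis.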
Limitations of Kaplan-Meier Mainly descriptive. Doesn't control for covariates. Requires categorical predictors. Can't accommodate time-dependent variables.
Cox Proportional Hazards Model Goal: Compare two or more groups (treatments), adjusting for other risk factors on survival times (like multiple regression). p explanatory variables (including dummy variables). Models the relative risk of the event as a function of time and covariates.
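The model equation from the slide did not survive in this text. The usual textbook form of the Cox model is given below as a reference; the symbols λ_0, β_j and x_j are assumptions and may differ from the slide's notation.

```latex
\lambda(t \mid x_1, \dots, x_p)
  = \lambda_0(t)\, \exp\!\left( \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \right),
\qquad
\frac{\lambda(t \mid x_j + 1)}{\lambda(t \mid x_j)} = e^{\beta_j}.
```

Here λ_0(t) is an unspecified baseline hazard, and exp(β_j) is the hazard ratio for a one-unit increase in x_j with the other covariates held fixed.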
Example in SPSS
Life-Table
Output
Kaplan-Meier
Kaplan-Meier Two Groups
Adding Plots
Adding Plots
Cox Regression
Output
Questions?