Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD




Big picture. In medical research and many other areas of research, we often confront continuous, ordinal, or dichotomous outcomes. For these outcomes, we have a well-structured set of analysis methods and tools.

Types of Analysis Based on Variable Characteristics

Types of Analysis Based on Measurement Scale

Survival Analysis: The Idea. Another common outcome is time to event (survival time).

Time-to-Event Outcome: The Idea

What is Survival Analysis? Survival analysis refers to statistical methods for analyzing survival data.

Survival Analysis: The Idea. Survival analysis is also known as reliability theory or reliability analysis in engineering, duration analysis or duration modeling in economics, and event history analysis in sociology.

What is Survival Analysis? Survival data can be derived from laboratory studies of animals or from clinical and epidemiologic studies. Survival data can relate to outcomes in studies of acute or chronic diseases.

Survival Analysis: The Idea. Survival analysis attempts to answer questions such as: What fraction of a population will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?

Important Areas of Application and Sources of Survival Data. Clinical trials (example: recovery time after heart surgery). Longitudinal or cohort studies (example: time to observing the event of interest). Life insurance (example: time to file a claim). Quality control (example: the amount of force needed to damage a part such that it is not usable).

Unique Features of Survival Analysis. An event is involved. There is progression on a dimension (usually time) until the event happens. The length of progression may vary among subjects. The event might not happen for some subjects.

Terminology of Survival Analysis. Time-to-event: the time from entry into a study until a subject has a particular outcome. Censoring: subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have the outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.

Examples of Events. Death, infection, MI, hospitalization. Recurrence of cancer after treatment. Marriage, soccer goal. Light bulb fails, computer crashes. Balloon filling with air bursts.

Structure of Survival Data. Two-variable outcome. Time variable: $t_i$ = time at last disease-free observation or time at event. Censoring variable: $c_i = 1$ if the subject had the event; $c_i = 0$ if no event by time $t_i$.
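The two-variable outcome described above can be sketched in Python as a pair of fields per subject. This is only an illustration: the class name `Observation` and the follow-up times are made up, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One subject's outcome (hypothetical container, not from the slides)."""
    time: float  # t_i: time at event, or at last disease-free observation
    event: int   # c_i: 1 if the event occurred, 0 if no event by time t_i


# Made-up follow-up times in months, for illustration only.
subjects = [
    Observation(time=5.0, event=1),
    Observation(time=8.0, event=0),
    Observation(time=12.0, event=1),
    Observation(time=12.0, event=0),
]

n_events = sum(obs.event for obs in subjects)
n_censored = len(subjects) - n_events
print(f"events: {n_events}, censored: {n_censored}")
```

Keeping the time and the indicator together matters because neither is interpretable alone: a time of 12 means something different for a subject who had the event than for one who was censored.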

Censoring. Censored observations are incomplete observations. Right censoring: incomplete follow-up; common and easy to deal with. Left censoring: the event occurred before observation started ($T_0$), but the exact time is unknown; not easy to deal with.

Right Censoring. May be due to: the event had not occurred at termination of the study; the event occurred due to a cause that is not the cause of interest; loss to follow-up or drop-out from the study. In each situation, we know that the subject survived at least to time t.

Left Censoring. Example: age at which smoking starts. Data come from interviews of adults; an adult subject reports regular smoking but does not remember when he started smoking regularly. Example: a study of incidence of CMV infection in children, where two subjects are already infected at enrollment.

Key Assumption with Censoring. Censoring is independent of the intervention and the event of interest. Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t. This assumption means that the risk of the event occurring can be estimated in a fair, unbiased, and valid way.

Censoring with Covariate Effect. Censoring must be independent within group: censoring must be independent given X, although censoring can depend on X. Among those with the same values of X, censored subjects must be at similar risk of subsequent events as subjects with continued follow-up. Censoring can be different across groups.

Other Concepts. Truncation is about entering the study. Right truncation: the event has occurred (e.g. cancer registry). Left truncation: staggered entry. Remember: censoring is about leaving the study. Right censoring: incomplete follow-up (common). Left censoring: observed time > survival time.

Left Truncation. More common in epidemiology than in medical studies. Key assumption: those who enter the study at time t are a random sample of those in the population still at risk at t. Example: an observational study of seizures in young children. What is the relation between vaccine immunization and risk of first seizure? Time axis = age. Some children are observed from birth; others move into the area at a later time and are included at the time of entry into the cohort.

Time Notation. Denote observation time by t; t defines the time axis (scale). t = 0 is the time origin, or beginning of observation. tmax = end of observation. T: random outcome variable, the time at which the event occurs. Example: (T = 3) denotes that the event occurred at time 3 units.

Example I. Recurrence of herpes lesions after treatment for a primary episode. Event = recurrence. Time origin = end of primary episode. Time scale = months from end of primary episode. T = time from end of primary episode to first recurrence.

Example II. Occupational exposure at a nickel refinery. Event = death from lung cancer. Time origin = first exposure (employment at the refinery). Time scale = years since first exposure. T = time from first employment to death from lung cancer.

Population Mortality. Event = death. Time origin = date of birth. Time scale = age (years). T = age at death.

Analysis of Time-To-Event Data

Remember: Features of Survival Analysis. An event is involved. There is progression on a dimension (usually time) until the event happens. The length of progression may vary among subjects. The event might not happen for some subjects.

Analysis of Time-To-Event Data. Certain aspects of survival data, such as censoring and non-normality, generate great difficulty when trying to analyze the data using traditional statistical models such as multiple linear regression. The non-normality of the data violates the normality assumption of the most commonly used statistical models, such as regression or ANOVA.

Analysis of Time-To-Event Data. Why not compare mean time-to-event between your groups using a t-test or linear regression? That ignores censoring. Why not compare the proportion of events in your groups using risk/odds ratios or logistic regression? That ignores time.
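A small numeric sketch may make the two objections concrete. The event times and the study end below are invented for illustration; the point is only that averaging the observed times understates the true mean when follow-up is cut short, and that the event proportion says nothing about when events happened.

```python
# Hypothetical true event times (months) that would be seen with unlimited follow-up.
true_times = [2.0, 4.0, 6.0, 8.0, 10.0]
study_end = 5.0  # assumed administrative end of follow-up

# What the study actually records: the time is cut off at study_end,
# and the indicator is 1 only if the event happened by then.
observed = [min(t, study_end) for t in true_times]
events = [1 if t <= study_end else 0 for t in true_times]

# Naive mean treats censored times as if they were event times.
naive_mean = sum(observed) / len(observed)
true_mean = sum(true_times) / len(true_times)

# Proportion of events discards the timing information entirely.
event_proportion = sum(events) / len(events)

print(f"naive mean of observed times: {naive_mean:.2f}")
print(f"true mean event time:         {true_mean:.2f}")
print(f"proportion with event:        {event_proportion:.2f}")
```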

Analysis of Time-To-Event Data The Right Tool for the Right Job

What is survival analysis? It models time to failure or time to event. Unlike linear regression, survival analysis involves a dichotomous (binary) event outcome. Unlike logistic regression, survival analysis analyzes the time to an event. It is able to account for censoring.

Objectives of Survival Analysis. To estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients. To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial. To assess the relationship of covariates to time-to-event, such as: does weight, insulin resistance, or cholesterol influence survival time of MI patients?

Concepts in Survival Analysis. Survival function: a function describing the proportion of individuals surviving beyond a given time. Notation: T = survival time of a randomly selected individual; t = a specific point in time; S(t) = P(T > t) = survival function; λ(t) = instantaneous failure rate at time t, also known as the hazard function.
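The slide names S(t) and λ(t) without stating how they are connected. For a continuous survival time T, the standard relationships (not shown in the transcription, so included here as background rather than as the author's own formulas) are:

```latex
\lambda(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
  = -\frac{d}{dt} \log S(t),
\qquad
S(t) = \exp\!\left( -\int_0^t \lambda(u)\, du \right).
```

So a constant hazard λ corresponds to the exponential survival curve S(t) = exp(-λt), and a higher hazard at any time makes the survival curve fall faster from that point on.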

Tips for the Analysis of Survival Data. In any data analysis it is a good idea to do some univariate analysis before proceeding to more complicated models. In survival analysis it is highly recommended to look at the Kaplan-Meier curves for all the categorical predictors. This provides insight into the shape of the survival function for each group and gives an idea of whether the proportional hazards assumption is plausible (as a rough visual check, the curves should not cross; on a log(-log S(t)) scale they should be approximately parallel).

Tips for the Analysis of Survival Data. We also consider tests of equality across strata to explore differences in survival probability between levels of the predictor. It is not feasible to calculate a Kaplan-Meier curve for continuous predictors, since there would be a curve for each level of the predictor and a continuous predictor simply has too many different levels. Instead we consider the Cox proportional hazards model with a single continuous predictor.

Estimation of the Survival Function: Steps. Identify the observed failure times: $t_{(1)} < \dots < t_{(k)}$. Number of individuals at risk just before $t_{(i)}$: $n_i$. Number of individuals with failure time $t_{(i)}$: $d_i$. Estimated hazard function at $t_{(i)}$: $d_i / n_i$.

Estimation of the Survival Function. There are two ways to estimate the survival function: the life-table method, and the product-limit method, also called the Kaplan-Meier method.

Example

Life-Table. D = number of deaths in the interval; C = number censored in the interval; N = number of individuals who are alive (at risk) at the beginning of the interval; $N' = N - C/2$ = number of individuals who are at risk during the interval; S(t) = cumulative survival.
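The life-table quantities can be turned into a short calculation. The sketch below assumes the usual actuarial convention that censored subjects count as at risk for half the interval (N' = N - C/2) and that interval survival is 1 - D/N'; the function name `life_table` and the counts are hypothetical.

```python
def life_table(n_start, deaths, censored):
    """Actuarial life table sketch.

    n_start  : number alive (at risk) at the start of the first interval
    deaths   : D for each interval, in time order
    censored : C for each interval, in time order
    Returns one dict per interval with N, D, C, adjusted N', and cumulative S.
    """
    rows = []
    n = n_start
    surv = 1.0
    for d, c in zip(deaths, censored):
        n_adj = n - c / 2      # N' = N - C/2: at risk during the interval
        q = d / n_adj          # conditional probability of death in the interval
        surv *= 1.0 - q        # cumulative survival to the end of the interval
        rows.append({"N": n, "D": d, "C": c, "N_adj": n_adj, "S": surv})
        n = n - d - c          # those still at risk entering the next interval
    return rows


# Made-up counts for three intervals, for illustration only.
rows = life_table(n_start=100, deaths=[10, 8, 6], censored=[4, 6, 2])
for row in rows:
    print(row)
```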

Kaplan-Meier Estimate. Each interval begins at an observed death time. Each interval contains one death (or more if there are ties). N(t) includes individuals with censored data at t.
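As a minimal sketch of the product-limit idea, the function below multiplies (1 - d/n) over the distinct death times, where d is the number of deaths at that time and n is the number still at risk, counting anyone censored at that same time as at risk. The function name and the data are illustrative assumptions, not the SPSS procedure used later in the slides.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : observed time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        # At risk at t: everyone whose observed time is >= t,
        # which includes subjects censored exactly at t.
        n_at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - d / n_at_risk
        curve.append((t, surv))
    return curve


# Made-up data: six subjects, three events and three censored times.
times = [5, 8, 12, 12, 15, 20]
events = [1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"t = {t:>2}: S(t) = {s:.4f}")
```

The estimate only changes at event times; censored subjects leave the risk set without pulling the curve down, which is how the method uses their partial follow-up.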

Assumptions for KM method. Survival probabilities are the same for patients entering the study early or late. The actual event time is known. Patients who are censored have the same survival probabilities as those who continue to be followed.

Comparison of Two Survival Curves. Let $S_1(t)$ and $S_2(t)$ be the survival functions of the two groups. The null hypothesis is $H_0: S_1(t) = S_2(t)$ for all $t > 0$. The alternative hypothesis is $H_1: S_1(t) \ne S_2(t)$ for some $t > 0$.

Log-Rank Test to Compare 2 Survival Functions. $H_0$: the two survival functions are identical. $H_A$: the two survival functions differ. Test statistic: $T_{MH} = (O_1 - E_1)/\sqrt{V_1}$. Rejection region: $|T_{MH}| \ge z_{\alpha/2}$. P-value: $2P(Z \ge |T_{MH}|)$.
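The slide does not spell out how the observed, expected, and variance terms are computed. The sketch below assumes the usual construction: at each distinct event time, group 1's expected deaths are its share of the risk set times the total deaths, and the variance is the hypergeometric variance. The function name and the data are hypothetical.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (z statistic, two-sided p-value)."""
    pooled = ([(t, e, 1) for t, e in zip(times1, events1)]
              + [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    o1 = e1 = v1 = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in pooled if g == 1 and u >= t)  # at risk, group 1
        n2 = sum(1 for u, _, g in pooled if g == 2 and u >= t)  # at risk, group 2
        d1 = sum(1 for u, e, g in pooled if g == 1 and u == t and e == 1)
        d2 = sum(1 for u, e, g in pooled if g == 2 and u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        o1 += d1                 # observed deaths in group 1
        e1 += d * n1 / n         # expected deaths in group 1 under H0
        if n > 1:                # hypergeometric variance (zero when n == 1)
            v1 += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    if v1 == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    z = (o1 - e1) / math.sqrt(v1)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z >= |z|)
    return z, p_value


# Made-up follow-up times for two groups, for illustration only.
z, p = logrank_test(
    [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1],
    [1, 3, 5, 8, 12, 14], [1, 1, 1, 0, 1, 1],
)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```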

Limitations of Kaplan-Meier. Mainly descriptive. Doesn't control for covariates. Requires categorical predictors. Can't accommodate time-dependent variables.

Cox Proportional Hazards Model. Goal: compare two or more groups (treatments), adjusting for other risk factors on survival times (like multiple regression). Uses p explanatory variables (including dummy variables). Models the relative risk of the event as a function of time and covariates.
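The slide's own equation did not survive in the transcription. For reference, the standard form of the Cox model, written with the λ notation used earlier for the hazard (the symbols $\lambda_0$, $x_j$, and $\beta_j$ are assumed here, not taken from the slide), is:

```latex
\lambda(t \mid x_1, \dots, x_p)
  = \lambda_0(t)\, \exp\!\left( \beta_1 x_1 + \dots + \beta_p x_p \right),
\qquad
\frac{\lambda(t \mid x_j + 1)}{\lambda(t \mid x_j)} = e^{\beta_j}.
```

Here $\lambda_0(t)$ is an unspecified baseline hazard, and $e^{\beta_j}$ is the hazard ratio for a one-unit increase in $x_j$ with the other covariates held fixed. The ratio does not depend on t, which is the proportional hazards assumption.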

Example in SPSS

Life-Table

Output

Kaplan-Meier

Kaplan-Meier Two Groups

Adding Plots

Cox Regression

Output

Questions?