A simple to implement algorithm for natural direct and indirect effects in survival studies with a repeatedly measured mediator

Similar documents

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Randomized trials versus observational studies

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides:

Many research questions in epidemiology are concerned. Estimation of Direct Causal Effects ORIGINAL ARTICLE

Study Design and Statistical Analysis

Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health

Supplementary appendix

Personalized Predictive Medicine and Genomic Clinical Trials

Statistical Rules of Thumb

With Big Data Comes Big Responsibility

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Statistics in Retail Finance. Chapter 6: Behavioural models

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

Department/Academic Unit: Public Health Sciences Degree Program: Biostatistics Collaborative Program

Missing data and net survival analysis Bernard Rachet

Sample Size Planning, Calculation, and Justification

Module 223 Major A: Concepts, methods and design in Epidemiology

Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects *

Evaluation of Treatment Pathways in Oncology: Modeling Approaches. Feng Pan, PhD United BioSource Corporation Bethesda, MD

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistics Graduate Courses

More details on the inputs, functionality, and output can be found below.

Elements of statistics (MATH0487-1)

Data Analysis, Research Study Design and the IRB

Organizing Your Approach to a Data Analysis

Problem of Missing Data

CS&SS / STAT 566 CAUSAL MODELING

SUMAN DUVVURU STAT 567 PROJECT REPORT

Lecture 15 Introduction to Survival Analysis

University of Maryland School of Medicine Master of Public Health Program. Evaluation of Public Health Competencies

Vignette for survrm2 package: Comparing two survival curves using the restricted mean survival time

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

School of Public Health and Health Services Department of Epidemiology and Biostatistics

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Correlational Research

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

The Harvard MPH in Epidemiology Online/On-Campus/In the Field. Two-Year, Part-Time Program. Sample Curriculum Guide

Regression Modeling Strategies

Electronic Health Records in an Integrated Delivery System: Effects on Diabetes Care Quality

Biostat Methods STAT 5820/6910 Handout #6: Intro. to Clinical Trials (Matthews text)

Introduction. Survival Analysis. Censoring. Plan of Talk

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

7.1 The Hazard and Survival Functions

Multivariate Logistic Regression

13. Poisson Regression Analysis

UMEÅ INTERNATIONAL SCHOOL

The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks

Imputing Values to Missing Data

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Targeted Learning with Big Data

Alemtuzumab for treating relapsing-remitting multiple sclerosis

REGULATIONS FOR THE POSTGRADUATE DIPLOMA IN CLINICAL RESEARCH METHODOLOGY (PDipClinResMethodology)

Study Design. Date: March 11, 2003 Reviewer: Jawahar Tiwari, Ph.D. Ellis Unger, M.D. Ghanshyam Gupta, Ph.D. Chief, Therapeutics Evaluation Branch

The Harvard MPH in Epidemiology Online/On-Campus/In the Field. Two-Year, Part-Time Program. Sample Curriculum Guide

Prediction for Multilevel Models

Lecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods

Guide to Biostatistics

Package dsmodellingclient

If several different trials are mentioned in one publication, the data of each should be extracted in a separate data extraction form.

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Supplementary PROCESS Documentation

Survival analysis methods in Insurance Applications in car insurance contracts

Applying Survival Analysis Techniques to Loan Terminations for HUD s Reverse Mortgage Insurance Program - HECM

Introduction to Event History Analysis DUSTIN BROWN POPULATION RESEARCH CENTER

U.S. Food and Drug Administration

REGULATIONS FOR THE POSTGRADUATE CERTIFICATE IN PUBLIC HEALTH (PCPH) (Subject to approval)

Apixaban Plus Mono vs. Dual Antiplatelet Therapy in Acute Coronary Syndromes: Insights from the APPRAISE-2 Trial

Competency 1 Describe the role of epidemiology in public health

Evaluation of Treatment Pathways in Oncology: An Example in mcrpc

Gamma Distribution Fitting

Risk Factors for Alcoholism among Taiwanese Aborigines

SAMPLE SIZE TABLES FOR LOGISTIC REGRESSION

Gruppo di lavoro: Malattie Tromboemboliche

Moderation. Moderation

Introduction to Longitudinal Data Analysis

When to Use a Particular Statistical Test

Preparation course Msc Business & Econonomics

Mortality Assessment Technology: A New Tool for Life Insurance Underwriting

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Analysis of Bayesian Dynamic Linear Models

PSA Testing 101. Stanley H. Weiss, MD. Professor, UMDNJ-New Jersey Medical School. Director & PI, Essex County Cancer Coalition. weiss@umdnj.

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Transcription:

A simple to implement algorithm for natural direct and indirect effects in survival studies with a repeatedly measured mediator Susanne Strohmaier 1, Nicolai Rosenkranz 2, Jørn Wetterslev 2 and Theis Lange 3 1 University of Oslo, 2 Copenhagen University Hospital, 3 University of Copenhagen ISCB Utrecht, August 2015

Outline Background and motivation The 6S study Generalisation of natural direct and indirect effects Data structure and notation Objectives and encountered challenges Estimation General idea Proposed algorithm Application results Discussion References Additional Material

Background and motivation The 6S study Motivating example - The 6S trial 6S - Scandinavian Starch for Severe Sepsis/Septic Shock trial Background: Intravenous fluids are the mainstay of treatment for patients with hypovolemia due to severe sepsis to obtain fast circulatory stabilisation. Commonly applied interventions: Crystalloids including saline and dextrose (Ringer s solution) Colloids containing larger molecules such as starch or gelatine. Preferred choice in Scandinavian intensive care units (ICU) Hydtroxyethyl starch (HES) 130/0.42 (previously high molecular weight HES - reported to cause acute kidney failure and bleeding) However HES 130/0.42 is largely unstudied in patients with severe sepsis; Lack of efficacy data and concerns about safety

Background and motivation The 6S study 6S in brief Study population: Patients with severe sepsis admitted to an intensive care unit (ICU) who needed fluid for circulatory stabilisation Randomisation to: Hydroxyethyl starch (HES 130/0.42) Ringer s solution Primary outcome: Death or end-stage kidney failure at 90 days after randomisation Results reported in NEJM: (Perner et al. (2012)) At 90 days after randomisation, 201 of 398 assigned to HES had died as compared with 172 of 400 patients assigned to Ringer s acetate. (RR 1.17; 95% confidence interval, 1.01 to 1.36: p=0.03) Speculated mechanism Effect of HES vs. Ringer s solution is mediated through acute kidney impairment. Starch particles cause acute kidney injury.

A simple to implement algorithm for natural direct and indirect effects in survival studies with a repeatedly measured mediator Background and motivation The 6S study Particular data collection M 1 M 2... M τ A N 1 N 2... N τ L Figure: Assumed causal structure

A simple to implement algorithm for natural direct and indirect effects in survival studies with a repeatedly measured mediator Generalisation of natural direct and indirect effects Data structure and notation Assumed structure M A L N/dN/T Figure: Local independence graph C Some notation A... dichotomous treatment, values {0, 1} L... a set of baseline confounders T = min( T, U)... minimum of event- and censoring time = I ( T U)... censoring indicator M...mediator process M t... mediator measured at time t M t... mediator history up to time t N t = I (T t, = 1)... counting process the event of interest C t... censoring process

Generalisation of natural direct and indirect effects Objectives and encountered challenges Objectives Estimation of average natural direct and indirect effects Question we could consider What would be the effect of treatment on time-to-event if treatment had not had an effect on the mediator? Effect of treatment attributable to the treatment induced change in the mediator Simple to implement algorithm for estimation for direct parametrisation of those effects

Generalisation of natural direct and indirect effects Objectives and encountered challenges Effect definitions Average natural direct effect E[ T ama T a M a L] comparing the survival times (within levels of baseline covariates L) had the treatment been changed from a to a, but the mediator history had followed the path it would have naturally followed under some reference condition for the treatment A = a. Average natural indirect effect E[ T ama T ama L], which compares the effect of the mediator histories M a and M a on the outcome, when the treatment is set to A = a. Effect decompostion on the mean survial scale E[ T a T a L] = E[ T ama T a M a L] = E[ T ama T a M a L] + E[ T ama T ama L],

Generalisation of natural direct and indirect effects Objectives and encountered challenges Encountered Challenges Conterfactual survial times T a, mediator trajectories M a and respective nested counterfactuals T ama have only been defined for mediators measured at one single time point Considerations under which conditions generalised effects are well-defined qunatities (Zheng and van der Laan (2012)) Extend respective no-unmeasurened confounder assumptions and the sequential ignorability assumption to the process setting

Estimation General idea General idea Directly parameterise the natural direct and indirect effects Extension of Lange et al. (2012) A Simple unified approach for estimating natural direct and indirect effects to the current setting. For GLMs and a mediator measured at a single point in time g(e[y ama ]) = c 0 + c 1 a + c 2 a + c 3 a a Method builds on ideas of Hong (2010), but has been generalised to various types of outcomes and mediators. Estimate the effects through analysis of an expanded and reweighted data set (ratio of mediator probabilities under different treatment regimes)

Estimation General idea Considering survival outcomes Assuming independent censoring Cox and Aalen models for the hazard function corresponding to the counterfactual survival time, expressed as Cox natural effects model λ 0(t) exp {c 0 + c 1a + c 2a + c 3a a + c 4L} Aalen natural effects model γ 0(t) + c 1a + c 2a + c 3a a + c 4L Natural effects models for the counterfactual variable T can ama expressed in the same fashion Effect decomposition can also be done on the hazard or hazard ratio scale (VanderWeele (2011)).

Estimation Proposed algorithm Proposed estimation algorithm (1/3) 1. Organise your data in wide format Censored or dead subjects will have missing for the mediators where they were not observed. 2. Estimate models for the mediator at each measurement time t conditional on the assigned treatment A, the mediator history and baseline confounders L by using the original data set. 3. Construct a new dataset; repeating each observation twice and including an additional auxiliary variable A, which equal A for the first replication and 1 A for the second replication. Further, add and indicator,identifying which data originate from the same subject.

Estimation Proposed algorithm Proposed estimation algorithm (2/3) 4. Apply the predict function twice to predict from the models in step 2, once based on A and once based on A. By dividing the predicted values the local weights at time s can be obtained W s j = P(Ms = m s j M s 1 = m s 1 j, A = a j, L = l j ) P(M s = m s j Ms 1 = m s 1 j, A = a j, L = l j ) (1) 5. Compute the cumulative weights by multiplying the local weights obtained in step 4 across the relevant rows T j W j = Wj s (2) s=1

Estimation Proposed algorithm Proposed estimation algorithm (3/3) 6. Fit a suitable outcome model, i.e. the Cox model including A and A, possible interactions and baseline covariates. The model should be weighted by the weights from step 5. 7. Confidence intervals can be obtained using bootstrap.

Estimation Proposed algorithm Some intuition... The resulting regression coefficient associated with A will capture the natural direct effect while the coefficient of A* will capture the natural indirect effect The derived weights are a tool to distinguish between direct and indirect effects by giving more weights to observations where the observed mediator trajectory would have been more likely to occur under a different treatment level than actually observed. For a constructive illustration see Hong (2010)

Application results Back to the original medical problem A quick recapitulation Patients selection: 795 patients admitted to intensive care units (396 in HES group vs. 399 Ringer solution) Mediator: KDIGO (Kidney Disease: Improving Global Outcome) score, taking values {0, 1, 2, 3} measured daily within the first 5 days of follow-up. Mediator model: Multinomial logistic regression Outcome: Time to death within 90 days after randomisation Outcome model: Cox regression

Application results Table: Direct, indirect effect through KDIGO and total effects and 95% confidence intervals (CI) comparing HES to Ringer solution. HR (95% CI ) Direct effect 1.16 (0.95-1.40) Indirect efffect 1.05 (0.98-1.12) Total effect 1.21 (1.01-1.47) Both direct and indirect effect are not significant However, the magnitude of the direct effect indicates that could be another importatn pathway through which the treatment could bring about the outcome. Potential pathways Coagulapathy (bleeding disorder) Patients generally have a low immune response, additionally adding foreign bodies could cause there immune system to crash

Discussion Discussion Easy to implement in standard software (predict functionality, weighted modeling) Generally applicable for various mediator types, although some caution with continuous mediators (unstable weights) Can handle treatment-by-mediator interactions Effect definitions involves critical assumptions

References References Hong, G. (2010). Ratio od mediator probability weighting for estimating natural direct and indirect effects. Proceedings of the American Statistical Association, Biometrics Section, pages 2401 2415. Lange, T., Vansteelandt, S., and Bekaert, M. (2012). A simple unified approach for estimating natural direct and indirect effects. American Journal of Epidemiology, 176(3):190 195. Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainy in Artifcial Intelligence, pages 411 420. Perner, A., Haase, N., Guttormsen, A., Tenhunen, J., Klemenzson, G., Åneman, A., Madsen, K., Møller, M., Elkjæ, J., Poulsen, L., Bendtsen, A., Winding, R., Steensen, M., Berezowicz, P., Søe-Jensen, P., Bestle, M., Strand, K., Wiis, J., White, J., Thornberg, K., Quist, L., Nielsen, J., Andersen, L., Holst, L., Thormar, K., Kjældgaard, A., Fabritius, M., Mondrup, F., Pott, F., Møller, T., Winkel, P., Wetterslev, J., the 6S Trial Group, and the Scandinavian Critical Care Trials Group. (2012). Hydrocyethyl starch 130/0.42 versus Ringer s acetate in severe sepsis. New England Jounral of Medicine, 367:124 134. VanderWeele, T. (2011). Causal mediation analysis with survival data. Epidemiology, 22(4):582 585. VanderWeele, T. J. and Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Stat Interface, pages 457 468. Zheng, W. and van der Laan, M. J. (2012). Causal mediation in a survival setting with time-dependent mediators. Berkeley Division of Biostatistics Working Paper Series. Working paper 295.

Additional Material (Nested) counterfactual quantities Assuming no censoring Counterfactual survival time Tam the outcome one would - possibly contrary to the fact - observe had the treatment A been set to the variable a and the entire mediator history M to m Counterfactual mediator trajectory M a corresponds to the mediator history that one would - possibly contrary to the fact - observe had treatment A been set to the value a. Nested counterfactual denoting the outcome one would have observed if A had been set to A = a and had M been set to the values it would have taken if A had been set to a. TaMa

Additional Material Average natural direct effect E[ T ama T a M a L] comparing the survival times (within levels of baseline covariates L) had the treatment been changed from a to a, but the mediator history had followed the path it would have naturally followed under some reference condition for the treatment A = a. Average natural indirect effect E[ T ama T ama L], which compares the effect of the mediator histories M a and M a on the outcome, when the treatment is set to A = a. Effect decompostion on the mean survial scale E[ T a T a L] = E[ T ama T a M a L] = E[ T ama T a M a L] + E[ T ama T ama L],

A simple to implement algorithm for natural direct and indirect effects in survival studies with a repeatedly measured mediator Additional Material Assumptions... to relate the counterfactual quantities to the observed data A M 1 M 2... M τ A N 1 N 2... N τ Figure: Augemented DAG L Treatment is randomised No unmeasured confounding between treatment and the outcome process, treatment and the mediator process and the mediator and outcome process.

Additional Material Non-parametric structural equation models Current mediator value M t Ma t = f M (M t 1 a, A = a, L = l, ε t a), with ε t a denoting an error term. Caution: Contains an assumption that all time-dependent predictors of both the mediator and survival status are contained in the mediator process (critical discussion in Zheng and van der Laan (2012)). Counterfactual event process N t aa = f N(M t a, Nt 1 aa, A = a, L = l, γt a), with γ t a again denoting an error term. Using those equations the assumptions can be pharsed in terms of the error terms.

Additional Material Assumptions Consistency: As stated in the main text, consistency assumes that the counterfactual outcomes T am and M a are the observed outcomes T with probability 1 among subjects with A = a and M = m and for the mediator trajectories M a are the observed mediator trajectories m with probability 1 for subject with A = a. Positivity: Further to ensure that the equations above describe non-degenerate probability distributions, we assume that the error terms follow uniform distributions ε t a unif ([0, 1]), a = 0, 1 (3) γ t a unif ([0, 1]), a = 0, 1 (4) and that fm t and f N t are not constant for the whole range ([0, 1]) in the arguments ε t and γ t. Further the full vectors of error terms (ε 1 a,..., ε τ a ) and (γa, 1..., γa τ ) are assumed to be an independent and identically distributed sequence.

Additional Material Assumptions cont. No unmeasured confounders: Random treatment allocation should ensure that treatment-outcome as well as the treatment - mediator relationship will not be confounded with either measured or unmeasured confounders. This can be expressed as ε t 0 ( ) ε t 1 A γ 0 t A. (5) γ1 t For the mediator-outcome relationship we assume ( ) ( ) ε t 0γ0 t A A and ε t 1γ1 t A A

Additional Material Assumptions cont. Extended sequential ignorability: The sequential ignorability assumption known from the single mediator case to be required for identifying natural effects needs to be modified to accommodate the multiple measurements. In terms of the error terms it can be expressed as ( ) ε t γ t 0 a γ1 t ( ) a = 0, 1 and γa t ε t 0 ε t 1 a = 0, 1. (6)

Additional Material Proof Algorithm Under these assumptions E[g((T, ) ama )] can be linked to the observed data as follows, where the function g() is any measureable function (this ensures that the distributions are the same, not just a single expected value) and h() corresponds to a density with respect to a suitable measure. E[g((T, ) ama ) L = l] = E[g(T, ) A = a, A = a, L = l] Recall that τ denotes the maximal follow up time, so we have = τ g(t, δ)h(t, δ A = a, A = a, L = l), δ=0,1 t=1 further, let M denote the space of possible mediator values, then = τ g(t, δ) h(t, δ, m t A = a, A = a, L = l)d m t δ=0,1 t=1 m t M t = τ g(t, δ) δ=0,1 t=1 m t M t h(t, δ m t, A = a, A = a, L = l) h( mt A = a, A = a, L = l) h( m t A = a, A = a, L = l) h( m t A = a, A = a, L = l)d m t,

Additional Material Proof continued conditioning on m t in the first term allows to remove A from the conditioning set, further using that M t depends only on A gives the following equality = τ δ=0,1 t=1 g(t, δ) h(t, δ m t, A = a, L = l) h( mt A = a, L = l) m t M t h( m t A = a, L = l) h( m t A = a, L = l)d m t, applying the same independence arguments again, we have = τ g(t, δ) h(t, δ m t, A = a, A = a, L = l)h( m t A = a, A = a, L = l) δ=0,1 t=1 m t M t h( mt A = a, L = l) h( m t A = a, L = l) )d mt, what can be writen as = τ g(t, δ)w ( m t, a, a, l)h(t, δ, m t A = a, A = a, L = l)d m t. δ=0,1 t=1 m t M t

Additional Material Proof continued The last expression has the sample counterpart 1 #{N } where T g(t i, i )W aa (M i i, a, a, L i ), i N N = {i {1,..., n} : A i = a, L i = l}.