Introduction to mixed model and missing data issues in longitudinal studies




Hélène Jacqmin-Gadda
INSERM, U897, Bordeaux, France
Inserm workshop, St Raphael

Outline of the talk
- Introduction
- Mixed models
- Typology of missing data
- Exploring incomplete data
- Methods for MAR data
- Conclusion

Longitudinal data: definition
Definition: variables measured at several times on the same subjects.
Examples:
- repeated measures of biological markers (CD4, HIV RNA) in HIV patients
- repeated measures of neuropsychological tests to study cognitive aging
- repeated events: dental caries, absences from school or work, ...

Longitudinal data analysis
Objectives:
- describe the change of the variable over time
- identify factors associated with change
Problem: intra-subject correlation

Example: HIV clinical trial
- $X_i = 1$ if treatment A, $X_i = 0$ if treatment B
- Criterion: change over time of CD4
- Repeated measures of CD4 over the follow-up period, with $t = 0$ at initiation of treatment
- $Y_{ij}$ = CD4 measure for subject $i$ at time $t_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, n_i$

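Analysis assuming independence
$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
with $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\epsilon_{ij} \perp \epsilon_{ij'}$ for $j \neq j'$.
- Intra-subject correlation makes $\widehat{\mathrm{Var}}(\hat\beta)$ biased, so tests for $\beta$ are biased
- For a time-independent covariate: $\mathrm{Var}(\hat\beta_2)$ is underestimated, so tests of $H_0: \beta_2 = 0$ are anti-conservative (p-value too small)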
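The anti-conservative behaviour can be checked by simulation. The sketch below is only an illustration under stated assumptions: data are generated with a subject-level random intercept (so the independence assumption is wrong), the values of the coefficients and variances are arbitrary, and the function name `simulate_naive_ols` is hypothetical. It compares the standard error that ordinary least squares reports for $\hat\beta_2$ with the actual spread of $\hat\beta_2$ across replications.

```python
import numpy as np

def simulate_naive_ols(n_subjects=200, n_visits=5, sigma0=2.0, sigma=1.0,
                       n_rep=300, seed=0):
    """Compare the naive OLS standard error of beta_2 with its empirical sampling SD."""
    rng = np.random.default_rng(seed)
    t = np.tile(np.arange(n_visits, dtype=float), n_subjects)           # t_ij
    x = np.repeat(np.arange(n_subjects) % 2, n_visits).astype(float)    # X_i (treatment arm)
    design = np.column_stack([np.ones_like(t), t, x, x * t])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = np.array([10.0, -0.5, 1.0, 0.3])   # arbitrary illustrative values
    estimates, naive_se = [], []
    for _ in range(n_rep):
        # Subject-level random intercept induces intra-subject correlation.
        gamma0 = np.repeat(rng.normal(0.0, sigma0, n_subjects), n_visits)
        y = design @ beta + gamma0 + rng.normal(0.0, sigma, t.size)
        beta_hat = xtx_inv @ design.T @ y
        resid = y - design @ beta_hat
        s2 = resid @ resid / (t.size - design.shape[1])
        estimates.append(beta_hat[2])
        naive_se.append(np.sqrt(s2 * xtx_inv[2, 2]))   # SE assuming independence
    return np.std(estimates, ddof=1), np.mean(naive_se)

empirical_sd, mean_naive_se = simulate_naive_ols()
print(f"empirical SD of beta_2 estimates: {empirical_sd:.3f}")
print(f"mean naive OLS standard error:    {mean_naive_se:.3f}")
```

With these settings the naive standard error comes out smaller than the empirical standard deviation, which is the under-estimation described on the slide.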

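Linear mixed model with random intercept
$$Y_{ij} = (\beta_0 + \gamma_{0i}) + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
with $\gamma_{0i} \sim N(0, \sigma_0^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\epsilon_{ij} \perp \epsilon_{ij'}$.
- The $\gamma_{0i}$ are random variables
- Only one additional parameter: $\sigma_0^2$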

Linear mixed model with random intercept (2)
Population (marginal) mean:
$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
Subject-specific (conditional) mean:
$$E(Y_{ij} \mid \gamma_{0i}) = (\beta_0 + \gamma_{0i}) + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
The model assumes a common correlation between all the repeated measures of a subject.
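The common correlation can be written out explicitly. The short derivation below assumes, as is usual for this model although the slide does not state it, that $\gamma_{0i}$ is independent of the errors $\epsilon_{ij}$.

```latex
\begin{aligned}
\mathrm{Var}(Y_{ij}) &= \mathrm{Var}(\gamma_{0i}) + \mathrm{Var}(\epsilon_{ij}) = \sigma_0^2 + \sigma^2 \\
\mathrm{Cov}(Y_{ij}, Y_{ij'}) &= \mathrm{Var}(\gamma_{0i}) = \sigma_0^2 \qquad (j \neq j') \\
\mathrm{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}
\end{aligned}
```

The correlation does not involve $t_{ij}$ or $t_{ij'}$, so it is the same for every pair of visits.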

Linear mixed model with random intercept and slope
$$Y_{ij} = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + \epsilon_{ij}$$
with $\gamma_{0i} \sim N(0, \sigma_0^2)$, $\gamma_{1i} \sim N(0, \sigma_1^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$ and $\epsilon_{ij} \perp \epsilon_{ij'}$.
Population (marginal) mean:
$$E(Y_{ij}) = \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
Subject-specific (conditional) mean:
$$E(Y_{ij} \mid \gamma_i) = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}$$
The correlation between repeated measures depends on the measurement times.
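To see why the correlation now depends on the measurement times, the marginal covariance can be worked out. This is a sketch under two assumptions not given on the slide: the random effects are independent of the errors, and $\sigma_{01}$ denotes $\mathrm{Cov}(\gamma_{0i}, \gamma_{1i})$ (set $\sigma_{01} = 0$ if the two random effects are taken as independent).

```latex
\begin{aligned}
\mathrm{Var}(Y_{ij}) &= \sigma_0^2 + 2\sigma_{01} t_{ij} + \sigma_1^2 t_{ij}^2 + \sigma^2 \\
\mathrm{Cov}(Y_{ij}, Y_{ij'}) &= \sigma_0^2 + \sigma_{01}(t_{ij} + t_{ij'}) + \sigma_1^2 t_{ij} t_{ij'} \qquad (j \neq j')
\end{aligned}
```

Both quantities change with $t_{ij}$ and $t_{ij'}$, so their ratio, the correlation, is no longer constant across pairs of visits.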

Linear mixed model: general formulation
$$Y_{ij} = X_{ij}^T \beta + Z_{ij}^T \gamma_i + \epsilon_{ij}, \qquad \gamma_i \sim N(0, B), \quad \epsilon_i \sim N(0, R_i)$$
- $X_{ij}$: vector of explanatory variables
- $\beta$: vector of fixed effects
- $Z_{ij}$: sub-vector of $X_{ij}$ (including functions of time)
- $\gamma_i$: vector of random effects
Population (marginal) mean: $E(Y_{ij}) = X_{ij}^T \beta$
Subject-specific (conditional) mean: $E(Y_{ij} \mid \gamma_i) = X_{ij}^T \beta + Z_{ij}^T \gamma_i$

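Linear mixed model: example
Linear mixed model with autoregressive (AR) Gaussian error:
$$Y_{ij} = (\beta_0 + \gamma_{0i}) + (\beta_1 + \gamma_{1i}) t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij} + w_{ij} + e_{ij}$$
with $\gamma_i^T = (\gamma_{0i}, \gamma_{1i})$, $\gamma_i \sim N(0, B)$, $e_{ij} \sim N(0, \sigma^2)$, $e_{ij} \perp e_{ij'}$, $w_{ij} \sim N(0, \sigma_w^2)$ and $\mathrm{Corr}(w_{ij}, w_{ij'}) = \exp(-\delta |t_{ij} - t_{ij'}|)$.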
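As an illustration of the autoregressive error structure, the following sketch builds the correlation matrix of $(w_{i1}, \dots, w_{in_i})$ for one subject. The function name, the visit times and the value of $\delta$ are hypothetical choices made for the example.

```python
import numpy as np

def exponential_correlation(times, delta):
    """Correlation matrix with entries exp(-delta * |t_j - t_j'|)."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])   # |t_ij - t_ij'| for every pair
    return np.exp(-delta * lags)

# Hypothetical visit times (years) and decay parameter.
corr_w = exponential_correlation([0.0, 1.0, 3.0, 5.0], delta=0.5)
print(np.round(corr_w, 3))
```

The correlation is 1 on the diagonal and decays as the gap between visits grows, so visits that are close in time are more strongly correlated than distant ones.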

Linear mixed model: estimation
- Maximum likelihood estimator
- $Y_i = (Y_{i1}, \dots, Y_{ij}, \dots, Y_{in_i})^T$ is multivariate Gaussian with mean $X_i \beta$ and covariance matrix $V_i = Z_i B Z_i^T + R_i$
- Software: SAS PROC MIXED, R lme, Stata
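The likelihood being maximised is a sum of multivariate Gaussian log-densities, one per subject. The sketch below evaluates the contribution of a single hypothetical subject; all numerical values, the visit times and the function names are assumptions made for illustration, and a real fit would sum these contributions over subjects and maximise over $\beta$, $B$ and $R_i$ (which is what the packages listed above do).

```python
import numpy as np

def marginal_covariance(Z, B, R):
    """V_i = Z_i B Z_i^T + R_i."""
    return Z @ B @ Z.T + R

def subject_loglik(y, X, beta, V):
    """Multivariate Gaussian log-density of Y_i with mean X_i beta and covariance V_i."""
    resid = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    quad = resid @ np.linalg.solve(V, resid)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical subject in arm X_i = 1 with visits at t = 0, 1, 3, 5.
t = np.array([0.0, 1.0, 3.0, 5.0])
X = np.column_stack([np.ones_like(t), t, np.ones_like(t), t])   # 1, t, X_i, X_i * t
Z = np.column_stack([np.ones_like(t), t])                        # random intercept and slope
B = np.array([[4.0, 0.5], [0.5, 0.25]])                          # illustrative values
R = 1.0 * np.eye(t.size)                                         # R_i = sigma^2 I
beta = np.array([10.0, -0.5, 1.0, 0.3])
y = np.array([11.2, 10.9, 10.1, 9.8])

V = marginal_covariance(Z, B, R)
loglik = subject_loglik(y, X, beta, V)
print(np.round(V, 2))
print(loglik)
```

Because each subject contributes a density of the dimension of its own $Y_i$, subjects with different numbers or times of measurements are handled without any special treatment.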

Generalized linear mixed model
$Y_{ij}$ follows a distribution from the exponential family and
$$g(E(Y_{ij} \mid \gamma_i)) = X_{ij}^T \beta + Z_{ij}^T \gamma_i \quad \text{with } \gamma_i \sim N(0, B).$$
Example: logistic mixed model
$$\mathrm{logit}(\Pr(Y_{ij} = 1 \mid \gamma_i)) = X_{ij}^T \beta + Z_{ij}^T \gamma_i \quad \text{with } \gamma_i \sim N(0, B).$$
- Maximum likelihood estimation: numerical integration
- Software: SAS PROC NLMIXED, R nlme, Stata
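Numerical integration is needed because the random effects cannot be integrated out in closed form. One common choice, used here only as an example since the slide does not say which rule is meant, is Gauss-Hermite quadrature. The sketch handles the simplest case, a logistic model with a single random intercept $\gamma_{0i} \sim N(0, \sigma_0^2)$; the function name and the input values are hypothetical.

```python
import numpy as np

def marginal_likelihood(y, eta_fixed, sigma0, n_nodes=20):
    """Likelihood contribution of one subject in a random-intercept logistic model.

    Integrates prod_j p_ij^y_ij (1 - p_ij)^(1 - y_ij) over gamma_0i ~ N(0, sigma0^2)
    by Gauss-Hermite quadrature, where logit(p_ij) = eta_fixed[j] + gamma_0i.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    gamma = np.sqrt(2.0) * sigma0 * nodes            # change of variable for N(0, sigma0^2)
    eta = eta_fixed[None, :] + gamma[:, None]        # linear predictor at each node
    p = 1.0 / (1.0 + np.exp(-eta))
    cond_lik = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)
    return np.sum(weights * cond_lik) / np.sqrt(np.pi)

# Hypothetical subject: three binary responses and their fixed-effect predictors X_ij^T beta.
y = np.array([1, 0, 1])
eta_fixed = np.array([-0.5, 0.0, 0.5])
lik = marginal_likelihood(y, eta_fixed, sigma0=1.0)
print(lik)
```

With several random effects the integral becomes multidimensional and the number of quadrature points grows quickly, which is one reason these models are more costly to fit than linear mixed models.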

Typology of missing data in longitudinal studies
Notation:
- $Y_i = (Y_{obs,i}, Y_{mis,i})$, with $Y_{obs,i}$ the observed part of $Y_i$ and $Y_{mis,i}$ the missing part
- $R_{ij} = 1$ if $Y_{ij}$ is observed and $R_{ij} = 0$ if $Y_{ij}$ is missing
- $R_i = (R_{i1}, \dots, R_{ij}, \dots, R_{in_i})$
- $X_i$: explanatory variables, completely observed

Typology of missing data (2)
- Monotone missing data = dropout: $P(R_{ij} = 0 \mid R_{i,j-1} = 0) = 1$. $R_i$ may be summarized by the time to dropout $T_i$ and an indicator for dropout $\delta_i$.
- Intermittent missing data: $P(R_{ij} = 0 \mid R_{i,j-1} = 0) < 1$
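In practice the two patterns are identified from each subject's indicator vector $R_i$. A minimal sketch, with hypothetical function names and example vectors, of how a pattern could be classified and summarized:

```python
import numpy as np

def is_monotone(r):
    """True if, once a response is missing, all later responses are missing too."""
    r = np.asarray(r)
    return bool(np.all(r[1:] <= r[:-1]))

def dropout_index(r):
    """Index of the first missing visit, or None if every visit was observed."""
    r = np.asarray(r)
    missing = np.flatnonzero(r == 0)
    return int(missing[0]) if missing.size else None

print(is_monotone([1, 1, 0, 0]), dropout_index([1, 1, 0, 0]))   # dropout at the third visit
print(is_monotone([1, 0, 1, 0]))                                # intermittent pattern
print(is_monotone([1, 1, 1, 1]), dropout_index([1, 1, 1, 1]))   # complete follow-up
```

For a monotone pattern the index of the first missing visit, together with whether dropout occurred at all, carries the same information as the whole vector $R_i$.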

Typology of missing data (3)
- Missing completely at random (MCAR): $P(R_{ij} = 1)$ is constant. The observed sample is representative of the whole sample. Loss of precision, no bias.
- Covariate-dependent missingness process: $P(R_{ij} = 1) = f(X_i)$. Loss of precision, no bias if analyses are adjusted for $X_i$.

Typology of missing data (4)
- Missing at random (MAR): $P(R_{ij} = 1) = f(Y_{obs,i}, X_i)$. Example: the probability of dropout depends on past observed values. Loss of precision, no bias with appropriate statistical methods.
- Informative missingness, or missing not at random (MNAR): $P(R_{ij} = 1) = f(Y_{mis,i}, Y_{obs,i}, X_i)$. Example: the probability that $Y$ is observed depends on the current value of $Y$. Loss of precision and bias; sensitivity analyses are needed.
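The difference between the mechanisms can be made concrete with simulated data. The sketch below is an illustration under arbitrary assumptions: two correlated Gaussian measurements per subject with true mean 0, the first always observed, and made-up missingness probabilities for the second. It reports the naive mean of the observed second measurement under each mechanism.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n = 100_000
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
y = rng.multivariate_normal([0.0, 0.0], cov, size=n)
y1, y2 = y[:, 0], y[:, 1]          # y1 always observed, y2 possibly missing

# Indicator R = 1 (observed) for the second measurement under each mechanism.
observed = {
    "MCAR": rng.random(n) < 0.6,                  # constant probability
    "MAR": rng.random(n) < expit(1.5 * y1),       # depends on the observed y1 only
    "MNAR": rng.random(n) < expit(1.5 * y2),      # depends on the possibly missing y2
}
naive_means = {name: y2[r].mean() for name, r in observed.items()}
for name, m in naive_means.items():
    print(f"{name}: mean of observed Y2 = {m:.3f} (true mean 0)")
```

The naive mean is close to the truth only under MCAR. Under MAR it is biased as well, which is why the slide says "no bias with appropriate statistical methods": the bias can be removed by methods that use the observed $Y_1$, whereas under MNAR it cannot be removed without further assumptions.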

Exploring incomplete data
- Describe the frequency of missing data
- Cross-classify missing data patterns with covariates
- Compare the mean evolution for available data and for complete cases
- Compare the mean evolution until time $t$ given observation status at time $t + 1$
- Logistic regression for $P(R_{ij} = 1)$ given covariates and $Y_{ik}$, $k < j$
- Cox regression for time to dropout given covariates
- Impossible to distinguish MAR from MNAR using the observed data

An example: the Paquid data set
The Paquid cohort in Gironde:
- 2792 subjects aged 65 years and older at baseline
- Living at home at the beginning of the study (1988) in Gironde (France)
- Seen at home at 1, 3, 5, 8, and 10 years after the baseline visit
- Cognitive measure: Digit Symbol Substitution Test (DSST) of Wechsler (attention, time limited to 90 s)
- Sample: 2026 subjects without a diagnosis of dementia between T0 and T10, with the test completed at least once (at T0)

Description of dropout: Kaplan-Meier
Dropout time (= event): first visit with missing score.
[Figure: Kaplan-Meier estimate, with 95% confidence interval, of the probability to be in the cohort. Axes: Probability against Follow-up time.]
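For readers unfamiliar with the estimator, here is a minimal sketch of the Kaplan-Meier computation. The data are hypothetical and are not the Paquid data: `time` is the visit (in years) of the first missing score, or the last visit for subjects who never had a missing score, and `event` flags whether dropout was observed.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier estimate of the probability of still being in the cohort.

    time  : dropout or censoring time for each subject
    event : 1 if dropout was observed at `time`, 0 if censored
    Returns the distinct dropout times and the estimate just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                    # subjects still followed at t
        dropped = np.sum((time == t) & (event == 1))   # dropouts at t
        s *= 1.0 - dropped / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical example with eight subjects.
time = [1, 3, 3, 5, 8, 10, 10, 10]
event = [1, 1, 0, 1, 1, 0, 0, 0]
times, surv = kaplan_meier(time, event)
for t, s in zip(times, surv):
    print(f"t = {t:g}: P(in cohort) = {s:.3f}")
```

Censored subjects leave the risk set without lowering the curve, which is what distinguishes this estimate from the raw proportion of subjects still observed.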

Observed means of the DSST score given time
[Figure: observed mean DSST score (axis: Score, 10 to 40) for the available data. Labels recovered from the figure: Age; 65-69 years, 70-74, 75-79, 80 and +.]

Observed means of the DSST score given time
[Figure: observed mean DSST score (axis: Score, 10 to 40) for the complete data and the available data. Labels recovered from the figure: Age; 65-69 years, 70-74, 75-79, 80 and +.]

Logistic regression model for dropout in the first 5 years

Covariate                          OR     95% CI of the OR
T3                                 0.02   0.003-0.10
T5                                 0.01   0.001-0.09
age                                1.01   0.99-1.02
age × T3                           1.05   1.03-1.08
age × T5                           1.06   1.03-1.09
previous MMSE score                0.91   0.88-0.93
men                                0.86   0.75-0.99
Education (vs university level)
  no education                     1.88   1.15-3.07
  no diploma                       2.02   1.39-2.93
  CEP                              1.67   1.17-2.40
  high school level                1.39   0.96-2.00

Methods for MCAR or MAR data
- Complete case analysis (loss of precision, requires MCAR)
- Imputation (requires MCAR or MAR)
- Maximum likelihood using available data (requires MAR)

Maximum likelihood for MAR data (1)
Objective: estimate $\theta$ from the distribution $f(Y \mid \theta)$.
Likelihood of the observed data $(Y_{obs}, R)$:
$$f(Y_{obs}, R \mid \theta, \psi) = \int f(Y_{obs}, Y_{mis} \mid \theta)\, f(R \mid Y_{obs}, Y_{mis}, \psi)\, dY_{mis}$$

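Maximum likelihood for MAR data (2)
If the data are MAR:
$$\begin{aligned}
f(Y_{obs}, R \mid \theta, \psi) &= \int f(Y_{obs}, Y_{mis} \mid \theta)\, f(R \mid Y_{obs}, \psi)\, dY_{mis} \\
&= f(R \mid Y_{obs}, \psi) \int f(Y_{obs}, Y_{mis} \mid \theta)\, dY_{mis} \\
&= f(R \mid Y_{obs}, \psi)\, f(Y_{obs} \mid \theta)
\end{aligned}$$
Log-likelihood:
$$l(\theta, \psi \mid Y_{obs}, R) = l(\theta \mid Y_{obs}) + l(\psi \mid R, Y_{obs})$$
If $\psi$ and $\theta$ are distinct, the missing data are ignorable: $\theta$ is estimated by maximisation of $l(\theta \mid Y_{obs})$ using only the available responses.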
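The consequence of ignorability can be illustrated on the simplest possible case. The sketch below assumes a bivariate Gaussian outcome $(Y_1, Y_2)$ with arbitrary illustrative parameters, $Y_1$ always observed, and $Y_2$ missing with a probability that depends on $Y_1$ only (MAR). For this monotone pattern the maximum likelihood estimate of $E(Y_2)$ has a closed form: the complete-case mean of $Y_2$ corrected by the regression of $Y_2$ on $Y_1$ among completers. It is compared with the complete-case mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
mu = np.array([30.0, 28.0])                       # illustrative true means
cov = np.array([[100.0, 70.0], [70.0, 100.0]])
y = rng.multivariate_normal(mu, cov, size=n)
y1, y2 = y[:, 0], y[:, 1]

# MAR: the probability of observing y2 depends on the always-observed y1 only.
p_obs = 1.0 / (1.0 + np.exp(-(y1 - 30.0) / 5.0))
r = rng.random(n) < p_obs

# Complete-case estimate: uses only subjects with y2 observed.
cc_mean = y2[r].mean()

# ML estimate from the observed-data likelihood: regress y2 on y1 among
# completers, then shift the completers' mean of y2 to the overall mean of y1.
slope = np.cov(y1[r], y2[r])[0, 1] / np.var(y1[r], ddof=1)
ml_mean = cc_mean + slope * (y1.mean() - y1[r].mean())

print(f"true mean of Y2:        {mu[1]:.2f}")
print(f"complete-case estimate: {cc_mean:.2f}")
print(f"ML estimate (MAR):      {ml_mean:.2f}")
```

No model for the missingness process (the $\psi$ part) is fitted anywhere in this computation, yet the ML estimate recovers the true mean while the complete-case mean does not.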

Example: MAR analysis of Paquid data
Mixed-effects model, with $Y_{ij}$ the test score for subject $i$ at time $t_{ij}$:
$$Y_{ij} = (\beta_0 + \mathrm{age}_i^T \gamma_0 + \alpha_{0i}) + (\beta_1 + \mathrm{age}_i^T \gamma_1 + \alpha_{1i})\, t_{ij} + \beta_3 I_{\{t_{ij} = 0\}} + e_{ij}$$
with $\alpha_i = (\alpha_{0i}, \alpha_{1i})^T \sim N(0, G)$ and $e_{ij} \sim N(0, \sigma_e^2)$.
- $\mathrm{age}_i$: vector of indicators for baseline age classes (70-74, 75-79, 80 years and older; reference = 65-69)
- $I_{\{t_{ij} = 0\}}$: indicator of the baseline visit

Observed and predicted means of the score given time
[Figure: mean score (axis: Score, 10 to 40) for the complete data, the available data, and the mixed model (MAR) predictions. Labels recovered from the figure: Age; 65-69 years, 70-74, 75-79, 80 and +.]

Conclusion
Advantages of mixed models:
- Use all the available information (repeated measures)
- Flexibly handle intra-subject correlation (unbiased inference)
- Allow any number and timing of measurements
- Robust to data missing at random
- Available in most software packages
Limits of mixed models:
- Assume a homogeneous population; extended models include latent classes (mixture)
- As the MAR assumption is uncheckable, complete the study with a sensitivity analysis; extended models exist for MNAR data

References
- Chavance M, Manfredi R. Modélisation d'observations incomplètes. Revue d'Epidémiologie et Santé Publique 2000, 48, 389-400.
- Diggle PJ, Heagerty P, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd edition. Oxford Statistical Science Series, Oxford University Press, 2002.
- Jacqmin-Gadda H, Commenges D, Dartigues JF. Analyse de données longitudinales gaussiennes comportant des données manquantes sur la variable à expliquer. Revue d'Epidémiologie et Santé Publique 1999, 47, 525-534.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley & Sons, 1987.
- Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer Series in Statistics, Springer-Verlag, New York, 2000.