Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out




Sandra Taylor, Ph.D.
IDDRC BBRD Core
23 April 2014

Objectives
Baseline Adjustment
- Introduce approaches
- Guidance on when to use different approaches
Missing Data/Drop-out
- Raise awareness of the issues and challenges caused by missing data
- Importance for study design and data analysis
- Basic understanding of approaches to handling missing data

- In longitudinal studies, subjects typically have a baseline measurement
- Interest is commonly in differences in change over time between groups: does the degree of change differ between groups?
- Differences in starting values (i.e., baseline) are important to consider when trying to assess change over time
[Figure: plot over Time]

Four options for baseline adjustment
1. Retain baseline value as outcome with no assumptions about group differences at baseline
2. Retain baseline value as outcome and assume group means are equal at baseline
3. Subtract baseline from post-baseline responses and analyze differences from baseline
4. Include baseline value as a covariate

Retain baseline as outcome; no assumptions at baseline
[Figure: Group 1 and Group 2 plotted over Time]
- Allow intercepts (baselines) to differ between groups

Retain baseline as outcome; assume equal at baseline
[Figure: Group 1 and Group 2 plotted over Time]
- Assume same intercepts (baselines) in both groups
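One way to write down the two "baseline as outcome" approaches is sketched below. The slides give no model formula, so the symbols are assumptions for illustration: two groups with indicator $G_i$, a linear time trend, and time $t_j = 0$ at baseline.

```latex
% Mean response for subject i at occasion j (illustrative notation)
E[Y_{ij}] = \beta_0 + \beta_1 G_i + \beta_2 t_j + \beta_3 G_i t_j

% Approach 1: \beta_1 is estimated, so baseline means may differ between groups.
% Approach 2: \beta_1 = 0, so both groups share the baseline mean \beta_0.
% In both cases \beta_3 is the between-group difference in change per unit time.
```

With categorical time the single slope terms would be replaced by one term per post-baseline occasion, but the role of the baseline group term stays the same.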

Subtract baseline from post-baseline responses
- Define the new variable as the response variable
- Model as before
- Interpretation of results is a bit different
  - Group: Are there differences at time 2?
  - Group × Time: Are the lines parallel from time 2 to n?
- A joint test of Group and Group × Time is required to evaluate whether the patterns of change are the same over time

Use baseline as covariate
- Outcome becomes adjusted change scores (i.e., change over time adjusted for baseline)
- Similar interpretation issues as Approach 3

Relationship Among Approaches
Retain baseline as outcome?
- YES: Assume equal means at baseline?
  - NO: Approach 1
  - YES: Approach 2
- NO:
  - Analyze change from baseline: Approach 3
  - Include baseline as covariate: Approach 4

Which approach to use?
- Randomized or observational study?
- If randomized, it is reasonable to assume equal baseline values across groups: Approach 2
- If observational:
  - Approach 2 if it is reasonable to assume equal baseline values across groups
  - Approach 1 if baseline values differ across groups
- Approaches 3 and 4 are applicable where Approaches 1 and 2 are applicable, respectively
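To make the difference between Approach 3 (change from baseline) and Approach 4 (baseline as covariate) concrete, here is a minimal sketch for the simplest case of one baseline and one follow-up value per subject. The data and function names are hypothetical and not taken from the slides. The baseline-adjusted difference is computed with the pooled within-group slope, which gives the same group effect as a least-squares fit of follow-up on group and baseline.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical data: (baseline, follow-up) pairs per group.
control = [(10, 11), (12, 12), (14, 13)]
treated = [(12, 15), (14, 16), (16, 17)]

def change_score_difference(group_a, group_b):
    """Approach 3: difference in mean change from baseline (b minus a)."""
    change_a = mean([post - pre for pre, post in group_a])
    change_b = mean([post - pre for pre, post in group_b])
    return change_b - change_a

def baseline_adjusted_difference(group_a, group_b):
    """Approach 4: group difference in follow-up, adjusted for baseline.

    Matches the group coefficient from a least-squares fit of follow-up
    on group and baseline with a common baseline slope in both groups.
    """
    sxy = sxx = 0.0
    for group in (group_a, group_b):
        pre_mean = mean([pre for pre, _ in group])
        post_mean = mean([post for _, post in group])
        for pre, post in group:
            sxy += (pre - pre_mean) * (post - post_mean)
            sxx += (pre - pre_mean) ** 2
    slope = sxy / sxx  # pooled within-group slope on baseline
    diff_post = mean([p for _, p in group_b]) - mean([p for _, p in group_a])
    diff_pre = mean([b for b, _ in group_b]) - mean([b for b, _ in group_a])
    return diff_post - slope * diff_pre

approach3 = change_score_difference(control, treated)
approach4 = baseline_adjusted_difference(control, treated)
print(approach3, approach4)
```

In this toy data the groups differ at baseline, so the two approaches give different answers. When baseline means are equal across groups, both target the same group effect.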

What is it? What does it matter? What do we do about it?

What are missing data and drop-out?
Missing Data
- Observations the researcher intended to collect but didn't
- Many different causes for missing data
- Not specific to longitudinal data, but common
Drop-out
- Subjects leave a study before the intended end
- Special class of missing data unique to longitudinal data

What does it matter?
- Potential for bias and incorrect inferences
  - Bias can be severe
- Loss of information/power
  - Reduced precision and efficiency of estimates relative to complete data
- Data are unbalanced over time
  - Problem for some analytical methods

- Six Cities Study of Air Pollution and Health
- Hypothetical Weight Loss Study
- Muscatine Coronary Risk Factor Study

Six Cities Study of Air Pollution and Health
- Objective: Characterize lung function growth in children
- Enrolled in 1st/2nd grade, followed until graduation
- Annual lung function tests
- Wide range (1-12) of observations per child
  - Late enrollment: moved into school district after 2nd grade
  - Drop-out: moved out of school district
- Consider reasons for moving out of district

Hypothetical Weight Loss Study
- Objective: Determine if a coached program is more effective than an on-line program
- Randomize subjects to each program
- Collect weight weekly for 3 months
- Types of missing values
  - Drop-out: missing all values after time t
  - Missing observation: missing one or more observations in the middle of the study
- What could cause the missing values?

Muscatine Coronary Risk Factor Study
- Objective: Examine development and persistence of coronary disease risk factors
- Children aged 5-15
- Measured height and weight biennially; classified children as obese or not
- Parental consent required for each measurement
- Less than 40% of children had complete data
- What factors contribute to missing values?
  - No consent form
  - Child absent from school on day of measurements

Missing Data Mechanisms
- 3 types, distinguished by the relationship between the probability of missingness and the actual values (observed or unobserved)
  - Missing Completely at Random (MCAR)
  - Missing at Random (MAR)
  - Not Missing at Random (NMAR, also written MNAR)
- The mechanisms rest on different assumptions, and the methods that adequately handle missing values differ among them

Missing Completely at Random
- Probability of missing response is unrelated to
  - The value of the response had it been obtained
  - The value of observed responses
- Examples:
  - Missed appointment due to car trouble
  - Variables measured on a subset of subjects by study design
- Missingness is simply a chance event unrelated to any of the data, observed or unobserved
- Observed data can be considered a random sample of the complete data

Missing at Random
- Probability of missing response depends on the set of observed responses but is unrelated to the specific missing value that would have been observed
- Examples:
  - Removal of subject from study, by design, once a pre-specified value is obtained
  - Higher-educated people don't report income
- Observed data can NOT be considered a random sample of the complete data

Not Missing at Random
- Probability of missing response is related to the specific values that would have been obtained
- Examples:
  - Value is below the detection limit
  - People with higher incomes don't report income
  - Subject skips appointment because of weight gain
- Missingness is non-ignorable

Revisit Examples
Weight Loss Study
- Moves out of area: MCAR
- Achieves goal weight: MAR or MNAR
- Not losing weight: MAR or MNAR
Air Pollution and Health Study
- Job relocation: MCAR
- Child developed respiratory problems: MAR
- Avoid developing respiratory problems: MNAR
Coronary Risk Factor Study
- Forgot to sign consent: MCAR
- Obese child feigns illness to avoid weighing: MNAR
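The three mechanisms can be illustrated with a small simulation. This is a sketch under assumed, made-up settings (a standard normal baseline `x`, an outcome `y` correlated with it, and arbitrary drop probabilities), not an analysis from the slides. It compares the mean of the observed outcomes with the mean of the full data under each mechanism.

```python
import random

rng = random.Random(2014)  # fixed seed so the sketch is reproducible
n = 20000

# Hypothetical data: x is an always-observed baseline, y is the outcome.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

def observed_mean(values, drop_prob):
    """Mean of the values that remain, given each value's drop probability."""
    kept = [v for v, p in zip(values, drop_prob) if rng.random() >= p]
    return sum(kept) / len(kept)

full_mean = sum(y) / n

# MCAR: the same drop probability for everyone.
mcar_mean = observed_mean(y, [0.35] * n)

# MAR: drop probability depends only on the observed baseline x.
mar_mean = observed_mean(y, [0.6 if xi > 0 else 0.1 for xi in x])

# MNAR: drop probability depends on the value of y itself.
mnar_mean = observed_mean(y, [0.6 if yi > 0 else 0.1 for yi in y])

print(full_mean, mcar_mean, mar_mean, mnar_mean)
```

Under MCAR the observed mean stays close to the full-data mean. Under MAR and MNAR the unadjusted observed mean is pulled downward, because high values are more likely to be lost. The MAR case can in principle be corrected using the observed `x`; the MNAR case cannot be corrected from the observed data alone.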

Approaches to Handling Missing Data
- Deletion Methods
  - Complete-case analysis (listwise deletion)
  - Available-data analysis (pairwise deletion)
- Single Imputation Methods
- Model-Based Methods
  - Multiple imputation
  - Maximum likelihood

Deletion Methods
Complete-Case Analysis
- Only analyze subjects with complete data
Available-Data Analysis
- Analyze all data that were observed
- Different analytical methods can handle partial data (e.g., random effect models)
- More efficient and more powerful than complete-case analysis because it uses more information

Deletion Methods: Advantages and Disadvantages
Advantages
- Simple; available-data analysis is the default for statistics programs
Disadvantages
- Reduced sample size
- Complete-case analysis discards data
- Biased estimates unless data are MCAR
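The difference between the two deletion methods can be seen on a toy data set. The sketch below uses hypothetical weights for four subjects at three visits, with `None` marking a missing value. It computes per-visit means first from complete cases only and then from all available values.

```python
# Hypothetical weights (kg) at three visits; None marks a missing value.
weights = {
    "s1": [80.0, 78.0, 76.0],
    "s2": [90.0, None, 85.0],
    "s3": [70.0, 69.0, None],
    "s4": [85.0, 83.0, 80.0],
}

def visit_means(rows):
    """Mean and count at each visit, using whatever values are observed."""
    means, counts = [], []
    for visit in zip(*rows):
        seen = [v for v in visit if v is not None]
        means.append(sum(seen) / len(seen))
        counts.append(len(seen))
    return means, counts

# Complete-case analysis: keep only subjects with no missing visits.
complete_ids = [sid for sid, row in weights.items() if None not in row]
cc_means, cc_counts = visit_means([weights[sid] for sid in complete_ids])

# Available-data analysis: use every observed value at each visit.
av_means, av_counts = visit_means(list(weights.values()))

print(complete_ids, cc_means, cc_counts)
print(av_means, av_counts)
```

Complete-case analysis keeps two of the four subjects, while available-data analysis uses four, three, and three values at the three visits. Neither protects against bias when the data are not MCAR.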

Single Imputation
- Substitute missing values with an imputed value
- Analyze the completed data using standard methods
- Many different approaches to single imputation

Single Imputation Methods
- Mean value imputation: substitute the mean value for the missing value
- Last value carried forward imputation: use the last value observed
- Regression imputation: replace the missing value with the value predicted from a regression derived from observed data
- K-nearest neighbor imputation: impute a value based on the k most similar subjects

Single Imputation Methods: Advantages and Disadvantages
Advantages
- Simple to implement and understand
- Maintains sample size
- Uses all available information
Disadvantages
- Can reduce variability in the data
- Can weaken correlations/covariances
- Reduces standard errors because it doesn't reflect the uncertainty about the predicted unknown values
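Two of the single imputation methods, and the loss of variability listed among the disadvantages, can be sketched in a few lines. The values and function names below are hypothetical. `mean_impute` fills gaps with the observed mean, and `locf` carries the last observed value forward within one subject's series of visits.

```python
import statistics

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def locf(values):
    """Carry the last observed value forward; leading gaps stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical scores across subjects, with two missing values.
scores = [4.0, 6.0, None, 8.0, None, 10.0]
observed = [v for v in scores if v is not None]
imputed = mean_impute(scores)

var_observed = statistics.variance(observed)  # sample variance, observed only
var_imputed = statistics.variance(imputed)    # sample variance after filling

# Hypothetical weekly weights for one subject.
visits = [82.0, 81.0, None, None, 79.0]
carried = locf(visits)

print(var_observed, var_imputed, carried)
```

The sample size returns to six after mean imputation, but the sample variance drops because the filled-in values sit exactly at the mean. This is the mechanism behind the understated standard errors.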

Maximum Likelihood
- Parameters are estimated by maximum likelihood using the available data
- Random effect models implement this approach
Advantages
- Uses all available information
- Unbiased estimates for MCAR and MAR data
Disadvantages
- Model must be correctly specified

Multiple Imputation
- Missing values are imputed from a model (e.g., regression model)
- Imputation is conducted multiple times, replacing each missing value with a set of plausible values
- Each imputed data set is analyzed
- Results from the analysis of each imputed data set are pooled into a single estimate

Multiple Imputation: Advantages and Disadvantages
Advantages
- Better reflects data variability
- Considers variability due to sampling and imputation
Disadvantages
- More time and computer intensive
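The pooling step can be sketched with Rubin's rules, a commonly used way to combine results across imputed data sets. The slides do not name a specific pooling rule, so this choice is an assumption. The function name and the input numbers are hypothetical. Each imputed data set contributes a point estimate and its squared standard error.

```python
def pool_estimates(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between  # adds imputation uncertainty
    return pooled, total

# Hypothetical results from m = 3 imputed data sets.
pooled_est, total_var = pool_estimates([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled_est, total_var)
```

The total variance is larger than the average within-imputation variance. The extra term is the between-imputation variability, which is how the pooled result reflects variability due to both sampling and imputation.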

What if I have MNAR missingness?
- Selection models
- Pattern mixture models
- Random effect models
- Shared parameter models

What to do: study design
- Carefully consider potential challenges to obtaining complete data
  - Duration of study, number of visits/surveys, travel distance, participant characteristics/motivations
- Provide appropriate compensation/incentives
- Plan to enhance/support/encourage completion
- If possible, collect information about why an observation is missing

What to do: data analysis
- Evaluate missingness in the data
  - How much data are missing?
  - Are there patterns to the missingness?
  - Are there differences between subjects with complete and incomplete data?
  - Are there differences in missingness among experimental groups? Within experimental groups?
- Consider and compare alternative approaches to addressing missing data
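A first pass at evaluating missingness can be as simple as the sketch below. It assumes a hypothetical layout in which each subject is a list of visit values and `None` marks a missing one. It reports the fraction missing per group and labels each subject as complete, drop-out (everything missing after some visit), or intermittent (a gap in the middle of the study).

```python
def classify(row):
    """Label one subject's pattern of missing visits."""
    if None not in row:
        return "complete"
    first_gap = row.index(None)
    if all(v is None for v in row[first_gap:]):
        return "drop-out"      # nothing observed after the first gap
    return "intermittent"      # at least one value observed after a gap

def fraction_missing(rows):
    """Share of all planned measurements that are missing."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

# Hypothetical weights by program; each inner list is one subject.
study = {
    "coached": [[82.0, 81.0, 80.0], [95.0, None, 93.0]],
    "online": [[88.0, None, None], [76.0, 75.0, None], [90.0, 91.0, 90.0]],
}

patterns = {g: [classify(r) for r in rows] for g, rows in study.items()}
missing_by_group = {g: fraction_missing(rows) for g, rows in study.items()}

print(patterns)
print(missing_by_group)
```

Differences between groups in the amount or pattern of missingness, as in this made-up example, are a signal to look more closely at why the observations are missing before choosing an analysis.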