Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out Sandra Taylor, Ph.D. IDDRC BBRD Core 23 April 2014
Objectives. Baseline Adjustment: Introduce approaches; guidance on when to use different approaches. Missing Data/Drop-out: Raise awareness regarding issues/challenges caused by missing data; importance for study design and data analysis; basic understanding of approaches to handling missing data.
In longitudinal studies, subjects typically have a baseline measurement. Interest is commonly on differences in change over time between groups: Does the degree of change differ between groups? Differences in starting values (i.e., baseline) are important to consider when trying to assess change over time. [Figure: plot with Time on the x-axis]
Four options for baseline adjustment 1. Retain baseline value as outcome with no assumptions about group differences at baseline 2. Retain baseline value as outcome and assume group means are equal at baseline 3. Subtract baseline from post baseline responses and analyze differences from baseline 4. Include baseline value as a covariate.
Retain baseline as outcome; No assumptions at baseline. Allow intercepts (baselines) to differ between groups. [Figure: Group 1 and Group 2 plotted over Time]
Retain baseline as outcome; Assume equal at baseline. Assume same intercepts (baselines) in both groups. [Figure: Group 1 and Group 2 plotted over Time]
Subtract baseline from post-baseline responses. Define new variable (response minus baseline) as response variable. Model as before. Interpretation of results a bit different. Group: Are there differences at time 2? Group × Time: Are the lines parallel from time 2 to n? Joint test of Group and Group × Time required to evaluate whether the patterns of change are the same over time.
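As a rough illustration of the "define new variable" step, the sketch below builds change-from-baseline scores from long-format records. The record layout, subject IDs, and values are hypothetical and are not taken from the slides.

```python
# Hypothetical long-format records: (subject, group, time, response).
records = [
    ("s1", "A", 1, 10.0), ("s1", "A", 2, 12.0), ("s1", "A", 3, 15.0),
    ("s2", "B", 1, 11.0), ("s2", "B", 2, 11.5), ("s2", "B", 3, 12.0),
]

def change_from_baseline(rows, baseline_time=1):
    """Return post-baseline rows with the response replaced by (response - baseline)."""
    # Look up each subject's baseline response.
    baseline = {subj: y for subj, _, t, y in rows if t == baseline_time}
    # Keep only post-baseline rows for subjects that have a baseline value.
    return [
        (subj, grp, t, y - baseline[subj])
        for subj, grp, t, y in rows
        if t != baseline_time and subj in baseline
    ]

changes = change_from_baseline(records)
for row in changes:
    print(row)
```

The resulting change scores start at time 2, which is why the Group term in this approach refers to differences at time 2 rather than at baseline.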
Use Baseline as covariate Outcome becomes adjusted change scores (i.e., change over time adjusted for baseline) Similar interpretation issues as Approach 3
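A minimal sketch of baseline-as-covariate adjustment, simplified to a single post-baseline time point and ordinary least squares (a full longitudinal analysis would more likely use a mixed model). The data are hypothetical and noise-free, generated as post = 1 + 2*group + 0.5*baseline, so the fit can be checked by eye; the helper names `solve` and `ols` are illustrative only.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: group indicator (0/1), baseline, and one post-baseline response.
group = [0, 0, 0, 1, 1, 1]
baseline = [10.0, 12.0, 14.0, 11.0, 13.0, 15.0]
post = [1 + 2 * g + 0.5 * b for g, b in zip(group, baseline)]

# Design matrix columns: intercept, group, baseline covariate.
X = [[1.0, float(g), b] for g, b in zip(group, baseline)]
intercept, group_effect, baseline_slope = ols(X, post)
print(round(group_effect, 6), round(baseline_slope, 6))
```

Here `group_effect` is the between-group difference in the post-baseline response after adjusting for baseline.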
Relationship Among Approaches. Retain baseline as outcome? YES: Assume equal means at baseline? NO: Approach 1; YES: Approach 2. NO: Analyze change from baseline (Approach 3) or include baseline as covariate (Approach 4).
Which approach to use? Randomized or Observational Study? If randomized, reasonable to assume equal baseline values across groups: Approach 2. If observational: Approach 2 if reasonable to assume equal baseline values across groups; Approach 1 if baseline values differ across groups. Approaches 3 and 4 applicable where Approaches 1 and 2 are applicable, respectively.
What is it? Why does it matter? What do we do about it?
What are missing data and drop-out? Missing Data: Observations the researcher was to collect but didn't. Many different causes for missing data. Not specific to longitudinal data but common. Drop-out: Subjects leave a study before the intended end. Special class of missing data unique to longitudinal data.
Why does it matter? Potential for bias and incorrect inferences: Bias can be severe. Loss of information/power: Reduced precision and efficiency of estimates relative to complete data. Data are unbalanced over time: Problem for some analytical methods.
Six Cities Study of Air Pollution and Health Hypothetical Weight Loss Study Muscatine Coronary Risk Factor Study
Six Cities Study of Air Pollution and Health. Objective: Characterize lung function growth in children. Enrolled in 1st/2nd grade, followed until graduation. Annual lung function tests. Wide range (1-12) of observations per child. Late enrollment: moved into school district after 2nd grade. Drop-out: moved out of school district. Consider reasons for moving out of district.
Hypothetical Weight Loss Study Objective: Determine if coached program is more effective than on-line program Randomize subjects to each program Collect weight weekly for 3 months Types of missing values Drop-out: missing all values after time t Missing observation: missing one or more observations in the middle of the study What could cause the missing values?
Muscatine Coronary Risk Factor Study. Objective: Examine development and persistence of coronary disease risk factors. Children aged 5-15. Measured height and weight biennially; classified children as obese or not. Parental consent required for each measurement. Less than 40% of children with complete data. What factors contribute to missing values? No consent form; child absent from school on day of measurements.
Missing Data Mechanisms. 3 types distinguished based on relationship between the probability of missingness and the actual values (observed or unobserved): Missing Completely at Random (MCAR); Missing at Random (MAR); Not Missing at Random (NMAR, also written MNAR). Mechanisms have different assumptions, and methods for adequately handling missing values differ among the mechanisms.
Missing Completely at Random. Probability of missing response is unrelated to: the value of the response had it been obtained; the value of observed responses. Examples: Missed appointment due to car trouble; variables measured on a subset of subjects by study design. Missingness is simply a chance event unrelated to any of the data, observed or unobserved. Observed data can be considered a random sample of the complete data.
Missing at Random. Probability of missing response depends on the set of observed responses but is unrelated to the specific missing value that would have been observed. Examples: Removal of subject from study once pre-specified value obtained, by study design; higher educated people don't report income. Observed data can NOT be considered a random sample of the complete data.
Not Missing at Random. Probability of missing response is related to the specific values that would have been obtained. Examples: Value is below the detection limit; people with higher incomes don't report income; subject skips appointment because of weight gain. Missingness is non-ignorable.
Revisit Examples. Weight Loss Study: Moves out of area (MCAR); achieves goal weight (MAR or MNAR); not losing weight (MAR or MNAR). Air Pollution and Health Study: Job relocation (MCAR); child developed respiratory problems (MAR); avoid developing respiratory problems (MNAR). Coronary Risk Factor Study: Forgot to sign consent (MCAR); obese child feigns illness to avoid weighing (MNAR).
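The three mechanisms can be illustrated with a small simulation. This is only a sketch under assumed values: the weight distributions, drop probabilities, thresholds, and seed are invented for illustration and do not come from any of the studies above. It compares the mean of the observed follow-up weights with the full-data mean, and under MAR also applies a simple regression adjustment on the always-observed baseline.

```python
import random
from statistics import mean

rng = random.Random(2014)  # fixed seed so the sketch is reproducible
n = 20000

# Hypothetical weights (kg): baseline is always observed, follow-up may be missing.
baseline = [rng.gauss(90, 10) for _ in range(n)]
followup = [b - 3 + rng.gauss(0, 5) for b in baseline]

def observed(drop_prob):
    """Indices of subjects whose follow-up is observed, given a per-subject drop probability."""
    return [i for i in range(n) if rng.random() >= drop_prob(i)]

# MCAR: missingness ignores all data.
mcar = observed(lambda i: 0.3)
# MAR: missingness depends only on the observed baseline.
mar = observed(lambda i: 0.6 if baseline[i] > 95 else 0.1)
# MNAR: missingness depends on the follow-up value that would have been recorded.
mnar = observed(lambda i: 0.6 if followup[i] > 92 else 0.1)

full_mean = mean(followup)
cc_mean = {
    name: mean([followup[i] for i in idx])
    for name, idx in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]
}

def regression_adjusted_mean(idx):
    """Fit follow-up on baseline among observed subjects, then predict at the overall baseline mean."""
    xs = [baseline[i] for i in idx]
    ys = [followup[i] for i in idx]
    xbar, ybar = mean(xs), mean(ys)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
    return ybar + slope * (mean(baseline) - xbar)

print("full-data mean:", round(full_mean, 2))
for name, value in cc_mean.items():
    print(name, "observed mean:", round(value, 2))
print("MAR, regression-adjusted:", round(regression_adjusted_mean(mar), 2))
```

In this setup the observed mean should stay close to the full-data mean under MCAR and be pulled downward under MAR and MNAR. The adjustment recovers the mean under MAR because missingness depends only on the observed baseline; no comparable fix using observed data alone is available under MNAR.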
Approaches to Handling Missing Data. Deletion Methods: Complete-case analysis (listwise deletion); available-data analysis (pairwise deletion). Single Imputation Methods. Model-Based Methods: Multiple imputation; maximum likelihood.
Deletion Methods. Complete-Case Analysis: Only analyze subjects with complete data. Available-Data Analysis: Analyze all data that were observed. Different analytical methods can handle partial data (e.g., random effect models). More efficient and more powerful than complete-case analysis because it uses more information.
Deletion Methods Advantages and Disadvantages. Advantages: Simple; available-data analysis is default for statistics programs. Disadvantages: Reduced sample size; complete-case analysis discards data; biased estimates unless data are MCAR.
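To make the difference in retained information concrete, the sketch below counts what each deletion method keeps. The subjects and weights are hypothetical; `None` marks a missing visit.

```python
# Hypothetical weekly weights per subject; None marks a missing observation.
data = {
    "s1": [82.0, 81.0, 80.5, 79.0],
    "s2": [95.0, 94.0, None, None],   # drop-out after visit 2
    "s3": [88.0, None, 86.5, 86.0],   # intermittent missing value
}

# Complete-case analysis: keep only subjects with no missing values.
complete_cases = {s: ys for s, ys in data.items() if None not in ys}

# Available-data analysis: keep every observed (subject, visit, value) triple.
available = [
    (s, visit, y)
    for s, ys in data.items()
    for visit, y in enumerate(ys, start=1)
    if y is not None
]

n_complete_obs = sum(len(ys) for ys in complete_cases.values())
print("complete-case:", len(complete_cases), "subject(s),", n_complete_obs, "observations")
print("available-data:", len(available), "observations")
```

In this toy data set complete-case analysis keeps 4 of the 9 observed values, while available-data analysis keeps all 9.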
Single Imputation Substitute missing values with an imputed value Analyze complete data using standard methods Many different approaches to single imputation
Single Imputation Methods. Mean value imputation: Substitute mean value for missing value. Last value carried forward imputation: Use last value observed. Regression imputation: Replace missing value with value predicted from regression derived from observed data. K-nearest neighbor imputation: Impute value based on k most similar subjects.
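The two simplest methods above can be sketched in a few lines. This is an illustrative sketch, not a recommended implementation: the function names and example values are hypothetical, mean imputation is applied to one visit's values across subjects, and last value carried forward is applied to one subject's series over time.

```python
from statistics import mean

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def locf(values):
    """Carry the last observed value forward; leading missing values stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

# One visit across four subjects (hypothetical weights).
visit_values = [80.0, None, 90.0, 85.0]
# One subject over four visits, with drop-out after visit 2.
subject_series = [82.0, 81.0, None, None]

print(mean_impute(visit_values))
print(locf(subject_series))
```

Carrying the last value forward assumes no change after drop-out, which may be implausible in a study designed to measure change over time.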
Single Imputation Methods Advantages and Disadvantages. Advantages: Simple to implement and understand; maintains sample size; uses all available information. Disadvantages: Can reduce variability in the data; can weaken correlations/covariances; reduces standard errors because it doesn't reflect the uncertainty about the predicted unknown values.
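The loss of variability can be seen directly with mean imputation. In the sketch below (hypothetical values), the imputed points sit exactly at the mean, so the sum of squared deviations is unchanged while the sample size grows, and the sample variance shrinks.

```python
from statistics import mean, variance

# Hypothetical responses at one visit; None marks a missing value.
values = [80.0, None, 90.0, 85.0, None, 75.0]

observed = [v for v in values if v is not None]
fill = mean(observed)
imputed = [fill if v is None else v for v in values]

var_observed = variance(observed)  # sample variance of the 4 observed values
var_imputed = variance(imputed)    # sample variance after mean imputation (n = 6)

print(round(var_observed, 2), round(var_imputed, 2))
```

A standard error computed from the imputed data would be too small on both counts: the variance is understated and the sample size is overstated.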
Maximum Likelihood Parameters estimated based on maximum likelihood using available data Random effect models implement this approach Advantages Uses all available information Unbiased estimates for MCAR and MAR data Disadvantages Model must be correctly specified
Multiple Imputation. Missing values are imputed from a model (e.g., regression model). Imputation conducted multiple times, replacing each missing value with a set of plausible values. Each imputed data set is analyzed. Results from analysis of each imputed data set are pooled into a single estimate.
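The pooling step is commonly done with Rubin's rules; the slides do not name a pooling method, so treating it as Rubin's rules is an assumption here. The sketch takes the estimate and its squared standard error from each imputed data set (hypothetical numbers below) and returns the pooled estimate and total variance.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical results from m = 3 imputed data sets.
estimates = [2.0, 2.5, 3.0]
variances = [0.4, 0.5, 0.6]

pooled, total_var = pool_rubin(estimates, variances)
print(round(pooled, 3), round(total_var, 4), round(total_var ** 0.5, 4))
```

The between-imputation term is what carries the uncertainty about the missing values into the final standard error, which single imputation leaves out.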
Multiple Imputation Advantages and Disadvantages Advantages Better reflects data variability Considers variability due to sampling and imputation Disadvantages More time and computer intensive
What if I have MNAR missingness? Selection models Pattern mixture models Random effect models Shared parameter models
What to do: study design. Carefully consider potential challenges to obtaining complete data: Duration of study, number of visits/surveys, travel distance, participant characteristics/motivations. Provide appropriate compensation/incentives. Plan to enhance/support/encourage completion. If possible, collect information about why an observation is missing.
What to do: data analysis. Evaluate missingness in data: How much data is missing? Are there patterns to missingness? Are there differences between subjects with complete and incomplete data? Are there differences in missingness among experimental groups? Within experimental groups? Consider and compare alternative approaches to addressing missing data.
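A first pass at evaluating missingness might look like the sketch below, which tabulates the proportion missing by visit and by group and labels each subject's pattern. The records, group labels, and function names are hypothetical, loosely modeled on the weight loss example.

```python
# Hypothetical records: (subject, group, responses by visit); None marks a missing value.
records = [
    ("s1", "coached", [82.0, 81.0, 80.5, 79.0]),
    ("s2", "coached", [95.0, 94.0, None, None]),
    ("s3", "online",  [88.0, None, 86.5, 86.0]),
    ("s4", "online",  [91.0, 90.0, None, None]),
]

def missing_by_visit(rows):
    """Proportion of subjects with a missing response at each visit."""
    n_visits = len(rows[0][2])
    return [sum(r[2][t] is None for r in rows) / len(rows) for t in range(n_visits)]

def missing_by_group(rows):
    """Proportion of all planned observations that are missing, per group."""
    totals = {}
    for _, grp, ys in rows:
        miss, tot = totals.get(grp, (0, 0))
        totals[grp] = (miss + sum(y is None for y in ys), tot + len(ys))
    return {g: miss / tot for g, (miss, tot) in totals.items()}

def pattern(ys):
    """Label a series 'complete', 'drop-out' (all missing after some visit), or 'intermittent'."""
    flags = [y is None for y in ys]
    if not any(flags):
        return "complete"
    first_missing = flags.index(True)
    return "drop-out" if all(flags[first_missing:]) else "intermittent"

by_visit = missing_by_visit(records)
by_group = missing_by_group(records)
patterns = {subj: pattern(ys) for subj, _, ys in records}
print(by_visit, by_group, patterns)
```

Summaries like these describe how much is missing and where, but they cannot by themselves distinguish MAR from MNAR, since that depends on the unobserved values.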