APPLIED MISSING DATA ANALYSIS



Similar documents
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Analyzing Structural Equation Models With Missing Data

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Applied Missing Data Analysis

A Basic Introduction to Missing Data

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

Craig K. Enders Arizona State University Department of Psychology

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

Handling attrition and non-response in longitudinal data

A Review of Methods for Missing Data

ZHIYONG ZHANG AND LIJUAN WANG

IBM SPSS Missing Values 22

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Imputing Missing Data using SAS

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

Best Practices for Missing Data Management in Counseling Psychology

An introduction to modern missing data analyses

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Statistics Graduate Courses

Chapter 7 Factor Analysis SPSS

Analysis of Longitudinal Data with Missing Values.

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Data Cleaning and Missing Data Analysis

Missing Data. Katyn & Elena

Dealing with Missing Data

Statistical Machine Learning

Multiple Imputation for Missing Data: A Cautionary Tale

Bayesian networks - Time-series models - Apache Spark & Scala

Problem of Missing Data

Lecture 3: Linear methods for classification

Regression Modeling Strategies

Comparison of Estimation Methods for Complex Survey Data Analysis

A Brief Introduction to SPSS Factor Analysis

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Running head: SCHOOL COMPUTER USE AND ACADEMIC PERFORMANCE. Using the U.S. PISA results to investigate the relationship between

SAS Certificate Applied Statistics and SAS Programming

How To Understand The Theory Of Probability

Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Missing Data. Paul D. Allison INTRODUCTION

Multivariate Normal Distribution

IBM SPSS Missing Values 20

11. Time series and dynamic linear models

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Reject Inference in Credit Scoring. Jie-Men Mok

Linear regression methods for large n and streaming data

Parametric fractional imputation for missing data analysis

CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A MULTIVARIATE OUTLIER DETECTION METHOD

Missing Data Techniques for Structural Equation Modeling

Univariate and Multivariate Methods PEARSON. Addison Wesley

STA 4273H: Statistical Machine Learning

Stephen du Toit Mathilda du Toit Gerhard Mels Yan Cheng. LISREL for Windows: PRELIS User s Guide

Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization

Multivariate Statistical Inference and Applications

Analysis of Bayesian Dynamic Linear Models

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set

STATISTICA Formula Guide: Logistic Regression. Table of Contents

A Bayesian hierarchical surrogate outcome model for multiple sclerosis

Dealing with missing data: Key assumptions and methods for applied analysis

Probability and Statistics

Dimensionality Reduction: Principal Components Analysis

Centre for Central Banking Studies

BayesX - Software for Bayesian Inference in Structured Additive Regression

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

lavaan: an R package for structural equation modeling

Developing Risk Adjustment Techniques Using the System for Assessing Health Care Quality in the

Data Preparation and Statistical Displays

A distribution-based stochastic model of cohort life expectancy, with applications

Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach

Before LISREL: Preparing the Data using PRELIS

Latent Class Regression Part II

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Missing data: the hidden problem

A General Approach to Variance Estimation under Imputation for Missing Survey Data

Handling missing data in Stata a whirlwind tour

An Introduction to Latent Class Growth Analysis and Growth Mixture Modeling

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

Linear Threshold Units

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Transcription:

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London

Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview 2 1.3 Missing Data Patterns 2 1.4 A Conceptual Overview of Missing Data Theory 5 1.5 A More Formal Description of Missing Data Theory 9 1.6 Why Is the Missing Data Mechanism Important? 13 1.7 How Plausible Is the Missing at Random Mechanism? 14 1.8 An Inclusive Analysis Strategy 16 1.9 Testing the Missing Completely at Random Mechanism 17 1.10 Planned Missing Data Designs 21 1.11 The Three-Form Design 23 1.12 Planned Missing Data for Longitudinal Designs 28 1.13 Conducting Power Analyses for Planned Missing Data Designs 30 1.14 Data Analysis Example 32 1.15 Summary 35 1.16 Recommended Readings 36 2 Traditional Methods for Dealing with Missing Data 37 2.1 Chapter Overview 37 2.2 An Overview of Deletion Methods 39 2.3 Listwise Deletion 39 2.4 Pairwise Deletion 40 2.5 An Overview of Single Imputation Methods 42 2.6 Arithmetic Mean Imputation 42 2.7 Regression Imputation 44 2.8 Stochastic Regression Imputation 46 2.9 Hot-Deck Imputation 49 2.10 Similar Response Pattern Imputation 49 2.11 Averaging the Available Items 50 2.12 Last Observation Carried Forward 51 2.13 An Illustrative Computer Simulation Study 52

xü Contents 2.14 Summary 54 2.15 Recommended Readings 55 3 An Introduction to Maximum Likelihood Estimation 56 3.1 Chapter Overview 56 3.2 The Univariate Normal Distribution 56 3.3 The Sample Likelihood 59 3.4 The Log-Likelihood 60 3.5 Estimating Unknown Parameters 60 3.6 The Role of First Derivatives 63 3.7 Estimating Standard Errors 65 3.8 Maximum Likelihood Estimation with Multivariate Normal Data 69 3.9 A Bivariate Analysis Example 73 3.10 Iterative Optimization Algorithms 75 3.11 Significance Testing Using the Wald Statistic 77 3.12 The Likelihood Ratio Test Statistic 78 3.13 Should I Use the Wald Test or the Likelihood Ratio Statistic? 79 3.14 Data Analysis Example 1 80 3.15 Data Analysis Example 2 81 3.16 Summary 83 3.17 Recommended Readings 85 4 Maximum Likelihood Missing Data Handling 86 4.1 Chapter Overview 86 4.2 The Missing Data Log-Likelihood 88 4.3 How Do the Incomplete Data Records Improve Estimation? 92 4.4 An Illustrative Computer Simulation Study 95 4.5 Estimating Standard Errors with Missing Data 97 4.6 Observed versus Expected Information 98 4.7 A Bivariate Analysis Example 99 4.8 An Illustrative Computer Simulation Study 102 4.9 An Overview of the EM Algorithm 103 4.10 A Detailed Description of the EM Algorithm 105 4.11 A Bivariate Analysis Example 106 4.12 Extending EM to Multivariate Data 110 4.13 Maximum Likelihood Estimation Software Options 112 4.14 Data Analysis Example 1 113 4.15 Data Analysis Example 2 115 4.16 Data Analysis Example 3 118 4.17 Data Analysis Example 4 119 4.18 Data Analysis Example 5 122 4.19 Summary 125 4.20 Recommended Readings 126 5 Improving the Accuracy of Maximum Likelihood Analyses 127 5.1 Chapter Overview 127 5.2 The Rationale for an Inclusive Analysis Strategy 127 5.3 An Illustrative Computer Simulation Study 129 D. 5.4 Identifying a Set of Auxiliary Variables 131

Contents xiü 5.5 Incorporating Auxiliary Variables into a Maximum Likelihood Analysis 133 5.6 The Saturated Correlates Model 134 5.7 The Impact of Non-Normal Data 140 5.8 Robust Standard Errors 141 5.9 Bootstrap Standard Errors 145 5.10 The Rescaled Likelihood Ratio Test 148 5.11 Bootstrapping the Likelihood Ratio Statistic 150 5.12 Data Analysis Example 1 154 5.13 Data Analysis Example 2 155 5.14 Data Analysis Example 3 157 5.15 Summary 161 5.16 Recommended Readings 163 6 * An Introduction to Bayesian Estimation 164 6.1 Chapter Overview 164 6.2 What Makes Bayesian Statistics Different? 165 6.3 A Conceptual Overview of Bayesian Estimation 165 6.4 Bayes' Theorem 170 6.5 An Analysis Example 171 6.6 How Does Bayesian Estimation Apply to Multiple Imputation? 175 6.7 The Posterior Distribution of the Mean 176 6.8 The Posterior Distribution of the Variance 179 6.9 The Posterior Distribution of a Covariance Matrix 183 6.10 Summary 185 6.11 Recommended Readings 186 7 The Imputation Phase of Multiple Imputation 1 87 7.1 Chapter Overview 187 7.2 A Conceptual Description of the Imputation Phase 190 7.3 A Bayesian Description of the Imputation Phase 191 7.4 A Bivariate Analysis Example 194 7.5 Data Augmentation with Multivariate Data 199 7.6 Selecting Variables for Imputation 201 7.7 The Meaning of Convergence 202 7.8 Convergence Diagnostics 203 7.9 Time-Series Plots 204 7.10 Autocorrelation Function Plots 207 7.11 Assessing Convergence from Alternate Starting Values 209 7.12 Convergence Problems 210 7.13 Generating the Final Set of Imputations 211 7.14 How Many Data Sets Are Needed? 212 7.15 Summary 214 7.16 Recommended Readings 216 8 The Analysis and Pooling Phases of Multiple Imputation 217 8.1 Chapter Overview 217 8.2 The Analysis Phase 218 8.3 Combining Parameter Estimates in the Pooling Phase 219 8.4 Transforming Parameter Estimates Pnor to Combining 220

xiv Contents 8.5 Pooling Standard Errors 221 8.6 The Fraction of Missing Information and the Relative Increase in Variance 224 8.7 When Is Multiple Imputation Comparable to Maximum Likelihood? 227 8.8 An Illustrative Computer Simulation Study 229 8.9 Significance Testing Using the í Statistic 230 8.10 An Overview of Multiparameter Significance Tests 233 8.11 Testing Multiple Parameters Using the Dj Statistic 233 8.12 Testing Multiple Parameters by Combining Wald Tests 239 8.13 Testing Multiple Parameters by Combining Likelihood Ratio Statistics 240 8.14 Data Analysis Example 1 242 8.15 Data Analysis Example 2 245 8.16 Data Analysis Example 3 247 8.17 Summary 252 8.18 Recommended Readings 252 9 Practical Issues in Multiple Imputation 254 9.1 Chapter Overview 254 9.2 Dealing with Convergence Problems 254 9.3 Dealing with Non-Normal Data 259 9.4 To Round or Not to Round? 261 9.5 Preserving Interaction Effects 265 9.6 Imputing Multiple-Item Questionnaires 269 9.7 Alternate Imputation Algorithms 272 9.8 Multiple-Imputation Software Options 278 9.9 Data Analysis Example 1 279 9.10 Data Analysis Example 2 281 9.11 Summary 283 9.12 Recommended Readings 286 10 Models for Missing Not at Random Data 287 10.1 Chapter Overview 287 10.2 An Ad Hoc Approach to Dealing with MNAR Data 289 10.3 The Theoretical Rationale for MNAR Models 290 10.4 The Classic Selection Model 291 10.5 Estimating the Selection Model 295 10.6 Limitations of the Selection Model 296 10.7 An Illustrative Analysis 297 10.8 The Pattern Mixture Model 298 10.9 Limitations of the Pattern Mixture Model 300 10.10 An Overview of the Longitudinal Growth Model 301 10.11 A Longitudinal Selection Model 303 10.12 Random Coefficient Selection Models 305 10.13 Pattern Mixture Models for Longitudinal Analyses 306 10.14 Identification Strategies for Longitudinal Pattern Mixture Models 307 10.15 Delta Method Standard Errors 309 10.16 Overview of the Data Analysis Examples 312 10.17 Data Analysis Example 1 314 10.18 Data Analysis Example 2 315 10.19 Data Analysis Example 3 317 10.20 Data Analysis Example 4 321 10.21 Summary 326 10.22 Recommended Readings 328

Contents xv 11«Wrapping Things Up: Some Final Practical Considerations 11.1 Chapter Overview 329 11.2 Maximum Likelihood Software Options 329 11.3 Multiple-Imputation Software Options 333 11.4 Choosing between Maximum Likelihood and Multiple Imputation 11.5 Reporting the Results from a Missing Data Analysis 340 11.6 Final Thoughts 343 11.7 Recommended Readings 344 336 329 References Author Index Subject Index About the Author 347 359 365 377 The companion website (yvww.appliedmissingdata.com) includes data files and syntax for the examples in the book, as well as up-to-date information on software.