HANDLING MISSING DATA IN CLINICAL TRIALS: AN OVERVIEW




Drug Information Journal, Vol. 34, pp. 525-533, 2000. Copyright 2000 Drug Information Association Inc.

WILLIAM R. MYERS, PHD
Senior Statistician, Department of Biometrics and Statistical Sciences, Procter and Gamble Pharmaceuticals, Cincinnati, Ohio

Presented at the DIA 35th Annual Meeting, June 27-July 1, 1999, Baltimore, Maryland. Reprint address: William R. Myers, PhD, Department of Biometrics and Statistical Sciences, Procter and Gamble Pharmaceuticals, 8700 Mason Montgomery Road, P.O. Box 47, Mason, OH 45040-9462.

A major problem in the analysis of clinical trials is missing data caused by patients dropping out of the study before completion. This problem can result in biased treatment comparisons and can also reduce the overall statistical power of the study. This paper discusses basic issues surrounding missing data as well as potential pitfalls. The topic of missing data is often not a major concern until it is time for data collection and data analysis. This paper presents design considerations that can help mitigate patient dropout from a clinical study. In addition, the concept of the missing-data mechanism is discussed. Five general strategies for handling missing data are presented: 1. complete-case analysis, 2. weighting methods, 3. imputation methods, 4. analyzing data as incomplete, and 5. other methods. Within each strategy, several methods are presented along with their advantages and disadvantages. Also briefly discussed is how the International Conference on Harmonization (ICH) addresses the issue of missing data. Finally, several of the methods illustrated in the paper are compared using a simulated data set.

Key Words: Clinical trials; Missing data; Dropouts; Imputation methods; Missing-data mechanism

INTRODUCTION

A PRIMARY CONCERN when conducting a clinical trial is that patients will drop out (or withdraw) before study completion. The reason for withdrawal may be study-related (eg, adverse event, death, unpleasant study procedures, lack of improvement) or unrelated to the study (eg, moving away, unrelated disease). This problem is especially prevalent in clinical trials where a slow-acting treatment, or a drug that may be poorly tolerated, is being investigated. Dropouts in clinical trials can produce biased treatment comparisons and reduce the overall statistical power. This paper focuses on the case where missing data occur as a result of patients dropping out of the study. More specifically, it focuses on the case in which a patient's missing response at assessment time t implies that the response will be missing at all subsequent times. This is termed a monotone pattern of unit-level missing data (1). An example where missing data deviate from this pattern arises in health-related quality-of-life research, where a patient may fail to answer an item (or question) within a questionnaire but does not necessarily drop out of the clinical trial.
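The distinction between monotone dropout and isolated item nonresponse can be made concrete with a small sketch. The following Python snippet is illustrative only and is not part of the original paper; the patient labels, visit names, and values are invented.

```python
import numpy as np
import pandas as pd

# Illustrative longitudinal responses for three patients at four visits.
# NaN marks a missing assessment.
wide = pd.DataFrame(
    {
        "visit_1": [10.2, 9.8, 11.0],
        "visit_2": [11.5, np.nan, 12.1],
        "visit_3": [12.0, np.nan, np.nan],
        "visit_4": [12.4, np.nan, 13.0],
    },
    index=["patient_A", "patient_B", "patient_C"],
)

def is_monotone(row: pd.Series) -> bool:
    """True if, once a value is missing, every later value is also missing."""
    missing = row.isna().to_numpy()
    first_missing = missing.argmax() if missing.any() else len(missing)
    return missing[first_missing:].all()

print(wide.apply(is_monotone, axis=1))
# patient_B drops out after visit 1 (monotone dropout); patient_C skips only
# visit 3 (item nonresponse, as in the quality-of-life example), which is not monotone.
```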

There are numerous issues one must consider when confronted with a data set in which patients have dropped out of the clinical study. First, compliant patients often have a better response to treatment than noncompliant patients, even in the placebo group. Therefore, the fully compliant subgroup is not a random subsample of the original sample. In addition, the pattern of selection might differ between the placebo group and the active treatment group. It can be problematic if the rate of, time to, and reason for withdrawal differ widely among treatment groups. Also, the last observed response for an early dropout often does not reflect the potential benefit of a drug with a slow onset of action. Furthermore, there are situations where a patient may drop out of a clinical study because of early recovery and may then suffer a relapse. Many of these situations will result in biased treatment comparisons. Unstated reasons for dropping out of a clinical trial may be associated with the last observed response or other study-related reasons. This is a primary concern for those methods that incorporate the reason for dropout into the statistical analysis. It is imperative to have accurate documentation of the cause of dropout.

The method for handling dropouts in clinical studies will often depend on the objective of the study. On the one hand, the goal may be more explanatory in nature, as in the case of a Phase I or early Phase II study, where the true pharmacological properties of a drug are being investigated. On the other hand, the objective may be more pragmatic, as when a pivotal Phase III study is providing an overall evaluation of treatment policy in clinical practice.

CLINICAL DESIGN CONSIDERATIONS

Most of the literature on handling dropouts (or missing data) in clinical trials involves the statistical analysis. Those involved in conducting clinical trials, however, should be cognizant of the issue at the design stage. There are potential ways to minimize patient withdrawal. One obvious method is through patient and investigator retention programs, including education and information sharing regarding the research program. There are other possible options, which may or may not be practical depending on the particular scenario of the clinical study. One is to carefully consider eligibility criteria in order to exclude patients with characteristics that might prevent them from completing the study. This could obviously have an impact on product labeling in the case of a pivotal Phase III trial being considered for FDA approval; restrictive eligibility criteria may be more appropriate in an explanatory trial designed to determine whether a drug has potential therapeutic effect. Another design consideration is a nonrandomized run-in period in which all patients receive active treatment; those who are free of side effects are then randomized to either active treatment or placebo. This can be done in order to account for early toxicity of a drug. It will most likely produce a selected subset of the original patient sample and can affect the generalizability of the results to the intended patient population. In many cases this will not be a practical option.

There are additional ways to mitigate the impact of patients dropping out of a clinical study. A very important consideration at the design stage is to collect data on all baseline variables that could potentially affect the likelihood of a patient dropping out. How collecting these variables helps in assessing the missing-data mechanism is discussed later. These variables may be incorporated into the statistical analysis and can also help identify the types of patients who cannot tolerate a particular treatment.

THE MISSING-DATA MECHANISM

A concept that is often discussed when missing data occur is the missing-data mechanism. Little (2) has also used the term dropout mechanism when it relates to patients dropping out of a clinical study prematurely; these two terms are used synonymously throughout the paper. Little and Rubin (3) classify the missing-data mechanism into three basic categories. They define missing completely at random (MCAR) as the process in which the probability of dropout is independent of both observed measurements (eg, baseline covariates, observed responses) and unobserved measurements (those that would have been observed had the patient stayed in the study). Under MCAR the observed responses form a random subsample of the sampled responses. When data are MCAR there is no impact on bias, and therefore most standard approaches to analysis are valid; a loss of statistical power, however, can still occur. The MCAR assumption can be assessed by comparing the distribution of observed variables between dropouts and nondropouts (4). If no significant differences are found with respect to these variables, then there is no apparent evidence that the data from the clinical trial are representative only of completers. The MCAR assumption is often not plausible in clinical trials (1).

The second classification, which is less restrictive than MCAR, is missing at random (MAR). Under MAR, the probability of dropout depends on the observed data but not on the unobserved data. When the response is MAR the observed responses are a random subsample of the sampled values within each subclass defined by the observed data (eg, age). In this case the missing-data mechanism can be considered ignorable (5,6). Laird (6) points out that ignorable missingness is plausible in longitudinal studies when patient withdrawal is related to a previous response. In addition, Murray and Findlay (7) point out that if a protocol removes patients from a study because they reach a threshold for the response (eg, diastolic blood pressure exceeding 110 mmHg), then the data are MAR; the rationale for the MAR assumption here is that the lack of response can be deduced from the recorded value. Little (2) also discusses a special case of MAR dropout that is not discussed extensively in the literature, in which the dropout mechanism depends only on the covariates X; this is classified as covariate-dependent dropout. For this missing-data mechanism Diggle and Kenward (8) used the term completely random dropout. Murray and Findlay (7) state that an observation is MAR if the fact that it is missing is not in itself informative. Lavori et al. (1) point out that the MAR assumption is inherently untestable; therefore, one can never be completely certain that conditioning on observed variables achieves ignorable dropout.

If the dropout mechanism is neither MCAR nor MAR then it is nonignorable. In this case the dropout mechanism needs to be incorporated into the analysis; available methods for nonignorable dropout are discussed briefly in the following section. Diggle (9) proposed a method for testing for random dropouts in repeated measures data; Diggle uses the term random dropout in a more restrictive sense than Rubin's missing at random. In addition, Park and Davis (10) proposed a method for testing the missing-data mechanism for repeated categorical data.
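The three mechanisms, and the simple MCAR check described above (comparing observed variables between dropouts and nondropouts), can be illustrated with a small simulation. This Python sketch is not from the paper; the sample size, covariates, and dropout models are assumptions chosen only to show the idea.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500

# Illustrative trial data: a baseline covariate, an intermediate response,
# and a final response that would be observed if nobody dropped out.
baseline = rng.normal(50, 10, n)
week_4 = baseline + rng.normal(5, 8, n)
final = week_4 + rng.normal(5, 8, n)

# Dropout probabilities under the three mechanisms (all numbers are assumptions):
p_mcar = np.full(n, 0.25)                                   # independent of all data
p_mar = 1 / (1 + np.exp(-(0.1 * (baseline - 50) - 1.0)))     # depends only on observed baseline
p_mnar = 1 / (1 + np.exp(-(-0.1 * (final - 60) - 1.0)))      # depends on the unobserved final value

for label, p in [("MCAR", p_mcar), ("MAR", p_mar), ("MNAR", p_mnar)]:
    dropped = rng.random(n) < p
    # The simple diagnostic described above: compare an observed variable
    # between dropouts and nondropouts. It can contradict MCAR, but it can
    # never confirm that conditioning on observed data makes dropout ignorable.
    t_stat, p_value = stats.ttest_ind(baseline[dropped], baseline[~dropped])
    print(f"{label}: dropout rate {dropped.mean():.2f}, "
          f"baseline comparison p-value {p_value:.3f}")
```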

GENERAL STRATEGIES FOR STATISTICAL ANALYSIS

Much of the literature involving missing data (or dropouts) in clinical trials pertains to the various methods developed to handle the problem. This paper divides those methods into the following five basic classifications:

1. Complete-case analysis,
2. Weighting methods,
3. Imputation methods,
4. Analyzing data as incomplete, and
5. Other methods.

The category defined as other methods includes those methods that do not logically fit into the other four classifications.

Complete-case methods use only those patients with complete data. For example, in a longitudinal study a complete-case analysis will use only those patients who have observed responses at each scheduled time point. Another example is a single-endpoint study analyzed by regression; only those patients who have the observed endpoint and observed values for all relevant covariates enter the analysis.

An obvious advantage of this type of analysis is its ease of implementation. In addition, it provides valid results in the case of MCAR. There are numerous disadvantages, however, to excluding patients with incomplete data. First, the complete-case method provides inefficient estimates, that is, a loss of statistical power. If the dropout mechanism is not MCAR, the analysis can produce biased treatment comparisons. In the case of a clinical trial with longitudinal measurements, it is typically not good practice to throw out data. Perhaps the most important concern with this type of analysis is that it does not follow the intention-to-treat paradigm. The following simple example demonstrates the limitations of the complete-case analysis. Suppose Treatment X is modestly effective for patients regardless of the baseline severity of their condition, while Treatment Y provides a benefit for the less severe patients but no improvement for the more severely ill patients. If the more severely ill patients tend to drop out before completion because of lack of efficacy, then the complete-case analysis may unduly favor Treatment Y.

A second strategy for handling missing data is weighting methods. Some may actually consider this a form of imputation. The general idea is to construct weights for the complete cases in order to reduce or remove bias. Little and Schenker (11) discuss the basic concept of weighting adjustments in the sample survey setting. Heyting et al. (12) describe the heuristic appeal of this method as follows. Each patient belongs to a subgroup of the patient population in which all patients have a similar baseline and response profile. A proportion within each subgroup are destined to complete the clinical trial, while the remainder are destined to drop out early. Completer patients with a very low probability of completing can certainly have an overly strong influence on the results. Heyting et al. (12) provide a particular weighting method for the case where evaluation of the mean treatment differences at the end of the study is of primary interest; their primary objective was explanatory in nature. Robins et al. (13) introduced a weighting method that allows generalized estimating equation (GEE) analyses to be correct under the MAR assumption.

A third classification of statistical analyses to handle missing data is imputation methods. Imputation is any method whereby missing values in the data set are filled in with plausible estimates; the choice of plausible estimates is what differentiates the various imputation methods. The objective of any imputation method is to produce a complete data set that can then be analyzed using standard statistical methods.

The last observation carried forward (LOCF) method is a commonly used imputation procedure. This method is implemented when longitudinal measurements are observed for each patient. LOCF takes the last available response and substitutes that value for all subsequent missing values. LOCF can be problematic if early dropouts occur and the response variable is expected to change over time. It can provide biased treatment comparisons if there are different rates of dropout or different times to dropout between the treatment groups. For example, LOCF can provide conservative results with respect to active treatment if placebo patients drop out early because of lack of efficacy; in this case, the mean placebo response is biased upward. On the other hand, when the active treatment slows the progression of an illness and patients in the active treatment group drop out early due to intolerability, the LOCF method can give anti-conservative results with respect to active treatment.

Another type of imputation method is a worst case analysis. This analysis imputes the worst response observed in the active treatment group for the missing values within the active treatment group, and imputes the best response observed in the placebo group for the missing values within the placebo group. This particular analysis can be viewed as a type of sensitivity analysis (14). From a purely statistical perspective this type of method can increase the overall variability, bias the active treatment mean downward, and bias the placebo mean upward. This could potentially prevent a promising treatment from demonstrating efficacy. In many cases, however, this method is not the planned primary analysis, but rather a secondary analysis. It can be used to assess the robustness of the results and to provide a so-called lower bound on treatment efficacy. If the worst case analysis demonstrates a treatment benefit it can be a very powerful result: one could state either that the treatment efficacy is so strong that even imputing the worst case scenario does not alter the positive results, or that the missing response rate is so low that it does not alter the conclusions.
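As an illustration of the two imputation schemes just described, the following Python sketch applies LOCF and a worst case imputation to a toy long-format data set. The column names, the small example values, and the convention that higher responses are better are assumptions made for the example, not part of the paper.

```python
import numpy as np
import pandas as pd

# Illustrative long-format data: one row per patient per scheduled visit,
# with NaN after dropout.
long = pd.DataFrame({
    "patient":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "treatment": ["active"] * 6 + ["placebo"] * 3,
    "visit":     [1, 2, 3] * 3,
    "response":  [10.0, 12.0, 13.5, 9.0, np.nan, np.nan, 8.0, 8.5, np.nan],
})

# Last observation carried forward: within each patient, fill later missing
# visits with the last available response.
long["locf"] = long.groupby("patient")["response"].ffill()

# Worst case analysis (assuming higher responses are better): impute the worst
# observed active response for missing active values, and the best observed
# placebo response for missing placebo values.
worst_active = long.loc[long.treatment == "active", "response"].min()
best_placebo = long.loc[long.treatment == "placebo", "response"].max()
long["worst_case"] = long["response"]
long.loc[(long.treatment == "active") & long.response.isna(), "worst_case"] = worst_active
long.loc[(long.treatment == "placebo") & long.response.isna(), "worst_case"] = best_placebo

print(long)
```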

Brown (15) proposed a slightly different method. A predetermined percentile (eg, the median) of the placebo group is assigned to all patients who dropped out of either the placebo group or the active treatment group. This predetermined score is also assigned to all patients in both groups with values worse than the assigned score. A Mann-Whitney statistic is then used to test the equality of the distributions of the two groups and thus provides a bound on the test of efficacy for the treatment.

There are other single imputation methods, such as mean imputation, conditional mean imputation, and the hot deck method; Little and Rubin (3) discuss these as well as other single imputation methods. Paik (16) proposed a mean imputation method as well as a multiple imputation method for handling missing data in GEE analyses; each provides valid estimates when data are MAR.

One particular imputation method that has received a significant amount of attention in the recent literature is multiple imputation. The idea of multiple imputation, first proposed by Rubin (17), is to impute more than one value for each missing item. The advantage of multiple imputation is that it represents the uncertainty about which value to impute, as opposed to, for example, imputing the mean response, which does not incorporate that uncertainty. Analyses that treat singly imputed values just like observed values therefore generally underestimate the variability (18). Multiple imputation can be implemented for either longitudinal measurements or a single response.

The general strategy for multiple imputation is to replace each missing value with two or more values drawn from an appropriate distribution for the missing values, which produces two or more complete data sets. Repeated draws are made from the posterior predictive distribution of the missing values, Y_miss. As Rubin and Schenker (18) point out, in practice implicit models can be used in place of explicit models. Lavori et al. (1) discuss a propensity-based imputation in which one models the probability of remaining in the study given a vector of observed covariates, typically with a logistic regression model. This method stratifies patients into groups based on propensity scores, that is, the propensity to drop out of the study. Imputations are then made by the approximate Bayesian bootstrap: one first draws a potential set of observed responses at random with replacement from the observed responses in the propensity quintile, and the imputed values are then chosen at random from that potential sample. This process is performed m times in order to produce m complete data sets. The analyses of the m complete data sets are then combined in a way that reflects the extra variability; the total variability consists of both within- and between-imputation variability. Multiple imputation methodology relies on the MAR assumption.

Little and Schenker (11) and Rubin (19) indicate that typically only a few imputations (m = 3-5) are necessary for a modest amount of missing information (eg, < 30%). These authors do point out, however, that as the percentage of missing data increases, more imputations are necessary. The software SOLAS for Missing Data Analysis (20) can implement the multiple imputation methods discussed above as well as other single imputation methods. A good list of references on this topic includes Lavori et al. (1), Little and Schenker (11), Rubin and Schenker (18), and Rubin (19).
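The combination step mentioned above (total variability built from within- and between-imputation variability) is usually carried out with Rubin's combining rules. The following Python sketch shows one common form of those rules; the input numbers are hypothetical, and the function is an illustration rather than the implementation used in SOLAS or in the paper.

```python
import numpy as np
from scipy import stats

def combine_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates : point estimates (eg, treatment differences), one per imputed data set
    variances : their squared standard errors
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()                          # combined point estimate
    w = variances.mean()                              # within-imputation variance
    b = estimates.var(ddof=1)                         # between-imputation variance
    t = w + (1 + 1 / m) * b                           # total variance
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2   # Rubin's degrees of freedom
    p = 2 * stats.t.sf(abs(q_bar) / np.sqrt(t), df)
    return q_bar, np.sqrt(t), p

# Hypothetical results from m = 5 imputed data sets.
est, se, p = combine_rubin([2.1, 1.8, 2.4, 2.0, 1.9], [0.80, 0.78, 0.83, 0.79, 0.81])
print(f"estimate {est:.2f}, SE {se:.2f}, p = {p:.3f}")
```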

A fourth strategy for handling missing data is to analyze the data as incomplete. This is typically done in longitudinal studies. One option is to use summary statistics; in longitudinal studies a slope estimate could be used to summarize each patient's response, although early dropouts can obviously be problematic for this type of analysis. Likelihood methods that ignore the dropout mechanism are also a popular choice. Most software packages (eg, PROC MIXED in SAS) assume MAR and ignore the missing-data mechanism. In the case of nonignorable dropout, inferences based on likelihood methods that ignore the dropout mechanism may be biased. Little and Rubin (3) and Little (2) discuss nonignorable missing-data models. Implementing a nonignorable missing-data model is not trivial; Little (2) indicates that results can be sensitive to misspecification of the missing-data mechanism, and that if little is known about the mechanism then a sensitivity analysis should be performed. In longitudinal studies where non-Gaussian responses are measured (eg, binary or count data), GEE is often used. GEE generally requires MCAR or covariate-dependent dropout in order to yield consistent estimates (2).

The fifth and final category of analyses is defined as other methods. As stated earlier, these are methods that do not fall into any of the four previously discussed classifications. One of the methods, proposed by Gould (21), converts all information on an outcome variable into a ranking of patients in terms of a desirability outcome. These desirability outcomes are ordered from the least desirable (eg, early withdrawal due to lack of efficacy or intolerance) to the most desirable (eg, early withdrawal due to efficacy); between the two ends of the desirability spectrum are the ranks of the scores for those patients who complete the study. A standard two-sample test based on ranks can then be applied. This particular method excludes those patients whose reason for dropout is unrelated to the study, so its utility certainly requires a clear understanding of the reason for dropout. The method does not directly address the issue of estimation. Cornell (22) proposed a modification of Gould's method that takes into account the time to dropout in order to provide a finer measure of the desirability outcome; for example, patients who drop out very early in the study due to intolerability would be assigned a lower desirability outcome than those who drop out later in the study for the same reason.

Shih and Quan (23) proposed a method that they call a composite approach. They consider the situation where the outcome measure is a continuous response variable and the final outcome is of main interest. Shih and Quan (23) indicate that the relevant clinical question when dropouts occur is: what is the chance that a patient completes the prescribed treatment course, and if he or she does complete the therapy, what is the expected response? This leads to two hypotheses, the first concerning the probability of dropping out of the clinical study and the second concerning the expected response for those patients who complete it. A logistic regression model can be used to model the probability of dropping out, while an analysis of variance can be used for the expected response of the completers. Shih and Quan (23) point out that the alternative hypotheses need to be in the same direction; for example, the placebo group has a greater percentage of patients dropping out for unfavorable reasons compared with active treatment, and the mean response for completers favors active treatment. The proposed method first tests the joint hypothesis; if that is rejected, the individual hypotheses are subsequently tested. A closed testing procedure is used in order to control the overall type I error rate. This particular method makes no MAR-type assumption. A few papers that discuss additional methods are Dawson and Lagakos (24) and Shih and Quan (25).
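As a rough illustration of the fourth strategy above (a likelihood-based analysis of the incomplete data, in the spirit of PROC MIXED), the sketch below fits a linear mixed model in Python with statsmodels. The data generation, the MAR dropout rule, and all names are assumptions made only for this example; the paper itself refers to SAS.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative long-format trial data (names and effect sizes are assumptions).
rng = np.random.default_rng(1)
n, visits = 100, 4
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n), visits),
    "visit": np.tile(np.arange(1, visits + 1), n),
    "treatment": np.repeat(rng.integers(0, 2, n), visits),
})
subject_effect = np.repeat(rng.normal(0, 3, n), visits)
df["response"] = (10 + 0.8 * df.visit + 0.6 * df.visit * df.treatment
                  + subject_effect + rng.normal(0, 2, len(df)))

# Monotone dropout whose probability depends on the previous (observed) response: MAR.
df["prev"] = df.groupby("patient")["response"].shift(1)
drop_now = df["prev"].notna() & (rng.random(len(df)) < 1 / (1 + np.exp(0.5 * (df["prev"] - 12))))
# Once a patient drops, all later visits are missing as well (monotone pattern).
df["dropped"] = drop_now.astype(int).groupby(df["patient"]).cummax().astype(bool)
observed = df[~df["dropped"]]

# A likelihood-based mixed model fitted to the incomplete data; such an analysis
# is valid under MAR without any explicit model for the dropout process.
model = smf.mixedlm("response ~ visit * treatment", observed, groups=observed["patient"])
print(model.fit().summary())
```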

ICH GUIDELINES

The International Conference on Harmonization (ICH) guideline E9, Statistical Principles for Clinical Trials, addresses the issue of missing data. The guidelines indicate that methods for dealing with missing data should be predefined in the protocol. They also point out that these methods can be refined in the statistical analysis plan during the blind review of the data. This is a very important step to consider, given that it can be difficult to anticipate all potential missing-data problems that could occur. Probably the most important suggestion the ICH guidelines make, however, is to investigate the sensitivity of the results to the method of handling missing data, that is, to perform a sensitivity analysis.

EXAMPLES BASED ON A SIMULATED DATA SET

The following examples are not intended to be an extensive simulation study, but rather some simple scenarios based on simulated data sets. They are presented to demonstrate the potential advantages or disadvantages of some of the methods under certain dropout scenarios. The examples consider a study where a single endpoint is of primary interest. The following two treatment scenarios are considered:

1. The active treatment is known to be significantly superior to placebo [µT = 15 units and µP = 10.5 units], and
2. The active treatment is known not to be superior to placebo [µT = 16 units and µP = 15 units].

Three hundred patients were initially randomized to each treatment group, with a known standard deviation of 19 units. The first scenario provides approximately 80% statistical power, while the second provides approximately 10% statistical power.

After the complete data sets were created, patient observations were deleted according to two dropout scenarios. In the first, patients in the active treatment group who were poor responders and who had significant side effects have a higher dropout rate (60%) than the rest of the patients in either the active treatment group or the placebo group (20%). Side effects were classified into three categories (mild, moderate, severe). The high-dropout group comprised patients with severe side effects and a response 0.5 standard deviations below the mean response of the active treatment group, or patients with moderate side effects and a response 1.0 standard deviation below the mean response of the active treatment group. In the second dropout scenario, low responders, irrespective of treatment, have a higher dropout rate (60%) than the rest of the patients (20%); here the high-dropout group comprised patients with a response 1.0 standard deviation below the mean response of the placebo group.
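A rough sense of how such a simulation can be set up is given by the Python sketch below. It imitates only the second dropout scenario in simplified form; the random seed, the exact dropout rule, and the use of a two-sample t-test are assumptions, so the numbers it produces will not reproduce Table 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, sd = 300, 19.0

# Treatment scenario 1 of the paper: active treatment truly superior to placebo.
active = rng.normal(15.0, sd, n)
placebo = rng.normal(10.5, sd, n)

def apply_dropout(y, reference_mean):
    """Simplified second dropout scenario: low responders, irrespective of
    treatment, drop out at 60%; everyone else at 20%. Returns the completers."""
    low = y < reference_mean - 1.0 * sd
    p_drop = np.where(low, 0.60, 0.20)
    return y[rng.random(len(y)) >= p_drop]

completers_active = apply_dropout(active, placebo.mean())
completers_placebo = apply_dropout(placebo, placebo.mean())

for label, a, p in [("complete data", active, placebo),
                    ("complete-case", completers_active, completers_placebo)]:
    t_stat, p_value = stats.ttest_ind(a, p)
    print(f"{label}: mean_T {a.mean():.1f}, mean_P {p.mean():.1f}, p = {p_value:.3f}")
```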

Of the four possible treatment/dropout combinations, this paper focuses on two of the more interesting cases. The first is where the active treatment is known not to be significantly superior to placebo and there is a higher dropout rate in the active treatment group (Scenario 1). The second is where the active treatment is known to be superior to placebo and there is a higher dropout rate in the placebo group (Scenario 2). The three methods applied were the complete-case analysis, the multiple imputation method, and the composite approach.

The first row of Table 1 provides the results of analyzing the complete data sets; the other rows provide the results for the three methods. For Scenario 1, the complete-case analysis, as expected, produced a treatment mean that was biased upward and thereby erroneously demonstrated efficacy of a nonefficacious treatment. The multiple imputation method, on the other hand, provided mean values and results that more closely resembled the truth. For the composite approach, the differences for the two hypotheses were in different directions (the active treatment group has a higher dropout rate for unfavorable reasons, while the mean response for the completers favors the active treatment group); therefore, it was not reasonable to combine the tests and perform the analysis. For Scenario 2, the complete-case analysis, as expected, produced a placebo mean that was biased upward and thereby was unable to demonstrate that an efficacious treatment was effective. The multiple imputation method provided results that more closely mimicked the complete data set, even though the comparison was not significant at the 0.05 level. The composite approach was unable to demonstrate a statistically significant result for the joint hypothesis, so the individual hypotheses were not tested.

TABLE 1. Simulated Data Set Analysis

                        Scenario 1                           Scenario 2
Complete Data Set       X̄T = 16.7, X̄P = 14.9, p = 0.23       X̄T = 15.3, X̄P = 10.5, p = 0.001
Complete-Case           X̄T = 18.7, X̄P = 14.6, p = 0.02       X̄T = 14.7, X̄P = 12.4, p = 0.15
Multiple Imputation     X̄T = 16.2, X̄P = 15.6, p = 0.74       X̄T = 16.6, X̄P = 13.6, p = 0.056
Composite Approach      tests not combined (see text)        joint hypothesis p = 0.135

Note: In Scenario 1 the active treatment is not significantly superior to placebo and the active treatment group has a higher dropout rate due to intolerability; in Scenario 2 the active treatment is significantly superior to placebo and the placebo group has a higher dropout rate due to lack of efficacy.

SUMMARY/DISCUSSION

The objective of this paper was to present an overview of issues, concerns, and available methodology for missing data arising from patients dropping out of a clinical trial. It should be emphasized that sophisticated statistical analysis is no substitute for a good clinical plan designed to mitigate patient dropout. It is important to continue following patients even after they have dropped out of a clinical study. In addition, understanding both the disease and the therapy being studied can be helpful in selecting an appropriate statistical method. The choice of a particular method for handling missing data also depends on whether one is taking a more pragmatic or a more explanatory perspective.

There is often the question of whether there are too many missing data. Spriet and Dupin-Spriet (14) point out that the tolerable amount of missing data is that which would not conceal an effect in the opposite direction. They go on to add that, in order to detect whether this level of missing data has been reached, one can perform what was earlier called the worst case analysis.

A major part of this paper was dedicated to the method of multiple imputation. It appears that regulatory agencies are still uncertain about the degree of usefulness of multiple imputation methods. Shih and Quan (23) provide several examples in which making inferences about the complete-data parameter may not be of practical interest. For example, when a patient dies before the end of the study, the outcome measure at the end of the study simply does not exist. Another example is the glomerular filtration rate in a renal disease study, which is not meaningful for patients who drop out due to renal failure and are referred for kidney dialysis. They go on to point out that one should not estimate nonexistent "missing" values that would have been observed only if the censoring event had not occurred. As Lavori et al. (1) point out, further work should be performed to better understand how well multiple imputation works under various types of dropout mechanisms, most specifically the nonignorable case. Obviously, it is very important to clearly understand the limitations of the various methods; this paper has attempted to outline many of them, which leads directly to the utility and necessity of performing some form of sensitivity analysis.

In conclusion, it is important not to consider the various methods for handling dropouts (missing data) as rivals, but rather as methods that can complement one another.

REFERENCES

1. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med. 1995;14:1913-1925.
2. Little RJA. Modeling the drop-out mechanism in repeated-measures studies. J Am Stat Assoc. 1995;90:1112-1121.
3. Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley & Sons; 1987.
4. Little RJA. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc. 1988;83:1198-1202.
5. Rubin DB. Inference and missing data. Biometrika. 1976;63:581-592.
6. Laird NM. Missing data in longitudinal studies. Stat Med. 1988;7:305-315.
7. Murray GD, Findlay JG. Correcting for the bias caused by drop-outs in hypertension trials. Stat Med. 1988;7:941-946.
8. Diggle P, Kenward MG. Informative dropout in longitudinal data analysis (with discussion). Appl Stat. 1994;43:49-94.
9. Diggle P. Testing for random dropouts in repeated measurement data. Biometrics. 1989;45:1255-1258.
10. Park T, Davis CS. A test of the missing data mechanism for repeated categorical data. Biometrics. 1993;49:631-638.
11. Little RJA, Schenker N. Missing data. Handbook of Statistical Methodology. New York: Plenum Press; 1995.
12. Heyting A, Tolboom J, Essers J. Statistical handling of drop-outs in longitudinal clinical trials. Stat Med. 1992;11:2043-2061.
13. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc. 1995;90:106-121.
14. Spriet A, Dupin-Spriet T. Imperfect data analysis. Drug Inf J. 1993;27:985-994.
15. Brown B. A test for the difference between two treatments in a continuous measure of outcome when there are dropouts. Control Clin Trials. 1992;13:213-225.
16. Paik MC. The generalized estimating equation approach when data are not missing completely at random. J Am Stat Assoc. 1997;92:1320-1329.
17. Rubin DB. Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Imputation and Editing of Faulty or Missing Survey Data. U.S. Department of Commerce; 1978:1-23.
18. Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med. 1991;10:585-598.
19. Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc. 1996;91:473-489.
20. SOLAS for Missing Data Analysis 1.0 [computer software]. Statistical Solutions, Ltd.; 1997.
21. Gould AL. A new approach to the analysis of clinical drug trials with withdrawals. Biometrics. 1980;36:721-727.
22. Cornell RG. Handling dropouts and related issues. In: Statistical Methodology in the Pharmaceutical Sciences. Marcel Dekker; 1990:271-289.
23. Shih WJ, Quan H. Testing for treatment differences with dropouts present in clinical trials: a composite approach. Stat Med. 1997;16:1225-1239.
24. Dawson JD, Lagakos SW. Size and power of two-sample tests of repeated measures data. Biometrics. 1993;49:1022-1032.
25. Shih WJ, Quan H. Stratified testing for treatment effects with missing data. Biometrics. 1998;54:782-787.