Imputation and Analysis. Peter Fayers

Similar documents

Problem of Missing Data

Dealing with Missing Data

A Basic Introduction to Missing Data

Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

2. Making example missing-value datasets: MCAR, MAR, and MNAR

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Multiple Imputation for Missing Data: A Cautionary Tale

How to choose an analysis to handle missing data in longitudinal observational studies

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Missing data in randomized controlled trials (RCTs) can

Guideline on missing data in confirmatory clinical trials

Handling attrition and non-response in longitudinal data

Missing Data Sensitivity Analysis of a Continuous Endpoint An Example from a Recent Submission

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

AVOIDING BIAS AND RANDOM ERROR IN DATA ANALYSIS

Data Cleaning and Missing Data Analysis

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Handling missing data in Stata a whirlwind tour

Analysis Issues II. Mary Foulkes, PhD Johns Hopkins University

A Review of Missing Data Treatment Methods

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

Sensitivity Analysis in Multiple Imputation for Missing Data

Missing Data: Patterns, Mechanisms & Prevention. Edith de Leeuw

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

p ˆ (sample mean and sample

Non-random/non-probability sampling designs in quantitative research

Randomized trials versus observational studies

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

Math Quizzes Winter 2009

Statistics 2014 Scoring Guidelines

Mind on Statistics. Chapter 12

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Bayesian Statistics in One Hour. Patrick Lam

Week 4: Standard Error and Confidence Intervals

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Descriptive Methods Ch. 6 and 7

Introduction to mixed model and missing data issues in longitudinal studies

Mathematical goals. Starting points. Materials required. Time needed

Critical appraisal. Gary Collins. EQUATOR Network, Centre for Statistics in Medicine NDORMS, University of Oxford

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)

Sample Paper for Research Methods. Daren H. Kaiser. Indiana University Purdue University Fort Wayne

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

SIMULATION STUDIES IN STATISTICS WHAT IS A SIMULATION STUDY, AND WHY DO ONE? What is a (Monte Carlo) simulation study, and why do one?

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Psoriasis, Incidence, Quality of Life, Psoriatic Arthritis, Prevalence

Exploratory Data Analysis

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Point and Interval Estimates

What are confidence intervals and p-values?

NON-PROBABILITY SAMPLING TECHNIQUES

Dealing with Missing Data

Bayesian Approaches to Handling Missing Data

Missing data: the hidden problem

Data Analysis, Research Study Design and the IRB

Systematic Reviews and Meta-analyses

Analysis and Interpretation of Clinical Trials. How to conclude?

CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA

Service delivery interventions

The Consequences of Missing Data in the ATLAS ACS 2-TIMI 51 Trial

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Simple linear regression

Introduction Qualitative Data Collection Methods... 7 In depth interviews... 7 Observation methods... 8 Document review... 8 Focus groups...

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Dr James Roger. GlaxoSmithKline & London School of Hygiene and Tropical Medicine.

Randomization in Clinical Trials

Review. March 21, S7.1 2_3 Estimating a Population Proportion. Chapter 7 Estimates and Sample Sizes. Test 2 (Chapters 4, 5, & 6) Results

Missing Data Part 1: Overview, Traditional Methods Page 1

What is critical appraisal?

Sampling. COUN 695 Experimental Design

The Procedures of Monte Carlo Simulation (and Resampling)

Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 8: Quantitative Sampling

Surveys/questionnaires

Oncology Nursing Society Annual Progress Report: 2008 Formula Grant

Missing Data. Paul D. Allison INTRODUCTION

An Introduction to Secondary Data Analysis

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set

Critical Appraisal of Article on Therapy

Workpackage 11 Imputation and Non-Response. Deliverable 11.2

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Sample Paper for Learning Research Proposal. Daren H. Kaiser. Indiana University Purdue University Fort Wayne

Chapter 3. Sampling. Sampling Methods

Developing an implementation research proposal. Session 2: Research design

What Works Clearinghouse

Analysis of Longitudinal Data with Missing Values.

The impact of question wording reversal on probabilistic estimates of defection / loyalty for a subscription product

MISSING DATA: THE POINT OF VIEW OF ETHICAL COMMITTEES

Transcription:

Missing Data in Palliative Care Research Imputation and Analysis Peter Fayers Department of Public Health University of Aberdeen NTNU Det medisinske fakultet

Missing data Missing data is a major problem when reporting QoL, especially in trials with extended follow-up in palliative care The problem: arguably, patients with the worst HRQL are the ones most likely to stop completing questionnaires. 2 NTNU Det medisinske fakultet

Why does missing data matter? 1. Bias If the proportion of data missing is not small then: are the characteristics of patients with missing data different from those for whom complete data are available? 3 NTNU Det medisinske fakultet

Why does missing data matter? 2. Power A study loses power if data are missing a larger sample size is required. Note that increasing the sample size will compensate for the loss of power, but will not reduce the bias. Always try to minimise the amount of missing data! 4 NTNU Det medisinske fakultet

Compliance Many clinical trials have poor compliance with QoL assessment. It is common for less than 2/3 patients to return QoL assessments during or after treatment. Which patients fail to complete the questionnaires? The most ill? How can we interpret the results of the study if there is possible bias? 5 NTNU Det medisinske fakultet

Compliance Unavoidable reasons patient attrition including death. Low compliance forms which (theoretically) could have been completed but were not; this includes patients who are in pain, very ill, or frail. 6 NTNU Det medisinske fakultet

Patterns of missing data Missing Completely at Random (MCAR) the probability of response at time t is independent of both observed data and the unobserved data. Missing At Random (MAR) the probability of response at time t depends on the observed values but not the unobserved data. Not Missing At Random (NMAR) the probability of response at time t depends on the unobserved values. 7 NTNU Det medisinske fakultet

Methods for missing forms - imputation Imputation best guess estimate. Use information from other similar patients, values from previous and/or later assessments by the same patient, or a mixture of both. If items are used only as components of the scale, it may not be necessary to impute values for those items, only for the scale score itself. 8 NTNU Det medisinske fakultet

Naïve methods of imputation Last Value Carried Forward (LVCF) The values that were recoded by the patient at the last previously completed QoL assessment are carried forward. Simple Mean Imputation The replacement of missing QoL scores by the mean score calculated for patients who did complete the assessment. 9 NTNU Det medisinske fakultet

Horizontal Mean Imputation Horizontal mean imputation uses the mean of each patient s previous scores. It reduces to the LVCF method if there is only one previous assessment available. Many other, more general, regressionbased methods are available. 10 NTNU Det medisinske fakultet

Reduced Standard Deviation (SD) Methods such as Simple Mean Imputation result in a biased estimate of Standard Deviation. The estimate of the SD will be reduced artificially. This can lead to distorted significance tests, and falsely narrow CIs. (The SD should be corrected, or equivalently we must make adjustment in analyses.) 11 NTNU Det medisinske fakultet

Markov Chain (MC) Imputation In the methods described so far, the imputed values will be the same for any two patients with the same profile of successive non-missing values. MC imputation allows these two patients to have different imputed QoL values. It assigns, for a patient in a particular QoL state at one assessment, transition probabilities of being in each of the possible states. E.g. If a female patient aged 60 has a QoL score of 70 (on 0 100 scale), what is the probability her next (missing) value would have increased to, say, 80? Or 90? Or decreased to 50? Etc. 12 NTNU Det medisinske fakultet

Hot Deck (HD) Imputation Suppose in a palliative care RCT a male patient aged 65 with metastatic SCLC has missing data at 1 month. Identify other patients in the RCT who have same age, same gender, same disease, etc. Select a QoL score, at random, from these patients. Substitute this as the imputed value for the patient with the missing QoL assessment. 13 NTNU Det medisinske fakultet

Multiple imputation Since a random element is included in the selection of values, the augmented dataset will be just a random one out of many potential datasets. The idea of multiple imputation is that many alternative complete datasets can be created. The analysis can be repeated for each dataset and then combined into a final summary analysis (Rubin 1987). 14 NTNU Det medisinske fakultet

Analytical methods 1. For each of N patients with missing data, 2. identify a set of patients with similar prognosis. 3. Impute a set of values from these similar patients. 4. Analyse the augmented, seemingly-complete, data set. 5. Calculate p-values, estimate treatment effects, etc. 6. Repeat this process M times. 7. Average the M results, using Rubin s methods. 8. The overall results provide the multiple imputation estimates. 15 NTNU Det medisinske fakultet

Aberdeen HSRU trials 5 RCTs Strenuous efforts to recover QoL data that was initially missing, By issuing repeated reminders, By offering to interview patients. Unique opportunity to explore the performance of imputation methods. 1. Assume data collected by reminders is missing, as it would be in most RCTs. 2. Apply imputation procedures, and compare results against the data actually retrieved by reminders. 16 NTNU Det medisinske fakultet

Five Aberdeen HSRU trials Multiple imputation was shown to be more suitable then simple imputation methods. MI models the uncertainty in the missing data and is based on the MAR assumption, which is more plausible in QoL data than MCAR. 17 NTNU Det medisinske fakultet

Don t ignore missing data Many investigators are suspicious about imputation techniques, because of the assumptions overtly involved. However, ignoring missing data in effect makes the assumption that patients who failed to respond are similar to those who did. Imputation tries to use available information to make better allowance for patients with missing data. 18 NTNU Det medisinske fakultet

Sensitivity analysis Check whether choice of imputation method affects the results. Try extreme cases: 1. Assume missing values correspond to patients with very poor QoL. 2. Assume missing values correspond to patients with very good QoL. Are the results essentially unchanged? Or are the conclusions sensitive to the assumptions made for imputation? 19 NTNU Det medisinske fakultet

Conclusions - I Methods such as LVCF are simple; however, naïve use of LVCF cannot be recommended. Multiple imputation methods are efficient. They take additional patient information into account, and preserve the magnitude of the SDs and CI. Decide and specify the method of imputation in advance. The QoL scales that are major endpoints should be the focus for determining the imputation process. 20 NTNU Det medisinske fakultet

Conclusions - II Sophisticated imputation methods are no substitute for the real data. One cannot create data from nothing! Imputation is a salvage job. The only way to be confident of no bias is ensure good compliance. Studies with poor compliance remain unconvincing and unpublishable, no matter how carefully the data is analysed. Always aim for 100% compliance! 21 NTNU Det medisinske fakultet

Fayers,PM & Hays, R (eds.) 2005 Assessing Quality of Life in Clinical Trials. Oxford University Press. ISBN: 0-19-852769-1 Especially chapters 2.4 (proxies), 3.2 (preventing missing data), 3.3 (missing data: analyses) Also see: Fairclough DL (2002) Design and Analysis of Quality of Life Studies in Clinical Trials. Publ: Chapman & Hall ISBN: 1-58488-263-8. 22 NTNU Det medisinske fakultet