How to choose an analysis to handle missing data in longitudinal observational studies

1 How to choose an analysis to handle missing data in longitudinal observational studies
ICH, 25th February 2015
Ian White, MRC Biostatistics Unit, Cambridge, UK

2 Plan
Why are missing data a problem?
Methods: multiple imputation
When is MI the best approach?
When is MI not the best approach?
How to decide
Examples
This is a talk about how to choose the analysis method, not about how to do MI.
Based on work funded by the Population Health Sciences Research Network, done by Shaun Seaman (BSU) with Chris Power & Leah Li (ICH) and Alastair Leyland, Seeromanie Harding & Michaela Benzeval (MRC Social and Public Health Sciences Unit).

3 Why are missing data a problem?
1. Loss of power (compared to the power achieved with no missing data)
  we can't regain lost power
2. Any analysis must make an untestable assumption about the missing data
  a wrong assumption gives biased estimates
3. Some popular analyses with missing data give
  biased estimates (no matter how the missingness arises) [missing indicator method]
  biased standard errors (resulting in incorrect p-values and confidence intervals) [mean imputation]
  inefficient estimates [complete case]

4 Missing data are a problem: so what must we do?
1. Loss of power
  minimise the amount of missing data
2. Any analysis must make an untestable assumption about the missing data
  think carefully about the right assumption
  perform sensitivity analyses around that assumption
3. Some popular analyses with missing data give biased estimates / biased standard errors / inefficient estimates
  make a good choice of analysis (today's topic)

5 Menu of analyses
Complete-cases analysis (CCA)
Simple imputation
  mean imputation
  regression imputation
  stochastic imputation
  last observation carried forward
Multiple imputation (MI)
Inverse probability weighting (IPW)
Likelihood-based methods (mixed models etc.)
  includes complex Bayesian modelling

6 A simple problem - and a useful graph
Satisfaction variable measured at two times (sat94, sat96)
Some missing values on sat96
[Figure: data listing (id, sat96, sat94) and missingness-pattern graph for 6 individuals, observed vs missing; 67% complete cases]

7 Complete-cases analysis
Usually inefficient
Default in most stats packages
[Figure: two data listings (id, sat96, sat94)]

8 Mean imputation
Makes results too certain and distorts associations between variables
[Figure: two data listings (id, sat96, sat94)]
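As a minimal sketch of why mean imputation makes results too certain, the following fills missing values with the observed mean and compares the sample variance before and after. The scores and the helper name `mean_impute` are hypothetical, chosen only for illustration; they are not the data from the slides.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_values = [v for v in values if v is not None]
    fill = mean(observed_values)
    return [fill if v is None else v for v in values]


# Hypothetical satisfaction scores for 6 individuals; None marks a missing value.
sat96 = [3, 5, None, 4, None, 6]

observed = [v for v in sat96 if v is not None]
completed = mean_impute(sat96)

# The imputed values sit exactly at the mean, so they add no spread:
# the sample variance of the completed data is smaller than that of
# the observed data, and standard errors based on it are too small.
print(variance(observed), variance(completed))
```

The same shrinkage applies to covariances with other variables, which is one way to see why associations are distorted.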

9 Mean imputation again
[Figure: plot of sat96 against sat94, with missing values marked]

10 Regression imputation
Better than mean imputation as it preserves relationships between variables, but exaggerates correlations
[Figure: plot of sat96 against sat94, with missing values marked]

11 Stochastic imputation
[Figure: plot of sat96 against sat94, with missing values marked]

12 Stochastic imputation
Still over-precise because it treats imputed values as correct
[Figure: two data listings (id, sat96, sat94)]

13 Multiple imputation
[Figure: three data listings (id, sat96, sat94)]

14 Basics of multiple imputation
Not "making up data" but "making up data honestly"!
The idea is to impute data several times in order to express the full uncertainty about the missing data
  uses the "imputation model"
Each completed data set is analysed using standard methods
  the "substantive model"
The results are combined using Rubin's rules, which allow for variation between imputed data sets as a source of uncertainty
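The combination step can be sketched as follows. This is an illustrative implementation of the standard form of Rubin's rules for a single scalar estimate (pooled estimate = mean of the per-imputation estimates; total variance = within-imputation variance plus (1 + 1/m) times between-imputation variance). The function name `rubin_pool` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from analysing m = 3 imputed data sets.
pooled_estimate, total_variance = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, total_variance ** 0.5)  # estimate and its standard error
```

The between-imputation term is what single (stochastic) imputation leaves out, which is why it is over-precise.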

15 Missing at random
MI is usually done assuming missing at random (MAR): the probability of data being missing depends only on observed variables, not on unobserved variables
  e.g. if whether a GP measures cholesterol depends only on the patient's age, sex, smoking, blood pressure and whether diabetic, then cholesterol is MAR
The opposite is missing not at random (MNAR)
  e.g. whether a researcher interviews a patient with severe mental illness is likely to depend on their current symptom severity as well as their age, sex, etc., so symptom severity is MNAR

16 When MI is or isn't a good choice
MI is probably applicable to all missing data problems
  the aim here is to see when we might in practice prefer some other analysis
Assume the substantive model is a regression analysis: regressing an outcome on an exposure, adjusting for several confounders
  where all these may be repeatedly measured (e.g. lifecourse epidemiology)
A particular alternative to MI is CCA

17 When is MI the best choice? 1. Incomplete confounders
MI is most applicable when there is lots of missing data in the confounders
  e.g. here <10% of data points are missing but complete-cases analysis would discard 44% of the observations
[Figure: missingness pattern for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing; 56% complete cases]

18 When is MI the best choice? 2. Auxiliary variables
Auxiliary variables are variables that are
  not in the substantive model
  associated with the missing data
  sometimes observed when data are missing
  and can therefore be used to improve the imputations
Example: outcomes from case-notes are a useful auxiliary for an interview-collected outcome
  NB the auxiliary variable is collected whether or not the main variable is collected
Auxiliary variables are easy to include in multiple imputation and usually hard to include in other methods
But strong associations are needed (e.g. correlation >0.3) before auxiliary variables make a discernible difference

19 When may MI not be the best choice?
1. Very little missing data
2. Missing data only in the outcome
3. Other special missing data patterns
4. Multilevel data
5. Interactions in the model
6. Mis-specified model
7. Simple missing data patterns
8. Too much missing data

20 1. Very little missing data
With very small amounts of missing data, any method (e.g. CCA) is adequate
  what matters is the % of incomplete cases
But how much data is very little?
  Harrell (2001): <5% incomplete cases
  Barzi & Woodward (2004) and Burton et al (2010): <10% incomplete cases
Depends on other factors: e.g. consider a binary outcome with prevalence 1% and 1% missing data
  if in fact all missing values are cases then prevalence is 2% not 1%
  so results are still sensitive to (extreme) departures from MAR
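The prevalence example can be checked with a few lines of arithmetic. The sketch below computes the lowest and highest prevalence compatible with the observed data, assuming either none or all of the missing values are cases. The function name `prevalence_bounds` and the sample size of 10,000 are assumptions made for illustration.

```python
def prevalence_bounds(n_total, n_missing, observed_prevalence):
    """Return (lowest, highest) possible prevalence in the full sample.

    lowest:  no missing value is a case
    highest: every missing value is a case
    """
    n_observed = n_total - n_missing
    observed_cases = observed_prevalence * n_observed
    lowest = observed_cases / n_total
    highest = (observed_cases + n_missing) / n_total
    return lowest, highest


# Hypothetical sample: 10,000 people, 1% missing, 1% prevalence among the observed.
low, high = prevalence_bounds(10_000, 100, 0.01)
print(f"{low:.2%} to {high:.2%}")
```

With only 1% missing data the upper bound is roughly double the observed prevalence, which is the sense in which a rare outcome stays sensitive to extreme departures from MAR.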

21 2. Missing data only in the outcome
Assume no auxiliary variables
Incomplete cases hold no information about the substantive model
Hence it is entirely appropriate to restrict analysis to complete cases
  makes the same MAR assumption as MI etc.
  MI would just give CC results + random error (if imputing from the substantive model)
This is why MI is less relevant in randomised trials
[Figure: missingness pattern for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing; 49% complete cases]

22 3. Other special missing data patterns: missing data in the exposure
Incomplete cases hold very little information about the regression model (still assuming no auxiliary variables)
Often reasonable to restrict analysis to complete cases
But note the different assumptions:
  MI: being missing may depend on Outcome but not on Exposure (MAR)
  CC: being missing may depend on Exposure but not on Outcome
[Figure: missingness pattern for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing; 49% complete cases]

23 Other special missing data patterns: Introducing the FICO
MI enables us to make use of the incomplete cases
How much information do the incomplete cases hold?
We approximate this by the Fraction of Incomplete Cases among cases with outcome and exposure Observed (FICO)
  Small FICO (e.g. <10%) & no auxiliary variables: complete cases is adequate
  Large FICO or auxiliary variables: MI is needed
[Figure: missingness pattern for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing; 49% complete cases; FICO=0%]
White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Statistics in Medicine 2010; 28:
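A minimal sketch of how the FICO might be computed, following the definition on the slide: among records with both outcome and exposure observed, the fraction that have at least one other missing value. The function name `fico`, the record layout (a list of dicts with None for missing) and the toy data are assumptions for illustration.

```python
def fico(records, outcome, exposure):
    """Fraction of incomplete cases among records with outcome and exposure observed."""
    eligible = [
        r for r in records
        if r[outcome] is not None and r[exposure] is not None
    ]
    if not eligible:
        return float("nan")
    incomplete = [r for r in eligible if any(v is None for v in r.values())]
    return len(incomplete) / len(eligible)


# Hypothetical data: y = outcome, x = exposure, c1 = a confounder.
records = [
    {"y": 1.0, "x": 0, "c1": 5.0},    # complete
    {"y": 2.0, "x": 1, "c1": None},   # incomplete, outcome and exposure observed
    {"y": None, "x": 1, "c1": 3.0},   # outcome missing: not counted
    {"y": 4.0, "x": None, "c1": None},  # exposure missing: not counted
]
print(fico(records, "y", "x"))

# If the only missing values are in the outcome, FICO is 0.
outcome_only = [{"y": None, "x": 1, "c1": 2.0}, {"y": 3.0, "x": 0, "c1": 1.0}]
print(fico(outcome_only, "y", "x"))
```

Records with a missing outcome or exposure are excluded from both numerator and denominator, which is why patterns with missing data only in the outcome or only in the exposure give FICO=0%.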

24 FICO: simple examples
[Figure: three missingness patterns for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing: 50% incomplete cases, FICO=0%; 50% incomplete cases, FICO=0%; 50% incomplete cases, FICO=50%]

25 FICO: more realistic illustration
[Figure: two missingness patterns for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing]
  77% incomplete cases; FICO=50%: worth imputing
  54% incomplete cases; FICO=0%: not worth imputing?

26 FICO: summary
CCA is a reasonable alternative to MI if
  there are few incomplete cases among those with complete outcome and exposure (e.g. assessed by low FICO)
  there are no (strong) auxiliary variables

27 4. Multilevel data
MI is harder for multilevel data. Options include
  impute ignoring clustering (underestimates clustering, hence standard errors are likely to be too small)
  if clusters are large, impute with cluster as fixed effects (overestimates clustering)
  REALCOM: stand-alone software that can be called from MLwiN or Stata
  R: some facilities in mice; jomo
If missing data are only in the outcome, again complete cases may be appropriate (look at FICO in level 1 units)
  usually involves mixed models
Repeated measures can be seen as correlated, not multilevel (use wide format)

28 5. Interactions in the substantive model
Key fact about MI: the imputation model must contain all the variables in the substantive model
If the substantive model contains interactions then these need to be reflected in the imputation model
  e.g. if you are exploring whether a particular association differs between boys and girls, but you impute assuming that the association is the same in boys and girls, then you are biasing your analysis
Easy to see the problem; harder to fix it
Bartlett JW et al. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research (online).

29 6. Model mis-specification
Let's go back to this missing data pattern. There are two reasons why we might not be happy with complete-cases analysis:
1. Some auxiliary variables may predict both Outcome and whether Outcome is missing
2. We may not believe the model
[Figure: missingness pattern for Outcome, Exposure, C1-C6 in 100 individuals, observed vs missing; 49% complete cases]

30 Model mis-specification (example)
Illustrate in a hypothetical RCT of an anti-influenza drug
  the drug would be used before flu was definitively diagnosed
  the drug is effective in people with flu
  the drug is harmful in those without flu
  so we care about the balance of benefit and harm
  measure these in quality-adjusted life-hours (!)
Substantive model is a regression of quality-adjusted life-hours on assignment to flu drug
  the model is mis-specified because there's an omitted interaction with flu status

31 Model mis-specification (example)
[Table: count and mean outcome (placebo, flu drug, difference) by flu status (flu, not flu, all); values not preserved in the transcription]
What if data are missing at random for half the not-flu?
[Table: same layout for the observed data; values not preserved in the transcription]
We wrongly conclude overall benefit

32 Mis-specified models
Can solve this by inverse probability weighting (IPW)
  weight each person by 1 / their probability of being observed
  here, weight each flu case as 1 and each non-flu case as 2
  restores the "right" answer
Could also solve it by imputing missing outcomes with the correct imputation model (i.e. one which allows for an interaction between flu status and drug given)
In general, IPW is appropriate for protecting against model mis-specification
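The weighting argument can be illustrated with invented numbers, since the values in the slide's tables were not preserved. In this sketch the full trial has 100 flu and 100 not-flu participants per arm, the drug adds 4 units in flu and subtracts 4 in not-flu (so the true overall effect is 0), and half of the not-flu participants are missing. All counts, means and names here are hypothetical.

```python
# Hypothetical cell summaries of the OBSERVED data:
# (flu status, arm) -> (number observed, mean outcome).
observed_cells = {
    ("flu", "placebo"): (100, 10.0),
    ("flu", "drug"): (100, 14.0),
    ("not flu", "placebo"): (50, 20.0),   # half of 100 are missing
    ("not flu", "drug"): (50, 16.0),      # half of 100 are missing
}

# Probability of being observed, by flu status.
prob_observed = {"flu": 1.0, "not flu": 0.5}


def arm_mean(cells, arm, weights):
    """Weighted mean outcome in one arm, weighting each person by weights[status]."""
    numerator = 0.0
    denominator = 0.0
    for (status, cell_arm), (n, mean_outcome) in cells.items():
        if cell_arm == arm:
            numerator += weights[status] * n * mean_outcome
            denominator += weights[status] * n
    return numerator / denominator


def drug_effect(cells, weights):
    """Difference in (weighted) mean outcome, drug minus placebo."""
    return arm_mean(cells, "drug", weights) - arm_mean(cells, "placebo", weights)


unweighted = {status: 1.0 for status in prob_observed}
ipw = {status: 1.0 / p for status, p in prob_observed.items()}  # flu: 1, not flu: 2

effect_cc = drug_effect(observed_cells, unweighted)  # complete cases: apparent benefit
effect_ipw = drug_effect(observed_cells, ipw)        # weighted: no overall effect
print(effect_cc, effect_ipw)
```

The unweighted analysis over-represents the flu group, where the drug helps, so it shows a benefit; weighting the not-flu group by 2 restores the balance of the full trial.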

33 7. Simple missing data patterns
This pattern has only 1 incomplete pattern, perhaps because Outcome and C2-C6 are measured at interview in adult life and Exposure & C1 are measured at birth
MI would require correct imputation models for 6 variables
IPW is a good alternative: it only requires one model for being a complete case given Exposure & C1 (and any auxiliary variables)
[Figure: missingness pattern for Outcome, Exposure, C1-C6, observed vs missing; 80% incomplete cases; FICO=0%]

34 Simple missing data patterns (ctd)
This pattern is similar but also has some extra missing data (presumably missing items in those interviewed)
Could use an IPW-MI hybrid
  build a model for being interviewed, giving weights
  impute only among those interviewed (with weights in the imputation model)
Seaman S, White I, Copas A, Li L. Combining multiple imputation and inverse probability weighting. Biometrics 2012; 68:
[Figure: missingness pattern for Outcome, Exposure, C1-C6, observed vs missing; 85% incomplete cases; FICO=21%]

35 8. Too much missing data
In principle, MI can handle very large amounts of missing data, but the impact of anything you do wrong is much greater with more missing data
  departures from MAR will be very influential
    sensitivity analysis to departures from MAR will identify this problem
  MI errors will matter a lot
    e.g. with 70% missing data, omitting the outcome variable from the imputation model would dilute associations by 70%
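The dilution claim can be illustrated by simulation. The sketch below is a simplification under stated assumptions: it uses a single stochastic imputation rather than full MI, the exposure X is missing completely at random in 70% of records, and missing X values are drawn from the observed X values without using the outcome Y. All names, the seed and the data-generating model (Y = X + noise, true slope 1) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_dilution(n=20_000, frac_missing=0.7, seed=1):
    """Return (slope with full data, slope after imputing X while ignoring Y)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]           # true slope is 1
    missing = [rng.random() < frac_missing for _ in range(n)]
    observed_x = [xi for xi, m in zip(x, missing) if not m]
    # Imputation model that omits the outcome: draw from observed X at random.
    x_imputed = [rng.choice(observed_x) if m else xi for xi, m in zip(x, missing)]
    return ols_slope(x, y), ols_slope(x_imputed, y)


full_slope, diluted_slope = simulate_dilution()
print(full_slope, diluted_slope)  # roughly 1 and roughly 0.3
```

Imputed X values are unrelated to Y, so only the 30% of records with observed X contribute to the covariance, and the estimated slope shrinks to about 30% of its true value.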

36 How to choose an analysis: what to consider
1. Fraction of missing values for each variable in model
2. Fraction of incomplete cases
3. Fraction of incomplete cases among those with observed outcome and exposure (FICO)
4. Availability of auxiliary variables
5. Distribution of number of missing values
6. Patterns of jointly missing data
7. Reasons for missing data
8. Plausible missingness mechanisms
9. Clustering of data
[Slide annotations: low FICO & no AVs: CCA?; simple pattern: IPW?; possible departures from MAR]
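The first two items on this checklist are simple descriptive summaries. A minimal sketch, assuming the data are held as a list of dicts with None marking a missing value (the function name `missingness_summary` and the toy records are hypothetical):

```python
def missingness_summary(records):
    """Return (fraction missing per variable, fraction of incomplete cases)."""
    n = len(records)
    variables = list(records[0].keys())
    fraction_missing = {
        v: sum(r[v] is None for r in records) / n for v in variables
    }
    fraction_incomplete = sum(
        any(r[v] is None for v in variables) for r in records
    ) / n
    return fraction_missing, fraction_incomplete


# Hypothetical data with an outcome y and an exposure x.
toy_records = [
    {"y": 1.0, "x": 2.0},
    {"y": None, "x": 2.0},
    {"y": 3.0, "x": None},
    {"y": 4.0, "x": 5.0},
]
per_variable, incomplete = missingness_summary(toy_records)
print(per_variable, incomplete)
```

Here each variable is missing in 25% of records but 50% of cases are incomplete, the same contrast as in the incomplete-confounders slide, where few data points are missing yet many cases would be discarded.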

37 Example 1 (auxiliary variables)
Southampton Women's Survey (Crozier et al., 2009)
  1987 women interviewed pre-pregnancy
  1553 (78%) interviewed at early pregnancy
  1893 (95%) interviewed at late pregnancy
Analysis: regress mother's daily caffeine consumption at early pregnancy on her examination qualifications and age at conception
MAR was considered plausible
FICO is 1.5%: the main missing data are in the outcome
  suggests a complete-cases analysis
But recent caffeine consumption at pre-pregnancy and late pregnancy are useful auxiliary variables for caffeine consumption at early pregnancy
  we therefore recommend MI with auxiliary variables

38 Example 2 (repeated exposure)
1958 birth cohort
Exposure: maternal interest in the education of the participant in childhood, reported by teachers at ages 7, [...] and 16 years and formed into a summary measure
Outcome: participants' cognitive function at 50 years.
[Table: exposure patterns with count and %, plus total; values not preserved in the transcription]
Table shows the 9649/17638 with observed outcome.
The FICO is calculated here as the fraction of incomplete cases among those with observed outcome and partly or fully observed exposure. This works out as ( )/( )=55%.
We recommend MI.

39 Example 3 (repeated outcomes)
1958 Birth Cohort
Outcome: the trajectory of maths scores at age 7 to 16 years
Exposure: birth weight
Covariates: none
The graph summarises the data in a "wide" format (1 record per child)
[Figure: missingness pattern for bwtkg, math7a, math11a, math16a, observed vs missing]

40 Example 3 ctd.
The graphs summarise the data in the more appropriate "long" format (1 record per wave per child)
  proportion of complete cases is 71%
  FICO=0 (obviously)
Recommend a random-effects model (easier than MI)
[Figure: missingness patterns for bwtkg and math at each wave, observed vs missing: age 7, 75% complete cases, FICO=0%; age 11, 70% complete cases, FICO=0%; age 16, 58% complete cases, FICO=0%]

41 Summary: the alternatives to MI
1. Very little missing data
2. Missing data only in the outcome
3. Other patterns with low FICO
4. Multilevel data
5. Interactions in the model
6. Mis-specified model
7. Simple missing data patterns
8. Too much missing data
[Slide annotations: sensitivity analysis; low FICO & no AVs: CCA?; REALCOM etc.?; care; IPW?]


More information

How to Use a Monte Carlo Study to Decide on Sample Size and Determine Power

How to Use a Monte Carlo Study to Decide on Sample Size and Determine Power STRUCTURAL EQUATION MODELING, 9(4), 599 620 Copyright 2002, Lawrence Erlbaum Associates, Inc. TEACHER S CORNER How to Use a Monte Carlo Study to Decide on Sample Size and Determine Power Linda K. Muthén

More information

UMEÅ INTERNATIONAL SCHOOL

UMEÅ INTERNATIONAL SCHOOL UMEÅ INTERNATIONAL SCHOOL OF PUBLIC HEALTH Master Programme in Public Health - Programme and Courses Academic year 2015-2016 Public Health and Clinical Medicine Umeå International School of Public Health

More information

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,

More information

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA 123 Kwantitatieve Methoden (1999), 62, 123-138. A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA Joop J. Hox 1 ABSTRACT. When we deal with a large data set with missing data, we have to undertake

More information

Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health

Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health Introduction Reader (Statistics and Epidemiology) Research team epidemiologists/statisticians/phd students Primary care

More information

AP Statistics Final Examination Multiple-Choice Questions Answers in Bold

AP Statistics Final Examination Multiple-Choice Questions Answers in Bold AP Statistics Final Examination Multiple-Choice Questions Answers in Bold Name Date Period Answer Sheet: Multiple-Choice Questions 1. A B C D E 14. A B C D E 2. A B C D E 15. A B C D E 3. A B C D E 16.

More information

Sample Size Estimation and Power Analysis

Sample Size Estimation and Power Analysis yumi Shintani, Ph.D., M.P.H. Sample Size Estimation and Power nalysis March 2008 yumi Shintani, PhD, MPH Department of Biostatistics Vanderbilt University 1 researcher conducted a study comparing the effect

More information

PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,

PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku, PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data

More information

www.rmsolutions.net R&M Solutons

www.rmsolutions.net R&M Solutons Ahmed Hassouna, MD Professor of cardiovascular surgery, Ain-Shams University, EGYPT. Diploma of medical statistics and clinical trial, Paris 6 university, Paris. 1A- Choose the best answer The duration

More information

arxiv:1301.2490v1 [stat.ap] 11 Jan 2013

arxiv:1301.2490v1 [stat.ap] 11 Jan 2013 The Annals of Applied Statistics 2012, Vol. 6, No. 4, 1814 1837 DOI: 10.1214/12-AOAS555 c Institute of Mathematical Statistics, 2012 arxiv:1301.2490v1 [stat.ap] 11 Jan 2013 ADDRESSING MISSING DATA MECHANISM

More information

DMRI Drug Misuse Research Initiative

DMRI Drug Misuse Research Initiative DMRI Drug Misuse Research Initiative Executive Summary The psychosocial consequences of drug misuse: a systematic review of longitudinal studies Research Report submitted to the Department of Health in

More information

Prospective, retrospective, and cross-sectional studies

Prospective, retrospective, and cross-sectional studies Prospective, retrospective, and cross-sectional studies Patrick Breheny April 3 Patrick Breheny Introduction to Biostatistics (171:161) 1/17 Study designs that can be analyzed with χ 2 -tests One reason

More information

Experimental Designs leading to multiple regression analysis

Experimental Designs leading to multiple regression analysis Experimental Designs leading to multiple regression analysis 1. (Randomized) designed experiments. 2. Randomized block experiments. 3. Observational studies: probability based sample surveys 4. Observational

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

COHORT STUDIES. Concept of a cohort : A group of individuals that are all similar in some trait and move forward together as a unit.

COHORT STUDIES. Concept of a cohort : A group of individuals that are all similar in some trait and move forward together as a unit. OCW Epidemiology and Biostatistics, 2010 Alice Tang Tufts University School of Medicine October 5, 2010 COHORT STUDIES Learning objectives for this session: 1) Know when it is appropriate/feasible to use

More information

Analysing Complex Social Surveys

Analysing Complex Social Surveys Analysing Complex Social Surveys Scottish Social Survey Network, Master Class Stirling, 25 March 2010 Peter Lynn University of Essex What is a Complex Survey? Features of importance to analysts: Sample

More information

Methods: Simple Random Sampling Topics: Introduction to Simple Random Sampling

Methods: Simple Random Sampling Topics: Introduction to Simple Random Sampling Methods: Simple Random Sampling Topics: Introduction to Simple Random Sampling - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - The Rwanda 2010 Demographic and Health Survey

More information

Modern Methods for Missing Data

Modern Methods for Missing Data Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

More information

Selection bias in secondary analysis of electronic health record data. Sebastien Haneuse, PhD

Selection bias in secondary analysis of electronic health record data. Sebastien Haneuse, PhD Selection bias in secondary analysis of electronic health record data Sebastien Haneuse, PhD Department of Biostatistics Harvard School of Public Health 1 Symposium on Health Care Data Analytics, Seattle,

More information

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

More information

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS

SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS 1. a. To calculate the mean, we just add up all 7 values, and divide by 7. In Xi i= 1 fancy

More information

How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power

How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 391-9971 Fax: (310) 391-8971

More information

Item Imputation Without Specifying Scale Structure

Item Imputation Without Specifying Scale Structure Original Article Item Imputation Without Specifying Scale Structure Stef van Buuren TNO Quality of Life, Leiden, The Netherlands University of Utrecht, The Netherlands Abstract. Imputation of incomplete

More information

Surveying Prisoner Crime Reduction (SPCR) Adjusting for Missing Data Technical Report

Surveying Prisoner Crime Reduction (SPCR) Adjusting for Missing Data Technical Report Surveying Prisoner Crime Reduction (SPCR) Adjusting for Missing Data Technical Report Ian Brunton-Smith, University of Surrey James Carpenter and Mike Kenward, London School of Hygiene and Tropical Medicine

More information

5 Risk factors for the persistence of wheeze during childhood

5 Risk factors for the persistence of wheeze during childhood 5 Risk factors for the persistence of wheeze during childhood Asthma has a variable natural history, with onset and remission occurring at any age. Longitudinal studies suggest that the pattern of asthma

More information

What is the difference between association and causation?

What is the difference between association and causation? What is the difference between association and causation? And why should we bother being formal about it? Rhian Daniel and Bianca De Stavola ESRC Research Methods Festival, 5th July 2012, 10.00am Association

More information

Longitudinal Studies, The Institute of Education, University of London. Square, London, EC1 OHB, U.K. Email: R.D.Wiggins@city.ac.

Longitudinal Studies, The Institute of Education, University of London. Square, London, EC1 OHB, U.K. Email: R.D.Wiggins@city.ac. A comparative evaluation of currently available software remedies to handle missing data in the context of longitudinal design and analysis. Wiggins, R.D 1., Ely, M 2. & Lynch, K. 3 1 Department of Sociology,

More information

Combining information from different survey samples - a case study with data collected by world wide web and telephone

Combining information from different survey samples - a case study with data collected by world wide web and telephone Combining information from different survey samples - a case study with data collected by world wide web and telephone Magne Aldrin Norwegian Computing Center P.O. Box 114 Blindern N-0314 Oslo Norway E-mail:

More information

Missing Data: Patterns, Mechanisms & Prevention. Edith de Leeuw

Missing Data: Patterns, Mechanisms & Prevention. Edith de Leeuw Missing Data: Patterns, Mechanisms & Prevention Edith de Leeuw Thema middag Nonresponse en Missing Data, Universiteit Groningen, 30 Maart 2006 Item-Nonresponse Pattern General pattern: various variables

More information

6/15/2005 7:54 PM. Affirmative Action s Affirmative Actions: A Reply to Sander

6/15/2005 7:54 PM. Affirmative Action s Affirmative Actions: A Reply to Sander Reply Affirmative Action s Affirmative Actions: A Reply to Sander Daniel E. Ho I am grateful to Professor Sander for his interest in my work and his willingness to pursue a valid answer to the critical

More information

APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

More information

Technical Report on Response in the Teacher Survey in MCS 4 (Age 7)

Technical Report on Response in the Teacher Survey in MCS 4 (Age 7) Millennium Cohort Study Technical Report on Response in the Teacher Survey in MCS 4 (Age 7) Tarek Mostafa with contributions from Rachel Rosenberg November 2013 Centre for Longitudinal Studies Following

More information

PEER REVIEW HISTORY ARTICLE DETAILS TITLE (PROVISIONAL)

PEER REVIEW HISTORY ARTICLE DETAILS TITLE (PROVISIONAL) PEER REVIEW HISTORY BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf)

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Missing values in data analysis: Ignore or Impute?

Missing values in data analysis: Ignore or Impute? ORIGINAL ARTICLE Missing values in data analysis: Ignore or Impute? Ng Chong Guan 1, Muhamad Saiful Bahri Yusoff 2 1 Department of Psychological Medicine, Faculty of Medicine, University Malaya 2 Medical

More information

Sample size in cluster randomised trials. Sandra Eldridge Professor of Biostatistics Director of Pragmatic Clinical Trials Unit

Sample size in cluster randomised trials. Sandra Eldridge Professor of Biostatistics Director of Pragmatic Clinical Trials Unit Sample size in cluster randomised trials Sandra Eldridge Professor of Biostatistics Director of Pragmatic Clinical Trials Unit Outline Introduction Background to trials in health services research/ primary

More information

What is meant by "randomization"? (Select the one best answer.)

What is meant by randomization? (Select the one best answer.) Preview: Post-class quiz 5 - Clinical Trials Question 1 What is meant by "randomization"? (Select the one best answer.) Question 2 A. Selection of subjects at random. B. Randomization is a method of allocating

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Weighing the Evidence: Critical Appraisal and Systematic Review of RCTs

Weighing the Evidence: Critical Appraisal and Systematic Review of RCTs Weighing the Evidence: Critical Appraisal and Systematic Review of RCTs Dr. Rosie Mayston, Centre for Global Mental Health, Institute of Psychiatry, King s College London Overview Today s session aims

More information

CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA

CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA Hatice UENAL Institute of Epidemiology and Medical Biometry, Ulm University, Germany

More information

Raul Cruz-Cano, HLTH653 Spring 2013

Raul Cruz-Cano, HLTH653 Spring 2013 Multilevel Modeling-Logistic Schedule 3/18/2013 = Spring Break 3/25/2013 = Longitudinal Analysis 4/1/2013 = Midterm (Exercises 1-5, not Longitudinal) Introduction Just as with linear regression, logistic

More information

Always Start with PECO

Always Start with PECO Goals of This Course Be able to understand a study design (very basic concept) Be able to understand statistical concepts in a medical paper Be able to perform a data analysis Understanding: PECO study

More information

2. Background This was the fourth submission for everolimus requesting listing for clear cell renal carcinoma.

2. Background This was the fourth submission for everolimus requesting listing for clear cell renal carcinoma. PUBLIC SUMMARY DOCUMENT Product: Everolimus, tablets, 5 mg and 10 mg, Afinitor Sponsor: Novartis Pharmaceuticals Australia Pty Ltd Date of PBAC Consideration: November 2011 1. Purpose of Application To

More information

Chemicals and childhood leukemia

Chemicals and childhood leukemia Chemicals and childhood leukemia Claire Infante-Rivard MD, PhD McGill University, Montréal, Canada Currently at Inserm UMR-S S 754, Paris, France Supported by a UICC Yamagiwa-Yoshida Yoshida Memorial International

More information

Randomized trials versus observational studies

Randomized trials versus observational studies Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James

More information

WWC Single Study Review A review of the design and summary of findings for an individual study

WWC Single Study Review A review of the design and summary of findings for an individual study What Works Clearinghouse WWC Single Study Review A review of the design and summary of findings for an individual study U.S. DEPARTMENT OF EDUCATION July 2015 WWC Review of the Report Interactive Online

More information

PRACTICE PROBLEMS FOR BIOSTATISTICS

PRACTICE PROBLEMS FOR BIOSTATISTICS PRACTICE PROBLEMS FOR BIOSTATISTICS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION 1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period.

More information

Assessment of Rescue Medication Effect in Psychiatric Clinical Trials

Assessment of Rescue Medication Effect in Psychiatric Clinical Trials Society for Clinical Trials 36 th Annual Meeting Assessment of Rescue Medication Effect in Psychiatric Clinical Trials Zhibao Mi, John H. Krystal, Karen M. Jones, Robert A. Rosenheck, Joseph F. Collins

More information

Sample Size Planning, Calculation, and Justification

Sample Size Planning, Calculation, and Justification Sample Size Planning, Calculation, and Justification Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa

More information