# 2. Making example missing-value datasets: MCAR, MAR, and MNAR

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Lecture Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data Methods Clinical trial randomly assigned 100 patients with major depression to an experimental drug (D) or to placebo (P). (Source: Dmitrienko et. al. (2005). Participants completed the Hamilton depression rating scale (HAMD) at baseline and again after 9-week treatment. Study outcome was HAMD at end; higher scores mean worse depression. Participants at 5 centers: drug center Frequency Total Drug Placebo Total

2 The first 5 observations from the Depression Study data ID baseline final drug center D D D D D 1 Model(s) to compare final HAMD between treatments, adjusted for baseline and center: We ll return to these models to analyze this data. 3 Missing Values Suppose that some final surveys were missing not completed. What happens to these participants data in the fitting the adjusted model? What if patients with the worst side-effects to the experimental drug (D) dropped out and didn t complete the final survey? 4

3 Types of Missing Data Missing completely at random (MCAR): data are missing independently of both observed and unobserved data. Example: a participant flips a coin to decide whether to complete the depression survey. Missing at random (MAR): given the observed data, data are missing independently of unobserved data. Example: male participants are more likely to refuse to fill out the depression survey, but it does not depend on the level of their depression. 5 MCAR implies MAR, but not the other way round. Most methods assume MAR. We can ignore missing data ( = omit missing observations) if we have MAR or MCAR. Missing Not at Random (MNAR): missing observations related to values of unobserved data. Example: participants with severe depression, or side-effects from the medication, were more likely to be missing at end. Informative missingness: the fact that data is missing contains information about the response. Observed data is biased sample. Missing data cannot be ignored. 6

4 Cannot distinguish MAR from MNAR without additional information. SAS default is to omit cases with missing data = ignore missing data. With MNAR, you get a non-representative sample and biased estimates. References: Dmitrienko et. al. (2005) Analysis of Clinical Trials Using SAS, Chapter 5 R Little and D Rubin (2002) Statistical Analysis with Missing Data, Second Edition 7 Plan: 1. Delete observations from HAMD data to make an example of each type of missing data. 2. Discuss approaches to handling missing data. 3. Compare these approaches on our constructed examples from HAMD. 8

5 Make missing completely at random (MCAR) example MCAR: data are missing independently of both observed and unobserved data. Example: participant flips a coin to decide whether to complete final survey. Randomly select 30% of the observations in HAMD, set to missing. data MCAR; set ph6470.hamd2; missing = 0; if (ranuni(457392) <.3) then do; select 30% random sample final =. ; missing=1; label missing values end; 9 MCAR example, first 10 observations. Obs missing baseline final drug center D D D D D D D D D D 1 10

6 proc freq data=mcar; tables missing; missing Frequency Percent Frequency Percent What percent are actually missing? 11 Missing at random (MAR) example Missing at random (MAR): given the observed data, data are missing independently of unobserved data. Example: male participants more likely to refuse to fill out final survey, independent of their level of their depression. Data does not include gender. Missing values related to observed data: only at centers 1, 2, and 3. Need to get º 33 missing cases. Centers 1, 2, 3 together have 64/100 patients in study. What proportion p should be missing? p 64 = 33 gives x =

7 data MAR; set ph6470.hamd2; missing = 0; if (ranuni(457392) <.516 and center IN (1, 2, 3)) then do; final =. ; missing=1; end; proc freq data=mar; tables missing; Cumulative Cumulative missing Frequency Percent Frequency Percent Adjusting the cutoff for the uniform random number gives: data MAR; set ph6470.hamd2; missing = 0; if (ranuni(457392) <.435 final =. ; missing=1; end; and center IN (1, 2, 3)) then do; This produces 34 missing values, nearly the same number as the MCAR example. 14

8 MAR example, first 10 observations. Obs missing baseline final change drug center D D D D D D D D D D 1 15 Missing not at random (MNAR) example MNAR: missing observations related to values of unobserved data. Example: participants with most severe depression were less likely to complete final HAMD survey. Identify high final values. Randomly select 33 among these to delete want same amount of missing data as other examples. How do we identify top 50% of baseline values? 16

9 Proc univariate data=ph6470.hamd2; var final; Quantile Estimate 100% Max % % % % Q % Median % Q % 4.0 5% 2.0 1% 1.0 0% Min What proportion do we remove? p 50 = 33 gives p =.66 data MNAR; set ph6470.hamd2; missing=0; if ( final GE 14.5 final =. ; missing=1; end; proc freq data=mnar; tables missing; and ranuni(884739) <.66 ) then do; This gives only 30 missing values, and we want 33 or 34. What do we adjust to get a few more missing values? 18

10 Trial and error leads to: data MNAR; set ph6470.hamd2; missing=0; if (final GE 14.5 and ranuni(884739) <.69 ) then do; final =. ; missing=1; end; which gives 33 missing values. 19 MNAR example, first 10 observations: Obs missing baseline final change drug center D D D D D D D D D D 1 20

11 Review the plan: 1. Delete observations from HAMD data to make an example of each type of missing data: MCAR, MAR, MNAR. All data sets have 33% missing data. 2. Overview: approaches to handling missing data. 3. Compare these approaches on our constructed examples from HAMD. Results will depend on type of missingness, not amount of missing data. 21 Common methods for MAR data MAR property: missing-ness related only to observed data, not the missing data. 1. Complete case analysis. Omit observations missing any part of the data. SAS default for many procedures. Requires MCAR to be unbiased. 22

12 2. Last observation carried forward (LOCF). Longitudinal data collection where early measurements are not missing but final measurements are missing. Use each subject s last non-missing measurement to fill in later missing values. Reduces apparent change in response. Requires strong assumptions about response; does not account for uncertainty of missing data. Better approach: use Proc Mixed which can handle missing values in longitudinal data Imputation. This means filling in each missing value with a guess. Many ways to impute, such as: Use mean of individual s other values. Replace missing value in a group with group mean. Predict missing values in a variable V from regression of V on other variables. Requires strong assumptions about response; does not account for uncertainty of missing data. 24

13 4. Multiple imputation: (a) Impute observations for all missing values in a variable V : use random samples from normal distribution with mean and SD of V. (Or use regression to predict mean and SD, then sample from this normal distribution.) (b) Do the imputation M times, creating M complete data sets. (c) Analyze each of the M complete data sets. (d) Combine the results of the M analyses to draw conclusions. Requires MAR to be unbiased. Partially accounts for uncertainty of missing data. 25 Compare these approaches on our constructed missing-data examples from HAMD Estimate treatment means, test treatment center interaction from full data, and constructed examples of MCAR, MAR, and MNAR. For MCAR, MAR, and MNAR, apply 1. complete case analysis 2. last observation carried forward (LOCF) 3. multiple imputation 26

14 Full data analysis Test interaction between treatments and centers: Proc GLM data=ph6470.hamd2; class drug center; model final = baseline drug center drug*center; Estimate treatment means using main-effect model: Proc GLM data=ph6470.hamd2; class drug center; model final = baseline drug center; LSmeans drug / stderr; 27 Source DF Type III SS Mean Square F Value Pr > F baseline <.0001 drug <.0001 center drug*center

15 From the main-effects model (parallel lines), which assumes no interaction: Standard Parameter Estimate Error t Value Pr > t Intercept B baseline <.0001 drug D B <.0001 drug P B... center B center B center B center B center B... Least Squares Means H0:LSMean1= Standard H0:LSMEAN=0 LSMean2 drug final LSMEAN Error Pr > t Pr > t D <.0001 <.0001 P <.0001 Where is estimate of treatment difference? 29 Higher scores on HAMD mean worse depression. Drug effect shown by higher final scores with placebo than with drug. Interaction Drug Effect Drug Effect Data Method P-value (Pbo Drug) ± SE P-value Full ± 1 <.0001 MCAR MAR MNAR 30

16 Complete Case (CC) Proc GLM omits any observations with missing values for the response or any predictors in the model or class statement. Apply interaction and main-effects Proc GLM to MCAR, MAR, MNAR data sets. MCAR complete case The GLM Procedure Class Level Information Class Levels Values drug 2 D P center Number of Observations Read 100 Number of Observations Used Source DF Type III SS Mean Square F Value Pr > F baseline <.0001 drug <.0001 center drug*center Standard Parameter Estimate Error t Value Pr > t Intercept B baseline <.0001 drug D B <.0001 drug P B... center B center B center B center B center B... H0:LSMean1= Standard H0:LSMEAN=0 LSMean2 drug final LSMEAN Error Pr > t Pr > t D <.0001 <.0001 P <.0001 Repeat this analysis with MAR and MNAR examples. 32

17 Interaction Drug Effect Drug Effect Data Method P-value (Pbo Drug) ± SE P-value Full ± 1 <.0001 MCAR CC ± 1 <.0001 MAR CC ± 1 <.0001 MNAR CC ± 1 < Last observation carried forward (LOCF) Fill in the missing final values with baseline in a data step. data MCAR_lcf; set MCAR; final_lcf =final; create a new response variable if final=. then final_lcf=baseline; fill in missing with baseline data MAR_lcf; set MAR; final_lcf=final; if final=. then final_lcf=baseline; data MNAR_lcf; set MNAR; final_lcf=final; if final=. then final_lcf=baseline; 34

18 MCAR last value carried forward The GLM Procedure Class Levels Values drug 2 D P center Dependent Variable: final_lcf Number of Observations Read 100 Number of Observations Used 100 The GLM Procedure No missing data now, because we have filled all the holes. 35 Source DF Type III SS Mean Square F Value Pr > F baseline <.0001 drug center drug*center Standard Parameter Estimate Error t Value Pr > t Intercept B baseline <.0001 drug D B drug P B... center B center B center B center B center B... H0:LSMean1= final_lcf Standard H0:LSMEAN=0 LSMean2 drug LSMEAN Error Pr > t Pr > t D < P <

19 Repeating this analysis with MAR and MNAR examples gives: Interaction Drug Effect Drug Effect Data Method P-value (Pbo Drug) ± SE P-value Full ± 1 <.0001 MCAR CC ± 1 <.0001 LOCF ± MAR CC ± 1 <.0001 LOCF ± MNAR CC ± 1 <.0001 LOCF ± 2 < Multiple Imputation: Proc MI + Proc MIanalyze We want to estimate a parameter µ (eg. adjusted mean or regression coefficient) from data with missing values. 1. Proc MI For each missing value Y i, generate M estimates y im, m = 1,..., M using the distribution of observed values. Use MAR property: missingness related only to observed data. Fill in missing values in the data using each set {y im }, to produce M complete data sets. 2. Fit a model to each of the M complete data sets to get a parameter estimate ˆµ m with variance V m (squared standard error). 38

20 3. Proc MIanalyze Combine the results of the M analyses. Combined estimate of µ is the average of the M estimates { ˆµ m }: µ M = 1 M MX ˆµ m. Variance of this estimate comes from the within-imputation variance, estimated by the mean V M of the variances {V m }, and the between-imputation variance and so its standard error is: B M = SE( µ M ) = 1 1 MX ( ˆµ m µ M ) 2, M 1 r 1 V M + M + 1 M B M. Little & Rubin (2002) Statistical Analysis with Missing Data, Second Edition 39 For Depression Study example, imputation code will have 3 steps: 1. Proc MI generates M complete data sets, indexed by _Imputation_ 2. Proc GLM fits the model, BY _Imputation_, and outputs the results as a dataset (use ODS close listing to prevent writing them to the output window) 3. Proc MIanalyze reads output dataset, makes combined estimate µ M and SE( µ M ) An additional problem is that drug and center are CLASS variables and MIanalyze has problems with these. Need to add these indicators to data. 40

21 Make indicators for CLASS variables in MCAR, MAR, and MNAR data sets: data ph6470.hamd_mcar; set mar; drugd = (drug="d"); logical variables to make indicators center1=(center=1); center2=(center=2); center3=(center=3); center4=(center=4); drugcenter_1 = drugd * center1; drugcenter_2 = drugd * center2; drugcenter_3 = drugd * center3; drugcenter_4 = drugd * center4; 41 Multiple Imputation SAS code Step 1. Make 20 complete datasets using imputation Proc MI data=ph6470.hamd_mcar out=c output data set nimpute=20 number of filled-in datasets seed= minimum= 0 maximum= 40 reject values outside 0-40, range of HAMD round=1.0; round to integer var final; variables to fill in 42

22 The MI Procedure Model Information Data Set PH6470.HAMD_MCAR Method MCMC Multiple Imputation Chain Single Chain Initial Estimates for MCMC EM Posterior Mode Start Starting Value Prior Jeffreys Number of Imputations 20 Number of Burn-in Iterations 200 Number of Iterations 100 Seed for random number generator Missing Data Patterns Group Means Group baseline final Freq Percent baseline final 1 X X X Step 2. Fit model in Proc GLM to each of the 20 imputed datasets. Write results to output datasets see examples in Help Documentation for Proc MIanalyze. ODS listing close; Proc GLM data=c; model final = baseline drugd center1 center2 center3 center4 drugcenter_1 drugcenter_2 drugcenter_3 drugcenter_4 / inverse solution; by _Imputation_; ODS output ParameterEstimates=glmparms InvXPX=glmxpxi; run; ODS listing; 44

23 Step 3. Combine estimates. Proc MIanalyze parms=glmparms xpxi=glmxpxi ; modeleffects Intercept baseline drugd center1 center2 center3 center4 drugcenter_1 drugcenter_2 drugcenter_3 drugcenter_4; Very difficult to figure out what output should be passed from procedures (step 2) to Proc MIanalyze. Follow examples given in documentation for MIanalyze or use Google to look for examples. 45 MIanalyze: interaction model The MIANALYZE Procedure Parameter Estimates Parameter Estimate Std Error 95% Confidence Limits DF drugd center center center center drugcenter_ drugcenter_ drugcenter_ drugcenter_ t for H0: Parameter Theta0 Parameter=Theta0 Pr > t drugcenter_ drugcenter_ drugcenter_ drugcenter_ Interaction significant? 46

24 From main-effects model: The MIANALYZE Procedure Parameter Estimates Parameter Estimate Std Error 95% Confidence Limits DF Intercept baseline drugd center center center center t for H0: Parameter Theta0 Parameter=Theta0 Pr > t Intercept baseline <.0001 drugd Interaction Drug Effect Drug Effect Data Method P-value (Pbo Drug) ± SE P-value Full ± 1 <.0001 MCAR CC ± 1 <.0001 LOCF ± MI NS 4.5 ± MAR CC ± 1 <.0001 LOCF ± MI NS 4.8 ± MNAR CC ± 1 <.0001 LOCF ± 2 <.0001 MI NS 3.7 ± Imputing values when data are not missing at random can lead to severe bias. 48

25 Difficult questions with missing data imputation: 1. Do you have missing at random? How do you know? 2. How do you choose an imputation method? How can you use what you know to improve the process of imputation? References: Dmitrienko et. al. (2005) Analysis of Clinical Trials Using SAS, Chapter 5 R Little and D Rubin (2002) Statistical Analysis with Missing Data, Second Edition Lachin JM. Worst-rank score analysis with informatively missing observations in clinical trials. Control Clin Trials 1999; 20:

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Sensitivity Analysis in Multiple Imputation for Missing Data

Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### Implementation of Pattern-Mixture Models Using Standard SAS/STAT Procedures

PharmaSUG2011 - Paper SP04 Implementation of Pattern-Mixture Models Using Standard SAS/STAT Procedures Bohdana Ratitch, Quintiles, Montreal, Quebec, Canada Michael O Kelly, Quintiles, Dublin, Ireland ABSTRACT

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### Dealing with Missing Data

Res. Lett. Inf. Math. Sci. (2002) 3, 153-160 Available online at http://www.massey.ac.nz/~wwiims/research/letters/ Dealing with Missing Data Judi Scheffer I.I.M.S. Quad A, Massey University, P.O. Box 102904

### Missing Data Dr Eleni Matechou

1 Statistical Methods Principles Missing Data Dr Eleni Matechou matechou@stats.ox.ac.uk References: R.J.A. Little and D.B. Rubin 2nd edition Statistical Analysis with Missing Data J.L. Schafer and J.W.

### Discrete with many values are often treated as continuous: Minnesota math scores, years of education.

Lecture 9 1. Continuous and categorical predictors 2. Indicator variables 3. General linear model: child s IQ example 4. Proc GLM and Proc Reg 5. ANOVA example: Rat diets 6. Factorial design: 2 2 7. Interactions

### Imputation and Analysis. Peter Fayers

Missing Data in Palliative Care Research Imputation and Analysis Peter Fayers Department of Public Health University of Aberdeen NTNU Det medisinske fakultet Missing data Missing data is a major problem

### Analyzing Structural Equation Models With Missing Data

Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.

### Missing Data Sensitivity Analysis of a Continuous Endpoint An Example from a Recent Submission

Missing Data Sensitivity Analysis of a Continuous Endpoint An Example from a Recent Submission Arno Fritsch Clinical Statistics Europe, Bayer November 21, 2014 ASA NJ Chapter / Bayer Workshop, Whippany

### A Method of Using Multiple Imputation in Clinical Data Analysis

A Method of Using Multiple Imputation in Clinical Data Analysis Christina M. Gullion, Center for Health Research, Portland, OR Chuhe Chen, Center for Health Research, Portland, OR Gayle T. Meltesen, Center

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### Chapter 10 The MIANALYZE Procedure

Chapter 10 The MIANALYZE Procedure Chapter Table of Contents OVERVIEW...203 GETTING STARTED...203 SYNTAX...206 PROCMIANALYZEStatement...207 BYStatement...208 VARStatement...209 DETAILS...209 Input Data

### Modern Methods for Missing Data

Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

### Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple

### Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out

Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out Sandra Taylor, Ph.D. IDDRC BBRD Core 23 April 2014 Objectives Baseline Adjustment Introduce approaches Guidance

### Missing Data. Paul D. Allison INTRODUCTION

4 Missing Data Paul D. Allison INTRODUCTION Missing data are ubiquitous in psychological research. By missing data, I mean data that are missing for some (but not all) variables and for some (but not all)

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### UNTIED WORST-RANK SCORE ANALYSIS

Paper PO16 UNTIED WORST-RANK SCORE ANALYSIS William F. McCarthy, Maryland Medical Research Institute, Baltime, MD Nan Guo, Maryland Medical Research Institute, Baltime, MD ABSTRACT When the non-fatal outcome

### Dealing with missing data: Key assumptions and methods for applied analysis

Technical Report No. 4 May 6, 2013 Dealing with missing data: Key assumptions and methods for applied analysis Marina Soley-Bori msoley@bu.edu This paper was published in fulfillment of the requirements

### Dealing with Missing Data for Credit Scoring

ABSTRACT Paper SD-08 Dealing with Missing Data for Credit Scoring Steve Fleming, Clarity Services Inc. How well can statistical adjustments account for missing data in the development of credit scores?

### Multiple Imputation for Missing Data: Concepts and New Development (Version 9.0)

Multiple Imputation for Missing Data: Concepts and New Development (Version 9.0) Yang C. Yuan, SAS Institute Inc., Rockville, MD Abstract Multiple imputation provides a useful strategy for dealing with

### Missing Data. Katyn & Elena

Missing Data Katyn & Elena What to do with Missing Data Standard is complete case analysis/listwise dele;on ie. Delete cases with missing data so only complete cases are le> Two other popular op;ons: Mul;ple

### 1. Proc GLM: Residual plots, interaction plots. 2. Proc GLM: means and LSmeans. 3. Dates and ages. 4. Calculating differences with arrays

Lecture 6 1. Proc GLM: Residual plots, interaction plots 2. Proc GLM: means and LSmeans 3. Dates and ages 4. Calculating differences with arrays 1 ANOVA Example: Multicenter Depression Trial (source: Dmitrienko

### A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

123 Kwantitatieve Methoden (1999), 62, 123-138. A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA Joop J. Hox 1 ABSTRACT. When we deal with a large data set with missing data, we have to undertake

### Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set

Imputing Attendance Data in a Longitudinal Multilevel Panel Data Set April 2015 SHORT REPORT Baby FACES 2009 This page is left blank for double-sided printing. Imputing Attendance Data in a Longitudinal

### Let s explore SAS Proc T-Test

Let s explore SAS Proc T-Test Ana Yankovsky Research Statistical Analyst Screening Programs, AHS Ana.Yankovsky@albertahealthservices.ca Goals of the presentation: 1. Look at the structure of Proc TTEST;

### Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know

### Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk

### Dealing with Missing Data

Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### IBM SPSS Missing Values 22

IBM SPSS Missing Values 22 Note Before using this information and the product it supports, read the information in Notices on page 23. Product Information This edition applies to version 22, release 0,

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

### Lecture Outline (week 13)

Lecture Outline (week 3) Analysis of Covariance in Randomized studies Mixed models: Randomized block models Repeated Measures models Pretest-posttest models Analysis of Covariance in Randomized studies

### HANDLING DROPOUT AND WITHDRAWAL IN LONGITUDINAL CLINICAL TRIALS

HANDLING DROPOUT AND WITHDRAWAL IN LONGITUDINAL CLINICAL TRIALS Mike Kenward London School of Hygiene and Tropical Medicine Acknowledgements to James Carpenter (LSHTM) Geert Molenberghs (Universities of

### Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido s Guide to PROC MEANS A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC MEANS is a basic procedure within BASE SAS

### Bayesian Approaches to Handling Missing Data

Bayesian Approaches to Handling Missing Data Nicky Best and Alexina Mason BIAS Short Course, Jan 30, 2012 Lecture 1. Introduction to Missing Data Bayesian Missing Data Course (Lecture 1) Introduction to

### Dr James Roger. GlaxoSmithKline & London School of Hygiene and Tropical Medicine.

American Statistical Association Biopharm Section Monthly Webinar Series: Sensitivity analyses that address missing data issues in Longitudinal studies for regulatory submission. Dr James Roger. GlaxoSmithKline

### Dongfeng Li. Autumn 2010

Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

### Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

### How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### ECON Introductory Econometrics. Lecture 17: Experiments

ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

### EXST SAS Lab Lab #9: Two-sample t-tests

EXST700X Lab Spring 014 EXST SAS Lab Lab #9: Two-sample t-tests Objectives 1. Input a CSV file (data set #1) and do a one-tailed two-sample t-test. Input a TXT file (data set #) and do a two-tailed two-sample

### Using Medical Research Data to Motivate Methodology Development among Undergraduates in SIBS Pittsburgh

Using Medical Research Data to Motivate Methodology Development among Undergraduates in SIBS Pittsburgh Megan Marron and Abdus Wahed Graduate School of Public Health Outline My Experience Motivation for

### Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey

Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey MRC Biostatistics Unit Institute of Public Health Forvie Site Robinson Way Cambridge

### Paper PO06. Randomization in Clinical Trial Studies

Paper PO06 Randomization in Clinical Trial Studies David Shen, WCI, Inc. Zaizai Lu, AstraZeneca Pharmaceuticals ABSTRACT Randomization is of central importance in clinical trials. It prevents selection

### Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

### Imputation of missing data under missing not at random assumption & sensitivity analysis

Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Regression in ANOVA 1 Introduction 2 Basic Linear

### An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA

ABSTRACT An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA Often SAS Programmers find themselves in situations where performing

### Missing Data Techniques for Structural Equation Modeling

Journal of Abnormal Psychology Copyright 2003 by the American Psychological Association, Inc. 2003, Vol. 112, No. 4, 545 557 0021-843X/03/\$12.00 DOI: 10.1037/0021-843X.112.4.545 Missing Data Techniques

### PHP 2510 Central limit theorem, confidence intervals. PHP 2510 October 20,

PHP 2510 Central limit theorem, confidence intervals PHP 2510 October 20, 2008 1 Distribution of the sample mean Case 1: Population distribution is normal For an individual in the population, X i N(µ,

### Paired Differences and Regression

Paired Differences and Regression Students sometimes have difficulty distinguishing between paired data and independent samples when comparing two means. One can return to this topic after covering simple

### Analysis of Longitudinal Data with Missing Values.

Analysis of Longitudinal Data with Missing Values. Methods and Applications in Medical Statistics. Ingrid Garli Dragset Master of Science in Physics and Mathematics Submission date: June 2009 Supervisor:

### Examining a Fitted Logistic Model

STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

### EXST SAS Lab Lab #7: Hypothesis testing with Paired t-tests and One-tailed t-tests

EXST SAS Lab Lab #7: Hypothesis testing with Paired t-tests and One-tailed t-tests Objectives 1. Infile two external data sets (TXT files) 2. Calculate a difference between two variables in the data step

### Can Annuity Purchase Intentions Be Influenced?

Can Annuity Purchase Intentions Be Influenced? Jodi DiCenzo, CFA, CPA Behavioral Research Associates, LLC Suzanne Shu, Ph.D. UCLA Anderson School of Management Liat Hadar, Ph.D. The Arison School of Business,

### Factor B: Curriculum New Math Control Curriculum (B (B 1 ) Overall Mean (marginal) Females (A 1 ) Factor A: Gender Males (A 2) X 21

1 Factorial ANOVA The ANOVA designs we have dealt with up to this point, known as simple ANOVA or oneway ANOVA, had only one independent grouping variable or factor. However, oftentimes a researcher has

### Missing Data: Patterns, Mechanisms & Prevention. Edith de Leeuw

Missing Data: Patterns, Mechanisms & Prevention Edith de Leeuw Thema middag Nonresponse en Missing Data, Universiteit Groningen, 30 Maart 2006 Item-Nonresponse Pattern General pattern: various variables

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

Methods Report A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values Hrishikesh Chakraborty and Hong Gu March 9 RTI Press About the Author Hrishikesh Chakraborty,

### 1. Longitudinal data: negative correlation within-person example

Lecture 17 1. Longitudinal data: negative correlation within-person example 2. Mixed model example: assay plates 1 Negative correlation within-person Study to compare bioimpedance methods of estimating

### In almost any research you perform, there is the potential for missing or

SIX DEALING WITH MISSING OR INCOMPLETE DATA Debunking the Myth of Emptiness In almost any research you perform, there is the potential for missing or incomplete data. Missing data can occur for many reasons:

### 4 Other useful features on the course web page. 5 Accessing SAS

1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### Permuted-block randomization with varying block sizes using SAS Proc Plan Lei Li, RTI International, RTP, North Carolina

Paper PO-21 Permuted-block randomization with varying block sizes using SAS Proc Plan Lei Li, RTI International, RTP, North Carolina ABSTRACT Permuted-block randomization with varying block sizes using

### Missing Data in Longitudinal Studies

Missing Data in Longitudinal Studies Hedeker D & Gibbons RD (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2, 64-78. Chapter

### EXPECTED MEAN SQUARES AND MIXED MODEL ANALYSES. This will become more important later in the course when we discuss interactions.

EXPECTED MEN SQURES ND MIXED MODEL NLYSES Fixed vs. Random Effects The choice of labeling a factor as a fixed or random effect will affect how you will make the F-test. This will become more important

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

### Craig K. Enders Arizona State University Department of Psychology craig.enders@asu.edu

Craig K. Enders Arizona State University Department of Psychology craig.enders@asu.edu Topic Page Missing Data Patterns And Missing Data Mechanisms 1 Traditional Missing Data Techniques 7 Maximum Likelihood

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### 12-1 Multiple Linear Regression Models

12-1.1 Introduction Many applications of regression analysis involve situations in which there are more than one regressor variable. A regression model that contains more than one regressor variable is

### New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

### Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Assignments Analysis of Longitudinal data: a multilevel approach

Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans

### Missing Data Part 1: Overview, Traditional Methods Page 1

Missing Data Part 1: Overview, Traditional Methods Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 17, 2015 This discussion borrows heavily from: Applied

### PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,

PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data

### Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia

Statistical Analysis The First Steps Jennifer L. Waller Medical College of Georgia, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what data look like before more rigorous

### Bridging Statistical Analysis Plan and ADaM Datasets and Metadata for Submission

, October 24-26, 2012, San Francisco, USA Bridging Statistical Analysis Plan and ADaM Datasets and Metadata for Submission Abstract In this article, the relationship between the Statistical Analysis Plan

### The general form of the PROC MEANS statement is

Describing Your Data Using PROC MEANS PROC MEANS can be used to compute various univariate descriptive statistics for specified variables including the number of observations, mean, standard deviation,

### Data Cleaning and Missing Data Analysis

Data Cleaning and Missing Data Analysis Dan Merson vagabond@psu.edu India McHale imm120@psu.edu April 13, 2010 Overview Introduction to SACS What do we mean by Data Cleaning and why do we do it? The SACS

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore

### How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power

How To Use A Monte Carlo Study To Decide On Sample Size and Determine Power Linda K. Muthén Muthén & Muthén 11965 Venice Blvd., Suite 407 Los Angeles, CA 90066 Telephone: (310) 391-9971 Fax: (310) 391-8971

### Generating Randomization Schedules Using SAS Programming Chunqin Deng and Julia Graz, PPD, Inc., Research Triangle Park, North Carolina

Paper 267-27 Generating Randomization Schedules Using SAS Programming Chunqin Deng and Julia Graz, PPD, Inc., Research Triangle Park, North Carolina ABSTRACT Randomization as a method of experimental control

### How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia

Paper HW-05 How to Perform and Interpret Chi-Square and T-Tests Jennifer L. Waller, Georgia Health Sciences University, Augusta, Georgia ABSTRACT For both statisticians and non-statisticians, knowing what

### Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

[Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### IBM SPSS Missing Values 20

IBM SPSS Missing Values 20 Note: Before using this information and the product it supports, read the general information under Notices on p. 87. This edition applies to IBM SPSS Statistics 20 and to all