# Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Save this PDF as:

Size: px
Start display at page:

Download "Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models"

## Transcription

1 Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear Models Generalized Estimating Equations (GEE) 4 Transition Models 5 Further Topics Time-Varying Covariates Other Topics

2 Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear Models Generalized Estimating Equations (GEE) 4 Transition Models 5 Further Topics Time-Varying Covariates Other Topics

3 Time-Varying Covariates Types of Covariates There are two types of covariates: Time-stationary covariates: For example gender race treatment (if fixed for the whole study period) study center... Time-varying covariates: For example age dietary intake bloodmarker air pollution treatment in certain studies (crossover studies, observational studies)... Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

4 Time-Varying Covariates Types of Time-Varying Covariates There are also two types of time-varying covariates: Fixed by study design: For example treatment in a crossover study time since baseline (when measurement times fixed by study design) Stochastic covariates, which vary randomly over time: For example blood pressure dietary intake air pollution exposure blood marker When a covariate is time-varying and stochastic, we may encounter new issues in its interpretation as well as in estimation of regression parameters. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

5 Time-Varying Covariates Time-Varying Covariates In our models, we assumed a relationship for the mean g(e(y ij X ij )) = X ijβ for a known link function g( ). Implicitly, we are assuming that the conditional mean of Y ij given X i1,..., X ini only depends on X ij, E(Y ij X i ) = E(Y ij X i1,..., X ini ) = E(Y ij X ij ). This is of course true for time-invariant variables. It may not hold for time-varying stochastic covariates, however. In this case, preceding or subsequent values of the time-varying covariate can confound the relationship between Y ij and X ij, and can lead to biased estimates of parameters of interest. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

6 Time-Varying Covariates Time-Varying Covariates Example: Longitudinal study on the effects of physical activity in reducing blood glucose levels in patients with type 2 diabetes. If patients with high blood glucose levels at one visit increase their physical activity subsequently (feeling guilty! - feedback mechanism), while other patients do not, we will underestimate the relationship between physical activity and blood glucose level. Here: the current value of Y ij, given X ij, predicts X ij+1, and thus E(Y ij X i1,..., X ini ) E(Y ij X ij ). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

7 Time-Varying Covariates External Covariates A covariate is called exogenous or external when [X i,j+1 X i1,..., X ij, Y i1,..., Y ij ] = [X i,j+1 X i1,..., X ij ]. Otherwise, the covariate is called internal or endogenous. An example of an external covariate is ambient air pollution. While ambient air pollution is time-varying and stochastic, future values conditional on past values are not predicted by the health outcome studied. However, personal exposure to air pollution would not be an external covariate if study subjects with poor health outcomes decided to stay indoors to avoid exposure to high levels of air pollution. For an external covariate, E(Y ij X i ) = E(Y ij X i1,..., X ini ) = E(Y ij X i1,..., X ij ). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

8 Time-Varying Covariates External Covariates When variables are external, we can focus on specifying a model for [Y ij X i1,..., X ij ]. Possible models include concurrent, model E(Y ij X ij ) lagged, model E(Y ij X i,j k ) for some k j cumulative, model E(Y ij X ik ) k=1 distributed lags, regression coefficients for X ij,..., X i,j k follow some pre-specified structure (e.g. polynomial) However, be aware that e.g. modeling E(Y ij X ij ), while Y ij depends on both X ij and X i,j 1, can still give misleading results. When variables are internal, we have to think both about meaningful targets of inference and valid methods of inference. Methods include causal inference, and modeling of the joint process {Y j, X j }. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

9 Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear Models Generalized Estimating Equations (GEE) 4 Transition Models 5 Further Topics Time-Varying Covariates Other Topics

10 Missing data in longitudinal studies is a very common phenomenon. Data is missing if a measurement that was intended to be taken is not taken, or not available for another reason. An important question always is why the measurements are missing. For example The lab assistant accidentally destroyed the sample to be measured. We probably would not be worried and would simply analyze the available measurements. The values are missing because they were below the limit of detection (censored). In this case, leaving out the missing values could mask an important effect. The values are missing because the subjects did not show up for their scheduled visits. In this case, we might worry about the tendency of sicker subjects with higher/lower values to have missing data. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

11 Missing data patterns: Dropout / loss-to-follow-up: Whenever Y ij is missing, so are Y ik for all k j. Intermittent missing values Missing data mechanisms: (Rubin, Biometrika, 1976) Let Y indicate the complete data, with Y obs the observed part and Y mis the missing part of Y. Define the missing data indicator R ij = 1 if Y ij is missing, and = 0 otherwise. (Note: sometimes R is defined the other way around.) Missing completely at random (MCAR): p(r Y) = p(r), so that the observed data are a completely random sample of the complete data Missing at random (MAR): p(r Y) = p(r Y obs ), so that the missing data mechanism does not depend on the actual missing values Not missing at random (NMAR): p(r Y) depends on Y mis, so that whether or not an observation is observed depends on the quantities that we were not able to observe. (Also called informative missingness.) Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

12 Intermittent: Censoring: e.g. all values below a threshold (limit of detection, for example) are missing. In this case, the EM algorithm is a possibility (Dempster, Laird & Rubin, JRSS-B, 1977). Other cases: For intermittent missing values, the reason is often known, as subjects remain in the study. One can then find out whether an MCAR or MAR assumption is reasonable. Dropouts: For dropout, we usually have to suspect a relation between the dropout and the measurement process. Examples: Ethical considerations require a patient to be withdrawn from the study if their condition is not adequately controlled by the treatment they are on. MAR Patients stop showing up for visits because they are feeling too sick, which would be reflected in their measured Y value if it could be obtained. NMAR Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

13 Simple commonly used strategies to deal with dropout 1 Last observation carried forward: In the simplest case, if y ij is the last observed value, y ik is set to y ij for all k j. A refinement is to fit a time-trend and to extrapolate that. 2 Complete case analysis: Non-completers are completely deleted. This is wasteful of data and has the potential to introduce bias. Neither can be recommended in general. Testing for MCAR There are strategies to test for complete randomness of dropout by comparing at each time point the history of y ij values of people about to drop out and those not dropping out. See e.g. Diggle, Biometrics, Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

14 Likelihood-Based Inference and For likelihood-based inference, it is most important to distinguish between MCAR/MAR on the one hand, and NMAR on the other hand. The joint probability density function of (Y obs, Y mis, R) is f (y obs, y mis, r) = f (y obs, y mis )f (r y obs, y mis ). The joint pdf of the observable data then is f (y obs, r) = f (y obs, y mis )f (r y obs, y mis )dy mis MAR = f (y obs, y mis )dy mis f (r y obs ) The log-likelihood then is = f (y obs )f (r y obs ). log L = log f (y obs ) + log f (r y obs ). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

15 Likelihood-Based Inference and The log-likelihood is log L = log f (y obs ) + log f (r y obs ). It is maximized by maximizing the two terms separately. Since the second term contains no information about [Y obs ], we can ignore it for inference on [Y obs ]. Thus, MCAR/MAR are sometimes jointly referred to as ignorable missingness. However, ignorability depends on the likelihood being the basis for inference. GEE is only valid under the stronger assumption of MCAR. if log f (y obs ) and log f (r y obs ) actually share parameters, ignoring log f (r y obs ) will result in a loss of efficiency. this approach assumes that the distribution [Y obs ] is the target of inference. Example: A clinical trial for treatment of a potentially life-threatening disease. Missingness is a result of patients death. It might be more meaningful to draw inference about the distribution of the survival time and the conditional distribution of Y obs given survival, than the unconditional distribution [Y obs ]. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

16 GEE and Advantage of GEE is that if interest is in the mean, GEE provides consistent inference if the mean model is correctly specified, even for misspecified variance matrix, and without distributional assumption. However, GEE assumes MCAR observations, otherwise we lose consistency. The basic GEE was S β (β, α) = I i=1 ( ) µi W 1 i (Y i µ β i ) = 0. Robins et al. (JASA, 1995) propose a weighted GEE for MAR, where they assume that an observed measurement is representative of missing observations from subjects with the same history y i1,..., y ij 1 and covariate values. Measurements with small probabilities are then upweighted (inverse probability weighting). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

17 GEE and The weighted GEE then is S β (β, α) = I i=1 ( ) µi W 1 i P 1 i (Y i µ β i ) = 0, where P i is a diagonal matrix containing the probabilities p ij that subject i has not dropped out by time t ij, given the history y ij 1,..., y i1 and covariates. This method requires that we can consistently estimate the p ij, and is thus better suited for larger studies. Also, it requires a parametric model for the p ij (while there is sparse information for the dropout process), in a setting where a parametric model for the covariance structure is avoided. Scharfstein et al (JASA, 1999) propose an extension to non-ignorable dropout (NMAR), giving a range of inferences based on different parameters for informative dropout processes. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

18 Modeling the Dropout Process There are also approaches to jointly model the dropout process D and the intended measurements Y. Let D be the dropout time, Y = (Y 1,..., Y n ) the intended measurements, and Y = (Y 1,..., Y n ) the observed measurements (can be NA). Selection models factorize [Y, D] = [Y ][D Y ]. Pattern mixture models factorize [Y, D] = [D][Y D]. Random effects models assume an underlying bivariate random effect U = (U 1, U 2 ) and factorize [Y, D, U] = [Y U 1 ][D U 2 ][U]. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

19 Modeling the Dropout Process Models for informative dropout have identifiability issues, and randomness (MAR) of dropout cannot be verified from the data. Thus, analyses of longitudinal data with dropout has always to proceed with caution. However, addressing the dropout problem is often still more reasonable than to simply assume ignorability of dropout. Read more e.g. in Diggle et al (2002), or Molenberghs & Verbeke (2005). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

20 Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear Models Generalized Estimating Equations (GEE) 4 Transition Models 5 Further Topics Time-Varying Covariates Other Topics

21 Other Topics Joint Modeling of Longitudinal and Survival Data In many longitudinal studies, we have the following data structure I observational units (subjects, animals,... ). n i observations on the ith unit, i = 1,..., I. The observations Y ij are taken at time points t ij, t i1 < < t ini. We also observe times F i to an event (e.g. death). Joint modeling: [Y, F x, θ] Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

22 Other Topics Why joint modeling? Interest could be in the time-to-event variable F, but for heavy censoring, joint modeling could improve inference about marginal distribution [F ]. Example: Randomized clinical trial (RCT) for a treatment for liver cirrhosis. F = time of death, Y = prothombin measurements, 30% survival to 96 months Interest could be in Y, but informative dropout or death could lead to misleading conclusions. Example: RCT for three schizophrenia treatments, interest in reducing PANSS score Y, but drop-out rate is treatment-dependent. Interest could be in the relationship between Y and F. Example: RCT for two heart surgeries. Y = left-ventricular-mass-index (LVMI), F = time to death. Long-term build-up of LVMI may increase hazard for fatal heart-attack interested in modeling relationship between survival and subject-level LVMI Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

23 Other Topics Joint Models for Longitudinal and Survival Data One approach to jointly model longitudinal and survival data: Random effects model: mixed model for Y proportional hazards model for λ(t) with time-dependent frailty (random effect) for time-to-event F submodels linked through shared random effects. For example (Henderson, Diggle & Dobson, Biostatistics, 2000) Y j = x 1 (t j ) β 1 + W 1 (t j ) + Z t, Z t iid N(0, σ 2 ) λ(t) = λ 0 (t) exp{x 2 (t) β 2 + W 2 (t)} W 1 (t) = z 1 (t) U 1 + V 1 (t) and W 2 (t) = z 1 (t) U 1 + V 2 (t), where U 1 and U 2 are jointly multivariate Gaussian random effects, and the bivariate Gaussian latent stochastic process W (t) has non-zero cross-covariance γ 12 (u) = Cov{W 1 (t), W 2 (t u)}. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

24 Other Topics Multivariate Longitudinal Data Often the outcome Y measured in a longitudinal study is multivariate. Examples: several blood markers or symptom scores which reflect improvement in health, vector of voxel intensities in neuroimaging. One possible way to model a multivariate outcome (here linear model): Y ijk = x ijkβ k + ε ijk, k = 1,..., K, where Y ijk is the kth outcome for the ith subject at time t ij. We would then need to specify a model for Var(ε ij ), the variance matrix for the K outcomes for subject i at time t ij Cov(ε ij, ε ij ), the covariance matrix between times t ij and t ij for subject i to obtain the Kn i Kn i variance matrix V i for ε i. Inference can then proceed as it would in the univariate longitudinal case. Robust standard errors can account for potential misspecification of V. The challenge still is to find good approximations for V i, however, so that gains in efficiency from joint modeling of the K outcomes are not lost. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

25 Other Topics Multivariate Longitudinal Data Another strategy is to use a hierarchical mixed model with random effects for both subjects and outcome items. We could assume, for example, Y ijk = x ijkβ ik + ε ijk β ik = β + δ k + b i δ k iid N(0, Dδ ) independent of b i iid N(0, Db ). This reduces the number of parameters necessary to specify V dramatically and borrows strength across items. We can proceed with REML-based inference in this linear mixed model, and can use the BLUPs for β + δ k to look at the (population average) item specific coefficients. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

26 Other Topics Sample Size Calculations To determine the required sample size of a longitudinal study, we need Type I error rate α Smallest meaningful difference to be detected d Power P Measurement variation σ 2 Number of repeated observations per subject n Correlation among repeated observations Sample size formulas can then be derived, see e.g. Diggle et al (2002). Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

27 Other Topics Sample Size Calculations For example, consider continuous responses and a comparison of two groups, Y ij = β 0A + β 1A x ij + ε ij, j = 1,..., n, i = 1,..., I, and analogous with β 0B and β 1B for group B, where x ij = x j for all i, Var(ε ij ) = σ 2 and Corr(Y ij, Y ik ) = ρ, j k. Then, for fixed known n, the number of subjects in each group I needed is I = 2(z α + z 1 P ) 2 σ 2 (1 ρ) ns 2 x d 2, with d = β 1B β 1A, z p the pth quantile of the standard normal distribution, and sx 2 = 1 n j (x j x) 2 the within-subject variance of the x j. Advanced Longitudinal Data Analysis Johns Hopkins University, Spring

### Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

123 Kwantitatieve Methoden (1999), 62, 123-138. A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA Joop J. Hox 1 ABSTRACT. When we deal with a large data set with missing data, we have to undertake

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

Methods Report A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values Hrishikesh Chakraborty and Hong Gu March 9 RTI Press About the Author Hrishikesh Chakraborty,

### A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter

### Missing Data Dr Eleni Matechou

1 Statistical Methods Principles Missing Data Dr Eleni Matechou matechou@stats.ox.ac.uk References: R.J.A. Little and D.B. Rubin 2nd edition Statistical Analysis with Missing Data J.L. Schafer and J.W.

### Analysis of Longitudinal Data for Inference and Prediction. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data for Inference and Prediction Patrick J. Heagerty PhD Department of Biostatistics University of Washington 1 ISCB 2010 Outline of Short Course Inference How to handle stochastic

### PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,

PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data

### Analysis of Correlated Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Correlated Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Heagerty, 6 Course Outline Examples of longitudinal data Correlation and weighting Exploratory data

### An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

### Chapter 1. Longitudinal Data Analysis. 1.1 Introduction

Chapter 1 Longitudinal Data Analysis 1.1 Introduction One of the most common medical research designs is a pre-post study in which a single baseline health status measurement is obtained, an intervention

### Applied Missing Data Analysis in the Health Sciences. Statistics in Practice

Brochure More information from http://www.researchandmarkets.com/reports/2741464/ Applied Missing Data Analysis in the Health Sciences. Statistics in Practice Description: A modern and practical guide

### Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

[Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator

### Two Tools for the Analysis of Longitudinal Data: Motivations, Applications and Issues

Two Tools for the Analysis of Longitudinal Data: Motivations, Applications and Issues Vern Farewell Medical Research Council Biostatistics Unit, UK Flexible Models for Longitudinal and Survival Data Warwick,

### SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

### An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

### Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Dealing with Missing Data

Res. Lett. Inf. Math. Sci. (2002) 3, 153-160 Available online at http://www.massey.ac.nz/~wwiims/research/letters/ Dealing with Missing Data Judi Scheffer I.I.M.S. Quad A, Massey University, P.O. Box 102904

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out

Challenges in Longitudinal Data Analysis: Baseline Adjustment, Missing Data, and Drop-out Sandra Taylor, Ph.D. IDDRC BBRD Core 23 April 2014 Objectives Baseline Adjustment Introduce approaches Guidance

### Dealing with Missing Data

Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

### Longitudinal Data Analysis

Longitudinal Data Analysis Acknowledge: Professor Garrett Fitzmaurice INSTRUCTOR: Rino Bellocco Department of Statistics & Quantitative Methods University of Milano-Bicocca Department of Medical Epidemiology

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

### Sensitivity analysis of longitudinal binary data with non-monotone missing values

Biostatistics (2004), 5, 4,pp. 531 544 doi: 10.1093/biostatistics/kxh006 Sensitivity analysis of longitudinal binary data with non-monotone missing values PASCAL MININI Laboratoire GlaxoSmithKline, UnitéMéthodologie

### Missing data and net survival analysis Bernard Rachet

Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,

### Analysis of Longitudinal Data with Missing Values.

Analysis of Longitudinal Data with Missing Values. Methods and Applications in Medical Statistics. Ingrid Garli Dragset Master of Science in Physics and Mathematics Submission date: June 2009 Supervisor:

### How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

### BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS

SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS John Banasik, Jonathan Crook Credit Research Centre, University of Edinburgh Lyn Thomas University of Southampton ssm0 The Problem We wish to estimate an

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

### Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

### Missing Data. Paul D. Allison INTRODUCTION

4 Missing Data Paul D. Allison INTRODUCTION Missing data are ubiquitous in psychological research. By missing data, I mean data that are missing for some (but not all) variables and for some (but not all)

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### Missing Data in Longitudinal Studies

Outline Introduction Missing Mechanism Missing Data Pattern Simple Solutions and Limitations Testing for MCAR Weighted Estimating Equations Data Example Explore Dropout Mechanism 1 Introduction Missing

### Study Design and Statistical Analysis

Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

### Modern Methods for Missing Data

Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

### Statistical Rules of Thumb

Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

### Item Imputation Without Specifying Scale Structure

Original Article Item Imputation Without Specifying Scale Structure Stef van Buuren TNO Quality of Life, Leiden, The Netherlands University of Utrecht, The Netherlands Abstract. Imputation of incomplete

### School of Public Health and Health Services Department of Epidemiology and Biostatistics

School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated

### Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults

Journal of Data Science 7(2009), 469-485 Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults Liang Zhu, Jianguo Sun and Phillip Wood University of Missouri Abstract: Alcohol

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Parametric fractional imputation for missing data analysis

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in

### APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

### (Brief rest break when appealing) Please silence your cell phones and pagers. Invite brief questions of clarification.

The Ugly, the Bad, and the Good of Missing and Dropout Data in Analysis and Sample Size Selection Keith E. Muller Professor and Director of the Division of Biostatistics, Epidemiology and Health Policy

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### 2. Making example missing-value datasets: MCAR, MAR, and MNAR

Lecture 20 1. Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### Analysis of Microdata

Rainer Winkelmann Stefan Boes Analysis of Microdata With 38 Figures and 41 Tables 4y Springer Contents 1 Introduction 1 1.1 What Are Microdata? 1 1.2 Types of Microdata 4 1.2.1 Qualitative Data 4 1.2.2

### Primary Prevention of Cardiovascular Disease with a Mediterranean diet

Primary Prevention of Cardiovascular Disease with a Mediterranean diet Alejandro Vicente Carrillo, Brynja Ingadottir, Anne Fältström, Evelyn Lundin, Micaela Tjäderborn GROUP 2 Background The traditional

### A Short Course in Longitudinal Data Analysis

A Short Course in Longitudinal Data Analysis Peter J Diggle Nicola Reeve, Michelle Stanton (School of Health and Medicine, Lancaster University) Lancaster, June 2011 Timetable Day 1 9.00 Registration 9.30

### Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

### Randomized trials versus observational studies

Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Department of Epidemiology and Public Health Miller School of Medicine University of Miami

Department of Epidemiology and Public Health Miller School of Medicine University of Miami BST 630 (3 Credit Hours) Longitudinal and Multilevel Data Wednesday-Friday 9:00 10:15PM Course Location: CRB 995

### Analyzing Structural Equation Models With Missing Data

Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.

### CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA

CHOOSING APPROPRIATE METHODS FOR MISSING DATA IN MEDICAL RESEARCH: A DECISION ALGORITHM ON METHODS FOR MISSING DATA Hatice UENAL Institute of Epidemiology and Medical Biometry, Ulm University, Germany

### Analysis and reporting of stepped wedge randomisedcontrolledtrials:synthesisandcritical appraisalofpublishedstudies,2010to2014

Davey et al. Trials (2015) 16:358 DOI 10.1186/s13063-015-0838-3 TRIALS RESEARCH Open Access Analysis and reporting of stepped wedge randomisedcontrolledtrials:synthesisandcritical appraisalofpublishedstudies,2010to2014

### A Review of Methods for Missing Data

Educational Research and Evaluation 1380-3611/01/0704-353\$16.00 2001, Vol. 7, No. 4, pp. 353±383 # Swets & Zeitlinger A Review of Methods for Missing Data Therese D. Pigott Loyola University Chicago, Wilmette,

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Electronic Theses and Dissertations UC Riverside

Electronic Theses and Dissertations UC Riverside Peer Reviewed Title: Bayesian and Non-parametric Approaches to Missing Data Analysis Author: Yu, Yao Acceptance Date: 01 Series: UC Riverside Electronic

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Brochure More information from http://www.researchandmarkets.com/reports/2172736/ Longitudinal Data Analysis. Wiley Series in Probability and Statistics Description: Longitudinal data analysis for biomedical

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

ALLISON 1 TABLE OF CONTENTS 1. INTRODUCTION... 3 2. ASSUMPTIONS... 6 MISSING COMPLETELY AT RANDOM (MCAR)... 6 MISSING AT RANDOM (MAR)... 7 IGNORABLE... 8 NONIGNORABLE... 8 3. CONVENTIONAL METHODS... 10

### CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

### A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 1 Abstract We report back on methods used for the analysis of longitudinal data covered by the course We draw lessons for

### MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)

MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS) R.KAVITHA KUMAR Department of Computer Science and Engineering Pondicherry Engineering College, Pudhucherry, India DR. R.M.CHADRASEKAR Professor,

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies

STATISTICS IN MEDICINE Statist. Med. 2004; 23:1455 1497 (DOI: 10.1002/sim.1728) TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies Joseph W. Hogan 1; ;, Jason Roy 2; and Christina Korkontzelou

### Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk

### A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada

### Missing Data Techniques for Structural Equation Modeling

Journal of Abnormal Psychology Copyright 2003 by the American Psychological Association, Inc. 2003, Vol. 112, No. 4, 545 557 0021-843X/03/\$12.00 DOI: 10.1037/0021-843X.112.4.545 Missing Data Techniques

### Bayesian Approaches to Handling Missing Data

Bayesian Approaches to Handling Missing Data Nicky Best and Alexina Mason BIAS Short Course, Jan 30, 2012 Lecture 1. Introduction to Missing Data Bayesian Missing Data Course (Lecture 1) Introduction to

### Robust Tree-based Causal Inference for Complex Ad Effectiveness Analysis

Robust Tree-based Causal Inference for Complex Ad Effectiveness Analysis Pengyuan Wang Wei Sun Dawei Yin Jian Yang Yi Chang Yahoo Labs, Sunnyvale, CA, USA Purdue University, West Lafayette, IN,USA {pengyuan,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### CS229 Lecture notes. Andrew Ng

CS229 Lecture notes Andrew Ng Part X Factor analysis Whenwehavedatax (i) R n thatcomesfromamixtureofseveral Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually

### Analysis of Randomized Controlled Trials Peduzzi et al. Analysis of Randomized Controlled Trials

Epidemiologic Reviews Copyright 2002 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 24, No. 1 Printed in U.S.A. Analysis of Randomized Controlled Trials Peduzzi et al.

### State Space Time Series Analysis

State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

### Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88)

Chapter 5: Analysis of The National Education Longitudinal Study (NELS:88) Introduction The National Educational Longitudinal Survey (NELS:88) followed students from 8 th grade in 1988 to 10 th grade in

### L10: Probability, statistics, and estimation theory

L10: Probability, statistics, and estimation theory Review of probability theory Bayes theorem Statistics and the Normal distribution Least Squares Error estimation Maximum Likelihood estimation Bayesian

### Many research questions in epidemiology are concerned. Estimation of Direct Causal Effects ORIGINAL ARTICLE

ORIGINAL ARTICLE Maya L. Petersen, Sandra E. Sinisi, and Mark J. van der Laan Abstract: Many common problems in epidemiologic and clinical research involve estimating the effect of an exposure on an outcome

### Guideline on missing data in confirmatory clinical trials

2 July 2010 EMA/CPMP/EWP/1776/99 Rev. 1 Committee for Medicinal Products for Human Use (CHMP) Guideline on missing data in confirmatory clinical trials Discussion in the Efficacy Working Party June 1999/

### Programme du parcours Clinical Epidemiology 2014-2015. UMR 1. Methods in therapeutic evaluation A Dechartres/A Flahault

Programme du parcours Clinical Epidemiology 2014-2015 UR 1. ethods in therapeutic evaluation A /A Date cours Horaires 15/10/2014 14-17h General principal of therapeutic evaluation (1) 22/10/2014 14-17h

### Revenue Management with Correlated Demand Forecasting

Revenue Management with Correlated Demand Forecasting Catalina Stefanescu Victor DeMiguel Kristin Fridgeirsdottir Stefanos Zenios 1 Introduction Many airlines are struggling to survive in today's economy.

### On Treatment of the Multivariate Missing Data

On Treatment of the Multivariate Missing Data Peter J. Foster, Ahmed M. Mami & Ali M. Bala First version: 3 September 009 Research Report No. 3, 009, Probability and Statistics Group School of Mathematics,

### Life Table Analysis using Weighted Survey Data

Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using

### The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks

Landmarking and landmarking Landmarking and competing risks Discussion The Landmark Approach: An Introduction and Application to Dynamic Prediction in Competing Risks Department of Medical Statistics and

### DROPOUTS IN LONGITUDINAL STUDIES: DEFINITIONS AND MODELS

Journal of Biopharmaceutical Statistics, 10(4), 503 525 (2000) DROPOUTS IN LONGITUDINAL STUDIES: DEFINITIONS AND MODELS J. K. Lindsey Department of Biostatistics Limburgs University 3590 Diepenbeek, Belgium