# Missing data and net survival analysis Bernard Rachet

## Transcription

1 Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, July 2015 Missing data and net survival analysis Bernard Rachet

2 General context

- Population-based, routine data: cancer registry data; clinical data (tumour, treatment, comorbidity)
- Cancer survival and the roles played by patient, tumour and healthcare factors
- (Very) large data sets but incomplete information, which we have handled using multiple imputation with Rubin's rules

3 Preliminary results of on-going work

4 Multiple imputation procedure

Under the Missing At Random (MAR) assumption:

1. Impute the missing data from $f(Y_M \mid Y_O)$ to give K complete data sets
2. Fit the substantive model to each of the K data sets, to obtain K estimates of the parameters and estimates of their variance
3. Combine them using Rubin's rules

5 Multiple imputation steps

Incomplete data → imputation → K completed data sets → analysis → K analysis results → pooling → final results

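6 Pooling K estimates: Rubin's rules

Given K completed data sets, there are K estimates $\hat\theta_k$ with variances $\hat\sigma_k^2$, $k = 1, \dots, K$.

Pooled estimate:

$$\hat\theta_{MI} = \frac{1}{K} \sum_{k=1}^{K} \hat\theta_k$$

Total variance:

$$\hat V_{MI} = \hat W + \left(1 + \frac{1}{K}\right) \hat B$$

with within-imputation variance $\hat W = \frac{1}{K} \sum_{k=1}^{K} \hat\sigma_k^2$ and between-imputation variance $\hat B = \frac{1}{K-1} \sum_{k=1}^{K} \left(\hat\theta_k - \hat\theta_{MI}\right)^2$.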
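
The pooling step can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the talk: the function name `pool_rubin` and the three input estimates are hypothetical, and only the pooled estimate, within-imputation variance, between-imputation variance and total variance named on the slide are computed.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool K point estimates and their variances with Rubin's rules."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need K >= 2 estimates with matching variances")
    theta_mi = mean(estimates)            # pooled estimate
    w_hat = mean(variances)               # within-imputation variance
    b_hat = variance(estimates)           # between-imputation variance (K - 1 divisor)
    v_mi = w_hat + (1 + 1 / k) * b_hat    # total variance
    return theta_mi, w_hat, b_hat, v_mi


# Hypothetical log excess hazard ratios from K = 3 imputed data sets.
theta, w, b, v = pool_rubin([0.50, 0.60, 0.70], [0.04, 0.05, 0.06])
print(theta, w, b, v)
```

The between-imputation term is inflated by (1 + 1/K) because only a finite number of imputations is drawn.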

7 Multiple imputation procedure: congeniality

1. Imputation model congenial with substantive model
2. Given the substantive model $f(Y \mid X)$, $f(Y \mid X)\,g(X)$ is a congenial imputation model if both $f$ and $g$ are correctly specified
3. Valid inference (under MAR) if $f(Y \mid X)\,g(X)$ (approximately) represents data structure and substantive model

8 Concepts and measures of interest

Aims: prognosis of a cancer and impact at population level

Concepts:

- Excess hazard
- Excess hazard ratio
- Net survival
- Crude probabilities of death from cancer and other causes

Relative survival data setting:

- Population-based data
- Expected mortality hazard from life tables, by single year of age and sex, and calendar year, geography, deprivation

9 Nur et al, Settings

- Population-based cohort of colorectal cancer patients
- Complete information on age, sex, follow-up time, vital status, deprivation, comorbidity, surgical treatment
- Tumour stage, morphology and grade: 45% incomplete data

Relative survival data setting:

$$\lambda(x) = \lambda_P(x) + \exp(x\beta)$$

Substantive model: generalised linear model (Dickman et al, Stat Med 2005)

- Link function: $\log(\mu_j - d_{Pj}) = \log(y_j) + x\beta$, where $\log(y_j)$ is the offset
- $d_j \sim \text{Poisson}(\mu_j)$; $\mu_j = \lambda_j y_j$
- $y_j$: person-time at risk; $d_{Pj}$: expected number of deaths (life tables)
- Output: excess hazard ratio (+ Ederer-2 relative survival)
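
To make the link function concrete, the sketch below handles only the simplest special case: one parameter per group and no other covariates. In that saturated case the fitted mean equals the observed count, so the excess rate reduces to (observed deaths minus expected deaths) divided by person-time. The function name and all numbers are hypothetical; a model with covariates would need a Poisson GLM fitted with the link shown on the slide, which is not attempted here.

```python
def excess_rate(deaths, expected_deaths, person_time):
    """Excess mortality rate for one group: (d_j - d_Pj) / y_j."""
    if person_time <= 0:
        raise ValueError("person-time at risk must be positive")
    return (deaths - expected_deaths) / person_time


# Hypothetical grouped data: observed deaths, expected deaths, person-years.
rate_reference = excess_rate(120, 20.0, 1000.0)   # reference group
rate_compared = excess_rate(260, 10.0, 500.0)     # comparison group

# Excess hazard ratio between the two groups.
ehr = rate_compared / rate_reference
print(rate_reference, rate_compared, ehr)
```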

10 Data description

[Table: number and percentage of patients by category of each variable; only the percentages below survive in this transcription]

- Stage: I, II, III, IV, Missing (39.5%)
- Morphology: Adenocarcinoma; Mucinous and serous; Other; Neoplasm, NOS (11.6%)
- Grade: I, II, III/IV, Missing (25.0%)

Missing information associated with:

- Older ages
- More deprived categories
- Less treatment with curative intent
- Higher probability of death

11 Missing information in several variables

Multiple imputation using Full Conditional Specification (chained equations, van Buuren, 1999)

- Same basic assumptions as in multiple imputation
- Assumes a joint (multivariate) distribution exists without specifying its form

$$f(Y_{i,1}, Y_{i,2}, \dots, Y_{i,p}) = f(Y_{i,p} \mid Y_{i,1}, \dots, Y_{i,p-1})\, f(Y_{i,p-1} \mid Y_{i,1}, \dots, Y_{i,p-2}) \cdots f(Y_{i,2} \mid Y_{i,1})\, f(Y_{i,1})$$

Imputation model (joint model for the data): $Y \sim N(\beta, \Omega)$

Gibbs sampler to:

1. Estimate the parameters in the joint imputation model
2. Impute the missing data

Multivariate problem split into a series of univariate problems

12 Imputation models

Outcomes:

- Ordinal regression for stage and grade
- Polytomous regression for morphology

Covariables:

- Other two covariables with incomplete information
- Sex, age, deprivation, comorbidity, treatment, cancer site
- Vital status
- Follow-up time (years): piecewise function (0, 0.5, 1, 2, 3, 4, 5, 5+)
- Time-dependent effects (categorical) for deprivation and age

Substantive (excess hazard) model includes all these variables and (binary) time-dependent effects

13 Results

[Table: number and percentage of patients by category of each variable, with the percentage distribution after imputation; only the percentages below survive in this transcription]

- Stage: I, II, III, IV, Missing (39.5%)
- Morphology: Adenocarcinoma; Mucinous and serous; Other; Neoplasm, NOS (11.6%)
- Grade: I, II, III/IV, Missing (25.0%)

Missing information associated with:

- Older ages
- More deprived categories
- Less treatment with curative intent
- Higher probability of death

14 Results

[Table: excess hazard ratios (EHR) with 95% CI by stage (I, II, III, IV, Missing), for complete-case analysis and for multiple imputation, by period since diagnosis over which the EHR was estimated: five years, first year, second to fifth years]

Other results: the indicator approach

- Systematically underestimates variance of EHRs
- Overestimates EHRs for tumour morphology
- Underestimates EHRs for age and deprivation
- Does not identify time-dependent effects

15 Stage-specific survival

[Figure: relative survival (%) against years since diagnosis, by stage; panels: before imputation (I, II, III, IV, missing) and after imputation (I, II, III, IV)]

16 Limitations

- Tutorial paper: no systematic evaluation
- Relatively simple substantive model: piecewise model, categorical variables
- Further recent methodological developments in multiple imputation and in net survival and flexible modelling
- More systematic evaluation: simulations

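17 Concepts and measures of interest

Excess hazard:

$$\lambda_E(t) = \lambda_O(t) - \lambda_P(t)$$

$$\lambda_O(t)\,dt = \frac{dN^W(t)}{Y^W(t)}; \qquad \lambda_P(t)\,dt = \frac{\sum_{i=1}^{n} Y_i^W(t)\,\lambda_{Pi}(t)\,dt}{Y^W(t)}$$

Net survival:

$$S_E(t) = \exp\left(-\int_0^t \lambda_E(u)\,du\right)$$

Crude mortality:

$$F_C(t) = \int_0^t S_O(u)\,\lambda_E(u)\,du$$

Weights: $W(t) = 1 / S_{Pi}(t)$, where $S_{Pi}(t)$ is the expected probability of surviving up to $t$.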
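
As a numerical illustration of the two integrals, the sketch below evaluates net survival and the crude probability of death from cancer for given hazard functions using the midpoint rule. It assumes the hazards are known functions of time, which is not the estimation setting of the slide (no weighting is involved); the function names and the constant hazards 0.2 and 0.05 per year are hypothetical.

```python
import math


def net_survival(excess_hazard, t, steps=10_000):
    """S_E(t) = exp(-integral_0^t lambda_E(u) du), midpoint rule."""
    h = t / steps
    cumulative = sum(excess_hazard((i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-cumulative)


def crude_cancer_mortality(excess_hazard, expected_hazard, t, steps=10_000):
    """F_C(t) = integral_0^t S_O(u) lambda_E(u) du, with lambda_O = lambda_E + lambda_P."""
    h = t / steps
    cum_overall = 0.0   # cumulative overall hazard up to the start of the step
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        lam_e = excess_hazard(u)
        lam_o = lam_e + expected_hazard(u)
        s_o_mid = math.exp(-(cum_overall + 0.5 * h * lam_o))  # overall survival at midpoint
        total += s_o_mid * lam_e * h
        cum_overall += lam_o * h
    return total


# Hypothetical constant hazards (per year) over 5 years.
s_e = net_survival(lambda u: 0.2, 5.0)
f_c = crude_cancer_mortality(lambda u: 0.2, lambda u: 0.05, 5.0)
print(s_e, f_c)
```

With constant hazards the crude probability has the closed form λ_E / (λ_E + λ_P) × (1 − exp(−(λ_E + λ_P) t)), which is always smaller than 1 − S_E(t) because deaths from other causes remove patients before they can die from cancer.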

18 Modelling approach

Flexible multivariable excess hazard model:

- Excess hazard
- Time-dependent and non-linear effects (splines)
- Variables affecting both mortality processes (cancer and other causes of death) included in the model
- Net survival is the mean of individual net survival functions predicted by the model
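
The last point, averaging individual predictions, can be illustrated as follows. This is a minimal sketch assuming that each patient's cumulative excess hazard at the time of interest has already been predicted by a fitted model; the function name and the two values are hypothetical.

```python
import math


def cohort_net_survival(cumulative_excess_hazards):
    """Mean of the individual net survival probabilities exp(-Lambda_Ei(t))."""
    if not cumulative_excess_hazards:
        raise ValueError("at least one patient is required")
    individual = [math.exp(-c) for c in cumulative_excess_hazards]
    return sum(individual) / len(individual)


# Hypothetical predicted cumulative excess hazards at t for two patients.
predicted = [0.1, 1.0]
averaged = cohort_net_survival(predicted)

# Not the same as plugging the mean cumulative hazard into exp(-x).
plug_in = math.exp(-sum(predicted) / len(predicted))
print(averaged, plug_in)
```

Averaging on the survival scale gives a larger value than exponentiating the mean cumulative hazard, so the two summaries should not be used interchangeably.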

19 Multiple imputation procedure: congeniality

1. Imputation model congenial with substantive model
2. Given the substantive model $f(Y \mid X)$, $f(Y \mid X)\,g(X)$ is a congenial imputation model if both $f$ and $g$ are correctly specified
3. Valid inference (under MAR) if $f(Y \mid X)\,g(X)$ (approximately) represents data structure and substantive model
4. Problematic within net survival setting and with nonlinear and time-dependent effects

20 Falcaro et al, 2015: study settings

Data:

- 44,461 men diagnosed with a colorectal cancer in [years not preserved], followed up to 2009
- Age at diagnosis (continuous), tumour stage (4 categories), deprivation (5 categories)

Missing stage: 30% ($R = 1$ if stage missing)

- MCAR: $\operatorname{logit} \Pr(R_i = 1 \mid Z_i) = \delta_0$
- MAR on X: $\operatorname{logit} \Pr(R_i = 1 \mid Z_i) = \alpha_0 + \alpha_1 (\text{age}_i - 60)$
- MAR: $\operatorname{logit} \Pr(R_i = 1 \mid Z_i) = \gamma_0 + \gamma_1 (\text{age}_i - 60) + \gamma_2 T_i + \gamma_3 D_i$

100 simulated data sets per scenario
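
The three missingness mechanisms can be simulated along the following lines. This is a sketch under stated assumptions: the slide does not define T and D, so they are taken here to be survival time and an event indicator; all coefficient values, the age and survival distributions, and the function names are hypothetical and chosen only so that the MCAR scenario gives about 30% missing stage.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients; delta0 = logit(0.3) gives 30% missing under MCAR.
COEF = {
    "delta0": math.log(0.3 / 0.7),
    "alpha0": -1.0, "alpha1": 0.03,
    "gamma0": -1.0, "gamma1": 0.03, "gamma2": -0.10, "gamma3": 0.50,
}


def prob_stage_missing(mechanism, age, surv_time, died, coef=COEF):
    """Pr(R = 1 | Z) under the MCAR, MAR-on-X and MAR scenarios."""
    if mechanism == "MCAR":
        lp = coef["delta0"]
    elif mechanism == "MAR_X":
        lp = coef["alpha0"] + coef["alpha1"] * (age - 60)
    elif mechanism == "MAR":
        lp = (coef["gamma0"] + coef["gamma1"] * (age - 60)
              + coef["gamma2"] * surv_time + coef["gamma3"] * died)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return expit(lp)


def simulate_missing_share(mechanism, n=20_000, seed=2015):
    """Share of patients with missing stage in one simulated data set."""
    rng = random.Random(seed)
    missing = 0
    for _ in range(n):
        age = rng.uniform(40, 90)          # hypothetical age distribution
        surv_time = rng.expovariate(0.2)   # hypothetical survival times (years)
        died = 1 if rng.random() < 0.6 else 0
        if rng.random() < prob_stage_missing(mechanism, age, surv_time, died):
            missing += 1
    return missing / n


print({m: simulate_missing_share(m) for m in ("MCAR", "MAR_X", "MAR")})
```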

21 Distribution on fully observed data and empirical expected distribution in remaining complete records

22 Substantive model

Flexible log cumulative excess hazard model:

$$\ln \Lambda_E(t \mid x_i) = s_1(\ln t;\ \gamma_1, k_1) + \beta x_i + s_2(\text{age}_i;\ \gamma_2, k_2)$$

Flexible functions: restricted cubic splines

- Baseline excess hazard: 5 df, 4 internal knots and 2 boundary knots
- Age (continuous): 3 df, 2 internal knots
- Covariables: deprivation and stage

Aims: estimate effect of stage (log EHR) and stage-specific net survival at 1, 5 and 10 years since diagnosis
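
The relation between knots and degrees of freedom can be seen by building a restricted cubic spline basis by hand. The sketch below uses one common parametrization (the form used in Royston-Parmar flexible parametric models); whether the substantive model here uses exactly this parametrization is an assumption, and the function name and knot positions are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x.

    `knots` is sorted, with the two boundary knots first and last and the
    internal knots in between. Returns one linear term plus one term per
    internal knot, so m internal knots give m + 1 degrees of freedom.
    """
    if len(knots) < 2 or list(knots) != sorted(knots):
        raise ValueError("knots must be sorted and include both boundary knots")
    k_min, k_max = knots[0], knots[-1]

    def cube_plus(v):
        return max(v, 0.0) ** 3

    basis = [x]
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        basis.append(cube_plus(x - k_j)
                     - lam * cube_plus(x - k_min)
                     - (1 - lam) * cube_plus(x - k_max))
    return basis


# Hypothetical knots on the log-time scale: 2 boundary + 4 internal = 5 df.
baseline_knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(len(rcs_basis(2.5, baseline_knots)))
```

The weights on the two boundary terms cancel the cubic and quadratic parts beyond the last knot, which is what makes the spline linear in the tails.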

23 Imputation models

Outcome (stage): ordinal or multinomial logistic regression

Covariables:

- Survival time and log(survival time), or Nelson-Aalen estimate of the cumulative hazard
- Event indicator
- Age splines defined as in the substantive model
- Deprivation dummy variables

30 imputations

Net survival: Rubin's rules applied on $\log(-\log S_E(t))$ to obtain approximate normality, then back-transformed
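
Pooling net survival on the complementary log-log scale might look like the sketch below. Several details are assumptions not stated on the slide: the variance on the transformed scale is obtained by the delta method, the interval uses the normal quantile 1.96 rather than Rubin's t reference distribution, and the function name and input values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_net_survival(survival, std_errors):
    """Pool K net survival estimates on the log(-log S) scale, then back-transform.

    Returns (estimate, lower, upper) on the survival scale.
    """
    k = len(survival)
    if k < 2 or k != len(std_errors):
        raise ValueError("need K >= 2 estimates with matching standard errors")
    g = [math.log(-math.log(s)) for s in survival]
    # Delta method: Var[log(-log S)] is approximately Var(S) / (S log S)^2.
    g_var = [(se / (s * math.log(s))) ** 2 for s, se in zip(survival, std_errors)]
    g_mi = mean(g)
    v_mi = mean(g_var) + (1 + 1 / k) * variance(g)
    half_width = 1.96 * math.sqrt(v_mi)

    def back(x):
        return math.exp(-math.exp(x))

    # back() is decreasing, so the upper limit on the transformed scale
    # maps to the lower limit on the survival scale.
    return back(g_mi), back(g_mi + half_width), back(g_mi - half_width)


# Hypothetical 5-year net survival estimates from K = 3 imputed data sets.
estimate, lower, upper = pool_net_survival([0.58, 0.60, 0.62], [0.02, 0.02, 0.02])
print(estimate, lower, upper)
```

Working on the transformed scale keeps the back-transformed interval inside (0, 1), which pooling directly on the survival scale does not guarantee.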

24 Multiple imputation strategy

| Multiple imputation strategy | Functional form | How survival is modelled in the imputation |
|---|---|---|
| MI_ologit_surv | Ordinal logistic | Survival time and log survival time |
| MI_ologit_na | Ordinal logistic | Nelson-Aalen estimate of cumulative hazard |
| MI_mlogit_surv | Multinomial logistic | Survival time and log survival time |
| MI_mlogit_na | Multinomial logistic | Nelson-Aalen estimate of cumulative hazard |

25 Results Bias in log excess hazard ratio estimates for stage (reference stage 1), 100 replications Poor results with ordered logit even under MCAR scenario

26 Stage-specific net survival at 1 year, 100 replications

27 Results Bias in stage-specific net survival estimates at 1 year, 100 replications

28 Comments

Promising results, even though the parameter estimated in the substantive model (here excess hazard) does not correspond to the final outcome of interest (net survival)

Limitations:

- No time-dependent effects of stage
- Which joint model?
- Which variables in the imputation models?
  - Vital status
  - Nelson-Aalen estimates of cumulative hazard
  - Interactions with time since diagnosis (age at diagnosis, deprivation, ...)
  - Other relevant interactions (tumour stage, region, ...)
  - Other factors (treatment variables, co-morbidities, hospital volume, surgeon's experience, ...)

29 Limitations and challenges: preliminary study

Simulated data set: colon cancer, 12,048 men followed up at least 5 years

- Baseline excess hazard: 5 df, 4 internal knots
- Covariables: stage, deprivation, age
- Time-dependent effects of stage: 2 df, 1 internal knot for each higher stage
- Non-linear effects of age: 3 df, 2 internal knots

Substantive model:

$$\ln \Lambda_E(t \mid x_i) = s_1(\ln t;\ \gamma_1, k_1) + \beta x_i + s_2(\text{age}_i;\ \gamma_2, k_2) + s_{3j}(\text{stage}_j, t;\ \gamma_3, k_3)$$

Missing stage simulated as in previous example: 100 data sets per scenario, with 30% missing stage. Focus on MAR here.

30 Limitations and challenges: preliminary study

[Figure: net survival function against time (years), by stage; complete data versus MAR]

- Simulation of missingness mechanisms as in previous example
- Same imputation model was applied (multinomial, Nelson-Aalen)

31 Results: excess hazard ratios for stage. [Figure: tumour stage 2 (reference stage 1); true EHR, complete-case EHRs and imputed EHRs against time since diagnosis (years)]

32 Results: excess hazard ratios for stage. [Figure: tumour stage 3 (reference stage 1); true EHR, complete-case EHRs and imputed EHRs against time since diagnosis (years)]

33 Results: excess hazard ratios for stage. [Figure: tumour stage 4 (reference stage 1); true EHR, complete-case EHRs and imputed EHRs against time since diagnosis (years)]

34 Results: stage-specific net survival. [Figure: net survival by tumour stage against time since diagnosis (years)]

35 Results: stage-specific net survival. [Figure: net survival by tumour stage against time since diagnosis (years)]

36 Results: stage-specific net survival. [Figure: net survival by tumour stage against time since diagnosis (years)]

37 Results: stage-specific net survival. [Figure: net survival by tumour stage against time since diagnosis (years)]

38 Conclusion and development

Why MI?

- Strength: clear division between imputation and analysis stages; both efficiency and MAR plausibility increased
- Challenge: incompatibility between imputation and substantive models, giving asymptotically biased estimates

Development:

- Define joint model for flexible excess hazard models
- Multiple imputation by fully conditional specification with substantive model compatible algorithm (SMC-FCS), Bartlett JW et al. Statistical Methods in Medical Research 2015

39 References

- Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: John Wiley & Sons.
- Van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999; 18.
- White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med 2009; 28.
- Nur U, Shack LG, Rachet B, Carpenter JR, Coleman MP. Modelling relative survival in the presence of incomplete data: a tutorial. Int J Epidemiol 2010; 39.
- Carpenter JR, Kenward MG. Multiple imputation and its application. Chichester: John Wiley & Sons.
- Falcaro M, Nur U, Rachet B, Carpenter JR. Estimating excess hazard ratios and net survival when covariate data are missing: strategies for multiple imputation. Epidemiology 2015; 26.
- Bartlett JW, Seaman SR, White IR, Carpenter JR. Multiple imputation of covariates by fully conditional specification: accommodating the substantive model. Stat Methods Med Res 2015; 24.

### Dealing with Missing Data

Dealing with Missing Data Roch Giorgi email: roch.giorgi@univ-amu.fr UMR 912 SESSTIM, Aix Marseille Université / INSERM / IRD, Marseille, France BioSTIC, APHM, Hôpital Timone, Marseille, France January

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University

Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple

### Introduction to mixed model and missing data issues in longitudinal studies

Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models

### Missing Data Dr Eleni Matechou

1 Statistical Methods Principles Missing Data Dr Eleni Matechou matechou@stats.ox.ac.uk References: R.J.A. Little and D.B. Rubin 2nd edition Statistical Analysis with Missing Data J.L. Schafer and J.W.

### SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### Prognosis of survival for breast cancer patients

Prognosis of survival for breast cancer patients Ken Ryder Breast Cancer Unit Data Section Guy s Hospital Patrick Royston, MRC Clinical Trials Unit London Outline Introduce the data and outcomes requested

### Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Statistical modelling with missing data using multiple imputation Session 4: Sensitivity Analysis after Multiple Imputation James Carpenter London School of Hygiene & Tropical Medicine Email: james.carpenter@lshtm.ac.uk

### Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

### Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

### Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes

### Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know

### Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health

Big data size isn t enough! Irene Petersen, PhD Primary Care & Population Health Introduction Reader (Statistics and Epidemiology) Research team epidemiologists/statisticians/phd students Primary care

### Relative survival an introduction and recent developments

Relative survival an introduction and recent developments Paul W. Dickman Department of Medical Epidemiology and Biostatistics Karolinska Institutet, Stockholm, Sweden paul.dickman@ki.se 11 December 2008

### An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.

An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao

### How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

### Social inequalities impacts of care management and survival in patients with non-hodgkin lymphomas (ISO-LYMPH)

Session 3 : Epidemiology and public health Social inequalities impacts of care management and survival in patients with non-hodgkin lymphomas (ISO-LYMPH) Le Guyader-Peyrou Sandra Bergonie Institut Context:

### Dealing with missing data: Key assumptions and methods for applied analysis

Technical Report No. 4 May 6, 2013 Dealing with missing data: Key assumptions and methods for applied analysis Marina Soley-Bori msoley@bu.edu This paper was published in fulfillment of the requirements

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Imputation of missing data under missing not at random assumption & sensitivity analysis

Imputation of missing data under missing not at random assumption & sensitivity analysis S. Jolani Department of Methodology and Statistics, Utrecht University, the Netherlands Advanced Multiple Imputation,

### 13. Poisson Regression Analysis

136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

### Problem of Missing Data

VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

### Development and validation of a prediction model with missing predictor data: a practical approach

Journal of Clinical Epidemiology 63 (2010) 205e214 Development and validation of a prediction model with missing predictor data: a practical approach Yvonne Vergouwe a, *, Patrick Royston b, Karel G.M.

### Item Imputation Without Specifying Scale Structure

Original Article Item Imputation Without Specifying Scale Structure Stef van Buuren TNO Quality of Life, Leiden, The Netherlands University of Utrecht, The Netherlands Abstract. Imputation of incomplete

### Sensitivity Analysis in Multiple Imputation for Missing Data

Paper SAS270-2014 Sensitivity Analysis in Multiple Imputation for Missing Data Yang Yuan, SAS Institute Inc. ABSTRACT Multiple imputation, a popular strategy for dealing with missing values, usually assumes

### Analyzing Structural Equation Models With Missing Data

Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University cenders@asu.edu based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.

### Dealing with Missing Data

Res. Lett. Inf. Math. Sci. (2002) 3, 153-160 Available online at http://www.massey.ac.nz/~wwiims/research/letters/ Dealing with Missing Data Judi Scheffer I.I.M.S. Quad A, Massey University, P.O. Box 102904

### Missing Data Sensitivity Analysis of a Continuous Endpoint An Example from a Recent Submission

Missing Data Sensitivity Analysis of a Continuous Endpoint An Example from a Recent Submission Arno Fritsch Clinical Statistics Europe, Bayer November 21, 2014 ASA NJ Chapter / Bayer Workshop, Whippany

### Modern Methods for Missing Data

Modern Methods for Missing Data Paul D. Allison, Ph.D. Statistical Horizons LLC www.statisticalhorizons.com 1 Introduction Missing data problems are nearly universal in statistical practice. Last 25 years

### Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values

Methods Report A Mixed Model Approach for Intent-to-Treat Analysis in Longitudinal Clinical Trials with Missing Values Hrishikesh Chakraborty and Hong Gu March 9 RTI Press About the Author Hrishikesh Chakraborty,

### The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1

### PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,

PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data

### Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015

1 Advanced Quantitative Methods for Health Care Professionals PUBH 742 Spring 2015 Instructor: Joanne M. Garrett, PhD e-mail: joanne_garrett@med.unc.edu Class Notes: Copies of the class lecture slides

### Statistical Analysis with Missing Data

Statistical Analysis with Missing Data Second Edition RODERICK J. A. LITTLE DONALD B. RUBIN WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface PARTI OVERVIEW AND BASIC APPROACHES

### Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done?

272 Occup Environ Med 1998;55:272 277 Prevalence odds ratio or prevalence ratio in the analysis of cross sectional data: what is to be done? Mary Lou Thompson, J E Myers, D Kriebel Department of Biostatistics,

### Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Programme du parcours Clinical Epidemiology 2014-2015. UMR 1. Methods in therapeutic evaluation A Dechartres/A Flahault

Programme du parcours Clinical Epidemiology 2014-2015 UR 1. ethods in therapeutic evaluation A /A Date cours Horaires 15/10/2014 14-17h General principal of therapeutic evaluation (1) 22/10/2014 14-17h

### Randomized trials versus observational studies

Randomized trials versus observational studies The case of postmenopausal hormone therapy and heart disease Miguel Hernán Harvard School of Public Health www.hsph.harvard.edu/causal Joint work with James

### Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

### A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter

### A General Approach to Variance Estimation under Imputation for Missing Survey Data

A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey

### BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA

123 Kwantitatieve Methoden (1999), 62, 123-138. A REVIEW OF CURRENT SOFTWARE FOR HANDLING MISSING DATA Joop J. Hox 1 ABSTRACT. When we deal with a large data set with missing data, we have to undertake

Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén

### EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA

EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA A CASE STUDY EXAMINING RISK FACTORS AND COSTS OF UNCONTROLLED HYPERTENSION ISPOR 2013 WORKSHOP

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Imputation Methods to Deal with Missing Values when Data Mining Trauma Injury Data

Imputation Methods to Deal with Missing Values when Data Mining Trauma Injury Data Kay I Penny Centre for Mathematics and Statistics, Napier University, Craiglockhart Campus, Edinburgh, EH14 1DJ k.penny@napier.ac.uk

### Guide to Biostatistics

MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### HCUP Methods Series Missing Data Methods for the NIS and the SID Report # 2015-01

HCUP Methods Series Contact Information: Healthcare Cost and Utilization Project (HCUP) Agency for Healthcare Research and Quality 540 Gaither Road Rockville, MD 20850 http://www.hcup-us.ahrq.gov For Technical

### The point estimate you choose depends on the nature of the outcome of interest odds ratio hazard ratio

Point Estimation Definition: A point estimate is a onenumber summary of data. If you had just one number to summarize the inference from your study.. Examples: Dose finding trials: MTD (maximum tolerable

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### Health 2011 Survey: An overview of the design, missing data and statistical analyses examples

Health 2011 Survey: An overview of the design, missing data and statistical analyses examples Tommi Härkänen Department of Health, Functional Capacity and Welfare The National Institute for Health and

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey

Re-analysis using Inverse Probability Weighting and Multiple Imputation of Data from the Southampton Women s Survey MRC Biostatistics Unit Institute of Public Health Forvie Site Robinson Way Cambridge

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Distance to Event vs. Propensity of Event A Survival Analysis vs. Logistic Regression Approach

Distance to Event vs. Propensity of Event A Survival Analysis vs. Logistic Regression Approach Abhijit Kanjilal Fractal Analytics Ltd. Abstract: In the analytics industry today, logistic regression is

### Exam C, Fall 2006 PRELIMINARY ANSWER KEY

Exam C, Fall 2006 PRELIMINARY ANSWER KEY Question # Answer Question # Answer 1 E 19 B 2 D 20 D 3 B 21 A 4 C 22 A 5 A 23 E 6 D 24 E 7 B 25 D 8 C 26 A 9 E 27 C 10 D 28 C 11 E 29 C 12 B 30 B 13 C 31 C 14

### III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

III. INTRODUCTION TO LOGISTIC REGRESSION 1. Simple Logistic Regression a) Example: APACHE II Score and Mortality in Sepsis The following figure shows 30 day mortality in a sample of septic patients as

### The Basics of Regression Analysis. for TIPPS. Lehana Thabane. What does correlation measure? Correlation is a measure of strength, not causation!

The Purpose of Regression Modeling The Basics of Regression Analysis for TIPPS Lehana Thabane To verify the association or relationship between a single variable and one or more explanatory One explanatory

### Using Medical Research Data to Motivate Methodology Development among Undergraduates in SIBS Pittsburgh

Using Medical Research Data to Motivate Methodology Development among Undergraduates in SIBS Pittsburgh Megan Marron and Abdus Wahed Graduate School of Public Health Outline My Experience Motivation for

### Survival Analysis Using SPSS. By Hui Bian Office for Faculty Excellence

Survival Analysis Using SPSS By Hui Bian Office for Faculty Excellence Survival analysis What is survival analysis Event history analysis Time series analysis When use survival analysis Research interest

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### Sun Li Centre for Academic Computing lsun@smu.edu.sg

Sun Li Centre for Academic Computing lsun@smu.edu.sg Elementary Data Analysis Group Comparison & One-way ANOVA Non-parametric Tests Correlations General Linear Regression Logistic Models Binary Logistic

### R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Faculty of Health Sciences R²-type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models Inference & application to prediction of kidney graft failure Paul Blanche joint work with M-C.

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Brochure More information from http://www.researchandmarkets.com/reports/2172736/ Longitudinal Data Analysis. Wiley Series in Probability and Statistics Description: Longitudinal data analysis for biomedical

### The CRM for ordinal and multivariate outcomes. Elizabeth Garrett-Mayer, PhD Emily Van Meter

The CRM for ordinal and multivariate outcomes Elizabeth Garrett-Mayer, PhD Emily Van Meter Hollings Cancer Center Medical University of South Carolina Outline Part 1: Ordinal toxicity model Part 2: Efficacy

### Introduction to Analysis Methods for Longitudinal/Clustered Data, Part 3: Generalized Estimating Equations

Introduction to Analysis Methods for Longitudinal/Clustered Data, Part 3: Generalized Estimating Equations Mark A. Weaver, PhD Family Health International Office of AIDS Research, NIH ICSSC, FHI Goa, India,

### Checking proportionality for Cox's regression model

Checking proportionality for Cox's regression model by Hui Hong Zhang Thesis for the degree of Master of Science (Master i Modellering og dataanalyse) Department of Mathematics Faculty of Mathematics and

### Missing data are ubiquitous in clinical research.

Advanced Statistics: Missing Data in Clinical Research Part 1: An Introduction and Conceptual Framework Jason S. Haukoos, MD, MS, Craig D. Newgard, MD, MPH Abstract Missing data are commonly encountered

### Incorrect Analyses of Radiation and Mesothelioma in the U.S. Transuranium and Uranium Registries Joey Zhou, Ph.D.

Incorrect Analyses of Radiation and Mesothelioma in the U.S. Transuranium and Uranium Registries Joey Zhou, Ph.D. At the Annual Meeting of the Health Physics Society July 15, 2014 in Baltimore A recently

### SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY

SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

### Travel Distance to Healthcare Centers is Associated with Advanced Colon Cancer at Presentation

Travel Distance to Healthcare Centers is Associated with Advanced Colon Cancer at Presentation Yan Xing, MD, PhD, Ryaz B. Chagpar, MD, MS, Y Nancy You MD, MHSc, Yi Ju Chiang, MSPH, Barry W. Feig, MD, George

### 7.1 The Hazard and Survival Functions

Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

### Missing data in randomized controlled trials (RCTs) can

EVALUATION TECHNICAL ASSISTANCE BRIEF for OAH & ACYF Teenage Pregnancy Prevention Grantees May 2013 Brief 3 Coping with Missing Data in Randomized Controlled Trials Missing data in randomized controlled

### Sampling Error Estimation in Design-Based Analysis of the PSID Data

Technical Series Paper #11-05 Sampling Error Estimation in Design-Based Analysis of the PSID Data Steven G. Heeringa, Patricia A. Berglund, Azam Khan Survey Research Center, Institute for Social Research

### Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS Philip Merrigan ESG-UQAM, CIRPÉE Using Big Data to Study Development and Social Change, Concordia University, November 2103 Intro Longitudinal

### Analysis of Longitudinal Data with Missing Values.

Analysis of Longitudinal Data with Missing Values. Methods and Applications in Medical Statistics. Ingrid Garli Dragset Master of Science in Physics and Mathematics Submission date: June 2009 Supervisor:

### Comorbid breast cancer patients: can they tolerate treatment? A registry study based on the Danish Breast Cancer Cooperative Group

Comorbid breast cancer patients: can they tolerate treatment? A registry study based on the Danish Breast Cancer Cooperative Group Lotte Holm Land MD, PhD, Department of Oncology R, OUH Cancer and comorbidity - everyone must

### Probability Calculator

Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical work. This procedure provides you with a set of electronic statistical tables that

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models

Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models Min Lee Robin Mitra School of Mathematics University of Southampton, Southampton,

### Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory

Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory Jean-Philippe Boucher and Guillaume Couture-Piché December 8, 2015 Quantact / Département de mathématiques, UQAM.

### An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX Phil Gibbs Advanced Analytics Manager SAS Technical Support November 22, 2008 UC Riverside What We Will Cover Today What is PROC

### Multilevel Modelling of medical data

Statistics in Medicine(00). To appear. Multilevel Modelling of medical data By Harvey Goldstein William Browne And Jon Rasbash Institute of Education, University of London 1 Summary This tutorial presents

### Goodness of fit assessment of item response theory models

Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing

### Missing values in data analysis: Ignore or Impute?

ORIGINAL ARTICLE Missing values in data analysis: Ignore or Impute? Ng Chong Guan 1, Muhamad Saiful Bahri Yusoff 2 1 Department of Psychological Medicine, Faculty of Medicine, University Malaya 2 Medical