Prediction for Multilevel Models



Similar documents
Multilevel Modeling of Complex Survey Data

gllamm companion for Contents

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

10 Dichotomous or binary responses

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Efficient and Practical Econometric Methods for the SLID, NLSCY, NPHS

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Nominal and ordinal logistic regression

SUMAN DUVVURU STAT 567 PROJECT REPORT

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Package EstCRM. July 13, 2015

Multivariate Normal Distribution

Factors affecting online sales

Comparison of Estimation Methods for Complex Survey Data Analysis

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

Statistics Graduate Courses

Introduction to mixed model and missing data issues in longitudinal studies

Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.

Multivariate Logistic Regression

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Statistical Models in R

Introduction to Longitudinal Data Analysis

Handling attrition and non-response in longitudinal data

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Statistics 104: Section 6!

Interpretation of Somers D under four simple models

Lecture 5 Three level variance component models

Ordinal Regression. Chapter

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Generalized Linear Models

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Simple Linear Regression Inference

MTH 140 Statistics Videos

From the help desk: Swamy s random-coefficients model

Latent Class Regression Part II

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

From the help desk: Bootstrapped standard errors

Curriculum - Doctor of Philosophy

Nonlinear Regression:

Electronic Thesis and Dissertations UCLA

Applying Statistics Recommended by Regulatory Documents

Handling missing data in Stata a whirlwind tour

Sun Li Centre for Academic Computing

Poisson Models for Count Data

Goodness of fit assessment of item response theory models

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Northumberland Knowledge

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

Generalized Linear Mixed Models

2013 MBA Jump Start Program. Statistics Module Part 3

HLM software has been one of the leading statistical packages for hierarchical

CREDIT SCORING MODEL APPLICATIONS:

BayesX - Software for Bayesian Inference in Structured Additive Regression

APPLIED MISSING DATA ANALYSIS

Department of Epidemiology and Public Health Miller School of Medicine University of Miami

University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

II. DISTRIBUTIONS distribution normal distribution. standard scores

Biostatistics: Types of Data Analysis

Illustration (and the use of HLM)

Least Squares Estimation

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Overview of Methods for Analyzing Cluster-Correlated Data. Garrett M. Fitzmaurice

DATA INTERPRETATION AND STATISTICS

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Final Exam Practice Problem Answers

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

Lecture 3: Linear methods for classification

Gamma Distribution Fitting

Fitting Subject-specific Curves to Grouped Longitudinal Data

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

VI. Introduction to Logistic Regression

Imputing Missing Data using SAS

Dongfeng Li. Autumn 2010

PS 271B: Quantitative Methods II. Lecture Notes

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

430 Statistics and Financial Mathematics for Business

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

Parallelization Strategies for Multicore Data Analysis

SAS Software to Fit the Generalized Linear Model

Transcription:

Prediction for Multilevel Models Sophia Rabe-Hesketh Graduate School of Education & Graduate Group in Biostatistics University of California, Berkeley Institute of Education, University of London Joint work with Anders Skrondal UK Stata Users Group meeting London, September 08. p.1

Outline Application: Abuse of antibiotics in China Three-level logistic regression model Prediction of random effects Empirical Bayes (EB) prediction Standard errors for EB prediction and approximate CI Prediction of response probabilities Conditional response probabilities Posterior mean response probabilities (different types) Population-averaged response probabilities Concluding remarks. p.2

Abuse of antibiotics in China Acute respiratory tract infection (ARI) can lead to pneumonia and death if not properly treated Inappropriate frequent use of antibiotics was common in China in 90 s, leading to drug resistance In the 90 s the WHO introduced a program of case management for children under 5 with ARI in China Data collected on 855 children i (level 1) treated by 4 doctors j (level 2) in 36 hospitals k (level 3) in two counties (one of which was in the WHO program) Response variable: Whether antibiotics were prescribed when there were no clinical indications based on medical files Reference: Min Yang (01). Multinomial Regression. In Goldstein and Leyland (Eds). Multilevel Modelling of Health Statistics, pages 7-1.. p.3

Three-level data structure Hospital Level 3: 36 hospitals k Dr. Wang Dr. Yang Dr. Ying Level 2: 4 doctors j Xiang Jiang Chou Chang Min Shu-Ying Level 1: 855 children i. p.4

Variables Response variable y ijk Antibiotics prescribed without clinical indications (1: yes, 0: no) 7 covariates x ijk Patient level i [Age] Age in years (0-4) [Temp] Body temperature, centered at 36 C [Paymed] Pay for medication (yes=1, no=0) [Selfmed] Self medication (yes=1, no=0) [Wrdiag] Failure to diagnose ARI early (yes=1, no=0) Doctor level j [DRed] Doctor s education (6 categories from self-taught to medical school) Hospital level k [WHO] Hospital in WHO program (yes=1, no=0). p.5

Three-level random intercept logistic regression Logistic regression with random intercepts for doctors and hospitals logit[pr(y ijk = 1 x ijk, ζ (2) jk, ζ(3) k )] = x ijkβ + ζ (2) jk + ζ(3) k Level 3: ζ (3) k x ijk N(0, ψ (3) ) independent across hospitals ψ (3) is residual between-hospital variance Level 2: ζ (2) jk x ijk, ζ (3) k N(0, ψ (2) ) independent across doctors, independent of ζ (3) k ψ (2) is residual between-doctor, within-hospital variance gllamm command: gllamm abuse age temp Paymed Selfmed Wrdiag DRed WHO, /// i(doc hosp) link(logit) family(binom) adapt. p.6

Maximum likelihood estimates No covariates Full model Parameter Est (SE) Est (SE) (OR) β 0 [Cons] 0.87 (0.) 1.52 (0.46) β 1 [Age] 0. (0.07) 1. β 2 [Temp] 0.72 (0.) 0.49 β 3 [Paymed] 0.38 (0.30) 1.46 β 4 [Selfmed] 0.65 (0.) 0.52 β 5 [Wrdiag] 1.97 (0.) 7.18 β 6 [DRed] 0. (0.) 0.82 β 7 [WHO] 1. (0.32) 0. ψ (2) 0. 0. ψ (3) 0.36 0. Log-likelihood 5. 4.76 using gllamm with adaptive quadrature. p.7

Distributions of random effects and responses Vector of all random intercepts for hospital k ζ k(3) (ζ (2) 1k,...,ζ(2) J k k, ζ(3) k ) Random effects distribution [Prior distribution] ϕ(ζ k(3) ), ϕ(ζ (2) jk ), ϕ(ζ(3) k ), all (multivariate) normal Conditional response distribution of all responses y k(3) for hospital k, given all covariates X k(3) and all random effects ζ k(3) for hospital k [Likelihood] f(y k(3) X k(3), ζ k(3) ) = all docs j in hosp k all patients i of doc j f(y ijk x ijk, ζ (2) jk, ζ(3) k ). p.8

Posterior distribution Use Bayes theorem to obtain posterior distribution of random effects given the data: ω(ζ k(3) y k(3),x k(3) ) = ϕ(ζ k(3) )f(y k(3) X k(3), ζ k(3) ) ϕ(ζk(3) )f(y k(3) X k(3), ζ k(3) )dζ k(3) ϕ(ζ (3) k ) j ϕ(ζ (2) jk ) i f(y ijk x ijk, ζ (2) jk, ζ(3) k ) Denominator, marginal likelihood contribution for hospital k, simplifies ϕ(ζ (3) k ) [ ϕ(ζ (2) jk ) ] f(y ijk x ijk, ζ (2) jk, ζ(3) k ) dζ(2) jk dζ (3) k j i. p.9

Empirical Bayes prediction of random effects Empirical Bayes (EB) prediction is mean of posterior distribution ζ k(3) = ζ k(3) ω(ζ k(3) y k(3),x k(3) ) dζ k(3) Standard error of EB is standard deviation of posterior distribution Using gllapred with the u option gllapred eb, u ebm1 contains ζ (2) jk ebs1 contains SE( ζ (2) jk ) ebm2 contains ζ (3) k ebs2 contains SE( ζ (3) k ) For approximately normal posterior, use Wald-type interval, e.g., for (3) hospital k, 95% CI is ζ k ± 1.96 SE( ζ (3) k ). p.

Confidence intervals for hospital random effects 2 11 11 11 11 11 11 11 11 11 11 11 11 11 4 9 3 18 18 18 18 18 18 32 32 33 33 33 33 33 33 33 33 35 35 35 35 34 34 34 34 34 34 34 34 34 34 34 36 36 36 36 1 7 30 30 30 30 30 30 30 30 30 30 30 30 8 5 31 31 31 31 31 31 31 31 31 31 31 31 31 31 6 2 1 0 1 Hospital random intercept with 95% CI 1 2 3 4 5 6 7 8 9 11 18 30 31 32 33 34 35 36 Rank of predicted hospital random intercept ζ (3) k ± 1.96 SE( ζ (3) k ) Identify the good and bad with caution. p.11

Predicted probability for patient of hypothetical doctor Predicted conditional probability for hypothetical values x 0 of the covariates and ζ 0 of the random intercepts Pr(y = 1 x 0, ζ 0 ) = exp(x0 β + ζ (2)0 + ζ (3)0 ) 1 + exp(x 0 β + ζ (2)0 + ζ (3)0 ) If ζ (2)0 + ζ (3)0 = 0, median of distribution for ζ (2) k predicted conditional probability is median probability Analogously for other percentiles Using gllapred with mu and us() option: jk + ζ(3), then replace age = 2 /* etc.: change covariates to x 0 */ generate zeta1 = 0 generate zeta2 = 0 gllapred probc, mu us(zeta). p.

Predicted probability for new patient of existing doctor in existing hospital Posterior mean probability for new patient of existing doctor j in hospital k Pr jk (y = 1 x 0 ) = Pr(y = 1 x 0, ζ k(3) ) ω(ζ k(3) y k(3),x k(3) ) dζ k(3) Invent additional patient i jk with covariate values x i jk = x 0 Make sure that invented observation does not contribute to posterior ω(ζ k(3) y k(3),x k(3) ) ω(ζ k(3) y k(3),x k(3) ) ϕ(ζ (3) k ) j ϕ(ζ (2) jk ) i =i f(y ijk x ijk, ζ (2) jk, ζ(3) k ) Cannot simply plug in EB prediction ζ k(3) for ζ k(3) Pr jk (y = 1 x 0 ) Pr(y = 1 x 0, ζ k(3) = ζ k(3) ). p.

Prediction dataset: One new patient per doctor Data (ignore gaps) id doc hosp abuse 1 1 1 0 2 1 1 1. 1 1. Data with invented observations id doc hosp abuse 1 1 1 0 2 1 1 1. 1 1. 3 2 2 0. 2 2. 4 3 2 1 5 3 2 1 3 2 2 0. 2 2. 4 3 2 1 5 3 2 1. 3 2.. 3 2. Response variable abuse must be missing for invented observations Use required value of doc Can invent several patients per doctor. p.

Prediction dataset: One new patient per doctor (continued) Data with invented observations terms for posterior id doc hosp abuse hospital doctor patient 1 1 1 0 ϕ(ζ (3) 1 ) ϕ(ζ(2) 11 ) f(y 111 ζ (3) 1, ζ(2) 11 ) 2 1 1 1 f(y 1 ζ (3) 1, ζ(2) 11 ). 1 1. 1 3 2 2 0 ϕ(ζ (3) 2 ) ϕ(ζ(2) ) f(y 3 ζ (3) 2, ζ(2) ). 2 2. 1 4 3 2 1 ϕ(ζ (2) 32 ) f(y 432 ζ (3) 2, ζ(2) 32 ) 5 3 2 1 f(y 532 ζ (3) 2, ζ(2) 32 ). 3 2. 1 Using gllapred with mu and fsample options: gllapred probd, mu fsample. p.

Predicted probability for new patient of new doctor in existing hospital Posterior mean probability for new patient of new doctor in existing hospital k Pr k (y=1 x 0 ) = Pr(y=1 x 0, ζ k(3)) ω(ζ k(3) y k(3),x k(3) ) dζ 3(k) Invent additional observation i j k with covariates in x i j k = x 0 ζ k(3) = (ζ (2) j k, ζ k(3)) Make sure that invented doctor but not invented patient contribute to posterior ω(ζ k(3) y k(3),x k(3) ) ω(ζ k(3) y k(3),x k(3) ) Prior { }} { ϕ(ζ (2) j k ) ω(ζ k(3) y k(3),x k(3) ). p.

Prediction dataset: One new doctor and patient per hospital Data (ignore gaps) id doc hosp abuse 1 1 1 0 2 1 1 1. 1 1. 3 2 2 0 4 3 2 1 5 3 2 1. 3 2. Data with invented observations id doc hosp abuse 1 1 1 0 2 1 1 1. 0 1. 3 2 2 0 4 3 2 1 5 3 2 1. 0 2. Response variable abuse must be missing for invented observations Use unique (for that hospital) value of doc Can invent several new docs which can all have the same value of doc. p.

Prediction dataset: One new doctor and patient per hospital (continued) Data with invented observations terms for posterior id doc hosp abuse hospital doctor patient 1 1 1 0 ϕ(ζ (3) 1 ) ϕ(ζ(2) 11 ) f(y 111 ζ (3) 1, ζ(2) 11 ) 2 1 1 1 f(y 1 ζ (3) 1, ζ(2) 11 ). 0 1. ϕ(ζ (2) 01 ) 1 3 2 2 0 ϕ(ζ (3) 2 ) ϕ(ζ(2) ) f(y 3 ζ (3) 2, ζ(2) ) 4 3 2 1 ϕ(ζ (2) 32 ) f(y 432 ζ (3) 2, ζ(2) 32 ) 5 3 2 1 f(y 532 ζ (3) 2, ζ(2) 32 ). 0 2. ϕ(ζ (2) 02 ) 1 Using gllapred with mu and fsample options: gllapred probh, mu fsample. p.18

Example: Predicted probability for new patient of new doctor in existing hospital Predicted posterior mean probability.2.4.6.8 1 2 3 4 5 6 Doctor s qualification Each curve represents a hospital For each hospital: 6 new doctors with [DRed] = 1, 2, 3, 4, 5, 6 For each doctor: 1 new patient with [Age] = 2, [Temp] = 1 (37 C), [Paymed] = 0, [Selfmed] = 0, [Wrdiag] = 0. p.

Example: Predicted probability for new patient of existing doctor in existing hospital 1 3 4 6 Predicted posterior mean probability.2.4.6.8.2.4.6.8.2.4.6.8 8 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 Graphs by hosp Doctor s qualification of the hospitals, with curves as in previous slide Dots represent doctors with [DRed] as observed For each doctor: predicted probability for 1 new patient with [Age] = 2, [Temp] = 1, [Paymed] = 0, [Selfmed] = 0, [Wrdiag] = 0. p.

Predicted probability for new patient of new doctor in new hospital Population-averaged or marginal probability: Pr(y=1 x 0 ) = Pr(y = 1 x 0, ζ (2) jk, ζ(3) k ) ϕ(ζ(2) jk ), ϕ(ζ(3) k ) dζ(2) jk dζ(3) k Cannot plug in means of random intercepts Pr(y = 1 x 0 ) Pr(y = 1 x 0, ζ (2) jk mean median = 0, ζ(3) k = 0) Using gllapred with the mu and marg options: gllapred prob, mu marg fsample Confidence interval, by sampling parameters from the estimated asymptotic sampling distribution of their estimates ci_marg_mu lower upper, level(95) dots. p.

Illustration Cluster-specific: versus population averaged probability Probability 0.0 0.2 0.4 0.6 0.8 1.0 0 40 60 80 0 x cluster-specific (random sample) median population averaged. p.

Illustration Cluster-specific: versus population averaged probability Probability 0.0 0.2 0.4 0.6 0.8 1.0 0 40 60 80 0 x cluster-specific (random sample) median population averaged. p.

Example: Predicted probability for new patient of new doctor in new hospital Predicted population averaged probability 0.0 0.2 0.4 0.6 0.8 1.0 1 2 3 4 5 6 Doctor s qualification no intervention intervention Same patient covariates as before Confidence bands represent parameter uncertainty. p.

Example: Predicted probability for new patient of new doctor in new hospital Population averaged probability with 95% CI 0.0 0.2 0.4 0.6 0.8 1.0 1 2 3 4 5 6 Doctor s qualification no intervention intervention Same patient covariates as before Confidence bands represent parameter uncertainty. p.

Concluding remarks Discussed: Empirical Bayes (EB) prediction of random effects and CI using gllapred, ignoring parameter uncertainty Prediction of different kinds of probabilities using gllapred after careful preparation of prediction dataset Simulation-based CI for predicted marginal probabilities using new command ci_marg_mu Methods work for any GLLAMM model, including random-coefficient models and models for ordinal, nominal or count data Assumed normal random effects distribution EB predictions not robust to misspecification of distribution Could use nonparametric maximum likelihood in gllamm, followed by same gllapred and ci_marg_mu commands. p.

References Rabe-Hesketh, S. and Skrondal, A. (08). Multilevel and Longitudinal Modeling Using Stata (2nd Edition). College Station, TX: Stata Press. Skrondal, A. and Rabe-Hesketh, S. (08). Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A, in press. Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (05). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 1, 301-3.. p.