Understanding Variability in Multilevel Models for Categorical Responses
Sophia Rabe-Hesketh

University of California, Berkeley & Institute of Education, University of London
Joint work with Anders Skrondal
Hierarchical Linear Modeling SIG, AERA Annual Meeting, Vancouver, April 2012

Outline

- Two-level logistic random-intercept model (1: students, 2: schools)
- Measures of residual variability and dependence
    - Median odds ratio
    - Intraclass correlation of latent responses
    - Intraclass correlation of observed responses
    - Standard deviation of probabilities
    - Variance partition coefficient
- Graphical displays of variability
    - Densities of probabilities
    - Percentiles of probabilities
    - Probabilities for schools in the data
- Ordinal responses and three-level data (1: students, 2: teachers, 3: schools)

Example: PISA (Programme for International Student Assessment)

- The PISA study assesses reading, math and science achievement among 15-year-old students in tens of countries every 3 years
- Here we consider U.S. data from 2000
- Schools j were randomly sampled, and then students i were randomly sampled from the selected schools
- Variables:
    - Reading proficiency $y_{ij}$ (1 = yes, 0 = no)
    - Student SES $x_{ij}$
    - mn: school mean SES $\bar{x}_{\cdot j}$
    - dv: deviation of student SES from the school mean, $x_{ij} - \bar{x}_{\cdot j}$
- Sample size (listwise): 2069 students in 148 schools
- Number of students per school: 1 to 28, mean/median 14

PISA: Distribution of SES

[Figure: student SES (mn+dv) against school mean SES (mn), showing min-to-max and P25-to-P75 ranges, with reference lines at mn-11, mn and mn+11]

- For schools, mn: 25th & 75th %tiles: 39 & 51; mean & median: 45
- For students, dv: 25th & 75th %tiles: -11 & 11; mean & median: 0

Three covariate patterns:

x_ij    mn     dv    ses
lo      39    -11     28
me      45      0     45
hi      51     11     62

Two-level logistic random-intercept model

Two-level logistic random-intercept model for unit i (level 1) nested in cluster j (level 2):

$$\operatorname{logit}[\Pr(y_{ij}=1 \mid x_{ij},\zeta_j)] = \log\underbrace{\left[\frac{\Pr(y_{ij}=1 \mid x_{ij},\zeta_j)}{\Pr(y_{ij}=0 \mid x_{ij},\zeta_j)}\right]}_{\text{Odds}} = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j$$

$$\log[\operatorname{Odds}(y_{ij}=1 \mid x_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j$$

$$\operatorname{Odds}(y_{ij}=1 \mid x_{ij},\zeta_j) = \exp(\beta_1)\exp(\beta_2)^{\mathrm{dv}_{ij}}\exp(\beta_3)^{\mathrm{mn}_j}\exp(\zeta_j)$$

- $x_{ij}$ is the vector of covariates $(\mathrm{dv}_{ij}, \mathrm{mn}_j)$
- $\beta_1$ is the fixed intercept
- $\beta_2$ and $\beta_3$ are fixed regression coefficients
- $\zeta_j$ is a random intercept, $\zeta_j \sim N(0,\psi)$
- Sometimes write $x_{ij}'\beta \equiv \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j$

Maximum likelihood estimates of model parameters

                        Null model       Full model
Param.    Covariate     Est (SE)         Est (SE)         OR (95% CI)
β1                      -0.71 (0.10)     -4.79 (0.43)
10β2      [dv/10]                         0.18 (0.03)     1.2 (1.1, 1.3)
10β3      [mn/10]                         0.89 (0.09)     2.4 (2.1, 2.9)
ψ                        0.82             0.28

Conditional odds ratio associated with a 10-point increase in $\mathrm{dv}_{ij}$, for given $\mathrm{mn}_j$ and $\zeta_j$:

$$\frac{\exp(\beta_1)\exp(\beta_2)^{a+10}\exp(\beta_3)^{\mathrm{mn}_j}\exp(\zeta_j)}{\exp(\beta_1)\exp(\beta_2)^{a}\exp(\beta_3)^{\mathrm{mn}_j}\exp(\zeta_j)} = \exp(\beta_2)^{10} = \exp(10\beta_2)$$

- Comparing two students from the same school (given $\mathrm{mn}_j$ and $\zeta_j$) whose SES differs by 10 points, the student with higher SES has 1.2 times the odds of being proficient as the other student: the odds are 20% greater

Conditional odds ratio associated with a 10-point increase in $\mathrm{mn}_j$, for given $\mathrm{dv}_{ij}$ and $\zeta_j$:

- Comparing two schools that differ in their mean SES by 10 points and have the same random intercept, the odds of a student (with SES at the school mean) being proficient are 2.4 times as great for the higher-SES school

Predicted conditional probabilities $\hat{p}(x_{ij},\zeta_j)$

Conditional probability with estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ plugged in:

$$\hat{p}(x_{ij},\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid x_{ij},\zeta_j) = \frac{\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}{1+\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}$$

Plug in interesting values of $x_{ij}$ and $\zeta_j$, e.g., $\zeta_j = 0$ (mean & median):

$\hat{p}(\text{lo}, 0) = 0.18$, $\hat{p}(\text{me}, 0) = 0.32$, $\hat{p}(\text{hi}, 0) = 0.49$

[Figure, left: $\hat{p}(x_{ij}, 0)$ versus school mean SES (mn) for dv = -11, 0, 11. Right: $\hat{p}(x_{ij}, 0)$ versus deviation of SES from school mean (dv) for mn = 39, 45, 51]

Median odds ratio

Model for conditional odds:

$$\operatorname{Odds}(y_{ij}=1 \mid x_{ij},\zeta_j) = \exp(\beta_1)\exp(\beta_2)^{\mathrm{dv}_{ij}}\exp(\beta_3)^{\mathrm{mn}_j}\exp(\zeta_j)$$

Randomly sample schools, then students (for given x).

Odds ratio for two students $i$ and $i'$ from different schools $j$ and $j'$:

$$\frac{\operatorname{Odds}(y_{ij}=1 \mid x,\zeta_j)}{\operatorname{Odds}(y_{i'j'}=1 \mid x,\zeta_{j'})} = \exp(\zeta_j - \zeta_{j'})$$

Odds ratio comparing the student with greater odds to the other student:

$$\mathrm{OR} = \exp(|\zeta_j - \zeta_{j'}|), \qquad (\zeta_j - \zeta_{j'}) \sim N(0, 2\psi)$$

The odds ratio $\exp(|\zeta_j - \zeta_{j'}|)$ varies randomly and has estimated median

$$\mathrm{OR}_{\text{median}} = \exp\left\{\sqrt{2\hat\psi}\,\Phi^{-1}(3/4)\right\}$$

[Larsen et al., 2000]
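The plug-in probabilities and the median odds ratio can be checked with a few lines of code. The sketch below is only an illustration, not the Stata code used for the slides: the function names are hypothetical, and the coefficients are the rounded estimates from the table (with 10β2 = 0.18 and 10β3 = 0.89 converted to per-point slopes), so results may differ from the slides in the last digit.

```python
from math import exp, sqrt
from statistics import NormalDist

# Rounded full-model estimates from the slides (assumed inputs for this sketch)
B1, B2, B3 = -4.79, 0.018, 0.089


def conditional_prob(dv: float, mn: float, zeta: float = 0.0) -> float:
    """Plug-in conditional probability p(x, zeta) for the full model."""
    eta = B1 + B2 * dv + B3 * mn + zeta
    return 1.0 / (1.0 + exp(-eta))


def median_odds_ratio(psi: float) -> float:
    """Median of exp(|zeta_j - zeta_j'|) when the difference is N(0, 2*psi)."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return exp(sqrt(2.0 * psi) * z75)


if __name__ == "__main__":
    for label, dv, mn in [("lo", -11, 39), ("me", 0, 45), ("hi", 11, 51)]:
        print(f"p({label}, 0) = {conditional_prob(dv, mn):.2f}")
    for label, psi in [("null", 0.82), ("full", 0.28)]:
        print(f"OR_median ({label} model) = {median_odds_ratio(psi):.2f}")
```

With these rounded inputs the median odds ratios should come out at roughly 2.4 and 1.7 for the null and full models.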

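Median odds ratio (cont'd)

For two randomly drawn students from two randomly drawn schools, the estimated median odds ratio, comparing the student from the school with the larger random intercept to the other student, is

- $\mathrm{OR}_{\text{median}} = 2.4$ for the null model
- $\mathrm{OR}_{\text{median}} = 1.7$ for the full model

In the null model, the odds ratio due to the random intercept exceeds 2.4 half the time. For comparison, in the full model the estimated odds ratio for a 10-point increase in school-mean SES is 2.4.

Latent response formulation

Imagine a latent (unobserved) continuous response $y^*_{ij}$ (e.g., reading ability). The observed response is 1 if the latent response is greater than 0 (e.g., proficient if ability is greater than 0):

$$y_{ij} = \begin{cases} 1 & \text{if } y^*_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

Model for latent response:

$$y^*_{ij} = \underbrace{\beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j}_{x_{ij}'\beta} + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0,\psi), \quad \epsilon_{ij} \sim \text{logistic}$$

The logistic distribution has mean 0 and variance $\pi^2/3$ and is similar to the normal distribution.

Model for observed response:

$$\operatorname{logit}[\Pr(y_{ij}=1 \mid x_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j, \qquad \zeta_j \sim N(0,\psi)$$

Equivalence of formulations

[Figure: latent response $y^*_{ij}$ with threshold at 0.0 and location $x_{ij}'\beta + \zeta_j$, illustrating $\Pr(y_{ij}=1 \mid x_{ij},\zeta_j)$]

Intraclass correlation of latent responses

The model for $y^*_{ij}$ is a standard multilevel/hierarchical linear model, therefore use the standard ICC:

$$\mathrm{ICC}_{\text{lat}} = \frac{\operatorname{Var}(\zeta_j)}{\operatorname{Var}(\zeta_j)+\operatorname{Var}(\epsilon_{ij})} = \frac{\psi}{\psi + \pi^2/3}$$

- Correlation between students $i$ and $i'$ in the same school $j$:

$$\operatorname{Cor}(y^*_{ij}, y^*_{i'j} \mid x_{ij}, x_{i'j}) = \operatorname{Cor}(\zeta_j + \epsilon_{ij}, \zeta_j + \epsilon_{i'j}) = \frac{\psi}{\psi + \pi^2/3}$$

- Proportion of variance that is between schools:

$$\operatorname{Var}(y^*_{ij} \mid x_{ij}) = \operatorname{Var}(\zeta_j + \epsilon_{ij}) = \operatorname{Var}(\zeta_j)+\operatorname{Var}(\epsilon_{ij}) = \psi + \pi^2/3$$

For PISA data, plug in $\hat\psi$:

$$\mathrm{ICC}_{\text{lat}} = \frac{0.82}{0.82 + 3.29} = 0.20 \text{ for the null model}, \qquad \mathrm{ICC}_{\text{lat}} = \frac{0.28}{0.28 + 3.29} = 0.08 \text{ for the full model}$$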
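The latent-response ICC is a one-line calculation once the level-1 variance is fixed at π²/3. A minimal sketch follows, using the rounded estimates of ψ quoted on the slides; the function name is hypothetical.

```python
from math import pi

LOGISTIC_VARIANCE = pi ** 2 / 3  # variance of the standard logistic, about 3.29


def icc_latent(psi: float) -> float:
    """Share of latent-response variance that lies between clusters."""
    return psi / (psi + LOGISTIC_VARIANCE)


if __name__ == "__main__":
    print(f"null model: {icc_latent(0.82):.2f}")  # psi = 0.82
    print(f"full model: {icc_latent(0.28):.2f}")  # psi = 0.28
```

Because the level-1 variance is a constant, $\mathrm{ICC}_{\text{lat}}$ depends on the model only through ψ and does not vary with the covariates.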

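Intraclass correlation of observed responses

The correlation $\operatorname{Cor}(y_{ij}, y_{i'j} \mid x_{ij}, x_{i'j})$ of observed responses for students $i$ and $i'$ in the same school $j$ depends on the covariates $x_{ij}$, $x_{i'j}$. For simplicity, assume $x_{ij} = x_{i'j} = x$:

$$\mathrm{ICC}_{\text{obs}} = \widehat{\operatorname{Cor}}(y_{ij}, y_{i'j} \mid x) = \frac{\bar{p}_{11}(x) - \bar{p}(x)^2}{\bar{p}(x)[1 - \bar{p}(x)]}$$

Among schools and students with covariates x, randomly choose a school and then randomly choose students from the school:

- $\bar{p}(x)$ is the probability that a student is proficient
- $\bar{p}_{11}(x)$ is the probability that two students are both proficient

Population-averaged or marginal probability:

$$\bar{p}(x) \equiv \widehat{\Pr}(y_{ij}=1 \mid x) = \int \hat{p}(x,\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j$$

Average over the random effects with Gaussian density $g(\zeta_j; 0, \hat\psi)$.

[Rodríguez & Elo, 2003]

Intraclass correlation of observed responses (cont'd)

PISA:

             Null model    Full model
                           lo      me      hi
ICC_obs      0.14          0.04    0.06    0.06
ICC_lat      0.20          0.08    0.08    0.08

[Figure: ICC (vertical axis, 0 to 0.8) versus random-intercept standard deviation $\sqrt{\psi}$ (0 to 5), showing observed and latent ICCs for the model $\operatorname{logit}[\Pr(y_{ij}=1 \mid \zeta_j)] = \beta_1 + \zeta_j$, $\zeta_j \sim N(0,\psi)$, with $\mathrm{ICC}_{\text{obs}}$ curves for $\beta_1 = 0$ to $\beta_1 = 3.7$]

- $\mathrm{ICC}_{\text{obs}} < \mathrm{ICC}_{\text{lat}}$
- $\mathrm{ICC}_{\text{obs}}$ is largest for $\beta_1 = 0$

[Rodríguez & Elo, 2003]

Kappa coefficient of observed responses

Recall

$$\mathrm{ICC}_{\text{obs}} = \widehat{\operatorname{Cor}}(y_{ij}, y_{i'j} \mid x) = \frac{\bar{p}_{11}(x) - \bar{p}(x)^2}{\bar{p}(x)[1 - \bar{p}(x)]}$$

Probabilities of agreement for students in the same school:

$$\Pr(y_{ij} = y_{i'j} = 1 \mid x) \equiv \bar{p}_{11}(x) = \mathrm{ICC}_{\text{obs}}\,\bar{p}(x)[1-\bar{p}(x)] + \bar{p}(x)^2$$

$$\Pr(y_{ij} = y_{i'j} = 0 \mid x) \equiv \bar{p}_{00}(x) = \mathrm{ICC}_{\text{obs}}\,\bar{p}(x)[1-\bar{p}(x)] + [1-\bar{p}(x)]^2$$

$$\Pr(y_{ij} = y_{i'j} \mid x) = \mathrm{ICC}_{\text{obs}}\, 2\bar{p}(x)[1-\bar{p}(x)] + 1 - 2\bar{p}(x)[1-\bar{p}(x)]$$

Probabilities of agreement for students in different schools:

$$\Pr(y_{ij} = y_{i'j'} = 1 \mid x) = \bar{p}(x)^2$$

$$\Pr(y_{ij} = y_{i'j'} = 0 \mid x) = [1-\bar{p}(x)]^2 = 1 - 2\bar{p}(x) + \bar{p}(x)^2$$

$$\Pr(y_{ij} = y_{i'j'} \mid x) = 1 - 2\bar{p}(x)[1-\bar{p}(x)]$$

Kappa compares agreement within schools to agreement between schools:

$$\kappa = \frac{\Pr(y_{ij} = y_{i'j} \mid x) - \Pr(y_{ij} = y_{i'j'} \mid x)}{1 - \Pr(y_{ij} = y_{i'j'} \mid x)} = \frac{\mathrm{ICC}_{\text{obs}}\, 2\bar{p}(x)[1-\bar{p}(x)]}{2\bar{p}(x)[1-\bar{p}(x)]} = \mathrm{ICC}_{\text{obs}}$$

Standard deviation of probabilities

Already considered the median of $\hat{p}(x,\zeta_j)$:

$$\operatorname{Median}[\hat{p}(x,\zeta_j)] = \hat{p}(x, 0)$$

Already considered the mean of $\hat{p}(x,\zeta_j)$:

$$\bar{p}(x) \equiv \widehat{\Pr}(y_{ij}=1 \mid x) = \int \hat{p}(x,\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j$$

Get the standard deviation of $\hat{p}(x,\zeta_j)$ by integration or by simulation:

$$\operatorname{sd}[\hat{p}(x,\zeta_j)] = \sqrt{\operatorname{Var}[\hat{p}(x,\zeta_j)]}$$

                           Null model    Full model
                                         lo      me      hi
sd[p(x, ζ_j)]              0.18          0.08    0.11    0.12

[Eldridge et al., 2009]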
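The marginal probability, the standard deviation of the conditional probabilities and $\mathrm{ICC}_{\text{obs}}$ all come from the same two integrals over the random-intercept distribution. The sketch below approximates them with a plain trapezoid rule on a wide grid; this is an assumption made for illustration (the slides used adaptive quadrature in Stata), and the function names are hypothetical. Inputs are the rounded null-model estimates.

```python
from math import exp, pi, sqrt


def expit(a: float) -> float:
    return 1.0 / (1.0 + exp(-a))


def marginal_moments(linpred: float, psi: float, n: int = 4001, width: float = 8.0):
    """Trapezoid-rule estimates of E[p] and E[p^2] for p = expit(linpred + zeta),
    zeta ~ N(0, psi), integrating over +/- `width` standard deviations."""
    sd = sqrt(psi)
    step = 2.0 * width * sd / (n - 1)
    m1 = m2 = 0.0
    for k in range(n):
        zeta = -width * sd + k * step
        weight = step * exp(-zeta * zeta / (2.0 * psi)) / sqrt(2.0 * pi * psi)
        if k in (0, n - 1):
            weight *= 0.5  # trapezoid end-point correction
        p = expit(linpred + zeta)
        m1 += weight * p
        m2 += weight * p * p
    return m1, m2


def summarize(linpred: float, psi: float):
    """Return (marginal probability, sd of conditional probabilities, ICC_obs)."""
    p_bar, p11 = marginal_moments(linpred, psi)  # E[p^2] is Pr(both proficient)
    var_p = p11 - p_bar ** 2
    return p_bar, sqrt(var_p), var_p / (p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    p_bar, sd_p, icc_obs = summarize(linpred=-0.71, psi=0.82)  # null model
    print(f"p_bar = {p_bar:.3f}, sd = {sd_p:.3f}, ICC_obs = {icc_obs:.3f}")
```

Because the inputs are rounded, the output should be close to, but not necessarily identical with, the null-model values reported on the slides.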

Variance partition coefficient

General result:

$$\underbrace{\operatorname{Var}(\text{responses})}_{\text{Total variance}} = \underbrace{\operatorname{Var}(\text{School-mean response})}_{\text{Between-school variance } v_2} + \underbrace{\operatorname{Mean}(\text{Within-school variance})}_{\text{Within-school variance } v_1}$$

For school j:

- School-mean response: $\operatorname{Mean}(y_{ij} \mid x,\zeta_j) = \hat{p}(x,\zeta_j)$. Find its variance over the distribution of $\zeta_j$
- Within-school variance: $\operatorname{Var}(y_{ij} \mid x,\zeta_j) = \hat{p}(x,\zeta_j)[1 - \hat{p}(x,\zeta_j)]$. Find its mean over the distribution of $\zeta_j$
- Total variance: $\operatorname{Var}(y_{ij} \mid x_{ij}) = v_2 + v_1 = \bar{p}(x_{ij})[1 - \bar{p}(x_{ij})]$

Variance partition coefficient [Goldstein et al., 2002, Simulation, Method B]:

$$\mathrm{VPC} = \frac{v_2}{v_2 + v_1} = \mathrm{ICC}_{\text{obs}} \neq \mathrm{ICC}_{\text{lat}}$$

All results for PISA

              x     mn    dv    p(x,0)   p̄(x)    sd[p(x,ζ_j)]   ICC_lat   ICC_obs   VPC
Null model                      0.331    0.354   0.180          0.200     0.143     0.143
Full model    lo    39   -11    0.181    0.193   0.081          0.078     0.042     0.042
              me    45     0    0.315    0.333   0.111          0.078     0.056     0.056
              hi    51    11    0.491    0.491   0.124          0.078     0.062     0.062

Density of probabilities

Conditional probabilities, conditional on $\zeta_j$, depend on $\zeta_j$:

$$\hat{p}(x,\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid x_{ij} = x,\zeta_j)$$

When $\zeta_j$ varies, the corresponding probability $p \equiv \hat{p}(x,\zeta_j)$ varies with density

$$f(p) = g\left(\log[p/(1-p)];\; x'\hat\beta,\; \hat\psi\right) \frac{1}{p(1-p)}$$

[Figure, left: density of the conditional probability for the null model, with median and mean marked. Right: densities of the conditional probability for the full model at covariate patterns lo, me and hi]

[Duchateau & Janssen, 2005]

Percentiles of probabilities

- A given percentile of $\hat{p}(x,\zeta_j)$, for given x, can be found by plugging in the corresponding percentile of $\zeta_j \sim N(0, \hat\psi)$
- Left figure: $\hat{p}(x, \pm 1.96\sqrt{\hat\psi})$ and $\hat{p}(x, 0)$ versus mn with dv = 0
- Randomly sample schools with a given mn: the interval is the 95% range of probabilities for students with dv = 0

[Figure, left: median and 2.5th to 97.5th percentile band of the conditional probability versus school mean SES (mn). Right: 25th to 75th percentile bands versus deviation of SES from school mean (dv) for mn = 39 and mn = 51]
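Both the density and the percentiles of the conditional probabilities follow directly from the normal distribution of the random intercept. A minimal sketch is given below, assuming the rounded full-model estimates (linear predictor -4.79 + 0.089 × 45 for mn = 45, dv = 0, and ψ = 0.28); the function names are hypothetical.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def expit(a: float) -> float:
    return 1.0 / (1.0 + exp(-a))


def prob_percentile(linpred: float, psi: float, q: float) -> float:
    """q-th quantile of p(x, zeta): plug in the q-th quantile of zeta ~ N(0, psi)."""
    return expit(linpred + sqrt(psi) * NormalDist().inv_cdf(q))


def prob_density(p: float, linpred: float, psi: float) -> float:
    """Density of p = expit(linpred + zeta): normal density of logit(p),
    centred at linpred with variance psi, times the Jacobian 1 / (p (1 - p))."""
    z = log(p / (1.0 - p)) - linpred
    return exp(-z * z / (2.0 * psi)) / sqrt(2.0 * pi * psi) / (p * (1.0 - p))


if __name__ == "__main__":
    linpred, psi = -4.79 + 0.089 * 45, 0.28  # mn = 45, dv = 0 (rounded estimates)
    for q in (0.025, 0.25, 0.5, 0.75, 0.975):
        print(f"{100 * q:5.1f}th percentile: {prob_percentile(linpred, psi, q):.3f}")
    # Midpoint-rule check that the density integrates to about 1 over (0, 1)
    n = 20000
    total = sum(prob_density((k + 0.5) / n, linpred, psi) for k in range(n)) / n
    print(f"integral of density = {total:.4f}")
```

The percentile approach works because the inverse logit is monotone in $\zeta_j$, so quantiles of $\zeta_j$ map directly to quantiles of the probability.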

Probabilities for schools in the data

Probability $\tilde{p}(x)$ of proficiency for a randomly sampled student with covariates x in an existing school j.

Data on students in school j:

$$y_j = (y_{1j}, \ldots, y_{n_j j})', \qquad X_j = \begin{bmatrix} x_{1j}' \\ \vdots \\ x_{n_j j}' \end{bmatrix}$$

Information about $\zeta_j$ (Bayes theorem):

$$\operatorname{Posterior}(\zeta_j \mid y_j, X_j) = \frac{g(\zeta_j; 0, \hat\psi)\,\Pr(y_j \mid X_j,\zeta_j)}{\Pr(y_j \mid X_j)}$$

The best prediction of the probability (in terms of mean-squared error) is

$$\tilde{p}(x) = \text{Posterior mean}[\hat{p}(x,\zeta_j)] = \int \hat{p}(x,\zeta_j)\,\operatorname{Posterior}(\zeta_j \mid y_j, X_j)\, d\zeta_j$$

This is not equal to $\hat{p}(x, \tilde\zeta_j)$, where $\tilde\zeta_j$ is the posterior mean of $\zeta_j$.

[Skrondal & Rabe-Hesketh, 2009]

Probabilities for schools in the data (cont'd)

[Figure, left: predicted probabilities for all schools, labeled by school identifier, at $x = (0, \mathrm{mn}_j)$, versus school mean SES (mn), with the median and the 2.5th to 97.5th percentile band. Right: predicted probabilities for 19 randomly chosen schools at $x = x_{ij}$, versus student SES (mn+dv)]

Probabilities for schools in the data (cont'd)

- In the previous graphs, $\tilde{p}(x)$ varies between schools for a given value of dv because $\mathrm{mn}_j$ and $\zeta_j$ vary
- Isolate the variability due to $\zeta_j$ by setting $\mathrm{mn}_j = 45$ and $\mathrm{dv}_{ij} = 0$ (boxplot on right)

[Figure: boxplots of predicted probabilities across schools, for dv = 0 (left) and for mn = 45, dv = 0 (right)]

Three-level ordinal cumulative logit model

Tennessee class-size experiment

- In 4th grade, teachers assessed student participation
- 2217 students i of 262 teachers j in 75 schools k
- Ordinal response variable $y_{ijk}$: teacher's rating of how frequently the student pays attention
- Rated as: 1 (never), 2, 3 (sometimes), 4, 5 (always)
- No covariates for simplicity
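Returning to the two-level PISA model, the posterior-mean prediction $\tilde{p}(x)$ for a school in the data can be sketched with a simple grid approximation to the posterior of $\zeta_j$. Everything specific here is an assumption for illustration: the school's responses are made up (10 students at mn = 45, dv = 0, of whom 7 are proficient), the coefficients are the rounded full-model estimates, the function name is hypothetical, and a fixed grid stands in for the adaptive quadrature used in the slides.

```python
from math import exp, sqrt


def expit(a: float) -> float:
    return 1.0 / (1.0 + exp(-a))


def posterior_mean_prob(new_linpred, linpreds, ys, psi, n=2001, width=8.0):
    """Posterior mean of p(x, zeta) for one school, on an equally spaced grid.

    linpreds, ys: fixed-part linear predictors and 0/1 responses of the
    school's sampled students. Normalizing constants cancel in the ratio."""
    sd = sqrt(psi)
    step = 2.0 * width * sd / (n - 1)
    num = den = 0.0
    for k in range(n):
        zeta = -width * sd + k * step
        # Unnormalized posterior: prior density times Bernoulli likelihood
        weight = exp(-zeta * zeta / (2.0 * psi))
        for eta, y in zip(linpreds, ys):
            p = expit(eta + zeta)
            weight *= p if y == 1 else 1.0 - p
        num += weight * expit(new_linpred + zeta)
        den += weight
    return num / den


if __name__ == "__main__":
    eta = -4.79 + 0.089 * 45            # mn = 45, dv = 0 (rounded estimates)
    ys = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # hypothetical school: 7 of 10 proficient
    p_tilde = posterior_mean_prob(eta, [eta] * len(ys), ys, psi=0.28)
    p_prior = posterior_mean_prob(eta, [], [], psi=0.28)  # no data: prior mean
    print(f"posterior mean = {p_tilde:.3f}, prior (marginal) mean = {p_prior:.3f}")
```

With no data the posterior equals the prior, so the function returns the population-averaged probability; observed responses pull the prediction toward the school's own proportion, with the amount of shrinkage governed by ψ and the school's sample size.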

Three-level ordinal cumulative logit model: OR_median

Model the probability of exceeding a category $s = 1, 2, 3, 4$:

$$\operatorname{logit}[\Pr(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_k)] = \kappa_s + \zeta^{(2)}_{jk} + \zeta^{(3)}_k$$

- Teacher random intercept $\zeta^{(2)}_{jk} \sim N(0, \psi^{(2)})$, $\hat\psi^{(2)} = 0.44$
- School random intercept $\zeta^{(3)}_k \sim N(0, \psi^{(3)})$, $\hat\psi^{(3)} = 0.12$

Odds ratio comparing students of teachers $j$ and $j'$ in school $k$:

$$\frac{\operatorname{Odds}(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_k)}{\operatorname{Odds}(y_{i'j'k} > s \mid \zeta^{(2)}_{j'k}, \zeta^{(3)}_k)} = \exp(\zeta^{(2)}_{jk} - \zeta^{(2)}_{j'k})$$

Median of the odds ratio, comparing the student with larger odds to the other student:

                                        OR_median
Different teacher, same school          1.88
Different teacher, different school     2.04

Three-level ordinal cumulative logit model: ICC_lat

Latent response $y^*_{ijk}$:

$$y^*_{ijk} = \zeta^{(2)}_{jk} + \zeta^{(3)}_k + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim \text{logistic}$$

The observed response $y_{ijk}$ results from cutting up $y^*_{ijk}$ at cut-points or thresholds $\kappa_1$ to $\kappa_4$.

[Figure: latent response $y^*_{ijk}$ axis divided by thresholds $\kappa_1$ to $\kappa_4$ into observed categories 1 to 5]

Intraclass correlations of latent responses

Same school, different teacher:

$$\mathrm{ICC}_{\text{lat}}(\text{school}) = \frac{\hat\psi^{(3)}}{\hat\psi^{(2)} + \hat\psi^{(3)} + \pi^2/3} = 0.031$$

Same school, same teacher:

$$\mathrm{ICC}_{\text{lat}}(\text{teacher, school}) = \frac{\hat\psi^{(2)} + \hat\psi^{(3)}}{\hat\psi^{(2)} + \hat\psi^{(3)} + \pi^2/3} = 0.145$$

Three-level ordinal cumulative logit model: ICC_obs

Two randomly chosen students $i$, $i'$ from the same school (either different or same teachers).

Pearson correlation, $\operatorname{Cor}(y_{ijk}, y_{i'j'k})$ or $\operatorname{Cor}(y_{ijk}, y_{i'jk})$, or Kendall's $\tau_b$:

                                              ICC_obs
                                  ICC_lat     Cor       τ_b
Same school, different teacher    0.031       0.029     0.025
Same school, same teacher         0.145       0.136     0.118

Method: 5 × 5 table of the response for student $i$ (rows) versus the response for student $i'$ (columns):

$$\pi_{st}(\text{school}) = \Pr(y_{ijk} = s,\; y_{i'j'k} = t)$$

$$\pi_{st}(\text{teacher, school}) = \Pr(y_{ijk} = s,\; y_{i'jk} = t)$$

Use these probabilities as frequency weights for Cor and $\tau_b$.

Probabilities for teachers and schools in the data

Probability that a student pays attention more than sometimes ($y > 3$), for a randomly chosen student of:

- Teachers in the data: integrate over the posterior of $\zeta^{(2)}_{jk}$, $\zeta^{(3)}_k$
- Schools in the data (new teacher): integrate over the prior of $\zeta^{(2)}_{jk}$ and the posterior of $\zeta^{(3)}_k$

[Figure: predicted probabilities for teachers in the data and for schools in the data (new teacher); vertical axes: Probability]
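The three-level median odds ratios and latent ICCs extend the two-level formulas: the median odds ratio uses the sum of the variance components that differ between the two students, and the ICC uses the sum of the components they share. A short sketch under those assumptions, with the rounded estimates $\hat\psi^{(2)} = 0.44$ and $\hat\psi^{(3)} = 0.12$ and hypothetical function names:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

PSI_TEACHER = 0.44  # level-2 (teacher) variance, rounded estimate from the slides
PSI_SCHOOL = 0.12   # level-3 (school) variance, rounded estimate from the slides


def median_odds_ratio(psi_not_shared: float) -> float:
    """Median OR when the random parts that differ have total variance psi_not_shared."""
    return exp(sqrt(2.0 * psi_not_shared) * NormalDist().inv_cdf(0.75))


def icc_latent(psi_shared: float, psi_total: float) -> float:
    """Latent ICC: shared random-effect variance over total latent variance."""
    return psi_shared / (psi_total + pi ** 2 / 3)


if __name__ == "__main__":
    total = PSI_TEACHER + PSI_SCHOOL
    print(f"OR_median, same school, different teacher:   {median_odds_ratio(PSI_TEACHER):.2f}")
    print(f"OR_median, different school and teacher:     {median_odds_ratio(total):.2f}")
    print(f"ICC_lat, same school, different teacher:     {icc_latent(PSI_SCHOOL, total):.3f}")
    print(f"ICC_lat, same school, same teacher:          {icc_latent(total, total):.3f}")
```

Students with different teachers in the same school share only the school intercept, which is why the school variance alone appears in the numerator of the first ICC.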

Software

- Stata was used for the results and graphs
- Datasets and Stata commands are available from http://www.gllamm.org/pres.html
- Models were estimated by ML with adaptive quadrature [Rabe-Hesketh et al., 2005]
- Commands for binary logit models:
    - xtlogit, xtmelogit
    - xtrho [Rodríguez & Elo, 2003] for ICC_obs = VPC
- Command for binary, ordinal, or multinomial logit models (and other response types):
    - gllamm [Rabe-Hesketh et al., 2005]
    - gllapred used to obtain $\bar{p}(x)$, $\tilde{p}(x)$, $p_{st}(x)$, etc. [Rabe-Hesketh & Skrondal, 2009]

Discussion

- Have discussed measures and graphs for interpreting the variability implied by a model with estimated parameters
    - Ignored parameter uncertainty (no SEs or CIs)
    - Did not consider variance explained by covariates
- Extensions
    - Include a random coefficient of $x_{ij}$: for $\bar{p}(x)$, $\tilde{p}(x)$, and ICC_obs, integrate over all random effects; ICC_lat and OR_median now depend on $x_{ij}$
    - More levels: straightforward
    - Relax the normality assumption for random effects: non-parametric maximum likelihood estimation (NPMLE)
- Other response types
    - Nominal responses: similar to binary responses
    - Counts: no ICC_lat, but can obtain IRR_median and ICC_obs

References

Duchateau, L. & Janssen, P. (2005). Understanding heterogeneity in generalized linear mixed and frailty models. The American Statistician 59, 143-146.
Eldridge, S. M., et al. (2009). The intra-cluster correlation coefficient in cluster randomized trials: A review of definitions. International Statistical Review 77, 378-394.
Goldstein, H., et al. (2002). Partitioning variation in multilevel models. Understanding Statistics 1, 223-331.
Larsen, K., et al. (2000). Interpreting parameters in the logistic regression model with random effects. Biometrics 56, 909-914.
Rabe-Hesketh, S. & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Third Edition). College Station, TX: Stata Press.
Rabe-Hesketh, S. & Skrondal, A. (in prep). Understanding variability in multilevel models for categorical responses.
Rabe-Hesketh, S., et al. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 128, 301-323.
Skrondal, A. & Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A 172, 659-687.
Rodríguez, G. & Elo, I. (2003). Intra-class correlation in random-effects models for binary data. The Stata Journal 3, 32-46.