Understanding Variability in Multilevel Models for Categorical Responses. Sophia Rabe-Hesketh. Outline

Transcription

Sophia Rabe-Hesketh
University of California, Berkeley & Institute of Education, University of London
Joint work with Anders Skrondal
Hierarchical Linear Modeling SIG, AERA Annual Meeting, Vancouver, April 2012

Outline

- Two-level logistic random-intercept model (1: students, 2: schools)
- Measures of residual variability and dependence
  - Median odds ratio
  - Intraclass correlation of latent responses
  - Intraclass correlation of observed responses
  - Standard deviation of probabilities
  - Variance partition coefficient
- Graphical displays of variability
  - Densities of probabilities
  - Percentiles of probabilities
  - Probabilities for schools in the data
- Ordinal responses and three-level data (1: students, 2: teachers, 3: schools)

Example: PISA (Programme for International Student Assessment)

- The PISA study assesses reading, math and science achievement among 15-year-old students in tens of countries every 3 years
- Here consider U.S. data from 2000
- Schools j were randomly sampled, and then students i were randomly sampled from the selected schools
- Variables:
  - Reading proficiency $y_{ij}$ (1 = yes, 0 = no)
  - Student SES $x_{ij}$
  - mn: school mean SES $\bar{x}_j$
  - dv: deviation of student SES from school mean, $x_{ij} - \bar{x}_j$
- Sample size (listwise): 2069 students in 148 schools
- Number of students per school: 1 to 28, mean/median 14

PISA: Distribution of SES

[Figure: student SES (mn+dv) against school mean SES (mn), showing min to max and P25 to P75]

- For schools, mn: 25th and 75th percentiles 39 and 51; mean and median 45
- For students, dv: 25th and 75th percentiles -11 and 11; mean and median 0
- Three covariate patterns $\mathbf{x}_{ij}$ (mn, dv, ses): lo, me, hi

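Two-level logistic random-intercept model

Two-level logistic random-intercept model for unit i (level 1) nested in cluster j (level 2):

\[
\operatorname{logit}[\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)]
= \log\underbrace{\left[\frac{\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)}{\Pr(y_{ij}=0 \mid \mathbf{x}_{ij},\zeta_j)}\right]}_{\text{Odds}}
= \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j
\]

- $\mathbf{x}_{ij}$ is the vector of covariates $(\mathrm{dv}_{ij}, \mathrm{mn}_j)$
- $\beta_1$ is the fixed intercept
- $\beta_2$ and $\beta_3$ are fixed regression coefficients
- $\zeta_j$ is a random intercept, $\zeta_j \sim N(0,\psi)$
- Sometimes write $\mathbf{x}_{ij}'\boldsymbol{\beta} \equiv \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j$

Maximum likelihood estimates of model parameters

Param.        Covariate   Null model      Full model
                          Est (SE)        Est (SE)         OR (95% CI)
$\beta_1$                      (0.10)     -4.79 (0.43)
$10\beta_2$   dv/10                             (0.03)     1.2 (1.1, 1.3)
$10\beta_3$   mn/10                             (0.09)     2.4 (2.1, 2.9)
$\psi$

\[
\log[\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j
\]
\[
\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j) = \exp(\beta_1)\,\exp(\beta_2)^{\mathrm{dv}_{ij}}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)
\]

- Conditional odds ratio associated with a 10-point increase in $\mathrm{dv}_{ij}$, for given $\mathrm{mn}_j$ and $\zeta_j$:

\[
\frac{\exp(\beta_1)\,\exp(\beta_2)^{a+10}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)}{\exp(\beta_1)\,\exp(\beta_2)^{a}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)} = \exp(\beta_2)^{10} = \exp(10\beta_2)
\]

  Comparing two students from the same school (given $\mathrm{mn}_j$ and $\zeta_j$) whose SES differs by 10 points, the student with higher SES has 1.2 times the odds of being proficient as the other student (odds are 20% greater)
- Conditional odds ratio associated with a 10-point increase in $\mathrm{mn}_j$, for given $\mathrm{dv}_{ij}$ and $\zeta_j$:
  Comparing two schools that differ in their mean SES by 10 points and have the same random intercept, the odds of a student (with SES at the school mean) being proficient are 2.4 times as great for the higher-SES school

Predicted conditional probabilities $\hat{p}(\mathbf{x}_{ij},\zeta_j)$

Conditional probability with estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ plugged in:

\[
\hat{p}(\mathbf{x}_{ij},\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)
= \frac{\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}{1+\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}
\]

- Plug in interesting values of $\mathbf{x}_{ij}$ and $\zeta_j$, e.g., $\zeta_j = 0$ (mean & median):
  $\hat{p}(\text{lo}, 0) = 0.18$, $\hat{p}(\text{me}, 0) = 0.32$, $\hat{p}(\text{hi}, 0) =$

[Figure: predicted conditional probabilities against school mean SES (mn), one curve per value of dv, and against deviation of SES from school mean (dv), for mn = 39, 45 and 51]

Median odds ratio

Model for conditional odds:

\[
\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j) = \exp(\beta_1)\,\exp(\beta_2)^{\mathrm{dv}_{ij}}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)
\]

- Randomly sample schools, then students (for given $\mathbf{x}$)
- Odds ratio for two students $i$ and $i'$ from different schools $j$ and $j'$:

\[
\frac{\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x},\zeta_j)}{\mathrm{Odds}(y_{i'j'}=1 \mid \mathbf{x},\zeta_{j'})} = \exp(\zeta_j - \zeta_{j'})
\]

- Odds ratio, comparing the student with greater odds to the other student:

\[
\mathrm{OR} = \exp(|\zeta_j - \zeta_{j'}|), \qquad (\zeta_j - \zeta_{j'}) \sim N(0, 2\psi)
\]

- The odds ratio $\exp(|\zeta_j - \zeta_{j'}|)$ varies randomly and has estimated median [Larsen et al., 2000]

\[
\mathrm{OR}_{\text{median}} = \exp\left\{\sqrt{2\hat\psi}\,\Phi^{-1}(3/4)\right\}
\]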
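The median odds ratio formula above can be evaluated directly. The following is a minimal sketch, not the authors' Stata code; the function name `median_odds_ratio` is hypothetical, and the input ψ = 0.82 is an assumption taken from the null-model value used in the latent-response ICC calculation of this talk. It gives a value of about 2.4, in line with the null-model figure reported in the talk.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_odds_ratio(psi: float) -> float:
    """Median of exp(|zeta_j - zeta_j'|) when zeta_j - zeta_j' ~ N(0, 2*psi)."""
    # |N(0, 2*psi)| has median sqrt(2*psi) * Phi^{-1}(3/4), and exp() is
    # monotone, so the median of the odds ratio is exp() of that value.
    return exp(sqrt(2.0 * psi) * NormalDist().inv_cdf(0.75))


# Assumed input: psi = 0.82 (null-model random-intercept variance).
or_median_null = median_odds_ratio(0.82)
print(f"Median odds ratio, null model: {or_median_null:.2f}")
```

Because the result depends on the model only through ψ, the same function applies to any random-intercept logistic model once an estimate of ψ is available.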

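Median odds ratio (cont'd)

- For two randomly drawn students from two randomly drawn schools, the estimated median odds ratio, comparing the student from the school with the larger random intercept to the other student, is
  - $\mathrm{OR}_{\text{median}} = 2.4$ for the null model
  - $\mathrm{OR}_{\text{median}} = 1.7$ for the full model
- In the null model, the odds ratio due to the random intercept exceeds 2.4 half the time
- In the full model, the estimated odds ratio for a 10-point increase in school-mean SES is 2.4

Latent response formulation

- Imagine a latent (unobserved) continuous response $y^*_{ij}$ (e.g., reading ability)
- The observed response is 1 if the latent response is greater than 0 (e.g., proficient if ability is greater than 0):

\[
y_{ij} = \begin{cases} 1 & \text{if } y^*_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}
\]

- Model for the latent response:

\[
y^*_{ij} = \underbrace{\beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j}_{\mathbf{x}_{ij}'\boldsymbol{\beta}} + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0,\psi), \quad \epsilon_{ij} \sim \text{logistic}
\]

- The logistic distribution has mean 0 and variance $\pi^2/3$ and is similar to the normal distribution
- Model for the observed response:

\[
\operatorname{logit}[\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j, \qquad \zeta_j \sim N(0,\psi)
\]

Equivalence of formulations

[Figure: $\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)$ and the latent response $y^*_{ij}$ plotted against $\mathbf{x}_{ij}'\boldsymbol{\beta} + \zeta_j$]

Intraclass correlation of latent responses

- The model for $y^*_{ij}$ is a standard multilevel/hierarchical linear model, therefore use the standard ICC:

\[
\mathrm{ICC}_{\text{lat}} = \frac{\operatorname{Var}(\zeta_j)}{\operatorname{Var}(\zeta_j)+\operatorname{Var}(\epsilon_{ij})} = \frac{\psi}{\psi + \pi^2/3}
\]

- Correlation between students $i$ and $i'$ in the same school $j$:

\[
\operatorname{Cor}(y^*_{ij}, y^*_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \operatorname{Cor}(\zeta_j + \epsilon_{ij}, \zeta_j + \epsilon_{i'j}) = \frac{\psi}{\psi + \pi^2/3}
\]

- Proportion of variance that is between schools:

\[
\operatorname{Var}(y^*_{ij} \mid \mathbf{x}_{ij}) = \operatorname{Var}(\zeta_j + \epsilon_{ij}) = \operatorname{Var}(\zeta_j) + \operatorname{Var}(\epsilon_{ij}) = \psi + \pi^2/3
\]

- For PISA data, plug in $\hat\psi$:
  - $\mathrm{ICC}_{\text{lat}} = \dfrac{0.82}{0.82 + \pi^2/3} = 0.20$ for the null model
  - $\mathrm{ICC}_{\text{lat}} = 0.08$ for the full model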
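As a rough check of the latent-response material above, the sketch below computes the latent ICC for the null model and simulates the threshold model to confirm that cutting a logistic latent response at 0 reproduces the logistic probability. The helper names, the seed, and the linear-predictor value -0.75 are illustrative assumptions, not values from the talk; only ψ = 0.82 comes from the slides.

```python
import random
from math import exp, log, pi


def icc_latent(psi: float) -> float:
    """ICC of latent responses: psi / (psi + pi^2 / 3)."""
    return psi / (psi + pi ** 2 / 3.0)


def simulate_threshold_probability(linear_predictor: float,
                                   n_draws: int = 200_000,
                                   seed: int = 1) -> float:
    """Share of latent responses above 0 for a fixed x'beta + zeta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:  # avoid log(0) in the inverse-CDF transform
            u = rng.random()
        epsilon = log(u / (1.0 - u))  # standard logistic draw, variance pi^2/3
        hits += (linear_predictor + epsilon) > 0.0
    return hits / n_draws


icc_null = icc_latent(0.82)          # psi = 0.82 is the null-model value
eta = -0.75                          # hypothetical value of x'beta + zeta
simulated = simulate_threshold_probability(eta)
implied = 1.0 / (1.0 + exp(-eta))    # inverse logit of the same predictor
print(f"ICC_lat = {icc_null:.2f}")
print(f"simulated = {simulated:.3f}, inverse logit = {implied:.3f}")
```

The two printed probabilities should agree up to simulation error, which is the sense in which the latent-response and logit formulations are equivalent.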

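Intraclass correlation of observed responses

- The correlation $\operatorname{Cor}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j})$ of observed responses for students $i$ and $i'$ in the same school $j$ depends on the covariates $\mathbf{x}_{ij}$, $\mathbf{x}_{i'j}$
- For simplicity, assume $\mathbf{x}_{ij} = \mathbf{x}_{i'j} = \mathbf{x}$:

\[
\mathrm{ICC}_{\text{obs}} \equiv \widehat{\operatorname{Cor}}(y_{ij}, y_{i'j} \mid \mathbf{x}) = \frac{\bar{p}_{11}(\mathbf{x}) - \bar{p}(\mathbf{x})^2}{\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]}
\]

- Among schools and students with covariates $\mathbf{x}$, randomly choose a school and then randomly choose students from the school:
  - $\bar{p}(\mathbf{x})$ is the probability that a student is proficient
  - $\bar{p}_{11}(\mathbf{x})$ is the probability that two students are both proficient
- Population-averaged or marginal probability:

\[
\bar{p}(\mathbf{x}) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}) = \int \hat{p}(\mathbf{x},\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j
\]

  Average over the random effects with Gaussian density $g(\zeta_j; 0, \hat\psi)$

Intraclass correlation of observed responses (cont'd)

- PISA: $\mathrm{ICC}_{\text{obs}}$ and $\mathrm{ICC}_{\text{lat}}$ for the null model and for the full model at lo, me, hi

[Figure: ICC against random-intercept standard deviation $\sqrt{\psi}$, observed and latent, for the model $\operatorname{logit}[\Pr(y_{ij}=1 \mid \zeta_j)] = \beta_1 + \zeta_j$, $\zeta_j \sim N(0,\psi)$; $\mathrm{ICC}_{\text{obs}}$ shown for $\beta_1 = 0$ to $\beta_1 = 3.7$]

- $\mathrm{ICC}_{\text{obs}} < \mathrm{ICC}_{\text{lat}}$
- $\mathrm{ICC}_{\text{obs}}$ is largest for $\beta_1 = 0$ [Rodríguez & Elo, 2003]

Kappa coefficient of observed responses

- Recall

\[
\mathrm{ICC}_{\text{obs}} \equiv \widehat{\operatorname{Cor}}(y_{ij}, y_{i'j} \mid \mathbf{x}) = \frac{\bar{p}_{11}(\mathbf{x}) - \bar{p}(\mathbf{x})^2}{\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]}
\]

- Probabilities of agreement for students in the same school:

\[
\Pr(y_{ij} = y_{i'j} = 1 \mid \mathbf{x}) \equiv \bar{p}_{11}(\mathbf{x}) = \mathrm{ICC}_{\text{obs}}\,\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})] + \bar{p}(\mathbf{x})^2
\]
\[
\Pr(y_{ij} = y_{i'j} = 0 \mid \mathbf{x}) \equiv \bar{p}_{00}(\mathbf{x}) = \mathrm{ICC}_{\text{obs}}\,\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})] + [1 - \bar{p}(\mathbf{x})]^2
\]
\[
\Pr(y_{ij} = y_{i'j} \mid \mathbf{x}) = \mathrm{ICC}_{\text{obs}}\,2\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})] + 1 - 2\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]
\]

- Probabilities of agreement for students in different schools:

\[
\Pr(y_{ij} = y_{i'j'} = 1 \mid \mathbf{x}) = \bar{p}(\mathbf{x})^2, \qquad
\Pr(y_{ij} = y_{i'j'} = 0 \mid \mathbf{x}) = [1 - \bar{p}(\mathbf{x})]^2 = 1 - 2\bar{p}(\mathbf{x}) + \bar{p}(\mathbf{x})^2
\]
\[
\Pr(y_{ij} = y_{i'j'} \mid \mathbf{x}) = 1 - 2\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]
\]

- Kappa coefficient [Eldridge et al., 2009]:

\[
\kappa = \frac{\Pr(y_{ij} = y_{i'j} \mid \mathbf{x}) - \Pr(y_{ij} = y_{i'j'} \mid \mathbf{x})}{1 - \Pr(y_{ij} = y_{i'j'} \mid \mathbf{x})}
= \frac{\mathrm{ICC}_{\text{obs}}\,2\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]}{2\bar{p}(\mathbf{x})[1 - \bar{p}(\mathbf{x})]} = \mathrm{ICC}_{\text{obs}}
\]

Standard deviation of probabilities

- Already considered the median of $\hat{p}(\mathbf{x},\zeta_j)$:

\[
\operatorname{Median}[\hat{p}(\mathbf{x},\zeta_j)] = \hat{p}(\mathbf{x}, 0)
\]

- Already considered the mean of $\hat{p}(\mathbf{x},\zeta_j)$:

\[
\bar{p}(\mathbf{x}) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}) = \int \hat{p}(\mathbf{x},\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j
\]

- Get the standard deviation of $\hat{p}(\mathbf{x},\zeta_j)$ by integration or by simulation:

\[
\operatorname{sd}[\hat{p}(\mathbf{x},\zeta_j)] = \sqrt{\operatorname{Var}[\hat{p}(\mathbf{x},\zeta_j)]}
\]

- Reported for the null model and for the full model at lo, me, hi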
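The integrals above (marginal probability, probability that two students are both proficient, and the standard deviation of the conditional probabilities) can be approximated by Gauss-Hermite quadrature. This is a sketch under stated assumptions: `marginal_summaries` is a hypothetical helper, the fixed-part values 0 and 2 are illustrative, and ordinary (non-adaptive) quadrature with 40 nodes stands in for whatever integration the authors used. It relies on students in the same school being conditionally independent given $\zeta_j$, so that $\bar{p}_{11}(\mathbf{x})$ is the mean of $\hat{p}(\mathbf{x},\zeta_j)^2$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def marginal_summaries(eta: float, psi: float, n_nodes: int = 40) -> dict:
    """Summaries of p(x, zeta) over zeta ~ N(0, psi), with eta = x'beta."""
    nodes, weights = hermegauss(n_nodes)       # weight function exp(-z^2 / 2)
    weights = weights / weights.sum()          # normalise to N(0, 1) weights
    p = 1.0 / (1.0 + np.exp(-(eta + np.sqrt(psi) * nodes)))
    p_bar = float(np.sum(weights * p))         # marginal probability
    p11 = float(np.sum(weights * p ** 2))      # both students proficient
    var_p = p11 - p_bar ** 2                   # variance of p(x, zeta)
    return {
        "median": 1.0 / (1.0 + np.exp(-eta)),  # p(x, 0)
        "mean": p_bar,
        "sd": float(np.sqrt(var_p)),
        "icc_obs": var_p / (p_bar * (1.0 - p_bar)),
    }


psi = 0.82                                     # null-model value from the talk
icc_lat = psi / (psi + np.pi ** 2 / 3.0)
at_zero = marginal_summaries(0.0, psi)         # hypothetical fixed part 0
at_two = marginal_summaries(2.0, psi)          # hypothetical fixed part 2
print(f"ICC_lat = {icc_lat:.3f}")
print(f"ICC_obs at 0: {at_zero['icc_obs']:.3f}, at 2: {at_two['icc_obs']:.3f}")
```

Comparing the printed values illustrates the two properties stated above: the observed ICC is below the latent ICC, and it shrinks as the fixed part moves away from 0.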

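Variance partition coefficient

- General result:

\[
\underbrace{\operatorname{Var}(\text{responses})}_{\text{Total variance}}
= \underbrace{\operatorname{Var}(\text{School-mean response})}_{\text{Between-school variance } v_2}
+ \underbrace{\operatorname{Mean}(\text{Within-school variance})}_{\text{Within-school variance } v_1}
\]

- For school $j$:
  - School-mean response: $\operatorname{Mean}(y_{ij} \mid \mathbf{x},\zeta_j) = \hat{p}(\mathbf{x},\zeta_j)$. Find the variance over the distribution of $\zeta_j$
  - Within-school variance: $\operatorname{Var}(y_{ij} \mid \mathbf{x},\zeta_j) = \hat{p}(\mathbf{x},\zeta_j)[1 - \hat{p}(\mathbf{x},\zeta_j)]$. Find the mean over the distribution of $\zeta_j$
  - Total variance: $\operatorname{Var}(y_{ij} \mid \mathbf{x}_{ij}) = v_2 + v_1 = \bar{p}(\mathbf{x}_{ij})[1 - \bar{p}(\mathbf{x}_{ij})]$
- Variance partition coefficient [Goldstein et al., 2002, Simulation, Method B]:

\[
\mathrm{VPC} = \frac{v_2}{v_2 + v_1} = \mathrm{ICC}_{\text{obs}} \neq \mathrm{ICC}_{\text{lat}}
\]

All results for PISA

- Table columns: $\mathbf{x}$, mn, dv, $\hat{p}(\mathbf{x}, 0)$, $\bar{p}(\mathbf{x})$, $\operatorname{sd}[\hat{p}(\mathbf{x},\zeta_j)]$, $\mathrm{ICC}_{\text{lat}}$, $\mathrm{ICC}_{\text{obs}}$, VPC
- Table rows: null model; full model at lo, me, hi

Density of probabilities

- Conditional probabilities, conditional on $\zeta_j$, depend on $\zeta_j$:

\[
\hat{p}(\mathbf{x},\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}_{ij} = \mathbf{x},\zeta_j)
\]

- When $\zeta_j$ varies, the corresponding probability $p \equiv \hat{p}(\mathbf{x},\zeta_j)$ varies with density [Duchateau & Janssen, 2005]

\[
f(p) = g\bigl(\log[p/(1-p)];\ \mathbf{x}'\hat{\boldsymbol{\beta}},\ \hat\psi\bigr)\,\frac{1}{p(1-p)}
\]

[Figure: density of the conditional probability for the null model, with median and mean marked, and for the full model at lo, me, hi]

Percentiles of probabilities

- A given percentile of $\hat{p}(\mathbf{x},\zeta_j)$, for given $\mathbf{x}$, can be found by plugging in the corresponding percentile of $\zeta_j \sim N(0, \hat\psi)$
- Left figure: $\hat{p}(\mathbf{x}, \pm 1.96\sqrt{\hat\psi})$ and $\hat{p}(\mathbf{x}, 0)$ versus mn with dv = 0
- Randomly sample schools with a given mn: the interval is the 95% range of probabilities for students with dv = 0

[Figure, left: median and 2.5th to 97.5th percentile of the probability against school mean SES (mn). Right: 25th to 75th percentile against deviation of SES from school mean (dv), for mn = 39 and mn = 51]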
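The variance decomposition and the percentile rule above can be illustrated by simulation, in the spirit of the simulation method cited from Goldstein et al. (2002). This is a sketch only: the function names, the seed, the number of draws, and the fixed-part value -0.75 are assumptions made for illustration, while ψ = 0.82 is the null-model value from the talk.

```python
import numpy as np
from statistics import NormalDist


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def vpc_by_simulation(eta: float, psi: float,
                      n_draws: int = 100_000, seed: int = 0):
    """Return (v2, v1, VPC, mean probability) from simulated school effects."""
    rng = np.random.default_rng(seed)
    p = expit(eta + rng.normal(0.0, np.sqrt(psi), size=n_draws))
    v2 = p.var()                    # between-school variance of school means
    v1 = (p * (1.0 - p)).mean()     # mean within-school Bernoulli variance
    return v2, v1, v2 / (v2 + v1), p.mean()


def probability_percentile(eta: float, psi: float, q: float) -> float:
    """q-th quantile of p(x, zeta): plug in the q-th quantile of zeta."""
    z = NormalDist().inv_cdf(q)
    return float(expit(eta + np.sqrt(psi) * z))


eta, psi = -0.75, 0.82              # eta is a hypothetical value of x'beta
v2, v1, vpc, p_mean = vpc_by_simulation(eta, psi)
low = probability_percentile(eta, psi, 0.025)
mid = probability_percentile(eta, psi, 0.5)
high = probability_percentile(eta, psi, 0.975)
print(f"VPC = {vpc:.3f}")
print(f"95% range of probabilities: {low:.2f} to {high:.2f}, median {mid:.2f}")
```

In the simulated sample, `v2 + v1` equals `p_mean * (1 - p_mean)` up to rounding, which mirrors the total-variance identity stated above.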

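Probabilities for schools in the data

- Probability $\tilde{p}(\mathbf{x})$ of proficiency for a randomly sampled student with covariates $\mathbf{x}$ in an existing school $j$
- Data on students in school $j$:

\[
\mathbf{y}_j = (y_{1j}, \ldots, y_{n_j j})' \quad\text{and}\quad
\mathbf{X}_j = \begin{pmatrix} \mathbf{x}_{1j}' \\ \vdots \\ \mathbf{x}_{n_j j}' \end{pmatrix}
\]

- Information about $\zeta_j$ (Bayes theorem):

\[
\operatorname{Posterior}(\zeta_j \mid \mathbf{y}_j, \mathbf{X}_j) = \frac{g(\zeta_j; 0, \hat\psi)\,\Pr(\mathbf{y}_j \mid \mathbf{X}_j, \zeta_j)}{\Pr(\mathbf{y}_j \mid \mathbf{X}_j)}
\]

- The best prediction of the probability (in terms of mean-squared error) is

\[
\tilde{p}(\mathbf{x}) = \text{Posterior mean}[\hat{p}(\mathbf{x},\zeta_j)] = \int \hat{p}(\mathbf{x},\zeta_j)\,\operatorname{Posterior}(\zeta_j \mid \mathbf{y}_j, \mathbf{X}_j)\, d\zeta_j
\]

- Not equal to $\hat{p}(\mathbf{x}, \tilde\zeta_j)$, where $\tilde\zeta_j$ is the posterior mean of $\zeta_j$ [Skrondal & Rabe-Hesketh, 2009]

Probabilities for schools in the data (cont'd)

[Figure, left: all schools, $\mathbf{x} = (0, \mathrm{mn}_j)$, probability against school mean SES (mn), with median and 2.5th to 97.5th percentile. Right: randomly chosen schools, $\mathbf{x} = \mathbf{x}_{ij}$, probability against student SES (mn+dv)]

Probabilities for schools in the data (cont'd)

- In the previous graphs, $\tilde{p}(\mathbf{x})$ varies between schools for a given value of dv because $\mathrm{mn}_j$ and $\tilde\zeta_j$ vary
- Isolate the variability due to $\tilde\zeta_j$ by setting $\mathrm{mn}_j = 45$ and $\mathrm{dv}_{ij} = 0$ (boxplot on right)

[Figure: boxplots of probabilities for dv = 0 and for mn = 45, dv = 0]

Three-level ordinal cumulative logit model

- Tennessee class-size experiment
- In 4th grade, teachers assessed student participation
- 2217 students $i$ of 262 teachers $j$ in 75 schools $k$
- Ordinal response variable $y_{ijk}$: teacher's rating of how frequently the student pays attention
- Rated as: 1 (never), 2, 3 (sometimes), 4, 5 (always)
- No covariates, for simplicity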
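Returning to the two-level model, the posterior-mean prediction for a school in the data can be sketched with the same quadrature idea: reweight the prior nodes by the school's likelihood. Everything in this block is illustrative: the helper name `posterior_summaries`, the made-up responses for one school, and the fixed-part value 0 are assumptions, and ordinary Gauss-Hermite quadrature is used in place of the adaptive quadrature mentioned in the talk.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def posterior_summaries(y, eta_school, eta_new, psi, n_nodes=40):
    """Posterior-mean probability and plug-in probability for one school.

    y          : 0/1 responses of the sampled students in the school
    eta_school : fixed-part linear predictors x'beta of those students
    eta_new    : fixed-part linear predictor of the student to predict for
    """
    nodes, prior_w = hermegauss(n_nodes)
    prior_w = prior_w / prior_w.sum()              # N(0, 1) quadrature weights
    zeta = np.sqrt(psi) * nodes                    # candidate random intercepts
    y = np.asarray(y, dtype=float)
    eta_school = np.asarray(eta_school, dtype=float)
    p = expit(eta_school[:, None] + zeta[None, :])
    likelihood = np.prod(np.where(y[:, None] == 1.0, p, 1.0 - p), axis=0)
    post_w = prior_w * likelihood
    post_w = post_w / post_w.sum()                 # Bayes theorem on the grid
    post_mean_prob = float(np.sum(post_w * expit(eta_new + zeta)))
    post_mean_zeta = float(np.sum(post_w * zeta))
    plug_in_prob = float(expit(eta_new + post_mean_zeta))
    return post_mean_prob, plug_in_prob, post_mean_zeta


# Hypothetical school: 8 students, 7 proficient, all with fixed part 0.
y_school = [1, 1, 1, 0, 1, 1, 1, 1]
post_prob, plug_in, zeta_tilde = posterior_summaries(
    y_school, np.zeros(len(y_school)), eta_new=0.0, psi=0.82)
print(f"posterior mean of p: {post_prob:.3f}")
print(f"p at posterior mean of zeta: {plug_in:.3f}")
```

The two printed values differ, which is the point made above: the posterior mean of the probability is not the probability evaluated at the posterior mean of $\zeta_j$.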

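Three-level ordinal cumulative logit model: $\mathrm{OR}_{\text{median}}$

- Model the probability of exceeding a category $s = 1, 2, 3, 4$:

\[
\operatorname{logit}[\Pr(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_{k})] = \kappa_s + \zeta^{(2)}_{jk} + \zeta^{(3)}_{k}
\]

  - Teacher random intercept $\zeta^{(2)}_{jk} \sim N(0, \psi^{(2)})$, $\hat\psi^{(2)} = 0.44$
  - School random intercept $\zeta^{(3)}_{k} \sim N(0, \psi^{(3)})$, $\hat\psi^{(3)} = 0.12$
- Odds ratio, comparing students of teachers $j$ and $j'$ in school $k$:

\[
\frac{\mathrm{Odds}(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_{k})}{\mathrm{Odds}(y_{i'j'k} > s \mid \zeta^{(2)}_{j'k}, \zeta^{(3)}_{k})} = \exp(\zeta^{(2)}_{jk} - \zeta^{(2)}_{j'k})
\]

- Median of the odds ratio, comparing the student with larger odds to the other student:

                                        OR_median
  Different teacher, same school        1.88
  Different teacher, different school   2.04

Three-level ordinal cumulative logit model: $\mathrm{ICC}_{\text{lat}}$

- Latent response $y^*_{ijk}$:

\[
y^*_{ijk} = \zeta^{(2)}_{jk} + \zeta^{(3)}_{k} + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim \text{logistic}
\]

- The observed response $y_{ijk}$ results from cutting up $y^*_{ijk}$ at cut-points or thresholds $\kappa_1$ to $\kappa_4$

[Figure: latent response $y^*_{ijk}$ axis divided at $\kappa_1$, $\kappa_2$, $\kappa_3$, $\kappa_4$]

- Intraclass correlations of latent responses:
  - Same school, different teacher:

\[
\mathrm{ICC}_{\text{lat}}(\text{school}) = \frac{\psi^{(3)}}{\psi^{(2)} + \psi^{(3)} + \pi^2/3} = 0.031
\]

  - Same school, same teacher:

\[
\mathrm{ICC}_{\text{lat}}(\text{teacher, school}) = \frac{\psi^{(2)} + \psi^{(3)}}{\psi^{(2)} + \psi^{(3)} + \pi^2/3} = 0.145
\]

Three-level ordinal cumulative logit model: $\mathrm{ICC}_{\text{obs}}$

- Two randomly chosen students $i$, $i'$ from the same school (either different or same teachers)
- Pearson correlation, $\operatorname{Cor}(y_{ijk}, y_{i'j'k})$ or $\operatorname{Cor}(y_{ijk}, y_{i'jk})$, or Kendall's $\tau_b$
- Table columns: $\mathrm{ICC}_{\text{lat}}$, Cor ($\mathrm{ICC}_{\text{obs}}$), $\tau_b$; rows: same school, different teacher; same school, same teacher
- Method: $5 \times 5$ table of the response for student $i$ (rows) versus the response for student $i'$ (columns)

\[
\pi_{st}(\text{school}) = \Pr(y_{ijk} = s,\ y_{i'j'k} = t), \qquad
\pi_{st}(\text{teacher, school}) = \Pr(y_{ijk} = s,\ y_{i'jk} = t)
\]

- Use these probabilities as frequency weights for Cor and $\tau_b$

Probabilities for teachers and schools in the data

- Probability that a student pays attention more than sometimes ($y > 3$)
- Randomly chosen student, for:
  - Teachers in the data: integrate over the posterior of $\zeta^{(2)}_{jk}$, $\zeta^{(3)}_{k}$
  - Schools in the data (new teacher): integrate over the prior of $\zeta^{(2)}_{jk}$ and the posterior of $\zeta^{(3)}_{k}$

[Figure: distributions of the predicted probability for teachers in the data and for schools in the data]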
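For the three-level model above, the reported median odds ratios and latent ICCs follow from the two estimated variances, 0.44 (teacher) and 0.12 (school). The sketch below is an illustration rather than the authors' code; the constant and function names are hypothetical. It uses the fact that the difference in random parts has variance $2\psi^{(2)}$ for two teachers in the same school and $2(\psi^{(2)} + \psi^{(3)})$ for teachers in different schools.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

PSI_TEACHER = 0.44            # estimated level-2 variance from the talk
PSI_SCHOOL = 0.12             # estimated level-3 variance from the talk
LOGISTIC_VAR = pi ** 2 / 3.0  # variance of the logistic level-1 residual


def median_or(difference_variance: float) -> float:
    """Median of exp(|d|) for d ~ N(0, difference_variance)."""
    return exp(sqrt(difference_variance) * NormalDist().inv_cdf(0.75))


# Different teachers, same school: the school intercept cancels.
or_same_school = median_or(2.0 * PSI_TEACHER)
# Different teachers, different schools: both intercepts differ.
or_diff_school = median_or(2.0 * (PSI_TEACHER + PSI_SCHOOL))

total_var = PSI_TEACHER + PSI_SCHOOL + LOGISTIC_VAR
icc_school = PSI_SCHOOL / total_var                           # shared school only
icc_teacher_school = (PSI_TEACHER + PSI_SCHOOL) / total_var   # shared teacher too

print(f"OR_median: {or_same_school:.2f} (same school), "
      f"{or_diff_school:.2f} (different school)")
print(f"ICC_lat: {icc_school:.3f} (school), "
      f"{icc_teacher_school:.3f} (teacher, school)")
```

The printed values can be compared with the figures quoted above (1.88, 2.04, 0.031 and 0.145); small discrepancies would be expected only from rounding of the variance estimates.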

Software

- Stata was used for the results and graphs
- Datasets and Stata commands available from
- Models estimated by ML with adaptive quadrature [Rabe-Hesketh et al., 2005]
- Commands for binary logit models:
  - xtlogit, xtmelogit
  - xtrho [Rodríguez & Elo, 2003] for $\mathrm{ICC}_{\text{obs}} = \mathrm{VPC}$
- Command for binary, ordinal, or multinomial logit models (and other response types):
  - gllamm [Rabe-Hesketh et al., 2005]
  - gllapred used to obtain $\bar{p}(\mathbf{x})$, $\tilde{p}(\mathbf{x})$, $p_{st}(\mathbf{x})$, etc. [Rabe-Hesketh & Skrondal, 2009]

Discussion

- Have discussed measures and graphs for interpreting the variability implied by a model with estimated parameters
  - Ignored parameter uncertainty (no SEs or CIs)
  - Did not consider variance explained by covariates
- Extensions:
  - Include a random coefficient of $x_{ij}$
    - For $\bar{p}(\mathbf{x})$, $\tilde{p}(\mathbf{x})$, and $\mathrm{ICC}_{\text{obs}}$, integrate over all random effects
    - $\mathrm{ICC}_{\text{lat}}$ and $\mathrm{OR}_{\text{median}}$ now depend on $x_{ij}$
  - More levels: straightforward
  - Relax the normality assumption for random effects: non-parametric maximum likelihood estimation (NPMLE)
  - Other response types
    - Nominal responses: similar to binary responses
    - Counts: no $\mathrm{ICC}_{\text{lat}}$, but can obtain $\mathrm{IRR}_{\text{median}}$ and $\mathrm{ICC}_{\text{obs}}$

References

Duchateau, L. & Janssen, P. (2005). Understanding heterogeneity in generalized linear mixed and frailty models. The American Statistician 59.
Eldridge, S. M., et al. (2009). The intra-cluster correlation coefficient in cluster randomized trials: A review of definitions. International Statistical Review 77.
Goldstein, H., et al. (2002). Partitioning variation in multilevel models. Understanding Statistics 1.
Larsen, K., et al. (2000). Interpreting parameters in the logistic regression model with random effects. Biometrics 56.
Rabe-Hesketh, S. & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Third Edition). College Station, TX: Stata Press.
Rabe-Hesketh, S. & Skrondal, A. (in prep). Understanding variability in multilevel models for categorical responses.
Rabe-Hesketh, S., et al. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 128.
Rodríguez, G. & Elo, I. (2003). Intra-class correlation in random-effects models for binary data. The Stata Journal 3.
Skrondal, A. & Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A 172.