Understanding Variability in Multilevel Models for Categorical Responses


Sophia Rabe-Hesketh
University of California, Berkeley & Institute of Education, University of London
Joint work with Anders Skrondal
Hierarchical Linear Modeling SIG, AERA Annual Meeting, Vancouver, April 2012

Outline

- Two-level logistic random-intercept model (1: students, 2: schools)
- Measures of residual variability and dependence
    - Median odds ratio
    - Intraclass correlation of latent responses
    - Intraclass correlation of observed responses
    - Standard deviation of probabilities
    - Variance partition coefficient
- Graphical displays of variability
    - Densities of probabilities
    - Percentiles of probabilities
    - Probabilities for schools in the data
- Ordinal responses and three-level data (1: students, 2: teachers, 3: schools)

Example: PISA (Programme for International Student Assessment)

- The PISA study assesses reading, math and science achievement among 15-year-old students in tens of countries every 3 years
- Here we consider U.S. data from 2000
- Schools j were randomly sampled, and then students i were randomly sampled from the selected schools
- Variables:
    - Reading proficiency $y_{ij}$ (1 = yes, 0 = no)
    - Student SES $x_{ij}$
    - mn: school mean SES $\bar{x}_{\cdot j}$
    - dv: deviation of student SES from school mean, $x_{ij} - \bar{x}_{\cdot j}$
- Sample size (listwise): 2069 students in 148 schools
- Number of students per school: 1 to 28, mean/median 14

PISA: Distribution of SES

[Figure: student SES (mn+dv) versus school mean SES (mn), showing the Min to Max and P25 to P75 ranges, with reference lines at mn-11, mn and mn+11]

- For schools, mn: 25th & 75th percentiles 39 & 51; mean & median 45
- For students, dv: 25th & 75th percentiles -11 & 11; mean & median 0

Three covariate patterns:

    x_ij    mn     dv    ses
    lo      39    -11     28
    me      45      0     45
    hi      51     11     62

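Two-level logistic random-intercept model

Two-level logistic random-intercept model for unit i (level 1) nested in cluster j (level 2):

$$\operatorname{logit}[\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)] = \log\underbrace{\left[\frac{\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)}{\Pr(y_{ij}=0 \mid \mathbf{x}_{ij},\zeta_j)}\right]}_{\text{Odds}} = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j$$

$$\log[\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j$$

$$\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j) = \exp(\beta_1)\,\exp(\beta_2)^{\mathrm{dv}_{ij}}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)$$

- $\mathbf{x}_{ij}$ is the vector of covariates $(\mathrm{dv}_{ij}, \mathrm{mn}_j)$
- $\beta_1$ is the fixed intercept
- $\beta_2$ and $\beta_3$ are fixed regression coefficients
- $\zeta_j$ is a random intercept, $\zeta_j \sim N(0,\psi)$
- Sometimes write $\mathbf{x}_{ij}'\boldsymbol{\beta} \equiv \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j$

Maximum likelihood estimates of model parameters

                             Null model      Full model
    Param.        Covariate  Est (SE)        Est (SE)        OR (95% CI)
    $\beta_1$                -0.71 (0.10)    -4.79 (0.43)
    $10\beta_2$   dv/10                       0.18 (0.03)    1.2 (1.1, 1.3)
    $10\beta_3$   mn/10                       0.89 (0.09)    2.4 (2.1, 2.9)
    $\psi$                    0.82            0.28

Conditional odds ratio associated with a 10-point increase in $\mathrm{dv}_{ij}$, for given $\mathrm{mn}_j$ and $\zeta_j$:

$$\frac{\exp(\beta_1)\,\exp(\beta_2)^{a+10}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)}{\exp(\beta_1)\,\exp(\beta_2)^{a}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)} = \exp(\beta_2)^{10} = \exp(10\beta_2)$$

- Comparing two students from the same school (given $\mathrm{mn}_j$ and $\zeta_j$) whose SES differs by 10 points, the student with higher SES has 1.2 times the odds of being proficient as the other student; the odds are 20% greater

Conditional odds ratio associated with a 10-point increase in $\mathrm{mn}_j$, for given $\mathrm{dv}_{ij}$ and $\zeta_j$:

- Comparing two schools that differ in their mean SES by 10 points and have the same random intercept, the odds of a student (with SES at the school mean) being proficient are 2.4 times as great for the higher-SES school

Predicted conditional probabilities $\hat{p}(\mathbf{x}_{ij},\zeta_j)$

Conditional probability with estimates $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ plugged in:

$$\hat{p}(\mathbf{x}_{ij},\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j) = \frac{\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}{1+\exp(\hat\beta_1 + \hat\beta_2\,\mathrm{dv}_{ij} + \hat\beta_3\,\mathrm{mn}_j + \zeta_j)}$$

Plug in interesting values of $\mathbf{x}_{ij}$ and $\zeta_j$, e.g., $\zeta_j = 0$ (mean & median):

$\hat{p}(\text{lo}, 0) = 0.18$, $\hat{p}(\text{me}, 0) = 0.32$, $\hat{p}(\text{hi}, 0) = 0.49$

[Figure, left panel: predicted conditional probability versus school mean SES (mn), for dv = -11, dv = 0, dv = 11. Right panel: predicted conditional probability versus deviation of SES from school mean (dv), for mn = 39, mn = 45, mn = 51]

Median odds ratio

Model for conditional odds:

$$\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j) = \exp(\beta_1)\,\exp(\beta_2)^{\mathrm{dv}_{ij}}\,\exp(\beta_3)^{\mathrm{mn}_j}\,\exp(\zeta_j)$$

- Randomly sample schools, then students (for given $\mathbf{x}$)
- Odds ratio for two students $i$ and $i'$ from different schools $j$ and $j'$:

$$\frac{\mathrm{Odds}(y_{ij}=1 \mid \mathbf{x},\zeta_j)}{\mathrm{Odds}(y_{i'j'}=1 \mid \mathbf{x},\zeta_{j'})} = \exp(\zeta_j - \zeta_{j'})$$

- Odds ratio comparing the student with the greater odds to the other student:

$$\mathrm{OR} = \exp(|\zeta_j - \zeta_{j'}|), \qquad (\zeta_j - \zeta_{j'}) \sim N(0, 2\psi)$$

- The odds ratio $\exp(|\zeta_j - \zeta_{j'}|)$ varies randomly. Because the median of the absolute value of a $N(0, 2\psi)$ variable is $\sqrt{2\psi}\,\Phi^{-1}(3/4)$, the estimated median is

$$\widehat{\mathrm{OR}}_{\mathrm{median}} = \exp\{\sqrt{2\hat\psi}\,\Phi^{-1}(3/4)\}$$

[Larsen et al., 2000]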
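As a rough check on the numbers above, the following Python sketch evaluates the conditional probability at $\zeta_j = 0$ for the three covariate patterns and the median odds ratio for both models. The function and variable names are illustrative, not from the slides. The per-point coefficients 0.018 and 0.089 are the tabulated per-10-point estimates divided by 10, and the intercept is assumed to be -4.79 (a negative sign is the only one consistent with the reported probabilities). Because the inputs are rounded, the probabilities can differ from the slide values in the second or third decimal.

```python
from math import exp, sqrt
from statistics import NormalDist

# Estimates read off the full-model column, rescaled to a 1-point change.
# ASSUMPTION: the intercept is negative (-4.79); the sign is inferred from
# the reported probabilities, e.g. p(me, 0) = 0.32.
BETA1, BETA2, BETA3 = -4.79, 0.018, 0.089


def conditional_probability(dv, mn, zeta=0.0):
    """Pr(y = 1 | dv, mn, zeta) under the random-intercept logit model."""
    eta = BETA1 + BETA2 * dv + BETA3 * mn + zeta
    return 1.0 / (1.0 + exp(-eta))


def median_odds_ratio(psi):
    """Median of exp(|zeta_j - zeta_j'|) when the difference is N(0, 2*psi)."""
    return exp(sqrt(2.0 * psi) * NormalDist().inv_cdf(0.75))


# Three covariate patterns (dv, mn) at the median random intercept zeta = 0.
p_lo = conditional_probability(-11, 39)
p_me = conditional_probability(0, 45)
p_hi = conditional_probability(11, 51)

mor_null = median_odds_ratio(0.82)  # null model, psi = 0.82
mor_full = median_odds_ratio(0.28)  # full model, psi = 0.28

print(f"p(lo,0)={p_lo:.3f}  p(me,0)={p_me:.3f}  p(hi,0)={p_hi:.3f}")
print(f"median OR: null={mor_null:.2f}, full={mor_full:.2f}")
```

The median odds ratio depends only on $\psi$, so it can be computed without reference to the covariates.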

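Median odds ratio (cont'd)

For two randomly drawn students from two randomly drawn schools, the estimated median odds ratio, comparing the student from the school with the larger random intercept to the other student, is

- $\widehat{\mathrm{OR}}_{\mathrm{median}} = 2.4$ for the null model
- $\widehat{\mathrm{OR}}_{\mathrm{median}} = 1.7$ for the full model

In the null model, the odds ratio due to the random intercept exceeds 2.4 half the time. In the full model, the estimated odds ratio for a 10-point increase in school-mean SES is 2.4.

Latent response formulation

Imagine a latent (unobserved) continuous response $y^*_{ij}$ (e.g., reading ability). The observed response is 1 if the latent response is greater than 0 (e.g., proficient if ability is greater than 0):

$$y_{ij} = \begin{cases} 1 & \text{if } y^*_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

Model for the latent response:

$$y^*_{ij} = \underbrace{\beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j}_{\mathbf{x}_{ij}'\boldsymbol{\beta}} + \zeta_j + \epsilon_{ij}, \qquad \zeta_j \sim N(0,\psi), \quad \epsilon_{ij} \sim \text{logistic}$$

The logistic distribution has mean 0 and variance $\pi^2/3$ and is similar to the normal distribution.

Model for the observed response:

$$\operatorname{logit}[\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)] = \beta_1 + \beta_2\,\mathrm{dv}_{ij} + \beta_3\,\mathrm{mn}_j + \zeta_j, \qquad \zeta_j \sim N(0,\psi)$$

Equivalence of formulations

[Figure: equivalence of the two formulations; axes $\Pr(y_{ij}=1 \mid \mathbf{x}_{ij},\zeta_j)$ and $y^*_{ij}$, with $\mathbf{x}_{ij}'\boldsymbol{\beta} + \zeta_j$ marked]

Intraclass correlation of latent responses

The model for $y^*_{ij}$ is a standard multilevel/hierarchical linear model, so the standard ICC applies:

$$\mathrm{ICC}_{\mathrm{lat}} = \frac{\mathrm{Var}(\zeta_j)}{\mathrm{Var}(\zeta_j)+\mathrm{Var}(\epsilon_{ij})} = \frac{\psi}{\psi + \pi^2/3}$$

- Correlation between students $i$ and $i'$ in the same school $j$:

$$\mathrm{Cor}(y^*_{ij}, y^*_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \mathrm{Cor}(\zeta_j + \epsilon_{ij}, \zeta_j + \epsilon_{i'j}) = \frac{\psi}{\psi + \pi^2/3}$$

- Proportion of variance that is between schools:

$$\mathrm{Var}(y^*_{ij} \mid \mathbf{x}_{ij}) = \mathrm{Var}(\zeta_j + \epsilon_{ij}) = \mathrm{Var}(\zeta_j) + \mathrm{Var}(\epsilon_{ij}) = \psi + \pi^2/3$$

For the PISA data, plug in $\hat\psi$:

$$\widehat{\mathrm{ICC}}_{\mathrm{lat}} = \frac{0.82}{0.82 + 3.29} = 0.20 \text{ for the null model}, \qquad \widehat{\mathrm{ICC}}_{\mathrm{lat}} = \frac{0.28}{0.28 + 3.29} = 0.08 \text{ for the full model}$$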
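The latent-response ICC is a one-line calculation. A minimal Python sketch, using the reported values of $\hat\psi$ (the function name is illustrative):

```python
from math import pi

# Variance of the standard logistic level-1 error in the latent-response model.
LOGISTIC_VARIANCE = pi ** 2 / 3


def icc_latent(psi):
    """Share of latent-response variance that lies between clusters."""
    return psi / (psi + LOGISTIC_VARIANCE)


icc_null = icc_latent(0.82)  # null model
icc_full = icc_latent(0.28)  # full model

print(f"logistic variance = {LOGISTIC_VARIANCE:.2f}")
print(f"ICC_lat: null = {icc_null:.2f}, full = {icc_full:.2f}")
```

Since the level-1 variance is fixed at $\pi^2/3$, the latent ICC is a monotone function of $\psi$ alone and does not depend on the covariates.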

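Intraclass correlation of observed responses

The correlation $\mathrm{Cor}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j})$ of observed responses for students $i$ and $i'$ in the same school $j$ depends on the covariates $\mathbf{x}_{ij}$, $\mathbf{x}_{i'j}$. For simplicity, assume $\mathbf{x}_{ij} = \mathbf{x}_{i'j} = \mathbf{x}$:

$$\mathrm{ICC}_{\mathrm{obs}} = \widehat{\mathrm{Cor}}(y_{ij}, y_{i'j} \mid \mathbf{x}) = \frac{\bar{p}_{11}(\mathbf{x}) - \bar{p}(\mathbf{x})^2}{\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})]}$$

Among schools and students with covariates $\mathbf{x}$, randomly choose a school and then randomly choose students from the school:

- $\bar{p}(\mathbf{x})$ is the probability that a student is proficient
- $\bar{p}_{11}(\mathbf{x})$ is the probability that two students are both proficient

Population-averaged or marginal probability:

$$\bar{p}(\mathbf{x}) = \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}) = \int \hat{p}(\mathbf{x},\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j$$

Average over random effects with Gaussian density $g(\zeta_j; 0, \hat\psi)$.

Intraclass correlation of observed responses (cont'd)

PISA:

                 Null model    Full model
                               lo      me      hi
    ICC_obs      0.14          0.04    0.06    0.06
    ICC_lat      0.20          0.08    0.08    0.08

[Figure: ICC (observed and latent) versus random-intercept standard deviation $\sqrt{\psi}$]

- Model: $\operatorname{logit}[\Pr(y_{ij}=1 \mid \zeta_j)] = \beta_1 + \zeta_j$, $\zeta_j \sim N(0,\psi)$
- $\mathrm{ICC}_{\mathrm{obs}}$ shown for $\beta_1 = 0$ to $\beta_1 = 3.7$
- $\mathrm{ICC}_{\mathrm{obs}} < \mathrm{ICC}_{\mathrm{lat}}$
- $\mathrm{ICC}_{\mathrm{obs}}$ is largest for $\beta_1 = 0$

[Rodríguez & Elo, 2003]

Kappa coefficient of observed responses

Recall

$$\mathrm{ICC}_{\mathrm{obs}} = \widehat{\mathrm{Cor}}(y_{ij}, y_{i'j} \mid \mathbf{x}) = \frac{\bar{p}_{11}(\mathbf{x}) - \bar{p}(\mathbf{x})^2}{\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})]}$$

Probabilities of agreement for students in the same school:

$$\Pr(y_{ij} = y_{i'j} = 1 \mid \mathbf{x}) \equiv \bar{p}_{11}(\mathbf{x}) = \mathrm{ICC}_{\mathrm{obs}}\,\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})] + \bar{p}(\mathbf{x})^2$$

$$\Pr(y_{ij} = y_{i'j} = 0 \mid \mathbf{x}) \equiv \bar{p}_{00}(\mathbf{x}) = \mathrm{ICC}_{\mathrm{obs}}\,\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})] + [1-\bar{p}(\mathbf{x})]^2$$

Probabilities of agreement for students in different schools:

$$\Pr(y_{ij} = y_{i'j'} = 1 \mid \mathbf{x}) = \bar{p}(\mathbf{x})^2$$

$$\Pr(y_{ij} = y_{i'j'} = 0 \mid \mathbf{x}) = [1-\bar{p}(\mathbf{x})]^2 = 1 - 2\bar{p}(\mathbf{x}) + \bar{p}(\mathbf{x})^2$$

$$\Pr(y_{ij} = y_{i'j'} \mid \mathbf{x}) = 1 - 2\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})]$$

Kappa compares agreement within schools to agreement between schools:

$$\kappa = \frac{\Pr(y_{ij} = y_{i'j} \mid \mathbf{x}) - \Pr(y_{ij} = y_{i'j'} \mid \mathbf{x})}{1 - \Pr(y_{ij} = y_{i'j'} \mid \mathbf{x})} = \frac{\mathrm{ICC}_{\mathrm{obs}}\, 2\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})]}{2\bar{p}(\mathbf{x})[1-\bar{p}(\mathbf{x})]} = \mathrm{ICC}_{\mathrm{obs}}$$

[Eldridge et al., 2009]

Standard deviation of probabilities

- Already considered the median of $\hat{p}(\mathbf{x},\zeta_j)$: $\mathrm{Median}[\hat{p}(\mathbf{x},\zeta_j)] = \hat{p}(\mathbf{x}, 0)$
- Already considered the mean of $\hat{p}(\mathbf{x},\zeta_j)$:

$$\bar{p}(\mathbf{x}) = \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}) = \int \hat{p}(\mathbf{x},\zeta_j)\, g(\zeta_j; 0, \hat\psi)\, d\zeta_j$$

- Get the standard deviation of $\hat{p}(\mathbf{x},\zeta_j)$ by integration or by simulation:

$$\mathrm{sd}[\hat{p}(\mathbf{x},\zeta_j)] = \sqrt{\mathrm{Var}[\hat{p}(\mathbf{x},\zeta_j)]}$$

                              Null model    Full model
                                            lo      me      hi
    sd[p(x, zeta_j)]          0.18          0.08    0.11    0.12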
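The integrals over $\zeta_j$ above have no closed form, but they are one-dimensional and easy to approximate. The Python sketch below uses a plain midpoint rule (a stand-in chosen for transparency, not the adaptive quadrature used for estimation) to compute the marginal probability, the standard deviation of the conditional probabilities, $\mathrm{ICC}_{\mathrm{obs}}$ and kappa for the null model. It assumes an intercept of -0.71 (negative sign inferred from $\hat{p}(\mathbf{x},0) \approx 0.33$) and $\hat\psi = 0.82$; all names are illustrative.

```python
from math import exp, pi, sqrt

BETA1_NULL = -0.71  # ASSUMPTION: negative sign, inferred from p(x, 0) of about 0.33
PSI_NULL = 0.82


def expit(eta):
    return 1.0 / (1.0 + exp(-eta))


def normal_pdf(z, variance):
    return exp(-z * z / (2.0 * variance)) / sqrt(2.0 * pi * variance)


def expectation_over_intercept(func, psi, n=4001, width=8.0):
    """Midpoint-rule approximation of E[func(zeta)] for zeta ~ N(0, psi)."""
    sd = sqrt(psi)
    step = 2.0 * width * sd / n
    total = 0.0
    for k in range(n):
        zeta = -width * sd + (k + 0.5) * step
        total += func(zeta) * normal_pdf(zeta, psi) * step
    return total


def cond_prob(zeta):
    return expit(BETA1_NULL + zeta)


# Marginal probability and probability that two same-school students both succeed
# (responses are conditionally independent given zeta).
p_bar = expectation_over_intercept(cond_prob, PSI_NULL)
p11 = expectation_over_intercept(lambda z: cond_prob(z) ** 2, PSI_NULL)

sd_prob = sqrt(p11 - p_bar ** 2)
icc_obs = (p11 - p_bar ** 2) / (p_bar * (1.0 - p_bar))

# Kappa: agreement within a school versus agreement across schools.
p00 = 1.0 - 2.0 * p_bar + p11
agree_same = p11 + p00
agree_diff = p_bar ** 2 + (1.0 - p_bar) ** 2
kappa = (agree_same - agree_diff) / (1.0 - agree_diff)

print(f"p_bar={p_bar:.3f} sd={sd_prob:.3f} ICC_obs={icc_obs:.3f} kappa={kappa:.3f}")
```

The numerator of `icc_obs` is the variance of the conditional probabilities, which is why the same quantity reappears as the between-school share of the total variance.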

Variance partition coefficient

General result:

$$\underbrace{\mathrm{Var}(\text{responses})}_{\text{Total variance}} = \underbrace{\mathrm{Var}(\text{School-mean response})}_{\text{Between-school variance } v_2} + \underbrace{\mathrm{Mean}(\text{Within-school variance})}_{\text{Within-school variance } v_1}$$

For school j:

- School-mean response: $\mathrm{Mean}(y_{ij} \mid \mathbf{x},\zeta_j) = \hat{p}(\mathbf{x},\zeta_j)$. Find the variance over the distribution of $\zeta_j$
- Within-school variance: $\mathrm{Var}(y_{ij} \mid \mathbf{x},\zeta_j) = \hat{p}(\mathbf{x},\zeta_j)[1-\hat{p}(\mathbf{x},\zeta_j)]$. Find the mean over the distribution of $\zeta_j$
- Total variance: $\mathrm{Var}(y_{ij} \mid \mathbf{x}_{ij}) = v_2 + v_1 = \bar{p}(\mathbf{x}_{ij})[1-\bar{p}(\mathbf{x}_{ij})]$

Variance partition coefficient [Goldstein et al., 2002, Simulation, Method B]:

$$\mathrm{VPC} = \frac{v_2}{v_2 + v_1} = \mathrm{ICC}_{\mathrm{obs}} \neq \mathrm{ICC}_{\mathrm{lat}}$$

All results for PISA

    x             mn    dv    p(x,0)   marginal p(x)   sd[p(x,zeta_j)]   ICC_lat   ICC_obs   VPC
    Null model                0.331    0.354           0.180             0.200     0.143     0.143
    Full model
      lo          39   -11    0.181    0.193           0.081             0.078     0.042     0.042
      me          45     0    0.315    0.333           0.111             0.078     0.056     0.056
      hi          51    11    0.491    0.491           0.124             0.078     0.062     0.062

Density of probabilities

Conditional probabilities, conditional on $\zeta_j$, depend on $\zeta_j$:

$$\hat{p}(\mathbf{x},\zeta_j) \equiv \widehat{\Pr}(y_{ij}=1 \mid \mathbf{x}_{ij} = \mathbf{x},\zeta_j)$$

When $\zeta_j$ varies, the corresponding probability $\hat{p} \equiv \hat{p}(\mathbf{x},\zeta_j)$ varies with density

$$f(\hat{p}) = g\big(\log[\hat{p}/(1-\hat{p})];\ \mathbf{x}'\hat{\boldsymbol{\beta}},\ \hat\psi\big)\,\frac{1}{\hat{p}(1-\hat{p})}$$

[Figure: density of the conditional probability for the null model, with median and mean marked]
[Figure: densities of the conditional probability for the full model, for covariate patterns lo, me and hi]

[Duchateau & Janssen, 2005]

Percentiles of probabilities

A given percentile of $\hat{p}(\mathbf{x},\zeta_j)$, for given $\mathbf{x}$, can be found by plugging in the corresponding percentile of $\zeta_j \sim N(0, \hat\psi)$.

- Left figure: $\hat{p}(\mathbf{x}, \pm 1.96\sqrt{\hat\psi})$ and $\hat{p}(\mathbf{x}, 0)$ versus mn with dv = 0
- Randomly sample schools with a given mn: the interval is the 95% range of probabilities for students with dv = 0

[Figure, left panel: median and 2.5th to 97.5th percentile of the conditional probability versus school mean SES (mn). Right panel: 25th to 75th percentile versus deviation of SES from school mean (dv), for mn = 39 and mn = 51]
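The density and percentile formulas for the conditional probabilities can be checked numerically. The Python sketch below evaluates the density for the null model, confirms that it integrates to approximately 1 with a midpoint rule, and computes the 2.5th, 50th and 97.5th percentiles by plugging in percentiles of $\zeta_j$. It assumes an intercept of -0.71 (sign inferred from $\hat{p}(\mathbf{x},0) \approx 0.33$) and $\hat\psi = 0.82$; the function names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

BETA1_NULL = -0.71  # ASSUMPTION: negative sign, inferred from p(x, 0) of about 0.33
PSI_NULL = 0.82


def expit(eta):
    return 1.0 / (1.0 + exp(-eta))


def density_of_probability(p, linpred, psi):
    """Density of p = expit(linpred + zeta) for zeta ~ N(0, psi)."""
    logit_p = log(p / (1.0 - p))
    normal = NormalDist(mu=linpred, sigma=sqrt(psi))
    return normal.pdf(logit_p) / (p * (1.0 - p))  # Jacobian of the logit transform


def probability_percentile(q, linpred, psi):
    """q-th quantile of the conditional probability (expit is monotone in zeta)."""
    zeta_q = NormalDist(mu=0.0, sigma=sqrt(psi)).inv_cdf(q)
    return expit(linpred + zeta_q)


# Midpoint rule on (0, 1): the density should integrate to about 1.
n = 20000
area = sum(
    density_of_probability((k + 0.5) / n, BETA1_NULL, PSI_NULL) for k in range(n)
) / n

p_025 = probability_percentile(0.025, BETA1_NULL, PSI_NULL)
p_50 = probability_percentile(0.5, BETA1_NULL, PSI_NULL)
p_975 = probability_percentile(0.975, BETA1_NULL, PSI_NULL)

print(f"area under density = {area:.4f}")
print(f"2.5th, 50th, 97.5th percentiles: {p_025:.3f}, {p_50:.3f}, {p_975:.3f}")
```

Percentiles transform directly because the inverse logit is monotone; the mean and standard deviation do not, which is why they require integration.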

Probabilities for schools in the data

Probability $\tilde{p}_j(\mathbf{x})$ of proficiency for a randomly sampled student with covariates $\mathbf{x}$ in an existing school $j$.

Data on students in school $j$:

$$\mathbf{y}_j = (y_{1j}, \ldots, y_{n_j j})' \quad \text{and} \quad \mathbf{X}_j = \begin{pmatrix} \mathbf{x}_{1j}' \\ \vdots \\ \mathbf{x}_{n_j j}' \end{pmatrix}$$

Information about $\zeta_j$ (Bayes theorem):

$$\mathrm{Posterior}(\zeta_j \mid \mathbf{y}_j, \mathbf{X}_j) = \frac{g(\zeta_j; 0, \hat\psi)\,\widehat{\Pr}(\mathbf{y}_j \mid \mathbf{X}_j, \zeta_j)}{\widehat{\Pr}(\mathbf{y}_j \mid \mathbf{X}_j)}$$

The best prediction of the probability (in terms of mean-squared error) is

$$\tilde{p}_j(\mathbf{x}) = \text{Posterior mean}[\hat{p}(\mathbf{x},\zeta_j)] = \int \hat{p}(\mathbf{x},\zeta_j)\,\mathrm{Posterior}(\zeta_j \mid \mathbf{y}_j, \mathbf{X}_j)\, d\zeta_j$$

This is not equal to $\hat{p}(\mathbf{x}, \tilde\zeta_j)$, where $\tilde\zeta_j$ is the posterior mean of $\zeta_j$.

[Skrondal & Rabe-Hesketh, 2009]

Probabilities for schools in the data (cont'd)

[Figure: all schools, $\mathbf{x} = (0, \mathrm{mn}_j)$; predicted probability for each school (labeled by school number) versus school mean SES (mn), with the median and the 2.5th to 97.5th percentile range]
[Figure: 19 randomly chosen schools, $\mathbf{x} = \mathbf{x}_{ij}$; predicted probability versus student SES (mn+dv), schools labeled by number]

Probabilities for schools in the data (cont'd)

- In the previous graphs, $\tilde{p}_j(\mathbf{x})$ varies between schools for a given value of dv because $\mathrm{mn}_j$ and $\zeta_j$ vary
- Isolate the variability due to $\zeta_j$ by setting $\mathrm{mn}_j = 45$ and $\mathrm{dv}_{ij} = 0$ (boxplot on right)

[Figure: boxplots of predicted probabilities for schools in the data, for dv = 0 (left) and for mn = 45, dv = 0 (right); some schools labeled by number]

Three-level ordinal cumulative logit model

- Tennessee class-size experiment
- In 4th grade, teachers assessed student participation
- 2217 students $i$ of 262 teachers $j$ in 75 schools $k$
- Ordinal response variable $y_{ijk}$: teacher's rating of how frequently the student pays attention
- Rated as: 1 (never), 2, 3 (sometimes), 4, 5 (always)
- No covariates, for simplicity
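Returning briefly to the two-level PISA model: the posterior mean probability for a school in the data, defined earlier, can be sketched in Python with a simple midpoint rule over $\zeta_j$. The school below is entirely hypothetical (8 students at dv = 0 in a school with mn = 39, one of them proficient), and the full-model estimates are taken as $\hat\beta_1 = -4.79$ (sign assumed), $\hat\beta_3 = 0.089$ and $\hat\psi = 0.28$. The sketch also shows that the posterior mean of the probability differs from the probability evaluated at the posterior mean of $\zeta_j$.

```python
from math import exp, sqrt

PSI_FULL = 0.28
# Fixed-part linear predictor for dv = 0, mn = 39 (ASSUMPTION: intercept is -4.79).
LINPRED = -4.79 + 0.089 * 39

# HYPOTHETICAL data for one school: 8 students, all with dv = 0, one proficient.
y_school = [1, 0, 0, 0, 0, 0, 0, 0]
linpreds_school = [LINPRED] * len(y_school)


def expit(eta):
    return 1.0 / (1.0 + exp(-eta))


def posterior_summaries(y, linpreds, linpred_new, psi, n=2001, width=8.0):
    """Posterior means of expit(linpred_new + zeta) and of zeta, by midpoint rule."""
    sd = sqrt(psi)
    step = 2.0 * width * sd / n
    weight_sum = prob_sum = zeta_sum = 0.0
    for k in range(n):
        zeta = -width * sd + (k + 0.5) * step
        prior = exp(-zeta * zeta / (2.0 * psi))  # normalizing constant cancels
        likelihood = 1.0
        for y_i, eta_i in zip(y, linpreds):
            p_i = expit(eta_i + zeta)
            likelihood *= p_i if y_i == 1 else 1.0 - p_i
        weight = prior * likelihood  # unnormalized posterior density
        weight_sum += weight
        prob_sum += weight * expit(linpred_new + zeta)
        zeta_sum += weight * zeta
    return prob_sum / weight_sum, zeta_sum / weight_sum


post_prob, post_zeta = posterior_summaries(
    y_school, linpreds_school, LINPRED, PSI_FULL
)
plug_in = expit(LINPRED + post_zeta)  # probability at the posterior mean of zeta

print(f"posterior mean probability      = {post_prob:.3f}")
print(f"probability at posterior mean   = {plug_in:.3f}")
print(f"probability at zeta = 0         = {expit(LINPRED):.3f}")
```

With few successes in the hypothetical school, the posterior for $\zeta_j$ shifts below zero and the predicted probability is pulled below the value at $\zeta_j = 0$, but not all the way to the observed proportion.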

Three-level ordinal cumulative logit model: OR median

Model the probability of exceeding a category $s = 1, 2, 3, 4$:

$$\operatorname{logit}[\Pr(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_k)] = -\kappa_s + \zeta^{(2)}_{jk} + \zeta^{(3)}_k$$

- Teacher random intercept $\zeta^{(2)}_{jk} \sim N(0, \psi^{(2)})$, $\hat\psi^{(2)} = 0.44$
- School random intercept $\zeta^{(3)}_k \sim N(0, \psi^{(3)})$, $\hat\psi^{(3)} = 0.12$

Odds ratio comparing students of teachers $j$ and $j'$ in school $k$:

$$\frac{\mathrm{Odds}(y_{ijk} > s \mid \zeta^{(2)}_{jk}, \zeta^{(3)}_k)}{\mathrm{Odds}(y_{i'j'k} > s \mid \zeta^{(2)}_{j'k}, \zeta^{(3)}_k)} = \exp(\zeta^{(2)}_{jk} - \zeta^{(2)}_{j'k})$$

Median of the odds ratio, comparing the student with larger odds to the other student:

                                           OR_median
    Different teacher, same school         1.88
    Different teacher, different school    2.04

Three-level ordinal cumulative logit model: ICC lat

Latent response $y^*_{ijk}$:

$$y^*_{ijk} = \zeta^{(2)}_{jk} + \zeta^{(3)}_k + \epsilon_{ijk}, \qquad \epsilon_{ijk} \sim \text{logistic}$$

The observed response $y_{ijk}$ results from cutting up $y^*_{ijk}$ at cut-points or thresholds $\kappa_1$ to $\kappa_4$:

    y_ijk:     1    |    2    |    3    |    4    |    5
    y*_ijk:       kappa_1   kappa_2   kappa_3   kappa_4

Intraclass correlations of latent responses:

- Same school, different teacher:

$$\mathrm{ICC}_{\mathrm{lat}}(\text{school}) = \frac{\psi^{(3)}}{\psi^{(2)} + \psi^{(3)} + \pi^2/3} = 0.031$$

- Same school, same teacher:

$$\mathrm{ICC}_{\mathrm{lat}}(\text{teacher, school}) = \frac{\psi^{(2)} + \psi^{(3)}}{\psi^{(2)} + \psi^{(3)} + \pi^2/3} = 0.145$$

Three-level ordinal cumulative logit model: ICC obs

Two randomly chosen students $i$, $i'$ from the same school (either different or same teachers). Pearson correlation, $\mathrm{Cor}(y_{ijk}, y_{i'j'k})$ or $\mathrm{Cor}(y_{ijk}, y_{i'jk})$, or Kendall's $\tau_b$:

                                                   ICC_obs
                                      ICC_lat      Cor      tau_b
    Same school, different teacher    0.031        0.029    0.025
    Same school, same teacher         0.145        0.136    0.118

Method: 5 x 5 table of the response for student $i$ (rows) versus the response for student $i'$ (columns):

$$\pi_{st}(\text{school}) = \Pr(y_{ijk} = s,\ y_{i'j'k} = t)$$

$$\pi_{st}(\text{teacher, school}) = \Pr(y_{ijk} = s,\ y_{i'jk} = t)$$

Use these probabilities as frequency weights for Cor and $\tau_b$.

Probabilities for teachers and schools in the data

Probability that a student pays attention more than sometimes ($y > 3$), for a randomly chosen student:

- Teachers in the data: integrate over the posterior of $\zeta^{(2)}_{jk}$, $\zeta^{(3)}_k$
- Schools in the data (new teacher): integrate over the prior of $\zeta^{(2)}_{jk}$ and the posterior of $\zeta^{(3)}_k$

[Figures: predicted probabilities for teachers in the data and for schools in the data (new teacher); selected units labeled by number]
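The three-level summaries follow from the two variance estimates alone. A minimal Python sketch, using $\hat\psi^{(2)} = 0.44$ and $\hat\psi^{(3)} = 0.12$ from above; variable names are illustrative. Students with different teachers in the same school share only the school intercept, so their odds ratio depends on the difference of two teacher intercepts (variance $2\psi^{(2)}$), while students in different schools differ in both intercepts (variance $2(\psi^{(2)} + \psi^{(3)})$).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

PSI_TEACHER = 0.44  # level-2 variance estimate
PSI_SCHOOL = 0.12   # level-3 variance estimate

total_latent_variance = PSI_TEACHER + PSI_SCHOOL + pi ** 2 / 3

# Latent-response intraclass correlations.
icc_same_school = PSI_SCHOOL / total_latent_variance
icc_same_teacher = (PSI_TEACHER + PSI_SCHOOL) / total_latent_variance

# Median odds ratios: exp of the median absolute difference in random parts.
z75 = NormalDist().inv_cdf(0.75)
mor_same_school = exp(sqrt(2.0 * PSI_TEACHER) * z75)
mor_diff_school = exp(sqrt(2.0 * (PSI_TEACHER + PSI_SCHOOL)) * z75)

print(f"ICC_lat same school, different teacher: {icc_same_school:.3f}")
print(f"ICC_lat same school, same teacher:      {icc_same_teacher:.3f}")
print(f"median OR, different teacher, same school:      {mor_same_school:.2f}")
print(f"median OR, different teacher, different school: {mor_diff_school:.2f}")
```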

Software

- Stata was used for the results and graphs
- Datasets and Stata commands are available from http://www.gllamm.org/pres.html
- Models were estimated by ML with adaptive quadrature [Rabe-Hesketh et al., 2005]
- Commands for binary logit models:
    - xtlogit, xtmelogit
    - xtrho [Rodríguez & Elo, 2003] for $\mathrm{ICC}_{\mathrm{obs}}$ = VPC
- Command for binary, ordinal, or multinomial logit models (and other response types):
    - gllamm [Rabe-Hesketh et al., 2005]
    - gllapred used to obtain $\bar{p}(\mathbf{x})$, $\tilde{p}(\mathbf{x})$, $p_{st}(\mathbf{x})$, etc. [Rabe-Hesketh & Skrondal, 2009]

Discussion

- Have discussed measures and graphs for interpreting the variability implied by a model with estimated parameters
    - Ignored parameter uncertainty (no SEs or CIs)
    - Did not consider variance explained by covariates
- Extensions
    - Include a random coefficient of $x_{ij}$:
        - For $\bar{p}(\mathbf{x})$, $\tilde{p}(\mathbf{x})$, and $\mathrm{ICC}_{\mathrm{obs}}$, integrate over all random effects
        - $\mathrm{ICC}_{\mathrm{lat}}$ and $\mathrm{OR}_{\mathrm{median}}$ now depend on $x_{ij}$
    - More levels: straightforward
    - Relax the normality assumption for random effects: non-parametric maximum likelihood estimation (NPMLE)
- Other response types
    - Nominal responses: similar to binary responses
    - Counts: no $\mathrm{ICC}_{\mathrm{lat}}$, but can obtain $\mathrm{IRR}_{\mathrm{median}}$ and $\mathrm{ICC}_{\mathrm{obs}}$

References

Duchateau, L. & Janssen, P. (2005). Understanding heterogeneity in generalized linear mixed and frailty models. The American Statistician 59, 143-146.
Eldridge, S. M., et al. (2009). The intra-cluster correlation coefficient in cluster randomized trials: A review of definitions. International Statistical Review 77, 378-394.
Goldstein, H., et al. (2002). Partitioning variation in multilevel models. Understanding Statistics 1, 223-231.
Larsen, K., et al. (2000). Interpreting parameters in the logistic regression model with random effects. Biometrics 56, 909-914.
Rabe-Hesketh, S. & Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Third Edition). College Station, TX: Stata Press.
Rabe-Hesketh, S. & Skrondal, A. (in prep). Understanding variability in multilevel models for categorical responses.
Rabe-Hesketh, S., et al. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics 128, 301-323.
Skrondal, A. & Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear models. Journal of the Royal Statistical Society, Series A 172, 659-687.
Rodríguez, G. & Elo, I. (2003). Intra-class correlation in random-effects models for binary data. The Stata Journal 3, 32-46.