I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

1 Beckman HLM Reading Group: Questions, Answers and Examples
Carolyn J. Anderson
Department of Educational Psychology
University of Illinois at Urbana-Champaign
Linear Algebra Slide 1 of 34

2 Outline
  Logistic regression example
  Reaction time example
  Question 3
Feel free to browse lecture notes at: and Slides at
Slide 2 of 34

3 Binary Logistic Regression:
Response variable: $Y_{ijk}$ = targetfix (target fixation?)
  $y_{ijk} = 0$: n = 70,198 (58.89%)
  $y_{ijk} = 1$: n = 48,996 (41.11%)
where $i$ = subject, $j$ = trial ID, $k$ = replication.
39 subjects × 44 trials × 72 replications
Total number of observations = 119,194
Time in seconds: $\bar{x} = 0.649$, $s = 0.207$, min = 0.3, max = 1.0
Gender of speaker (half female, half male)
Looks like a fully crossed design: Subject × Trial ID (× Replication)
Slide 3 of 34
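As a quick arithmetic check on the counts above, here is a small Python sketch. The variable names are illustrative and not from the slides; the numbers are the ones reported on the slide.

```python
# Counts reported on the slide.
n_zero = 70_198
n_one = 48_996
n_total = n_zero + n_one

# Percentage of observations in each response category.
pct_zero = 100 * n_zero / n_total
pct_one = 100 * n_one / n_total

# Number of cells if every subject saw every trial on every replication.
n_full_cross = 39 * 44 * 72

print(n_total)                 # total number of observations
print(round(pct_zero, 2))      # percent with y = 0
print(round(pct_one, 2))       # percent with y = 1
print(n_full_cross - n_total)  # cells of the full crossing with no observation
```

The product 39 × 44 × 72 is 123,552, so if the design really is fully crossed, roughly 3.5% of the cells have no observation.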

7 Random Effect Logistic Regression Model
Random component:
$$y_{ijk} \mid U_{0i}, U_{0j} \sim \text{binomial}(\pi_{ijk})$$
Link function: logit (natural log of the odds)
$$\ln\left(\frac{P(Y_{ijk}=1)}{P(Y_{ijk}=0)}\right) = \ln\left(\frac{\pi_{ijk}}{1-\pi_{ijk}}\right) = \eta_{ijk}$$
Linear predictor:
$$\eta_{ijk} = \beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j},$$
where
$$\begin{pmatrix} U_{0i} \\ U_{0j} \end{pmatrix} \sim MVN\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau^2_{Ss} & 0 \\ 0 & \tau^2_{Tr} \end{pmatrix}\right)$$
Slide 4 of 34

10 Putting it All Together
The conditional model:
$$\ln\left(\frac{\pi_{ijk}}{1-\pi_{ijk}}\right) = \beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j}$$
or
$$P(Y_{ijk}=1 \mid U_{0i}, U_{0j}) = \frac{\exp[\beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j}]}{1 + \exp[\beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j}]}$$
The $U_{0i}$ and $U_{0j}$ are unobserved contributions to the intercept of the model. We assume that they are random and estimate their variances.
The model is collapsed over the $U$s; they are integrated out. The model that is estimated is
$$P(Y_{ijk}=1) = \int\!\!\int \frac{\exp[\beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j}]}{1 + \exp[\beta_0 + \beta_1(\text{male})_{ijk} + \beta_2(\text{time})_{ijk} + U_{0i} + U_{0j}]}\, f(U_{0i}, U_{0j})\, dU_{0i}\, dU_{0j}$$
Slide 5 of 34
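To make "integrated out" concrete, here is a minimal Python sketch that computes the marginal probability by brute-force numerical integration. It is only an illustration: the function names, the grid settings, and the numeric values of the linear predictor and variances are assumptions, not estimates from the slides. Because $U_{0i}$ and $U_{0j}$ are assumed independent normals, their sum is normal with variance $\tau^2_{Ss} + \tau^2_{Tr}$, so a one-dimensional integral is enough here.

```python
import math

def expit(x):
    """Inverse logit, exp(x) / (1 + exp(x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_pdf(u, var):
    """Density of N(0, var) at u."""
    return math.exp(-u * u / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_prob(eta_fixed, var_ss, var_tr, n_grid=4001, width=8.0):
    """P(Y = 1) after integrating out U_0i + U_0j with the trapezoid rule.

    eta_fixed is the fixed part of the linear predictor. The sum of the two
    independent random intercepts is N(0, var_ss + var_tr).
    """
    var = var_ss + var_tr
    if var == 0.0:
        return expit(eta_fixed)
    half = width * math.sqrt(var)          # integrate over +/- 8 standard deviations
    step = 2.0 * half / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = -half + k * step
        w = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end weights
        total += w * expit(eta_fixed + u) * normal_pdf(u, var)
    return total * step

# Hypothetical values for illustration only (NOT the fitted estimates).
eta = 1.0
print(expit(eta))                    # conditional probability at U_0i = U_0j = 0
print(marginal_prob(eta, 0.5, 0.5))  # marginal probability, pulled toward 0.5
```

The marginal probability is closer to 0.5 than the conditional probability evaluated at zero random effects, which is one reason conditional (random effects) and marginal (e.g., GEE) coefficients are not directly comparable.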

12 Estimation
For normal models: MLE and REML.
For others: a difficult problem.
  Gauss-Hermite quadrature: MLE, but problematic for a large number of random effects.
  Laplace: does pretty well (close to MLE).
  Bayesian: difficult and very time consuming.
  Others: can lead to very biased results, especially estimates of variances and covariances (i.e., the $\tau$s).
Active area of development.
If only interested in the population model, then use GEE (a marginal model and not a random effects one).
Slide 6 of 34
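As a rough illustration of what Gauss-Hermite quadrature does, the sketch below approximates the integral over a single normal random intercept with the standard 5-point rule. This is a toy version under stated assumptions, not what SAS does internally: the function names and the numeric inputs are hypothetical, and real software typically uses adaptive quadrature with more care.

```python
import math

# Standard 5-point Gauss-Hermite nodes and weights for the weight exp(-x^2).
GH_NODES = (-2.0201828705, -0.9585724646, 0.0, 0.9585724646, 2.0201828705)
GH_WEIGHTS = (0.0199532421, 0.3936193232, 0.9453087205, 0.3936193232, 0.0199532421)

def expit(x):
    """Inverse logit; adequate for the moderate arguments used here."""
    return 1.0 / (1.0 + math.exp(-x))

def gh_marginal_prob(eta_fixed, tau2):
    """Approximate E[expit(eta_fixed + U)] for U ~ N(0, tau2).

    The substitution U = sqrt(2 * tau2) * x turns the normal integral into
    (1 / sqrt(pi)) * integral of expit(eta_fixed + sqrt(2 * tau2) * x) * exp(-x^2) dx,
    which the quadrature rule replaces by a weighted sum over the nodes.
    """
    scale = math.sqrt(2.0 * tau2)
    total = sum(w * expit(eta_fixed + scale * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

# Hypothetical inputs for illustration only (NOT the fitted estimates).
print(gh_marginal_prob(1.0, 1.0))
```

With Q nodes per random effect and d random effects in a cluster, a full product rule needs Q^d evaluations, which is one way to see why quadrature becomes problematic when the number of random effects is large.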

13 Summary of Logistic Models fit To Data
                          No Random      Random Ss      Random Trial   Random Both
Effect                    Est.  (SE)     Est.  (SE)     Est.  (SE)     Est.  (SE)
Fixed effects (regression coefficients)
Intercept  $\beta_0$            (.02)    3.04  (.06)    3.04  (.08)    3.12  (.10)
female     $\beta_1$            (.01)    0.39  (.01)    0.39  (.11)    0.40  (.11)
male
time       $\beta_2$            (.03)    3.73  (.03)    3.74  (.03)    3.84  (.03)
Random effects (variances)
Subject    $\tau^2_{Ss}$                       (.03)                   0.12  (.03)
Trial      $\tau^2_{Tr}$                                0.13  (.03)    0.14  (.03)
# param
lnlike     146, , ,
AIC        146, , ,
BIC        146, , ,
Slide 7 of 34

14 An even better model?
Fit Statistics
  -2 Log Likelihood
  AIC (smaller is better)
  BIC (smaller is better)
Covariance Parameter Estimates
  Cov Parm     Subject     Estimate     Standard Error
  Intercept    trialid
  Intercept    subject
Slide 8 of 34

15 An even better model...
Solutions for Fixed Effects
  Effect         Estimate    Standard Error    t Value    Pr > |t|
  Intercept                                               <.0001
  Female
  Male           0           .                 .          .
  time                                                    <.0001
  time*female                                             <.0001
  time*male                                    Infty      <.0001
A problem here?
Slide 9 of 34

16 What's Going On
Message in the LOG file:
  NOTE: Convergence criterion (GCONV=1E-8) satisfied.
  NOTE: At least one element of the gradient is greater than 1e-3.
  NOTE: PROCEDURE GLIMMIX used (Total process time):
        real time    seconds
        cpu time     seconds
Elements of the gradient should be 0 at the maximum of the likelihood.
In this model, the largest element of the gradient is
Estimation methods other than Laplace did not converge.
Model too complex for the data? The model is not a good one for the data.
Slide 10 of 34

17 Reaction times are (generally) not normal: non-negative, continuous, and positively skewed.
$y \sim \text{Gamma}(\mu, \phi)$, and possibly ln as the link.
[Figure: density $f(y)$ versus $y$ for Gamma(4,1), Gamma(6,0.50), Gamma(4,0.50), and Gamma(6,0.33)]
Slide 11 of 34
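The four curves in the figure can be explored with a short Python sketch. A loud assumption: the two numbers in each label are read here as (shape, scale), which the slide does not state; under a different parameterization the summaries below would change. The function name is illustrative.

```python
import math

def gamma_pdf(y, shape, scale):
    """Gamma density under the ASSUMED (shape, scale) parameterization."""
    if y <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

# Labels from the figure, interpreted as (shape, scale) by assumption.
for shape, scale in [(4, 1.0), (6, 0.50), (4, 0.50), (6, 0.33)]:
    mean = shape * scale             # mean of the distribution
    variance = shape * scale ** 2    # variance grows with the mean
    mode = (shape - 1) * scale       # mode sits below the mean: right skew
    print(shape, scale, round(mean, 2), round(variance, 2), round(mode, 2))
```

Under this reading, every curve has its mode below its mean, which matches the positive skew described for reaction times.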

18 The Data: Overall Distribution
[Figure]
Slide 12 of 34

19 Distribution of RT without Outliers
[Figure]
Slide 13 of 34

20 Distribution of RT1 by Verb
[Figure]
Slide 14 of 34

23 How about distribution for some Subjects
[Figure]
Slide 17 of 34

24 A couple more
[Figure]
Slide 18 of 34

25 ... and more
[Figure]
Slide 19 of 34

26 ... and even some more
[Figure]
Slide 20 of 34

31 A Model for the Data
$y_{ijk}$ = reaction time for subject $i$ on item $j$ on replication $k$.
Random component:
$$y_{ijk} \mid U_{0i}, U_{0j} \sim \text{Gamma}(\mu_{ijk}, V_{ijk})$$
Link:
$$\ln(\mu_{ijk}) = \eta_{ijk}$$
Linear predictor:
$$\eta_{ijk} = \underbrace{\beta_0 + U_{0i} + U_{0j}}_{\text{intercept}} + \underbrace{\beta_1\,\text{age}_i + \beta_2\,\text{gender}_i}_{\text{subject specific}} + \underbrace{\beta_3 DO_j + \beta_4 M_j + \beta_5 MM_j + \beta_6 O_j}_{\text{item specific}}$$
where $DO_j$, $M_j$, $MM_j$ and $O_j$ are dummy codes for verb bias (note: all equal 0 when verb bias is SO).
Generalized Linear Mixed Model, in the scale of the data:
$$\mu_{ijk} = \exp[\beta_0 + U_{0i} + U_{0j} + \beta_1\,\text{age}_i + \beta_2\,\text{gender}_i + \beta_3 DO_j + \beta_4 M_j + \beta_5 MM_j + \beta_6 O_j]$$
Slide 21 of 34
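The following Python sketch shows how the dummy codes and the log link work together. All coefficient values are made up for illustration (they are NOT the fitted values, which did not survive in this transcript), and the function and variable names are hypothetical.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted values.
BETA = {"intercept": 6.0, "age": 0.01, "gender": -0.05,
        "DO": 0.10, "M": 0.04, "MM": 0.05, "O": 0.08}
VERB_BIAS_LEVELS = ("DO", "M", "MM", "O")  # SO is the reference level

def linear_predictor(age, gender, verb_bias, u_subject=0.0, u_item=0.0):
    """eta_ijk: intercept part + subject-specific part + item-specific part."""
    eta = BETA["intercept"] + u_subject + u_item
    eta += BETA["age"] * age + BETA["gender"] * gender
    for level in VERB_BIAS_LEVELS:
        # Dummy code: 1 if the item has this verb bias, else 0 (all 0 for SO).
        eta += BETA[level] * (1.0 if verb_bias == level else 0.0)
    return eta

def conditional_mean(age, gender, verb_bias, u_subject=0.0, u_item=0.0):
    """mu_ijk = exp(eta_ijk), the mean given the random effects."""
    return math.exp(linear_predictor(age, gender, verb_bias, u_subject, u_item))

mu_so = conditional_mean(20, 1, "SO")
mu_do = conditional_mean(20, 1, "DO")
print(mu_do / mu_so)  # equals exp(BETA["DO"]): effects are multiplicative
```

Because of the log link, each coefficient acts multiplicatively on the mean: moving from SO to DO multiplies the conditional mean by exp of the DO coefficient, and a random intercept $U$ multiplies it by $\exp(U)$.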

32 Parameters of the Model
Since subjects and items are viewed as random samples, we assume a distribution for the unobserved effects $U_{0i}$ and $U_{0j}$:
$$\begin{pmatrix} U_{0i} \\ U_{0j} \end{pmatrix} \sim MVN\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau^2_{Ss} & 0 \\ 0 & \tau^2_{It} \end{pmatrix}\right)$$
We collapse or integrate out the random effects:
$$\mu_{ijk} = \int\!\!\int \exp[\beta_0 + U_{0i} + U_{0j} + \beta_1\,\text{age}_i + \beta_2\,\text{gender}_i + \beta_3 DO_j + \beta_4 M_j + \beta_5 MM_j + \beta_6 O_j]\, f(U_{0i})\, f(U_{0j})\, dU_{0i}\, dU_{0j}$$
The parameters of the distribution are the $\beta$s and the $\tau$s.
The variance $V_{ijk}$ is complicated...
Slide 22 of 34

33 Summary of Models fit to Data
                           No Random    Random Subject    Random Item    Subject & Item
Fixed Effect               est   se     est   se          est   se       est   se
intercept  $\beta_0$
age        $\beta_1$
female     $\beta_2$
DO         $\beta_3$
M          $\beta_4$
MM         $\beta_5$
O          $\beta_6$
Subject    $\tau^2_{Ss}$
Item       $\tau^2_{It}$
Scale      $\phi$
# params
lnLike     170, , , ,
AIC        170, , , ,
BIC        170, , , ,
Slide 23 of 34

34 A little Modification
Since the estimated parameters for M and MM are similar in value (relative to their standard errors), verb bias for these two categories was re-coded as
$$x_{M,MM} = \begin{cases} 1 & \text{if verb bias was M or MM} \\ 0 & \text{otherwise} \end{cases}$$
                          Original     Revised      Normal dist.
                          est   se     est   se     est   se
Subject    $\tau^2_{Ss}$
Item       $\tau^2_{It}$
Scale      $\phi$                                   ,
# params
lnLike     164, , ,
AIC        164, , ,
BIC        164, , ,
Slide 24 of 34

35 Question 1
Question: How does the modeling and estimation of random effects differ from fixed effects? Specifically, what is the term whose distribution is modeled as random, and how many parameters are estimated in this process?
Answer: We think of the subjects as a random sample and the trial IDs as a random sample. Rather than estimating an effect for each individual and each trial ID, we assume a distribution for them (the $U$s) and estimate the parameters of that distribution. Since we are assuming $U \sim N(0, \tau^2)$, we only estimate its variance; that is 1 parameter ($\tau^2$) for each random effect, rather than 39 $U_{0i}$s (one for each subject) and 44 $U_{0j}$s (one for each trial type).
The $U$s can be estimated AFTER the model is fit to the data. They are estimated using Bayesian methods: BLUPs.
Slide 25 of 34

36 a
Question: Do the models calculate intercepts for each random effect? If so, is it the case that each random effect estimates an additional constant that is ADDED to the general intercept of the model, and which varies randomly according to some distribution?
... I think the examples on the previous slides may have answered this...
Slide 26 of 34

37 b
Question: VARIANCE COVARIANCE STRUCTURES
  What are they?
  How do we choose which one(s) to use?
  What consequences does this choice have?
Answer: If you form a model using a multilevel perspective, they are implied by the model you specify.
For normal data:
$$\text{var}(y) = \underbrace{Z T Z'}_{\text{Level 2}} + \underbrace{\sigma^2 I}_{\text{Level 1}}$$
You can also specify a particular form for $T$ (usually this is unstructured) and/or a particular form for the level 1 covariance matrix (e.g., AR(lag) for longitudinal data).
The conditional variance (regardless of the distribution of $y$) is $\text{var}(y \mid U) = Z T Z'$.
The marginal covariance matrix is much more complicated.
Slide 27 of 34
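For the normal-data formula, a tiny Python sketch makes the implied structure visible. It builds $Z T Z' + \sigma^2 I$ for a random-intercept model with one random intercept per group, so $T = \tau^2 I$. The group labels and the variance values are hypothetical, chosen only for illustration.

```python
def marginal_covariance(groups, tau2, sigma2):
    """var(y) = Z T Z' + sigma2 * I for one random intercept per group."""
    labels = sorted(set(groups))
    # Z has one indicator column per group.
    Z = [[1.0 if g == lab else 0.0 for lab in labels] for g in groups]
    n, q = len(groups), len(labels)
    V = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # Entry (r, c) of Z T Z' with T = tau2 * I_q.
            ztz = sum(Z[r][k] * tau2 * Z[c][k] for k in range(q))
            V[r][c] = ztz + (sigma2 if r == c else 0.0)
    return V

# Hypothetical data: two observations from subject s1, three from s2.
V = marginal_covariance(["s1", "s1", "s2", "s2", "s2"], tau2=0.5, sigma2=1.0)
for row in V:
    print(row)
```

The result is block diagonal: observations from the same group share covariance $\tau^2$, observations from different groups have covariance 0, and each variance is $\tau^2 + \sigma^2$. No covariance structure was chosen explicitly; it follows from the model.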

38
Question: What is the recommended procedure for adding/taking factors out of models when you are model testing?
Answer:
If it is a fixed effect, SAS gives tests for the effect and tests for individual parameters. If not significant, then I remove it from the model and do a likelihood ratio test (LR is more powerful and isn't as sensitive to multicollinearity).
If it is a random effect (variance) and you're using MLE (or Laplace), then
  Fit a model with and without the random effect (a model with variances and covariances for it and one without).
  Compare the LR statistic to a mixture of chi-square distributions. This is done by getting a p-value from $\chi^2_{df}$, where df is computed in the usual way, and a p-value from $\chi^2_{df-1}$. Take the average of these two.
Slide 28 of 34

39 Testing a Random Effect
Using the data from the logistic regression example...
$H_0: \tau^2_{Tr} = 0$
Using the models with both subject and trial effects:
LR = =
The p-value comparing this to $\chi^2_1$ is tiny, and the p-value from $\chi^2_0$ = 0. So in this case just take half the p-value from $\chi^2_1$.
Slide 29 of 34
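The "half the p-value" rule for a single variance can be sketched in a few lines of Python using only the standard library. The function names are illustrative, and the LR value below is hypothetical since the actual statistic is not legible in this transcript. The sketch covers only the case of testing one variance (a mixture of $\chi^2_0$ and $\chi^2_1$).

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

def mixture_p_value_one_variance(lr):
    """p-value for H0: tau^2 = 0 from a 50:50 mixture of chi2_0 and chi2_1."""
    if lr <= 0.0:
        return 1.0
    p_df0 = 0.0              # chi2_0 is a point mass at 0
    p_df1 = chi2_sf_df1(lr)
    return 0.5 * (p_df0 + p_df1)  # average of the two p-values

# Hypothetical LR statistic, for illustration only.
lr = 3.84
print(chi2_sf_df1(lr))                   # naive p-value, about 0.05
print(mixture_p_value_one_variance(lr))  # mixture p-value, half the naive one
```

The naive $\chi^2_1$ p-value is conservative here because the null value $\tau^2 = 0$ lies on the boundary of the parameter space.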

40 a
Question: When do you include all the factors that you want to control for in your final models ("psychology/descriptive approach") vs. including only the predictors that significantly improve the fit of the model to the data ("statistical/predictive approach")?
Answer: The fixed effects that you include in the model affect the variances, and the random effects that you include in the model affect the fixed effects. I think the best approach is to find the best model for the data both in terms of fixed effects and random effects (i.e., if effects that you're controlling for are not significant, take them out).
Slide 30 of 34

41 b
Question: What do you conclude when model comparison indicates a factor should be in the model (say according to a deviance statistic or AIC), but the factor is not significant according to a test statistic for that factor (say a z-statistic)?
Answer: It depends on
  Does your theory say it should be significant?
  Is the effect important?
  Is the effect large or small?
  When you take it out of the model or include it, do the other estimated parameters stay basically the same or do they change?
  How many tests have you done?
By deviance, I'm assuming you mean the likelihood ratio test statistic. LR tests are more powerful than z, t, and score tests.
In the abstract, I'd say keep it in the model (at least at this point in modeling; re-visit it when you have a final model).
Slide 31 of 34

42 c
Question: For repeated-measures designs in which a set of items are rotated through the experimental conditions across a series of lists, is it good to include list as a factor in the model (and if so, how do you test with the resulting confounding of subjects, items, etc.)?
Answer: I'm not sure I understand the design... so at this point, I don't have an answer.
Slide 32 of 34

46
Dummy vs effect coding: For random intercept models, the only difference is how to interpret the parameters. Choose the one that's easier or more natural (e.g., I used dummy in the examples). For random slope models, the coding may matter depending on whether the variable(s) has a random slope.
MLM = Multilevel Logistic Model, and HLM = Hierarchical Linear Model. MLM and HLM are both special cases of GLMMs (Generalized Linear Mixed Models).
Ordinal dependent variables: Use a model designed for ordinal logistic regression and add random effects (e.g., proportional odds models, etc.).
Multicollinearity: The problem is basically the same as for normal linear regression. There are some additional problems with GLMMs and special cases of them (e.g., separation).
Slide 33 of 34
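To illustrate the point about dummy vs effect coding changing only the interpretation, here is a minimal Python sketch for a two-level factor. The two linear-predictor values are hypothetical, the function names are illustrative, and the +1/-1 convention for effect coding is an assumption about which effect coding is meant.

```python
def dummy_coding(eta_female, eta_male):
    """female = 1, male = 0: the intercept is the reference (male) value."""
    b0 = eta_male
    b1 = eta_female - eta_male
    return b0, b1

def effect_coding(eta_female, eta_male):
    """female = +1, male = -1: the intercept is the mean of the two values."""
    b0 = (eta_female + eta_male) / 2.0
    b1 = (eta_female - eta_male) / 2.0
    return b0, b1

# Hypothetical values of the linear predictor for each group.
eta_f, eta_m = 0.4, 0.0

d0, d1 = dummy_coding(eta_f, eta_m)
e0, e1 = effect_coding(eta_f, eta_m)

# Both codings reproduce the same fitted values; only b0 and b1 differ.
print(d0 + d1 * 1, d0 + d1 * 0)        # female, male under dummy coding
print(e0 + e1 * 1, e0 + e1 * (-1))     # female, male under effect coding
```

The fitted values agree, so the choice is about which intercept and slope are easier to talk about: a reference-group value and a difference, or an overall midpoint and a deviation from it.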

47 Miscellaneous continued
I don't use R for these kinds of models (yet), so I don't know how to code a 3-level model. In SAS you just add another RANDOM statement and indicate nesting.
How can a CI be used to determine the level of a factor that is driving a significant effect?
Answer: Check to see whether 0 is in the interval, or test whether the parameter for a level is significantly different from 0. If it is, or if some look very similar, try recoding, re-fit the model, and do a likelihood ratio test.
Slide 34 of 34