Centering Predictors and Variance Decomposition

1 Centering Predictors and Variance Decomposition Applied Multilevel Models for Cross Sectional Data Lecture 6 ICPSR Summer Workshop University of Colorado Boulder

2 Covered this Section: We will expand on this example to cover a few more important concepts in multilevel models: the importance of centering variables; distinguishing within-cluster from between-cluster effects; how total variation is partitioned by random effects; implications for how residuals are correlated; implications for hypothesis testing (Type I and Type II errors); implications for modeling dependencies.

3 AS SEEN LAST TIME

4 Guiding Example: Imagine you are interested in studying the effects of socioeconomic status (SES) on student achievement. What do you think the relationship between student achievement and SES is? You are interested in predicting achievement from SES; this is your guiding research question.

5 Your Study: Let's imagine you are able to get data from 7 elementary schools around Boulder. You sample 50 students from each elementary school. You record a measure of their SES (a scale with a mean of 50) and a measure of their achievement (a scale with a mean of 100). Both scales magically have absolutely perfect reliability.

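6 For Now Let's Model the School Intercept. Level 1 (the student level): $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$, where $X_{ij}$ is the SES of student $i$ in school $j$. Level 2 (the school level): $\beta_{0j} = \gamma_{00} + \gamma_{01} \bar{X}_j + U_{0j}$ and $\beta_{1j} = \gamma_{10}$, where $\bar{X}_j$ is the mean SES of school $j$. Here $\gamma_{00}$ is the overall intercept (the predicted value when all X = 0); $\gamma_{01}$ is the slope for school mean SES (the average increase in the intercept when school mean SES increases by 1); $\gamma_{10}$ is the fixed slope for SES, meaning each school has the same slope (the increase in student score when student SES increases by 1); and $U_{0j}$ is the error associated with school intercepts (called a random intercept), assumed to be normally distributed with mean 0 and variance $\tau_0^2$.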

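7 Putting the Model Together: We can substitute our level 2 model terms into our level 1 model equation to get an overall regression line: $Y_{ij} = \gamma_{00} + \gamma_{01} \bar{X}_j + \gamma_{10} X_{ij} + U_{0j} + e_{ij}$, where $X_{ij}$ is student SES, $\bar{X}_j$ is school mean SES, $U_{0j}$ is the school random intercept, and $e_{ij}$ is the student-level residual.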
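
To make the combined model concrete, here is a minimal simulation sketch. The fixed effects (18.69, 2.62, -0.99) are the estimates quoted in this lecture; the standard deviations of the random intercept and the residual, the spread of SES, and the function name `simulate_schools` are assumptions chosen only for illustration.

```python
import random

def simulate_schools(n_schools=7, n_students=50, seed=1):
    """Simulate (school, ses, achievement) rows from the combined model
    Y_ij = g00 + g01 * school_mean_j + g10 * X_ij + U_0j + e_ij."""
    rng = random.Random(seed)
    # Fixed effects quoted in the lecture.
    g00, g01, g10 = 18.69, 2.62, -0.99
    # Hypothetical standard deviations for U_0j and e_ij (not from the lecture).
    tau0_sd, sigma_sd = 3.5, 5.0
    rows = []
    for j in range(n_schools):
        # Student SES scattered around a school-specific center near 50 (assumed).
        center = rng.gauss(50.0, 3.0)
        ses = [rng.gauss(center, 2.5) for _ in range(n_students)]
        school_mean = sum(ses) / n_students
        u0 = rng.gauss(0.0, tau0_sd)  # random intercept, shared by the whole school
        for x in ses:
            e = rng.gauss(0.0, sigma_sd)  # student-level residual
            y = g00 + g01 * school_mean + g10 * x + u0 + e
            rows.append((j, x, y))
    return rows

rows = simulate_schools()
print(len(rows))  # 7 schools x 50 students
```

Every student in a school shares the same `u0`, which is what makes observations within a school correlated.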

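8 The Analysis Results: estimates of the random intercept variance $\tau_0^2$ and the residual variance $\sigma_e^2$, and of the fixed effects: intercept $\gamma_{00} = 18.69$; slope for school mean SES $\gamma_{01} = 2.62$; slope for student SES $\gamma_{10} = -0.99$.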

9 By School Regression Lines (figure: student achievement plotted against student SES, by school)

10 Analysis Interpretation: $\tau_0^2$ is the variance of the school random intercepts (how much schools vary from each other) after accounting for school mean SES. $\sigma_e^2$ is the variance of the residuals for student scores (how much students vary after accounting for school mean SES and student SES), i.e., the error variance.

11 Analysis Interpretation: $\gamma_{00} = 18.69$ is the overall intercept: the predicted score for a student who has zero SES ($X_{ij} = 0$) at a school with a mean SES of zero ($\bar{X}_j = 0$). $\gamma_{01} = 2.62$ is the slope for school mean SES: the predicted score for a student increases by 2.62 for every one-unit increase in the school mean, after controlling for student SES (the contextual or incremental effect). Average achievement for a school increases as average SES increases. Statistically significant (tested with level 2 degrees of freedom).

12 Analysis Interpretation: $\gamma_{10} = -0.99$ is the slope for student SES: the predicted score for a student decreases by 0.99 for every one-unit increase in the student's SES. Within a school, SES is negatively related to achievement. Statistically significant (tested with level 1 degrees of freedom). So, what is the nature of the relationship between SES and achievement? Level 1 SES is negatively related to achievement; level 2 SES is positively related to achievement.

13 CENTERING

14 A Closer Look at Our Parameters: Recall the variance and fixed effect estimates from the data analysis. Take the intercept ($\gamma_{00} = 18.69$). This value is the predicted achievement for a student with a zero value for his/her SES ($X_{ij} = 0$) and a zero value for the school mean SES ($\bar{X}_j = 0$).

15 However... About SES and Achievement: An intercept of 18.69 sounds all well and good until you look at whether or not it actually occurs in our data. The lowest observed achievement score is 81.1, so 18.69 doesn't exist. Saying SES is zero is also unrealistic: the range of SES is 42.4 to 58.7 for students and 45.7 to 54.5 for school means, so 0 doesn't exist in either.

16 Centering: Because our intercept is implausible, we may wish to center our data so as to bring the intercept more into line with the data we collected. To center the data, subtract a value from each of the predictor/independent variables. Centering will alter the meaning of certain parameters: the intercept, and some slopes (depending on the method of centering). Two methods of centering are popular: grand mean centering (centering by a constant) and cluster mean centering.

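17 Grand Mean Centering: Perhaps the easiest way to center the data is to subtract the grand mean from each observation. The grand mean is the mean of each X variable across all observations, regardless of sampling unit. Our regression equation then becomes: $Y_{ij} = \gamma_{00} + \gamma_{01} (\bar{X}_j - \bar{X}) + \gamma_{10} (X_{ij} - \bar{X}) + U_{0j} + e_{ij}$, where $X_{ij}$ is student SES, $\bar{X}_j$ is school mean SES, and $\bar{X}$ is the grand mean of SES. The intercept $\gamma_{00}$ now reflects the predicted value of Y for a student who has an SES equal to the grand mean and attends a school with a mean SES equal to the grand mean.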

18 Grand Mean Centering: The fixed slope for school mean SES ($\gamma_{01}$) now represents the change in Y for every unit of SES a school mean is above the grand mean. The fixed slope for student SES ($\gamma_{10}$) now represents the change in Y for every unit of SES a student is above the grand mean.
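
As a small sketch of grand mean centering, the snippet below subtracts the overall mean from a handful of SES values and shows how the intercept shifts. The SES values are hypothetical, and the grand mean of exactly 50 used for the intercept shift is an assumption based on the stated scale mean; the slopes and the uncentered intercept are the estimates quoted in this lecture.

```python
def grand_mean_center(values):
    """Subtract the mean of all observations from each observation."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]

# Hypothetical student SES values, pooled across schools.
ses = [42.4, 47.0, 50.0, 53.5, 58.7]
ses_gmc = grand_mean_center(ses)

# Centering both predictors at the grand mean leaves the slopes unchanged
# and moves the intercept to old_intercept + (g01 + g10) * grand_mean.
g00, g01, g10 = 18.69, 2.62, -0.99   # estimates quoted in the lecture
assumed_grand_mean = 50.0            # assumption: sample grand mean equals the scale mean
g00_centered = g00 + (g01 + g10) * assumed_grand_mean
print(round(g00_centered, 2))        # about 100.19, close to the achievement scale mean
```

Differences between students are preserved by the subtraction, which is why only the intercept changes.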

19 Our Results with and without Grand Mean Centering (tables of variance and fixed effect estimates for the uncentered and grand mean centered models): The only difference is in the intercept. The intercept now equals the mean of Y.

20 Cluster Mean Centering: Another popular method for centering is cluster mean centering: taking each person's independent variable(s) and subtracting the mean(s) of their cluster/sampling unit. Here we subtract the school mean SES from each student's SES. One issue with cluster mean centering: what do we do with the level 2 effect? It would be zero if we cluster mean centered it. We can leave it alone (what would happen to the intercept?) or we can grand mean center it. What would you choose?

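21 Cluster Mean Centering with our Data: Our model (with student SES $X_{ij}$ cluster mean centered and school mean SES $\bar{X}_j$ grand mean centered): $Y_{ij} = \gamma_{00} + \gamma_{01} (\bar{X}_j - \bar{X}) + \gamma_{10} (X_{ij} - \bar{X}_j) + U_{0j} + e_{ij}$. $\gamma_{00}$ is now the predicted value for a student with SES equal to the school mean, attending a school with mean SES equal to the grand mean. $\gamma_{01}$ is now the increase in Y for each unit of SES the school mean is above the grand mean. $\gamma_{10}$ is now the increase in Y for each unit of SES the student is above the school mean.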
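
The two centered predictors in this model can be built as in the following sketch. The function name `center_ses` and the four-student data set are hypothetical and exist only to show the arithmetic.

```python
from collections import defaultdict

def center_ses(records):
    """Split each student's SES into within- and between-school parts.

    records: list of (school_id, ses) pairs.
    Returns (school_id, ses_within, school_mean_gmc) tuples, where
    ses_within = ses - school mean, and
    school_mean_gmc = school mean - grand mean.
    """
    by_school = defaultdict(list)
    for school_id, ses in records:
        by_school[school_id].append(ses)
    school_mean = {s: sum(v) / len(v) for s, v in by_school.items()}
    # Grand mean is taken over all students, regardless of school.
    grand_mean = sum(ses for _, ses in records) / len(records)
    return [
        (school_id, ses - school_mean[school_id], school_mean[school_id] - grand_mean)
        for school_id, ses in records
    ]

# Hypothetical data: two schools with two students each.
records = [("A", 48.0), ("A", 52.0), ("B", 54.0), ("B", 58.0)]
for row in center_ses(records):
    print(row)
```

The within-school column carries only level 1 information (it averages to zero inside every school), while the grand mean centered school mean is constant within a school and carries only level 2 information.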

22 Our Results with and without Cluster Mean Centering (tables of variance and fixed effect estimates for the two models): There is a difference in the school mean slope (more on why the slope changes later in the lecture). There is a difference in the intercept: the intercept now equals the mean of Y.

23 Why Does the Slope for School Mean SES Change? The slope for school mean SES changed from 2.62 in the no centering/GMC models to 1.63 in the cluster mean centered model. Remember, slopes in regression are dependent on the other variables in the model: if independent variables are correlated, regression weights will change. We changed $X_{ij}$ to $X_{ij} - \bar{X}_j$. The issue is with the types of information contained in $X_{ij}$: it contains both level 1 and level 2 information (each student's SES is related to their school's mean SES). The corresponding weight (2.62) represented the additional effect of school mean SES when controlling for student SES. Cluster centered student SES has only level 1 information.
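
The change from 2.62 to 1.63 follows from rewriting the fixed part of the uncentered model. In the worked step below, the symbols are assumed notation: $X_{ij}$ for student SES, $\bar{X}_j$ for school mean SES, $\gamma_{01}$ for the school mean slope, and $\gamma_{10}$ for the student slope in the uncentered model.

```latex
\begin{align*}
\gamma_{01}\bar{X}_j + \gamma_{10}X_{ij}
  &= \gamma_{01}\bar{X}_j + \gamma_{10}\left[(X_{ij}-\bar{X}_j) + \bar{X}_j\right] \\
  &= (\gamma_{01}+\gamma_{10})\,\bar{X}_j + \gamma_{10}\,(X_{ij}-\bar{X}_j), \\
\gamma_{01}+\gamma_{10} &= 2.62 + (-0.99) = 1.63 .
\end{align*}
```

So the school mean slope in the cluster mean centered model is the sum of the contextual effect (2.62) and the within-school effect (-0.99), while the student slope itself is unchanged.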

24 Correlation and Centering: The correlation of student SES ($X_{ij}$) with school mean SES ($\bar{X}_j$) is nonzero, and it is identical for school mean SES after grand mean centering ($\bar{X}_j - \bar{X}$), because subtracting a constant does not change a correlation. The correlation of student SES after cluster mean centering ($X_{ij} - \bar{X}_j$) with school mean SES, centered or not, is zero. The effect changes because of the correlation between school mean SES and student SES.
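
The pattern of correlations on this slide can be checked numerically. The sketch below uses a hypothetical four-student data set (two schools of two students), so the nonzero correlation it prints is illustrative and is not the value from the lecture data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: school A = students 1-2, school B = students 3-4.
ses = [48.0, 52.0, 54.0, 58.0]
school_mean = [50.0, 50.0, 56.0, 56.0]
grand_mean = sum(ses) / len(ses)

school_mean_gmc = [m - grand_mean for m in school_mean]  # grand mean centered
ses_within = [x - m for x, m in zip(ses, school_mean)]   # cluster mean centered

r_raw = pearson(ses, school_mean)             # raw student SES vs school mean
r_raw_gmc = pearson(ses, school_mean_gmc)     # same, after grand mean centering
r_within = pearson(ses_within, school_mean)   # cluster mean centered SES vs school mean
print(r_raw, r_raw_gmc, r_within)
```

Raw student SES is correlated with the school mean, grand mean centering leaves that correlation untouched, and cluster mean centering removes it.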

25 Centering Summary: The scale of variables may lead to parameter values that are not plausible. Sometimes centering changes only interpretation (grand mean centering); sometimes it changes inference (cluster mean centering), as detailed shortly. Centering helps to: make parameter estimates understandable; help estimation of random effects in some types of models; disentangle types of effects (for cluster mean centering).

26 TYPES OF VARIANCES IN MULTILEVEL MODELS

27 The Goals of MLM: Variance Partitioning. The way MLMs control for observations that may be dependent is to incorporate different types of variability into an analysis: variability within clusters and variability between clusters. This section will discuss a general multilevel modeling framework for hierarchical data, indicating how different types of effects partition variability in different ways. Some of this will be technical but well worth the time.

28 Multiple Components/Levels: HLM. Recall our running example: attempting to predict student achievement from student SES and school SES. Hierarchical analysis made the results more informative: within school, student SES is negatively associated with student achievement; between schools, school mean SES is positively associated with student achievement. If you will recall from last time, we started by taking a basic regression analysis, $Y_i = \beta_0 + \beta_1 X_i + e_i$, and specifying a basic regression model for each possible school: $Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$.

29 Names of Levels and Analysis Heuristics: MLM presents a heuristic for formulating a model that is based on the level of the data/analysis. This heuristic is effective for many models, although it breaks down for certain types of models (e.g., crossed models). The by-school regression model is called the level 1 model: it uses level 1 independent variables to predict the outcome, and its residual term is one random component (analogous to the residual in regression/ANOVA).

30 Level Two Model: The coefficients of the level 1 model (i.e., $\beta_{0j}$ and $\beta_{1j}$) were then modeled using a similar modeling approach, which is called the level 2 model. For the intercepts: $\beta_{0j} = \gamma_{00} + \gamma_{01} \bar{X}_j + U_{0j}$ (subscripts start with 0; the predictors are level 2 covariates; $U_{0j}$ is the random error term, called the random intercept). For the slopes: $\beta_{1j} = \gamma_{10} + \gamma_{11} \bar{X}_j + U_{1j}$ (subscripts start with 1; the intercept $\gamma_{10}$ multiplies the level 1 covariate; the predictors become cross level interactions; $U_{1j}$ is the random error term, called the random slope).

31 The Combined Model: Although the MLM heuristic does a good job parsing which effects predict which portions of the model, combining the level 2 and level 1 models results in what is called a general linear mixed model (the more standard terminology from statistics): $Y_{ij} = \gamma_{00} + \gamma_{01} \bar{X}_j + \gamma_{10} X_{ij} + \gamma_{11} \bar{X}_j X_{ij} + U_{0j} + U_{1j} X_{ij} + e_{ij}$. Fixed effects: level 2 predictor(s) come from predictors of the intercept model; level 1 predictor(s) come from intercepts of the slope model(s); cross level interactions come from predictors of the slope model(s). Random effects: $U_{0j}$ and $U_{1j}$.

32 Fixed Effects: The fixed effects represent model parameters that: are assumed to be fixed (no prior distribution assumed); apply to everyone, regardless of sampling unit/cluster; are used to test hypotheses about types of effects (degrees of freedom depend on the level of the effect); constitute the predicted value for a given observation. As such, they are sometimes called the model for the means.

33 Random Effects: The random effects represent model parameters that: are random (assumed to follow a statistical distribution: normal, with zero mean and variance/covariance parameters that are estimated), e.g., $(U_{0j}, U_{1j})^\top \sim N\left( (0, 0)^\top, \begin{bmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{bmatrix} \right)$; are the same only if subjects are in the same sampling unit. Their variances and covariances contribute to the covariance of observations within a cluster.

34 More on Random Effects: The inclusion of random effects impacts the way hypothesis tests about fixed effects are constructed. Random effects partition variability into segments that are due to cluster. Specifying a model with random effects makes explicit the assumption that, within a cluster, observations are correlated (shown later).

35 BUILDING MODELS: FROM START TO FINISH

36 Building Multilevel Models: The process of model building begins with a very basic model and then adds predictors at each level. We will be fitting a series of models, each attempting to answer a different question. At the end of the process, we will evaluate the final model and make inferences regarding the nature of our variables.

37 Question #1: How Much Variability is there in Achievement? To answer our first question, we will fit what is called an empty model: no predictors of achievement and no random effects by school. We started with this model last lecture: the intercept was the mean achievement score, and the error variance was the variance in achievement scores. We will use this model as a baseline and build from it by adding random effects and predictors.

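38 Model #1: Empty Model. Model: $Y_{ij} = \beta_0 + e_{ij}$, where $e_{ij} \sim N(0, \sigma_e^2)$. Model results: Fit: deviance (for comparing model fit) = 2,491.5. Means: $\beta_0$, the mean achievement score. Variances: $\sigma_e^2 = 72.29$.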

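39 Question #2: Is there variability in achievement unique to schools? The second question can be answered by using an extension of the empty model: an MLM with a random intercept. If the random intercept variance is greater than zero, then the answer is yes. Level 1 model: $Y_{ij} = \beta_{0j} + e_{ij}$, where $e_{ij} \sim N(0, \sigma_e^2)$. Level 2 model: $\beta_{0j} = \gamma_{00} + U_{0j}$, where $U_{0j} \sim N(0, \tau_0^2)$ and $\gamma_{00}$ is the fixed intercept (means). Combined model: $Y_{ij} = \gamma_{00} + U_{0j} + e_{ij}$.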

40 Model #2 Results (Compared with Model #1), estimates with standard errors in parentheses. Model #1: deviance 2,491.5; means: intercept (SE 0.45); variances: error 72.29 (SE 5.46). Model #2: deviance 2,202.5; means: intercept (SE 2.50); variances: error 29.04 (SE 2.22), random intercept 43.25 (SE 23.43).

41 First: Is Model 2 Preferred? To answer our question we need to determine if the random intercept variance is greater than zero: $H_0: \tau_0^2 = 0$; $H_1: \tau_0^2 > 0$. A variance greater than zero would indicate variability due to schools and dependencies between observations within schools. We can use a deviance test: (null model deviance - full model deviance) is *approximately* chi-square distributed, with degrees of freedom equal to the difference in the number of parameters between models (here only one new parameter: the random intercept variance). Note: this test is only approximate and is very conservative. Deviance test: 2,491.5 - 2,202.5 = 289 with df = 1, which is statistically significant. This indicates there is variability in achievement due to school.
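
The deviance test on this slide can be reproduced with the standard library alone. This is a sketch under the slide's own approximation: it treats the deviance difference as a 1 df chi-square variable, for which the upper tail probability has the closed form erfc(sqrt(x / 2)). The function name `deviance_test_df1` is hypothetical.

```python
import math

def deviance_test_df1(null_deviance, full_deviance):
    """Return (statistic, p_value) for a 1-df chi-square deviance test."""
    stat = null_deviance - full_deviance
    if stat < 0:
        raise ValueError("full model deviance should not exceed null model deviance")
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Deviances reported for Model #1 (empty) and Model #2 (random intercept).
stat, p = deviance_test_df1(2491.5, 2202.5)
print(stat, p)
```

Because the null value of a variance sits on the boundary of its parameter space, this naive p-value is conservative; a commonly used adjustment halves it. With a statistic of 289 the conclusion is the same either way.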

42 Second: What are the interpretations of the parameters of Model 2? Now that we know Model 2 is preferred, we will deviate from our model fitting to demonstrate the interpretations of its parameters. (Shown for teaching purposes: you wouldn't do this until the end of the model building process.) First, the fixed intercept ($\gamma_{00}$): its value stayed constant from Model 1 to Model 2, because the model for the means is unchanged when the model for the variances changes (Model 1 and Model 2 differed only by the random intercept for school). The standard error of the intercept changed because of the different variance partitioning in Model 2. Overall, the intercept still represents the predicted value of achievement when all predictors are zero; since there are no predictors in the model, everyone's predicted value is the mean of achievement.

43 Interpretations of Variance Parameters: The variance parameters represent the variance of the random intercept and the variance of the level 1 error term. These parameters indicate how much variability is present at each level of the analysis. They also indicate the degree to which observations nested within a sampling unit/cluster are correlated. To demonstrate, I will show what the model expects the dependency between observations within a cluster to be.

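44 Covariance of Observations Within Cluster: Using the algebra of expectations, we seek to determine the dependency (covariance) between two observations in the same cluster, $Y_{ij}$ and $Y_{i'j}$: $\mathrm{Cov}(Y_{ij}, Y_{i'j}) = \mathrm{Cov}(\gamma_{00} + U_{0j} + e_{ij},\ \gamma_{00} + U_{0j} + e_{i'j}) = \mathrm{Cov}(U_{0j}, U_{0j}) + \mathrm{Cov}(U_{0j}, e_{i'j}) + \mathrm{Cov}(e_{ij}, U_{0j}) + \mathrm{Cov}(e_{ij}, e_{i'j}) = \tau_0^2 + 0 + 0 + 0 = \tau_0^2$.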

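45 Covariance of Observations Between Clusters: Using the algebra of expectations, we seek to determine the dependency (covariance) between two observations in different clusters, $Y_{ij}$ and $Y_{i'j'}$: $\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \mathrm{Cov}(\gamma_{00} + U_{0j} + e_{ij},\ \gamma_{00} + U_{0j'} + e_{i'j'}) = \mathrm{Cov}(U_{0j}, U_{0j'}) + \mathrm{Cov}(U_{0j}, e_{i'j'}) + \mathrm{Cov}(e_{ij}, U_{0j'}) + \mathrm{Cov}(e_{ij}, e_{i'j'}) = 0$.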

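46 Variance of Observations Within or Between Clusters: Using the algebra of expectations, we seek to determine the variance of an observation $Y_{ij}$: $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(\gamma_{00} + U_{0j} + e_{ij}) = \mathrm{Var}(U_{0j}) + \mathrm{Var}(e_{ij}) + 2\,\mathrm{Cov}(U_{0j}, e_{ij}) = \tau_0^2 + \sigma_e^2$.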

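47 Correlation of Observations within Clusters: Because we know $\mathrm{Cov}(Y_{ij}, Y_{i'j}) = \tau_0^2$ and $\mathrm{Var}(Y_{ij}) = \tau_0^2 + \sigma_e^2$, we can determine the correlation between observations within a cluster, also known as the intraclass correlation: $\rho = \mathrm{Corr}(Y_{ij}, Y_{i'j}) = \dfrac{\tau_0^2}{\tau_0^2 + \sigma_e^2}$.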

48 Back to Our Data: From our data, we estimated (standard errors in parentheses): error variance $\sigma_e^2$ = 29.04 (2.22); random intercept variance $\tau_0^2$ = 43.25 (23.43). Meaning, our intraclass correlation was 43.25 / (43.25 + 29.04), or approximately .60. This means: students' achievement scores within a school had a correlation of about .60; about 60% of the total variability in achievement scores came from between school variability; our linear regression assumption of uncorrelated residuals is violated; we should be using the mixed model with a random intercept.
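
The intraclass correlation is a one-line calculation. The sketch below plugs in the Model #2 variance estimates reported in this lecture (random intercept variance 43.25, error variance 29.04, as quoted on the later pseudo R-squared slides); the function name `intraclass_correlation` is hypothetical.

```python
def intraclass_correlation(tau0_sq, sigma_sq):
    """ICC = between-cluster variance / (between-cluster + within-cluster variance)."""
    return tau0_sq / (tau0_sq + sigma_sq)

# Model #2 estimates: random intercept variance and error variance.
icc = intraclass_correlation(43.25, 29.04)
print(round(icc, 3))
```

Note that the two variances sum to 72.29, the total variance from the empty model, so the ICC is also the share of total variance that lies between schools.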

49 Back to Model Building: Now that we've determined that there is variability at the school level, it is our job to explain both sources of variability using the independent variables we have collected: student SES and school level SES. The model building process now attempts to add variables to the baseline model (empty + random intercept). The question becomes one of process: at which level should we add our variables? We will add variables to each level and determine how much variance is accounted for at each level by the new variables.

50 Model Building (part 1, using Cluster Mean Centering): Given our choices of levels, we can add variables at: (1) level 1 only (add cluster mean centered student SES): new $\tau_0^2$ = 43.32; reduced level 1 variance (slight increase in level 2 variance, from 43.25); (2) level 2 only (add grand mean centered school mean SES): new $\tau_0^2$ = 13.28; reduced level 2 variance (no reduction in level 1); (3) level 1 and level 2 simultaneously (add cluster mean centered student SES and grand mean centered school mean SES): new $\tau_0^2$ = 13.35; reduced both level 1 and level 2 variance.

51 Which Path to Choose? The path to choose (which level) depends on several factors: types of variables; types of centering (do level 1 variables include *only* level 1 information?); research questions of interest. There is no true consensus as to which path to use. We used cluster mean centered variables at level 1 (a level 1 variable without any level 2 information). Let's examine what would have happened had we used grand mean centered variables at level 1.

52 Model Building (part 2, using GMC): Given our choices of levels, we can add variables at: (1) level 1 only (add grand mean centered student SES): new $\tau_0^2$ = 88.27; reduced level 1 variance (HUGE increase in level 2 variance), because the level 1 variable included level 2 information; (2) level 2 only (add grand mean centered school mean SES): new $\tau_0^2$ = 13.28; reduced level 2 variance (no reduction in level 1), same as with cluster mean centered variables; (3) level 1 and level 2 simultaneously (add grand mean centered student SES and grand mean centered school mean SES): new $\tau_0^2$ = 13.35; reduced both level 1 and level 2 variance, same as with cluster mean centered variables.

53 For Us: We'll Pick Door #3. For our analysis, we'll choose to put level 1 and level 2 variables in simultaneously. This is due to understanding our variables: from inspection of the data, it appears that SES has a differential effect at different levels of our analysis. So we'll go with #3 and add both student and school SES simultaneously.

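54 Our New Model (Called Model #3): Adding cluster mean centered student SES and grand mean centered school mean SES yields the following model. Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j} (X_{ij} - \bar{X}_j) + e_{ij}$, where $e_{ij} \sim N(0, \sigma_e^2)$. Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} (\bar{X}_j - \bar{X}) + U_{0j}$ and $\beta_{1j} = \gamma_{10}$, where $U_{0j} \sim N(0, \tau_0^2)$. Combined: $Y_{ij} = \gamma_{00} + \gamma_{01} (\bar{X}_j - \bar{X}) + \gamma_{10} (X_{ij} - \bar{X}_j) + U_{0j} + e_{ij}$.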

55 Model #3 Results: Model Fit. Model #2 (old model): deviance 2,202.5. Model #3 (new): deviance 2,147.9. Question: Is Model 3 preferred to Model 2? Answer: deviance test (two new parameters). Test statistic: 2,202.5 - 2,147.9 = 54.6. Degrees of freedom = 2 (new parameters $\gamma_{01}$ and $\gamma_{10}$). The p value is far below conventional significance thresholds. Conclusion: Model #3 is preferred.

56 Model #3 Results: Fixed Effects (Means). Model parameter estimates: $\gamma_{00}$, the overall intercept: the value of achievement for a student with SES equal to their school mean SES at a school with mean SES equal to the grand mean SES; it is the average value of achievement. $\gamma_{10} = -0.99$, the slope for student SES (minus school mean SES): represents the change in achievement for each unit a student's SES differs from their school mean, given school mean SES is held constant. $\gamma_{01} = 1.63$, the slope for school mean SES (minus grand mean SES): represents the change in achievement for each unit the school mean SES differs from the grand mean, given student SES (relative to the school mean) is held constant.

57 Model #3 Results: Variance Parameters. Our estimated variance parameters were (standard errors in parentheses): error variance $\sigma_e^2$ = 25.35 (1.93); random intercept variance $\tau_0^2$ = 13.35 (7.41). Meaning, our intraclass correlation was 13.35 / (13.35 + 25.35), or approximately .34. This means: after accounting for SES, students' achievement scores within a school had a correlation of about .34; about 34% of the remaining variability in achievement scores came from between school variability; our linear regression assumption of uncorrelated residuals is violated; we should be using the mixed model with a random intercept.

58 More on Variances: In comparing Model #3 to Model #2, an important quantity is how much each variance component is reduced by the addition of the predictors, called a pseudo-$R^2$. Level 2 variance ($\tau_0^2$): Model #2 = 43.25; Model #3 = 13.35. Reduction: 43.25 - 13.35 = 29.9. Proportion of Model #2 variance explained: 29.9/43.25 = .69. Explanation: school mean SES explains 69% of the variance in school achievement (random intercept variance).

59 Level 1 Variance Reduction: Level 1 variance ($\sigma_e^2$): Model #2 = 29.04; Model #3 = 25.35. Reduction: 29.04 - 25.35 = 3.69. Proportion of Model #2 variance explained: 3.69/29.04 = .13. Explanation: student SES explains 13% of the variance in student achievement.
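
Both pseudo R-squared values follow the same recipe: the proportional drop in a variance component from the baseline model to the model with predictors. In the sketch below, the Model #2 variances are quoted in the lecture, and the Model #3 variances (13.35 and 25.35) are the values implied by the reported reductions of 29.9 and 3.69; the function name `pseudo_r2` is hypothetical.

```python
def pseudo_r2(baseline_variance, new_variance):
    """Proportion of a baseline variance component removed by adding predictors."""
    return (baseline_variance - new_variance) / baseline_variance

# Level 2: random intercept variance, Model #2 vs Model #3.
r2_level2 = pseudo_r2(43.25, 13.35)
# Level 1: error variance, Model #2 vs Model #3.
r2_level1 = pseudo_r2(29.04, 25.35)
print(round(r2_level2, 2), round(r2_level1, 2))
```

A pseudo R-squared can come out negative when a variance component grows after predictors are added, which is one reason it is treated as a descriptive summary and not a strict proportion.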

60 Wrapping Up: We discussed the process of fitting multilevel models in the context of our familiar example: why and how to center (effects on parameter interpretations and estimates); the model building process (how variance is partitioned at each level, and how variance gets explained at each level); and the final results of determining which model to choose (and how to interpret the parameters).

61 Up Next: Our analysis and lecture today left a few things up in the air. What about the slopes for SES? Are there school level effects on the slopes (random slopes)? Is there a cross level interaction between school mean SES and student SES? Next we will investigate these questions.