Partitioning of Variance in Multilevel Models. Dr William J. Browne
|
|
|
- Silas Eaton
- 9 years ago
- Views:
Transcription
1 Partitioning of Variance in Multilevel Models Dr William J. Browne University of Nottingham with thanks to Harvey Goldstein (Institute of Education) & S.V. Subramanian (Harvard School of Public Health) 1
2 Outline 1. The ICC and the VPC 2. Partitioning variances in: a. General normal response multilevel models b. Binary response multilevel models c. General Binomial response multilevel models with overdispersion 3. Example : Indian illiteracy dataset 2
3 Statistical Modelling Often in statistics we have a response variable y which is our main interest. We select a sample of size n from our population of interest and observe values y i, i = 1,..., n. We then wish to infer properties of the variable y in terms of other observed predictor variables X i = (x 1i,..., x ki ), i = 1,..., n. The main use of these predictor variables is to account for differences in the response variable, or to put it another way to explain the variation in y. 3
4 Linear Modelling Consider the standard linear model y i = X i β + e i, e i N(0, σ 2 ) Here we explain variation in y by allowing our predictors X to describe differences in mean behaviour. The amount of variance explained can be calculated via R 2 statistics for the model. Note that X can contain both continuous and categorical predictor variables. Note R 2 is so called as for linear regression R 2 is the square of the correlation between y and x. 4
5 Multilevel Modelling and the VPC Consider a 2-level variance components model y ij = X ij β + u j + e ij, u j N(0, σ 2 u), e ij N(0, σ 2 e) A typical example data structure (from education) is pupils within schools. Here we have the variance partitioned into two components: σ 2 u is the between schools variance and σ 2 e is the residual variation between pupils. We introduce the term Variance Partition Coefficient or VPC to represent the percentage variance explained by the higher level (school). Hence V P C = σ 2 u/(σ 2 e + σ 2 u) 5
6 VPC and ICC It should be noted that an alternative representation of a multilevel model is to consider the response vector following a MV normal, e.g. for 2 pupils in 2 schools. y 11 y 21 y 12 y 22 N X 11 β X 21 β X 12 β X 22 β, σ 2 u + σ2 e σ 2 u 0 0 σ 2 u σ 2 u + σ2 e σ 2 u + σ2 e σ 2 u 0 0 σ 2 u σ 2 u + σ2 e Then the Intra-class correlation or ICC is the correlation between 2 individuals who are in the same higher level unit. ICC = σ 2 u/(σ 2 e + σ 2 u) = V P C Note that this equivalence is not always true for other models. 6
7 VPC for other normal response models Consider a 2-level model y ij = X ij β + Z ij u j + e ij, u j MV N(0, Σ u ), e ij N(0, σ 2 e) Then the formula for V P C depends on Z ij and so is dependent on predictor variables. V P C ij = Z ij Σ u Z T ij /(Z ijσ u Z T ij + σ2 e). To illustrate this we consider a 2-level random slopes model in the next slide: y ij = β 0 + u 0j + (β 1 + u 1j )x ij + e ij, u j MV N(0, Σ u ), e ij N(0, σ 2 e) 7
8 Random slopes model Here variability between lines increases as x increases and hence V P C increases. 8
9 VPC for binary response multilevel models Consider the following multilevel bernouilli model y ij Binomial(1, p ij ) where logit(p ij ) = X ij β + u j, u j N(0, σ 2 u) Now the variance of the bernouilli distribution and hence the level 1 variation is p ij (1 p ij ). This means that even though we have a simple variance for the random effects, the V P C will be a function of the predictor variables. 9
10 Approaches for calculating the VPC The VPC is here more difficult to calculate due to the different scales that the level 1 variance and the higher level variances are measured on due to the link function. We consider the following 3 approaches Model linearisation Simulation A Latent Variable approach 10
11 Method A: Model linearisation Using a first order Taylor series expansion around the mean of the distribution of the u j we have p ij = p ij = exp(x ijβ) 1+exp(X ij β) and first derivative with respect to X ijβ p ij 1+exp(X ij β) and the following expansion y ij = X ij βp ij + u jp ij + e ij pij (1 p ij ) where var(e ij ) = 1. This results in an approximate formula for the VPC as V P C ij = σ 2 u p2 ij /(1+exp(X ijβ)) 2 σ 2 u p2 ij /(1+exp(X ijβ)) 2 +p ij (1 p ij ) 11
12 Method B: Simulation This algorithm simulates from the model to give an estimate of the VPC which improves as the number of simulations increases The steps are 1. Generate a large number m (say 50,000) of values for u j from the distribution N(0, σ 2 u) 2. For a particular fixed value of X ij compute the m corresponding values of p ij, p k, k = 1,..., m. 3. Compute the corresponding Bernouilli variances v k = p k (1 p k ), k = 1,..., m and calculate their mean by σ 2 1 = 1 m m i=1 v k 4. Then V P C ij = σ 2 2/(σ σ 2 1) where σ 2 2 = var(p k ) 12
13 Method C: Latent Variables Here we consider our observed binary response to represent a thresholded continuous variable where we observe 0 below the threshold and 1 above. Now in a logit model we have an underlying (standard) logistic distribution for such a variable. The (standard) logistic distribution has variance π 2 /3 = 3.29 and so we can take this as the level 1 variance and now both the level 1 and 2 variances are on the same scale and so we have the simple formula V P C ij = σ2 u σu 2 +π2 /3. Note this approach is used by Snijders and Bosker (1999) and results in a constant VPC for this model. 13
14 Example: Voting dataset Here we have a two-level dataset (voters within constituencies) with a response as to whether voters will vote conservative. The dataset has four attitudinal predictors designed so that lower values correspond to left-wing attitudes. We consider two models and two VPCs. Firstly a model with just an intercept which has β 0 = 0.256, σ 2 u = and secondly a model with all four predictors and a left-wing individual where X ij β =
15 Example: Voting dataset Scenario Method A Method B Method C Note: Method C gives the same VPC estimate for all p ij but the level 2 variance changes between the two models used in the scenarios. Good agreement between methods A and B particularly in scenario 1. 15
16 VPC in general binomial models with overdispersion Our motivation for considering such models is a dataset on illiteracy in India studied by Subramanian and co-workers. Here we have a structured population with people living in districts which lie within states. We do not have individual data but instead have illiteracy proportions and counts for cells of people who live within a particular district. The cells are formed by individuals with the same values for 3 predictor variables: gender, caste and urban/rural. The size of a cell varies from 1 person to over 4 million! There are 442 districts within 29 states. 16
17 Indian literacy dataset continued A simple logistic model would be y ijk Binomial(n ijk, p ijk ) where logit(p ijk ) = X ijk β + v k + u jk, v k N(0, σ 2 v), u jk N(0, σ 2 u) Here i indexes cell, j indexes district and k indexes state. So we have accounted for variation(over dispersion) due to districts and states but this model doesn t account for overdispersion due to cells. 17
18 Accounting for overdispersion The general binomial distribution is one of many possible distributions for data where the response is a proportion. Datasets can be constructed that have more (overdispersion) or less (underdispersion) variability that a Binomial distribution. There are two approaches that can be used for overdispersion: 1. Multiplicative overdispersion - Note can cope with underdispersion. Adds a scale factor to the model that multiplies the variance but model doesn t have particular distributional form. 2. Additive overdispersion. 18
19 Additive overdispersion If we consider our general model previously we can add an overdispersion term to it as follows: y ijk Binomial(n ijk, p ijk ) where logit(p ijk ) = X ijk β + v k + u jk + e ijk, v k N(0, σ 2 v), u jk N(0, σ 2 u), e ijk N(0, σ 2 e) Here e ijk is a cell level overdispersion effect and σ 2 e is the variance of these effects. One advantage of an additive approach is that it can be fitted as a standard random effects model by the addition of a pseudo level 2 to the model. 19
20 Additive overdispersion continued With a change of notation we can get the following: v (s) l y ijkl Binomial(n ijkl, p ijkl ) where logit(p ijkl ) = X ijkl β + v (s) l + v (d) kl + u jkl, N(0, σ 2 s), v (d) kl N(0, σ 2 d ), u jkl N(0, σ 2 u) Here the second level indexed by j is pseudo in that each level 2 unit contains only 1 level 1 unit. One interpretation of the model is to consider level 2 as the cell level and level 1 as the individual level and in fact expanding the dataset to include each individual response would give an equivalent model. 20
21 Fitting a simple model with districts only For our first model to be fitted to the illiteracy dataset we ignore states and simply fit districts and cells with an intercept term. y ijk Binomial(n ijk, p ijk ) where logit(p ijk ) = β 0 + v (d) k + u jk, v (d) k N(0, σ 2 d ), u jk N(0, σ 2 u) We will consider both quasi-likelihood based methods and MCMC methods (with diffuse priors) for fitting this model. MLwiN offers both Marginal quasi-likelihood (MQL) and penalised quasi-likelihood (PQL) estimation methods but for this model only 1st order MQL will converge. 21
22 Fitting using MCMC MLwiN also offers an MCMC algorithm that consists of a mixture of Metropolis and Gibbs sampling steps. This algorithm had incredibly poor mixing and the chains did not appear to converge. We switched to WinBUGS as we could here reparameterise the model using hierarchical centering (Gelfand, Sahu & Carlin 1995). y ijk Binomial(n ijk, p ijk ) logit(p ijk ) = u jk where u jk = β 0 + v (d) k + u jk u jk N(v k, σ2 u) where vk = β 0 + v (d) k vk N(β 0, σv) 2 22
23 Estimates for Model Parameter 1st MQL (SE) MCMC (95% CI) β (0.025) (-0.058,0.076) σd (0.018) (0.320,0.454) σu (0.016) (1.293,1.408) Note the underestimation of MQL (see e.g. Browne(1998) for details). Here the MCMC intervals do not even contain the MQL estimates for either variance estimate. 23
24 VPC methods for overdispersed random effects models All 3 methods of calculating the VPC can be extended to several levels. A: Model Linearisation Here for our example the VPC associated with districts has the following form: V P C ijk = σ 2 d p2 ijk /(1+exp(β 0)) 2 (σ 2 d +σ2 u )p2 ijk /(1+exp(β 0)) 2 +p ijk (1 p ijk ) C: Latent Variable Approach Here the formula is V P C ijk = σ 2 d /(σ2 d + σ2 u + π 2 /3) 24
25 Method B simulation This method now involves the following steps: 1. From the fitted model simulate a large number (M) of values for the higher-level random effects, from the distribution N(0, σ 2 d ). 2. For each of the generated higher-level random effects, simulate a large number (m) of values for the over-dispersion random effects, from the distribution N(0, σ 2 u). 3. Combine the generated residuals with β 0 and compute the T = M m corresponding values of p ijk (p (r) ijk ), r = 1,.., T using the anti-logit function. For each of these values compute the level 1 binomial variance σ 2 rijk = p(r) ijk (1 p(r) ijk ). 25
26 Method B simulation (continued) 4. For each of the M higher-level random effect draws calculate the mean of the m generated p (r) ijk, p(r) ijk. 5. The coefficient V P C ijk is now calculated as follows V P C ijk = σ 2 3/(σ σ 2 1) where σ 2 3 = var(p (R) ijk ), σ 2 2 = var(p (r) ijk ) and σ 2 1 = r σ2 rijk /T 26
27 VPC estimates for this model The following table contains the district level VPC estimates using both MQL and MCMC estimates for the parameters Method 1st MQL (SE) MCMC (95% CI) Linearisation 4.01% 6.68% Simulation 3.45% (0.016%) 5.40% (0.033%) Latent Variable 4.68% 7.62% Note Simulation estimates based on 10 runs with M = m = 5000 and the number in brackets is MCSE. 27
28 Adding cell-level predictor variables We can extend our model by adding predictors for each cell as follows: y ijk Binomial(n ijk, p ijk ) logit(p ijk ) = β 0 + X T jk β + v(d) k + u jk, v (d) k N(0, σ 2 d ), u jk N(0, σ 2 u) In this example the variables in β are gender, social class (caste, tribe or other) and urban/rural habitation along with some interactions. This will result in 12 (2 3 2) cell types and 12 VPC estimates 28
29 Hierarchical centering again Note that because all our predictors are at level 2 we can perform hierarchical centering as follows: y ijk Binomial(n ijk, p ijk ) logit(p ijk ) = u jk where u jk = β 0 + Xjk T β + v(d) k + u jk u jk N(XT jk β + v k, σ2 u) where vk = β 0 + v (d) k vk N(β 0, σv) 2 This hierarchical centered formulation allows WinBUGS to use conjugate Gibbs sampling steps for all parameters apart from u jk. 29
30 Adding cell-level predictor variables The parameter estimates for the model are given below: Parameter MCMC estimate (95% CI) β 0 Intercept (-0.789, ) β 1 Female (1.393, 1.481) β 2 Caste (0.646, 0.746) β 3 Tribe (0.954, 1.033) β 4 Urban (-1.084, ) β 5 Caste Urban (0.285, 0.414) β 6 Female Urban (-0.402, ) σv 2 District level variance (0.407, 0.539) σu 2 Over-dispersion variance (0.301, 0.328) 30
31 VPC estimates Cell Type Illiteracy VPC Method A VPC Method B Probability (MCSE) Male/Other/Rural % 8.05% (0.03%) Male/Other/Urban % 5.89% (0.03%) Female/Other/Rural % 8.03% (0.04%) Female/Other/Urban % 8.13% (0.03%) Male/Caste/Rural % 8.57% (0.03%) Male/Caste/Urban % 8.06% (0.03%) Female/Caste/Rural % 6.69% (0.04%) Female/Caste/Urban % 8.41% (0.04%) Male/Tribe/Rural % 8.49% (0.03%) Male/Tribe/Urban % 7.99% (0.03%) Female/Tribe/Rural % 5.97% (0.04%) Female/Tribe/Urban % 8.45% (0.04%) Note that method C is independent of the predictors and gives the VPC estimate of 11.47% for all cell types. 31
32 Adding the state level We have thus far ignored the state level. When we add in this level we will get two VPC estimates, one for district and one for state. We will here simply consider a model with no predictor variables as shown below: v (s) l y ijkl Binomial(n ijkl, p ijkl ) logit(p ijkl ) = β 0 + v (s) l + v (d) kl + u jkl, N(0, σ 2 s), v (d) kl N(0, σ 2 d ), u jkl N(0, σ 2 u) Again we can use hierarchical centering to improve the mixing of the MCMC estimation routine. 32
33 Adding the state level The estimates for this model are: Parameter MCMC Estimate (95% CI) β 0 Intercept (-0.654, ) σ 2 s State level variance (0.246, 0.811) σd 2 District level variance (0.022, 0.066) σ 2 u Over-dispersion variance (1.290, 1.402) From our initial inspection we can see that the variability between the 29 states is far greater than the variability between the districts within the states. 33
34 VPC estimates and further models The VPC estimates are given below: Method State VPC District VPC Estimate Estimate A. Linearisation 7.59% 0.70% B. Simulation 6.19% (0.08%) 0.59% (0.001%) C. Latent Variable 8.87% 0.81% Browne et al. (2005) also consider models that include both the state level and predictor variables along with heteroskedasticity at the cell level. 34
35 Interval estimation for the VPC An advantage of using an MCMC algorithm to get parameter estimates for our random effects logistic regression model is that we get a sample of estimates from the posterior distribution of the parameters. We can therefore, for each iteration work out the VPC using whichever method we prefer and hence get a sample of VPC estimates from which we can produce interval estimates. In our first example with no predictors and just district level variation we have 95% credible intervals of (5.64%,7.82%) for method A and (6.45%,8.92%) for method C. Note method B is too computationally burdensome here. 35
36 Further work - 1 Looking at the form of the VPC: The following shows curves representing the VPC plotted against X ij β for various values of σ 2 u. 36
37 Further work - 2 We can try and approximate these curves with Normal, t or power family distributions. Similarly for 3 levels we see below an example with σ 2 u = σ 2 v =
38 Further work - 3 Andrew Gelman and Iain Pardoe have been looking at expanding the R 2 concept to multilevel models. Here they consider there to be an R 2 for each level where a level is a set of random effects. This means in a random slopes model both the intercepts and slopes are levels and have an R 2 associated with them. Here slopes and intercepts are assumed independent. The R 2 values then explain the percentage of variance explained by fixed predictors for each level. It would be interesting to to compare and/or combine their idea with the VPC concept. 38
39 References Goldstein, H., Browne, W.J., and Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics 1, Browne, W.J., Subramanian, S.V., Jones, K. and Goldstein, H. (2005). Variance partitioning in multilevel logistic models that exhibit over-dispersion. JRSS A to appear Browne, W.J. (1998). Applying MCMC Methods to Multilevel Models. PhD dissertation, Dept. of Maths, University of Bath. Gelfand, A.E., Sahu, S.K., and Carlin, B.P. (1995). Efficient parameterizations for normal linear mixed models. Biometrika 82, Snijders, T. and Bosker, R. (1999). Multilevel Analysis. Sage Gelman, A and Pardoe, I (2004). Bayesian measures of explained variance and pooling in multilevel (hierarchical) models. Unpublished Technical Report 39
Applying MCMC Methods to Multi-level Models submitted by William J Browne for the degree of PhD of the University of Bath 1998 COPYRIGHT Attention is drawn tothefactthatcopyright of this thesis rests with
Parallelization Strategies for Multicore Data Analysis
Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
Power and sample size in multilevel modeling
Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570 1573. Chicester (etc.): Wiley,
Multilevel Modelling of medical data
Statistics in Medicine(00). To appear. Multilevel Modelling of medical data By Harvey Goldstein William Browne And Jon Rasbash Institute of Education, University of London 1 Summary This tutorial presents
CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS
Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships
Bayesian Statistics in One Hour. Patrick Lam
Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical
Introduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
Measures of explained variance and pooling in multilevel models
Measures of explained variance and pooling in multilevel models Andrew Gelman, Columbia University Iain Pardoe, University of Oregon Contact: Iain Pardoe, Charles H. Lundquist College of Business, University
AN ILLUSTRATION OF MULTILEVEL MODELS FOR ORDINAL RESPONSE DATA
AN ILLUSTRATION OF MULTILEVEL MODELS FOR ORDINAL RESPONSE DATA Ann A. The Ohio State University, United States of America [email protected] Variables measured on an ordinal scale may be meaningful
Models for Longitudinal and Clustered Data
Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
The Basic Two-Level Regression Model
2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,
College Readiness LINKING STUDY
College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)
I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
R2MLwiN Using the multilevel modelling software package MLwiN from R
Using the multilevel modelling software package MLwiN from R Richard Parker Zhengzheng Zhang Chris Charlton George Leckie Bill Browne Centre for Multilevel Modelling (CMM) University of Bristol Using the
Imputing Missing Data using SAS
ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Introduction to Data Analysis in Hierarchical Linear Models
Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM
Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group
Introduction to Multilevel Modeling Using HLM 6 By ATS Statistical Consulting Group Multilevel data structure Students nested within schools Children nested within families Respondents nested within interviewers
Introducing the Multilevel Model for Change
Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Multilevel Modeling - A Brief Introduction 2 3 4 5 Introduction In this lecture, we introduce the multilevel model for change.
Multinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
Goodness of fit assessment of item response theory models
Goodness of fit assessment of item response theory models Alberto Maydeu Olivares University of Barcelona Madrid November 1, 014 Outline Introduction Overall goodness of fit testing Two examples Assessing
Illustration (and the use of HLM)
Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will
DURATION ANALYSIS OF FLEET DYNAMICS
DURATION ANALYSIS OF FLEET DYNAMICS Garth Holloway, University of Reading, [email protected] David Tomberlin, NOAA Fisheries, [email protected] ABSTRACT Though long a standard technique
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
The Latent Variable Growth Model In Practice. Individual Development Over Time
The Latent Variable Growth Model In Practice 37 Individual Development Over Time y i = 1 i = 2 i = 3 t = 1 t = 2 t = 3 t = 4 ε 1 ε 2 ε 3 ε 4 y 1 y 2 y 3 y 4 x η 0 η 1 (1) y ti = η 0i + η 1i x t + ε ti
Regression Analysis: A Complete Example
Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty
Elements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
A Bayesian hierarchical surrogate outcome model for multiple sclerosis
A Bayesian hierarchical surrogate outcome model for multiple sclerosis 3 rd Annual ASA New Jersey Chapter / Bayer Statistics Workshop David Ohlssen (Novartis), Luca Pozzi and Heinz Schmidli (Novartis)
Regression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
Latent Class Regression Part II
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package
A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package William J Browne, Mousa Golalizadeh Lahi* & Richard MA Parker School of Clinical Veterinary
CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES
Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
Towards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
Prediction for Multilevel Models
Prediction for Multilevel Models Sophia Rabe-Hesketh Graduate School of Education & Graduate Group in Biostatistics University of California, Berkeley Institute of Education, University of London Joint
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
Logit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
LOGIT AND PROBIT ANALYSIS
LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 [email protected] In dummy regression variable models, it is assumed implicitly that the dependent variable Y
Validation of Software for Bayesian Models Using Posterior Quantiles
Validation of Software for Bayesian Models Using Posterior Quantiles Samantha R. COOK, Andrew GELMAN, and Donald B. RUBIN This article presents a simulation-based method designed to establish the computational
Sample size and power calculations
CHAPTER 20 Sample size and power calculations 20.1 Choices in the design of data collection Multilevel modeling is typically motivated by features in existing data or the object of study for example, voters
Calculating the Probability of Returning a Loan with Binary Probability Models
Calculating the Probability of Returning a Loan with Binary Probability Models Associate Professor PhD Julian VASILEV (e-mail: [email protected]) Varna University of Economics, Bulgaria ABSTRACT The
Missing Data. Katyn & Elena
Missing Data Katyn & Elena What to do with Missing Data Standard is complete case analysis/listwise dele;on ie. Delete cases with missing data so only complete cases are le> Two other popular op;ons: Mul;ple
Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest
Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t
Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE
PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE BY P.D. ENGLAND AND R.J. VERRALL ABSTRACT This paper extends the methods introduced in England & Verrall (00), and shows how predictive
Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling
Module 5: Introduction to Multilevel Modelling SPSS Practicals Chris Charlton 1 Centre for Multilevel Modelling Pre-requisites Modules 1-4 Contents P5.1 Comparing Groups using Multilevel Modelling... 4
Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but
Test Bias As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I
Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting
Logistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
Point and Interval Estimates
Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number
Interpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
11. Time series and dynamic linear models
11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
Electronic Theses and Dissertations UC Riverside
Electronic Theses and Dissertations UC Riverside Peer Reviewed Title: Bayesian and Non-parametric Approaches to Missing Data Analysis Author: Yu, Yao Acceptance Date: 01 Series: UC Riverside Electronic
The 3-Level HLM Model
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 2 Basic Characteristics of the 3-level Model Level-1 Model Level-2 Model Level-3 Model
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected]
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing [email protected] IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Notes on the Negative Binomial Distribution
Notes on the Negative Binomial Distribution John D. Cook October 28, 2009 Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between
Tutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
Multilevel Modeling of Complex Survey Data
Multilevel Modeling of Complex Survey Data Sophia Rabe-Hesketh, University of California, Berkeley and Institute of Education, University of London Joint work with Anders Skrondal, London School of Economics
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models
Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models Clement A Stone Abstract Interest in estimating item response theory (IRT) models using Bayesian methods has grown tremendously
Lecture 5 Three level variance component models
Lecture 5 Three level variance component models Three levels models In three levels models the clusters themselves are nested in superclusters, forming a hierarchical structure. For example, we might have
Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models
Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics
South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
Analyzing Clinical Trial Data via the Bayesian Multiple Logistic Random Effects Model
Analyzing Clinical Trial Data via the Bayesian Multiple Logistic Random Effects Model Bartolucci, A.A 1, Singh, K.P 2 and Bae, S.J 2 1 Dept. of Biostatistics, University of Alabama at Birmingham, Birmingham,
Yew May Martin Maureen Maclachlan Tom Karmel Higher Education Division, Department of Education, Training and Youth Affairs.
How is Australia s Higher Education Performing? An analysis of completion rates of a cohort of Australian Post Graduate Research Students in the 1990s. Yew May Martin Maureen Maclachlan Tom Karmel Higher
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
Multiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
