Lecture 5 Three level variance component models

Similar documents

Overview of Methods for Analyzing Cluster-Correlated Data. Garrett M. Fitzmaurice

Introduction to Longitudinal Data Analysis

Power and sample size in multilevel modeling

Homework 11. Part 1. Name: Score: / null

The 3-Level HLM Model

Designing Adequately Powered Cluster-Randomized Trials using Optimal Design

Using PROC MIXED in Hierarchical Linear Models: Examples from two- and three-level school-effect analysis, and meta-analysis research

HLM software has been one of the leading statistical packages for hierarchical

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

10. Analysis of Longitudinal Studies Repeat-measures analysis

Longitudinal Data Analysis

Prediction for Multilevel Models

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Introduction to Data Analysis in Hierarchical Linear Models

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Elements of statistics (MATH0487-1)

STATISTICAL METHODS FOR ASSESSING AGREEMENT BETWEEN TWO METHODS OF CLINICAL MEASUREMENT

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

The Basic Two-Level Regression Model

Simple linear regression

Getting Correct Results from PROC REG

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA

CLUSTER SAMPLE SIZE CALCULATOR USER MANUAL

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

An introduction to hierarchical linear modeling

Models for Longitudinal and Clustered Data

Statistical Analysis RSCH 665 Eagle Vision Classroom - Blended Course Syllabus

Chapter 23. Inferences for Regression

Latent Class Regression Part II

Analyzing Data from Nonrandomized Group Studies

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:

2009 Mississippi Youth Tobacco Survey. Office of Health Data and Research Office of Tobacco Control Mississippi State Department of Health

Univariate Regression

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

The Comparisons. Grade Levels Comparisons. Focal PSSM K-8. Points PSSM CCSS 9-12 PSSM CCSS. Color Coding Legend. Not Identified in the Grade Band

Homework 8 Solutions

Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models

Illustration (and the use of HLM)

Geostatistics Exploratory Analysis

Εισαγωγή στην πολυεπίπεδη μοντελοποίηση δεδομένων με το HLM. Βασίλης Παυλόπουλος Τμήμα Ψυχολογίας, Πανεπιστήμιο Αθηνών

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Introducing the Multilevel Model for Change

Data analysis process

Chapter 1. Longitudinal Data Analysis. 1.1 Introduction

Handling attrition and non-response in longitudinal data

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Analysis of Correlated Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Statistical Rules of Thumb

Approaches for Analyzing Survey Data: a Discussion

Economic Statistics (ECON2006), Statistics and Research Design in Psychology (PSYC2010), Survey Design and Analysis (SOCI2007)

Electronic Thesis and Dissertations UCLA

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Chapter 7 Factor Analysis SPSS

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Simple Linear Regression Inference

Statistical Models in R

Technical Report. Teach for America Teachers Contribution to Student Achievement in Louisiana in Grades 4-9: to

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Assignments Analysis of Longitudinal data: a multilevel approach

Fixed-Effect Versus Random-Effects Models

Statistics 151 Practice Midterm 1 Mike Kowalski

COMMON CORE STATE STANDARDS FOR

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Teaching Multivariate Analysis to Business-Major Students

Week 1. Exploratory Data Analysis

College Readiness LINKING STUDY

ISSN: Journal of Educational and Management Studies. J. Educ. Manage. Stud., 5(3): ; Sep 30, 2015

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from

Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

The primary goal of this thesis was to understand how the spatial dependence of

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

Systematic Reviews and Meta-analyses

Section 3 Part 1. Relationships between two numerical variables

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS

Project Database quality of nursing (Quali-NURS)

A Bayesian hierarchical surrogate outcome model for multiple sclerosis

Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination

Longitudinal Data Analysis. Wiley Series in Probability and Statistics

Portfolio Performance Measures

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Transcription:

Lecture 5 Three level variance component models

Three levels models In three levels models the clusters themselves are nested in superclusters, forming a hierarchical structure. For example, we might have repeated measurement occasions (units) for patients (clusters) who are clustered in hospitals (superclusters).

Which method is best for measuring respiratory flow? Peak respiratory flow (PEFR) is measured by two methods, the standard Wright peak flow and the Mini Wright meter, each on two occasions on 17 subjects.

Table 1.1: Peak respiratory flow rate measured on two occasions using both the Wright and the Mini Wright meter ( Bland and Altma, Lancet 1986) Level 1: occasion (i) Level 2: method (j) Level 3: individual (k)

Model 1: two-level We fitted a two-level model to all 4 measurements ignoring the fact the different methods were used Occasion i, method j, subject k y ijk = β 1 + ζ k (3) + ε ijk ε ijk ~ N(0,σ 2 ) ζ k (3) ~ N(0,τ 2 ) Variance of the measurements within subjects Variance of the measurements across subjects Here we made no distinction between the two methods

Model 2: two-level Occasion i, method j, subject k y ijk = β 1 + β 2 x j + ζ k (3) + ε ijk ε ijk ~ N(0,σ 2 ) ζ k (3) ~ N(0,τ 2 ) Here we might add a binary variable for estimating the methods effect - this variable allows for a systematic difference between the 2 methods

Intraclass correlation coefficient τ 2 τ 2 + σ 2 = 109.2 2 109.2 2 + 23.8 2 = 0.95 Correlation between the 4 repeated measures on the same individual (the method used for the measurement Is ignored) The % of the total variance of the measurements (within + between) that is explained by the variance of the measurem individuals

Why we need three stage? occasion (i), method(j), individual (k) Both two-level variance component models assume that the four measurements using the two methods, were all mutually independent, conditional on the random intercept (that is, they ignore the possibility that the measurements obtained with the same method might be more similar to each other than the measurements obtained with two different methods). In other words the measurements are nested within the method To see if this appears reasonable, we can plot all four measurements against subject id

Fig 7.2: Scatterplot of peak-respiratory flow measured by two methods versus subject id The shift between the measurements taken from the 2 methods varies across subjects measurements on the same subjects are more similar than measurements on different subjects For a given subject, the measurements using the same method tend to resamble each other more than measurements using the other method

Why we need three-level models? As expected, measurements on the same subjects are more similar than measurements on different subjects. This between subject heterogeneity is modeled by the subject-level intercept ς (3) k.

Why we need three-level models The figure suggests that for a given subject, the measurements using the same method tend to be more similar to each other, violating the conditional independence assumption of model (1) The difference between methods is not due to some constant shift of the measurements using one method relative to the other, but due to shifts that vary between subjects, thus violating the assumption in model (2)

Model 3: three-level variance component models y = β + ζ (2) + ζ (3) + ε ijk 1 jk k ijk ε ijk ~ N(0,σ 2 ) ζ (2) ~ N(0,τ 2 ) jk 2 ζ k (3) ~ N(0,τ 3 2 ) account for between-method within-subject heterogeneity Variance of the measurements across the two methods for the same subject Variance of the measurements across subjects

Parameters interpretations y = β + ζ (2) + ζ (3) + ε ijk 1 jk k ijk β 1 Population average of all measurements (across occasions, methods, and subjects) β 1 + ζ k (3) β + ζ (2) + ζ (3) 1 jk k Average of the measurements for subject k (across occasions and methods) Average of the measurements for method j and for subject k (across occasions)

Model 4: three-level variance component models y = β + β x + ζ (2) + ζ (3) + ε ijk 1 2 j jk k ijk ε ijk ~ N(0,σ 2 ) ζ (2) ~ N(0,τ 2 ) jk 2 ζ k (3) ~ N(0,τ 3 2 )

Different types of intraclass correlation ρ(subject) = cor(y ijk, y i' j'k x j,x j' ) = = τ 3 2 τ 2 2 + τ 3 2 + σ 2 correlation between the 4 measurements within the subject (same subject, different method, and different occasion) ρ(method,subject) = cor(y ijk,y i' jk x j ) = = τ 2 2 + τ 3 2 τ 2 2 + τ 3 2 + σ 2 correlation between the measurements obtained with the same method and for the same subject (same subject, same method, different occasions)

Intraclass correlations Note that cor(method,subject)> cor(subject). This makes sense since, as we saw in Figure 7.2, measurements using the same method are more similar than measurements using different methods for the same person.

Three-stage formulation Random effect Fixed effect y ijk = η jk + β 2 x j + ε ijk η = π + ζ (2) jk k jk π = β + ζ (3) k 1 jk Stage 1 Stage 2 Stage 3 y = β + β x + ζ (2) + ζ (3) + ε ijk 1 2 j jk jk ijk

Television school and family smoking cessation project (TVSFP) The TVSFP is a study designed to determine the efficacy of a schoolbased smoking prevention program in conjunction with a television-based prevention program, in terms of preventing smoking onset and increasing smoking cessation (Flay et al 1995)

TVSFP: outcome Outcome: a tobacco and health knowledge scale (THKS) assessing the student s knowledge of tobacco and health Linear model for THKS postintervention, with THKS pre-intervention as a covariate

TVSFP: study design 2x2 factorial design, with four intervention conditions determined by cross-classification of a school-based social resistant curriculum (CC: coded as 0 or 1) with a television-based program (TV, coded as 0 or 1) Randomization to one of the four intervention conditions was at the school level Intervention was delivered at the classroom level 1600 seventh-grades students from 135 classes in 28 schools in Los Angeles

Three-level model for the TVSFP i (student), j (classroom), k (school) postthks Y ijk = β 1 + β 2 prethks + β 3 CC + β 4 TV + β 5 (CC TV ) + +b (3) k + b (2) jk + ε ijk ε ijk ~ N(0,σ 1 2 ) b (2) jk ~ N(0,σ 2 2 ) b k (3) ~ N(0,σ 3 2 ) Within classroom, across students Within school, across classrooms Across schools

Intraclass correlation coefficients Correlation among THKS scores for classmates (or children within the same class and same school) is 0.061 σ 3 2 + σ 2 2 σ 3 2 + σ 2 2 + σ 1 2 = 0.039 + 0.065 0.039 + 0.065 +1.602

Intraclass correlation coefficients Correlation among THKS scores for children for different classrooms within the same school is 0.023 σ 3 2 σ 3 2 + σ 2 2 + σ 1 2 = 0.039 0.039 + 0.065 +1.602

Should we ignore the intraclass correlation? The intraclass correlation coefficients were relatively small at both the school and at the classroom levels. We might be tempted to think that the clustering of the data would not affect the intervention effects Such conclusion would be erroneous Although the intraclass correlations are small, they have substantial impact on the inferences

Linear model for the TVSFP without random effects i (student), j (classroom), k (school) postthks Y ijk = β 1 + β 2 prethks + β 3 CC + β 4 TV + β 5 (CC TV ) + ε ijk ε ijk ~ N(0,σ 1 2 ) This model ignores clustering in the data at a classroom and school levels. This is a standard linear regression model and assumes that the responses are independent

Comparing results Model-based standard errors (assuming no clustering) and misleading small for the randomized intervention effects and lead to substantially different conclusions Bottom line: even a very modest intracluster correlation can have a discernable impact on the inferences