Crossing Borders Merging Multilevel and Structural Equation Modeling

Joop Hox, Utrecht University, the Netherlands (j.hox@uu.nl / www.joophox.net)

Structure of Talk
- What is structural equation modeling, and why is it interesting or useful?
- What is multilevel modeling, and why is it interesting or useful?
- Are they indeed merging, and why would that be interesting or useful?

What Is Structural Equation Modeling?
- "Structural equation modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions" (Wikipedia, May 2013)
- SEM can conveniently be viewed as a combination of path analysis and factor analysis

Roots of SEM: Path Analysis
- Path analysis combines a graphical model that specifies relations between variables with estimation techniques to quantify these relations
- Invented by geneticist Sewall Wright in 1921

Contributions by Wright
- Distinction between exogenous and endogenous variables
- Distinction between direct and indirect effects
- Path tracing rules for estimating correlations between variables in the model
- Provided the model is recursive (no reciprocal paths), path coefficients can be estimated by ordinary least squares regression
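Wright's direct/indirect distinction and the path tracing rule can be illustrated with a small sketch. It assumes a hypothetical recursive model A -> B -> C with an additional direct path A -> C, standardized variables, and made-up correlations (none of these values come from the talk). The standardized OLS coefficients are computed from the correlations, and the tracing rule says the correlation between A and C should equal the direct path plus the product of the two indirect paths.

```python
# Hypothetical correlations among A, B and C (illustrative values only).
r_ab, r_ac, r_bc = 0.50, 0.40, 0.60


def standardized_paths(r_ab, r_ac, r_bc):
    """Standardized OLS path coefficients for A -> B, B -> C and A -> C."""
    a = r_ab                                  # B regressed on A
    denom = 1.0 - r_ab ** 2
    c_direct = (r_ac - r_ab * r_bc) / denom   # A -> C, controlling for B
    b = (r_bc - r_ab * r_ac) / denom          # B -> C, controlling for A
    return a, b, c_direct


a, b, c_direct = standardized_paths(r_ab, r_ac, r_bc)
indirect = a * b                    # effect of A on C that runs via B
implied_r_ac = c_direct + indirect  # path tracing: direct + indirect

print(f"direct={c_direct:.3f} indirect={indirect:.3f} implied r_AC={implied_r_ac:.3f}")
```

In this just-identified model the implied correlation reproduces the observed one exactly, which is why the decomposition into direct and indirect effects is exhaustive here.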

Roots of SEM: Factor Analysis
- Factor analysis describes the relations between multiple variables by a smaller set of hypothetical factors or latent variables
- First developed by Charles Spearman in 1904, who suggested that one factor (g) underlies all intelligence test scores
- He also suggested true score theory (related to factor analysis) to deal with measurement error

Roots of SEM: Factor Analysis
- Spearman's one-factor model soon appeared to be overly restrictive, and various models for multiple factors were developed
- Raymond Cattell, Louis Thurstone (1940s)

Roots of SEM: Enter Karl Jöreskog
- Jöreskog's dissertation was on estimation methods in factor analysis (1963)
- He went on to develop good computational methods for maximum likelihood estimation in factor analysis (1967)
- And methods to impose restrictions on parameters (constraints), which led to confirmatory factor analysis (1969)
- Multigroup factor analysis (1971)
- LISREL (1973), which combines path and factor analysis

The LISREL Model (Path Diagram)
[Figure: path diagram of the LISREL model]

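The LISREL Model (Equations)
$y = \Lambda_y \eta + \varepsilon$
$x = \Lambda_x \xi + \delta$
$\eta = \mathrm{B}\eta + \Gamma\xi + \zeta$
- $y$, $x$ = observed variables
- $\eta$, $\xi$ = latent variables (factors)
- $\varepsilon$, $\delta$ = measurement errors
- $\zeta$ = prediction errors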

SEM: Variations
- LISREL uses the complete Jöreskog-Keesling-Wiley model
- Actually, only the $\eta$-$y$ part is needed (Amos, EQS, Mplus)

SEM: Extensions
- Classical SEM: analyzing the covariance matrix implies multivariate normal variables
- Modern SEM can model dichotomous and ordinal variables
- Modern SEM can deal with incomplete data by basing estimation on partially observed cases
- Latent variables can be categorical (latent class)

Why is SEM Interesting/Useful?
- Analyzing measurement models
- Including measurement models
- Investigating construct validity
- Estimating indirect paths (mediation)
- Cross-cultural comparisons
- Longitudinal data

Analyzing Measurement Models
- Spearman's 1904 article on factor analysis starts with measurement: discrepancies must inevitably arise from errors in measuring (p. 254)
- His discussion of true correlations comes close to true score theory (classical reliability theory)
- SEM can be used to estimate a variety of measurement models

SEM and Classical Reliability
- Reliability coefficient alpha assumes a constrained confirmatory factor analysis (CFA) model
- All loadings equal (essential tau equivalence)
- No correlated errors

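Estimating Coefficient Alpha in SEM
- Reliability is defined as the squared correlation between the true score and the sum score
- Actually we need some tricks to run this model (Raykov, 1997)
[Figure: path diagram with one factor (ksi), equal loadings c on items x1-x3 with errors d1-d3, and a sum score formed with unit weights; the squared correlation between ksi and the sum score is alpha]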

SEM and Classical Reliability
- Reliability coefficient omega assumes a general confirmatory factor analysis (CFA) model
- Loadings may differ: congeneric test (unidimensional)
- No correlated errors

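Estimating Coefficient Omega in SEM
- Reliability is defined as the squared correlation between the true score and the sum score
- Actually we need some tricks to run this model (Raykov, 1997)
[Figure: path diagram as for alpha, but with freely estimated loadings; the squared correlation between the factor and the sum score is omega]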

Alpha versus Omega
- The CFA model underlying alpha is more restrictive
- The CFA model underlying omega is more complex
- Using SEM, the difference can be tested
- Alpha can be calculated easily
- Omega requires factor analysis, which is an advantage!

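Calculating Coefficient Omega
- Fit a single-factor model with the factor variance fixed to 1
- If the fit is good (here is the advantage!), then: sum the loadings, square the sum, and divide by the total variance
$\omega = \dfrac{(\sum \lambda)^2}{\sigma_x^2} = \dfrac{(\sum \lambda)^2}{(\sum \lambda)^2 + \sum \sigma_d^2}$
- Program FACTOR by Lorenzo-Seva & Ferrando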
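The calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FACTOR program: the loadings and error variances are hypothetical values for a single-factor model with factor variance 1 and uncorrelated errors, and alpha is computed from the model-implied covariance matrix for comparison.

```python
def omega(loadings, error_variances):
    """Coefficient omega: squared sum of loadings over total sum-score variance."""
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(error_variances))


def implied_covariance(loadings, error_variances):
    """Model-implied covariance matrix of a one-factor model (factor variance 1)."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (error_variances[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def alpha(cov):
    """Coefficient alpha computed from a covariance matrix."""
    k = len(cov)
    total = sum(sum(row) for row in cov)
    diagonal = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - diagonal / total)


# Hypothetical tau-equivalent items: equal loadings, unequal error variances.
tau_load, tau_err = [0.7, 0.7, 0.7], [0.51, 0.40, 0.60]
# Hypothetical congeneric items: loadings differ.
con_load, con_err = [0.9, 0.6, 0.3], [0.19, 0.64, 0.91]

alpha_tau = alpha(implied_covariance(tau_load, tau_err))
omega_tau = omega(tau_load, tau_err)
alpha_con = alpha(implied_covariance(con_load, con_err))
omega_con = omega(con_load, con_err)
print(alpha_tau, omega_tau, alpha_con, omega_con)
```

With equal loadings the two coefficients coincide; with unequal loadings alpha falls below omega, which is the restriction that SEM allows one to test.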

Including Measurement Models
- When constructs are measured with considerable measurement error, the question arises: what is the real correlation?
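As a rough illustration of the question, the classical correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities. This is the textbook formula, used here as a stand-in for what a SEM with measurement models estimates directly as a latent correlation; the function name and the numbers are hypothetical.

```python
import math


def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation of an observed correlation."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: observed r = 0.30, reliabilities 0.64 and 0.81.
r_true = disattenuated_correlation(0.30, 0.64, 0.81)
print(round(r_true, 3))
```

The corrected value is always at least as large in absolute size as the observed correlation, because reliabilities cannot exceed 1.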

Investigating Construct Validity
- Self-concept is a construct that is hypothesized to contain four subdimensions that are correlated and measured by specific items

Investigating Construct Validity
- Several traits (= constructs) are measured using different methods
- The result is a Multi-Trait Multi-Method (MTMM) correlation matrix
- Modeled using SEM

Estimating Indirect Paths (Mediation)
- Distinction between direct and indirect effects (Sewall Wright)
- In the typical mediation model there is a direct effect of A on C and an indirect effect of A on C (via B)
- B is an intervening variable (mediator)

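Testing Mediation
- The mediation effect can be tested directly using the Sobel test
- Sobel's test is built into Mplus and lavaan:
$Z = \dfrac{AB}{\sqrt{A^2 se_B^2 + B^2 se_A^2}}$
  where $A$ and $B$ are the coefficients of the two paths that make up the indirect effect, and $se_A$ and $se_B$ are their standard errors
- Better SEM methods exist, see MacKinnon (2008)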
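A minimal sketch of the Sobel test, assuming the two path coefficients and their standard errors have already been estimated. The function name and input values are hypothetical, and the two-sided p-value relies on the normal approximation, which is one reason better methods are recommended.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided normal p-value for the indirect effect a*b."""
    z = (a * b) / math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of N(0, 1)
    return z, p


# Hypothetical estimates for the two paths of the indirect effect.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```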

Cross-Cultural Comparisons
- For cross-cultural comparisons we must prove that our measurement instruments are the same cross-culturally (measurement equivalence)
- Configural equivalence: same structure
- Metric equivalence: same loadings
- Scalar equivalence: same loadings, same intercepts
- Assessed by multigroup SEM
- New developments: partial equivalence, approximate equivalence

Longitudinal Data
- SEM provides theoretically interesting models for change over time
- Cross-lagged panel model
- Growth curve model

Multilevel Modeling

What Is Multilevel Modeling?
- Multilevel modeling is a statistical technique for analyzing data that have a hierarchical structure (often clusters or groups)
[Figure: three-level data structure]
- Groups at different levels may have different sizes
- Response (outcome) variable at the lowest level

What Makes Hierarchical Data Special?
- Observations in the same group are generally not independent
- They tend to be more similar than observations from different groups
- Causes: selection, shared history, contextual group effects
- The degree of similarity is indicated by the intraclass correlation $\rho$
- Standard statistical tests are not at all robust against violation of the independence assumption
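The intraclass correlation can be estimated in several ways; the sketch below uses the one-way ANOVA estimator for balanced groups, which is a simple choice and not necessarily the estimator used in the talk. The design effect 1 + (n - 1) * rho is added as a standard rule of thumb for how much clustering inflates sampling variance. Function names and data are hypothetical.

```python
def icc_anova(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups)."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (k * (n - 1))
    # The estimate can be negative when groups differ less than chance predicts.
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


def design_effect(group_size, icc):
    """Variance inflation of a mean under cluster sampling."""
    return 1 + (group_size - 1) * icc


identical_within = icc_anova([[1, 1, 1], [3, 3, 3]])   # all variance between groups
identical_between = icc_anova([[1, 2, 3], [1, 2, 3]])  # no variance between groups
deff = design_effect(20, 0.10)
print(identical_within, identical_between, deff)
```

Even a modest intraclass correlation inflates the variance of a mean considerably once groups are large, which is why standard errors from single-level tests are too small.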

Roots of Multilevel Modeling (MLM)
- 1950s-1970s: problems of single-level analysis, cross-level inference, ecological fallacy
- Various analysis tricks: aggregation, disaggregation, group dummies, two-step analysis (slopes as outcomes)
- No statistical model

Multilevel analysis ± 1970s
- Proper level, (dis)aggregation, multiple regression tricks, two-step procedure
- No mention of: random coefficient, hierarchical model, statistical model

Multilevel analysis ± 1990s
- Random coefficients, estimation methods, more than 2 levels, nonlinear models, distributional assumptions, special models

Multilevel analysis ± 2010
- Extensions to other models, multilevel structural equation modeling, Bayesian approaches
- More powerful specialized software; multilevel analysis available in standard software packages (SPSS, SAS, Stata)

Graphical Picture of Simple Two-level Regression Model
[Figure: two-level path diagram. School level: school size, error. Pupil level: pupil sex, grade, error]
- Outcome variable at the pupil level
- Explanatory variables at both levels: individual and group
- Residual error at the individual level
- Plus residual error at the school level

Two Essential Points of Multilevel Modeling
[Figure: two-level path diagram. School level: school size, error. Pupil level: pupil sex, grade, error]
- Groups make a difference (1): the average outcome differs across groups
- Groups make a difference (2): the effect of an individual explanatory variable may be different in different groups

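Multilevel Regression Model
- Lowest (individual) level:
$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$
- Second (group) level:
$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11} Z_j + u_{1j}$
- Combined:
$Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}$
  The first four terms form the fixed part; $u_{1j} X_{ij} + u_{0j} + e_{ij}$ is the random part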
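The step from the two-level formulation to the combined equation is plain substitution, which the sketch below checks numerically. All parameter values, group counts, and variances are hypothetical and chosen only for illustration; the code generates data level by level and confirms that the combined fixed-plus-random form gives the same outcome.

```python
import random

rng = random.Random(1)

# Hypothetical fixed effects (gamma coefficients).
g00, g01, g10, g11 = 2.0, 0.5, 1.0, -0.3

max_difference = 0.0
for j in range(5):                       # groups
    z = rng.gauss(0, 1)                  # group-level predictor Z_j
    u0 = rng.gauss(0, 0.5)               # random intercept deviation u_0j
    u1 = rng.gauss(0, 0.2)               # random slope deviation u_1j
    beta0 = g00 + g01 * z + u0           # second-level equation for the intercept
    beta1 = g10 + g11 * z + u1           # second-level equation for the slope
    for i in range(10):                  # individuals within group j
        x = rng.gauss(0, 1)              # individual-level predictor X_ij
        e = rng.gauss(0, 1)              # individual-level residual e_ij
        y_two_level = beta0 + beta1 * x + e
        fixed_part = g00 + g10 * x + g01 * z + g11 * z * x
        random_part = u1 * x + u0 + e
        y_combined = fixed_part + random_part
        max_difference = max(max_difference, abs(y_two_level - y_combined))

print(max_difference)
```

The cross-level interaction term (Z times X) appears only because the slope itself is modeled at the group level.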

Multilevel Regression Analysis: Research Questions
1. Questions with respect to variables at the lowest level: intelligence (IQ) as predictor of school achievement (SA)
$SA_{ij} = \gamma_0 + \gamma_1 IQ_{ij} + u_j + e_{ij}$

Multilevel Regression Analysis: Research Questions
2. Questions with respect to the influence of variables at a higher level on the dependent variable at the lowest level: mean intelligence of a class (MIQ) as predictor of school achievement (SA), controlling for individual IQ
$SA_{ij} = \gamma_0 + \gamma_1 IQ_{ij} + \gamma_2 MIQ_j + u_j + e_{ij}$

Multilevel Regression Analysis: Research Questions
3a. The relation between intelligence and school achievement is not the same in all classes
3b. Questions with respect to the interaction of variables of different levels: mean intelligence of a class as predictor for the slope of intelligence
$SA_{ij} = \gamma_0 + \gamma_1 IQ_{ij} + \gamma_2 MIQ_j + \gamma_3 IQ_{ij} MIQ_j + u_{1j} IQ_{ij} + u_{0j} + e_{ij}$

Nested Data: Individuals within Groups

        class 1                 class 2
        st1  st2  st3  st4      st1  st2  st3  st4
GPA     2.4  2.7  3.1  2.3      2.9  2.2  2.3  2.6

- GPA score based on: different students in different classes
- Why multilevel? To control for dependency, and to answer questions with respect to variables of different levels and their interactions

Nested Data: Repeated Measures within Individuals
- GPA score based on: different students at different time points
- Why multilevel?
  - Balanced data (same time points, no dropout) is not a requirement
  - Varying occasions are possible
  - To answer questions with respect to individual characteristics and development over time

Special Model: Cross-Classifications
- Multilevel data may not be fully hierarchical
- Example: pupils nested in schools and neighborhoods
  - Schools draw pupils from several neighborhoods
  - Neighborhoods may contain several schools
  - There is no perfect nesting either way
- Example: several raters evaluate pupils' performance on a set of different tasks
  - Different pupils get different subsets of tasks (no copying)
  - Different raters for different pupils (logistical reasons)

Special Model: Cross-Classifications
- Example: pupils within schools and neighborhoods
- Pupils are nested in a cross-classification of schools and neighborhoods
$Y_{i(jk)} = \gamma_0 + \gamma_1 S_1 + \gamma_2 N_1 + \gamma_3 P_1 + u_j + u_k + e_{i(jk)}$
- Setup depends on software

Special Model: Meta Analysis
- Meta analysis pools results from many studies on the same hypothesis
- Each study provides one effect size estimate (e.g., a correlation), plus an estimate of its standard error
- Meta analysis methods
  - combine results into one common result
  - estimate between-study variation (heterogeneity)
  - explain between-study variation
- Multilevel regression models can be used
  - Flexibility, more levels; allows, e.g., analysis of multiple outcomes
- Examples: http://www.self.ox.ac.uk/rdimaterials.htm
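The first two tasks (a common result and the between-study variance) can be sketched with the DerSimonian-Laird moment estimator. Note that this is a simpler, classical stand-in and not the multilevel maximum likelihood approach the slide refers to; the function name and the effect sizes are hypothetical.

```python
def pool_random_effects(effects, std_errors):
    """DerSimonian-Laird random-effects pooling of study effect sizes."""
    k = len(effects)
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)   # between-study variance, truncated at 0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = (1.0 / sum(re_weights)) ** 0.5
    return pooled, pooled_se, tau2, q


# Hypothetical studies: identical effects, then clearly heterogeneous effects.
same_pooled, _, same_tau2, _ = pool_random_effects([0.3, 0.3, 0.3], [0.10, 0.20, 0.15])
het_pooled, het_se, het_tau2, het_q = pool_random_effects([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
print(same_pooled, same_tau2, het_pooled, het_tau2)
```

Explaining between-study variation would add study-level predictors, which is where a multilevel regression formulation becomes convenient.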

Structural Equation and Multilevel Modeling: Are They Merging?
- Actually, they do not need to merge, because they are the same model anyway!
- Bauer (2003). Estimating multilevel linear models as structural equation models
- Curran (2003). Have multilevel models been structural equation models all along?
- Mehta & Neale (2005). People are variables too: multilevel structural equations modeling

Why/How are These Models the Same?
- Imagine a multilevel data set with 2 groups and 4 people
- y = DV; x, z = IVs at level 1 and level 2
- Restructure the data to a multivariate data set, one row per group
- Write a complicated SEM model to mimic a multilevel model

Multilevel Modeling in SEM
- Loadings fixed to 1 for the intercept
- Loadings fixed to data values for the sex slope
- Lowest-level variances fixed to be the same: one variance estimated
- Second-level variance $u_0$ is the variance of the intercept
- Second-level variance $u_1$ is the variance of the sex slope

Multilevel Modeling in SEM
- What happens if we have different numbers of boys and girls in each group?
  - Treat it as a missing value problem
  - Provide people variables for M boys and F girls, M and F being the maximum number of boys/girls in the data set
- What if the predictor values are more complex than 0-1?
  - Things get ugly, but it can be done
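The restructuring step can be sketched as follows. The example assumes a hypothetical long-format data set with unequal group sizes and a 0/1 sex predictor; it pads smaller groups with missing values rather than creating separate blocks of boy and girl variables, which is a simplification of the setup described above. The loadings returned per group are the fixed values a SEM would use for the intercept and slope factors.

```python
# Hypothetical long-format data: one row per person.
long_rows = [
    {"group": 1, "y": 5.0, "sex": 0},
    {"group": 1, "y": 6.5, "sex": 1},
    {"group": 1, "y": 5.5, "sex": 1},
    {"group": 2, "y": 4.0, "sex": 0},
    {"group": 2, "y": 7.0, "sex": 1},
]


def to_wide(rows, max_size):
    """One row per group; people become variables, padded with None if missing."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append((row["y"], row["sex"]))
    wide = []
    for group, members in sorted(by_group.items()):
        padded = members + [(None, None)] * (max_size - len(members))
        record = {"group": group}
        for position, (y, sex) in enumerate(padded, start=1):
            record[f"y{position}"] = y
            record[f"sex{position}"] = sex
        wide.append(record)
    return wide


def fixed_loadings(record, max_size):
    """(intercept loading, slope loading) per observed person in one group."""
    return [
        (1, record[f"sex{p}"])            # intercept loading 1, slope loading = data value
        for p in range(1, max_size + 1)
        if record[f"y{p}"] is not None    # skip padded (missing) persons
    ]


wide_rows = to_wide(long_rows, max_size=3)
loadings_group2 = fixed_loadings(wide_rows[1], max_size=3)
print(wide_rows)
print(loadings_group2)
```

Because the slope loadings are data values, they differ from group to group, which is what makes the general case cumbersome in conventional SEM software.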

Multilevel Modeling in SEM
- So, multilevel modeling (including random slopes) and structural equation modeling have been the same all along
- Who cares? Why bother? Why is this interesting or useful?

Multilevel Modeling = Structural Equation Modeling
- Interesting implication #1
- We have 100+ years of experience with SEM: theoretical results, simulations, model extensions
- We have 25+ years of experience with MLM: theoretical results, simulations, model extensions
- If MLM = SEM, then all this applies here

Multilevel Modeling = Structural Equation Modeling
- Interesting implication #2
- Multilevel SEM makes it possible to have multilevel models with latent factors, measurement models, (multilevel) mediation, and so on
  - To estimate indirect effects
  - To include measurement models
  - To build complex models

Why Multilevel SEM?
- Indirect effects: combine predictors and mediators from different levels (multilevel mediation)
  - For example, an intervention done at the group level
- Measurement model: specify a measurement model for different levels
  - Useful to assess measures of group phenomena measured by lower-level observations
  - Family members rate family climate; organizational members rate their organization's work climate and efficiency

Estimation in Multilevel SEM
- Recent development: full information maximum likelihood estimation of multilevel data in SEM
  - Allows non-normal and incomplete data
  - Allows varying slopes
  - But can be numerically very intensive
- Limited availability so far
  - Mplus (only 3 levels)
  - GLLAMM (Stata plugin; unlimited number of levels)

Multilevel SEM path diagrams
[Figure: path diagram for a two-level model, Muthén style (Hox, 2010, p. 310)]

Multilevel CFA Example
- Six intelligence measures of 400 children from 60 families
- Hypothesis: at the family level one general factor; at the child level more differentiation

Multilevel CFA Example
- ICC for observed variables is high (0.38-0.51)
- Model fits very well: χ² = 11.9, df = 17, p = 0.80
- Family size as a predictor at the family level is not significant

What next?
- Expected
  - Further integration of multilevel and structural equation modeling
  - More use of Bayesian estimation
  - Complex models easier to use
  - Better diagnostics
- Needed
  - More information on the behavior of estimates and standard errors with small samples and non-normal distributions
  - Better ways to assess effect sizes and explained variances

Helpful References
- SEM: Byrne (several books); Geiser (Mplus); http://davidakenny.net/cm/causalm.htm; http://www2.gsu.edu/~mkteer/semfaq.html
- MLM: Hox; Snijders & Bosker (handbooks); Centre for Multilevel Modelling (Bristol), http://www.bristol.ac.uk/cmm/; Hox (multiple downloads), www.joophox.net
- Multilevel SEM: not much out there yet

Look Out For
- Utrecht University Summer School courses on MLM (HLM), SEM (Mplus), and surveys: http://www.utrechtsummerschool.nl/
- What we do at the Department of Methodology and Statistics, Utrecht University: Research Master M&S (www.tinyurl.com/mtuumaster)
- 2014 conference of the European Association of Methodology (http://eam2014.fss.uu.nl/)