Structural Equations Modeling Part 1: Confirmatory Factor Analysis. Pekka Malo 30E00500 Quantitative Empirical Research Spring 2016


Agenda
- Basic concepts
- Confirmatory Factor Analysis (CFA)
- Practical guidelines
- Tutorial on SPSS Amos and CFA

Section 1: What is Structural Equations Modeling?

What is SEM?
Structural equation modeling (SEM) is a collection of statistical techniques that allow a set of relationships between one or more independent variables (IVs), either continuous or discrete, and one or more dependent variables (DVs), either continuous or discrete, to be examined. (~ a series of multiple regression equations)

Why use SEM?
1. Estimation of several interrelated relationships
2. Ability to represent unobserved (latent) concepts and correct for measurement error
3. Defines a model to explain an entire set of relationships

What is a latent construct?
- Represents theoretical concepts, which cannot be observed directly
- Similar to the factors discussed in Exploratory Factor Analysis
- Needs to be measured indirectly using multiple measured variables (a.k.a. indicator or manifest variables)

Exogenous vs. Endogenous Constructs
- Exogenous construct ~ latent, multi-item equivalent of an independent variable
  - A variate (linear combination) of measures is used to represent a construct
  - Multiple measured variables represent the exogenous constructs
- Endogenous construct ~ latent, multi-item equivalent of a dependent variable
  - Theoretically determined by factors within the model
  - Multiple measured variables represent the endogenous constructs

Example: Two latent constructs (Source: Hair et al. (2010))
Diagram: an exogenous construct measured by X1, X2, X3, X4 and an endogenous construct measured by Y1, Y2, Y3, Y4.
- Loadings represent the relationships from constructs to variables, as in factor analysis.
- Path estimates represent the relationships between constructs, as does β in regression analysis.

Visual modeling: Path diagrams
- SEM models are commonly described in visual form using path diagrams, which present relations between constructs and measured variables
- Path diagrams generally consist of two parts:
  - Measurement model: how are the constructs related to measured variables?
  - Structural model: what are the relationships between the constructs?

Types of relationships in SEM
1. Relationship between a construct and a measured variable (diagram: an exogenous construct with indicator X; an endogenous construct with indicator Y)
2. Relationship between a construct and multiple measured variables (diagram: an exogenous construct with indicators X1, X2, X3)
3. Dependence relationship between two constructs, i.e. a structural relationship (diagram: exogenous construct and endogenous construct)
4. Correlational relationship between constructs (diagram: Construct 1 and Construct 2)

Measurement and structural model
(Diagram with the structural model and the measurement model labelled.)

Cause-and-effect relationships
Substantial evidence required:
1. Covariation
2. Sequence
3. Non-spurious covariance
4. Theoretical support

Causal modeling in SEM?

Non-spurious relationships
- Original relationship: Supervisor -> Job satisfaction, estimate 0.50
- Testing for an alternate cause: Working conditions is added to the diagram (values 0.00 and 0.30 on its paths), and the Supervisor -> Job satisfaction estimate remains 0.50

Reliability and measurement error
- A certain degree of measurement error is practically always present
- Reliability = a measure of the degree to which a set of indicators of a latent construct are internally consistent (i.e. the extent to which they measure the same thing)
- Reliability is generally inversely related to measurement error

Improving statistical estimation
- In the previous multivariate techniques, we assumed that measurement error in the variables can be overlooked
- SEM automatically applies a correction for the amount of measurement error and estimates the corrected structural coefficients (i.e. the relationships between constructs)
- Relationship coefficients estimated by SEM tend to be larger than coefficients obtained from multiple regression
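Why error-corrected coefficients tend to be larger can be illustrated with the classical attenuation formula from test theory. This is only an analogy, not the estimator SEM software uses, and the reliabilities and the "true" correlation below are hypothetical numbers chosen for illustration.

```python
import math

def attenuated_correlation(true_corr: float, rel_x: float, rel_y: float) -> float:
    """Correlation expected between two error-laden scores,
    given the correlation of the error-free constructs and the two reliabilities."""
    return true_corr * math.sqrt(rel_x * rel_y)

def disattenuated_correlation(observed_corr: float, rel_x: float, rel_y: float) -> float:
    """Reverse the attenuation: estimate the construct-level correlation."""
    return observed_corr / math.sqrt(rel_x * rel_y)

# Hypothetical values: constructs correlate 0.50, reliabilities are 0.80 and 0.70.
observed = attenuated_correlation(0.50, 0.80, 0.70)
corrected = disattenuated_correlation(observed, 0.80, 0.70)
print(round(observed, 3), round(corrected, 3))
```

With less than perfect reliability the observed correlation is smaller than the construct-level one, which is the direction of the difference the slide describes.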

Strong theoretical basis needed
- No SEM model should be considered without an underlying theory
- Theory is needed for specifying the path diagram:
  - Measurement model
  - Structural model

Modeling strategies
- Confirmatory modeling strategy
  - Specify a single model
  - It either works or it doesn't
- Competing models strategy
  - Multiple alternative specifications
  - The strongest test is to compare models representing different but plausible hypothesized relationships
- Model development strategy
  - A basic model is proposed as a starting point
  - SEM is used to get insights for re-specification
  - The model needs to be verified with an independent sample

SEM and other multivariate techniques
- SEM is most appropriate when the researcher has multiple constructs, each represented by several measured variables
- SEM ~ hybrid of multiple regression, MANOVA and factor analysis
- Opposite of exploratory techniques; everything is theory driven

Example: multiple regression

Section 2: Confirmatory Factor Analysis

Confirmatory Factor Analysis (CFA)
- Similar to EFA in many respects, but with a completely different philosophy
- With CFA, the researcher needs to specify both the number of factors and which variables define each factor

CFA as a tool for evaluating the measurement model
- Specification of the measurement model is a crucial step in SEM (!)
- Commonly CFA is used as a tool to validate the measurement model before specifying and estimating the structural model:
  - Are the constructs unidimensional and valid?
  - How many indicators should be used for each construct?
  - Are the measures able to portray the construct or explain it?

Steps in CFA
1. Define constructs
2. Define measurement model
3. Design the empirical study
4. Estimate and assess validity

Step 1: Defining the constructs
- Operationalization
- Scales from prior research
- Development of new scales
- Pretesting

Step 2: Defining the measurement model
- Are the constructs unidimensional (i.e. no cross-loadings)?
- Is the measurement model congeneric (i.e. no covariance between or within construct error variances)?
- Is there a sufficient number of indicators per construct (i.e. to ensure identification)?

Example: Congeneric model (Source: Hair et al. (2010))
Diagram: two constructs, Compensation and Teamwork, with loadings Lx1 to Lx8 on the measured variables X1 to X8 and error terms e1 to e8.
Each measured variable is related to exactly one construct.

Example: Non-congeneric model (Source: Hair et al. (2010))
Figure 11.2: A Measurement Model with Hypothesized Cross-Loadings and Correlated Error Variance
Diagram: Compensation and Teamwork are correlated (Φ21). Loadings on Compensation: λx1,1, λx2,1, λx3,1, λx4,1 and the cross-loading λx5,1. Loadings on Teamwork: λx5,2, λx6,2, λx7,2, λx8,2 and the cross-loading λx3,2. Error terms δ1 to δ8, with correlated error terms θδ2,1 and θδ7,4.
Not every measured variable is related to exactly one construct, and the errors are not independent.

Items per construct
- Good practice dictates a minimum of 3 indicator variables per construct (4 is preferred)
- Assessment of single-item constructs is problematic (if included, they don't generally stand for latent constructs)
- Rationale for the requirement of 3 indicators:
  - A measurement model with a single construct and only 2 indicators is under-identified (= there are more parameters to estimate than unique variances and covariances: 4 parameters against 3 unique terms)
  - Remember: the number of unique variances and covariances in the observed covariance matrix is the information available; the degrees of freedom are what remains after subtracting the number of estimated parameters

Example: Over-identified construct (Source: Hair et al. (2010))
Diagram: one construct ξ1 with four measured items X1 to X4 (loadings λx1,1 to λx4,1; error terms δ1 to δ4 with variances θδ1,1 to θδ4,4).
Measured items: X1 = Cheerful, X2 = Stimulated, X3 = Lively, X4 = Bright

Symmetric covariance matrix (10 unique variance-covariance terms):

      X1    X2    X3    X4
X1   2.01
X2   1.43  2.01
X3   1.31  1.56  2.24
X4   1.36  1.54  1.57  2.00

Eight paths to estimate:

Item  Loading estimate    Error variance estimate
X1    λx1,1 = 0.78        θδ1,1 = 0.39
X2    λx2,1 = 0.89        θδ2,2 = 0.21
X3    λx3,1 = 0.83        θδ3,3 = 0.31
X4    λx4,1 = 0.87        θδ4,4 = 0.24

Model fit: χ² = 14.9, df = 2, p = .001, CFI = .99
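The counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes the eight estimated paths are the four loadings and four error variances (with the scale of the construct fixed, e.g. its variance set to 1). The closed-form p-value used here holds only for df = 2.

```python
import math

n_items = 4

# Unique variances and covariances in a 4 x 4 symmetric covariance matrix.
unique_terms = n_items * (n_items + 1) // 2

# Assumed parameter count: 4 loadings + 4 error variances.
free_parameters = 4 + 4

df = unique_terms - free_parameters

# For a chi-square distribution with exactly 2 degrees of freedom,
# the upper-tail probability is exp(-x / 2).
chi_square = 14.9
p_value = math.exp(-chi_square / 2)

print(unique_terms, free_parameters, df, round(p_value, 3))
```

The result agrees with the reported df = 2 and p = .001.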

Formative vs. reflective constructs
- Reflective measurement theory:
  - Latent constructs cause the measured variables
  - CFA is based on the reflective approach
  - Errors occur due to the inability to fully explain the variables
- Formative measurement theory:
  - Measured variables cause the construct
  - The error term reflects the inability of the measured variables to fully explain the construct
  - Formative constructs are not latent
  - Formative constructs are interpreted as indices where each indicator is a cause of the construct
  - Have problems in statistical identification?

Formative vs. reflective constructs (cont.)
Practical implications:
- Use of formative constructs requires additional variables or constructs to ensure an over-identified model
- A formative construct should keep all of the items that define it: items should not be dropped because of low loadings (internal consistency and reliability are not so important)
- In the reflective approach, indicators that have low correlations with the other indicators of the same construct should be removed

Step 3: Design the empirical study
- Choice of measurement scales
- Sampling issues
- Model specification and identification issues
- Countering potential estimation problems

Setting the scales for constructs
- The indicator variables for a construct don't all have to be on the same scale
- However, normalization can make interpretation easier
- Before estimating the model, you need to ensure that the scale of each construct is defined:
  - Fix one loading and set its value to 1 (i.e. don't estimate that loading parameter); or
  - Fix the construct variance and set its value to 1
- Check that multiple values are not constrained to 1 for the purpose of defining the scale

Identification of the model
- Degrees of freedom give the amount of mathematical information available to estimate model parameters
- In the case of SEM, this is given by the number of unique variances and covariances minus the number of parameters:

$$df = \frac{p(p+1)}{2} - k$$

where p = number of measured variables and k = number of estimated parameters
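As a minimal sketch of this count, df equals p(p+1)/2 minus k. The example assumes a two-construct congeneric model with 8 indicators in which 8 loadings, 8 error variances and 1 covariance between the constructs are estimated, and both construct variances are fixed to 1. That parameter count is an assumption made for illustration.

```python
def unique_moments(p: int) -> int:
    """Number of unique variances and covariances among p measured variables."""
    return p * (p + 1) // 2

def degrees_of_freedom(p: int, k: int) -> int:
    """Unique variances and covariances minus the k estimated parameters."""
    return unique_moments(p) - k

# Assumed example: 8 indicators; 8 loadings + 8 error variances + 1 construct covariance.
p_variables = 8
k_parameters = 8 + 8 + 1
model_df = degrees_of_freedom(p_variables, k_parameters)
print(unique_moments(p_variables), k_parameters, model_df)
```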

Identification of the model (cont.)
- Order condition: net degrees of freedom must be > 0 for the model fit to be testable
  - Under-identified ~ more parameters than unique covariance and variance terms (df < 0)
  - Just identified ~ df = 0
  - Over-identified ~ df > 0
- Rank condition: each parameter is uniquely defined
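The three cases can be sketched for a single construct with 2, 3 or 4 indicators. The parameter count assumes one loading and one error variance per indicator, with the construct variance fixed to 1. The function names are illustrative only.

```python
def one_factor_df(n_indicators: int) -> int:
    """df for a one-construct model: unique terms minus (loadings + error variances)."""
    unique_terms = n_indicators * (n_indicators + 1) // 2
    free_parameters = 2 * n_indicators
    return unique_terms - free_parameters

def identification_status(df: int) -> str:
    """Classify a model by the order condition only (the rank condition is not checked)."""
    if df < 0:
        return "under-identified"
    if df == 0:
        return "just identified"
    return "over-identified"

for n in (2, 3, 4):
    df_n = one_factor_df(n)
    print(n, df_n, identification_status(df_n))
```

This reproduces the 3-indicator rule: two indicators leave df = -1, three give df = 0, and four give df = 2.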

Recognizing identification problems
- Incorrect indicator specification
  - Not linking an indicator to any construct
  - Linking an indicator to two or more constructs
  - Not creating and linking an error term for each indicator
- Setting the scale of a construct
  - Forgetting to set the scale (either the loading of an indicator or the construct variance)
- Insufficient degrees of freedom
  - Violation of the 3-indicator rule (in particular when the sample is < 200)
  - More indicators are needed, or add constraints to free up degrees of freedom

Recognizing identification problems (cont.)
- Very large standard errors
- Inability to invert the information matrix (no solution found)
- Wildly unreasonable estimates, including negative error variances
- Unstable parameter values

Sample size issues
- Multivariate distribution of data
  - Should have 15 observations for each parameter estimated
- Estimation technique
  - If all assumptions are met, ML already works with a sample of 50
  - In less than ideal conditions, the sample should be at least 200
  - Sample sizes in the range of 100-400 are recommended
- Model complexity (# of constructs, parameters, groups)
- Amount of missing data
- Amount of average error variance among the reflective indicators
  - With communalities less than 0.5 (i.e. standardized loadings less than 0.7), large samples are required for a stable solution

Thumb rules on sample size
- Minimum sample of 100: 5 or fewer constructs, each with more than 3 indicator variables, and high communalities (0.6 or higher)
- Minimum sample of 150: 7 or fewer constructs, modest communalities (0.5), and no under-identified constructs (i.e. none with fewer than 3 indicators)
- Minimum sample of 300: 7 or fewer constructs, low communalities (below 0.45), and multiple under-identified constructs
- Minimum sample of 500: models with a large number of constructs, some with lower communalities, and/or having fewer than 3 indicators

Step 4: Examination of model validity
- Are the constructs valid?
- Is the model fit acceptable?
- Diagnostics?

Construct validity
- SEM can be used to evaluate the validity of constructs (i.e. to what extent do the measured items reflect the theoretical latent construct?)
- Aspects of construct validity:
  - Convergent validity: loadings, variance extracted, reliability
  - Discriminant validity
  - Nomological validity
  - Face validity

Convergent validity
- Indicators of a specific construct should converge or share a high proportion of variance in common
- Statistics for convergent validity:
  - Loadings
  - Average variance extracted
  - Reliability

Statistics for convergent validity
- Standardized factor loadings and squared factor loadings
  - High loadings indicate convergence
  - Should be statistically significant
- AVE = average variance extracted:

$$AVE = \frac{\sum_{i=1}^{n} \lambda_i^2}{n}$$

where λ_i is the standardized factor loading of indicator i and n is the number of indicators of the construct; the squared standardized factor loadings indicate the amount of variation in the indicator that can be explained by the factor
- AVE > 0.5 => adequate convergence
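A minimal sketch of the AVE calculation, taking AVE as the mean of the squared standardized loadings. The loadings are the four estimates from the over-identified construct example earlier in the deck, assumed here to be standardized.

```python
def average_variance_extracted(loadings: list[float]) -> float:
    """Mean of the squared standardized loadings of one construct."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)

# Loading estimates for Cheerful, Stimulated, Lively, Bright.
example_loadings = [0.78, 0.89, 0.83, 0.87]
ave = average_variance_extracted(example_loadings)
print(round(ave, 3))  # 0.712, above the 0.5 guideline
```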

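Statistics for convergent validity (cont.)
- Construct reliability (CR):

$$CR = \frac{\left(\sum_{i=1}^{n} \lambda_i\right)^2}{\left(\sum_{i=1}^{n} \lambda_i\right)^2 + \sum_{i=1}^{n} V(e_i)}$$

where λ_i is the standardized factor loading of indicator i and V(e_i) = error variance in variable i
- Should be > 0.7 to warrant good reliability
- High construct reliability indicates internal consistency, i.e. all measures represent the same construct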
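Construct reliability can be sketched in the same way, as the squared sum of loadings divided by that squared sum plus the summed error variances. The inputs are the loading and error variance estimates from the over-identified construct example, assumed here to be standardized.

```python
def construct_reliability(loadings: list[float], error_variances: list[float]) -> float:
    """Squared sum of loadings over (squared sum of loadings + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))

cr_loadings = [0.78, 0.89, 0.83, 0.87]
cr_error_variances = [0.39, 0.21, 0.31, 0.24]
cr = construct_reliability(cr_loadings, cr_error_variances)
print(round(cr, 3))  # 0.908, above the 0.7 guideline
```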

Guidelines for evaluating convergent and discriminant validity
- Estimated loadings should be 0.5 or higher
- AVE should be 0.5 or higher to support convergent validity
- AVE estimates for two factors should be greater than the square of the correlation between the two factors to provide evidence of discriminant validity
- Construct reliability should be 0.7 or higher to suggest convergence and internal consistency
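The discriminant validity guideline (often referred to as the Fornell-Larcker criterion) compares each construct's AVE with the squared correlation between the two constructs. The sketch below uses hypothetical AVE and correlation values. The function name is illustrative only.

```python
def discriminant_validity_supported(ave_a: float, ave_b: float, correlation: float) -> bool:
    """True when both AVE values exceed the squared inter-construct correlation."""
    shared_variance = correlation ** 2
    return ave_a > shared_variance and ave_b > shared_variance

# Hypothetical constructs with AVE 0.71 and 0.55.
supported_low_corr = discriminant_validity_supported(0.71, 0.55, 0.60)   # 0.36 shared
supported_high_corr = discriminant_validity_supported(0.71, 0.55, 0.80)  # 0.64 shared
print(supported_low_corr, supported_high_corr)
```

With a correlation of 0.80 the shared variance (0.64) exceeds the smaller AVE (0.55), so the check fails for that pair.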

Discriminant validity
- Is the construct unique? Does it differ from other constructs?
- Do the individual indicator variables represent only one latent construct?
- Examine correlations between constructs
- Presence of cross-loadings is an indicator of discriminant validity problems

Nomological and face validity
- Face validity ~ looks like it will work
  - Needs to be established before the experiment
  - Ensure understanding of every indicator's content and meaning
- Nomological validity ~ does the construct behave as it should with respect to other constructs?
  - Theoretical propositions, e.g. as age increases, memory loss increases
  - Check whether the correlations between constructs make sense!

Assessment of model validity
- Goodness-of-fit: does the estimated (model-implied) covariance matrix match the observed covariance structure?
  - Absolute goodness-of-fit
  - Incremental goodness-of-fit
  - Parsimonious fit measures
- Construct validity

Chi-square test
- The null hypothesis states that the difference between the sample covariance matrix and the estimated covariance matrix is a null (zero) matrix
- Failing to reject the null hypothesis (a non-significant chi-square) indicates that the model fits the data

Problems with the chi-square test
- The chi-square statistic is a function of the sample size N and the difference between the observed and estimated covariance matrices
- As N increases, so does the test statistic, even when the differences between the matrices don't change
- The chi-square statistic also increases as the number of observed variables grows, which makes it more difficult to achieve a fit
- Need for complementary statistics!
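The sample-size dependence can be illustrated under the usual maximum likelihood setup, where the test statistic is (N - 1) times the minimized discrepancy between the observed and estimated covariance matrices. The discrepancy value 0.02 and df = 2 are hypothetical. The p-value helper is a closed form that holds only for even degrees of freedom.

```python
import math

def chi_square_upper_tail(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = statistic / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

DISCREPANCY = 0.02  # hypothetical, held constant while N changes
MODEL_DF = 2        # hypothetical

p_values = {}
for n in (100, 500, 1000):
    statistic = (n - 1) * DISCREPANCY
    p_values[n] = chi_square_upper_tail(statistic, MODEL_DF)
    print(n, round(statistic, 2), round(p_values[n], 4))
```

The same misfit that is not significant at N = 100 is rejected at N = 500 and N = 1000.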

Comparative fit indices
- Based on the idea of comparing nested models on a continuum: saturated --- estimated --- independence
- Bentler-Bonett normed fit index (NFI): compares the estimated model to the independence model
  - High values (> 0.95) indicate good fit
- Bentler's comparative fit index (CFI):
  - High values (> 0.95) indicate good fit

Comparative fit indices (cont.)
- Tucker-Lewis Index (TLI):
  - Conceptually similar to NFI
  - Takes model complexity into account
  - Not normalized, but generally models with good fit have values close to 1
- Relative non-centrality index (RNI)
  - Compares observed fit to that of a null model
  - Higher values represent a better model (> 0.9)
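The comparative indices can be sketched from the chi-square and df of the estimated model and of the independence (null) model, using the commonly cited definitions of NFI, CFI and TLI. The estimated-model values come from the over-identified construct example. The null-model chi-square of 1300 is a hypothetical number, and its df of 6 assumes an independence model for four indicators.

```python
def nfi(chi2_model: float, chi2_null: float) -> float:
    """Normed fit index: proportional improvement over the null model."""
    return (chi2_null - chi2_model) / chi2_null

def cfi(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index based on non-centrality (chi-square minus df)."""
    model_excess = max(chi2_model - df_model, 0.0)
    null_excess = max(chi2_null - df_null, model_excess)
    if null_excess == 0.0:
        return 1.0
    return 1.0 - model_excess / null_excess

def tli(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> float:
    """Tucker-Lewis index: compares chi-square/df ratios, so complexity is penalized."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)

# Estimated model from the slides; null-model chi-square is hypothetical.
fit_nfi = nfi(14.9, 1300.0)
fit_cfi = cfi(14.9, 2, 1300.0, 6)
fit_tli = tli(14.9, 2, 1300.0, 6)
print(round(fit_nfi, 3), round(fit_cfi, 3), round(fit_tli, 3))
```

TLI comes out lowest of the three here because it works with chi-square per degree of freedom, which is how model complexity enters.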

Parsimony fit indices
- Improved either by a better fit or by a simpler model
- Conceptually similar to adjusted R²
- Examples:
  - Adjusted goodness-of-fit index (AGFI)
  - Parsimony normed fit index (PNFI)

Badness-of-fit indices
- Root mean square error of approximation (RMSEA)
  - Quite broadly used
  - Attempts to correct for the tendency of chi-square to reject models with a large sample or a large number of observed variables
  - Lower values imply better fit (< 0.08)
- Root mean square residual (RMR) or standardized RMR (SRMR)
  - Generally, standardized residuals exceeding 4.0 in absolute value should be scrutinized
  - SRMR > 0.1 indicates a problem with fit
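A minimal sketch of RMSEA using one common form of the formula, sqrt(max(χ² - df, 0) / (df (N - 1))); some software divides by N instead of N - 1. The chi-square and df are taken from the over-identified construct example. The sample size of 400 is a hypothetical value, since the slides do not report N.

```python
import math

def rmsea(chi_square: float, df: int, sample_size: int) -> float:
    """Root mean square error of approximation (N - 1 variant)."""
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (sample_size - 1)))

HYPOTHETICAL_N = 400  # not reported in the source
example_rmsea = rmsea(14.9, 2, HYPOTHETICAL_N)
print(round(example_rmsea, 3), example_rmsea < 0.08)
```

Under this assumed N the value exceeds 0.08 even though CFI is high, which shows why several indices are usually reported together.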

Source: Hair et al. (2010)

Thank you!

Very Useful Materials
- http://statwiki.kolobkreations.com/
- Download the helpful Excel tool by J. Gaskin! It is useful during the course!
- There are additionally two plugins, which you can install when using your laptops / home computers (requires admin rights).