Introduction to Path Analysis

Similar documents

Applications of Structural Equation Modeling in Social Sciences Research

Indices of Model Fit STRUCTURAL EQUATION MODELING 2013

Presentation Outline. Structural Equation Modeling (SEM) for Dummies. What Is Structural Equation Modeling?

Structural Equation Modelling (SEM)

Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012

Simple Linear Regression Inference

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s

An Introduction to Path Analysis. nach 3

Additional sources Compilation of sources:

The Latent Variable Growth Model In Practice. Individual Development Over Time

Confirmatory factor analysis in MPlus

Evaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive Goodness-of-Fit Measures

Introduction to structural equation modeling using the sem command

lavaan: an R package for structural equation modeling

Latent Class Regression Part II

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

USING MULTIPLE GROUP STRUCTURAL MODEL FOR TESTING DIFFERENCES IN ABSORPTIVE AND INNOVATIVE CAPABILITIES BETWEEN LARGE AND MEDIUM SIZED FIRMS

Elements of statistics (MATH0487-1)

Overview of Factor Analysis

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

SPSS and AMOS. Miss Brenda Lee 2:00p.m. 6:00p.m. 24 th July, 2015 The Open University of Hong Kong

ZHIYONG ZHANG AND LIJUAN WANG

Least Squares Estimation

A Research on Influencing Mechanism of Business Process Outsourcing(BPO) Service Quality

Statistics in Psychosocial Research Lecture 8 Factor Analysis I. Lecturer: Elizabeth Garrett-Mayer

Introduction to Matrix Algebra

Specification of Rasch-based Measures in Structural Equation Modelling (SEM) Thomas Salzberger

5. Linear Regression

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

Econometrics Simple Linear Regression

Moderation. Moderation

SEM Analysis of the Impact of Knowledge Management, Total Quality Management and Innovation on Organizational Performance

Chapter 4: Vector Autoregressive Models

Simple Regression Theory II 2010 Samuel L. Baker

University of Maryland Fraternity & Sorority Life Spring 2015 Academic Report

3. Regression & Exponential Smoothing

Lecture 15. Endogeneity & Instrumental Variable Estimation

Multiple Regression: What Is It?

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Sales forecasting # 2

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Univariate Regression

R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

Module 5: Multiple Regression Analysis

Fitting Subject-specific Curves to Grouped Longitudinal Data

Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016

How To Understand Multivariate Models

Analyzing Structural Equation Models With Missing Data

Goodness of fit assessment of item response theory models

Testing for Granger causality between stock prices and economic growth

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Data Mining: Algorithms and Applications Matrix Math Review

Standard errors of marginal effects in the heteroskedastic probit model

SPSS Guide: Regression Analysis

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Manufacturing Service Quality: An Internal Customer Perspective

Study Guide for the Final Exam

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

HYPOTHESIS TESTING: POWER OF THE TEST

Association Between Variables

Errata and updates for ASM Exam C/Exam 4 Manual (Sixteenth Edition) sorted by page

Introduction to Fixed Effects Methods

UNIVERSITY OF WAIKATO. Hamilton New Zealand

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

T-test & factor analysis

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Survey Research: Choice of Instrument, Sample. Lynda Burton, ScD Johns Hopkins University

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Chapter 6: Multivariate Cointegration Analysis

Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Sample Size Calculation for Longitudinal Studies

The Technology Acceptance Model with Online Learning for the Principals in Elementary Schools and Junior High Schools

13. Poisson Regression Analysis

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Predict the Popularity of YouTube Videos Using Early View Data

Topic 5: Stochastic Growth and Real Business Cycles

Regression Analysis: A Complete Example

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

Multilevel Models for Longitudinal Data. Fiona Steele

Exploratory Factor Analysis Brian Habing - University of South Carolina - October 15, 2003

17. SIMPLE LINEAR REGRESSION II

Introduction to Longitudinal Data Analysis

SYSTEMS OF REGRESSION EQUATIONS

Estimating an ARMA Process

Illustration (and the use of HLM)

VI. Real Business Cycles Models

A Basic Introduction to Missing Data

Transcription:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Introduction to Path Analysis Statistics for Psychosocial Research II: Structural Models Qian-Li Xue

Outline Key components of path analysis Path Diagrams Decomposing covariances and correlations Direct, Indirect, and Total Effects Parameter estimation Model identification Model fit statistics

The origin of SEM Swell Wright, a geneticist in 920s, attempted to solve simultaneous equations to disentangle genetic influences across generations ( path analysis ) Gained popularity in 960, when Blalock, Duncan, and others introduced them to social science (e.g. status attainment processes) The development of general linear models by Joreskog and others in 970s ( LISREL models, i.e. linear structural relations)

Difference between path analysis and structural equation modeling (SEM) Path analysis is a special case of SEM Path analysis contains only observed variables and each variable only has one indicator Path analysis assumes that all variables are measured without error SEM uses latent variables to account for measurement error Path analysis has a more restrictive set of assumptions than SEM (e.g. no correlation between the error terms) Most of the models that you will see in the literature are SEM rather than path analyses

Path Diagram Path diagrams: pictorial representations of associations (Sewell Wright, 920s) Key characteristics: As developed by Wright, refer to models that are linear in the parameters (but they can be nonlinear in the variables) Exogenous variables: their causes lie outside the model Endogenous variables: determined by variables within the model May or may not include latent variables for now, we will focus on models with only manifest (observed) variables, and will introduce latent variables in the next lecture.

Regression Example Standard equation format for a regression equation: Y = α + γ X + γ 2 X 2 + γ 3 X 3 + γ 4 X 4 + γ 5 X 5 + γ 6 X 6 + ζ

Regression Example x x 2 x 3 Y x 4 x 5 x 6

Regression Example x x 2 ζ x 3 Y x 4 x 5 x 6

Regression Example x x 2 ζ x 3 Y x 4 x 5 x 6

Path Diagram: Common Notation x Yrs of work Age ζ 3 ζ 2 Labor activism y 3 y 2 sentiment Union Sentiment Example (McDonald and Clelland, 984) x 2 ζ y Deference to managers Noncausal associations between exogenous variables indicated by two-headed arrows Causal associations represented by unidirectional arrows extended from each determining variable to each variable dependent on it Residual variables are represented by unidirectional arrows leading from the residual variable to the dependent variable

Path Diagram: Common Notation Gamma (γ): Coefficient of association between an exogenous and endogenous variable Beta (β): Coefficient of association between two endogenous variables Zeta (ζ): Error term for endogenous variable Subscript protocol: first number refers to the destination variable, while second number refers to the origination variable. ζ ζ 2 x γ y β 2 y2 x2 γ 2

Path Diagram: Rules and Assumptions Endogenous variables are never connected by curved arrows Residual arrows point at endogenous variables, but not exogenous variables Causes are unitary (except for residuals) Causal relationships are linear

Path Model: Mediation x ζ x 2 Y ζ 2 ζ 4 x 3 Y 2 Y 4 x 4 ζ 3 x 5 Y 3 x 6

Path Diagram Can you depict a path diagram for this system of equations? y = γ x + γ 2 x 2 + β 2 y 2 + ζ y 2 = γ 2 x + γ 22 x 2 + β 2 y + ζ 2

Path Diagram How about this one? y = γ 2 x 2 + β 2 y 2 + ζ y 2 = γ 2 x + γ 22 x 2 + ζ 2 y 3 = γ 3 x + γ 32 x 2 + β 3 y + β 32 y 2 + ζ 3

Wright s Rules for Calculating Total Association Total association: simple correlation between x and y For a proper path diagram, the correlation between any two variables = sum of the compound paths connecting the two variables Wright s Rule for a compound path: ) no loops 2) no going forward then backward 3) a maximum of one curved arrow per path

Example: compound path (Loehlin, page 9, Fig..6) A B A A B C ζ d ζ c C ζ b B C ζ c ζ d ζ e ζ f D E ζ e D ζ d D E F (a) (b) (c)

Total Association: Calculation f A B C a b D ζ d d ζ e E g e h c Loehlin, page 0, Fig..7 ζ f F What is correlation between A and D? What is correlation between A and E? What is correlation between A and F? What is correlation between B and F? a+fb ad+fbd+hc ade+fbde+hce gce+fade+bde

Direct, Indirect, and Total Effects Path analysis distinguishes three types of effects: Direct effects: association of one variable with another net of the indirect paths specified in the model Indirect effects: association of one variable with another mediated through other variables in the model computed as the product of paths linking variables Total effect: direct effect plus indirect effect(s) Note: the decomposition of effects always is modeldependent!!!

Direct, Indirect, and Total Effects Example: Does association of education with adult depression represent the influence of contemporaneous social stressors? E.g., unemployment, divorce, etc. Every longitudinal study shows that education affects depression, but not vice-versa Use data from the National Child Development Survey, which assessed a birth cohort of about 0,000 individuals for depression at age 23, 33, and 43.

Direct, Indirect, and Total Effects: Example dep23 e.47.25 -.29.50 schyrs2 schyrs43 -.08.24 dep33 e2.34 -.0.46 dep43 e3.37

Direct, Indirect, and Total Effects: Example.47 schyrs2 on dep33 : dep23 e Direct effect = -0.08.25 -.29.50 Indirect effect = -0.29*0.50 =-0.45 schyrs43 schyrs2 -.08.24 dep33 e2.34 Total effect = -0.08+(-0.29*0.50 ) -.0 dep43.46 e3.37 What is the indirect effects of schyrs2 on dep43?

Path Analysis Key assumptions of path analysis: E(ζi)=0: mean value of disturbance term is 0 cov(ζ i, ζ j )=0: no autocorrelation between the disturbance terms var(ζ i X i )= σ 2 cov(ζ i,x i )=0

The Fundamental hypothesis for SEM The covariance matrix of the observed variables is a function of a set of parameters (of the model) (Bollen) If the model is correct and parameters are known: Σ=Σ(θ) where Σ is the population covariance matrix of observed variables; Σ(θ) is the model-based covariance matrix written as a function of θ

Decomposition of Covariances and Correlations λ ξ λ 2 λ λ 3 4 X X 2 X 3 X 4 δ δ 2 δ 3 δ 4 and Cov( ξ, δ ) = 0 cov( X = λ λ 4, ϕ whereϕ X 4, = ) = cov( λ var( ξ ) X X X X or in matrix notation : X E( δ ) = 0, Var( ξ ) = Φ, Var( δ ) ξ + δ, λ 2 3 4 = λ = λ = λ = λ 2 3 4 ξ + δ = Λ ξ + δ x 4 ξ + δ ξ + δ 2 3 ξ + δ 4 ξ + δ ) 4 = Θ δ,

Decomposition of Covariances and Correlations λ ξ λ 2 λ λ 3 4 XX = ( Λ )( ) xξ + δ Λ xξ + δ ( )( ) δ δ 2 δ 3 δ 4 = Λ ξ + δ ξ Λ + δ X X 2 X 3 X 4 X = Λ = Λ Σ = ξ + δ ξξ Λ x x x Cov( X ) E( XX ) = Σ = E( XX ) x + Λ = Λ ξδ + δξ Λ x x x ΦΛ x + Θ x δ + δδ Covariance matrix of ξ Covariance matrix of δ

Path Model: Estimation Model hypothesis: Σ = Σ (θ) But, we don t observe Σ, instead, we have sample covariance matrix of the observed variables: S Estimation of Path Models: Choose θ so that Σ (θ) is close to S

Path Model: Estimation ) Solve system of equations γ β 2 X Y Y 2 ζ ζ 2 a) Using covariance algebra Σ Y = γx + ζ Y2 = β2y + ζ2 Σ(θ) var(y ) γ 2 var(x ) + var(ζ ) cov(y 2,Y ) var(y 2 ) = β 2 (γ 2 var(x ) + var(ζ )) β 22 (γ 2 var(x ) + var(ζ )) + var(ζ 2 ) cov(x,y ) cov(x,y 2 ) var(x ) γ var(x ) β 2 γ var(x ) var(x )

Path Model: Estimation ) Solve system of equations γ β 2 X Y Y 2 ζ ζ 2 Y = γx + ζ Y2 = β2y + ζ2 b) Using Wright s rules based on correlation matrix Σ Σ(θ) cor(y 2,Y ) = β 2 cor(x,y ) cor(x,y 2 ) γ β 2 γ

Path Model: Estimation 2) Write a computer program to estimate every single possible combination of parameters possible, and see which fits best (i.e. minimize a discrepancy function of S-Σ(θ) evaluated at θ) 3) Use an iterative procedure (See Figure 2.2 on page 38 of Loehlin s Latent Variable Models)

Model Identification Identification: A model is identified if is theoretically possible to estimate one and only one set of parameters. Three helpful rules for path diagrams: t-rule necessary, but not sufficient rule the number of unknown parameters to be solved for cannot exceed the number of observed variances and covariances to be fitted analogous to needing an equation for each parameter number of parameters to be estimated: variances of exogenous variables, variances of errors for endogenous variables, direct effects, and double-headed arrows number of variances and covariances, computed as: n(n+)/2, where n is the number of observed exogenous and endogenous variables

Model Identification Just identified: # equations = # unknowns Under-identified: # equations < # unknowns Over-identified: # equations > # unknowns d d A a b B A a b B ζ c ζ d C c D ζ d ζ d c D C e E ζ e

Model Fit Statistics Goodness-of-fit tests based on predicted vs. observed covariances:. χ 2 tests d.f.=(# non-redundant components in S) (# unknown parameter in the model) Null hypothesis: lack of significant difference between Σ(θ) and S Sensitive to sample size Sensitive to the assumption of multivariate normality χ 2 tests for difference between NESTED models 2. Root Mean Square Error of Approximation (RMSEA) A population index, insensitive to sample size No specification of baseline model is needed Test a null hypothesis of poor fit Availability of confidence interval <0.0 good, <0.05 very good (Steiger, 989, p.8) 3. Standardized Root Mean Residual (SRMR) Squared root of the mean of the squared standardized residuals SRMR = 0 indicates perfect fit, <.05 good fit, <.08 adequate fit

Model Fit Statistics Goodness-of-fit tests comparing the given model with an alternative model. Comparative Fit Index (CFI; Bentler 989) compares the existing model fit with a null model which assumes uncorrelated variables in the model (i.e. the "independence model") Interpretation: % of the covariation in the data can be explained by the given model CFI ranges from 0 to, with indicating a very good fit; acceptable fit if CFI>0.9 2. The Tucker-Lewis Index (TLI) or Non-Normed Fit Index (NNFI) Relatively independent of sample size (Marsh et al. 988, 996) NNFI >=.95 indicates a good model fit, <0.9 poor fit More about these later