An introduction to Growth Mixture Modeling using Mplus


Jacques Juhel, University Rennes 2, CRPCC, EA 1285 - Nanterre

General latent variable framework
- Implemented in the Mplus 6 program (Muthén and Muthén, 1998-2010).
- Latent growth curve modeling (SEM) is linked to random coefficient growth modeling (multilevel modeling).
- Latent growth curve modeling (single population) is a special case of growth mixture modeling.

Mplus files
Three primary files:
- The data file
- The program file
- The analysis output file

Data files
Mplus has very limited data management capabilities.
- Data files are in ASCII format, in an external file.
- No more than 500 variables in a data set.
- No variable names in the data set; names are declared in the program file.
- The estimator chosen for an analysis determines the type of data required for the analysis.
- Individual data can be in fixed or free format (free is the default).
- Summary data (matrix data) must be in free format.
- Free format requires a comma, space, or tab as delimiter.

Mplus command language
Different commands divided into a series of sections:
- The DATA and VARIABLE commands are required for every analysis.
- Every command must begin on a new line and be followed by a colon.
- Semicolons separate command options.
- Lines in the program file cannot exceed 80 columns.
- User comments are preceded by an exclamation mark (!).
- The keywords IS, ARE and = are interchangeable.
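The rules above can be seen in a minimal input-file skeleton. This is only a sketch: the file name growth.dat and the variable names y1-y4 are hypothetical and do not come from the slides.

```
TITLE:    Minimal input-file skeleton
DATA:     FILE IS growth.dat;        ! hypothetical data file
VARIABLE: NAMES ARE y1 y2 y3 y4;     ! IS, ARE and = are interchangeable
          USEVARIABLES = y1-y4;
ANALYSIS: TYPE = BASIC;              ! descriptive statistics only
OUTPUT:   SAMPSTAT;
```

Each command name starts a new line and ends with a colon, each option ends with a semicolon, and everything after ! on a line is a comment.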

Mplus command language: command sections
- Title (chap. 15): gives an identifying title to the analysis.
- Data (chap. 15): identifies the location and name of the data set to be analyzed.
- Variable (chap. 15): names and describes the variables in the data set to be analyzed.
- Define (chap. 15): provides the ability to transform existing variables and to create new variables.
- Analysis (chap. 16): describes the type of analysis to be performed.
- Model (chap. 17): describes the relationships in the model.
- Output (chap. 18): specifies options to customize the output.
- Savedata (chap. 18): saves the analysis data and/or model results in ASCII files.
- Montecarlo (chap. 18): defines the specifications of a Monte Carlo analysis.

Mplus command language: DATA command
- File: name and location of the data file
- Format: format of the data file
- Type: individual, cov, corr, means, stdeviations
- Nobservations: number of observations (summary data)
- Ngroups: number of groups

Mplus command language: VARIABLE command
- Names: names of the variables in the data set
- Useobservations: selects observations
- Usevariables: variables to be analyzed
- Missing: indicates the missing values for each variable (any numeric value, or a period, asterisk, or blank)
- Categorical: names the categorical dependent variables
- Classes: specifies the number of latent classes in a model and assigns a name to the categorical latent variable
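As a hedged illustration of the DATA and VARIABLE options listed above, the fragment below reads a raw data file and declares a three-class latent variable. The file name, the variable names, the missing-value code -99 and the number of classes are all assumptions made for the sketch.

```
DATA:     FILE IS growth.dat;        ! hypothetical free-format ASCII file
          TYPE IS INDIVIDUAL;        ! raw (individual) data, the default
VARIABLE: NAMES ARE id y1-y4;        ! names live here, not in the data file
          USEVARIABLES ARE y1-y4;    ! id is read but not analyzed
          MISSING ARE ALL (-99);     ! -99 is an assumed missing-value code
          CLASSES = c(3);            ! latent class variable c, 3 classes
ANALYSIS: TYPE = MIXTURE;            ! CLASSES is used with mixture models
```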

Mplus command language: ANALYSIS command
Describes the type of analysis, the statistical estimator, the matrix to be analyzed, and the specifics of the computational algorithms.
- Type = basic; Type = mixture; etc.
- Estimator = ML, MLR, WLSMV, etc. (frequentist and Bayesian estimation; cf. pp. 532-534)

Mplus command language: MODEL command
Describes the specific model to be estimated.
BY: "measured by"
    ksi1 BY y1 y2 y3 y4;
- The factor loadings on the right side of the BY statement are freely estimated, except for the first variable (lambda_1_1 = 1).
- Residual variances are estimated.
- Residual covariances are fixed to zero.

Mplus command language: MODEL command
Describes the specific model to be estimated.
ON: "regressed on"
    y1 ON x1;
    ksi2 ON ksi1;      ! regression of ksi2 on ksi1
    c#1 ON x;          ! logistic regression
    c#1 c#2 ON x;      ! multinomial logistic regression of the categorical
                       ! latent variable c (k = 3) on the covariate x
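For reference, the multinomial logistic regression behind `c#1 c#2 ON x` can be written out as follows. The symbols a_k and b_k are labels introduced here for the class-specific intercepts and slopes; the last class serves as the reference class.

```latex
P(c_i = k \mid x_i) =
  \frac{\exp(a_k + b_k x_i)}{\sum_{j=1}^{K} \exp(a_j + b_j x_i)},
\qquad a_K = b_K = 0 .
```

With K = 3 classes, the statement estimates the two slopes b_1 and b_2, each a log odds of membership in class 1 or 2 relative to class 3 per unit of x.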

Mplus command language: MODEL command
Describes the specific model to be estimated.
WITH: "correlated with"
    x1 WITH x2 x3 x4 x5;
Covariances can be specified among independent variables or among the residuals of dependent variables.

Mplus command language: MODEL command
Fixing and freeing parameters and assigning start values (@ fixes a parameter at a value, * frees it and gives a start value):
    f1 BY y1@1 y2 y3*0.5 y4;
Constraining parameter values to be equal:
    f1 ON x1-x5 (1);
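The BY, ON and WITH statements and the @, * and (1) conventions can be combined in one MODEL command. The fragment below is a sketch; the second factor f2 and the variables y5-y8 are hypothetical additions.

```
MODEL:  f1 BY y1@1 y2 y3*0.5 y4;   ! y1 loading fixed at 1; y3 starts at 0.5
        f2 BY y5-y8;               ! first loading fixed at 1 by default
        f2 ON f1;                  ! regression of f2 on f1
        f1 ON x1-x5 (1);           ! five slopes constrained to be equal
        y2 WITH y3;                ! residual covariance freed
```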

Mplus command language: OUTPUT command
- Standardized: standardized parameter estimates and their standard errors (cf. p. 641).
- Residual: residuals for the observed variables.
- Modindices: modification indices.
- Cinterval: confidence intervals (frequentist and Bayesian).
- Tech1: requests the arrays containing parameter specifications.
- Tech4: estimated means, covariances, and correlations for the latent variables in the model.
- Tech11: with type=mixture, requests the LMR-LRT.
- Tech12: with type=mixture, requests residuals.
- Tech14: with type=mixture, requests a parametric bootstrapped LRT.

Mplus command language: SAVEDATA command
Saves factor scores, posterior probabilities, the most likely class membership for each response pattern, outliers, etc.
    SAVE = FSCORES;
    SAVE = CPROBABILITIES;

Mplus command language: PLOT command
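A hedged sketch of how the OUTPUT, SAVEDATA and PLOT commands might appear together at the end of a mixture analysis. The output file name cprobs.dat and the series y1-y4 are assumptions.

```
OUTPUT:   TECH1 TECH11 TECH14;       ! TECH11/TECH14 need TYPE = MIXTURE
SAVEDATA: FILE IS cprobs.dat;        ! hypothetical output file name
          SAVE = CPROBABILITIES;     ! posterior probabilities, modal class
PLOT:     TYPE = PLOT3;
          SERIES = y1-y4 (*);        ! plot y1-y4 in the order listed
```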

SEM growth model
Individual development over time:
$$y_{ti} = \eta_{0i} + \eta_{1i}\, x_t + \varepsilon_{ti} \tag{1}$$
$$\eta_{0i} = \alpha_0 + \gamma_0 w_i + \zeta_{0i} \tag{2a}$$
$$\eta_{1i} = \alpha_1 + \gamma_1 w_i + \zeta_{1i} \tag{2b}$$

SEM growth model
SEM growth models:
- Time scores are parameters.
- Time-varying covariates have fixed-effect coefficients.
Multilevel and mixed linear growth models:
- Time scores are data.
- Time-varying covariates have random-effect coefficients.
Notation:
- p = # of repeated measurements on the ECLS-K math proficiency test
- q = # of growth factors
- k = # of time-varying predictors
- S = # of time-invariant predictors

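SEM growth model
Measurement portion:
$$y_{ti} = \eta_{0i} + \eta_{1i}\, x_t + \kappa_t\, w_{ti} + \varepsilon_{ti}$$
The time scores $x_t$ are parameters; the slopes $\kappa_t$ for the time-varying covariates vary over time points.
Structural portion:
$$\eta_{0i} = \alpha_0 + \gamma_0 w_i + \zeta_{0i} \tag{2a}$$
$$\eta_{1i} = \alpha_1 + \gamma_1 w_i + \zeta_{1i} \tag{2b}$$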

SEM growth model
Data arranged as wide vs long:
- Wide: multivariate (single-level) approach
- Long: two-level approach
$$y_{ti} = i_i + s_i\, \mathit{time}_{ti} + \varepsilon_{ti}$$

SEM growth model
Two options for handling the relationship between the outcome and time:
1) SEM: allows time scores to be parameters in the model, so that the growth function can be estimated.
2) Multilevel modeling (ML): allows time to be a variable that reflects individually-varying times of observation. This variable has a random slope.
Random effects in the form of random slopes are also used to represent individual variation in the influence of time-varying covariates on outcomes.
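The multilevel option can be sketched for data in long format, with one record per person and occasion. This is an illustration under stated assumptions, not a file from the slides: the variable names id, time and y are hypothetical.

```
VARIABLE: NAMES ARE id time y;       ! long format: one record per occasion
          CLUSTER = id;              ! occasions are nested within persons
          WITHIN = time;             ! time is a within-level variable
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON time;             ! random slope for time
          %BETWEEN%
          y WITH s;                  ! intercept-slope covariance
```

The wide-format SEM counterpart is a single statement such as `i s | y1@0 y2@1 y3@2 y4@3;`, where the time scores after @ are model parameters rather than data.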

Steps in growth modeling
- Descriptive statistics.
- Shape of the growth curve.
- Fit the model using fixed time scores, without covariates.
- Add correlated residuals, if needed.
- Add covariates:
  - Time-invariant covariates
  - Time-varying covariates

SEM growth model
Example 1: LGM for continuous data with time-varying covariates.
Manual: p. 114. Ex.: Lgm_cont_tv.inp
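The contents of Lgm_cont_tv.inp are not reproduced in the slides. As a hedged sketch of what such a model typically looks like in Mplus, the input below specifies a linear growth model with two time-invariant covariates (x1, x2) and one time-varying covariate (a1-a4); the file name and all variable names are assumptions.

```
DATA:     FILE IS lgm_tv.dat;              ! hypothetical file name
VARIABLE: NAMES ARE y1-y4 x1 x2 a1-a4;
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;       ! fixed time scores 0, 1, 2, 3
          i s ON x1 x2;                    ! time-invariant covariates
          y1 ON a1;                        ! time-varying covariates
          y2 ON a2;
          y3 ON a3;
          y4 ON a4;
```

Because each outcome has its own ON statement, the effect of the time-varying covariate is allowed to differ across time points.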

SEM growth model
Example 2: LGM for continuous data with individually time-varying covariates.
Manual: p. 116. Ex.: Lgm_cont_itv.inp
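Lgm_cont_itv.inp is likewise not shown. One plausible reading of this example is a growth model with individually-varying times of observation and random slopes for the time-varying covariates; the sketch below follows that reading. The file name and the variable names (t1-t4 for the individual time scores, a1-a4 for the covariate) are assumptions.

```
DATA:     FILE IS lgm_itv.dat;             ! hypothetical file name
VARIABLE: NAMES ARE y1-y4 x t1-t4 a1-a4;
          TSCORES = t1-t4;                 ! individually-varying time scores
ANALYSIS: TYPE = RANDOM;
MODEL:    i s | y1-y4 AT t1-t4;            ! time scores read from the data
          st | y1 ON a1;                   ! random slope for the covariate
          st | y2 ON a2;
          st | y3 ON a3;
          st | y4 ON a4;
          i s st ON x;                     ! x predicts all random effects
```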

SEM growth model
LGM for categorical data: threshold
Mplus treats a binary (resp. ordinal) variable as a dichotomized (resp. polytomized) continuous latent response variable. The point at which a continuous N(0,1) variable must be cut to create a binary variable is called a threshold.
- A binary variable with 50% cases corresponds to a threshold of zero.
- A binary variable with 2.5% cases corresponds to a threshold of 1.96.
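The two threshold values above follow from the standard normal distribution. In the sketch below, y* denotes the latent response, u the observed binary variable, pi the proportion of cases and Phi the standard normal distribution function; these symbols are introduced here for illustration.

```latex
u = \begin{cases} 1 & \text{if } y^{*} > \tau \\ 0 & \text{otherwise} \end{cases},
\qquad y^{*} \sim N(0,1)
\;\Longrightarrow\;
\pi = P(u = 1) = 1 - \Phi(\tau),
\qquad \tau = \Phi^{-1}(1 - \pi).

\pi = 0.50 \;\Rightarrow\; \tau = \Phi^{-1}(0.50) = 0,
\qquad
\pi = 0.025 \;\Rightarrow\; \tau = \Phi^{-1}(0.975) \approx 1.96 .
```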

SEM growth model
LGM with categorical outcomes
Measurement invariance of the outcome over time is represented by the equality of thresholds across time points:
    [y1$1 y2$1 y3$1 y4$1] (1);
Differences in the variances of the outcome over time are represented by allowing the scale factors for the continuous latent response variables of the observed categorical dependent variables to vary over time:
    {y1@1.0 y2* y3* y4*};

SEM growth model
Example 3: LGM for categorical data
    VARIABLE: Names are Y1 Y2 Y3 Y4;
              Usevar = Y1-Y4;
              Categorical are Y1-Y4;
    MODEL:    i s | Y1@0 Y2@1 Y3@2 Y4@3;
              [Y1$1 Y2$1 Y3$1 Y4$1] (1);
              [Y1$2 Y2$2 Y3$2 Y4$2] (2);
              {Y1@1.0 Y2* Y3* Y4*};
Manual: p. 107. Ex.: Lgm_cat_delta.inp (see also Lgm_cat_ML.inp)

SEM growth model
LGM with categorical outcomes
1) DELTA parameterization (default): scale factors for the continuous latent response variables of the observed categorical outcome variables are allowed to be parameters in the model, but residual variances for the continuous latent response variables are not.
2) THETA parameterization (when hypotheses involving residual variances are of interest): residual variances for the continuous latent response variables of the observed categorical outcome variables are allowed to be parameters in the model, but scale factors for the continuous latent response variables are not.

SEM growth model
Example 4: LGM for categorical data
    VARIABLE: Names are Y1 Y2 Y3 Y4;
              Usevar = Y1-Y4;
              Categorical are Y1-Y4;
    ANALYSIS: Parameterization = theta;
    MODEL:    i s | Y1@0 Y2@1 Y3@2 Y4@3;
              Y1@0;    ! just for illustration
Manual: p. 108. Ex.: Lgm_cat_theta.inp

Latent Class Growth Analysis (LCGA)
- Estimate trajectory shapes
- Estimate trajectory class probabilities
- Relate class probabilities to covariates
- Classify individuals into classes (posterior probabilities)

Latent Class Growth Analysis
Example 5: LCGA for continuous data
LCGA is carried out when type=mixture is selected without algorithm=integration.
Ex.: LCGA_M1.inp
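LCGA_M1.inp itself is not shown in the slides. The sketch below is one way such an analysis could be set up for four continuous outcomes. The file name, the variable names, the choice of three classes and the number of random starts are assumptions, and the growth factor variances are fixed at zero explicitly because that restriction is what defines an LCGA.

```
DATA:     FILE IS lcga.dat;                ! hypothetical file name
VARIABLE: NAMES ARE y1-y4;
          CLASSES = c(3);                  ! number of classes is an assumption
ANALYSIS: TYPE = MIXTURE;
          STARTS = 100 20;                 ! random starts, final optimizations
MODEL:    %OVERALL%
          i s | y1@0 y2@1 y3@2 y4@3;
          i@0 s@0;                         ! no within-class growth variation
OUTPUT:   TECH11;                          ! LMR-LRT against c - 1 classes
```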

A thorough job: comparing models with different numbers of classes
1) BIC, SABIC.
2) LMR-LRT (tech11); bootstrapped LRT (tech14): computationally heavy!
3) Examine the global entropy: ideally approaching 1.
4) Examine the assignment probabilities for each individual pattern.
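For reference, the usual definitions of the indices in points 1) and 3) are given below. The notation is introduced here: ln L is the maximized log-likelihood, d the number of free parameters, n the sample size, K the number of classes and p-hat_ik the posterior probability that individual i belongs to class k.

```latex
\mathrm{BIC} = -2 \ln L + d \ln n,
\qquad
\mathrm{SABIC} = -2 \ln L + d \ln\!\left(\frac{n + 2}{24}\right),

E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K}
      \left( -\hat{p}_{ik} \ln \hat{p}_{ik} \right)}{n \ln K}.
```

Lower BIC and SABIC values indicate a better trade-off between fit and parsimony, and E_K lies between 0 and 1, with values near 1 indicating clear class separation.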

How to obtain the Bootstrap LRT?
Bootstrap Likelihood Ratio Test (BLRT)
The BLRT empirically estimates the distribution of the likelihood-ratio difference, providing a p-value for the observed difference which can be used in tests of model fit.
1) Fit the k-class model.
2) Select an optseed which replicates the solution with the highest log-likelihood for k classes.
3) Re-fit the k-class model using optseed, specifying tech14 in the output section and selecting an appropriate number of runs for bootstrapping:
    k-1starts = 100 20;
    lrtstarts = 0 0 250 50;
    lrtbootstrap = 100;
Ensure that the optimal k-1 class model has been replicated.
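Step 3 can be sketched as the following ANALYSIS and OUTPUT commands. The seed value 123456 is a placeholder, not a real result: it has to be replaced by the seed reported for the best log-likelihood in the output of step 1.

```
ANALYSIS: TYPE = MIXTURE;
          STARTS = 0;                ! no random starts: reuse one solution
          OPTSEED = 123456;          ! placeholder: copy the seed from step 2
          K-1STARTS = 100 20;        ! starts for the k-1 class model
          LRTSTARTS = 0 0 250 50;    ! starts for the bootstrap draws
          LRTBOOTSTRAP = 100;        ! number of bootstrap draws
OUTPUT:   TECH14;
```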

Latent Class Growth Analysis
Example 6: LCGA for categorical outcome
LCGA is carried out when type=mixture is selected without algorithm=integration.
Ex.: LCGA_cat_no_int.inp

Growth Mixture Analysis (GMM)
- Estimate trajectory shapes
- Estimate trajectory class probabilities
- Estimate variation within class
- Relate class probabilities to covariates
- Relate within-class variation to covariates
- Classify individuals into classes (posterior probabilities)

Growth Mixture Analysis
Example 7: GMM for categorical outcome
Ex.: LCGA_M2.inp (not a good name!)
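The file for this example is not reproduced in the slides. A minimal sketch of a growth mixture model for four categorical outcomes and one covariate might look as follows; the file name, the variable names and the choice of two classes are assumptions.

```
DATA:     FILE IS gmm_cat.dat;       ! hypothetical file name
VARIABLE: NAMES ARE u1-u4 x;
          CATEGORICAL = u1-u4;
          CLASSES = c(2);            ! number of classes is an assumption
ANALYSIS: TYPE = MIXTURE;
          ALGORITHM = INTEGRATION;   ! allows within-class growth variances
MODEL:    %OVERALL%
          i s | u1@0 u2@1 u3@2 u4@3;
          i s ON x;                  ! covariate explains within-class variation
          c ON x;                    ! covariate predicts class membership
```

The two ON statements correspond to relating within-class variation and class probabilities to covariates in the GMM list of goals.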

References
1) Muthén, B. (2004). Latent variable analysis: GMM and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage.
2) Mplus short courses (free at Statmodel.com):
- Topic 3: Introductory and intermediate growth models
- Topic 5: Categorical latent variable modeling using Mplus: longitudinal data