An introduction to Growth Mixture Modeling using Mplus

Jacques Juhel, University Rennes 2, CRPCC, EA 1285 - Nanterre

General latent variable framework
- Implemented in the Mplus 6 program (Muthén and Muthén, 1998-2010).
- Latent growth curve modeling (SEM) is linked to random coefficient growth modeling (multilevel modeling).
- Latent growth curve modeling (single population) is a special case of growth mixture modeling.

Mplus files
Three primary files:
- the data file;
- the program file;
- the analysis output file.

Data files
Mplus has very limited data management capabilities.
- Data files are in ASCII format, in an external file.
- No more than 500 variables in a data set.
- No variable names in the data set; names are given in the program file.
- The estimator chosen for an analysis determines the type of data required for the analysis.
- Individual data can be in fixed or free format (the default).
- Summary data (matrix data) must be in free format.
- Free format requires a comma, space, or tab as delimiter.

Mplus command language
The program file is divided into a series of commands (sections).
- The DATA and VARIABLE commands are required for every analysis.
- Every command must begin on a new line and be followed by a colon.
- Semicolons separate command options.
- Lines in the program file cannot exceed 80 columns.
- User comments are preceded by an exclamation mark (!).
- The keywords IS, ARE and = are interchangeable.
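As a minimal sketch of these conventions, the input file below starts each command on a new line with a colon, ends each option with a semicolon, and mixes IS, ARE and = to show that they are interchangeable. The file name mydata.dat and the variable names are hypothetical.

```mplus
TITLE:    Minimal input file (illustrative sketch)
DATA:     FILE IS mydata.dat;        ! hypothetical file name
! A comment starts with an exclamation mark.
VARIABLE: NAMES ARE y1-y4 x;         ! hypothetical variable names
          USEVARIABLES = y1-y4;      ! "=" used here instead of ARE
ANALYSIS: TYPE = BASIC;              ! descriptive statistics only
```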

Mplus command language
Commands are divided into a series of sections (chap. 15):
- TITLE: gives an identifying title to the analysis.
- DATA: identifies the location and name of the data set to be analyzed.
- VARIABLE: names and describes the variables in the data set to be analyzed.
- DEFINE: transforms existing variables and creates new variables.

Mplus command language
Commands are divided into a series of sections (chap. 16):
- ANALYSIS: describes the type of analysis to be performed.

Mplus command language
Commands are divided into a series of sections (chap. 17):
- MODEL: describes the relationships in the model.

Mplus command language
Commands are divided into a series of sections (chap. 18):
- OUTPUT: specifies options to customize the output.
- SAVEDATA: saves the analysis data and/or model results in ASCII files.
- MONTECARLO: defines the specifications of a Monte Carlo analysis.

Mplus command language: DATA command
- FILE: name and location of the data file
- FORMAT: format of the data file
- TYPE: individual, cov, corr, means, stdeviations
- NOBSERVATIONS: number of observations (summary data)
- NGROUPS: number of groups

Mplus command language: VARIABLE command
- NAMES: names of the variables in the data set
- USEOBSERVATIONS: selects observations
- USEVARIABLES: variables to be analyzed
- MISSING: indicates the missing value codes for each variable (any numeric value, period, asterisk, or blank)
- CATEGORICAL: names the categorical dependent variables
- CLASSES: specifies the number of latent classes in a model and assigns a name to the categorical latent variable
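The DATA and VARIABLE options can be sketched together as follows. This is an illustration only: the file name, the variable names, the missing value code -99 and the number of classes are all hypothetical. CLASSES is used together with TYPE = MIXTURE in the ANALYSIS command.

```mplus
DATA:     FILE IS mydata.dat;          ! hypothetical file, free format
          TYPE IS INDIVIDUAL;          ! raw individual data
VARIABLE: NAMES ARE id y1-y4 x1 x2;    ! hypothetical variable names
          USEVARIABLES ARE y1-y4 x1;   ! variables used in the analysis
          MISSING ARE ALL (-99);       ! -99 flags a missing value
          CATEGORICAL ARE y1-y4;       ! categorical dependent variables
          CLASSES = c (3);             ! latent class variable c, 3 classes
ANALYSIS: TYPE = MIXTURE;              ! needed when CLASSES is specified
```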

Mplus command language: ANALYSIS command
Describes the type of analysis, the statistical estimator, the matrix to be analyzed, and the specifics of the computational algorithms.
- Type = basic; Type = mixture; etc.
- Estimator = ML, MLR, WLSMV, etc. (frequentist and Bayesian estimation; cf. pp. 532-534)

Mplus command language: MODEL command
Describes the specific model to be estimated.
BY ("measured by"):
    ksi1 BY y1 y2 y3 y4;
    ! the factor loadings on the right side of the BY statement are
    ! freely estimated, except for the first variable (lambda_1_1=1).
Residual variances are estimated. Residual covariances are fixed to zero.

Mplus command language: MODEL command
Describes the specific model to be estimated.
ON ("regressed on"):
    y1 ON x1;
    ksi2 ON ksi1;    ! regression of ksi2 on ksi1
    c#1 ON x;        ! logistic regression
    c#1 c#2 ON x;    ! multinomial logistic regression of the categorical
                     ! latent variable c (k=3) on the covariate x
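The multinomial logistic regression of c on x can be written out as below. The symbols a_k and b_k (class-specific intercepts and slopes) are notation introduced here for illustration; the last class is the reference class, so its coefficients are zero.

```latex
P(c_i = k \mid x_i) =
  \frac{\exp(a_k + b_k x_i)}{\sum_{j=1}^{K} \exp(a_j + b_j x_i)},
  \qquad a_K = b_K = 0 .
```

With K = 3, the statement c#1 c#2 ON x; therefore estimates the two slopes b_1 and b_2.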

Mplus command language: MODEL command
Describes the specific model to be estimated.
WITH ("correlated with"):
    x1 WITH x2 x3 x4 x5;
Covariances can be specified among independent variables and among residuals of dependent variables.

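Mplus command language: MODEL command
Fixing and freeing parameters and assigning start values (@ fixes a parameter at a value, * frees it and optionally gives a start value):
    f1 BY y1@1 y2 y3*0.5 y4;
Constraining parameter values to be equal (same number in parentheses):
    f1 ON x1-x5 (1);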
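Put together, the BY, ON and WITH statements and the @, * and (1) conventions might look like the following MODEL command. This is a sketch only; the factor f2 and the indicators y5-y8 are hypothetical additions used to show a regression among factors.

```mplus
MODEL:
  f1 BY y1@1 y2 y3*0.5 y4;   ! y1 loading fixed at 1; 0.5 is a start value
  f2 BY y5-y8;               ! first loading fixed at 1 by default
  f2 ON f1;                  ! regression of f2 on f1
  f1 ON x1-x5 (1);           ! the five slopes are constrained to be equal
  y2 WITH y3;                ! residual covariance between y2 and y3
```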

Mplus command language: OUTPUT command
- STANDARDIZED: standardized parameter estimates and their standard errors (cf. p. 641).
- RESIDUAL: residuals for the observed variables.
- MODINDICES: modification indices.
- CINTERVAL: confidence intervals (frequentist) or credibility intervals (Bayesian).
- TECH1: requests the arrays containing parameter specifications.
- TECH4: estimated means, covariances, and correlations for the latent variables in the model.
- TECH11: with type=mixture, requests the Lo-Mendell-Rubin likelihood ratio test (LMR-LRT).
- TECH12: with type=mixture, requests residuals.
- TECH14: with type=mixture, requests a parametric bootstrapped LRT.

Mplus command language: SAVEDATA command
Saves factor scores, posterior probabilities, most likely class membership for each response pattern, outliers, etc.
    SAVE = FSCORES;
    SAVE = CPROBABILITIES;

Mplus command language: PLOT command

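SEM growth model
Individual development over time, where x_t is the time score at occasion t and w_i a time-invariant covariate:

$$y_{ti} = \eta_{0i} + \eta_{1i}\,x_t + \varepsilon_{ti} \tag{1}$$

$$\eta_{0i} = \alpha_0 + \gamma_0\,w_i + \zeta_{0i} \tag{2a}$$

$$\eta_{1i} = \alpha_1 + \gamma_1\,w_i + \zeta_{1i} \tag{2b}$$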
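In Mplus, equations (1), (2a) and (2b) might be specified roughly as follows. This is a sketch under assumptions: four measurement occasions with equally spaced time scores 0 to 3, and a hypothetical file name and variable names.

```mplus
TITLE:    Linear growth model with a time-invariant covariate (sketch)
DATA:     FILE IS growth.dat;           ! hypothetical file name
VARIABLE: NAMES ARE y1-y4 w;            ! four occasions, one covariate
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;    ! i = eta_0i, s = eta_1i, x_t = 0,1,2,3
          i s ON w;                     ! equations (2a) and (2b)
OUTPUT:   TECH4;                        ! latent variable means and covariances
```

The | symbol defines the intercept and slope growth factors; the values after @ are the fixed time scores.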

SEM growth model
SEM growth models:
- Time scores are parameters.
- Time-varying covariates have fixed-effect coefficients.
Multilevel and mixed linear growth models:
- Time scores are data.
- Time-varying covariates have random-effect coefficients.
Notation:
- p = number of repeated measurements on the ECLS-K math proficiency test
- q = number of growth factors
- k = number of time-varying predictors
- S = number of time-invariant predictors

SEM growth model
Measurement portion:

$$y_{ti} = \eta_{0i} + \eta_{1i}\,x_t + \kappa_t\,w_{ti} + \varepsilon_{ti}$$

The x_t are parameters; the slopes \kappa_t for the time-varying covariates vary over time points.
Structural portion:

$$\eta_{0i} = \alpha_0 + \gamma_0\,w_i + \zeta_{0i} \tag{2a}$$

$$\eta_{1i} = \alpha_1 + \gamma_1\,w_i + \zeta_{1i} \tag{2b}$$

SEM growth model
Data arranged as wide vs long:
- Wide: multivariate (single-level) approach

$$y_{ti} = i_i + s_i\,\mathrm{time}_{ti} + \varepsilon_{ti}$$

- Long: two-level approach

SEM growth model
Two options for handling the relationship between the outcome and time:
1) SEM: allows time scores to be parameters in the model, so that the growth function can be estimated.
2) Multilevel: allows time to be a variable that reflects individually varying times of observation. This variable has a random slope.
Random effects in the form of random slopes are also used to represent individual variation in the influence of time-varying covariates on outcomes.

Steps in growth modeling
- Descriptive statistics.
- Shape of the growth curve.
- Fit the model using fixed time scores, without covariates.
- If needed, add correlated residuals.
- Add covariates:
  - time-invariant covariates;
  - time-varying covariates.

SEM growth model
Example 1: LGM for continuous data with time-varying covariates.
Manual: p. 114. Ex.: Lgm_cont_tv.inp
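A model of this kind might be set up roughly as below. This is a generic sketch, not the content of Lgm_cont_tv.inp: the variable names (w for a time-invariant covariate, a1-a4 for the time-varying covariate) are hypothetical.

```mplus
VARIABLE: NAMES ARE y1-y4 w a1-a4;      ! hypothetical variable names
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;    ! linear growth, fixed time scores
          i s ON w;                     ! time-invariant covariate
          y1 ON a1;                     ! time-varying covariate, one slope
          y2 ON a2;                     ! per occasion (fixed effects that
          y3 ON a3;                     ! may differ across time points)
          y4 ON a4;
```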

SEM growth model
Example 2: LGM for continuous data with individually time-varying covariates.
Manual: p. 116. Ex.: Lgm_cont_itv.inp
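With individually varying times of observation, the time scores are read as data and the time-varying covariate can be given a random slope. The sketch below assumes hypothetical variable names (t1-t4 for the observation times, a1-a4 for the time-varying covariate, w for a time-invariant covariate) and is not the content of Lgm_cont_itv.inp.

```mplus
VARIABLE: NAMES ARE y1-y4 w t1-t4 a1-a4;   ! hypothetical variable names
          TSCORES = t1-t4;                 ! individual times of observation
ANALYSIS: TYPE = RANDOM;                   ! required for random slopes
MODEL:    i s | y1-y4 AT t1-t4;            ! time scores taken from the data
          st | y1 ON a1;                   ! st: random slope of the
          st | y2 ON a2;                   ! time-varying covariate
          st | y3 ON a3;
          st | y4 ON a4;
          i s st ON w;                     ! growth factors and st on w
```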

SEM growth model
LGM for categorical data: thresholds
Mplus treats a binary (resp. ordinal) variable as a dichotomised (resp. polychotomised) continuous latent variable. The point at which a continuous N(0,1) variable must be cut to create a binary variable is called a threshold.
- A binary variable with 50% of cases in the upper category corresponds to a threshold of zero.
- A binary variable with 2.5% of cases in the upper category corresponds to a threshold of 1.96.
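The two numerical examples follow from the standard normal distribution function. Writing u for the observed binary variable, y* for the latent response variable, tau for the threshold and p for the proportion of cases with u = 1 (notation introduced here for illustration):

```latex
p = P(u = 1) = P(y^{*} > \tau) = 1 - \Phi(\tau)
\quad\Longrightarrow\quad
\tau = \Phi^{-1}(1 - p),
\qquad
\Phi^{-1}(0.5) = 0, \quad \Phi^{-1}(0.975) \approx 1.96 .
```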

SEM growth model
LGM with categorical outcomes
Measurement invariance of the outcome over time is represented by the equality of thresholds across time points:
    [y1$1 y2$1 y3$1 y4$1] (1);
Differences in the variances of the outcome over time are represented by allowing the scale factors for the continuous latent response variables of the observed categorical dependent variables to vary over time:
    {y1@1.0 y2* y3* y4*};

SEM growth model
Example 3: LGM for categorical data
    VARIABLE: Names are Y1 Y2 Y3 Y4;
              Usevar = Y1-Y4;
              Categorical are Y1-Y4;
    MODEL:    i s | Y1@0 Y2@1 Y3@2 Y4@3;
              [Y1$1 Y2$1 Y3$1 Y4$1] (1);
              [Y1$2 Y2$2 Y3$2 Y4$2] (2);
              {y1@1.0 y2* y3* y4*};
Manual: p. 107. Ex.: Lgm_cat_delta.inp (see also Lgm_cat_ML.inp)

SEM growth model
LGM with categorical outcomes
1) DELTA parameterization (default): scale factors for the continuous latent response variables of observed categorical outcome variables are allowed to be parameters in the model, but residual variances for the continuous latent response variables are not.
2) THETA parameterization (when hypotheses involving residual variances are of interest): residual variances for the continuous latent response variables of observed categorical outcome variables are allowed to be parameters in the model, but scale factors for the continuous latent response variables are not.

SEM growth model
Example 4: LGM for categorical data (THETA parameterization)
    VARIABLE: Names are Y1 Y2 Y3 Y4;
              Usevar = Y1-Y4;
              Categorical are Y1-Y4;
    ANALYSIS: parameterization = theta;
    MODEL:    i s | Y1@0 Y2@1 Y3@2 Y4@3;
              y1@0;   ! just for illustration
Manual: p. 108. Ex.: Lgm_cat_theta.inp

Latent Class Growth Analysis (LCGA)
- Estimate trajectory shapes.
- Estimate trajectory class probabilities.
- Relate class probabilities to covariates.
- Classify individuals into classes (posterior probabilities).

Latent Class Growth Analysis
Example 5: LCGA for continuous data
LCGA is carried out when type=mixture is selected without algorithm=integration.
Ex.: LCGA_M1.inp
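An LCGA for a continuous outcome might be sketched as follows. This is not the content of LCGA_M1.inp: the variable names, the three classes and the number of random starts are hypothetical. The sketch fixes the growth factor variances at zero explicitly with i-s@0, which makes the LCGA assumption of no within-class variation visible in the input.

```mplus
VARIABLE: NAMES ARE y1-y4;              ! hypothetical variable names
          CLASSES = c (3);              ! hypothetical number of classes
ANALYSIS: TYPE = MIXTURE;
          STARTS = 100 20;              ! random starts (illustrative values)
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;            ! class-specific growth factor means
  i-s@0;                                ! no within-class variation in i and s
OUTPUT:   TECH11;                       ! LMR-LRT against the 2-class model
```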

Comparing models with different numbers of classes
A thorough job involves:
1) BIC, SABIC.
2) LMR-LRT (tech11); bootstrapped LRT (tech14): computationally heavy!
3) Examine the global entropy: ideally approaching 1.
4) Examine the assignment probabilities for each individual pattern.
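For reference, the criteria listed above are commonly defined as follows, with log L the maximized log-likelihood, p the number of free parameters, n the sample size, K the number of classes and \hat{p}_{ik} the posterior probability that individual i belongs to class k (notation introduced here for illustration):

```latex
\mathrm{BIC}   = -2\log L + p \,\ln n ,
\qquad
\mathrm{SABIC} = -2\log L + p \,\ln\!\left(\frac{n+2}{24}\right),
\qquad
E_K = 1 - \frac{\sum_{i=1}^{n}\sum_{k=1}^{K} \left(-\hat{p}_{ik}\ln \hat{p}_{ik}\right)}{n \,\ln K} .
```

Lower BIC and SABIC values indicate a better trade-off between fit and parsimony; E_K equals 1 when every individual is assigned to a class with posterior probability 1.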

Bootstrap Likelihood Ratio Test (BLRT)
How to obtain the bootstrap LRT?
The BLRT empirically estimates the distribution of the log-likelihood difference between the k-1 and k-class models, providing a p-value for the observed difference that can be used in tests of model fit.
1) Fit the k-class model.
2) Select an optseed that replicates the solution with the highest log-likelihood for k classes.
3) Re-fit the k-class model using optseed, specifying tech14 in the output section and selecting an appropriate number of runs for bootstrapping:
    k-1starts = 100 20;
    lrtstarts = 0 0 250 50;
    lrtbootstrap = 100;
Ensure that the optimal k-1 class model has been replicated.
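Step 3 might be written as the following input. This is a sketch under assumptions: the variable names and the choice of k = 3 are hypothetical, and the OPTSEED value is a placeholder to be replaced by the seed reported for the best log-likelihood in the output of step 1.

```mplus
VARIABLE: NAMES ARE y1-y4;              ! hypothetical variable names
          CLASSES = c (3);              ! k = 3 is a hypothetical choice
ANALYSIS: TYPE = MIXTURE;
          OPTSEED = 123456;             ! placeholder seed from the step 1 run
          K-1STARTS = 100 20;           ! starts for the k-1 class model
          LRTSTARTS = 0 0 250 50;       ! starts for the bootstrap draws
          LRTBOOTSTRAP = 100;           ! number of bootstrap draws
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
OUTPUT:   TECH11 TECH14;                ! LMR-LRT and bootstrapped LRT
```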

Latent Class Growth Analysis
Example 6: LCGA for categorical outcome
LCGA is carried out when type=mixture is selected without algorithm=integration.
Ex.: LCGA_cat_no_int.inp
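A minimal sketch of such a run is given below; it is not the content of LCGA_cat_no_int.inp, and the variable names and number of classes are hypothetical. Because ALGORITHM = INTEGRATION is absent, no within-class growth factor variances are estimated for the categorical outcome.

```mplus
VARIABLE: NAMES ARE u1-u4;              ! hypothetical variable names
          CATEGORICAL = u1-u4;          ! categorical repeated measures
          CLASSES = c (2);              ! hypothetical number of classes
ANALYSIS: TYPE = MIXTURE;               ! no ALGORITHM = INTEGRATION: LCGA
MODEL:
  %OVERALL%
  i s | u1@0 u2@1 u3@2 u4@3;
OUTPUT:   TECH1 TECH8;
```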

Growth Mixture Modeling (GMM)
- Estimate trajectory shapes.
- Estimate trajectory class probabilities.
- Estimate variation within class.
- Relate class probabilities to covariates.
- Relate within-class variation to covariates.
- Classify individuals into classes (posterior probabilities).

Growth Mixture Modeling
Example 7: GMM for categorical outcome
Ex.: LCGA_M2.inp (not a good name!)
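In contrast to LCGA, a GMM for a categorical outcome estimates within-class variation, which requires numerical integration. The sketch below is not the content of LCGA_M2.inp; the variable names, the covariate x and the number of classes are hypothetical.

```mplus
VARIABLE: NAMES ARE u1-u4 x;            ! hypothetical variable names
          CATEGORICAL = u1-u4;          ! categorical repeated measures
          CLASSES = c (2);              ! hypothetical number of classes
ANALYSIS: TYPE = MIXTURE;
          ALGORITHM = INTEGRATION;      ! within-class growth factor variation
MODEL:
  %OVERALL%
  i s | u1@0 u2@1 u3@2 u4@3;
  i s ON x;                             ! within-class variation on covariate
  c ON x;                               ! class probabilities on covariate
OUTPUT:   TECH1 TECH8;
```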

References
1) Muthén, B. (2004). Latent variable analysis: GMM and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage.
2) Mplus short courses (free at Statmodel.com):
- Topic 3: Introductory and intermediate growth models.
- Topic 5: Categorical latent variable modeling using Mplus: longitudinal data.