SHORT COURSE ON MPLUS Getting Started with Mplus HANDOUT


Instructor: Cathy Zimmer

INTRODUCTION
a) Who am I? Who are you?
b) Overview of Course
   i) Mplus capabilities
   ii) The Mplus interface and Mplus files
       Data files, program files, analysis output, diagram files
   iii) Programming in Mplus
        Basic commands and options
   iv) Example
c) This course is designed to provide a basic introduction to Mplus and will get you started working in the program. The Odum Institute has documentation and consultation resources for specific problems or help with intermediate and advanced usage of Mplus.

**The Mplus website: let's see what is there! (Use Ctrl-F to search.)

Mplus CAPABILITIES
a) Mplus estimates a variety of models for continuous and categorical observed variables as well as continuous and categorical latent (unobserved) variables
b) Types of Analysis
   i) Linear regression (continuous outcomes)
   ii) Probit regression (binary and ordered outcomes)
   iii) Logistic regression (binary, ordered, and unordered outcomes)
   iv) Loglinear modeling/Poisson or negative binomial regression (count)
   v) Path analysis
   vi) Exploratory and confirmatory factor analysis
   vii) Structural equation modeling
   viii) Mixture modeling
   ix) Latent class analysis
   x) Growth modeling
   xi) Multiple group analysis
   xii) Multilevel or hierarchical modeling
   xiii) Discrete and continuous time survival analysis
   xiv) Monte Carlo studies (data simulation)
c) Features
   i) Missing data
   ii) Sampling/frequency weighting and complex sample analysis (stratification, clustering)
   iii) Multiple imputation and analyzing multiple data sets from multiple imputation
   iv) Modeling with means, intercepts, and thresholds
   v) Random intercepts and slopes
   vi) Latent variable interactions and non-linear factor analysis
   AND MORE!

THE Mplus INTERFACE AND Mplus FILES
a) To open Mplus:
   i) Double click on the Mplus icon on your desktop in the Quantitative Statistics folder.
   ii) Or go to Start > All Programs > Mplus > Mplus Editor
b) The Program Window is where you write your Mplus programs and then save them.
c) Open an existing program file (.inp extension), which is a text file.
d) Save data files in the same directory as program files (.dat or .csv extension)
   - Text/ASCII file
   - The data must reside in an external file
   - Data must be numeric except for missing value flags; there are NO variable names in the data set. Names are described in the program file
   - Mplus accepts no more than 500 variables in a data set
e) The data file can contain individual level data or summary data in the form of a covariance or correlation matrix with or without means and/or standard deviations.
   - Individual data can be in fixed or free format. (Free format is the default.)
   - Summary data (matrix data) must be in free format
   - Free format requires a comma, space, or tab delimiter.
f) Run Mplus program files by clicking on the blue RUN icon on the toolbar. Mplus automatically saves an output file that contains analysis results. Mplus gives it the same name as the program file (.out extension). The output file contains:
   - A summary of the analysis specified: this shows how Mplus has interpreted the input from your program
   - A summary of the analysis: this shows which estimator and data file were used in the analysis along with iteration and convergence information
   - A summary of analysis results: this provides tests of model fit, parameter estimates, standard errors, z-statistics, standardized parameter estimates, etc.
   - Other information can be included in the output file if you specify this in your program
   - Output files also contain error messages and other notes. It is very important to read through the entire output since these messages are not always easy to see
g) Mplus has very limited data management capabilities. Most users clean, code, and format data in another package like SAS, Stata, or SPSS.

** Detour to learn stata2mplus command!

PROGRAMMING IN Mplus:
a) There are 10 programming commands altogether
b) Not all commands are required to run an analysis. However, the data and variable commands are required for every analysis
c) The commands may come in any order
d) All commands must begin on a new line and be followed by a colon
e) Semicolons separate command options (There can be more than one option per line.)
f) Lines in the program file cannot exceed 80 columns
g) User comments can be included and are preceded by an exclamation point
h) The keywords IS, ARE and = are interchangeable (Except with the define command.)
i) A hyphen (-) can be used to indicate a list of variables or numbers
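The programming rules above can be sketched in a short program. This is only an illustration: the file name example.dat and the variable names y1-y4 and x1-x3 are hypothetical and are not part of the course data.

```mplus
! A comment starts with an exclamation point.
TITLE:    Syntax sketch with hypothetical names
DATA:     FILE IS example.dat;       ! command, colon, then options
VARIABLE: NAMES ARE y1-y4 x1-x3;     ! hyphen list: y1 y2 y3 y4 x1 x2 x3
          USEVARIABLES = y1 x1-x3;   ! IS, ARE, and = are interchangeable
MODEL:    y1 ON x1-x3;               ! each statement ends with a semicolon
```

Every line stays well under the 80-column limit, and each command name starts a new line and is followed by a colon.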

Commands      What they do
Title         Gives identifying title to the analysis.
Data          Identifies the location and name of the data set to be analyzed.
Variable      Names and describes the variables in the data set to be analyzed.
Define        Provides the ability to transform existing variables and to create new variables.
Analysis      Describes the type of analysis to be performed.
Model         Describes the model to be estimated.
Output        Specifies options to customize the output.
Savedata      Saves the analysis data and/or model results in ASCII files.
Plot          Requests graphical displays of data and analysis results.
Montecarlo    Defines the specifications of a Monte Carlo analysis.

The TITLE command:
Allows you to specify a title for the analysis in your program. This title is printed on the output file. The title command is optional.

   TITLE: Clinton Thermometer Regression

The DATA command:
This command identifies the location of the data set to be analyzed, describes the format and type of data in the data set, and specifies the number of observations for summary form data sets. The data command is required and the file option is required.

   DATA: FILE IS clinton data.dat;
         FORMAT IS free;
         TYPE IS individual;

The example above specifies a data file called clinton data.dat (which is in the I:\ Mplus zimmer directory). It specifies that the data is in free format and is individual level data (not summary data).

Data command options include:
   File            name and location of the data file
   Format          data file format
   Type            type of data file
   Nobservations   number of observations
   Ngroups         number of groups
   Variances       check for zero variances

The file option is required. The format and type options are optional. The default format is free and the default type is individual (so we did not really need them in the above example). Also, you do not need a directory specification if the data is in the same directory as the program file. With fixed data files, a FORTRAN-like format statement must be included on the format option.

Data types available under the type option:
   Individual      data matrix with observations on rows and variables on columns
   Covariance      lower triangular covariance matrix
   Correlation     lower triangular correlation matrix
   Fullcov         full covariance matrix
   Fullcorr        full correlation matrix
   Means           means
   Stdeviations    standard deviations
   Montecarlo      list of data sets
   Imputation      list of data sets

Means and standard deviations are combined with correlation matrix data. When using summary data files, the nobservations option is required. This option tells Mplus how many observations are in the analysis. The ngroups option is used when doing a multiple group analysis. The variances option is used to check that the analysis variables do not have zero variance.

The VARIABLE command:
This command names and describes the variables in the data set to be analyzed. This command is required and the names option is required. The options under the variable command allow you to subset the data set on the observations used or the variables used in an analysis, to specify missing values, to indicate categorical dependent variables, and to identify variables with a special function (e.g., a weight variable).
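Before the course example, here is a minimal sketch of how several of these VARIABLE options might be combined. All variable names (id, y, x1-x3, region, wt) and the missing value flag -99 are hypothetical, chosen only for illustration.

```mplus
VARIABLE: NAMES ARE id y x1-x3 region wt;  ! order must match the data file
          USEVARIABLES ARE y x1-x3;        ! subset of variables to analyze
          USEOBSERVATIONS ARE region EQ 1; ! keep cases with region equal to 1
          MISSING ARE ALL (-99);           ! -99 flags missing on all variables
          CATEGORICAL IS y;                ! y is a binary or ordered outcome
          WEIGHT IS wt;                    ! sampling weight variable
          IDVARIABLE IS id;                ! case identifier
```

The weight and ID variables are named through their own options and are left out of USEVARIABLES, since they are not analysis variables in the model.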

   VARIABLE: NAMES ARE clinton dem ind follow liberal moderate badoff attend
                       age married educ faminc male white;
             USEVARIABLES ARE clinton age married educ faminc male white;
             MISSING ARE .;

In the NAMES option, variable names must be listed in the order that they appear in the data set! The example above names 14 variables, but subsets 7 of the 14 variables for the analysis. Missing values are identified as a period (.) for all variables.

Variable command options include:
   Names             names of variables in the data set
   Useobservations   selects observations
   Usevariables      variables to be analyzed
   Missing           indicates missing values for each variable (any numeric value or period, asterisk, or blank)
   Categorical       names of categorical dependent variables
   Nominal           names of unordered categorical dependent variables
   Count             names of count dependent variables
   Censored          names of censored dependent variables

Variables with Special Functions
   Nominal           names of unordered categorical variables
   Count             names of count variables
   Grouping          name of grouping variable
   Idvariable        name of an ID variable
   Centering         variables to be centered and method of centering

Complex sample options
   Cluster           names variable containing cluster information
   Strata            names variable containing stratification information
   Weight            names variable containing the case or sampling weight information

The DEFINE command:
Allows you to transform existing variables and to create new variables. The define command is optional.

   DEFINE: lginc = log(faminc);
           newvar = follow + attend;

The example above creates a new variable called lginc by taking the base e log of the variable faminc. It also creates newvar, which is the sum of the follow and attend variables. There are many other ways to mathematically transform variables and to recode variables under the define command. Defined variables must be listed last in the USEVARIABLES option. We will not be using the define command in our regression example today.

The ANALYSIS command:
This command is used to describe the type of analysis, the statistical estimator, the matrix to be analyzed, and the specifics of computation. The analysis command does not need to be used if you want to use the program defaults.

   ANALYSIS: TYPE IS general;
             ESTIMATOR = ML;

The example above tells Mplus that we want to do a general analysis using the maximum likelihood estimator analyzing the covariance matrix. (All of the options in the example above are the defaults and not necessary in the program.)

The two important options under the analysis command are type and estimator. The other options are used to specify computational procedures.

Analysis types available under the type option:
   Mixture     mixture modeling
   Twolevel    multilevel modeling
   EFA         exploratory factor analysis
   Logistic    logistic regression
   General     all other analyses (the default). This includes models with relationships among observed variables, among continuous latent variables, and among observed and continuous latent variables.

Some common sub-options for a general type analysis include:
   Basic           sample and descriptive statistics
   Missing         allows analysis of missing data
   Meanstructure   allows estimation of means, thresholds, and intercepts
   Complex         allows for estimation of data that are clustered

The estimator option specifies the estimator to be used in the analysis. The default estimator differs depending on the type of analysis and the measurement of the dependent variable.

Available estimators:
   ML      maximum likelihood
   MLM     maximum likelihood, robust standard errors, & mean adjusted chi-square test statistic
   MLMV    maximum likelihood, robust standard errors, & mean and variance adjusted chi-square test statistic
   MLR     maximum likelihood with robust standard errors
   MLF     maximum likelihood w/ first order derivative standard errors
   WLS     weighted least squares
   WLSM    weighted least squares, robust standard errors, & mean adjusted chi-square test statistic
   WLSMV   weighted least squares, robust standard errors, & mean and variance adjusted chi-square test statistic
   GLS     generalized least squares
   ULS     unweighted least squares

There are many computational options under the analysis command including numerical integration, bootstrapping, random starts, etc. The matrix option allows you to specify whether a covariance or correlation matrix is to be analyzed. Some types of analyses require a certain type of matrix.

The MODEL command:
The model command describes the specific model to be estimated. The components of the model include (1) the measurement model for indicators of continuous latent variables, (2) the measurement model for indicators of categorical latent variables, (3) the structural model relating latent variables.

   MODEL: clinton ON age married educ faminc male white;

The model command has three key words:
   BY ("measured by")
      Used to describe regression relationships in the measurement model. Defines the continuous latent variables in the model.
   ON ("Y regressed on X", as in linear regression)
      Used to describe the regression relationships in the structural model involving the continuous and categorical latent variables (and observed variables).
   WITH ("correlated with")
      Used to describe correlation (covariance) relationships in the measurement and structural models.

An example of specifying a more complex model:

[Path diagram: latent variables Fa, Fb, and Fc; indicators y1a-y4a, y1b-y3b, and y1c-y4c; observed variables x1-x5]

The measurement models (and the BY keyword):

   Fa BY y1a y2a y3a y4a;
   Fb BY y1b y2b y3b;
   Fc BY y1c y2c y3c y4c;

Measurement model defaults:
1. The factor loadings on the right side of the BY statement are freely estimated, except for the first variable (here y1a, y1b, and y1c), which has a factor loading fixed at 1.
2. The start value for factor loadings is 1.
3. Residual variances are estimated.
4. Residual covariances are fixed to zero.

The structural model (and the ON keyword):

   Fa Fb ON x1 x2 x3 x4 x5;
   Fc ON Fa Fb;

Structural model defaults:
1. The regression coefficients are freely estimated.
2. The start value for regression coefficients is 0.
3. Residual variances of latent variables are estimated.
4. Residuals of the dependent latent variables are correlated if they do not influence any other variables in the model.

Covariances (and the WITH keyword):

   x1 WITH x2 x3 x4 x5;
   x2 WITH x3 x4 x5;
   x3 WITH x4 x5;
   x4 WITH x5;

Covariances can be specified:
   - among independent (observed or latent) variables
   - among residuals of dependent (observed or latent) variables

The model command is also used to:
   - Provide information about means, variances and covariances of observed and latent variables
   - Specify the scales of unobserved variables
   - Fix and free parameters
   - Constrain parameters to be equal
   - Assign start values

Variances/Residual Variances/Means/Intercepts/Thresholds:
Variances are estimated for independent variables and residual variances are estimated for dependent variables. (Not the case for categorical latent variables.)
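Putting the measurement, structural, and covariance pieces above together, the MODEL command for this example might look like the sketch below. It only assembles statements already shown in this handout, adds the semicolon that ends each statement, and relies on the defaults for everything else.

```mplus
MODEL:
    ! Measurement models: each factor is measured BY its indicators
    Fa BY y1a y2a y3a y4a;
    Fb BY y1b y2b y3b;
    Fc BY y1c y2c y3c y4c;

    ! Structural model: Fa and Fb regressed ON the x's, Fc ON Fa and Fb
    Fa Fb ON x1 x2 x3 x4 x5;
    Fc ON Fa Fb;

    ! Covariances among the observed independent variables
    x1 WITH x2 x3 x4 x5;
    x2 WITH x3 x4 x5;
    x3 WITH x4 x5;
    x4 WITH x5;
```

The factor names Fa, Fb, and Fc do not appear in the NAMES list; the BY statements create them.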

Fixing and Freeing Parameters and Assigning Start Values:

   Fa BY y1a* y2a y3a*0.5 y4a@1;

Here the factor loading of y1a is freed with the default start value of one (no longer fixed to 1), the y3a loading has a start value of 0.5, and the y4a loading is fixed to one.

The asterisk (*) will free a parameter and assign a starting value for the estimation of that parameter. You can assign start values to variances, means, thresholds, and scales. The @ symbol will fix any parameter at the given value.

Constraining parameter values to be equal:

   Fa ON x1 x2 x3 x4 x5 (1);
   Fb ON x1 x2 x3 x4 x5 (1);

Here the regression parameters are constrained to be equal for the two latent variables Fa and Fb. Place the same number in parentheses following the parameters that are to be held equal. This convention can be used for all parameters.

The OUTPUT command:
Allows you to specify extra output not included by default. The output command is optional.

   OUTPUT: sampstat cinterval standardized;

The example above outputs sample statistics, confidence intervals, and standardized coefficients in addition to the normal output information. Options for the output command are listed below. Some are inappropriate for certain types of analysis and cannot be used.

   Sampstat        sample statistics
   Modindices      modification indices, expected parameter change indices
   Standardized    three types of standardized coefficients and R-square values
   Residual        model estimated means, variances, and covariances and the differences between these and observed sample statistics
   Cinterval       95% and 99% confidence intervals for all of the parameter estimates
   Patterns        summary of missing data patterns
   Fscoefficient   factor score coefficients and factor score posterior covariance matrix
   Tech1 through Tech9   various technical outputs

See the User's Guide for even more options!
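Looking back at the MODEL command rules for fixing and freeing parameters, one common use is setting the scale of a factor through its variance instead of its first loading. The sketch below assumes the Fa and Fb factors from the earlier example. The start value 0.8 is arbitrary and purely illustrative.

```mplus
MODEL:
    Fa BY y1a* y2a y3a y4a;   ! * frees the y1a loading (default: fixed at 1)
    Fa@1;                     ! @ fixes the variance of Fa at 1 to set its scale
    Fb BY y1b y2b*0.8 y3b;    ! y2b loading is free with start value 0.8
    Fa WITH Fb;               ! covariance between the two factors
```

Naming a variable by itself, as in Fa@1, refers to its variance (or residual variance for a dependent variable).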

The SAVEDATA command:
Allows you to save data or output from an analysis in ASCII files for later use. You can output the following: individual level analysis data, sample correlation and covariance matrix, analysis results, tech3, and tech4. The savedata command is optional.

   SAVEDATA: FILE IS analysis.dat;
             SAMPLE IS sample.dat;
             RESULTS ARE results.dat;

The example above outputs three ASCII files. The first file contains the individual level analysis data, the second file contains the sample covariance matrix, and the third file contains analysis results.

The DIAGRAMMER:
You can draw your model and have an Mplus program generated from it, OR have Mplus produce a diagram of your model that you can edit and save AFTER running the model. To get the diagram, click on the Diagram drop-down menu, and then click on View diagram. The diagram is automatically saved, but you can save it under another name if you choose. Let me show you how to use it.

EXAMPLE FOR PRACTICE: Clinton Thermometer Regression

   TITLE:    Clinton Thermometer Regression
   DATA:     FILE IS clinton data.dat;
             FORMAT IS free;
             TYPE IS individual;
   VARIABLE: NAMES ARE clinton dem ind follow liberal moderate badoff attend
                       age married educ faminc male white;
             USEVARIABLES ARE clinton age married educ faminc male white;
             MISSING ARE .;
   !Comment out the define command because the operations
   !don't make sense, just examples
   !DEFINE:
   !         lginc = log(faminc);
   !         newvar = follow + attend;
   ANALYSIS: TYPE IS general;
             ESTIMATOR = ML;
   MODEL:    clinton ON age married educ faminc male white;
   OUTPUT:   sampstat cinterval standardized;
   SAVEDATA: FILE IS analysis.dat;
             SAMPLE IS sample.dat;
             RESULTS ARE results.dat;
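As a variation on the practice program, the sketch below shows what would change if the DEFINE command were used, following the rule that defined variables are listed last in USEVARIABLES. It is a syntax illustration only. As the comment in the practice program says, the transformation itself is not meaningful for these data.

```mplus
TITLE:    Clinton regression with a defined variable (syntax sketch)
DATA:     FILE IS clinton data.dat;
VARIABLE: NAMES ARE clinton dem ind follow liberal moderate badoff attend
                    age married educ faminc male white;
          ! lginc is created in DEFINE, so it comes last in the list
          USEVARIABLES ARE clinton age married educ male white lginc;
          MISSING ARE .;
DEFINE:   lginc = log(faminc);
MODEL:    clinton ON age married educ male white lginc;
```

Here faminc stays in NAMES because it is in the data file, but the model uses lginc in its place.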

Data description for example data and program files:

Data file is clinton data.dat
Full regression program is clinton regress.inp
Data come from the 1998 National Election Study. There are 1281 observations.

Variables
1. clinton: Clinton feeling thermometer (0 to 100, not favorable to favorable)
2. democrat: party identification (1=Democrat, 0=not Democrat)
3. independent: party identification (1=Independent, 0=not Independent)
4. follow: follow government and public affairs (1="hardly at all" to 4="most of the time")
5. liberal: liberal/conservative self placement (1=liberal, 0=not liberal)
6. moderate: liberal/conservative self placement (1=moderate, 0=not moderate)
7. badoff: how much better or worse off than last year (1="much better" to 5="much worse")
8. attend: how often attend religious service (1="every week" to 5="never")
9. age: age in years
10. married: marital status (1=married, 0=not married)
11. educ: school completed (1="8th grade or less" to 7="advanced degree")
12. faminc: family income (1="less than $2,999" to 24="$105,000 or more")
13. male: sex (1=male, 0=female)
14. white: race (1=white, 0=other race/ethnicity)

Fall