Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents


Mplus Short Courses
Topic 2

Regression Analysis, Exploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes

Linda K. Muthén
Bengt Muthén

Copyright 2008 Muthén & Muthén
www.statmodel.com

Table Of Contents

General Latent Variable Modeling Framework
Analysis With Categorical Observed And Latent Variables
Categorical Observed Variables
Logit And Probit Regression
British Coal Miner Example
Logistic Regression And Adjusted Odds Ratios
Latent Response Variable Formulation Versus Probability Curve Formulation
Ordered Polytomous Regression
Alcohol Consumption Example
Unordered Polytomous Regression
Censored Regression
Count Regression
Poisson Regression
Negative Binomial Regression
Path Analysis With Categorical Outcomes
Occupational Destination Example

Table Of Contents (Continued)

Categorical Observed And Continuous Latent Variables
Item Response Theory
Exploratory Factor Analysis
Practical Issues
CFA With Covariates
Antisocial Behavior Example
Multiple Group Analysis With Categorical Outcomes
Technical Issues For Weighted Least Squares Estimation
References

Mplus Background

Inefficient dissemination of statistical methods:
    Many good methods contributions from biostatistics, psychometrics, etc. are underutilized in practice
Fragmented presentation of methods:
    Technical descriptions in many different journals
    Many different pieces of limited software
Mplus: Integration of methods in one framework
    Easy to use: Simple, non-technical language, graphics
    Powerful: General modeling capabilities
Mplus versions
    V1: November 1998
    V2: February 2001
    V3: March 2004
    V4: February 2006
    V5: November 2007
Mplus team: Linda & Bengt Muthén, Thuy Nguyen, Tihomir Asparouhov, Michelle Conn, Jean Maninger

Statistical Analysis With Latent Variables
A General Modeling Framework

Statistical Concepts Captured By Latent Variables

Continuous Latent Variables
    Measurement errors
    Factors
    Random effects
    Frailties, liabilities
    Variance components
    Missing data
Categorical Latent Variables
    Latent classes
    Clusters
    Finite mixtures
    Missing data

Statistical Analysis With Latent Variables
A General Modeling Framework (Continued)

Models That Use Latent Variables

Continuous Latent Variables
    Factor analysis models
    Structural equation models
    Growth curve models
    Multilevel models
Categorical Latent Variables
    Latent class models
    Mixture models
    Discrete-time survival models
    Missing data models

Mplus integrates the statistical concepts captured by latent variables into a general modeling framework that includes not only all of the models listed above but also combinations and extensions of these models.

General Latent Variable Modeling Framework

Observed variables
    x   background variables (no model structure)
    y   continuous and censored outcome variables
    u   categorical (dichotomous, ordinal, nominal) and count outcome variables
Latent variables
    f   continuous variables
        interactions among f's
    c   categorical variables
        multiple c's

Mplus

Several programs in one
    Exploratory factor analysis
    Structural equation modeling
    Item response theory analysis
    Latent class analysis
    Latent transition analysis
    Survival analysis
    Growth modeling
    Multilevel analysis
    Complex survey data analysis
    Monte Carlo simulation

Fully integrated in the general latent variable framework

Overview Of Mplus Courses

Topic 1. March 2008, Johns Hopkins University: Introductory - advanced factor analysis and structural equation modeling with continuous outcomes
Topic 2. March 2008, Johns Hopkins University: Introductory - advanced regression analysis, IRT, factor analysis and structural equation modeling with categorical, censored, and count outcomes
Topic 3. August 2008, Johns Hopkins University: Introductory and intermediate growth modeling
Topic 4. August 2008, Johns Hopkins University: Advanced growth modeling, survival analysis, and missing data analysis

Overview Of Mplus Courses (Continued)

Topic 5. November 2008, University of Michigan, Ann Arbor: Categorical latent variable modeling with cross-sectional data
Topic 6. November 2008, University of Michigan, Ann Arbor: Categorical latent variable modeling with longitudinal data
Topic 7. March 2009, Johns Hopkins University: Multilevel modeling of cross-sectional data
Topic 8. March 2009, Johns Hopkins University: Multilevel modeling of longitudinal data

Analysis With Categorical Observed And Latent Variables

Categorical Variable Modeling

    Categorical observed variables
    Categorical observed variables, continuous latent variables
    Categorical observed variables, categorical latent variables

Categorical Observed Variables

Two Examples

Alcohol Dependence And Gender In The NLSY

                            Female     Male    Total
n                            4,573    4,603    9,176
Not Dep                      4,317    3,904    8,221
Dep                            256      699      955
Prop                          .056     .152
Odds (Prop/(1 - Prop))        .059     .179

Odds Ratio = .179/.059 = 3.019

Example wording: Males are three times more likely than females to be alcohol dependent.

Colds And Vitamin C

               n    No Cold    Cold    Prop    Odds
Placebo      140        109      31    .221    .284
Vitamin C    139        122      17    .122    .139
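The odds and odds ratios in the two examples can be recomputed from the cell counts. The following is a minimal Python sketch, assuming the counts read from the two tables (256 of 4,573 females and 699 of 4,603 males alcohol dependent; 31 of 140 placebo and 17 of 139 vitamin C subjects with a cold). The function name `odds_ratio` is illustrative, not part of the course material.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Return (prop_a, prop_b, odds_a, odds_b, odds ratio of group a relative to group b)."""
    prop_a, prop_b = events_a / n_a, events_b / n_b
    odds_a = prop_a / (1 - prop_a)
    odds_b = prop_b / (1 - prop_b)
    return prop_a, prop_b, odds_a, odds_b, odds_a / odds_b

# Alcohol dependence: males relative to females
*_, or_alcohol = odds_ratio(699, 4603, 256, 4573)

# Colds: placebo relative to vitamin C
*_, or_colds = odds_ratio(31, 140, 17, 139)

print(round(or_alcohol, 3), round(or_colds, 3))
```

Working from unrounded proportions reproduces the reported 3.019 for the alcohol example; dividing the rounded odds .179/.059 gives a slightly different third decimal.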

Categorical Outcomes: Probability Concepts

Probabilities:
    Joint: P(u, x)
    Marginal: P(u)
    Conditional: P(u | x)

Alcohol Example

                    Joint            Conditional
              Not Dep     Dep           (Dep)
Female          .47       .03            .06
Male            .43       .08            .15
Marginal        .90       .10

Distributions:
    Bernoulli: u = 0/1; E(u) = π
    Binomial: sum or prop. (u = 1), E(prop.) = π, V(prop.) = π(1 - π)/n, π̂ = prop.
    Multinomial (#parameters = #cells - 1)
    Independent multinomial (product multinomial)
    Poisson

Categorical Outcomes: Probability Concepts (Continued)

With π_jk = P(u = j, x = k) denoting the cell probabilities of the 2 x 2 table of u (0/1) by x (0/1), the cross-product ratio (odds ratio) is

$$\frac{\pi_{00}\,\pi_{11}}{\pi_{01}\,\pi_{10}} = \frac{\pi_{11}/\pi_{01}}{\pi_{10}/\pi_{00}} = \frac{P(u = 1, x = 1)\,/\,P(u = 0, x = 1)}{P(u = 1, x = 0)\,/\,P(u = 0, x = 0)}$$

Tests:
    Log odds ratio (approx. normal)
    Test of proportions (approx. normal)
    Pearson $\chi^2 = \sum (O - E)^2 / E$ (e.g. independence)
    Likelihood Ratio $\chi^2 = 2 \sum O \log(O / E)$
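The two chi-square statistics for a test of independence can be sketched in a few lines of Python. This is an illustrative sketch only, assuming the NLSY alcohol counts (not dependent / dependent: 4,317 / 256 for females and 3,904 / 699 for males); the function name `chi_square_tests` is hypothetical.

```python
import math

def chi_square_tests(table):
    """Pearson and likelihood-ratio chi-square for independence in an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    pearson = lr = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # E under independence
            pearson += (observed - expected) ** 2 / expected
            lr += 2.0 * observed * math.log(observed / expected)
    return pearson, lr

# Rows: female, male. Columns: not dependent, dependent.
pearson, lr = chi_square_tests([[4317, 256], [3904, 699]])
print(round(pearson, 1), round(lr, 1))
```

Both statistics have 1 degree of freedom for a 2 x 2 table, and here both are far above the conventional 5% critical value of 3.84.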

Further Readings On Categorical Variable Analysis

Agresti, A. (2002). Categorical data analysis. Second edition. New York: John Wiley & Sons.
Agresti, A. (1996). An introduction to categorical data analysis. New York: Wiley.
Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. Second edition. New York: John Wiley & Sons.
Long, S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks: Sage.

Logit And Probit Regression

    Dichotomous outcome
    Adjusted log odds
    Ordered, polytomous outcome
    Unordered, polytomous outcome
    Multivariate categorical outcomes

Logs

[Figure: three panels titled Logarithmic Function, Logistic Distribution Function, and Logistic Density. Recoverable axis labels: P(u = 1 | x), Logit, Density, u*.]

Binary Outcome: Logistic Regression

The logistic function

$$P(u = 1 \mid x) = F(\beta_0 + \beta_1 x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

[Figure: two panels, Logistic distribution function and Logistic density, plotted against the logistic score β0 + β1 x.]

Logistic density: $\partial F / \partial z = F(1 - F) = f(z; 0, \pi^2/3)$
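The relation between the logistic distribution function and its density, dF/dz = F(1 - F), can be checked numerically. The sketch below is illustrative only; the grid of z values and the finite-difference step `h` are arbitrary assumptions.

```python
import math

def logistic_cdf(z):
    """Logistic distribution function F(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    """Logistic density written as F(z) * (1 - F(z))."""
    f = logistic_cdf(z)
    return f * (1.0 - f)

h = 1e-5  # step for the central finite difference (illustrative choice)
for z in (-3.0, -1.0, 0.0, 0.5, 2.0):
    numeric_slope = (logistic_cdf(z + h) - logistic_cdf(z - h)) / (2 * h)
    print(z, round(numeric_slope, 6), round(logistic_pdf(z), 6))
```

The two printed columns should agree to the displayed precision, and the density peaks at z = 0, where F = .5.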

Binary Outcome: Probit Regression

Probit regression considers

$$P(u = 1 \mid x) = \Phi(\beta_0 + \beta_1 x), \quad (60)$$

where Φ is the standard normal distribution function. Using the inverse normal function $\Phi^{-1}$ gives a linear probit equation

$$\Phi^{-1}[P(u = 1 \mid x)] = \beta_0 + \beta_1 x. \quad (61)$$

[Figure: two panels, Normal distribution function and Normal density, plotted against the z score β0 + β1 x.]

Interpreting Logit And Probit Coefficients

    Sign and significance
    Odds and odds ratios
    Probabilities

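Logistic Regression And Log Odds

Odds (u = 1 | x) = P(u = 1 | x) / P(u = 0 | x) = P(u = 1 | x) / (1 - P(u = 1 | x)).

The logistic function

$$P(u = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

gives a log odds linear in x,

$$\text{logit} = \log[\text{odds}(u = 1 \mid x)] = \log\left[\frac{P(u = 1 \mid x)}{1 - P(u = 1 \mid x)}\right]$$

$$= \log\left[\frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \Big/ \left(1 - \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}\right)\right]$$

$$= \log\left[\frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \cdot \frac{1 + e^{-(\beta_0 + \beta_1 x)}}{e^{-(\beta_0 + \beta_1 x)}}\right]$$

$$= \log\left[e^{(\beta_0 + \beta_1 x)}\right] = \beta_0 + \beta_1 x$$

Logistic Regression And Log Odds (Continued)

logit = log odds = β0 + β1 x

When x changes one unit, the logit (log odds) changes β1 units.
When x changes one unit, the odds are multiplied by $e^{\beta_1}$.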
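Both statements about a one-unit change in x can be verified numerically. The sketch below uses hypothetical coefficients (b0 = -2.0, b1 = 0.5) chosen only for illustration; they do not come from any of the examples.

```python
import math

def prob(x, b0, b1):
    """P(u = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

def odds(x, b0, b1):
    p = prob(x, b0, b1)
    return p / (1.0 - p)

b0, b1 = -2.0, 0.5  # hypothetical coefficients, for illustration only

# The logit of the model probability recovers the linear predictor b0 + b1*x.
for x in (0.0, 1.0, 2.0):
    print(x, round(logit(prob(x, b0, b1)), 6), round(b0 + b1 * x, 6))

# A one-unit change in x multiplies the odds by exp(b1), whatever the starting x.
odds_multiplier = odds(1.0, b0, b1) / odds(0.0, b0, b1)
print(round(odds_multiplier, 6), round(math.exp(b1), 6))
```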

British Coal Miner Data

Have you experienced breathlessness?

[Figure: proportion answering yes plotted against age.]

Plot Of Sample Logits

[Figure: sample logit plotted against age.]

Sample logit = log [proportion / (1 - proportion)]

British Coal Miner Data (Continued)

                                        OLS          Logit        Probit
Age (x)        N     N Yes  Proportion  Estimated    Estimated    Estimated
                               Yes      Probability  Probability  Probability
22         1,952        16     .008       -.053        .013         .009
27         1,791        32     .018       -.004        .022         .018
32         2,113        73     .035        .045        .036         .034
37         2,783       169     .061        .094        .059         .06
42         2,274       223     .098        .143        .095         .10
47         2,393       357     .149        .192        .148         .156
52         2,090       521     .249        .241        .225         .23
57         1,750       558     .319        .290        .327         .322
62         1,136       478     .421        .339        .448         .425
Total     18,282     2,427     .133

SOURCE: Ashford & Sowden (1970), Muthén (1993)

Logit model: $\chi^2_{LRT}(7) = 17.13$ (p > .01)
Probit model: $\chi^2_{LRT}(7) = 5.19$

Coal Miner Data

 x    u       w
22    0    1936
22    1      16
27    0    1759
27    1      32
32    0    2040
32    1      73
37    0    2614
37    1     169
42    0    2051
42    1     223
47    0    2036
47    1     357
52    0    1569
52    1     521
57    0    1192
57    1     558
62    0     658
62    1     478

Mplus Input For Categorical Outcomes

Specifying dependent variables as categorical: use the CATEGORICAL option

    CATEGORICAL ARE u1 u2 u3;

Thresholds used instead of intercepts: only different in sign

Referring to thresholds in the model: use $ number added to a variable name; the number of thresholds is equal to the number of categories minus 1

    u1$1 refers to threshold 1 of u1
    u1$2 refers to threshold 2 of u1

Mplus Input For Categorical Outcomes (Continued)

    u2$1 refers to threshold 1 of u2
    u2$2 refers to threshold 2 of u2
    u2$3 refers to threshold 3 of u2
    u3$1 refers to threshold 1 of u3

Referring to scale factors: use { } to refer to scale factors

    {u1@1 u2 u3};

Input For Logistic Regression Of Coal Miner Data

TITLE:      Logistic regression of coal miner data
DATA:       FILE = coalminer.dat;
VARIABLE:   NAMES = x u w;
            CATEGORICAL = u;
            FREQWEIGHT = w;
DEFINE:     x = x/10;
ANALYSIS:   ESTIMATOR = ML;
MODEL:      u ON x;
OUTPUT:     TECH1 SAMPSTAT STANDARDIZED;

Input For Probit Regression Of Coal Miner Data

TITLE:      Probit regression of coal miner data
DATA:       FILE = coalminer.dat;
VARIABLE:   NAMES = x u w;
            CATEGORICAL = u;
            FREQWEIGHT = w;
DEFINE:     x = x/10;
MODEL:      u ON x;
OUTPUT:     TECH1 SAMPSTAT STANDARDIZED;
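For readers without Mplus, the maximum-likelihood logistic fit requested by the first input can be approximated with a short Newton-Raphson routine. This is a sketch under stated assumptions, not the Mplus algorithm: the grouped counts are those read from the British Coal Miner Data table, x is age divided by 10 as in the DEFINE command, and the function name `fit_logistic` and the fixed iteration count are illustrative choices.

```python
import math

# (age, N, N yes) as read from the British Coal Miner Data table
DATA = [(22, 1952, 16), (27, 1791, 32), (32, 2113, 73), (37, 2783, 169),
        (42, 2274, 223), (47, 2393, 357), (52, 2090, 521), (57, 1750, 558),
        (62, 1136, 478)]

def fit_logistic(data, iterations=25):
    """Newton-Raphson ML fit of P(u = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))), x = age/10."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for age, n, yes in data:
            x = age / 10
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = yes - n * p          # contribution to the score
            w = n * p * (1.0 - p)        # contribution to the information
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2 x 2 Newton system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0_hat, b1_hat = fit_logistic(DATA)
print(round(b0_hat, 3), round(b1_hat, 3))
```

Mplus reports a threshold rather than an intercept, so the fitted intercept here corresponds to the negative of the U$1 threshold in the Mplus output.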

Output Excerpts Logistic Regression Of Coal Miner Data

Model Results
                  Estimates     S.E.   Est./S.E.      Std    StdYX
U ON
    X                 1.025     .025      41.758    1.025     .556
Thresholds
    U$1               6.564     .124      52.873

Odds: $e^{1.025} = 2.79$

As x increases 1 unit (10 years), the odds of breathlessness increase by a factor of 2.79.

Estimated Logistic Regression Probabilities For Coal Miner Data

$$P(u = 1 \mid x) = \frac{1}{1 + e^{-L}}, \quad \text{where } L = -6.564 + 1.025\,x$$

For x = 6.2 (age 62):

$$L = -6.564 + 1.025 \times 6.2 = -0.209$$

$$P(u = 1 \mid \text{age } 62) = \frac{1}{1 + e^{0.209}} = .448$$
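The same calculation can be repeated for every age group. The sketch below assumes the estimates read from the output excerpt (threshold 6.564, slope 1.025) and x = age / 10; the function name `logistic_prob` is illustrative.

```python
import math

THRESHOLD = 6.564  # U$1 threshold; the intercept is its negative
SLOPE = 1.025      # coefficient of x, where x = age / 10

def logistic_prob(age):
    """Estimated P(u = 1 | age) from the reported logistic regression estimates."""
    linear = -THRESHOLD + SLOPE * (age / 10)
    return 1.0 / (1.0 + math.exp(-linear))

for age in range(22, 63, 5):
    print(age, round(logistic_prob(age), 3))

odds_multiplier = math.exp(SLOPE)  # change in odds per 10 years of age
```

The printed values can be compared with the Logit Estimated Probability column of the coal miner table.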

Output Excerpts Probit Regression Of Coal Miner Data

Model Results
                  Estimates     S.E.   Est./S.E.      Std    StdYX
U ON
    X                  .548     .013      43         .548     .545
Thresholds
    U$1               3.581     .062      57.866    3.581    3.581

R-Square
    Observed Variable    Residual Variance    R-Square
    U                         1.000             .297

Estimated Probit Regression Probabilities For Coal Miner Data

$$P(u = 1 \mid \text{age } 62) = \Phi(\beta_0 + \beta_1 x) = \Phi(-\tau + \beta_1 x)$$

$$\Phi(-3.581 + .548 \times 6.2) = \Phi(-.1834) \approx .427$$

Note: logit ≈ probit × c, where $c = \sqrt{\pi^2 / 3} = 1.81$
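The probit probability and the logit-probit scaling constant can be reproduced with the standard library alone. This is a minimal sketch assuming the estimates read from the probit output (threshold 3.581, slope .548) and x = 6.2 for age 62; `normal_cdf` is a helper defined here via `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

TAU = 3.581    # U$1 threshold from the probit output
BETA1 = 0.548  # probit slope for x = age / 10

def probit_prob(x):
    """Estimated P(u = 1 | x) = Phi(-tau + beta1 * x)."""
    return normal_cdf(-TAU + BETA1 * x)

p62 = probit_prob(6.2)

# Scaling constant relating logit and probit coefficients: sqrt(pi^2 / 3)
c = math.sqrt(math.pi ** 2 / 3)
rescaled_slope = BETA1 * c  # probit slope expressed on the approximate logit scale

print(round(p62, 3), round(c, 2), round(rescaled_slope, 3))
```

The rescaled slope is close to, but not identical with, the logistic slope of 1.025, which is why the relation is stated as an approximation.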

Categorical Outcomes: Logit And Probit Regression With One Binary And One Continuous X

$$P(u = 1 \mid x_1, x_2) = F[\beta_0 + \beta_1 x_1 + \beta_2 x_2], \quad (22)$$

$$P(u = 0 \mid x_1, x_2) = 1 - P[u = 1 \mid x_1, x_2],$$

where F[z] is either the standard normal ($\Phi[z]$) or logistic ($1/[1 + e^{-z}]$) distribution function.

Example: Lung cancer and smoking among coal miners
    u    lung cancer (u = 1) or not (u = 0)
    x1   smoker (x1 = 1), non-smoker (x1 = 0)
    x2   years spent in coal mine

Categorical Outcomes: Logit And Probit Regression With One Binary And One Continuous X

$$P(u = 1 \mid x_1, x_2) = F[\beta_0 + \beta_1 x_1 + \beta_2 x_2], \quad (22)$$

[Figure: P(u = 1 | x1, x2) and Probit / Logit plotted against x2, with curves for x1 = 1 and x1 = 0; the level .5 is marked.]

Logistic Regression And Adjusted Odds Ratios

Binary u variable regression on a binary x1 variable and a continuous x2 variable:

$$P(u = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}, \quad (62)$$

which implies

$$\text{log odds} = \text{logit}[P(u = 1 \mid x_1, x_2)] = \beta_0 + \beta_1 x_1 + \beta_2 x_2. \quad (63)$$

This gives

$$\text{log odds}_{\{x_1 = 0\}} = \text{logit}[P(u = 1 \mid x_1 = 0, x_2)] = \beta_0 + \beta_2 x_2, \quad (64)$$

and

$$\text{log odds}_{\{x_1 = 1\}} = \text{logit}[P(u = 1 \mid x_1 = 1, x_2)] = \beta_0 + \beta_1 + \beta_2 x_2. \quad (65)$$

Logistic Regression And Adjusted Odds Ratios (Continued)

The log odds ratio for u and x1 adjusted for x2 is

$$\log OR = \log\left[\frac{\text{odds}_1}{\text{odds}_0}\right] = \log \text{odds}_1 - \log \text{odds}_0 = \beta_1 \quad (66)$$

so that OR = exp(β1), constant for all values of x2. If an interaction term for x1 and x2 is introduced, the constancy of the OR no longer holds.

Example wording:
    The odds of lung cancer adjusted for years is OR times higher for smokers than for nonsmokers
    The odds ratio adjusted for years is OR
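The constancy of the adjusted odds ratio, and its loss once an interaction is added, can be illustrated numerically. The coefficients below are hypothetical values chosen for the sketch, not estimates from any data set in these notes.

```python
import math

def prob(x1, x2, b0, b1, b2, b12=0.0):
    """P(u = 1 | x1, x2) under a logistic model, with an optional x1*x2 interaction."""
    linear = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return 1.0 / (1.0 + math.exp(-linear))

def odds(p):
    return p / (1.0 - p)

b0, b1, b2 = -4.0, 0.9, 0.05  # hypothetical coefficients, for illustration only
years = (0, 10, 20, 30)

# Without interaction: the odds ratio for x1 equals exp(b1) at every x2.
ratios = [odds(prob(1, x2, b0, b1, b2)) / odds(prob(0, x2, b0, b1, b2)) for x2 in years]

# With a hypothetical interaction term the odds ratio varies with x2.
ratios_interaction = [
    odds(prob(1, x2, b0, b1, b2, b12=0.02)) / odds(prob(0, x2, b0, b1, b2, b12=0.02))
    for x2 in years
]

print([round(r, 4) for r in ratios], round(math.exp(b1), 4))
print([round(r, 4) for r in ratios_interaction])
```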

Analysis Of NLSY Data: Odds Ratios For Alcohol Dependence And Gender

Adjusting for Age First Started Drinking (n = 9,176)

Observed Frequencies, Proportions, and Odds Ratios

                 Frequency         Proportion Dependent
Age 1st       Female     Male       Female     Male        OR
12 or <           85      223        .071      .233       3.98
13               105      180        .133      .256       2.24
14               198      308        .086      .253       3.60
15               331      534        .106      .185       1.91
16               800      99?        .079      .152       2.09
17               725      777        .070      .170       2.72
18 or >        2,329    1,59?        .030      .089       3.16

Analysis Of NLSY Data: Odds Ratios For Alcohol Dependence And Gender (Continued)

Estimated Probabilities and Odds Ratios

                      Logit                        Probit
Age 1st       Female    Male     OR        Female    Male     OR
12 or <        .141     .304    2.66        .152     .298    2.37
13             .117     .26     2.66        .125     .257    2.42
14             .096     .22     2.66        .102     .22     2.48
15             .078     .185    2.66        .082     .186    2.55
16             .064     .154    2.66        .065     .155    2.63
17             .052     .127    2.66        .051     .128    2.72
18 or >        .042     .105    2.66        .040     .104    2.82

Logit model: $\chi^2_p(12) = 54.2$
Probit model: $\chi^2_p(12) = 46.8$

Analysis Of NLSY Data: Odds Ratios For Alcohol Dependence And Gender (Continued)

Dependence on Gender and Age First Started Drinking

                    Logit Regression                   Probit Regression            Unstd. Coeff.
              Unstd.                             Unstd.                             Rescaled
              Coeff.    s.e.      t     Std.     Coeff.    s.e.      t     Std.     To Logit
Intercept       .84     .32      2.6              -.42     .18     -2.4
Male            .98     .08     12.7     .5        .50     .04     13       .48        .9
Age 1st        -.22     .02    -1?.6    -.19      -.12     .01    -1?.?    -.19       -.22
R²              .10                                .08

$OR = e^{.98} = 2.66$

logit ≈ probit × c, where $c = \sqrt{\pi^2 / 3} = 1.81$

NELS 88

Table 2.2 Odds ratios of eighth-grade students in 1988 performing below basic levels of reading and mathematics in 1988 and dropping out of school, 1988 to 1990, by basic demographics

                                   Below basic    Below basic    Dropped
Variable                           mathematics    reading        out
Sex
    Female vs. male                   .8*            .73**         .92
Race-ethnicity
    Asian vs. white                   .82            .42**         .59
    Hispanic vs. white               2.9**          2.29**        2.**
    Black vs. white                  2.23**         2.64**        2.23**
    Native American vs. white        2.43**         3.5**         2.5**
Socioeconomic status
    Low vs. middle                    .9**           .9**         3.95**
    High vs. middle                   .46**          .4**          .39*

SOURCE: U.S. Department of Education, National Center for Education Statistics, National Education Longitudinal Study of 1988 (NELS:88), Base Year and First Follow-Up surveys.

NELS 88

Table 2.3 Adjusted odds ratios of eighth-grade students in 1988 performing below basic levels of reading and mathematics in 1988 and dropping out of school, 1988 to 1990, by basic demographics

                                   Below basic    Below basic    Dropped
Variable                           mathematics    reading        out
Sex
    Female vs. male                   .77**          .7**          .86
Race-ethnicity
    Asian vs. white                   .84            .46**         .6
    Hispanic vs. white                .6**           .74**         .2
    Black vs. white                   .77**         2.9**          .45
    Native American vs. white        2.2**          2.87**         .64
Socioeconomic status
    Low vs. middle                    .68**          .66**        3.74**
    High vs. middle                   .49**          .44**         .4*

Latent Response Variable Formulation Versus Probability Curve Formulation

Probability curve formulation in the binary u case:

$$P(u = 1 \mid x) = F(\beta_0 + \beta_1 x), \quad (67)$$

where F is the standard normal or logistic distribution function.

Latent response variable formulation defines a threshold τ on a continuous u* variable so that u = 1 is observed when u* exceeds τ while otherwise u = 0 is observed,

$$u^* = \gamma x + \delta, \quad (68)$$

where δ ~ N(0, V(δ)).

[Figure: distribution of u* with the threshold τ marked; u = 0 below τ and u = 1 above τ.]
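The link between the two formulations can be illustrated by simulation. The sketch below assumes V(δ) = 1 (the probit standardization), so that the latent response formulation implies P(u = 1 | x) = Φ(-τ + γx), that is β0 = -τ and β1 = γ. The values τ = 3.581 and γ = .548 are borrowed from the coal miner probit output purely as illustrative numbers, with x = 6.2; the seed and number of draws are arbitrary.

```python
import math
import random

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

tau, gamma, x = 3.581, 0.548, 6.2  # illustrative values; V(delta) = 1 is assumed

random.seed(0)      # arbitrary seed so the run is repeatable
draws = 100_000     # arbitrary number of simulated latent responses

# Latent response formulation: u = 1 is observed when u* = gamma*x + delta exceeds tau.
exceed = sum(1 for _ in range(draws) if gamma * x + random.gauss(0.0, 1.0) > tau)
simulated = exceed / draws

# Probability curve formulation with beta0 = -tau and beta1 = gamma.
analytic = normal_cdf(-tau + gamma * x)

print(round(simulated, 3), round(analytic, 3))
```

With a logistic rather than normal δ the same argument yields the logistic probability curve.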