Latent Class Analysis: Evaluating Model Fit

Latent Class Analysis: Evaluating Model Fit
Lecture 13, March 30, 2006
Clustering and Classification
Lecture #13 - 3/30/2006, Slide 1 of 22

Today's Lecture

Evaluating the fit of an LCA.

Model-based measures of fit:
- The log-likelihood of a model.
- Chi-square tests.

Model comparison measures:
- Akaike Information Criterion (AIC).
- Bayesian Information Criterion (BIC, a.k.a. Schwarz's Criterion).

Distributional comparisons:
- Predicted item means.
- Predicted item covariances/correlations.

Entropy (a measure of classification certainty).

Upcoming Schedule

Date       Topic
3/30       Other Topics in Latent Class Analysis
4/4        Latent Profile Analysis
4/6, 4/11  No Class
4/13       Model Estimation: EM Algorithm
4/18       Model Estimation: MCMC Algorithm
4/20, 4/25 General Finite Mixture Models
4/27       Growth Mixture Models
5/2, 5/4   Models for Cognitive Diagnosis
5/9, 5/11  More Cognitive Diagnosis (or spill-over time)

Note: Projects due 5/16.

Model Chi-squared Test

Recall the standard latent class model. Using some notation of Bartholomew and Knott, a latent class model for the response vector of p variables (i = 1, ..., p) with K classes (j = 1, ..., K) is:

f(\mathbf{x}) = \sum_{j=1}^{K} \eta_j \prod_{i=1}^{p} \pi_{ij}^{x_i} (1 - \pi_{ij})^{1 - x_i}

Model-based measures of fit revolve around the model function listed above. With just this function, we can compute the probability of any given response pattern.

Model Chi-squared Test

The χ² test compares the set of response patterns that was observed with the set of response patterns expected under the model. To form the χ² test, one must first compute the probability of each response pattern using the latent class model equation displayed on the last slide. The hypothesis tested is that the observed frequencies are equal to the expected frequencies. If the test has a low p-value, the model is said not to fit. To demonstrate the model χ² test, let's consider the results of the latent class model fit to the data from our running example (from Macready and Dayton, 1977).

Chi-squared Test Example

Class probabilities:

Class  Probability
1      0.586523874
2      0.413476126

Item parameters:

Class 1:
item  prob     ase(prob)
1     0.75345  0.05125
2     0.78029  0.05108
3     0.43161  0.05625
4     0.70757  0.05445

Class 2:
item  prob     ase(prob)
1     0.20861  0.06047
2     0.06834  0.04848
3     0.01793  0.02949
4     0.05228  0.04376

Chi-squared Test Example

To begin, compute the probability of observing the pattern [1111]... Then, to find the expected frequency, multiply that probability by the number of observations in the sample. Repeat that process for all cells... Then compute

\chi^2_P = \sum_r \frac{(O_r - E_r)^2}{E_r}

where r represents each response pattern. The degrees of freedom are equal to the number of response patterns minus the number of model parameters minus one. Then find the p-value and decide whether the model fits.
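These steps translate directly into code. A minimal Python sketch (the parameter values come from the example slide; the function names and the dictionary-of-counts interface are my own choices, not from the lecture):

```python
from itertools import product

# Parameter estimates from the Macready & Dayton example slide.
eta = [0.586523874, 0.413476126]           # class probabilities (eta_j)
pi = [                                      # pi[j][i] = P(item i = 1 | class j)
    [0.75345, 0.78029, 0.43161, 0.70757],
    [0.20861, 0.06834, 0.01793, 0.05228],
]

def pattern_probability(x, eta, pi):
    """P(x) = sum_j eta_j * prod_i pi_ij^x_i * (1 - pi_ij)^(1 - x_i)."""
    total = 0.0
    for eta_j, pi_j in zip(eta, pi):
        prob = eta_j
        for x_i, p in zip(x, pi_j):
            prob *= p if x_i == 1 else 1.0 - p
        total += prob
    return total

def pearson_chi_square(observed_counts, n, eta, pi):
    """Chi-square over all 2^p response patterns; observed_counts maps a
    pattern tuple such as (1, 1, 1, 1) to its observed frequency."""
    n_items = len(pi[0])
    chi2 = 0.0
    for x in product([0, 1], repeat=n_items):
        expected = n * pattern_probability(x, eta, pi)
        observed = observed_counts.get(x, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2
```

When observed counts exactly equal expected counts the statistic is zero; otherwise compare it to a χ² distribution with (number of patterns - q - 1) degrees of freedom.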

Likelihood Ratio Chi-squared

The likelihood ratio Chi-square is a variant of the Pearson Chi-squared test, but still uses the observed and expected frequencies for each cell. The formula for this test is:

G = 2 \sum_r O_r \ln\left(\frac{O_r}{E_r}\right)

The degrees of freedom are the same as for the Pearson Chi-squared test.
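A short Python sketch of the G statistic (the function name is my own; cells with a zero observed count are treated as contributing nothing, a common convention):

```python
import math

def likelihood_ratio_g(observed, expected):
    """G = 2 * sum_r O_r * ln(O_r / E_r); cells with O_r = 0 contribute 0."""
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total
```

As with the Pearson statistic, a perfect match between observed and expected frequencies yields G = 0.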

Chi-square Problems

The Chi-square test is reasonable for situations where the sample size is large and the number of variables is small. If there are too many cells where the observed frequency is small (or zero), the test is not valid. Note that the total number of response patterns in an LCA is 2^I, where I is the total number of variables. For our example, we had four variables, so there were 16 possible response patterns. If we had 20 variables, there would be a total of 1,048,576. Think about the number of observations you would have to have if you were to observe at least one person with each response pattern. Now think about what happens if the items are highly associated (you would need even more people).

So, if model-based Chi-squared tests are valid only for a limited set of analyses, what else can be done? One thing is to look at comparative measures of model fit. Such measures allow the user to compare the fit of one solution (say, two classes) to the fit of another (say, three classes). Note that such measures are only valid as a means of relative model fit - what do these measures become if the model fits perfectly?

Log Likelihood

Prior to discussing anything, let's look at the log-likelihood function, taken across all the observations in our data set. The log likelihood serves as the basis for the AIC and BIC, and is what is maximized by the estimation algorithm. The likelihood function is the model formulation across the joint distribution of the data (all observations):

L(\mathbf{X}) = \prod_{k=1}^{N} \sum_{j=1}^{K} \eta_j \prod_{i=1}^{p} \pi_{ij}^{x_{ki}} (1 - \pi_{ij})^{1 - x_{ki}}

Log Likelihood

The log likelihood function is the log of the model formulation across the joint distribution of the data (all observations):

\log L(\mathbf{X}) = \log \prod_{k=1}^{N} \sum_{j=1}^{K} \eta_j \prod_{i=1}^{p} \pi_{ij}^{x_{ki}} (1 - \pi_{ij})^{1 - x_{ki}} = \sum_{k=1}^{N} \log \sum_{j=1}^{K} \eta_j \prod_{i=1}^{p} \pi_{ij}^{x_{ki}} (1 - \pi_{ij})^{1 - x_{ki}}

Here, the log taken is typically base e - the natural log. The log likelihood is a function of the observed responses for each person and the model parameters.
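The sum-log-sum-product structure of this equation carries over directly to code. A minimal Python sketch (the names are my own; `data` is a list of 0/1 response vectors, one per person):

```python
import math

def lca_log_likelihood(data, eta, pi):
    """log L = sum_k log( sum_j eta_j * prod_i pi_ij^x_ki * (1-pi_ij)^(1-x_ki) )."""
    log_lik = 0.0
    for x in data:                      # x is one person's 0/1 response vector
        person_lik = 0.0
        for eta_j, pi_j in zip(eta, pi):
            prob = eta_j
            for x_i, p in zip(x, pi_j):
                prob *= p if x_i == 1 else 1.0 - p
            person_lik += prob
        log_lik += math.log(person_lik)
    return log_lik
```

Note that the log is taken outside the sum over classes for each person, exactly as in the equation above; the per-person likelihoods cannot be logged term by term.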

Information Criteria

The Akaike Information Criterion (AIC) is a measure of the goodness of fit of a model that considers the number of model parameters (q):

AIC = 2q - 2 \log L

Schwarz's Information Criterion (also called the Bayesian Information Criterion or the Schwarz-Bayesian Information Criterion) is a measure of the goodness of fit of a model that considers both the number of parameters (q) and the number of observations (N):

BIC = q \log(N) - 2 \log L

Other information criteria exist - most are based on the concept of entropy. For another example, look up Shannon.
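Both criteria are one-liners once the log likelihood is in hand. A sketch (assuming the natural log, as elsewhere in the lecture):

```python
import math

def aic(log_lik, q):
    """AIC = 2q - 2 log L, with q the number of free model parameters."""
    return 2 * q - 2 * log_lik

def bic(log_lik, q, n):
    """BIC = q log(N) - 2 log L, with N the number of observations."""
    return q * math.log(n) - 2 * log_lik
```

Whenever N > e² (about 8 observations), log(N) > 2, so BIC penalizes extra parameters more heavily than AIC does.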

Information Criteria

When considering which model fits the data best, the model with the lowest AIC or BIC should be chosen. As you have seen (from our empirical article), not all solutions are preferred. Again, although AIC and BIC are based on good statistical theory, neither is a gold standard for assessing which model should be chosen. Furthermore, neither will tell you, overall, whether your model estimates bear any decent resemblance to your data. You could be choosing between two (equally) poor models - other measures are needed.

Distributional Measures of Model Fit

The model-based Chi-squared, while narrow in the situations where it can be applied, provided a measure of model fit that tried to map what the model said the data looked like onto what the data actually looked like. The same concept lies behind distributional measures of model fit: use the parameters of the model to predict what the data should look like. In this case, measures that are easy to obtain are those that look at:
- Each variable marginally - the mean (or proportion).
- The bivariate distribution of each pair of variables - contingency tables (for categorical variables), correlation matrices, or covariance matrices.

Marginal

For each item, the model-predicted mean of the item (the proportion of people responding with a value of one) is given by:

\hat{\bar{X}}_i = \hat{E}(X_i) = \sum_{x_i = 0}^{1} \hat{P}(X_i = x_i) x_i = \sum_{j=1}^{J} \hat{\eta}_j \hat{\pi}_{ij}

Across all items, you can then form an aggregate measure of model fit by comparing the observed mean of each item to that found under the model, such as the root mean squared error:

RMSE = \sqrt{\frac{\sum_{i=1}^{I} (\hat{\bar{X}}_i - \bar{X}_i)^2}{I}}

Often, there is not much difference between the observed and predicted means (depending on the model, this fit can be perfect by construction).
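A small Python sketch of the predicted item means and the RMSE summary (function names are my own; `pi[j][i]` holds the class-j probability for item i):

```python
import math

def predicted_item_means(eta, pi):
    """Model-implied mean of each item: hat{X}_i = sum_j eta_j * pi_ij."""
    n_items = len(pi[0])
    return [sum(eta_j * pi_j[i] for eta_j, pi_j in zip(eta, pi))
            for i in range(n_items)]

def rmse(predicted, observed):
    """Root mean squared error between predicted and observed item means."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(predicted))
```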

Bivariate

For each pair of items (say a and b), the model-predicted probability of both being one is given in the same way:

\hat{P}(X_a = 1, X_b = 1) = \sum_{j=1}^{J} \hat{\eta}_j \hat{\pi}_{aj} \hat{\pi}_{bj}

Given the marginal means, you can now form a 2 x 2 table of the probability of finding a given pair of responses to variables a and b:

            b = 0                                            b = 1
a = 0   1 - P̂(X_a=1) - P̂(X_b=1) + P̂(X_a=1, X_b=1)       P̂(X_b=1) - P̂(X_a=1, X_b=1)     1 - P̂(X_a=1)
a = 1   P̂(X_a=1) - P̂(X_a=1, X_b=1)                       P̂(X_a=1, X_b=1)                 P̂(X_a=1)
        1 - P̂(X_b=1)                                      P̂(X_b=1)                        1
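All four cells follow from the three quantities P̂(X_a=1), P̂(X_b=1), and P̂(X_a=1, X_b=1). A Python sketch (the function name and the list-of-rows layout are my own):

```python
def predicted_joint_table(eta, pi, a, b):
    """Model-implied 2x2 probability table for items a and b, as
    table[x_a][x_b]; built from P(X_a=1), P(X_b=1), and P(X_a=1, X_b=1)."""
    p11 = sum(eta_j * pi_j[a] * pi_j[b] for eta_j, pi_j in zip(eta, pi))
    pa = sum(eta_j * pi_j[a] for eta_j, pi_j in zip(eta, pi))
    pb = sum(eta_j * pi_j[b] for eta_j, pi_j in zip(eta, pi))
    return [
        [1.0 - pa - pb + p11, pb - p11],   # X_a = 0 row
        [pa - p11, p11],                   # X_a = 1 row
    ]
```

The four cells always sum to one, which makes a convenient sanity check.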

Bivariate

Given the model-predicted contingency table (on the last slide) for every pair of items, you can then form a measure of association for the items. Depending on your preference, you could use:
- Pearson correlation.
- Tetrachoric correlation.
- Cohen's kappa.
After that, you can summarize the discrepancy between what your model predicts and what you have observed in the data, using a summary such as the RMSE, MAD, or bias.

Entropy

The entropy of a model is defined to be a measure of classification uncertainty. In my mind, I am not certain of its value as a model selection criterion. To define the entropy of a model, we must first look at the posterior probability of class membership; let's call this \hat{\alpha}_{ij} (notation borrowed from Dias and Vermunt, date unknown - online document). Here, \hat{\alpha}_{ij} is the estimated probability that observation i is a member of class j. The entropy of a model is defined as:

EN(\alpha) = -\sum_{i=1}^{N} \sum_{j=1}^{J} \hat{\alpha}_{ij} \log \hat{\alpha}_{ij}

Relative Entropy

The entropy equation on the last slide is bounded on [0, ∞), with higher values indicating a larger amount of uncertainty in classification. Mplus reports the relative entropy of a model, which is a rescaled version of entropy:

E = 1 - \frac{EN(\alpha)}{N \log J}

The relative entropy is defined on [0, 1], with values near one indicating high certainty in classification and values near zero indicating low certainty.
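Both quantities are straightforward to compute from the posterior probability matrix. A Python sketch (my own names; rows of `alpha` are observations, columns are classes, and 0 log 0 is taken as 0):

```python
import math

def entropy(alpha):
    """EN(alpha) = -sum_i sum_j alpha_ij * log(alpha_ij), with 0 log 0 = 0."""
    total = 0.0
    for row in alpha:
        for a in row:
            if a > 0:
                total -= a * math.log(a)
    return total

def relative_entropy(alpha, n_classes):
    """Rescaled entropy as reported by Mplus: E = 1 - EN(alpha) / (N log J)."""
    return 1.0 - entropy(alpha) / (len(alpha) * math.log(n_classes))
```

Perfectly certain classification (each row a 0/1 indicator) gives E = 1; completely uniform posteriors give E = 0.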

Final Thought

Evaluating the fit of a mixture model (or, specifically, an LCA) can be very complicated. Preferably, an evaluation would consider each of the above measures. Relative fit of a model is OK - but what if you are choosing between two really bad models?

Next Time

Latent Profile Analysis. Who has the responsibility for the empirical article?