Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Similar documents

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

CHAPTER 14 MORE ABOUT REGRESSION

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

Regression Models for a Binary Response Using EXCEL and JMP

1 De nitions and Censoring

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB.

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

Statistical Methods to Develop Rating Models

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

STATISTICAL DATA ANALYSIS IN EXCEL

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

Binomial Link Functions. Lori Murray, Phil Munz

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

Survival analysis methods in Insurance Applications in car insurance contracts

An Alternative Way to Measure Private Equity Performance

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

How To Calculate The Accountng Perod Of Nequalty

1. Measuring association using correlation and regression

Quantization Effects in Digital Filters

What is Candidate Sampling

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Calculation of Sampling Weights

Prediction of Disability Frequencies in Life Insurance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Marginal Returns to Education For Teachers

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Latent Class Regression Part II

1.2 DISTRIBUTIONS FOR CATEGORICAL DATA

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Approximating Cross-validatory Predictive Evaluation in Bayesian Latent Variables Models with Integrated IS and WAIC

Can Auto Liability Insurance Purchases Signal Risk Attitude?

International University of Japan Public Management & Policy Analysis Program

Credit Limit Optimization (CLO) for Credit Cards

SIMPLE LINEAR CORRELATION

A 'Virtual Population' Approach To Small Area Estimation

Variance estimation for the instrumental variables approach to measurement error in generalized linear models

General Iteration Algorithm for Classification Ratemaking

Meta-Analysis of Hazard Ratios

Finite Math Chapter 10: Study Guide and Solution to Problems

Extending Probabilistic Dynamic Epistemic Logic

Recurrence. 1 Definitions and main statements

Transition Matrix Models of Consumer Credit Ratings

Logistic Regression. Steve Kroon

Economic Interpretation of Regression. Theory and Applications

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

Properties of Indoor Received Signal Strength for WLAN Location Fingerprinting

Realistic Image Synthesis

Lecture 3: Annuity. Study annuities whose payments form a geometric progression or a arithmetic progression.

Fixed income risk attribution

Lecture 3: Force of Interest, Real Interest Rate, Annuity

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

The Application of Fractional Brownian Motion in Option Pricing

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Evaluate A Dia Fund Suffcency

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

Simple Interest Loans (Section 5.1) :

Measures of Fit for Logistic Regression

Single and multiple stage classifiers implementing logistic discrimination

Forecasting the Direction and Strength of Stock Market Movement

Diagnostic Tests of Cross Section Independence for Nonlinear Panel Data Models

Calculating the high frequency transmission line parameters of power cables

An Empirical Study of Search Engine Advertising Effectiveness

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Forecasting and Stress Testing Credit Card Default using Dynamic Models

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

DEFINING %COMPLETE IN MICROSOFT PROJECT

Statistical algorithms in Review Manager 5

Joe Pimbley, unpublished, Yield Curve Calculations

Gender differences in revealed risk taking: evidence from mutual fund investors

Analysis of Premium Liabilities for Australian Lines of Business

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

Level Annuities with Payments Less Frequent than Each Interest Period

An Interest-Oriented Network Evolution Mechanism for Online Communities

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION

7.5. Present Value of an Annuity. Investigate

Estimation of Dispersion Parameters in GLMs with and without Random Effects

The Current Employment Statistics (CES) survey,

Time Value of Money. Types of Interest. Compounding and Discounting Single Sums. Page 1. Ch. 6 - The Time Value of Money. The Time Value of Money

Modeling Loss Given Default in SAS/STAT

Estimating Age-specific Prevalence of Testosterone Deficiency in Men Using Normal Mixture Models

EXAMPLE PROBLEMS SOLVED USING THE SHARP EL-733A CALCULATOR

Transcription:

Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006

Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model from last term: Items measure dagnoses rather than underlyng scores Patterns of responses are thought to contan nformaton above and beyond aggregaton of responses The goal s clusterng ndvduals rather than response varables We add structural pece to model where covarates predct class membershp

Structural Equaton-type Depcton Measurement Pece x 1 y 1 x 2 η y 2 y 3 x 3 y 4 y 5 Structural Pece

When to use LCR Multple dscrete outcome varables bnary examples yes/no questons present/absent symptoms all measurng same latent construct We want to construct as outcome varable Responses to questons/tems measure underlyng states (.e. classes) wth error NOT approprate for counts or other way of groupng response patterns responses measure underlyng score wth error Note: Latent Varable s DISCRETE

Example: Depresson Is depresson contnuous or categorcal? Latent trat (IRT) assumes t s contnuous. Latent class model assumes t s dscrete Densty 0.0 0.2 0.4 0.6 0.8 1.0 % class 1 80 class 2 15 class 3 5 0 1 2 3 4 5 Depresson

Recall LC model M: number of latent classes K: number of symptoms p km : probablty of reportng symptom k gven latent class m π m : proporton of ndvduals n class m η : the true latent class of ndvdual, 1,,N m 1,,M; k 1,,K y 1, y 2,,y k : symptom presence/absence for ndvdual.

ECA wave 3 data (1993) N1126 n Baltmore Symptoms: weght/appette change sleep problems slow/ncreased movement loss of nterest/pleasure fatgue gult concentraton problems thoughts of death dysphora Covarates of nterest gender age martal status educaton ncome How are the above assocated wth depresson?

Assumptons Condtonal Independence: gven an ndvdual s depresson class, hs symptoms are ndependent P(y k, y j η ) P(y k η ) P(y j η ) Non-dfferental Measurement: gven an ndvdual s depresson class, covarates are not assocated wth symptoms P(y k x, η ) P(y k η )

Why LCR may be better than another analytc method LCR versus usng counts (e.g. number of symptoms) Pros: dstngushes meanngful patterns from trvally dfferent ones whch may be hard to dscern emprcally acknowledges measurement error precson and estmates of regresson coeffcents reflect measurement error Cons: may overdstngush prevalent patterns and mask dfferences n rare ones volaton of assumptons make nferences nvald

Why LCR may be better than another analytc method (contnued) Versus factor-type methods Pros: less severe assumptons (statstcally) easer to check assumptons Cons: lose statstcal power f construct s actually dmensonal (.e. contnuous) dentfablty harder to acheve (need bg sample) Practcally Pro: Allows for dsease/dsorder classfcaton whch s useful n a treatment vs. no treatment settng

Structural Equaton-type Depcton Measurement Pece x 1 x 2 x 3 y 1 β p What are y 2 the parameters that the arrows η y represent? 3 In other words, what are β and y 4 p n the LCR model? y 5 Structural Pece

Parameter Interpretaton Measurement Pece (p s) p km : probablty that an ndvdual from class m reports symptom k. η p kη y k Same as standard latent class model from last term

Parameter Interpretaton How do we relate η s and β s? In classc SEM, we have lnear model. What about when η s categorcal? What f η s bnary? x 1 x 2 x 3 β η

Parameter Interpretaton How do we relate η to x s? Consder smplest case: 2 classes or equvalently, β 1 and β 2 are log odds ratos x x P P 2 2 1 1 0 2) ( 1 2) ( log β β β η η + + x x P P 2 2 1 1 0 1 2 1) ( 2) ( log log β β β η η π π + +

p same as last term KxM p s π j P(η j) β Model Results Condtonal on x s No longer proporton of ndvduals n class Now, only can nterpret to mean probablty of class membershp gven covarates for ndvdual To get sze of class j, can sum of π j for all (M-1)*(H+1) β s where H number of covarates M-1: one class s reference class so all of ts β coeffcents are techncally zero H+1: for each class, there s one β for each covarate plus another for the ntercept.

Solvng for π j P(η j) π log π 2 1 log P( η P( η 2 x 1 x 1 1, x, x 2 2 ) ) β 0 + β x 1 1 + β x 2 2 π π P( η 2 x, x ) 2 1 2 P( η 1 x, x ) e 1+ e 1+ β + β x + β x 0 1 1 2 2 β + β x + β x 0 1 1 2 2 1 1 2 β + β x + β x e 1 0 1 1 2 2

Parameter Interpretaton Example: e β1 2 and x 1 1 f female, 0 f male Women have twce the odds of beng n class 2 versus class 1 than men, holdng all else constant e β 1 P( η 2 x1 1, x2 c) P( η 1 x 1, x c) 1 2 P( η 2 x1 0, x2 c) P( η 1 x 1, x c) 1 2

More than two classes? Need more than one equaton Need to choose a reference class x x x x P x x P 2 22 1 12 02 2 1 2 1 ), 1 ( ), 2 ( log β β β η η + + x x x x P x x P 2 23 1 13 03 2 1 2 1 ), 1 ( ), 3 ( log β β β η η + + e e e e e β β β β β β 12 13 13 12 13 12 OR for class 2 versus class 1 for females versus males OR for class 3 versus class 1 for females versus males OR for class 3 versus class 2 for females versus males /

Solvng for π j P(η ) π log π 2 1 log P( η 2) P( η 1) β 02 + β 12 x 1 + β 22 x 2 π 2 P( η 2) e 1+ e + β02 + β12x1+ β22x2 e 3 r 1 e β + β x + β β + β x + β x β + β x + β x β + β x + β 02 12 1 22 2 02 12 1 22 2 03 13 1 23 2 x 0r 1r 1 2r 2 e x Where we assume that β01 β11 β21 0

Depresson Example: LCR coeffcents (log ORs) n 3 class model Class Class Class 3 vs 1 2 vs 1 3 vs 2 Log(age) -1.2* -1.5* 0.23 Female 0.85* 0.76* 0.09 Sngle 0.44 0.38-0.05 Sep/wd/dv 0.86* 0.83* -0.01 HS dploma -0.01-0.56* 0.51 * ndcates sgnfcant at the 0.10 level Note: class 1 s non-depressed, class 2 s mld, class 3 s severe

Depresson Example: ODDS RATIOS n 3 class model Class Class Class 3 vs 1 2 vs 1 3 vs 2 Log(age) 0.3* 0.22* 1.26 Female 2.34* 2.13* 1.09 Sngle 1.55 1.46 0.95 Sep/wd/dv 2.36* 2.29* 0.99 HS dploma 0.99 0.57* 1.67 * ndcates sgnfcant at the 0.10 level Note: class 1 s non-depressed, class 2 s mld, class 3 s severe

Step 1: Model Buldng Get the measurement part rght! Ft standard latent class model frst. Use methods we dscussed last term to choose approprate model Step 2: add covarates one at a tme It s useful to perform smple regressons to see how each covarate s assocated wth latent varable before adjustng for others. Many of same ssues n lnear and logstc regresson (e.g. multcollnearty)

Estmaton Same caveats as last term Maxmum lkelhood: Iteratve fttng procedure. Packages Mplus Splus, R SAS Bayesan approach Computatonally ntensve WnBugs Splus, R SAS

Propertes of Estmates (β, p) If N s large, coeffcents are approxmately normal confdence ntervals and Z-tests are approprate. Nested models can be compared by usng chsquare test. But, recall problems of ch-square test when sample sze s large! And problems when the sample sze s small! Also can use AIC, BIC, etc. to compare nested AND non-nested models (e.g. s age as contnuous better than 3 age categores).

Specfcs Statstcally Standard LCM Lkelhood Latent Class Regresson Lkelhood ) (1 1 1 5 5, 4 4, 3 3, 2 2, 1 1 ) (1 ) ( ) ( k y k km M m K k y km m y Y y Y y Y y Y y Y y Y p p P P π ) (1 1 1 5 5, 4 4, 3 3, 2 2, 1 1 ) (1 ) ( ) ( ) ( k y k km M m K k y km m x y Y y Y y Y y Y y Y y Y p p x P P π M m x x m m m e e x 1 ) ( β β π where

Example: 3 class model coeffcent estmate se 95% confdence nterval b02-3.11 0.21-3.52-2.71 b01-1.80 0.15-2.08-1.52 b2age -1.21 0.74-2.65 0.27 b3age -1.44 0.53-2.48-0.38 b2sex 0.86 0.38 0.15 1.64 b3sex 0.77 0.25 0.32 1.34 p[1,1] 0.83 0.06 0.69 0.93 p[1,2] 0.40 0.05 0.31 0.50 p[1,3] 0.02 0.01 0.01 0.03 p[2,1] 0.84 0.061 0.72 0.94 p[2,2] 0.41 0.05 0.31 0.52 p[2,3] 0.02 0.01 0.01 0.04 etc..

Some Addtonal Concepts (1) η s a NOMINAL varable (2) Data Setup: Centerng covarates can help. Due to need to ntalze algorthm n ML. Due to prors on β s n Bayesan settng Wll be meanngful n model checkng, too. Need to choose startng values for model estmaton for regresson coeffcents n some ML packages. Ths s easer f they are centered. Not an ssue for Mplus: only need startng values for measurement part.

Choosng Values for Intalzaton A: Measurement model 1. Use results from standard latent class model B: Structural pece 1. choose all β s equal to 0 (wll work f there s a LOT of data and no ID problems) 2. a. Make a surrogate latent class (e.g. choose cutoffs based on number of symptoms) b. Perform mlogt on surrogate wth covarates c. Use log ORs as startng values

Choosng Values for Intalzaton 3. Use ML pseudo-class approach a. Usng pseudo-classes from standard LC model, treat class assgnment as fxed b. Regress class membershp on covarates (polytomous logstc regresson) c. Model buldng strategy -- gves ntal dea of whch covarates are assocated. d. Also, can use ths as a model checkng strategy post hoc 4. Use MCMC class assgnment approach: same as 3, but wth classes assgned usng MCMC model

Important Identfablty Issue Must run model more than once usng dfferent startng values to check dentfablty!

Model Checkng Very mportant step n LCR LCR can gve msleadng fndngs f measurement model assumptons are volated Two types of model checks: (1) model ft do y patterns behave as model would predct? (2) volaton of assumptons do y s relate to x s as expected?

ECA wave 3 data (1993) N1126 n Baltmore Symptoms: weght/appette change sleep problems slow/ncreased movement loss of nterest/pleasure fatgue gult concentraton problems thoughts of death dysphora Covarates of nterest gender age martal status educaton ncome How are the above assocated wth depresson?

Models Model A: log(age), gender, race Model B: log(age), gender, race, dploma

Do y patterns behave as model predcts? Compare observed pattern frequences to expected pattern frequences PFC plot How does addton of regresson change nterpretaton? Evaluatng ft of measurement pece Wll be same as n standard LC model unless..

o 2 class x 3 class 4 class

LCA x LCR-A o LCR-B

Does pattern frequency behave as predcted by covarates? Idea: focus on one tem at a tme Recall: M K yk (1 y 1 y1,..., YK yk x ) ( x ) pkm (1 pkm) m 1 k 1 P( Y π If nterested n tem r, gnore ( margnalze over ) other tems: k ) P( Y y x ) π ( x r r M m 1 ) p yk rm (1 p rm ) (1 y k )

Comparng Ftted to Observed

Categorcal Covarates Easer than contnuous (computatonally) Example Calculate: Predcted males wth gult Observed males wth gult Predcted females wth gult Observed females wth gult

Assume LC regresson model wth only gender Gender 0 f male, 1 f female Item of nterest f gult. Want fnd how many class 2 men we would expect to report gult based on the model P( gult and male and class 2) P( gult and class 2 male) P( male) P( gult male and class 2) P( class 2 male) P( male) P( gult class 2) P( class 2 male) P( male) p km β0 e 1+ e β 0 Pmale ( ) β0 e Expected ( gult and male and class 2) N pkm β Pmale ( ) 0 1+ e Calculate ths for each of the classes and sum up: Wll tell us the expected number of males reportng gult.

Falure n Ft Check Assumptons non-dfferental measurement condtonal ndependence Non-dfferental Measurement: P(y k x, η ) P(y k η ) In words, wthn a class, there s no assocaton between y s and x s. Check ths usng logstc regresson approach

Checkng Non-dfferental Measurement Assumpton For bnary covarates and for each class m and tem k consder P( yk 1 x 1, η m) / P( yk 0 x 1, η m) ORkmx P( y 1 x 0, η m) / P( y 0 x 0, η m) k If assumpton holds, ths OR wll be approxmately equal to 1. Why may ths get trcky? We don t KNOW class assgnments. Need a strategy for assgnng ndvduals to classes. k

Checkng NDM: Maxmum Lkelhood Approach (a) assgn ndvduals to pseudo-classes based on posteror probablty of class membershp recall posteror probablty based on observed pattern e.g. ndvdual wth 0.20, 0.05, 0.75 better chance of beng n class 3 not necessarly n class 3 (b) calculate OR s wthn classes. (c) repeat (a) and (b) at least a few tmes (d) compare OR s to 1.

Checkng NDM: Maxmum Lkelhood Approach What about contnuous covarates? Use same general dea, but estmate the logor wthn classes by logstc regresson Example: age

Checkng NDM: MCMC (Bayesan) approach At each teraton n Gbbs sampler, ndvduals are automatcally assgned to classes no need to manually assgn. At each teraton, smply calculate the OR s of nterest. Then, margnalze or average over all teratons. Results s posteror dstrbuton of OR

Checkng Condtonal Independence Assumpton In words, wthn a class, there s no assocaton between y k and y j, j k. Same approach Only dfference: OR jkm P( y P( y j j 1, y 1, y k k 1 η 0 η m) / m) / P( y P( y j j 0, 0, y y k k 1 η 0 η m) m) Stll use pseudo-class assgnment (ML) or class assgnment at each teraton (MCMC)

Identfablty (brefly) General Idea: dfferent parameters can lead to the same model ft 2 step rule: If (a) polytomous logstc regresson s ID ed (b) standard LCM s ID ed Then model s ID ed t-rule: need more data cells than parameters complcaton: contnuous covarates, but they usually don t make unid ed.

Utlty of Model Checkng May modfy nterpretaton to ncorporate lack of ft/volaton of assumpton May help elucdate a transformaton that that would be more approprate (e.g. log(age) versus age) May lead to beleve that LCR s not approprate.