Latent Profile Analysis

Lecture 14, April 4, 2006: Clustering and Classification

Today's Lecture: Latent Profile Analysis (LPA). LPA as a specific case of a Finite Mixture Model. How to do LPA.

LPA Introduction. Latent profile models are commonly attributed to Lazarsfeld and Henry (1968). As with K-means and hierarchical clustering techniques, the final number of latent classes is not usually predetermined prior to analysis with latent class models. The number of classes is determined through comparison of fit statistics across candidate solutions, and the characteristics of each class are also determined following the analysis.
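The class-enumeration step can be sketched in code. Below is a minimal, hypothetical Python sketch (not from the lecture): it fits diagonal-covariance Gaussian mixtures, which mirror LPA's within-class independence assumption, for several candidate numbers of classes and compares their BIC values. The data here are placeholders.

# Hypothetical sketch: comparing fit statistics across candidate class counts.
# A diagonal-covariance Gaussian mixture mirrors the LPA assumption that
# variables are independent within class.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))         # placeholder data: 150 cases, 4 variables

for k in range(1, 6):
    model = GaussianMixture(n_components=k, covariance_type="diag",
                            n_init=10, random_state=0).fit(X)
    print(k, round(model.bic(X), 1))  # smaller BIC -> preferred number of classes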

Variable Types Used in LPA. As it was originally conceived, LPA is an analysis that uses a set of continuous (metrical) variables, with values allowed to range anywhere on the real number line. The number of classes (an integer, two or greater) must be specified prior to each analysis.

LPA Process. For a specified number of classes, LPA attempts to:
- For each class, estimate the distribution (likelihood) of each variable.
- Estimate the probability that each observation falls into each class (illustrated in the sketch below). For each observation, the sum of these probabilities across classes equals one. This is different from K-means, where an observation is a member of a class with certainty.
- Across all observations, estimate the probability that any observation falls into each class.
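As an illustration of the membership probabilities described above, here is a small Python sketch (all parameter values are invented for illustration) that computes one observation's posterior class probabilities by Bayes' rule; within each class the likelihood is a product of univariate normal densities, and the posteriors sum to one.

# Sketch with assumed parameter values: posterior class membership
# probabilities for one observation, computed by Bayes' rule.
import numpy as np
from scipy.stats import norm

eta   = np.array([0.4, 0.6])                 # class probabilities (sum to one)
mu    = np.array([[0.0, 1.0], [2.0, 3.0]])   # class-by-variable means
sigma = np.array([[1.0, 0.5], [1.0, 1.0]])   # class-by-variable standard deviations
x     = np.array([1.5, 2.0])                 # one observation on two variables

lik  = np.array([norm.pdf(x, mu[j], sigma[j]).prod() for j in range(len(eta))])
post = eta * lik / (eta * lik).sum()
print(post, post.sum())                      # the posteriors sum to 1.0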

LPA Estimation. Estimation in LPA is more complicated than in the previous methods discussed in this course. In agglomerative hierarchical clustering, a search process was used, with a new distance matrix being created at each step. K-means used more of a brute-force approach, trying multiple starting points. Both methods relied on distance metrics to find clustering solutions. LPA estimation instead uses distributional assumptions to find classes; the distributional assumptions provide the measure of "distance" in LPA.

LPA Distributional Assumptions. Because LPA works with continuous variables, the distributional assumptions of LPA must use a continuous distribution. Within each latent class, the variables are assumed to:
- Be independent.
- (Marginally) be normally (Gaussian) distributed. For a single variable, the normal density function is:
$f(x_i) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(x_i - \mu_x)^2}{2\sigma_x^2}\right)$
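As a quick numerical check of the density above, the following sketch (with arbitrary illustrative values for the mean, standard deviation, and evaluation point) computes the formula directly and compares it with scipy's implementation:

# Sketch: the normal density written out as in the formula above, checked
# against scipy (mu, sigma, and x are arbitrary illustrative values).
import numpy as np
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 0.3
by_hand = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
print(by_hand, norm.pdf(x, loc=mu, scale=sigma))   # the two values agree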

Joint Distribution. Because, conditional on class, we have normally distributed variables in LPA, we could also phrase the likelihood as coming from a multivariate normal distribution (MVN). The next set of slides describes the MVN. What you must keep in mind is that our variables are set to be independent, conditional on class, so the within-class covariance matrix will be diagonal.

Multivariate Normal Distribution. The generalization of the well-known normal distribution to multiple variables is called the multivariate normal distribution (MVN). Many multivariate techniques rely on this distribution in some manner. Although real data may never come from a true MVN, the MVN provides a robust approximation and has many nice mathematical properties. Furthermore, because of the central limit theorem, many multivariate statistics converge to the MVN distribution as the sample size increases.

Univariate Normal Distribution. The univariate normal density function is:
$f(x_i) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left(-\frac{(x_i - \mu_x)^2}{2\sigma_x^2}\right)$
The mean is $\mu_x$, the variance is $\sigma_x^2$, and the standard deviation is $\sigma_x$. Standard notation for normal distributions is $N(\mu_x, \sigma_x^2)$, which will be extended for the MVN distribution.

Univariate Normal Distribution: N(0, 1). [Figure: density curve of the N(0, 1) distribution; f(x) plotted over x from -6 to 6.]

Univariate Normal Distribution: N(0, 2). [Figure: density curve of the N(0, 2) distribution; f(x) plotted over x from -6 to 6.]

Univariate Normal Distribution: N(3, 1). [Figure: density curve of the N(3, 1) distribution; f(x) plotted over x from -6 to 6.]

UVN - Notes. Recall that the area under the curve of the univariate normal distribution is a function of the variance/standard deviation. In particular:
$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$
Also note the term in the exponent:
$\left(\frac{x - \mu}{\sigma}\right)^2 = (x - \mu)(\sigma^2)^{-1}(x - \mu)$
This is the squared distance from $x$ to $\mu$ in standard-deviation units, and it will be generalized for the MVN.
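The two coverage figures can be verified directly from the standard normal CDF; a two-line check in Python:

# Sketch: verifying the 1-sigma and 2-sigma coverage probabilities quoted above.
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(-1))   # approximately 0.683
print(norm.cdf(2) - norm.cdf(-2))   # approximately 0.954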

The multivariate normal density function is:
$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
The mean vector is $\boldsymbol{\mu}$ and the covariance matrix is $\Sigma$. Standard notation for the multivariate normal distribution is $N_p(\boldsymbol{\mu}, \Sigma)$. Visualizing the MVN is difficult for more than two dimensions, so I will demonstrate some plots with two variables - the bivariate normal distribution.
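As with the univariate case, the MVN density can be checked numerically; the sketch below (arbitrary mean vector, covariance matrix, and evaluation point) evaluates the formula directly and compares it with scipy:

# Sketch: the MVN density written out as above, checked against scipy
# (the mean vector, covariance matrix, and evaluation point are arbitrary).
import numpy as np
from scipy.stats import multivariate_normal

mu    = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x     = np.array([0.3, -0.2])

p = len(mu)
d = x - mu
by_hand = np.exp(-d @ np.linalg.inv(Sigma) @ d / 2) / \
          ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
print(by_hand, multivariate_normal.pdf(x, mean=mu, cov=Sigma))   # identical values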

Bivariate Normal Plot #1: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. [Figure: surface plot of the bivariate normal density.]

Bivariate Normal Plot #1a: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. [Figure: contour plot of the bivariate normal density.]

Bivariate Normal Plot #2: $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$. [Figure: surface plot of the bivariate normal density.]

Bivariate Normal Plot #2 (contours): $\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$. [Figure: contour plot of the bivariate normal density.]

Contours. The lines of the contour plots denote places of equal density for the distribution. These contours can be constructed from the eigenvalues and eigenvectors of the covariance matrix: the ellipse axes point in the directions of the eigenvectors, and their lengths are proportional to the constant times the square roots of the eigenvalues. Specifically,
$(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c^2$
describes ellipsoids centered at $\boldsymbol{\mu}$ with axes $\pm c \sqrt{\lambda_i}\, \mathbf{e}_i$.
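A small sketch of this construction (using the second $\Sigma$ from the bivariate plots above; $c$ is an arbitrary constant) computes the eigenvalues and eigenvectors with numpy and reports the axis directions and half-lengths:

# Sketch: ellipse axes of an MVN contour from the eigendecomposition of Sigma.
import numpy as np

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
lam, E = np.linalg.eigh(Sigma)   # eigenvalues (ascending) and eigenvectors (columns)
c = 1.0                          # any constant c picks out one contour
for i in range(len(lam)):
    print("direction", E[:, i], "half-length", c * np.sqrt(lam[i]))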

Contours, Continued. Contours are useful because they provide confidence regions for data points from the MVN distribution. The multivariate analog of a confidence interval is an ellipsoid, where the constant $c^2$ is a critical value from the Chi-squared distribution with $p$ degrees of freedom. Specifically,
$(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \le \chi^2_p(\alpha)$
provides the confidence region containing $1 - \alpha$ of the probability mass of the distribution.

Contour Example. Imagine we had a bivariate normal distribution with:
$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$
The covariance matrix has eigenvalues and eigenvectors:
$\boldsymbol{\lambda} = \begin{bmatrix} 1.5 \\ 0.5 \end{bmatrix}$, $\mathbf{E} = \begin{bmatrix} 0.707 & 0.707 \\ 0.707 & -0.707 \end{bmatrix}$
We want to find the contour within which 95% of the probability falls, corresponding to $\chi^2_2(0.05) = 5.99$.

Contour Example. This contour will be centered at $\boldsymbol{\mu}$.
Axis 1: $\boldsymbol{\mu} \pm \sqrt{5.99}\sqrt{1.5} \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}$, giving endpoints $\begin{bmatrix} 2.12 \\ 2.12 \end{bmatrix}$ and $\begin{bmatrix} -2.12 \\ -2.12 \end{bmatrix}$.
Axis 2: $\boldsymbol{\mu} \pm \sqrt{5.99}\sqrt{0.5} \begin{bmatrix} 0.707 \\ -0.707 \end{bmatrix}$, giving endpoints $\begin{bmatrix} 1.22 \\ -1.22 \end{bmatrix}$ and $\begin{bmatrix} -1.22 \\ 1.22 \end{bmatrix}$.
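The same endpoints can be reproduced numerically; a sketch (the eigenvector signs returned by the software may differ from those above, which only relabels which endpoint is which):

# Sketch: reproducing the 95% contour axis endpoints computed above.
import numpy as np
from scipy.stats import chi2

mu    = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
c2    = chi2.ppf(0.95, df=2)          # approximately 5.99
lam, E = np.linalg.eigh(Sigma)        # eigenvalues 0.5 and 1.5, ascending

for i in (1, 0):                      # major axis (lambda = 1.5) first
    end = mu + np.sqrt(c2 * lam[i]) * E[:, i]
    print(np.round(end, 2), np.round(2 * mu - end, 2))   # endpoint and its mirror
# prints roughly [2.12 2.12] / [-2.12 -2.12] and [-1.22 1.22] / [1.22 -1.22]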

Properties. The MVN distribution has some convenient properties. If $\mathbf{X}$ has a multivariate normal distribution, then:
1. Linear combinations of the components of $\mathbf{X}$ are normally distributed (checked numerically in the sketch below).
2. All subsets of the components of $\mathbf{X}$ have a multivariate normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are (multivariate) normal.
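Property 1 is easy to check by simulation; the sketch below (arbitrary weights $\mathbf{a}$, mean vector, and covariance matrix) compares the sample mean and variance of the linear combination $\mathbf{a}'\mathbf{X}$ with the theoretical values $\mathbf{a}'\boldsymbol{\mu}$ and $\mathbf{a}'\Sigma\mathbf{a}$:

# Sketch: simulation check that a linear combination of MVN variables is
# normal with mean a'mu and variance a' Sigma a.
import numpy as np

rng   = np.random.default_rng(0)
mu    = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
a     = np.array([2.0, -1.0])

y = rng.multivariate_normal(mu, Sigma, size=100_000) @ a
print(y.mean(), a @ mu)          # sample vs. theoretical mean
print(y.var(), a @ Sigma @ a)    # sample vs. theoretical variance (= 3.0 here)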

Finite Mixture Models. Recall from last time that a finite mixture model expresses the distribution of $\mathbf{x}$ as a weighted sum of component likelihoods:
$f(\mathbf{x}) = \sum_{g=1}^{G} \eta_g f(\mathbf{x} \mid g)$
We are now ready to construct the LPA model likelihood. Here, we say that the conditional distribution of $\mathbf{x}$ given class $g$ is a set of independent, normally distributed variables.
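To make the notation concrete, here is a sketch that evaluates a two-component mixture density at a point; the weights and component distributions are illustrative choices, with bivariate normal components standing in for $f(\mathbf{x} \mid g)$:

# Sketch: f(x) = sum_g eta_g f(x | g) for a two-component mixture
# (weights and component parameters are illustrative, not from the lecture).
import numpy as np
from scipy.stats import multivariate_normal

eta = [0.3, 0.7]
components = [
    multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
    multivariate_normal(mean=[3.0, 3.0], cov=np.eye(2)),
]
x  = np.array([1.0, 1.0])
fx = sum(w * comp.pdf(x) for w, comp in zip(eta, components))
print(fx)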

Latent Class as a FMM. Using the notation of Bartholomew and Knott, a latent profile model for a response vector of $p$ variables ($i = 1, \ldots, p$) with $K$ classes ($j = 1, \ldots, K$) is:
$f(\mathbf{x}) = \sum_{j=1}^{K} \eta_j \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\left(-\frac{(x_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$
- $\eta_j$ is the probability that any individual is a member of class $j$ (these must sum to one).
- $x_i$ is the observed response to variable $i$.
- $\mu_{ij}$ is the mean of variable $i$ for an individual from class $j$.
- $\sigma_{ij}^2$ is the variance of variable $i$ for an individual from class $j$.
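The model likelihood above translates almost line for line into code; a sketch (the data matrix and all parameter values are placeholders) that evaluates the log-likelihood of a small data set:

# Sketch: LPA log-likelihood -- for each observation, a product of univariate
# normal densities within class, mixed over classes with weights eta.
import numpy as np
from scipy.stats import norm

X     = np.array([[1.0, 2.0], [0.5, 1.5], [3.0, 4.0]])   # n observations x p variables
eta   = np.array([0.4, 0.6])                              # K class probabilities
mu    = np.array([[1.0, 2.0], [3.0, 4.0]])                # K x p means
sigma = np.array([[1.0, 1.0], [0.5, 0.5]])                # K x p standard deviations

dens = np.stack([norm.pdf(X, mu[j], sigma[j]).prod(axis=1)   # product over variables
                 for j in range(len(eta))], axis=1)          # n x K class densities
loglik = np.log(dens @ eta).sum()                            # mix over classes, sum logs
print(loglik)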

LPA Example. To illustrate the process of LPA, consider an example using Fisher's Iris data. The Mplus code is found on the next few slides. We will use the PLOT command to look at our results.

title: LPA of Fisher's Iris data;
data: file = iris.dat;
variable: names = x1-x4;
    classes = c(3);
analysis: type = mixture;
model:
    %OVERALL%
    %C#1%
    x1-x4;
    %C#2%
    x1-x4;
    %C#3%
    x1-x4;
OUTPUT: TECH1 TECH5 TECH8;
PLOT: TYPE = PLOT3;
    SERIES IS x1(1) x2(2) x3(3) x4(4);

SAVEDATA: FILE IS myfile.dat;
    SAVE = CPROBABILITIES;

Interpreting Classes. After the analysis is finished, we need to examine the estimated class-specific parameters (here, the item means) to gain information about the characteristics of the classes. An easy way to do this is to look at a chart of the item response means by class.
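One way to build such a chart outside of Mplus is sketched below; the column layout of the saved file is an assumption here (check the Mplus SAVEDATA output for the actual order), and the file name comes from the SAVEDATA command above.

# Hypothetical sketch: tabulate and plot item means by most-likely class from
# the saved Mplus file.  The column order is assumed, not taken from the output.
import pandas as pd
import matplotlib.pyplot as plt

cols = ["x1", "x2", "x3", "x4", "cprob1", "cprob2", "cprob3", "class"]
df = pd.read_csv("myfile.dat", sep=r"\s+", header=None, names=cols)

means = df.groupby("class")[["x1", "x2", "x3", "x4"]].mean()
print(means)
means.T.plot(marker="o")           # one line per class across the four items
plt.ylabel("class mean")
plt.show()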

Final Thought. LPA is a useful technique for finding classes when the observed variables are continuous. We have only scratched the surface of LPA techniques; we will discuss estimation and other models in the weeks to come.

Next Time. No class for the next two meetings (4/6 and 4/11). Our next class: more LPA examples and facets, and a bit about the estimation of such models.