# Introduction to latent variable models


1 Introduction to latent variable models, lecture 1. Francesco Bartolucci, Department of Economics, Finance and Statistics, University of Perugia (IT)

2 Outline [2/24]

- Latent variables and their use
- Some example datasets
- A general formulation of latent variable models
- The Expectation-Maximization algorithm for maximum likelihood estimation
- Finite mixture model (with example of application)
- Latent class and latent regression models (with examples of application)

3 Latent variables and their use [3/24]

A latent variable is a variable which is not directly observable and is assumed to affect the response variables (manifest variables). Latent variables are typically included in an econometric/statistical model (latent variable model) with different aims:

- representing the effect of unobservable covariates/factors, and hence accounting for the unobserved heterogeneity between subjects (the latent variables represent the effect of these unobservable factors)
- accounting for measurement errors (the latent variables represent the true outcomes and the manifest variables represent their disturbed versions)

4 Latent variables and their use [4/24]

- summarizing different measurements of the same (directly) unobservable characteristic (e.g., quality of life), so that sample units may be easily ordered/classified on the basis of these traits (represented by the latent variables)

Latent variable models now have a wide range of applications, especially in the presence of repeated observations, longitudinal/panel data, and multilevel data. These models are typically classified according to:

- the nature of the response variables (discrete or continuous)
- the nature of the latent variables (discrete or continuous)
- the inclusion or not of individual covariates

5 Latent variables and their use [5/24]

Most well-known latent variable models:

- Factor analysis model: a fundamental tool in multivariate statistics to summarize several (continuous) measurements through a small number of (continuous) latent traits; no covariates are included
- Item Response Theory (IRT) models: models for items (categorical responses) measuring a common latent trait assumed to be continuous (or, less often, discrete) and typically representing an ability or a psychological attitude; the most important IRT model was proposed by Rasch (1961); typically no covariates are included
- Generalized linear mixed models (random-effects models): an extension of the class of generalized linear models (GLMs) for continuous or categorical responses which accounts for unobserved heterogeneity beyond the effect of observable covariates

6 Latent variables and their use [6/24]

- Finite mixture model: a model, used even for a single response variable, in which subjects are assumed to come from subpopulations having different distributions of the response variables; typically covariates are ruled out
- Latent class model: a model for categorical response variables based on a discrete latent variable, the levels of which correspond to latent classes in the population; typically covariates are ruled out
- Finite mixture regression model (latent regression model): a version of the finite mixture (or latent class) model which includes observable covariates affecting the conditional distribution of the response variables and/or the distribution of the latent variables

7 Latent variables and their use [7/24]

- Models for longitudinal/panel data based on a state-space formulation: models in which the response variables (categorical or continuous) are assumed to depend on a latent process made of continuous latent variables
- Latent Markov models: models for longitudinal data in which the response variables are assumed to depend on an unobservable Markov chain, as in hidden Markov models for time series; covariates may be included in different ways
- Latent growth/curve models: models based on a random-effects formulation which are used to study the evolution of a phenomenon across time on the basis of longitudinal data; covariates are typically ruled out

8 Latent variables and their use [8/24]

Some example datasets

Dataset 1: it consists of 500 observations simulated from a model with 2 components. With a finite mixture model we can estimate separate parameters for these components and classify the sample units (model-based clustering).

9 Latent variables and their use [9/24]

Dataset 2: collected on 216 subjects who responded to $T = 4$ items concerning similar social aspects (Goodman, 1974, Biometrika). Data may be represented by a $2^4$-dimensional vector of frequencies for all the response configurations:

$$n = \begin{pmatrix} \mathrm{freq}(0000) \\ \mathrm{freq}(0001) \\ \vdots \\ \mathrm{freq}(1111) \end{pmatrix} = \begin{pmatrix} 42 \\ 23 \\ \vdots \\ 20 \end{pmatrix}$$

By a latent class model we can classify subjects in homogeneous clusters on the basis of the tendency measured by the items.
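Building the frequency vector from raw response patterns can be sketched as follows (a minimal sketch; the three example subjects are illustrative, not the Goodman data):

```python
from collections import Counter
from itertools import product

def frequency_vector(patterns):
    """2^T-dimensional vector of frequencies for all response
    configurations, ordered 0000, 0001, ..., 1111."""
    T = len(patterns[0])
    counts = Counter(tuple(p) for p in patterns)
    return [counts.get(cfg, 0) for cfg in product((0, 1), repeat=T)]

# e.g. three subjects responding to T = 4 binary items
n = frequency_vector([(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 0)])
```

The same function applied to the 216 observed response patterns yields the 16-dimensional vector $n$ above.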

10 Latent variables and their use [10/24]

Dataset 3: about 1,093 elderly people, admitted in 2003 to 11 nursing homes in Umbria (IT), who responded to 9 items about their health status:

- [CC1] Does the patient show problems in recalling what recently happened (5 minutes)?
- [CC2] Does the patient show problems in making decisions regarding tasks of daily life?
- [CC3] Does the patient have problems in being understood?
- [ADL1] Does the patient need support in moving to/from a lying position, turning side to side and positioning the body while in bed?
- [ADL2] Does the patient need support in moving to/from bed, chair, wheelchair and standing position?
- [ADL3] Does the patient need support for eating?
- [ADL4] Does the patient need support for using the toilet room?
- [SC1] Does the patient show presence of pressure ulcers?
- [SC2] Does the patient show presence of other ulcers?

11 Latent variables and their use [11/24]

Binary responses to the items are coded so that 1 is a sign of bad health conditions. The available covariates are:

- gender (0 = male, 1 = female)
- 11 dummies for the nursing homes
- age

With a latent class regression model we can understand how the covariates affect the probability of belonging to the different latent classes (corresponding to different levels of health status).

12 General formulation of latent variable models [12/24]

A general formulation of latent variable models

The contexts of application dealt with are:

- observation of different response variables at the same occasion (e.g., item responses)
- repeated observations of the same response variable at consecutive occasions (longitudinal/panel data); this is related to the multilevel case in which subjects are collected in clusters

Basic notation:

- $n$: number of sample units (or clusters in the multilevel case)
- $T$: number of response variables (or observations of the same response variable) for each subject
- $y_{it}$: response variable of type $t$ (or at occasion $t$) for subject $i$
- $x_{it}$: corresponding column vector of covariates

13 General formulation of latent variable models [13/24]

A latent variable model formulates the conditional distribution of the response vector $y_i = (y_{i1}, \ldots, y_{iT})$, given the covariates (if there are any) in $X_i = (x_{i1}, \ldots, x_{iT})$ and a vector $u_i = (u_{i1}, \ldots, u_{il})$ of latent variables.

The model components of main interest are:

- the conditional distribution of the response variables given $X_i$ and $u_i$ (measurement model): $p(y_i \mid u_i, X_i)$
- the distribution of the latent variables given the covariates (latent model): $p(u_i \mid X_i)$

With $T > 1$, a crucial assumption is typically that of local independence: the response variables in $y_i$ are conditionally independent given $X_i$ and $u_i$.

14 General formulation of latent variable models [14/24]

The marginal distribution of the response variables (manifest distribution) is obtained as

$$p(y_i \mid X_i) = \int p(y_i \mid u_i, X_i)\, p(u_i \mid X_i)\, du_i$$

This distribution may be explicitly computed with discrete latent variables, when the integral becomes a sum. With continuous latent variables the integral may be difficult to compute, and quadrature or Monte Carlo methods are required.

The conditional distribution of the latent variables given the responses (posterior distribution) is

$$p(u_i \mid X_i, y_i) = \frac{p(y_i \mid u_i, X_i)\, p(u_i \mid X_i)}{p(y_i \mid X_i)}$$

15 General formulation of latent variable models [15/24]

Case of discrete latent variables (finite mixture model, latent class model): each vector $u_i$ has a discrete distribution with $k$ support points $\xi_1, \ldots, \xi_k$ and corresponding probabilities $\pi_1(X_i), \ldots, \pi_k(X_i)$ (possibly depending on the covariates).

The manifest distribution is then

$$p(y_i \mid X_i) = \sum_c \pi_c\, p(y_i \mid u_i = \xi_c, X_i) \quad \text{(without covariates)}$$

$$p(y_i \mid X_i) = \sum_c \pi_c(X_i)\, p(y_i \mid u_i = \xi_c, X_i) \quad \text{(with covariates)}$$

Model parameters are typically the support points $\xi_c$, the mass probabilities $\pi_c$, and parameters common to all the distributions.

16 General formulation of latent variable models [16/24]

Example: finite mixture of Normal distributions with common variance. There is only one latent variable ($l = 1$) having $k$ support points, and no covariates are included. Each support point $\xi_c$ corresponds to a mean $\mu_c$ and there is a common variance-covariance matrix $\Sigma$. The manifest distribution of $y_i$ is

$$p(y_i) = \sum_c \pi_c\, \phi(y_i; \mu_c, \Sigma)$$

where $\phi(y; \mu, \Sigma)$ is the density function of the multivariate Normal distribution with mean $\mu$ and variance-covariance matrix $\Sigma$.

Exercise: write down the density of the model in the univariate case with $k = 2$ and represent it for different parameter values.
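The univariate density with $k = 2$ asked for in the exercise can be sketched in Python as follows (the parameter values are illustrative):

```python
import math

def mixture_density(y, pi1, mu1, mu2, sigma2):
    """Two-component univariate Normal mixture with common variance:
    pi1 * phi(y; mu1, sigma2) + (1 - pi1) * phi(y; mu2, sigma2)."""
    def phi(v, mu):
        return math.exp(-(v - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return pi1 * phi(y, mu1) + (1 - pi1) * phi(y, mu2)

# Evaluate the density on a grid for one illustrative parameter setting
grid = [v / 10 for v in range(-50, 51)]
dens = [mixture_density(v, 0.4, -1.0, 2.0, 1.0) for v in grid]
```

Plotting `dens` against `grid` for several settings (well-separated vs close means, different weights) shows when the mixture density is bimodal or unimodal.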

17 General formulation of latent variable models [17/24]

Case of continuous latent variables (generalized linear mixed models): with only one latent variable ($l = 1$), the integral involved in the manifest distribution is approximated by a sum (quadrature method):

$$p(y_i \mid X_i) \approx \sum_c \pi_c\, p(y_i \mid u_i = \xi_c, X_i)$$

In this case the nodes $\xi_c$ and the corresponding weights $\pi_c$ are fixed a priori; a few nodes are usually enough for an adequate approximation. With more latent variables ($l > 1$), the quadrature method may be difficult to implement and imprecise; a Monte Carlo method is preferable, in which the integral is approximated by a mean over a sample drawn from the distribution of $u_i$.

18 General formulation of latent variable models [18/24]

Example: logistic model with random effect. There is only one latent variable $u_i$ ($l = 1$), having Normal distribution with mean $\mu$ and variance $\sigma^2$. The distribution of the response variables given the covariates is

$$p(y_{it} \mid u_i, X_i) = p(y_{it} \mid u_i, x_{it}) = \frac{\exp[y_{it}(u_i + x_{it}'\beta)]}{1 + \exp(u_i + x_{it}'\beta)}$$

and local independence is assumed. The manifest distribution of the response variables is

$$p(y_i \mid X_i) = \int \Big[\prod_t p(y_{it} \mid u_i, x_{it})\Big]\, \phi(u_i; \mu, \sigma^2)\, du_i$$

19 General formulation of latent variable models [19/24]

In order to compute the manifest distribution it is convenient to reformulate the model as

$$p(y_{it} \mid u_i, x_{it}) = \frac{\exp[y_{it}(u_i\sigma + x_{it}'\beta)]}{1 + \exp(u_i\sigma + x_{it}'\beta)}$$

where $u_i \sim N(0, 1)$ and $\mu$ has been absorbed into the intercept in $\beta$. The manifest distribution is then computed as

$$p(y_i \mid X_i) \approx \sum_c \pi_c \prod_t p(y_{it} \mid u_i = \xi_c, x_{it})$$

- $\xi_1, \ldots, \xi_k$: grid of points between, say, $-5$ and $5$
- $\pi_1, \ldots, \pi_k$: mass probabilities computed as $\pi_c = \phi(\xi_c; 0, 1) \big/ \sum_d \phi(\xi_d; 0, 1)$

Exercise: implement a function to compute the manifest distribution with $T = 1$ and one covariate; try different values of $\mu$ and $\sigma^2$.
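A sketch of the requested function for $T = 1$ and one covariate ($\mu$ absorbed into the intercept `beta0`; the node grid and parameter names are assumptions):

```python
import math

def manifest_prob(y, x, beta0, beta1, sigma, k=21):
    """Quadrature approximation of p(y | x) for the random-intercept
    logit model: y | u ~ Bernoulli(logit^{-1}(u*sigma + beta0 + beta1*x)),
    with u ~ N(0, 1)."""
    # Grid of nodes between -5 and 5, with standard Normal weights
    # renormalized to sum to one (the constant of phi cancels)
    nodes = [-5 + 10 * c / (k - 1) for c in range(k)]
    w = [math.exp(-z * z / 2) for z in nodes]
    tot = sum(w)
    w = [wi / tot for wi in w]
    p = 0.0
    for xi_c, pi_c in zip(nodes, w):
        eta = xi_c * sigma + beta0 + beta1 * x
        p_y1 = 1 / (1 + math.exp(-eta))
        p += pi_c * (p_y1 if y == 1 else 1 - p_y1)
    return p
```

By construction `manifest_prob(1, ...) + manifest_prob(0, ...)` equals 1, and with `sigma = 0` the function reduces to a plain logistic probability.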

20 Expectation-Maximization paradigm [20/24]

The Expectation-Maximization (EM) paradigm for maximum likelihood estimation

This is a general approach for maximum likelihood estimation in the presence of missing data (Dempster et al., 1977, JRSS-B). In our context, missing data correspond to the latent variables, so:

- incomplete (observable) data: covariates and response variables $(X, Y)$
- complete (unobservable) data: incomplete data + latent variables $(U, X, Y)$

The corresponding log-likelihood functions are

$$\ell(\theta) = \sum_i \log p(y_i \mid X_i), \qquad \ell^*(\theta) = \sum_i \log[p(y_i \mid u_i, X_i)\, p(u_i \mid X_i)]$$

21 Expectation-Maximization paradigm [21/24]

The EM algorithm maximizes $\ell(\theta)$ by alternating two steps until convergence ($h$ = iteration number):

- E-step: compute the expected value of $\ell^*(\theta)$ given the current parameter value $\theta^{(h-1)}$ and the observed data, obtaining $Q(\theta \mid \theta^{(h-1)}) = E[\ell^*(\theta) \mid X, Y, \theta^{(h-1)}]$
- M-step: maximize $Q(\theta \mid \theta^{(h-1)})$ with respect to $\theta$, obtaining $\theta^{(h)}$

Convergence is checked on the basis of the difference $\ell(\theta^{(h)}) - \ell(\theta^{(h-1)})$ or $\lVert \theta^{(h)} - \theta^{(h-1)} \rVert$. The algorithm is usually easier to implement than Newton-Raphson algorithms, but it is usually much slower.

22 Expectation-Maximization paradigm [22/24]

Case of discrete latent variables: it is convenient to introduce the dummy variables $z_{ic}$, $i = 1, \ldots, n$, $c = 1, \ldots, k$, with

$$z_{ic} = \begin{cases} 1 & \text{if } u_i = \xi_c \\ 0 & \text{otherwise} \end{cases}$$

The complete log-likelihood may then be expressed as

$$\ell^*(\theta) = \sum_i \sum_c z_{ic} \log[\pi_c(X_i)\, p(y_i \mid u_i = \xi_c, X_i)]$$

The corresponding conditional expected value is then computed as

$$Q(\theta \mid \theta^{(h-1)}) = \sum_i \sum_c \hat z_{ic} \log[\pi_c(X_i)\, p(y_i \mid u_i = \xi_c, X_i)]$$

where $\hat z_{ic}$ is the posterior expected value of $z_{ic}$, i.e. the posterior probability that $u_i = \xi_c$.

23 Expectation-Maximization paradigm [23/24]

The posterior expected value $\hat z_{ic}$ is computed as

$$\hat z_{ic} = p(z_{ic} = 1 \mid X, Y, \hat\theta^{(h-1)}) = \frac{\pi_c(X_i)\, p(y_i \mid u_i = \xi_c, X_i)}{\sum_d \pi_d(X_i)\, p(y_i \mid u_i = \xi_d, X_i)}$$

The EM algorithm is much simpler to implement than in the general case; its steps become:

- E-step: compute the expected values $\hat z_{ic}$ for every $i$ and $c$
- M-step: maximize $Q(\theta \mid \theta^{(h-1)})$ with respect to $\theta$, obtaining $\theta^{(h)}$

A similar algorithm may be adopted, as an alternative to a Newton-Raphson algorithm, for a model with continuous latent variables when the manifest distribution is computed by quadrature.

Exercise: show how to implement the algorithm for the finite mixture of Normal distributions with common variance (try simulated data).
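A sketch of the exercise for univariate data (the quantile-based initialization and the simulated dataset are illustrative choices, not prescribed by the slides):

```python
import math
import random

def em_mixture(y, k=2, n_iter=200):
    """EM for a univariate mixture of k Normals with common variance.
    Returns (pi, mu, sigma2)."""
    n = len(y)
    ys = sorted(y)
    mu = [ys[(2 * c + 1) * n // (2 * k)] for c in range(k)]   # quantile init
    ybar = sum(y) / n
    sigma2 = sum((v - ybar) ** 2 for v in y) / n
    pi = [1 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probabilities z[i][c]; the common factor of the
        # Normal density cancels in the ratio
        z = []
        for yi in y:
            num = [pi[c] * math.exp(-(yi - mu[c]) ** 2 / (2 * sigma2)) for c in range(k)]
            s = sum(num)
            z.append([v / s for v in num])
        # M-step: weighted updates of mass probabilities, means, common variance
        nc = [sum(z[i][c] for i in range(n)) for c in range(k)]
        pi = [nc[c] / n for c in range(k)]
        mu = [sum(z[i][c] * y[i] for i in range(n)) / nc[c] for c in range(k)]
        sigma2 = sum(z[i][c] * (y[i] - mu[c]) ** 2
                     for i in range(n) for c in range(k)) / n
    return pi, mu, sigma2

# Simulated data: two components with means -2 and 3, common variance 1
rng = random.Random(1)
data = [rng.gauss(-2, 1) for _ in range(200)] + [rng.gauss(3, 1) for _ in range(200)]
pi_hat, mu_hat, s2_hat = em_mixture(data)
```

On well-separated simulated components the estimated weights, means and common variance should be close to the simulation values.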

24 Latent class and latent regression model [24/24]

Latent class and latent regression model

These are models for categorical response variables (typically binary) based on a single discrete latent variable. For each level $\xi_c$ of the latent variable there is a specific conditional distribution of $y_{it}$. In the latent regression version the mass probabilities (or the conditional distribution of each $y_{it}$) are allowed to depend on individual covariates (e.g., via a multinomial logit parameterization).

Exercise: write down the manifest distribution of the latent class model for binary response variables and a binary latent variable.

Exercise: implement the EM algorithm for the latent class model (try it on the Goodman (1974) dataset).
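A sketch of the EM algorithm for the latent class model with binary items (the initialization and the small synthetic dataset are assumptions; the Goodman data would be used the same way, expanding its frequency vector into response patterns):

```python
def em_latent_class(Y, k=2, n_iter=500):
    """EM for a latent class model with T binary items under local
    independence. Y: list of 0/1 response vectors. Returns (pi, lam)
    with lam[c][t] = p(y_t = 1 | u = xi_c)."""
    n, T = len(Y), len(Y[0])
    pi = [1 / k] * k
    # spread the initial item probabilities across classes to break symmetry
    lam = [[0.2 + 0.6 * c / max(k - 1, 1)] * T for c in range(k)]
    for _ in range(n_iter):
        # E-step: posterior class probabilities z[i][c]
        z = []
        for yi in Y:
            num = []
            for c in range(k):
                p = pi[c]
                for t in range(T):
                    p *= lam[c][t] if yi[t] == 1 else 1 - lam[c][t]
                num.append(p)
            s = sum(num)
            z.append([v / s for v in num])
        # M-step: update class weights and conditional item probabilities
        nc = [sum(z[i][c] for i in range(n)) for c in range(k)]
        pi = [nc[c] / n for c in range(k)]
        lam = [[sum(z[i][c] * Y[i][t] for i in range(n)) / nc[c] for t in range(T)]
               for c in range(k)]
    return pi, lam

# Small synthetic example with two well-separated response patterns
Y = [[0, 0, 0, 0]] * 40 + [[1, 1, 1, 1]] * 40 + [[1, 0, 0, 0]] * 10 + [[0, 1, 1, 1]] * 10
pi_hat, lam_hat = em_latent_class(Y, 2)
```

On this synthetic dataset the algorithm recovers one class with high item probabilities and one with low item probabilities, with weights near 1/2.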

25 Latent class and latent regression model [25/24]

Latent regression model

Two possible choices to include individual covariates:

1. in the measurement model, so that we have random intercepts (via a logit or probit parameterization):
$$\lambda_{itc} = p(y_{it} = 1 \mid u_i = \xi_c, X_i), \qquad \log\frac{\lambda_{itc}}{1 - \lambda_{itc}} = \xi_c + x_{it}'\beta, \quad i = 1, \ldots, n,\; t = 1, \ldots, T,\; c = 1, \ldots, k$$
2. in the model for the distribution of the latent variables (via a multinomial logit parameterization):
$$\pi_{ic} = p(u_i = \xi_c \mid X_i), \qquad \log\frac{\pi_{ic}}{\pi_{i1}} = x_i'\beta_c, \quad c = 2, \ldots, k$$

Alternative parameterizations are possible with ordinal response variables or ordered latent classes.
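Choice 2 can be sketched as a softmax over class-specific linear predictors (a minimal sketch; the covariate layout and coefficient values below are illustrative, not from the slides):

```python
import math

def class_probs(x, betas):
    """Multinomial logit prior probabilities pi_c = p(u = xi_c | x).
    Class 1 is the reference (linear predictor fixed at 0); betas[c - 2]
    is the coefficient vector of class c; x includes the intercept term."""
    eta = [0.0] + [sum(b * v for b, v in zip(beta_c, x)) for beta_c in betas]
    m = max(eta)                          # stabilized softmax
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [wi / s for wi in w]

# Illustrative: k = 3 classes, covariates (intercept, gender, age / 100)
p = class_probs([1.0, 1.0, 0.8], betas=[[0.5, -0.2, 1.0], [-1.0, 0.3, 2.0]])
```

With all coefficients equal to zero the classes get equal probabilities, and in general the probabilities are positive and sum to one.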

26 Latent class and latent regression model [26/24]

The models based on the two extensions have a different interpretation:

1. the latent variables are used to account for unobserved heterogeneity, and then the model may be seen as a discrete version of the logistic model with one random effect
2. the main interest is in a latent variable which is measured through the observable response variables (e.g., health status) and in how this latent variable depends on the covariates

Only the M-step of the EM algorithm must be modified, by exploiting standard algorithms for the maximization of:

1. the weighted likelihood of a logit model
2. the likelihood of a multinomial logit model

27 Latent class and latent regression model [27/24]

Exercise: write down the manifest distribution of the latent regression model for binary response variables and a binary latent variable.

Exercise: show how to implement (and implement) the EM algorithm for a latent class model for binary response variables (try it with the elderly people dataset).


### K-Means Clustering. Clustering and Classification Lecture 8

K-Means Clustering Clustering and Lecture 8 Today s Class K-means clustering: What it is How it works What it assumes Pitfalls of the method (locally optimal results) 2 From Last Time If you recall the

### An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Binary Response Models

Short Guides to Microeconometrics Fall 2016 Kurt Schmidheiny Unversität Basel Binary Response Models 2 2 The Econometric Model: Probit and Logit 1 Introduction Binary Response Models Many dependent variables

### Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Example: (Spector, 987) describes a study of two drugs on human heart rate There are 24 subjects enrolled in the study which are assigned at random to one of

### Joint model with latent classes for time-to-event and longitudinal data

Joint model with latent classes for time-to-event and longitudinal data Hélène Jacqmin-Gadda, Cécile Proust-Lima INSRM, U897, Bordeaux, France Inserm workshop, St Raphael Joint models for time-to-event

### CS229 Lecture notes. Andrew Ng

CS229 Lecture notes Andrew Ng Part X Factor analysis Whenwehavedatax (i) R n thatcomesfromamixtureofseveral Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually

### CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS

Examples: Growth Modeling And Survival Analysis CHAPTER 6 EXAMPLES: GROWTH MODELING AND SURVIVAL ANALYSIS Growth models examine the development of individuals on one or more outcome variables over time.

### Multilevel Modeling of Complex Survey Data

Multilevel Modeling of Complex Survey Data Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 University of California, Los Angeles 2 Abstract We describe a multivariate, multilevel, pseudo maximum

### Bayesian Statistics in One Hour. Patrick Lam

Bayesian Statistics in One Hour Patrick Lam Outline Introduction Bayesian Models Applications Missing Data Hierarchical Models Outline Introduction Bayesian Models Applications Missing Data Hierarchical

### 1 Logit & Probit Models for Binary Response

ECON 370: Limited Dependent Variable 1 Limited Dependent Variable Econometric Methods, ECON 370 We had previously discussed the possibility of running regressions even when the dependent variable is dichotomous

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

### ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

### Bayesian Techniques for Parameter Estimation. He has Van Gogh s ear for music, Billy Wilder

Bayesian Techniques for Parameter Estimation He has Van Gogh s ear for music, Billy Wilder Statistical Inference Goal: The goal in statistical inference is to make conclusions about a phenomenon based

### Curriculum Vitae of Francesco Bartolucci

Curriculum Vitae of Francesco Bartolucci Department of Economics, Finance and Statistics University of Perugia Via A. Pascoli, 20 06123 Perugia (IT) email: bart@stat.unipg.it http://www.stat.unipg.it/bartolucci

### Partitioning of Variance in Multilevel Models. Dr William J. Browne

Partitioning of Variance in Multilevel Models Dr William J. Browne University of Nottingham with thanks to Harvey Goldstein (Institute of Education) & S.V. Subramanian (Harvard School of Public Health)

### Bayesian Learning. CSL465/603 - Fall 2016 Narayanan C Krishnan

Bayesian Learning CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Bayes Theorem MAP Learners Bayes optimal classifier Naïve Bayes classifier Example text classification Bayesian networks

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### Event history analysis

Event history analysis Jeroen K. Vermunt Department of Methodology and Statistics Tilburg University 1 Introduction The aim of event history analysis is to explain why certain individuals are at a higher

### Lecture 12: Generalized Linear Models for Binary Data

Lecture 12: Generalized Linear Models for Binary Data Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of

### A Tutorial on Logistic Regression

A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each

### Stat260: Bayesian Modeling and Inference Lecture Date: February 1, Lecture 3

Stat26: Bayesian Modeling and Inference Lecture Date: February 1, 21 Lecture 3 Lecturer: Michael I. Jordan Scribe: Joshua G. Schraiber 1 Decision theory Recall that decision theory provides a quantification

### 6. If there is no improvement of the categories after several steps, then choose new seeds using another criterion (e.g. the objects near the edge of

Clustering Clustering is an unsupervised learning method: there is no target value (class label) to be predicted, the goal is finding common patterns or grouping similar examples. Differences between models/algorithms

### A model-based approach to goodness-of-fit evaluation in item response theory

A MODEL-BASED APPROACH TO GOF EVALUATION IN IRT 1 A model-based approach to goodness-of-fit evaluation in item response theory DL Oberski JK Vermunt Dept of Methodology and Statistics, Tilburg University,

### Mixtures of Robust Probabilistic Principal Component Analyzers

Mixtures of Robust Probabilistic Principal Component Analyzers Cédric Archambeau, Nicolas Delannay 2 and Michel Verleysen 2 - University College London, Dept. of Computer Science Gower Street, London WCE

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Model based clustering of longitudinal data: application to modeling disease course and gene expression trajectories

1 2 3 Model based clustering of longitudinal data: application to modeling disease course and gene expression trajectories 4 5 A. Ciampi 1, H. Campbell, A. Dyacheno, B. Rich, J. McCuser, M. G. Cole 6 7

### Estimation and comparison of multiple change-point models

Journal of Econometrics 86 (1998) 221 241 Estimation and comparison of multiple change-point models Siddhartha Chib* John M. Olin School of Business, Washington University, 1 Brookings Drive, Campus Box