Latent Variable Models for Binary Data

Suppose that for a given vector of explanatory variables x, the latent variable U has a continuous cumulative distribution function F(u; x), and that the binary response Y = 1 is recorded if and only if U > 0:

θ = Pr(Y = 1 | x) = 1 − F(0; x).

Since U is not directly observed, there is no loss of generality in taking the critical value (i.e., the cutoff point) to be 0. In addition, we can take the standard deviation of U (or some other measure of dispersion) to be 1, again without loss of generality.

Probit Models

For example, if U ~ N(x_i'β, 1), it follows that

θ_i = Pr(Y = 1 | x_i) = Φ(x_i'β),

where Φ(·) is the standard normal cumulative distribution function,

Φ(t) = (2π)^(−1/2) ∫_{−∞}^{t} exp(−z²/2) dz.

The relation is linearized by the inverse normal transformation:

Φ^(−1)(θ_i) = x_i'β = Σ_{j=1}^{p} x_ij β_j.
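As a small numerical sketch (not part of the original notes; Python's standard-library `statistics.NormalDist` stands in for Φ, and the linear-predictor values are hypothetical), the probit link and its linearizing inverse can be checked directly:

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal CDF
Phi_inv = NormalDist().inv_cdf  # probit (inverse normal) transformation

# Hypothetical linear predictors x_i'beta for three observations
eta = [-1.0, 0.0, 1.5]

# Probit model: theta_i = Phi(x_i'beta)
theta = [Phi(e) for e in eta]

# The inverse normal transformation linearizes the relation:
# Phi^{-1}(theta_i) recovers x_i'beta
eta_back = [Phi_inv(t) for t in theta]
```

Note that a linear predictor of 0 maps to a success probability of exactly 1/2, the center of the probit curve.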

We have regarded the cutoff value of U as fixed and the mean of U as changing with x. Alternatively, one could assume that the distribution of U is fixed and allow the critical value to vary with x (e.g., dose). In toxicology studies where dose is the explanatory variable, it makes sense to let V denote the minimum dose level needed to produce a response (i.e., the tolerance).

Under the second formulation, y_i = 1 if x_i'β > v_i. It follows that

Pr(Y = 1 | x_i) = Pr(V ≤ x_i'β).

Note that the shape of the dose-response curve is determined by the distribution function of V. If V ~ N(0, 1), then Pr(Y = 1 | x_i) = Φ(x_i'β), and the U and V formulations are equivalent. The U formulation is more common.

Latent Utilities & Choice Models

Suppose that Fred is choosing between two brands of a product (say, Ben & Jerry's or Häagen-Dazs). Fred has a utility for Ben & Jerry's (denoted by Z_i1) and a utility for Häagen-Dazs (denoted by Z_i2). Letting the difference in utilities be represented by the normal linear model, we have

U_i = Z_i1 − Z_i2 = x_i'β + ε_i, where ε_i ~ N(0, 1).

If Fred has a higher utility for Ben & Jerry's, then Z_i1 > Z_i2, so U_i > 0 and Fred will choose Ben & Jerry's (Y_i = 1).

This latent utility formulation is again equivalent to a probit model for the binary response. The generalization to a multinomial response is straightforward: introduce k latent utilities instead of 2, and let the individual's response (i.e., choice) correspond to the category with maximum utility. This is referred to as a discrete choice model.
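The equivalence of the latent-utility and probit formulations can be illustrated by simulation (a sketch with a hypothetical linear-predictor value, not data from the notes): simulate U_i = x_i'β + ε_i and record Y_i = 1 whenever U_i > 0.

```python
import random
from statistics import NormalDist

random.seed(1)
Phi = NormalDist().cdf

# Hypothetical utility-difference linear predictor x_i'beta for one individual
xb = 0.4

# Simulate U_i = x_i'beta + eps_i with eps_i ~ N(0, 1); Y_i = 1 when U_i > 0
n = 200_000
y = [1 if xb + random.gauss(0.0, 1.0) > 0 else 0 for _ in range(n)]

empirical = sum(y) / n
print(empirical, Phi(xb))  # empirical choice frequency vs. Phi(x_i'beta)
```

The empirical frequency of choosing brand 1 matches Φ(x_i'β) up to Monte Carlo error, which is the probit model for the binary response.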

Comments:

- Certain types of models are preferred in certain applications; the probit is common in bioassay and the social sciences.
- The ability to calculate posterior densities for any functional makes parameter interpretation less important in the Bayesian approach.
- Ideally the model will be motivated by prior information and fit to the current data. Practically, computational convenience also plays a role, particularly in complex settings.

Logistic Regression

The normal form is only one possibility for the distribution of U. Another is the logistic distribution with location x_i'β and unit scale. The logistic distribution has cumulative distribution function

F(u) = exp(u − x_i'β) / {1 + exp(u − x_i'β)},

so that

F(0; x_i) = 1 / {1 + exp(x_i'β)}.

It follows that

Pr(Y = 1 | x_i) = Pr(U > 0 | x_i) = 1 − F(0; x_i) = 1 / {1 + exp(−x_i'β)}.

To linearize this relation, we take the logit transformation of both sides:

log{θ_i / (1 − θ_i)} = x_i'β.
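As a quick check (with an illustrative value for the linear predictor, not from the notes), the success probability and the logit transformation invert each other:

```python
import math

xb = 0.7  # hypothetical value of x_i'beta

# Success probability under the logistic latent-variable model:
# Pr(Y = 1 | x_i) = 1 / {1 + exp(-x_i'beta)}
theta = 1.0 / (1.0 + math.exp(-xb))

# The logit transformation linearizes the relation:
# log{theta / (1 - theta)} = x_i'beta
logit = math.log(theta / (1.0 - theta))
```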

Some Generalizations of the Logistic Model

Logistic regression assumes a restricted dose-response shape, but it is possible to relax this restriction. Aranda-Ordaz (1981) proposed two families of linearizing transformations, which can easily be inverted and which span a range of forms.

The first family, which is restricted to symmetric cases (i.e., it is invariant to interchanging success and failure), is

(2/ν) {θ^ν − (1 − θ)^ν} / {θ^ν + (1 − θ)^ν}.

In the limit as ν → 0 this is the logit, and for ν = 1 it is linear in θ. The second family is

log[{(1 − θ)^(−ν) − 1} / ν],

which reduces to the extreme value model as ν → 0 and to the logistic when ν = 1.
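The two families and their limiting cases can be verified numerically (a sketch; the function names are mine, not from the notes):

```python
import math

def ao_symmetric(theta, nu):
    """Aranda-Ordaz symmetric family:
    (2/nu) * {theta^nu - (1-theta)^nu} / {theta^nu + (1-theta)^nu}."""
    t, c = theta ** nu, (1.0 - theta) ** nu
    return 2.0 * (t - c) / (nu * (t + c))

def ao_asymmetric(theta, nu):
    """Aranda-Ordaz asymmetric family: log[{(1-theta)^(-nu) - 1} / nu]."""
    return math.log(((1.0 - theta) ** (-nu) - 1.0) / nu)

def logit(theta):
    return math.log(theta / (1.0 - theta))

theta = 0.3
# As nu -> 0 the symmetric family approaches the logit;
# at nu = 1 it is linear in theta, namely 2(2*theta - 1).
print(ao_symmetric(theta, 1e-6), logit(theta))
# The asymmetric family at nu = 1 is exactly the logit.
print(ao_asymmetric(theta, 1.0), logit(theta))
```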

When there is doubt about the transformation, a formal approach is to use one or the other of the above families and to fit the resulting model over a range of possible values of ν. A profile likelihood for ν can then be obtained by plotting the maximized likelihood against ν (a frequentist approach).

Potentially, one could choose a standard form, such as the logistic, if the corresponding value of ν falls within the 95% profile likelihood confidence region. Alternatively, we could choose a prior density for ν and implement a Bayesian approach.

DDE & Preterm Birth Example

Scientific interest: the association between DDE exposure and preterm birth, adjusting for possible confounding variables. Data are from the US Collaborative Perinatal Project (CPP): n = 2380 children, of whom 361 were born preterm. Analysis: a Bayesian analysis using a probit model.

Model:

- y_i = 1 if preterm birth and y_i = 0 if full-term birth
- Pr(y_i = 1 | x_i, β) = Φ(x_i'β)
- x_i = (1, dde_i, x_i3, ..., x_i7)'
- x_i3, ..., x_i7 represent possible confounders
- β_1 = intercept, β_2 = dde slope

Prior, Likelihood & Posterior

Prior: π(β) = N(β; 0, Σ_β)

Likelihood: π(y | β, X) = ∏_{i=1}^{n} Φ(x_i'β)^{y_i} {1 − Φ(x_i'β)}^{1−y_i}

Posterior: π(β | y, X) ∝ π(β) π(y | β, X). No closed form is available for the normalizing constant.

Maximum Likelihood Results

Parameter    MLE        SE        Z stat    p-value
β_1         -1.08068    0.04355   -24.816   < 2e-16
β_2          0.17536    0.02909     6.028   1.67e-09
β_3         -0.12817    0.03528    -3.633   0.000280
β_4          0.11097    0.03366     3.297   0.000978
β_5         -0.01705    0.03405    -0.501   0.616659
β_6         -0.08216    0.03576    -2.298   0.021571
β_7          0.05462    0.06473     0.844   0.398721

β_2 = dde slope (a highly significant increasing trend)

Augmented Data Posterior

Augment the observed data {y_i, x_i} with a latent z_i. The probit model can then be expressed in hierarchical form as

y_i = 1(z_i > 0), z_i ~ N(x_i'β, 1).

The augmented data posterior is

π(z, β | y, X) ∝ { ∏_{i=1}^{n} π(y_i | z_i, x_i, β) } { ∏_{i=1}^{n} π(z_i | x_i, β) } π(β),

where the conditional distribution of y_i given z_i, x_i, β is

π(y_i | z_i, x_i, β) = 1(z_i > 0) y_i + 1(z_i ≤ 0)(1 − y_i).

For Gibbs sampling, we need the full conditional posterior distribution of each unknown given all other unknowns in the model.

Conditional posterior of z_i given y_i = 1, x_i, β:

π(z_i | y_i = 1, x_i, β) ∝ 1(z_i > 0) (2π)^(−1/2) exp{−(z_i − x_i'β)²/2}
                         = 1(z_i > 0) N(z_i; x_i'β, 1) / {1 − F(0; x_i'β, 1)},

i.e., N(z_i; x_i'β, 1) truncated below by 0.

Conditional posterior of z_i given y_i = 0, x_i, β: N(z_i; x_i'β, 1) truncated above by 0.

Conditional posterior of β given z, y, X:

π(β | z, y, X) = N(β̂, V_β), with V_β = (Σ_β^(−1) + X'X)^(−1) and β̂ = V_β (Σ_β^(−1) β_0 + X'z).

To implement Gibbs sampling, we iteratively sample from these full conditional posterior distributions a large number of times. An initial burn-in is discarded to allow convergence to the stationary distribution, and inferences are based on posterior summaries calculated from the draws from the joint posterior distribution.
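A minimal data-augmentation Gibbs sampler along these lines can be sketched as follows. This is not the notes' own code: the data are synthetic (the CPP data are not reproduced here), NumPy is assumed, and the truncated-normal draws use inverse-CDF sampling via `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rng = np.random.default_rng(0)

# --- Synthetic data: probit model with a known coefficient vector ---
n, p = 400, 2
beta_true = np.array([-1.0, 1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(int)

# Prior beta ~ N(beta0, Sigma_beta) with beta0 = 0, Sigma_beta = 4 I
beta0 = np.zeros(p)
Sigma_inv = np.eye(p) / 4.0

# Conditional covariance V_beta = (Sigma_beta^{-1} + X'X)^{-1}, fixed across iterations
V = np.linalg.inv(Sigma_inv + X.T @ X)

def rtruncnorm(mean, positive):
    """One draw from N(mean, 1) truncated to z > 0 (positive) or z <= 0, by inverse CDF."""
    p0 = nd.cdf(0.0 - mean)                                  # Pr(z <= 0)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)                      # guard the extreme tails
    return mean + nd.inv_cdf(u)

beta = np.zeros(p)                                           # starting value beta = 0
draws = []
for it in range(1200):
    eta = X @ beta
    # Step 1: z_i | y_i, beta ~ N(x_i'beta, 1) truncated by the sign constraint
    z = np.array([rtruncnorm(m, yi == 1) for m, yi in zip(eta, y)])
    # Step 2: beta | z ~ N(beta_hat, V_beta)
    beta_hat = V @ (Sigma_inv @ beta0 + X.T @ z)
    beta = rng.multivariate_normal(beta_hat, V)
    if it >= 200:                                            # discard burn-in
        draws.append(beta)

draws = np.array(draws)
print("posterior means:", draws.mean(axis=0))
```

With a moderate number of iterations the posterior means land near the coefficients used to generate the data; the same two conditional updates, applied to the CPP design matrix, are what the analysis in these notes runs.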

DDE Application

We chose a ridge-regression-type conjugate prior, π(β) = N(β; β_0, Σ_β), with β_0 = 0 and Σ_β = 4 I_{7×7}. An alternative default-type conjugate prior is obtained by letting Σ_β = g (X'X)^(−1), where g is a small positive constant. This prior, which is related to Zellner's g-prior, has some nice properties.

Another possibility would be the uniform improper prior, π(β) ∝ 1. Ideally, one would choose an informative prior to formally incorporate outside information into the analysis. For this example the sample size is large, so there tend to be minimal differences among the posteriors obtained using various diffuse to moderately informative priors with support on R.

Gibbs Sampling Results

We chose β = 0 as the starting value. The MLEs might provide a better choice, but the results should not depend on the starting values. The algorithm was run for 1,000 iterations, with the first 100 iterations discarded as a burn-in. For standard GLMs convergence tends to occur rapidly, and 100 iterations appeared to be sufficient in this case based on diagnostic plots. For more complex models, many more iterations may be needed.

[Figure: trace plots of the intercept (β_1) and the slope (β_2) across the 1,000 Gibbs iterations]

Inferences on a Regression Coefficient

Bayesian inferences are often based on examination of marginal posterior densities. These densities can be estimated by applying kernel smoothing to the draws collected after burn-in. One can also summarize the posterior using the mean, the median, and a 95% credible interval.
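These summaries are direct computations on the stored draws. A sketch using stand-in draws for one coefficient (the numbers are illustrative, not the CPP output):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for post-burn-in MCMC draws of a single coefficient (hypothetical values)
draws = rng.normal(loc=0.17, scale=0.03, size=900)

summary = {
    "mean": draws.mean(),
    "median": np.median(draws),
    "sd": draws.std(ddof=1),
    # Equal-tailed 95% credible interval from the empirical quantiles
    "95% CI": (np.percentile(draws, 2.5), np.percentile(draws, 97.5)),
}
print(summary)
```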

Posterior Summaries

Parameter    Mean     Median   SD     95% credible interval
β_1         -1.08     -1.08    0.04   (-1.16, -1.01)
β_2          0.17      0.17    0.03   (0.12, 0.23)
β_3         -0.13     -0.13    0.04   (-0.20, -0.05)
β_4          0.11      0.11    0.03   (0.05, 0.18)
β_5         -0.02     -0.02    0.03   (-0.08, 0.05)
β_6         -0.08     -0.08    0.04   (-0.15, -0.02)
β_7          0.05      0.06    0.06   (-0.07, 0.18)

[Figure: kernel estimate of the marginal posterior density of the DDE slope (β_2)]

Inferences on Functionals

Often it is not the regression parameter itself that is of primary interest. One may instead want to estimate functionals, such as the mean response at different values of a predictor. By applying the function to every post-burn-in iteration of the MCMC algorithm, one obtains samples from the marginal posterior density of the unknown of interest.
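For example, posterior draws of the dose-response functional Pr(preterm | dde) = Φ(β_1 + β_2·dde) are obtained by pushing each saved (β_1, β_2) draw through Φ. A sketch with stand-in draws (in practice one would use the saved Gibbs output and include the confounder terms in the linear predictor):

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rng = np.random.default_rng(2)

# Stand-in draws of (intercept, dde slope) from the joint posterior (hypothetical)
draws = np.column_stack([rng.normal(-1.08, 0.04, 1000),
                         rng.normal(0.17, 0.03, 1000)])

# Evaluate the functional Phi(beta_1 + beta_2 * dde) on a grid of dde values
dde_grid = np.linspace(-2, 2, 9)
prob = np.array([[nd.cdf(b1 + b2 * d) for d in dde_grid] for b1, b2 in draws])

post_mean = prob.mean(axis=0)                            # posterior mean curve
lower, upper = np.percentile(prob, [2.5, 97.5], axis=0)  # pointwise 95% intervals
```

Each row of `prob` is one draw of the whole curve, so pointwise means and credible bands come straight from column-wise summaries.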

[Figure: estimated dose-response curve, Pr(preterm birth) versus serum DDE (mg/L)]