Bayesian Hidden Markov Models for Alcoholism Treatment Trial Data May 12, 2008
Co-Authors: Dylan Small, Statistics Department, UPenn; Kevin Lynch, Treatment Research Center, UPenn; Steve Maisto, Psychology Department, Syracuse University; Dave Oslin, Treatment Research Center, UPenn
The Problem N subjects measured on T days: daily drink counts. Want to estimate the average treatment effect on outcome. [Table: one row per subject (1 to 240) and one column per day (1 to 168); each cell holds that day's observation, coded 1, 2, or 3, and some cells are blank.]
Sample Time Series [Figure: four panels plotting Drinks (coded 1 to 3) against Day (0 to 150) for Subjects 61, 108, 142, and 183.]
The Goal of Treatment The main goal: Reduce Alcohol Consumption 1. Does the treatment reduce the frequency of all drinking events, or only certain types of drinking events? Is moderate drinking an acceptable outcome? How does the treatment affect different complex drinking patterns and behaviors? 2. Does the treatment reduce the frequency and/or duration of relapses? What is a relapse? Everybody agrees on the notion of a relapse, but there is no consensus on an operational definition of relapse.
What is the Outcome? It's complicated. The subjects are recovering alcoholics, whose drinking behaviors are complex processes that evolve and change through time. Simple models lack the structure to adequately describe these processes (Wang et al., 2002).
Simple Models
Time until first drink/relapse (ignores all behavior after the first drink)
Percentage of days drinking (ignores the amount of alcohol consumed)
Multiple failure time models (require a definition of relapse)
[Figure: Drinks per Day, $Y_{it}$ (coded 1 to 3), plotted against day (0 to 30).]
HMM Motivation A well-known theory of relapse, the cognitive-behavioral model of relapse (McKay et al., 2006; Marlatt and Gordon, 1985), suggests that the cause of a relapse is two-fold: 1. First, the subject must be in a mental and/or physical condition in which he or she is vulnerable to drinking. That is, if presented with an opportunity to drink, the subject would not be able to mount a coping response. 2. Second, the subject must actually encounter such a high-risk drinking situation.
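HMM structure $Y_{it}$ is the observation for subject $i$ at time $t$. $H_{it}$ is the hidden state for subject $i$ at time $t$. [Diagram: the observations $Y_{i1}, Y_{i2}, \ldots, Y_{i,t-1}, Y_{it}, Y_{i,t+1}, \ldots, Y_{i,T-1}, Y_{iT}$ in one row, each paired with the hidden state $H_{i1}, H_{i2}, \ldots, H_{i,t-1}, H_{it}, H_{i,t+1}, \ldots, H_{i,T-1}, H_{iT}$ for the same day in the row below.]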
A Simple HMM with no covariates
The complete-data likelihood for an HMM factors into three parts:
\begin{align}
p(Y, H \mid \theta) = {} & \prod_{i=1}^{N} p(H_{i1} \mid \theta) \tag{1} \\
& \times \prod_{i=1}^{N} \prod_{t=2}^{T} p(H_{it} \mid H_{i,t-1}, \theta) \tag{2} \\
& \times \prod_{i=1}^{N} \prod_{t=1}^{T} p(Y_{it} \mid H_{it}, \theta), \tag{3}
\end{align}
where $Y$ and $H$ denote observations and hidden states, and parts (1), (2), and (3) refer to the initial state distribution, the hidden state transitions, and the observations, respectively.
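As a minimal sketch of the three-part factorization above, the function below adds up the log of each part for given observations and hidden states. The names `complete_data_loglik`, `pi`, `Q`, and `P`, and the 0-based coding of states and observation categories, are assumptions made for illustration and are not taken from the slides.

```python
import math

def complete_data_loglik(Y, H, pi, Q, P):
    """Log of p(Y, H | theta) for an HMM without covariates.

    Y, H : lists of per-subject sequences (0-based category / state codes).
    pi   : initial state probabilities, pi[s].
    Q    : transition matrix, Q[r][s] = p(H_t = s | H_{t-1} = r).
    P    : observation matrix, P[s][y] = p(Y_t = y | H_t = s).
    """
    total = 0.0
    for y_i, h_i in zip(Y, H):
        # Part (1): initial state distribution.
        total += math.log(pi[h_i[0]])
        # Part (2): hidden state transitions for t = 2, ..., T.
        for t in range(1, len(h_i)):
            total += math.log(Q[h_i[t - 1]][h_i[t]])
        # Part (3): observations given hidden states for t = 1, ..., T.
        for t in range(len(h_i)):
            total += math.log(P[h_i[t]][y_i[t]])
    return total
```

Working on the log scale avoids numerical underflow when the product runs over many subjects and days.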
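Simple HMM Fit: S=5
Fit multinomial distributions for hidden state transitions and observations conditional on hidden states. Data is pooled across individuals:
$$\hat{\pi} = (.79, .11, .01, .07, .01)$$
$$\hat{Q} = \begin{pmatrix} .99 & .00 & .00 & .00 & .01 \\ .01 & .98 & .01 & .00 & .00 \\ .01 & .00 & .95 & .00 & .04 \\ .01 & .00 & .00 & .98 & .01 \\ .05 & .00 & .02 & .02 & .91 \end{pmatrix}, \qquad \hat{P} = \begin{pmatrix} .99 & .01 & .00 \\ .71 & .26 & .03 \\ .08 & .86 & .06 \\ .65 & .06 & .29 \\ .03 & .01 & .96 \end{pmatrix}$$
where $\hat{Q}$ and $\hat{P}$ denote the hidden state transition matrix and the observation distributions, respectively.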
Interpretation of hidden states for S=5
1. Large probabilities on the diagonal of $\hat{Q}$ mean that hidden states are persistent.
2. Observation distributions ($\hat{P}$) are clinically interpretable:

State   Y_it = 1   Y_it = 2   Y_it = 3   Interpretation
A       .99        .01        .00        Abstinence
IM      .71        .26        .03        Intermittent Moderate Drinking
SM      .08        .86        .06        Steady Moderate Drinking
IH      .65        .06        .29        Intermittent Heavy Drinking
SH      .03        .01        .96        Steady Heavy Drinking

Fitting additional latent states (S = 6, 7) yielded no additional interpretable drinking behaviors.
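One way to quantify persistence: in a time-homogeneous Markov chain, the number of consecutive days spent in state s is geometric with mean 1 / (1 - q_ss). The sketch below applies that formula to the diagonal of the pooled S = 5 transition matrix reported for the simple fit. The function name `expected_dwell_times` is an assumption made for illustration, and the calculation treats the rounded estimates as exact.

```python
def expected_dwell_times(Q):
    """Expected number of consecutive days in each state: 1 / (1 - Q[s][s])."""
    return [1.0 / (1.0 - Q[s][s]) for s in range(len(Q))]

# Pooled S = 5 transition matrix (rows: A, IM, SM, IH, SH), as reported.
Q_HAT = [
    [0.99, 0.00, 0.00, 0.00, 0.01],
    [0.01, 0.98, 0.01, 0.00, 0.00],
    [0.01, 0.00, 0.95, 0.00, 0.04],
    [0.01, 0.00, 0.00, 0.98, 0.01],
    [0.05, 0.00, 0.02, 0.02, 0.91],
]

dwell = expected_dwell_times(Q_HAT)
for name, days in zip(["A", "IM", "SM", "IH", "SH"], dwell):
    print(f"{name}: about {days:.0f} days")
```

Under these assumptions the expected stays are roughly 100, 50, 20, 50, and 11 days for A, IM, SM, IH, and SH.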
Choosing the number of Hidden States
10-fold CV to make out-of-sample predictions; measure deviance
$$D = -2 \sum_{i=1}^{N} \sum_{t=11}^{T} \log \hat{P}(Y_{it} = y_{it}).$$
[Figure: three panels of out-of-sample Deviance (axis from 0.65 to 0.80): HMM against Number of Hidden States, Markov against Order, and MTD against Number of Lags.]
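A minimal sketch of the deviance calculation, taking deviance as minus twice the summed log predictive probability of the held-out observations from day 11 onward. The function name `deviance`, the nested-list layout of `pred_probs`, the 0-based category codes, and the choice to skip missing days (coded `None`) are all assumptions made for illustration. The slides do not say how missing days or any rescaling of the deviance were handled.

```python
import math

def deviance(pred_probs, y, burn_in=10):
    """-2 * sum over subjects and days of log predicted P(Y_it = y_it).

    pred_probs[i][t][k] : out-of-sample predicted probability of category k.
    y[i][t]             : observed category (0-based), or None if missing.
    burn_in             : number of initial days excluded (10 gives t = 11, ..., T).
    """
    total = 0.0
    for probs_i, y_i in zip(pred_probs, y):
        for t in range(burn_in, len(y_i)):
            if y_i[t] is None:
                continue  # assumption: missing days contribute nothing
            total += math.log(probs_i[t][y_i[t]])
    return -2.0 * total
```

Smaller values indicate better out-of-sample prediction, so candidate models can be ranked by this number when it is computed on the same held-out days.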
Question 1: Is Moderate Drinking OK? Question: If the hidden states are persistent, can a subject drink moderately and not resort to heavy drinking soon after? Define states 4 and 5 as Relapse States. [Figure: Probability of Avoiding Relapse as a Function of Time; Probability (0.0 to 1.0) against Day (0 to 175), with one curve each for Initial State = 1 (A), Initial State = 2 (IM), and Initial State = 3 (SM).]
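One way such a curve could be computed is sketched below: treat the relapse states as absorbing and propagate only the probability mass that has stayed within the non-relapse states. This is an illustration under stated assumptions, not necessarily how the plotted curves were produced. It assumes a time-homogeneous chain, uses the pooled S = 5 transition matrix from the simple fit, and the names `prob_avoid_relapse` and `Q_HAT` are hypothetical.

```python
def prob_avoid_relapse(Q, safe_states, start, n_days):
    """P(no visit to a relapse state during the next n_days | initial state).

    Returns a list whose entry d is the probability of having stayed inside
    safe_states for d consecutive transitions.
    """
    # alive[s] = P(currently in safe state s and never left the safe set).
    alive = {s: (1.0 if s == start else 0.0) for s in safe_states}
    curve = [1.0]
    for _ in range(n_days):
        alive = {
            s: sum(alive[r] * Q[r][s] for r in safe_states)
            for s in safe_states
        }
        curve.append(sum(alive.values()))
    return curve

# Pooled S = 5 transition matrix; 0-based states 0..4 stand for A, IM, SM, IH, SH.
Q_HAT = [
    [0.99, 0.00, 0.00, 0.00, 0.01],
    [0.01, 0.98, 0.01, 0.00, 0.00],
    [0.01, 0.00, 0.95, 0.00, 0.04],
    [0.01, 0.00, 0.00, 0.98, 0.01],
    [0.05, 0.00, 0.02, 0.02, 0.91],
]

SAFE = [0, 1, 2]  # states 1 to 3 in the slides; states 4 and 5 are relapse
curves = {s: prob_avoid_relapse(Q_HAT, SAFE, s, 175) for s in SAFE}
```

Each curve starts at 1 and can only decrease, because mass that enters a relapse state is never returned to the safe set.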
Question 2: What is a Relapse?
Currently, there is no universally agreed-upon operational definition of relapse. Furthermore, different definitions can change the estimated treatment effects (Maisto et al., 2003). Definitions in use include:
Any drink of alcohol
A day of heavy drinking
Four consecutive drinking days (any amount of alcohol)
Any drink of alcohol that follows at least 4 days of abstinence
The HMM offers a new data-based definition: any time point at which a subject has a high probability of being in hidden state 4 or 5 (Intermittent Heavy Drinking or Steady Heavy Drinking). Estimate the most likely hidden state sequence for each subject using the Viterbi algorithm.
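The Viterbi step can be sketched as follows for a single subject with fully observed data and no covariates. The function name `viterbi`, the 0-based coding of states and categories, and the helper `_log` (which maps a zero probability to minus infinity) are assumptions made for illustration.

```python
import math

def _log(p):
    """Log that tolerates zero probabilities."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(y, pi, Q, P):
    """Most likely hidden state sequence for one observation sequence y."""
    S = len(pi)
    # delta[s]: best log-probability of any path ending in state s at time t.
    delta = [_log(pi[s]) + _log(P[s][y[0]]) for s in range(S)]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for t in range(1, len(y)):
        prev = delta
        ptr, delta = [], []
        for s in range(S):
            best_r = max(range(S), key=lambda r: prev[r] + _log(Q[r][s]))
            ptr.append(best_r)
            delta.append(prev[best_r] + _log(Q[best_r][s]) + _log(P[s][y[t]]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(S), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def relapse_days(path, relapse_states):
    """Indices of days whose decoded state is a relapse state."""
    return [t for t, s in enumerate(path) if s in relapse_states]
```

With five states coded 0 to 4, calling `relapse_days(path, {3, 4})` would flag the days decoded as Intermittent Heavy or Steady Heavy Drinking.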
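Most Likely Sequence (1) [Figure: Subject 34; observed $Y_{it}$ (1 to 3) and the most likely Latent State, from 1 (A), 2 (IM), 3 (SM), 4 (IH) to 5 (SH), against day (0 to 150).]
Most Likely Sequence (2) [Figure: Subject 126; observed $Y_{it}$ (1 to 3) and the most likely Latent State, from 1 (A), 2 (IM), 3 (SM), 4 (IH) to 5 (SH), against day (0 to 150).]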
A More Complex HMM Incorporate: covariates, possibly time-varying; random effects; missing data, assuming MAR.
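The Model
For hidden state transition probabilities, use a multinomial logit model, where
$$P(H_{it} = s \mid H_{i,t-1} = r, X_{it}, \beta) = \frac{\exp(x^{Q}_{it} \beta_{rs})}{\sum_{k} \exp(x^{Q}_{it} \beta_{rk})}, \qquad \beta_{rs0} \sim N(\mu_{rs}, \sigma_{rs}).$$
For observation probabilities, use an ordinal probit model.
[Figure: the horizontal axis $x^{P}_{sj} \beta_s$ is divided by the cutpoints $\gamma_{s1}$ and $\gamma_{s2}$ into three regions whose probabilities are $P(y_{sj} = 1)$, $P(y_{sj} = 2)$, and $P(y_{sj} = 3)$.]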
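The two link functions can be sketched as follows. For the transition model, the sketch takes the first destination state as the reference category with all coefficients fixed at zero. For the observation model, it assumes the usual ordinal probit form with a unit-variance latent normal centered at the linear predictor and cutpoints gamma1 < gamma2. The function names, argument layout, and that exact parametrization are assumptions made for illustration and may differ from the authors' implementation.

```python
import math

def transition_probs(x, beta_r):
    """One row of the transition matrix under a multinomial logit model.

    x      : covariate vector (including a leading 1 for the intercept).
    beta_r : beta_r[s] is the coefficient vector for destination state s;
             beta_r[0] is all zeros (reference category).
    """
    scores = [sum(xj * bj for xj, bj in zip(x, b)) for b in beta_r]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in scores]
    total = sum(weights)
    return [w / total for w in weights]

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probit_probs(x, beta_s, gamma1, gamma2):
    """P(y = 1), P(y = 2), P(y = 3) for a three-category ordinal probit."""
    mean = sum(xj * bj for xj, bj in zip(x, beta_s))
    c1 = norm_cdf(gamma1 - mean)
    c2 = norm_cdf(gamma2 - mean)
    return [c1, c2 - c1, 1.0 - c2]
```

In this parametrization, raising the linear predictor shifts probability from the lowest category toward the highest while the cutpoints stay fixed.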
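The Hidden State Transition Matrix
The hidden state transition matrix parameters are organized as follows (for S = 3 hidden states), with one row per previous state and one column per current state:

     1                  2                                                  3
1    {0} (0, 0, ...)    {$\beta_{120i}$} ($\beta_{121}, \beta_{122}, \ldots$)    {$\beta_{130i}$} ($\beta_{131}, \beta_{132}, \ldots$)
2    {0} (0, 0, ...)    {$\beta_{220i}$} ($\beta_{221}, \beta_{222}, \ldots$)    {$\beta_{230i}$} ($\beta_{231}, \beta_{232}, \ldots$)
3    {0} (0, 0, ...)    {$\beta_{320i}$} ($\beta_{321}, \beta_{322}, \ldots$)    {$\beta_{330i}$} ($\beta_{331}, \beta_{332}, \ldots$)

where braces {$\beta$} denote a set of random effects, and the rest are fixed effects.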
The Data
The outcome (N = 240 subjects and T = 168 days) is distributed as follows:

Y          %
1          68
2          7
3          8
Missing    17
Total      100
Covariates This clinical trial, conducted at UPenn's Treatment Research Center, had 6 arms: treatment/control for Naltrexone, and two therapies vs. control. In the hidden state transition matrix, we include: 1. Treatment (Naltrexone) 2. Therapy 1 3. Therapy 2 4. Female 5. Time. In the observation model, we include: 1. Weekend indicator 2. Past Drinking Behavior.
The Gibbs Sampler
1. Initialize the parameters $\theta = (\beta, \eta, \gamma, \pi, \mu, \sigma)$.
2. Draw $H \mid Y_{obs}, \theta$ from its full conditional distribution by evaluating the likelihood using the forward recursion, and then using a stochastic backward recursion, for all subjects i = 1, 2, ..., N (Scott, 2002).
3. Draw $\beta \mid H$ from its posterior using Scott's DAFE algorithm (2007), which involves augmented variables and a Metropolis-Hastings step, or using a random-walk Metropolis step.
4. Draw $\mu \mid \beta, \sigma$ from their full conditional distributions, assuming flat or weakly informative priors (Gelman, forthcoming).
5. Draw $\sigma \mid \beta, \mu$ from their full conditional distributions.
6. Draw $Y_{mis} \mid H, \eta, \gamma$, assuming the data are missing at random (MAR), using the current parameter draws.
7. Draw $\gamma \mid H, Y_{obs}, Y_{mis}, \eta$ using the Cowles (1996) random-walk Metropolis-Hastings step.
8. Draw $\eta \mid H, Y_{obs}, Y_{mis}, \gamma$ in the standard data augmentation way (Albert and Chib, 1993).
9. Draw $\pi \mid H$ from its full conditional Dirichlet distribution.
10. Repeat steps 2-9 for g = 2, ..., G.
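Step 2, the forward recursion followed by a stochastic backward recursion, can be sketched for one subject as below. This is a simplified illustration, not the authors' implementation: it assumes fixed matrices `pi`, `Q`, and `P` without covariates or random effects, and it treats a missing observation (coded `None`) as contributing a factor of 1 to the likelihood, which is one common way to condition on the observed data only. The function name `ffbs` and its arguments are hypothetical.

```python
import random

def ffbs(y, pi, Q, P, rng=random):
    """Draw one hidden state path from p(H | y_obs, theta) for one subject."""
    S = len(pi)

    def emit(s, obs):
        # Assumption: a missing observation carries no information.
        return 1.0 if obs is None else P[s][obs]

    # Forward recursion: alpha[t][s] = p(H_t = s | y_1, ..., y_t).
    alpha = []
    prior = pi
    for obs in y:
        f = [prior[s] * emit(s, obs) for s in range(S)]
        z = sum(f)
        f = [v / z for v in f]
        alpha.append(f)
        # One-step-ahead prediction for the next day.
        prior = [sum(f[r] * Q[r][s] for r in range(S)) for s in range(S)]

    # Stochastic backward recursion: draw H_T, then H_t given H_{t+1}.
    path = [0] * len(y)
    path[-1] = rng.choices(range(S), weights=alpha[-1])[0]
    for t in range(len(y) - 2, -1, -1):
        w = [alpha[t][r] * Q[r][path[t + 1]] for r in range(S)]
        path[t] = rng.choices(range(S), weights=w)[0]
    return path
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, and calling the function once per subject within each Gibbs iteration gives a joint draw of all hidden states.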
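Characterizing the Fit: S = 3
$$\hat{\pi} = (.94, .04, .02)$$
$$\hat{Q} = \begin{pmatrix} .98 & .01 & .01 \\ .69 & .28 & .03 \\ .37 & .02 & .61 \end{pmatrix}, \qquad \hat{P} = \begin{pmatrix} .99 & .01 & .00 \\ .24 & .73 & .03 \\ .03 & .00 & .97 \end{pmatrix}$$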
The Treatment Effect (Treat = Red, Control = Black) [Figure: 3 x 3 grid of Density panels, one for each transition probability Q(1,1) through Q(3,3), each on a horizontal axis from 0.0 to 1.0.]
Missing Data [Figure: Subject 183; Drinks (1 to 3) against Day (60 to 100).]
Hidden States Posterior Distribution [Figure: Subject 183; vertical axis from 0.0 to 1.0 against Day (60 to 100).]
Missing Data Posterior Distribution [Figure: Subject 183; Drinks (1 to 3) against Day (60 to 100).]
Missing Data [Figure: Subject 61; Drinks (1 to 3) against Day (0 to 150).]
Hidden States Posterior Distribution [Figure: Subject 61; vertical axis from 0.0 to 1.0 against Day (0 to 150).]
Missing Data Posterior Distribution [Figure: Subject 61; Drinks (1 to 3) against Day (0 to 150).]
An HMM is a model with a rich structure that can capture complex drinking behaviors as they evolve through time. It corresponds to a well-known theoretical model for relapse, the cognitive-behavioral model of relapse. We can (1) assess the danger of moderate drinking, and (2) define relapse in a data-based way. We can measure treatment effects. We can fit the model to subjects with incomplete data, and we can incorporate random effects.
Thanks! www.stat.columbia.edu/ shirley