Hidden Markov Models. Terminology and Basic Algorithms

1 Hidden Markov Models Terminology and Basic Algorithms

2 Motivation We make predictions based on models of observed data (machine learning). A simple model assumes that observations are independent and identically distributed (iid)... but this assumption is not always appropriate, e.g. for (1) measurements of weather patterns, (2) daily values of stocks, (3) acoustic features in successive time frames used for speech recognition, (4) the composition of texts, (5) the composition of DNA, or...

3 Markov Models If the n'th observation in a chain of observations is influenced only by the (n-1)'th observation, i.e. p(xn | x1,...,xn-1) = p(xn | xn-1), then the chain of observations is a 1st-order Markov chain, and the joint probability of a sequence of N observations is p(x1,...,xN) = p(x1) ∏n=2..N p(xn | xn-1). If the distributions p(xn | xn-1) are the same for all n, then the chain of observations is a homogeneous 1st-order Markov chain...
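
To make the factorization concrete, here is a minimal Python sketch of the joint probability of a homogeneous 1st-order Markov chain; the two-state chain and all of its numbers are hypothetical, chosen only for illustration.

    import numpy as np

    # Hypothetical two-state chain (states 0 and 1) with made-up parameters.
    p_init = np.array([0.6, 0.4])      # p(x1)
    P = np.array([[0.7, 0.3],          # P[i, j] = p(xn = j | xn-1 = i)
                  [0.4, 0.6]])

    def chain_probability(xs):
        # p(x1,...,xN) = p(x1) * product over n of p(xn | xn-1)
        prob = p_init[xs[0]]
        for prev, cur in zip(xs, xs[1:]):
            prob *= P[prev, cur]
        return prob

    print(chain_probability([0, 0, 1, 1, 0]))  # 0.6 * 0.7 * 0.3 * 0.6 * 0.4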

4 Markov Models (Figure: the model, i.e. p(xn | xn-1), shown as a graphical model together with a sequence of observations.)

5 Markov Models Extension: a higher-order Markov chain, in which the n'th observation depends on several previous observations, e.g. p(xn | xn-1, xn-2) in a 2nd-order chain.

6 Hidden Markov Models What if the n'th observation in a chain of observations is influenced by a corresponding latent (i.e. hidden) variable? (Figure: latent values H, H, L, L, H above a sequence of observations.) If the latent variables are discrete and form a Markov chain, then it is a hidden Markov model (HMM).

7 Hidden Markov Models (Figure: as on the previous slide, with the chain of latent values labelled "Markov model" and the full model including the observations labelled "hidden Markov model".)

8 Computational problems Determine the likelihood of the sequence of observations. Predict the next observation in the sequence of observations. Find the most likely underlying explanation of the sequence of observations.

9 Hidden Markov Models The predictive distribution p(xn+1 | x1,...,xn) for observation xn+1 can be shown to depend on all previous observations, i.e. the sequence of observations is not a Markov chain of any order...

10 Hidden Markov Models The joint distribution over observations x1,...,xN and latent values z1,...,zN is p(x1,...,xN, z1,...,zN) = p(z1) [∏n=2..N p(zn | zn-1)] ∏n=1..N p(xn | zn).

11 Hidden Markov Models In the joint distribution, the factors p(zn | zn-1) are the transition probabilities and the factors p(xn | zn) are the emission probabilities.

12 Transition probabilities Notation: In Bishop, the latent variables zn are discrete variables in a 1-of-K encoding, e.g. if zn = (0,0,1) then the model in step n is in state k=3... Transition probabilities: If the latent variables are discrete with K states, the conditional distribution p(zn | zn-1) is a K x K table A, and the marginal distribution p(z1) describing the initial state is a K-vector π... The probability of going from state j to state k is Ajk = p(znk = 1 | zn-1,j = 1). The probability of state k being the initial state is πk = p(z1k = 1).

15 State transition diagram (Figure: the K states drawn as nodes of a directed graph, with an edge for each non-zero transition probability Ajk.)

16 Emission probabilities Emission probabilities: The conditional distributions p(xn | zn) of the observed variables from a specific state. If the observed values xn are discrete (e.g. D symbols), the emission probabilities Φ form a K x D table of probabilities which for each of the K states specifies the probability of emitting each observable...

18 Emission probabilities The emission probabilities can be written as p(xn | zn, Φ) = ∏k=1..K p(xn | Φk)^znk, where znk = 1 iff the n'th latent variable in the sequence is in state k and znk = 0 otherwise, i.e. the product just picks the emission probabilities corresponding to state k...
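
To fix the notation, the following Python sketch sets up π, A and Φ for a hypothetical K=2 model with states H and L emitting symbols from the alphabet {A, C, G, T}; all numbers are illustrative assumptions, not values from the slides.

    import numpy as np

    # Hypothetical 2-state HMM (state 0 = H, state 1 = L) over D = 4 symbols.
    pi = np.array([0.5, 0.5])                # pi[k]     = p(z1k = 1)
    A = np.array([[0.9, 0.1],                # A[j, k]   = p(znk = 1 | zn-1,j = 1)
                  [0.2, 0.8]])
    phi = np.array([[0.1, 0.4, 0.4, 0.1],    # phi[k, d] = p(xn = d | state k)
                    [0.3, 0.2, 0.2, 0.3]])

    # Each row is a probability distribution, so every row must sum to one:
    assert np.allclose(A.sum(axis=1), 1) and np.allclose(phi.sum(axis=1), 1)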

19 HMM joint probability distribution Observables: X = (x1,...,xN). Latent states: Z = (z1,...,zN). Model parameters: Θ = (π, A, Φ). The joint distribution is p(X, Z | Θ) = p(z1 | π) [∏n=2..N p(zn | zn-1, A)] ∏n=1..N p(xn | zn, Φ). If A and Φ are the same for all n, then the HMM is homogeneous.

21 HMMs as a generative model An HMM generates a sequence of observables by moving from latent state to latent state according to the transition probabilities, and emitting an observable (from a discrete set of observables, i.e. a finite alphabet) from each latent state visited according to the emission probabilities of the state... Model M: a run follows a sequence of states, e.g. H H L L H, and emits a sequence of symbols.

22 HMMs as a generative model A special end state can be added to the model to generate output of finite length.
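
As a sketch of this generative view, the function below samples a run of a fixed length from the hypothetical model defined above (rather than using an explicit end state).

    import numpy as np

    rng = np.random.default_rng(0)
    STATES, SYMBOLS = "HL", "ACGT"

    def sample_run(pi, A, phi, length):
        # Move from latent state to latent state according to A, emitting a
        # symbol from each visited state according to phi.
        states, symbols = [], []
        z = rng.choice(len(pi), p=pi)            # initial state ~ pi
        for _ in range(length):
            states.append(STATES[z])
            symbols.append(SYMBOLS[rng.choice(phi.shape[1], p=phi[z])])
            z = rng.choice(len(pi), p=A[z])      # next state ~ row z of A
        return "".join(states), "".join(symbols)

    states, symbols = sample_run(pi, A, phi, 5)  # e.g. a run like H H L L H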

23 Using HMMs Determine the likelihood of the sequence of observations. Predict the next observation in the sequence of observations. Find the most likely underlying explanation of the sequence of observations.

25 Using HMMs Determine the likelihood of the sequence of observations, i.e. p(X) = ΣZ p(X, Z). The sum has K^N terms, but it can be computed in O(K^2 N) time...

26 The forward-backward algorithm α(zn) is the joint probability of observing x1,...,xn and being in state zn, i.e. α(zn) = p(x1,...,xn, zn). β(zn) is the conditional probability of the future observations xn+1,...,xN given being in state zn, i.e. β(zn) = p(xn+1,...,xN | zn).

27 The forward-backward algorithm Using α(zn) and β(zn) we get the likelihood of the observations as p(x1,...,xN) = Σzn α(zn) β(zn), for any choice of n.

28 The forward algorithm α(zn) is the joint probability of observing x1,...,xn and being in state zn, i.e. α(zn) = p(x1,...,xn, zn).

29 The α-recursion Conditioning on the previous latent state gives α(zn) = p(x1,...,xn, zn) = Σzn-1 p(x1,...,xn-1, zn-1) p(zn | zn-1) p(xn | zn) = p(xn | zn) Σzn-1 α(zn-1) p(zn | zn-1).

33 The forward algorithm α(zn) is the joint probability of observing x1,...,xn and being in state zn. Recursion: α(zn) = p(xn | zn) Σzn-1 α(zn-1) p(zn | zn-1). Basis: α(z1) = p(z1) p(x1 | z1). Takes time O(K^2 N) and space O(KN) using memoization.
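
The recursion translates directly into Python; the sketch below (again using the hypothetical pi, A and phi from earlier, with x a list of symbol indices 0..D-1) deliberately works with raw probabilities, since rescaling for numerical soundness is deferred to the next lecture.

    import numpy as np

    def forward(x, pi, A, phi):
        # alpha[n, k] = p(x1,...,xn, zn = k); time O(K^2 N), space O(KN).
        N, K = len(x), len(pi)
        alpha = np.zeros((N, K))
        alpha[0] = pi * phi[:, x[0]]                      # basis
        for n in range(1, N):
            alpha[n] = phi[:, x[n]] * (alpha[n - 1] @ A)  # recursion
        return alpha

    x = [1, 2, 1, 3, 0]          # observation "CGCTA" as symbol indices
    alpha = forward(x, pi, A, phi)
    print(alpha[-1].sum())       # the likelihood p(x1,...,xN)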

34 The backward algorithm β(zn) is the conditional probability of the future observations xn+1,...,xN given being in state zn, i.e. β(zn) = p(xn+1,...,xN | zn).

35 The β-recursion Conditioning on the next latent state gives β(zn) = p(xn+1,...,xN | zn) = Σzn+1 p(xn+1,...,xN, zn+1 | zn) = Σzn+1 β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn).

39 The backward algorithm β(zn) is the conditional probability of the future observations xn+1,...,xN given being in state zn. Recursion: β(zn) = Σzn+1 β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn). Basis: β(zN) = 1. Takes time O(K^2 N) and space O(KN) using memoization.
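
A matching sketch of the backward pass, under the same assumptions as the forward sketch; the final lines check the identity p(X) = Σzn α(zn) β(zn) for every position n.

    import numpy as np

    def backward(x, pi, A, phi):
        # beta[n, k] = p(xn+1,...,xN | zn = k); time O(K^2 N), space O(KN).
        N, K = len(x), len(pi)
        beta = np.ones((N, K))                              # basis: beta(zN) = 1
        for n in range(N - 2, -1, -1):
            beta[n] = A @ (phi[:, x[n + 1]] * beta[n + 1])  # recursion
        return beta

    beta = backward(x, pi, A, phi)
    # Every position n gives the same likelihood p(x1,...,xN):
    print((alpha * beta).sum(axis=1))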

40 Using HMMs Predict the next observation in the sequence of observations.

41 Predicting the next observation Using the forward probabilities, the predictive distribution is p(xn+1 | x1,...,xn) = Σzn+1 p(xn+1 | zn+1) Σzn p(zn+1 | zn) α(zn) / p(x1,...,xn), where the normalizer is p(x1,...,xn) = Σzn α(zn).
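
A sketch of this prediction step in the same setting: normalize the last row of α to get p(zn | x1,...,xn), then push it through the transition and emission tables (forward is the function from the earlier sketch).

    def predict_next(x, pi, A, phi):
        # Returns p(xn+1 = d | x1,...,xn) for every symbol d.
        alpha = forward(x, pi, A, phi)
        state_posterior = alpha[-1] / alpha[-1].sum()  # p(zn | x1,...,xn)
        next_state = state_posterior @ A               # p(zn+1 | x1,...,xn)
        return next_state @ phi                        # p(xn+1 | x1,...,xn)

    print(predict_next(x, pi, A, phi))  # a distribution over the D symbols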

44 Using HMMs Find the most likely underlying explanation of the sequence of observations. The Viterbi algorithm finds the most probable sequence of states generating the observations...

45 The Viterbi algorithm ω(zn) is the probability of the most likely sequence of states z1,...,zn generating the observations x1,...,xn. Intuition: find the longest path from column 1 to column n in the trellis of states, where the length of a path is its total probability, i.e. the product of the probabilities of the transitions and emissions along it...

48 The Viterbi algorithm ω(zn) is the probability of the most likely sequence of states z1,...,zn generating the observations x1,...,xn. Recursion: ω(zn) = p(xn | zn) maxzn-1 ω(zn-1) p(zn | zn-1). Basis: ω(z1) = p(z1) p(x1 | z1). Takes time O(K^2 N) and space O(KN) using memoization; the path itself can be retrieved in time O(KN) by backtracking.
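
The Viterbi recursion and the backtracking step, sketched in the same style; like the forward and backward sketches it uses raw probabilities, which underflow on long sequences (the numerically sound version is the topic of the next lecture).

    import numpy as np

    def viterbi(x, pi, A, phi):
        # omega[n, k] = probability of the best state path z1,...,zn ending
        # in state k; back[n, k] remembers the argmax for backtracking.
        N, K = len(x), len(pi)
        omega = np.zeros((N, K))
        back = np.zeros((N, K), dtype=int)
        omega[0] = pi * phi[:, x[0]]                # basis
        for n in range(1, N):
            scores = omega[n - 1][:, None] * A      # scores[j, k] = omega(j) * Ajk
            back[n] = scores.argmax(axis=0)
            omega[n] = phi[:, x[n]] * scores.max(axis=0)
        path = [int(omega[-1].argmax())]            # best final state
        for n in range(N - 1, 0, -1):               # backtrack to recover the path
            path.append(int(back[n, path[-1]]))
        return path[::-1], float(omega[-1].max())

    path, prob = viterbi(x, pi, A, phi)
    print(path, prob)   # most probable state sequence and its joint probability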

49 Summary Introduced hidden Markov models (HMMs). The forward and backward algorithms for determining the likelihood of a sequence of observations, and for predicting the next observation in a sequence of observations. The Viterbi algorithm for finding the most likely underlying explanation (sequence of latent states) of a sequence of observations. Next: How to implement the basic algorithms (forward, backward, and Viterbi) in a numerically sound manner.
