Centre for Vision Speech & Signal Processing
University of Surrey, Guildford GU2 7XH

HMM tutorial 4
by Dr Philip Jackson

Discrete & continuous HMMs
- Gaussian & other distributions
- Gaussian mixtures
- Revised B-W formulae
Implementing B-W re-estimation
- Forward-backward algorithm
- Accumulation & update
Summary

http://www.ee.surrey.ac.uk/personal/p.jackson/tutorial/
Discrete HMM, λ

(a) Initial-state probabilities,
$\pi = \{\pi_i\} = \{P(x_1 = i)\}$, for $1 \le i \le N$;
(b) State-transition probabilities,
$A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$, for $1 \le i, j \le N$;
(c) Discrete output probabilities,
$B = \{b_i(k)\} = \{P(o_t = k \mid x_t = i)\}$, for $1 \le i \le N$ and $1 \le k \le K$.

[Figure: a 4-state left-to-right HMM with self-loops $a_{11}, a_{22}, a_{33}, a_{44}$, entry probability $\pi_1$ and forward transitions $a_{12}, a_{23}, a_{34}, a_{45}$, producing discrete observations $o_1, \dots, o_6$ via $b_1(o_1), b_1(o_2), b_2(o_3), b_3(o_4), b_3(o_5), b_4(o_6)$, with a state sequence $X = \{1, 1, 2, 3, 3, 4\}$.]
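As a concrete illustration (not from the slides), a minimal sketch of the model in the figure follows; the codebook size $K$ and the transition values are assumptions, and the exit transition $a_{45}$ is folded into the final self-loop for simplicity.

```python
# A minimal sketch (values are illustrative assumptions) of the 4-state
# left-to-right discrete HMM in the figure above, as NumPy arrays.
import numpy as np

N, K = 4, 8                              # number of states; codebook size (assumed)
pi = np.array([1.0, 0.0, 0.0, 0.0])      # initial-state probabilities, pi_i
A = np.array([[0.6, 0.4, 0.0, 0.0],      # state-transition probabilities, a_ij:
              [0.0, 0.7, 0.3, 0.0],      # each state self-loops or advances, as in
              [0.0, 0.0, 0.5, 0.5],      # the figure (exit a_45 folded into a_44)
              [0.0, 0.0, 0.0, 1.0]])
B = np.full((N, K), 1.0 / K)             # discrete output probabilities, b_i(k)

# Each probability distribution must sum to one.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```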
Discrete output pdfs

[Figure: discrete output pdfs, $b_i(k)$, alongside the discretised observations.]
Continuous HMM, λ

(a) Initial-state probabilities,
$\pi = \{\pi_i\} = \{P(x_1 = i)\}$, for $1 \le i \le N$;
(b) State-transition probabilities,
$A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$, for $1 \le i, j \le N$;
(c) Continuous output probabilities,
$B = \{b_i(o_t)\} = \{P(o_t \mid x_t = i)\}$, for $1 \le i \le N$,
where the output probability for each state,
\[ b_i(o_t) = f(o_t; \kappa_i), \tag{1} \]
is a function of the observations $f(o_t)$ that depends on some model parameters $\kappa_i$.
Gaussian output pdfs

Univariate Gaussian (scalar observations):
\[ b_i(o_t) = \frac{1}{\sqrt{2\pi\Sigma_i}} \exp\left[ -\frac{(o_t - \mu_i)^2}{2\Sigma_i} \right]. \]

Multivariate Gaussian (vector observations):
\[ b_i(o_t) = \frac{1}{\sqrt{(2\pi)^K \, |\Sigma_i|}} \exp\left[ -\tfrac{1}{2} (o_t - \mu_i)' \, \Sigma_i^{-1} \, (o_t - \mu_i) \right], \]
where $K$ here refers to the dimensionality of the observation space.
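A minimal sketch of both forms in Python (function names are assumptions; $\Sigma_i$ is the scalar variance in the univariate case):

```python
# Hedged sketch of the Gaussian output pdfs above; not from the slides.
import numpy as np

def b_uni(o_t, mu_i, var_i):
    """Univariate Gaussian output probability for a scalar observation o_t."""
    return np.exp(-(o_t - mu_i) ** 2 / (2.0 * var_i)) / np.sqrt(2.0 * np.pi * var_i)

def b_multi(o_t, mu_i, Sigma_i):
    """Multivariate Gaussian output probability for a K-dimensional o_t."""
    K = o_t.shape[0]
    d = o_t - mu_i
    quad = d @ np.linalg.solve(Sigma_i, d)   # (o_t - mu_i)' Sigma_i^{-1} (o_t - mu_i)
    norm = np.sqrt((2.0 * np.pi) ** K * np.linalg.det(Sigma_i))
    return np.exp(-0.5 * quad) / norm
```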
Other distributions

[Figure: an observed state-duration distribution, probability against duration (frames), compared with Uniform, Exponential, Poisson, Normal and Gamma fits, shown on (left) linear and (right) logarithmic scales.]

Candidate distributions: Exponential, Poisson, Gamma, Log-normal, Gaussian mixture, Ricean, Rayleigh, Beta, Student-t, Cauchy.
Gaussian mixture pdfs

Univariate Gaussian mixture:
\[ b_i(o_t) = \sum_{m=1}^M c_{im} \, \mathcal{N}(o_t; \mu_{im}, \Sigma_{im}) = \sum_{m=1}^M \frac{c_{im}}{\sqrt{2\pi\Sigma_{im}}} \exp\left[ -\frac{(o_t - \mu_{im})^2}{2\Sigma_{im}} \right], \tag{2} \]
where $M$ is the number of mixture components in the Gaussian mixture (aka an $M$-mix), and $\sum_{m=1}^M c_{im} = 1$.

Multivariate Gaussian mixture:
\[ b_i(o_t) = \sum_{m=1}^M c_{im} \, \mathcal{N}(o_t; \mu_{im}, \Sigma_{im}). \tag{3} \]
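A minimal sketch of eq. (3), leaning on SciPy for the component densities (the argument names, and the layout of the per-state parameter arrays, are assumptions):

```python
# Hedged sketch: multivariate Gaussian-mixture output pdf for one state.
import numpy as np
from scipy.stats import multivariate_normal

def b_mix(o_t, c_i, mu_i, Sigma_i):
    """sum_m c_im N(o_t; mu_im, Sigma_im), with c_i of shape (M,),
    mu_i of shape (M, K) and Sigma_i of shape (M, K, K)."""
    return sum(c_im * multivariate_normal.pdf(o_t, mean=mu_im, cov=Sigma_im)
               for c_im, mu_im, Sigma_im in zip(c_i, mu_i, Sigma_i))
```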
Revised Baum-Welch formulae

Likelihood of mixture occupation:
\[ \gamma_t(j,m) = \frac{\tilde\alpha_t(j) \, c_{jm} \, \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm}) \, \beta_t(j)}{P(O \mid \lambda)}, \tag{4} \]
where
\[ \tilde\alpha_t(j) = \sum_{i=1}^N \alpha_{t-1}(i) \, a_{ij} \]
and, as before,
\[ \gamma_t(j) = \frac{\alpha_t(j) \, \beta_t(j)}{P(O \mid \lambda)} = \sum_{m=1}^M \gamma_t(j,m). \tag{5} \]
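A minimal sketch of eq. (4) for scalar observations (the array names, and the use of $\pi_j$ for $\tilde\alpha_1(j)$, are assumptions):

```python
# Hedged sketch: mixture occupation likelihoods gamma[t, j, m] from the
# forward/backward variables alpha, beta (each of shape (T, N)).
import numpy as np
from scipy.stats import norm

def mixture_occupation(alpha, beta, A, pi, c, mu, var, O, P):
    """gamma_t(j,m) = alpha~_t(j) c_jm N(o_t; mu_jm, var_jm) beta_t(j) / P,
    with alpha~_t(j) = sum_i alpha_{t-1}(i) a_ij (taken as pi_j at t = 1)."""
    T, N, M = len(O), len(pi), c.shape[1]
    gamma = np.zeros((T, N, M))
    for t in range(T):
        a_tilde = pi if t == 0 else alpha[t - 1] @ A
        for j in range(N):
            for m in range(M):
                gauss = norm.pdf(O[t], loc=mu[j, m], scale=np.sqrt(var[j, m]))
                gamma[t, j, m] = a_tilde[j] * c[j, m] * gauss * beta[t, j] / P
    return gamma
```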
Re-estimation of mixture parameters

For the means,
\[ \hat\mu_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m) \, o_t}{\sum_{t=1}^T \gamma_t(j,m)}; \tag{6} \]
for the variances,
\[ \hat\Sigma_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m) \, (o_t - \mu_{jm})(o_t - \mu_{jm})'}{\sum_{t=1}^T \gamma_t(j,m)}; \tag{7} \]
and for the mixture weights,
\[ \hat c_{jm} = \frac{\sum_{t=1}^T \gamma_t(j,m)}{\sum_{t=1}^T \gamma_t(j)}. \tag{8} \]
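A minimal sketch of eqs. (6)-(8) for scalar observations, taking the `gamma` array of the previous sketch as input (names are assumptions):

```python
# Hedged sketch: mixture-parameter updates from occupation likelihoods.
import numpy as np

def reestimate_mixtures(gamma, O, mu):
    """gamma has shape (T, N, M); O has shape (T,); mu holds the means
    used in the variance numerator of eq. (7)."""
    occ = gamma.sum(axis=0)                            # sum_t gamma_t(j,m)
    mu_hat = np.einsum('tjm,t->jm', gamma, O) / occ    # eq. (6)
    d2 = (O[:, None, None] - mu[None, :, :]) ** 2
    var_hat = (gamma * d2).sum(axis=0) / occ           # eq. (7)
    c_hat = occ / occ.sum(axis=1, keepdims=True)       # eq. (8): / sum_t gamma_t(j)
    return mu_hat, var_hat, c_hat
```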
Implementing B-W re-estimation

Overview:
- Forward: compute likelihoods $\alpha_t(i)$
- Backward: compute likelihoods $\beta_t(i)$
- Parallel: compute $\gamma_t(i)$ (occupation) and $\xi_t(i,j)$ (transition);
  accumulate $\bar\pi_i$, $\bar a_{ij}$ and $\bar a_i$;
  accumulate $\bar\mu_i$, $\bar\Sigma_i$ and $\bar b_i$
- Repeat for all files in the training set, $r \in \{1, 2, \dots, R\}$.
Forward procedure (Problem 1)

1. Initially, $\alpha_1(i) = \pi_i \, b_i(o_1)$, for $1 \le i \le N$;
2. For $t = 2, 3, \dots, T$,
   $\alpha_t(j) = \left[ \sum_{i=1}^N \alpha_{t-1}(i) \, a_{ij} \right] b_j(o_t)$, for $1 \le j \le N$;
3. Finally, $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.

Backward procedure (Problem 1)

1. Initially, $\beta_T(i) = 1$, for $1 \le i \le N$;
2. For $t = T-1, T-2, \dots, 1$,
   $\beta_t(i) = \sum_{j=1}^N a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)$, for $1 \le i \le N$;
3. Finally, $P(O \mid \lambda) = \sum_{i=1}^N \pi_i \, b_i(o_1) \, \beta_1(i)$.
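A minimal sketch of both procedures, without scaling (so it will underflow on long sequences); `Bt[t, i]` is assumed to hold the precomputed output probability $b_i(o_{t+1})$ (0-indexed in time):

```python
# Hedged sketch of the forward and backward procedures above.
import numpy as np

def forward(pi, A, Bt):
    T, N = Bt.shape
    alpha = np.zeros((T, N))
    alpha[0] = pi * Bt[0]                       # step 1: alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                       # step 2: recursion over t
        alpha[t] = (alpha[t - 1] @ A) * Bt[t]
    return alpha, alpha[-1].sum()               # step 3: P(O|lambda)

def backward(pi, A, Bt):
    T, N = Bt.shape
    beta = np.zeros((T, N))
    beta[-1] = 1.0                              # step 1: beta_T(i) = 1
    for t in range(T - 2, -1, -1):              # step 2: recursion over t
        beta[t] = A @ (Bt[t + 1] * beta[t + 1])
    return beta, (pi * Bt[0] * beta[0]).sum()   # step 3: P(O|lambda)
```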
Parallel procedure

Occupation likelihoods:
\[ \gamma_t(j) = \frac{\alpha_t(j) \, \beta_t(j)}{P(O \mid \lambda)} \]
Transition likelihoods:
\[ \xi_t(i,j) = \frac{\alpha_{t-1}(i) \, a_{ij} \, b_j(o_t) \, \beta_t(j)}{P(O \mid \lambda)} \]
Transition accumulators:
\[ \bar\pi_i = \dot{\bar\pi}_i + \gamma_1(i), \quad \bar a_{ij} = \dot{\bar a}_{ij} + \sum_{t=2}^T \xi_t(i,j), \quad \bar a_i = \dot{\bar a}_i + \sum_{t=2}^T \gamma_t(i), \]
where $\dot x$ denotes the previous value of accumulator $x$.
Parallel (continued)

Output accumulators (eqs. 11 & 12, from tut. 3):
\[ \bar\mu_i = \dot{\bar\mu}_i + \sum_{t=1}^T \gamma_t(i) \, o_t, \quad \bar\Sigma_i = \dot{\bar\Sigma}_i + \sum_{t=1}^T \gamma_t(i) \, (o_t - \mu_i)(o_t - \mu_i)', \quad \bar b_i = \dot{\bar b}_i + \sum_{t=1}^T \gamma_t(i). \]

Repeat. For all files in the training set:
1. recompute the forward and backward likelihoods;
2. recompute the occupation and transition likelihoods;
3. increment the accumulators.
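A minimal sketch of one parallel pass for a single training file, with scalar observations; the accumulator dict `acc`, and the use of the current means `mu` in the variance accumulator, are assumptions:

```python
# Hedged sketch: occupation/transition likelihoods and all six accumulators,
# following the slide's formulae (alpha, beta of shape (T, N); O of shape (T,)).
import numpy as np

def accumulate(acc, alpha, beta, A, Bt, O, mu, P):
    gamma = alpha * beta / P                          # occupation, gamma_t(i)
    xi = (alpha[:-1, :, None] * A[None, :, :]         # transition, xi_t(i,j),
          * (Bt[1:] * beta[1:])[:, None, :]) / P      # for t = 2, ..., T
    acc['pi'] += gamma[0]                             # pi-bar_i += gamma_1(i)
    acc['a'] += xi.sum(axis=0)                        # a-bar_ij += sum_t xi_t(i,j)
    acc['a_den'] += gamma[1:].sum(axis=0)             # a-bar_i += sum_{t=2}^T gamma_t(i)
    acc['mu'] += gamma.T @ O                          # mu-bar_i += sum_t gamma_t(i) o_t
    d2 = (O[:, None] - mu[None, :]) ** 2
    acc['Sigma'] += (gamma * d2).sum(axis=0)          # Sigma-bar_i accumulator
    acc['b'] += gamma.sum(axis=0)                     # b-bar_i += sum_t gamma_t(i)
    return acc
```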
Update (Problem 3)

Finally, we update the models:
\[ \hat\pi_i = \frac{1}{R} \sum_{r=1}^R \bar\pi_i, \quad \hat a_{ij} = \frac{\sum_{r=1}^R \bar a_{ij}}{\sum_{r=1}^R \bar a_i}, \quad \hat\mu_i = \frac{\sum_{r=1}^R \bar\mu_i}{\sum_{r=1}^R \bar b_i}, \quad \hat\Sigma_i = \frac{\sum_{r=1}^R \bar\Sigma_i}{\sum_{r=1}^R \bar b_i}. \]
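A minimal sketch of this update step, assuming `acc` now holds the accumulators summed over all $R$ training files (names as in the previous sketch):

```python
# Hedged sketch: final model update from the pooled accumulators.
import numpy as np

def update(acc, R):
    pi_hat = acc['pi'] / R                     # pi-hat_i = (1/R) sum_r pi-bar_i
    A_hat = acc['a'] / acc['a_den'][:, None]   # a-hat_ij = sum a-bar_ij / sum a-bar_i
    mu_hat = acc['mu'] / acc['b']              # mu-hat_i = sum mu-bar_i / sum b-bar_i
    Sigma_hat = acc['Sigma'] / acc['b']        # Sigma-hat_i, same denominator
    return pi_hat, A_hat, mu_hat, Sigma_hat
```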
Today's summary

Discrete & continuous HMMs
- Discrete output pdfs, $b_i(k)$
- Continuous output pdfs, $b_i(o_t)$
Gaussian & other distributions
- Gaussian mixtures, $\sum_m c_{im} \, \mathcal{N}(\mu_{im}, \Sigma_{im})$
Revised B-W formulae
Implementing B-W re-estimation
- Forward-backward algorithm
- Accumulation & update
Next time

- A case study and some illustrative examples
- More implementation details
- Extensions of the HMM framework

Homework

For your data:
- look at the distributions for each of the states in your model and choose a number of mixture components;
- re-train your model, checking for increased likelihood with the new model;
- as before, use Viterbi to test whether the updated parameters change the state alignment.