
HMM tutorial 4, by Dr Philip Jackson
Centre for Vision Speech & Signal Processing, University of Surrey, Guildford GU2 7XH

Discrete & continuous HMMs
- Gaussian & other distributions
- Gaussian mixtures
- Revised B-W formulae
Implementing B-W re-estimation
- Forward-backward algorithm
- Accumulation & update
Summary

http://www.ee.surrey.ac.uk/personal/p.jackson/tutorial/

Discrete HMM, λ

(a) Initial-state probabilities, $\pi = \{\pi_i\} = \{P(x_1 = i)\}$ for $1 \le i \le N$;
(b) State-transition probabilities, $A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$ for $1 \le i, j \le N$;
(c) Discrete output probabilities, $B = \{b_i(k)\} = \{P(o_t = k \mid x_t = i)\}$ for $1 \le i \le N$ and $1 \le k \le K$.

[Figure: a four-state left-to-right HMM with self-loops $a_{11}, \ldots, a_{44}$, forward transitions $a_{12}, \ldots, a_{45}$, and output probabilities $b_1(o_1), b_1(o_2), b_2(o_3), b_3(o_4), b_3(o_5), b_4(o_6)$, producing discrete observations $o_1, \ldots, o_6$ with a state sequence $X = \{1, 1, 2, 3, 3, 4\}$.]
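As a concrete illustration (not part of the tutorial itself), here is a minimal numpy sketch of the three parameter sets for a hypothetical 4-state, 3-symbol model like the one in the figure, with a helper that evaluates the joint likelihood $P(O, X \mid \lambda)$ of a given state sequence; states and symbols are 0-indexed in the code, and all values are made up:

    import numpy as np

    pi = np.array([1.0, 0.0, 0.0, 0.0])        # initial-state probs, pi_i
    A = np.array([[0.6, 0.4, 0.0, 0.0],        # state-transition probs, a_ij
                  [0.0, 0.7, 0.3, 0.0],        # (left-to-right topology)
                  [0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.0, 1.0]])
    B = np.array([[0.5, 0.3, 0.2],             # discrete output probs, b_i(k):
                  [0.1, 0.8, 0.1],             # rows are states i,
                  [0.3, 0.3, 0.4],             # columns are symbols k
                  [0.2, 0.2, 0.6]])

    def joint_likelihood(X, O):
        """P(O, X | lambda) for state sequence X and symbol sequence O."""
        p = pi[X[0]] * B[X[0], O[0]]
        for t in range(1, len(O)):
            p *= A[X[t-1], X[t]] * B[X[t], O[t]]
        return p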

Discrete output pdfs

[Figure: discrete output pdfs and discretised observations.]

Continuous HMM, λ

(a) Initial-state probabilities, $\pi = \{\pi_i\} = \{P(x_1 = i)\}$ for $1 \le i \le N$;
(b) State-transition probabilities, $A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$ for $1 \le i, j \le N$;
(c) Continuous output probabilities, $B = \{b_i(o_t)\} = \{P(o_t \mid x_t = i)\}$ for $1 \le i \le N$, where the output probability for each state,

$$b_i(o_t) = f(o_t; \kappa_i), \qquad (1)$$

is a function $f$ of the observations that depends on some model parameters $\kappa_i$.

Gaussian output pdfs

Univariate Gaussian (scalar observations):

$$b_i(o_t) = \frac{1}{\sqrt{2\pi \Sigma_i}} \exp\left[ -\frac{(o_t - \mu_i)^2}{2 \Sigma_i} \right].$$

Multivariate Gaussian (vector observations):

$$b_i(o_t) = \frac{1}{\sqrt{(2\pi)^K |\Sigma_i|}} \exp\left[ -\frac{1}{2} (o_t - \mu_i)^\top \Sigma_i^{-1} (o_t - \mu_i) \right],$$

where $K$ here refers to the dimensionality of the observation space.
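A minimal numpy sketch of both pdfs (the function names are illustrative, not the tutorial's):

    import numpy as np

    def b_univariate(o_t, mu_i, var_i):
        """Scalar Gaussian output probability b_i(o_t); var_i = Sigma_i."""
        return (np.exp(-(o_t - mu_i)**2 / (2.0 * var_i))
                / np.sqrt(2.0 * np.pi * var_i))

    def b_multivariate(o_t, mu_i, Sigma_i):
        """Vector Gaussian output probability b_i(o_t); K = len(o_t)."""
        K = len(o_t)
        d = o_t - mu_i
        norm = np.sqrt((2.0 * np.pi)**K * np.linalg.det(Sigma_i))
        return np.exp(-0.5 * d @ np.linalg.solve(Sigma_i, d)) / norm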

Other distributions

[Figure: state-duration probability vs. duration (frames), comparing the actual distribution with uniform, exponential, Poisson, normal and gamma fits; various functions on (left) linear and (right) logarithmic scales.]

Exponential, Poisson, Gamma, Log-normal, Gaussian mixture, Ricean, Rayleigh, Beta, Student-t, Cauchy.

Gaussian mixture pdfs

Univariate Gaussian mixture:

$$b_i(o_t) = \sum_{m=1}^M c_{im}\, \mathcal{N}(o_t; \mu_{im}, \Sigma_{im}) = \sum_{m=1}^M \frac{c_{im}}{\sqrt{2\pi \Sigma_{im}}} \exp\left[ -\frac{(o_t - \mu_{im})^2}{2 \Sigma_{im}} \right], \qquad (2)$$

where $M$ is the number of mixture components in the Gaussian mixture (a.k.a. an $M$-mix), and $\sum_{m=1}^M c_{im} = 1$.

Multivariate Gaussian mixture:

$$b_i(o_t) = \sum_{m=1}^M c_{im}\, \mathcal{N}(o_t; \mu_{im}, \Sigma_{im}). \qquad (3)$$
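Reusing b_multivariate from the previous sketch, eq. (3) is just a weighted sum over the $M$ components (again an illustrative sketch, not the tutorial's code):

    def b_mixture(o_t, c_i, mu_i, Sigma_i):
        """b_i(o_t) = sum_m c_im N(o_t; mu_im, Sigma_im).
        c_i: (M,) weights summing to 1; mu_i: (M, K); Sigma_i: (M, K, K)."""
        return sum(c_i[m] * b_multivariate(o_t, mu_i[m], Sigma_i[m])
                   for m in range(len(c_i)))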

Revised Baum-Welch formulae

Likelihood of mixture occupation:

$$\gamma_t(j, m) = \frac{\left[ \sum_{i=1}^N \alpha_{t-1}(i)\, a_{ij} \right] c_{jm}\, \mathcal{N}(o_t; \mu_{jm}, \Sigma_{jm})\, \beta_t(j)}{P(O \mid \lambda)}, \qquad (4)$$

where the bracketed term is the forward likelihood before the output probability is applied and, as before,

$$\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)} = \sum_{m=1}^M \gamma_t(j, m). \qquad (5)$$
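A sketch of eq. (4) for a single state j, given alpha, beta and $P(O \mid \lambda)$ from the forward-backward procedures below; the $t = 1$ boundary case, which uses $\pi_j$ in place of the bracketed transition sum, is my assumption, as the slide leaves it implicit:

    def mixture_occupation(pi, A, O, j, c_j, mu_j, Sigma_j, alpha, beta, P):
        """gamma_t(j, m) of eq. (4); summing over m gives gamma_t(j), eq. (5)."""
        T, M = len(O), len(c_j)
        gamma_jm = np.zeros((T, M))
        for t in range(T):
            # predictive term: pi_j at t = 1, else sum_i alpha_{t-1}(i) a_ij
            pred = pi[j] if t == 0 else alpha[t-1] @ A[:, j]
            for m in range(M):
                gamma_jm[t, m] = (pred * c_j[m]
                                  * b_multivariate(O[t], mu_j[m], Sigma_j[m])
                                  * beta[t, j]) / P
        return gamma_jm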

Re-estimation of mixture parameters

For the means,

$$\hat\mu_{jm} = \frac{\sum_{t=1}^T \gamma_t(j, m)\, o_t}{\sum_{t=1}^T \gamma_t(j, m)}; \qquad (6)$$

for the variances,

$$\hat\Sigma_{jm} = \frac{\sum_{t=1}^T \gamma_t(j, m)\, (o_t - \mu_{jm})(o_t - \mu_{jm})^\top}{\sum_{t=1}^T \gamma_t(j, m)}; \qquad (7)$$

and for the mixture weights,

$$\hat c_{jm} = \frac{\sum_{t=1}^T \gamma_t(j, m)}{\sum_{t=1}^T \gamma_t(j)}. \qquad (8)$$
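These three updates, sketched for one state j over a single observation sequence, with gamma_jm as returned by the previous sketch; plugging the freshly estimated means into eq. (7)'s outer products is a common implementation choice, not something the slide specifies:

    def reestimate_state(gamma_jm, O):
        """Eqs. (6)-(8): returns (c_hat, mu_hat, Sigma_hat) for one state."""
        occ = gamma_jm.sum(axis=0)                  # sum_t gamma_t(j, m), shape (M,)
        c_hat = occ / gamma_jm.sum()                # eq. (8): mixture weights
        mu_hat = (gamma_jm.T @ O) / occ[:, None]    # eq. (6): means, shape (M, K)
        M, K = mu_hat.shape
        Sigma_hat = np.zeros((M, K, K))
        for m in range(M):                          # eq. (7): covariances
            d = O - mu_hat[m]
            Sigma_hat[m] = (gamma_jm[:, m, None] * d).T @ d / occ[m]
        return c_hat, mu_hat, Sigma_hat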

Implementing B-W re-estimation

Overview:
- Forward: compute likelihoods $\alpha_t(i)$
- Backward: compute likelihoods $\beta_t(i)$
- Parallel: compute $\gamma_t(i)$ (occupation); compute $\xi_t(i, j)$ (transition); accumulate $\bar\pi_i$, $\bar a_{ij}$ and $\bar a_i$; accumulate $\bar\mu_i$, $\bar\Sigma_i$ and $\bar b_i$
- Repeat for all files in the training set, $r \in \{1, 2, \ldots, R\}$.

Forward procedure (Problem 1)

1. Initially, $\alpha_1(i) = \pi_i\, b_i(o_1)$, for $1 \le i \le N$;
2. For $t = 2, 3, \ldots, T$: $\alpha_t(j) = \left[ \sum_{i=1}^N \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t)$, for $1 \le j \le N$;
3. Finally, $P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i)$.

Backward procedure (Problem 1)

1. Initially, $\beta_T(i) = 1$, for $1 \le i \le N$;
2. For $t = T-1, T-2, \ldots, 1$: $\beta_t(i) = \sum_{j=1}^N a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$, for $1 \le i \le N$;
3. Finally, $P(O \mid \lambda) = \sum_{i=1}^N \pi_i\, b_i(o_1)\, \beta_1(i)$.
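The two procedures transcribe directly into numpy; this sketch precomputes b[t, i] = $b_i(o_t)$ and omits the scaling (or log arithmetic) that a practical implementation needs to avoid underflow:

    def forward(pi, A, b):
        """alpha (T, N) and P(O | lambda); b[t, i] = b_i(o_t), precomputed."""
        T, N = b.shape
        alpha = np.zeros((T, N))
        alpha[0] = pi * b[0]                        # step 1: initialise
        for t in range(1, T):                       # step 2: induction
            alpha[t] = (alpha[t-1] @ A) * b[t]
        return alpha, alpha[-1].sum()               # step 3: terminate

    def backward(pi, A, b):
        """beta (T, N) and the same P(O | lambda), computed from the end."""
        T, N = b.shape
        beta = np.ones((T, N))                      # step 1: beta_T(i) = 1
        for t in range(T - 2, -1, -1):              # step 2: induction
            beta[t] = A @ (b[t+1] * beta[t+1])
        return beta, (pi * b[0] * beta[0]).sum()    # step 3: terminate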

Parallel procedure

Occupation likelihoods:
$$\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}$$

Transition likelihoods:
$$\xi_t(i, j) = \frac{\alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)\, \beta_t(j)}{P(O \mid \lambda)}$$

Transition accumulators:
$$\bar\pi_i = \dot{\bar\pi}_i + \gamma_1(i), \qquad \bar a_{ij} = \dot{\bar a}_{ij} + \sum_{t=2}^T \xi_t(i, j), \qquad \bar a_i = \dot{\bar a}_i + \sum_{t=2}^T \gamma_t(i),$$

where $\dot{x}$ denotes the previous value of accumulator $x$.
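A direct transcription of one file's contribution to these accumulators, with acc a dict of running sums (the dict keys are my naming, not the tutorial's):

    def accumulate_transitions(alpha, beta, A, b, P, acc):
        """Add one file's occupation/transition statistics to acc."""
        gamma = alpha * beta / P                    # gamma_t(i), shape (T, N)
        # xi[t-1] holds xi_t(i, j) for t = 2..T (0-indexed time axis)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (b[1:] * beta[1:])[:, None, :]) / P
        acc['pi'] += gamma[0]                       # pi-bar: + gamma_1(i)
        acc['a'] += xi.sum(axis=0)                  # a-bar_ij: + sum_t xi_t(i, j)
        acc['a_den'] += gamma[1:].sum(axis=0)       # a-bar_i, as on the slide
        return gamma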

Parallel (continued)

Output accumulators (eqs. 11 & 12, from tut. 3):
$$\bar\mu_i = \dot{\bar\mu}_i + \sum_{t=1}^T \gamma_t(i)\, o_t, \qquad \bar\Sigma_i = \dot{\bar\Sigma}_i + \sum_{t=1}^T \gamma_t(i)\, (o_t - \mu_i)(o_t - \mu_i)^\top, \qquad \bar b_i = \dot{\bar b}_i + \sum_{t=1}^T \gamma_t(i).$$

Repeat

For all files in the training set:
1. recompute the forward and backward likelihoods;
2. recompute the occupation and transition likelihoods;
3. increment the accumulators.
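And the single-Gaussian output accumulators, using the gamma returned above; mu holds the current per-state means used in the scatter term:

    def accumulate_outputs(gamma, O, mu, acc):
        """Add one file's output statistics; O: (T, K), mu: (N, K)."""
        acc['mu'] += gamma.T @ O                    # mu-bar: + sum_t gamma_t(i) o_t
        for i in range(gamma.shape[1]):             # Sigma-bar: weighted scatter
            d = O - mu[i]
            acc['Sigma'][i] += (gamma[:, i, None] * d).T @ d
        acc['b'] += gamma.sum(axis=0)               # b-bar: occupation counts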

Update (Problem 3)

Finally, we update the models:

$$\hat\pi_i = \frac{1}{R} \sum_{r=1}^R \bar\pi_i, \qquad \hat a_{ij} = \frac{\sum_{r=1}^R \bar a_{ij}}{\sum_{r=1}^R \bar a_i}, \qquad \hat\mu_i = \frac{\sum_{r=1}^R \bar\mu_i}{\sum_{r=1}^R \bar b_i}, \qquad \hat\Sigma_i = \frac{\sum_{r=1}^R \bar\Sigma_i}{\sum_{r=1}^R \bar b_i}.$$
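Once every file $r$ has been accumulated, the update itself is elementwise division (again using the hypothetical acc dict from the sketches above):

    def update(acc, R):
        """Final parameter update after accumulating all R training files."""
        pi_hat = acc['pi'] / R
        A_hat = acc['a'] / acc['a_den'][:, None]
        mu_hat = acc['mu'] / acc['b'][:, None]
        Sigma_hat = acc['Sigma'] / acc['b'][:, None, None]
        return pi_hat, A_hat, mu_hat, Sigma_hat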

Today's summary

Discrete & continuous HMMs
- Discrete output pdfs, $b_i(k)$
- Continuous output pdfs, $b_i(o_t)$
- Gaussian & other distributions
- Gaussian mixtures, $\sum_m c_{im}\, \mathcal{N}(\mu_{im}, \Sigma_{im})$
- Revised B-W formulae

Implementing B-W re-estimation
- Forward-backward algorithm
- Accumulation & update

Next time

- A case study and some illustrative examples
- More implementation details
- Extensions of the HMM framework

Homework

For your data: look at the distributions for each of the states in your model and choose a number of mixture components; re-train your model, checking for increased likelihood with the new model. As before, use Viterbi to test whether the updated parameters change the state alignment. A sketch of the likelihood check follows.
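One possible shape for that check, assuming a hypothetical baum_welch_iteration() that wraps the accumulate-and-update sketches above and a hypothetical output_probs() that evaluates b[t, i] = $b_i(o_t)$ for a model stored as a dict:

    prev_ll = -np.inf
    for it in range(10):
        model = baum_welch_iteration(model, training_files)   # hypothetical helper
        ll = sum(np.log(forward(model['pi'], model['A'],
                                output_probs(model, O))[1])   # hypothetical helper
                 for O in training_files)
        assert ll >= prev_ll - 1e-6, "training likelihood decreased"
        prev_ll = ll                                          # B-W guarantees ascent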