Hardware Implementation of Probabilistic State Machine for Word Recognition



Soorya Asokan, Dr. K. A. Narayanankutty
Amrita Vishwa Vidyapeetham, Coimbatore, India

Abstract
Probabilistic Finite State Machines (PFSMs) are used in feature extraction, training and testing, which are the most important steps in any speech recognition system. An important PFSM is the Hidden Markov Model, which is dealt with in this paper. This paper proposes a hardware architecture for the forward-backward algorithm as well as the Viterbi algorithm used in speech recognition based on Hidden Markov Models. The feature extraction and the training are done using the Hidden Markov Model Toolkit (HTK). The testing of the utterance of a new word is done using the hardware proposed in the paper. Finally, the results obtained from the HTK toolkit and from the hardware are compared.

Keywords
HMM (Hidden Markov Model), HTK (Hidden Markov Model Toolkit)

I. Introduction
Markov models describe random processes that occupy discrete states as a function of time. Suppose a system has S different states; it can change from one state to another, and the probability of a transition depends only on the state of the system at time t [1]. Let s[t] denote the state of the system at time t. The initial state is selected according to a probability

π_i = P(s[1] = i), i = 1, 2, ..., S.

The probability of a transition depends only on the current state,

P(s[t+1] = j | s[t] = i),

and this property is called the Markov property. Here the observation sequences are discrete in nature, but in the case of speech recognition the observations are continuous vectors [1-2].

A Hidden Markov Model (HMM) can be represented by three probability measures:
1. π, the initial state probability distribution: π_i = P(q_1 = S_i), 1 ≤ i ≤ N, where N is the number of states.
2. A, the state transition probability distribution: a_ij = P(q_{t+1} = S_j | q_t = S_i), 1 ≤ i, j ≤ N.
3. B, the observation probability distribution: b_j(O_t) = P(O_t at time t | q_t = S_j), 1 ≤ j ≤ N, 1 ≤ t ≤ T.

So an HMM can be represented as λ = (A, B, π).

A. The Basic Problems for HMMs
The three problems are:

Problem 1: Given an observation sequence O = O_1, O_2, ..., O_T and a model λ = (A, B, π), how can we compute the probability of the observation sequence for the given model? This is called the Evaluation problem.

Problem 2: Given an observation sequence and the model, how do we choose a corresponding state sequence q = q_1, q_2, ..., q_T which best explains the observations? This is called the Decoding problem.

Problem 3: How are the model parameters adjusted to maximize the probability of the observation sequence? This is called the Learning problem.

B. Solutions for the Basic Problems

1. Evaluation Problem
The evaluation problem can be solved by the forward-backward procedure.

(i). Forward Procedure
Consider the forward variable α_t(i) defined as

α_t(i) = P(O_1, O_2, ..., O_t, q_t = S_i | λ),

i.e. the probability of the partial observation sequence up to time t and state S_i at time t, given the model λ [1-2]. The steps involved are:

1. Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N.
2. Induction: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(O_{t+1}), 1 ≤ t ≤ T-1, 1 ≤ j ≤ N.
3. Termination: P(O | λ) = Σ_{i=1}^{N} α_T(i).

A hardware architecture for the above algorithm is proposed in fig. 1. Here a discrete Hidden Markov Model is considered, and the number of states is taken as 2.
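Although the architectures in this paper are hardware, the model itself is easy to state in software. As a point of reference, a two-state discrete HMM λ = (A, B, π) of the kind considered here can be written down as three arrays; the following Python/NumPy sketch is illustrative only, and the probability values are invented placeholders rather than parameters taken from the paper.

# Illustrative only: a two-state discrete HMM lambda = (A, B, pi).
# The probability values below are arbitrary placeholders, not the
# parameters used in the paper.
import numpy as np

N, M = 2, 2                      # number of states, number of observation symbols

pi = np.array([0.6, 0.4])        # initial probabilities   pi_i = P(q_1 = S_i)
A  = np.array([[0.7, 0.3],       # transition probabilities a_ij = P(q_{t+1}=S_j | q_t=S_i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],       # observation probabilities b_j(k) = P(O_t = v_k | q_t = S_j)
               [0.2, 0.8]])      # rows: states, columns: observation symbols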

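The forward procedure then becomes a short recursion over these arrays. The sketch below mirrors the initialization, induction and termination steps of the algorithm (and of the architecture in fig. 1); it assumes the pi, A and B arrays from the previous snippet and an observation sequence given as symbol indices.

import numpy as np

def forward(obs, pi, A, B):
    """Forward procedure: returns alpha (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization: alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):                         # induction over t = 2 .. T
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # termination: P(O|lambda) = sum_i alpha_T(i)

# Example: probability of the symbol sequence 0, 1, 0 under the toy model above.
# alpha, p = forward([0, 1, 0], pi, A, B)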
Fig. 1: Forward Procedure

Fig. 2: Backward Procedure
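As a software reference for the architecture of fig. 2, the backward recursion (defined formally in the next subsection) can be sketched in the same style as the forward procedure; the sanity check at the end reflects the fact, noted below, that both procedures yield the same probability P(O | λ).

import numpy as np

def backward(obs, pi, A, B):
    """Backward procedure: returns beta (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                   # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                   # induction, running backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum() # P(O|lambda) from the backward side

# Sanity check: forward and backward procedures give the same probability.
# _, p_fwd = forward([0, 1, 0], pi, A, B)
# _, p_bwd = backward([0, 1, 0], pi, A, B)
# assert np.isclose(p_fwd, p_bwd)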

(ii). Backward Procedure
Consider the backward variable β_t(i) defined as

β_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T | q_t = S_i, λ).

This variable gives the probability of the partial observation sequence from t+1 to the end, given state S_i at time t and the model λ [1-2]. The steps are:

(a). Initialization: β_T(i) = 1, 1 ≤ i ≤ N.
(b). Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j), t = T-1, T-2, ..., 1, 1 ≤ i ≤ N.

This is similar to the forward procedure; here the observations are accounted for from time t+1 to time T, considering all possible states. Both procedures give the same probability P(O | λ). The architecture for the backward procedure is shown in fig. 2. In this paper the forward and backward hardware architectures are implemented for a system with two states and two observation symbols, and the probabilities computed by the two architectures were verified to agree.

2. Decoding Problem
The solution to the decoding problem is obtained by the Viterbi algorithm, which finds the single best state sequence for a given observation sequence. The procedure is as follows:

(i). Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0, 1 ≤ i ≤ N.
(ii). Recursion: δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) a_ij] b_j(O_t), ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i) a_ij], 2 ≤ t ≤ T, 1 ≤ j ≤ N.
(iii). Termination: P* = max_{1≤i≤N} δ_T(i), q*_T = argmax_{1≤i≤N} δ_T(i).
(iv). Path Backtracking: q*_t = ψ_{t+1}(q*_{t+1}), t = T-1, T-2, ..., 1.

The Viterbi algorithm is similar to the forward procedure except that the summation is replaced by a maximization.

3. Learning Problem
The third problem is to determine a method to adjust the model parameters (A, B, π) so as to maximize the probability of the observation sequence for a given model. A model is chosen such that the probability of the observation sequence given the model is locally maximized using an iterative procedure known as the Baum-Welch estimation method. An initial estimate of the parameters is taken; the iterative procedure then produces a better estimate such that the new model increases the probability of the observation sequence.

II. Speech Recognition
The basic blocks of a speech recognition system are shown in fig. 3: speech analysis and feature extraction convert the speech signal into feature vectors, which are used both for training the reference models and for testing against them to produce the recognition results.

Fig. 3: Basic Processes in Speech Recognition

The speech waveform is taken as the input and the feature vectors are extracted by the feature extraction process, which involves the following [3]:

Framing
The speech signal is divided into a number of frames; the number of frames depends on the length of the speech signal.

Windowing
After framing, a Hamming window is applied to each frame. Windowing reduces the signal discontinuity at the ends of each block. The Hamming window is defined as

W(k) = 0.54 - 0.46 cos(2πk / (K-1)), 0 ≤ k ≤ K-1.

Mel-Cepstrum Method
The method used to extract information from each frame is the Mel-cepstrum method, which consists of two steps: Mel-scaling and cepstrum calculation.

Mel Spectrum Coefficients With Filterbank
The human perception of the frequency content of a speech signal is not linear, so a mapping scale is needed. The commonly used scale is the Mel scale, which warps a measured frequency to a corresponding pitch on the Mel scale. The warping from frequency in Hz to frequency on the Mel scale is described by

mel(f) = 2595 log10(1 + f/700).

In practice the warping is done using a triangular Mel-scale filterbank, which maps the spectrum from frequency in Hz to frequency on the Mel scale.

Mel-Cepstrum Coefficients, DCT (Discrete Cosine Transform)
To derive the Mel-cepstrum coefficients, the inverse discrete cosine transform of the log filterbank outputs is calculated.
Finally, 39-dimensional coefficients are obtained from each frame.
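As an informal illustration of the front-end steps just described, the sketch below performs framing, Hamming windowing and the Hz-to-Mel warping; the frame length and hop size (25 ms and 10 ms at 16 kHz) are typical values assumed for the example, not settings stated in the paper.

import numpy as np

def frames(signal, frame_len=400, hop=160):
    """Split a speech signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n)])

def hamming(K):
    """Hamming window W(k) = 0.54 - 0.46*cos(2*pi*k/(K-1))."""
    k = np.arange(K)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (K - 1))

def hz_to_mel(f):
    """Warp frequency in Hz to the Mel scale: mel(f) = 2595*log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

# windowed = frames(signal) * hamming(400)   # apply the window to every frame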

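The paper does not break down the 39 dimensions; a common convention in HTK-style front ends is 13 cepstra plus their delta and acceleration coefficients. Under that assumption, the sketch below obtains the cepstra from the log Mel filterbank energies with a DCT and stacks them with their derivatives; mel_fb, the triangular filterbank matrix, is a hypothetical input that would be built from the Mel warping above.

import numpy as np
from scipy.fftpack import dct

def mfcc_from_power_spectrum(power_spec, mel_fb, n_ceps=13):
    """Log Mel filterbank energies -> cepstra via the DCT (one frame).

    mel_fb is an assumed (n_filters, n_fft_bins) triangular filterbank matrix."""
    mel_energies = mel_fb @ power_spec
    log_energies = np.log(np.maximum(mel_energies, 1e-10))
    return dct(log_energies, type=2, norm='ortho')[:n_ceps]

def add_deltas(ceps):
    """Stack static, delta and acceleration coefficients: (T, 13) -> (T, 39)."""
    delta = np.gradient(ceps, axis=0)          # central differences along time
    accel = np.gradient(delta, axis=0)
    return np.hstack([ceps, delta, accel])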
A. HMM in Isolated Word Recognition
An HMM is represented by the three probability measures λ = (A, B, π). In speech recognition the initial probability is taken as π = [1 0 0 ... 0], where N is the number of states, i.e. the model always starts in the first state. Initially a transition probability matrix is assumed; after the training process a re-estimated transition matrix is obtained. The continuous observation probability distribution represents the probability that a state has produced the observed Mel-frequency cepstrum coefficients. The Gaussian distribution is the one commonly used, so the continuous observation probability density function B can be represented by a mean μ and a variance Σ [5]. In this paper each state is modelled by a 32-mixture Gaussian distribution [1-3]. The continuous probability density of state j is

b_j(O) = Σ_{m=1}^{M} c_jm N(O; μ_jm, Σ_jm),

where M is the number of mixture components and c_jm is the mixture weight coefficient.

Initially a global mean and variance are calculated for all the states. Then, through the training process, better estimates of the means and variances are obtained and used in the subsequent iterations [1-2]; the re-estimation follows the Baum-Welch updates, in which each mean and variance is a weighted average of the observations, weighted by the posterior probability of the corresponding mixture component.

After the training process a re-estimated transition matrix is also obtained, so finally the probability measures corresponding to a word model are available. In word recognition a model is obtained for each word; the recognizer computes and compares the output of each word model. The probability of an observation sequence for a given word model is obtained by the Viterbi algorithm [3]. Here the log-Viterbi algorithm is used, as it avoids underflow. The log-Viterbi algorithm is as follows:

1. Initialization (t = 1): δ_1(j) = log π_j + log b_j(O_1), 1 ≤ j ≤ N.
2. Recursion (2 ≤ t ≤ T): δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i) + log a_ij] + log b_j(O_t), 1 ≤ j ≤ N.
3. Termination (t = T): log P* = max_{1≤i≤N} δ_T(i).

Here N is the number of states and T is the number of frames in the feature vector sequence. A hardware architecture for the Viterbi algorithm is proposed and is shown in fig. 4. The initial probabilities, observation probabilities and transition probabilities are stored in three different memories and then used for the computation. For the utterances considered here, the number of frames is 298 and the number of states is 10.

Fig. 4: Viterbi Algorithm

B. Implementation Details and Results

1. Speech Database
The training and testing of a speech recognition system needs a collection of speech utterances. Here the language used is English and the utterances were obtained from the TIMIT database, which consists of about 3000 speaker-independent training files. The sampling frequency used is 16 kHz.

2. Front End Processing
The front-end features consist of 39-dimensional MFCC coefficients, which are extracted using HTK.

3. Training and Testing
Each word is modelled by a 10-state left-to-right model with a 32-mixture Gaussian output probability density function for each state. Baum-Welch training is used to estimate the HMM parameters. The trained model was tested on various utterances using HTK, and the system was found to have an accuracy of 92%. After training, the log probability of each utterance is obtained using the Viterbi algorithm in the HTK toolkit.
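The per-state observation likelihood described above, a 32-component Gaussian mixture over the 39-dimensional MFCC vector, can be evaluated in the log domain as sketched below. Diagonal covariances are assumed here for simplicity; the paper does not state the covariance structure explicitly.

import numpy as np

def gmm_log_likelihood(o, weights, means, variances):
    """log b_j(o) for one state: a diagonal-covariance Gaussian mixture.

    o:         (D,) observation vector (e.g. D = 39 MFCCs)
    weights:   (M,) mixture weights c_jm, summing to 1 (e.g. M = 32)
    means:     (M, D) component means
    variances: (M, D) component variances (diagonal of Sigma_jm)
    """
    diff = o - means
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff * diff / variances, axis=1))
    return np.logaddexp.reduce(log_comp)   # log sum_m exp(log_comp_m)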

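These per-state log likelihoods feed the log-Viterbi recursion described above, which is what the hardware of fig. 4 implements and what Table 1 compares against HTK. A software counterpart, assuming the log observation probabilities have already been computed (for example with gmm_log_likelihood above), might look as follows.

import numpy as np

def log_viterbi(log_b, log_pi, log_A):
    """Log-domain Viterbi: returns the best log probability and state path.

    log_b:  (T, N) log observation probabilities log b_j(O_t)
    log_pi: (N,)   log initial probabilities
    log_A:  (N, N) log transition probabilities
    """
    T, N = log_b.shape
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_b[0]                 # initialization
    for t in range(1, T):                        # recursion
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: predecessor i -> state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    best = int(delta[-1].argmax())               # termination
    path = [best]
    for t in range(T - 1, 0, -1):                # path backtracking
        path.append(int(psi[t][path[-1]]))
    return delta[-1].max(), path[::-1]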
The log probabilities obtained from HTK are compared with those obtained from the hardware architecture of the Viterbi algorithm, and these are in turn checked against the software results.

The hardware architectures for the forward and the backward procedures are also proposed; the results obtained from the hardware match the software, and the agreement is found to be 100%. For isolated word recognition, the feature extraction and the training are done in the HTK toolkit, a hardware architecture for the Viterbi algorithm is proposed, and the hardware result is compared with the results obtained from the HTK toolkit. The architecture has been synthesized in Altera Quartus-II, and the Altera Cyclone II 2C20 FPGA (EP2C20F484C7) device on the Altera DE-1 board is used to test the hardware.

Table 1: Comparison of HTK and Hardware Word Log Probabilities

Utterance    HTK (Log Probability)    Hardware (Log Probability)
1            -63.9009                 -63.9958
2            -62.7388                 -63.9987
3            -63.3772                 -63.9962
4            -74.5460                 -78.0071
5            -64.9717                 -63.9959

III. Conclusion
This work has used the Mel-frequency cepstrum coefficient method for feature extraction; training has been done using the Baum-Welch algorithm, and testing has been done using the Viterbi algorithm. The feature extraction as well as the training has been done using the HTK toolkit. A hardware architecture for the Viterbi algorithm has been implemented and compared against the HTK results.

IV. Acknowledgement
I express my sincere gratitude to my guide Prof. Dr. K. A. Narayanankutty, Professor, Amrita School of Engineering, for his valuable help, timely guidance and supervision throughout my project work. I also gratefully acknowledge Mr. Govind D. Menon, Assistant Professor, CEN Department, Amrita School of Engineering, for his constant encouragement and suggestions.

References
[1] L. R. Rabiner, B. H. Juang, "An Introduction to Hidden Markov Models", IEEE ASSP Magazine, January 1986.
[2] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
[3] E-H. Bourouba, M. Bedda, R. Djemili, "Isolated Words Recognition System Based on Hybrid Approach DTW/GHMM", Informatica, Vol. 30, 2006, pp. 373-384.
[4] S. J. Melnikoff, S. F. Quigley, M. J. Russell, "Implementing a Simple Continuous Speech Recognition System on an FPGA", Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '02), IEEE, 2002.
[5] S. Yoshizawa, N. Wada, N. Hayasaka, Y. Miyanaga, "Scalable Architecture for Word HMM-Based Speech Recognition and VLSI Implementation in Complete System", IEEE Transactions on Circuits and Systems, Vol. 53, No. 1, January 2006.