Design and Implementation of Speech Recognition System Based on Field Programmable Gate Array


Modern Applied Science, Vol. 3, No. 8, August 2009

Haitao Zhou
Information and Communication Department, Tianjin Polytechnic University
Tianjin 300160, China
E-mail: zhoutianjin@126.com

Xiaojun Han
Information and Communication Department, Tianjin Polytechnic University
Tianjin 300160, China

The research is financed by the Applied Program of Basic Research of Tianjin (08JCYBJC14700).

Abstract

In this paper, a Hidden Markov Model (HMM) speech recognition system based on a Field Programmable Gate Array (FPGA) is designed. The paper introduces the principle of the speech recognition algorithm and derives the hardware framework accordingly. In the HMM recognition module, the conventional Viterbi algorithm has been improved and the recognition speed has been increased. The core of the hardware is an EP2S60F1020C3 FPGA chip. The experimental results show that the speech recognition accuracy reaches 94% when ten numbers are recognized, and the average recognition time is 0.669 s.

Keywords: Field Programmable Gate Array, Hidden Markov Model, Speech Recognition, Viterbi Algorithm

1. Introduction

As a new and convenient means of human-machine interaction, speech recognition is widely applied in many portable embedded speech products. The ultimate aim of speech recognition is to make machines understand natural language. It is of great significance not only in practical applications but also in scientific research. Research on speech recognition technology mainly concentrates on two aspects: software running on computers, and embedded systems. Embedded systems offer high performance, convenience and low cost, and they have huge potential for development. FPGAs have the advantages of a short development cycle, low design cost and low risk. In recent years, the FPGA has become a key component of high-performance digital signal processing systems in the digital communication, network, video and image fields. In this paper, the design was implemented on an EP2S60F1020C3 FPGA sitting on a Stratix II development board.

2. Speech Recognition Basics

Fig. 1 shows the speech recognition algorithm flow.
A typical speech recognition system starts with the Mel Frequency Cepstrum Coefficient (MFCC) feature analysis stage, which is composed of the following items:
1) Pre-emphasis.
2) Divide the speech signal into frames.
3) Apply the Hamming window.
4) Compute the MFCC feature.

The second stage is the vector quantization stage. In this stage, a codebook is used to quantize the MFCC feature and obtain the MFCC feature vector. The codebook is generated on a computer via the LBG algorithm and is downloaded to ROM.

The last stage is recognition, which is performed using a set of statistical models, i.e. hidden Markov models (HMMs). In this stage, the probability that the MFCC feature vector was generated by each model is computed, and the result is the model that generated it with the largest probability.

2.1 MFCC Feature analysis

Figure 2 shows the process of creating MFCC features. The first step is to take the Discrete Fourier Transform (DFT) of each frame. A certain number of zeros are appended to the time-domain signal $s(n)$ of each frame to form a sequence of length $N$. The DFT of each frame is then taken to obtain the linear spectrum $X(k)$.

In the second step, the linear spectrum $X(k)$ is multiplied by the Mel frequency filter bank and converted to the Mel spectrum. The Mel frequency filter bank consists of several band-pass filters $H_m(k)$, each defined as follows:

$$
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\
0, & k > f(m+1)
\end{cases}
\qquad (0 \le m < M) \qquad (1)
$$

where $0 \le m < M$, $M$ is the number of band-pass filters, and $f(m)$ is the center frequency.

The third step is to take the logarithm of the Mel spectrum to obtain the logarithmic spectrum $S(m)$. Thus, the transfer function from the linear spectrum $X(k)$ to the logarithmic spectrum $S(m)$ is

$$
S(m) = \ln\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right) \qquad (0 \le m < M) \qquad (2)
$$

In the last step, the logarithmic spectrum $S(m)$ is transformed into the cepstrum domain by the Discrete Cosine Transform (DCT) to yield the MFCC feature.

2.2 Vector Quantization

Because a discrete hidden Markov model is used in this paper, it is necessary to transform the continuous MFCC feature obtained above into a discrete MFCC feature. Vector quantization maps a $K$-dimensional vector $X \in \tilde{X} \subset R^K$ to another $K$-dimensional quantized vector $Y \in \tilde{Y}_N = \{Y_1, Y_2, \ldots, Y_N \mid Y_i \in R^K\}$, where $X$ is the input vector, $Y$ is the quantized vector or codeword, $\tilde{X}$ is the source space, $\tilde{Y}_N$ is the output space, $N$ is the size of the codebook, and $R^K$ is the $K$-dimensional Euclidean space. Quantizing a vector $X$ means searching the codebook $\tilde{Y}_N$ for the codeword nearest to $X$.
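
The nearest-codeword search described above can be sketched in software as follows. This is a minimal illustration, not the paper's implementation: it assumes squared Euclidean distance as the distortion, and the names `squared_distortion`, `quantize` and `toy_codebook` as well as the codebook values are hypothetical.

```python
from typing import Sequence, Tuple


def squared_distortion(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two K-dimensional vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def quantize(x: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, distortion) of the codeword nearest to x."""
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = squared_distortion(x, codeword)
        if dist < best_dist:  # keep the first codeword on ties
            best_index, best_dist = index, dist
    return best_index, best_dist


# Toy 2-dimensional codebook with three codewords (illustrative values only).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.9, 1.2], [3.5, 0.1]]
# Each frame is replaced by the index of its nearest codeword.
symbols = [quantize(frame, toy_codebook)[0] for frame in frames]
```

The resulting list of indices is the discrete observation sequence that a discrete HMM consumes.
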
In this paper, the squared distortion measure is applied to calculate the distortion, which is defined as

$$
d(X, Y) = \| X - Y \|^2 \qquad (3)
$$

2.3 HMM Recognition

The role of HMM recognition is to find, for the given feature vector, the HMM that generated it with the maximum probability. In this paper, the Viterbi algorithm is used to solve this problem, and an improved algorithm is proposed based on the original one.

Let the HMM parameters be $\lambda = \{\pi, A, B\}$ (with $\pi = \{\pi_i\}$, $A = \{a_{ij}\}$, $B = \{b_{jk}\}$) and the observation sequence be $O = O_1, O_2, \ldots, O_T$, where $N$ is the number of HMM states, $\delta_t(i)$ is the highest probability along a single path, at time $t$, which accounts for the first $t$ observations and ends in state $i$, and $\varphi_t(i)$ records the preceding HMM state on that path. The detailed algorithm is defined as follows:

1) Initialization

$$
\delta_1(i) = \pi_i \, b_i(O_1), \qquad \varphi_1(i) = 0 \qquad (1 \le i \le N) \qquad (4)
$$

2) Recursion

$$
\delta_{t+1}(j) = \left[ \max_{1 \le i \le N} \{ \delta_t(i) \, a_{ij} \} \right] b_j(O_{t+1}), \qquad
\varphi_{t+1}(j) = \arg\max_{1 \le i \le N} \{ \delta_t(i) \, a_{ij} \} \qquad (1 \le j \le N) \qquad (5)
$$

3) Termination

$$
P^* = \max_{1 \le i \le N} [\delta_T(i)], \qquad q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)] \qquad (6)
$$

4) Path backtracking

$$
q_t^* = \varphi_{t+1}(q_{t+1}^*) \qquad (t = T-1, T-2, \ldots, 1) \qquad (7)
$$

5) Algorithm improvement

In practice, $\pi$, $A$ and $B$ are decimal fractions between 0 and 1. Fractional arithmetic is not well suited to FPGA implementation, because repeated multiplication of fractions may cause severe underflow when $T$ is larger than a threshold. It is therefore important to take the logarithm of $\pi$, $A$ and $B$ before operation. Once $\pi$, $A$ and $B$ are transformed into logarithmic probabilities, floating-point multiplication becomes integer addition. In addition, because the logarithm of a probability is never positive, the sign bit can be taken out before operation and only the magnitudes $\pi'$, $A'$ and $B'$ (the negated logarithms) are kept, so that each maximization becomes a minimization. Equations (4) and (5) should then be changed to

$$
\delta'_1(i) = \pi'_i + b'_i(O_1), \qquad \varphi_1(i) = 0 \qquad (1 \le i \le N) \qquad (8)
$$

$$
\delta'_{t+1}(j) = \left[ \min_{1 \le i \le N} \{ \delta'_t(i) + a'_{ij} \} \right] + b'_j(O_{t+1}), \qquad
\varphi_{t+1}(j) = \arg\min_{1 \le i \le N} \{ \delta'_t(i) + a'_{ij} \} \qquad (1 \le j \le N) \qquad (9)
$$

Thus (8) and (9) are the expressions of the improved algorithm.

3. Design of Speech Recognition Hardware

3.1 Design of MFCC module hardware

As shown in Fig. 3, the MFCC module consists of a DFT module, Mel filter banks, an endpoint detection module, a logarithm operation module, a DCT module, an output control module and a control module. The speech signal is sampled at a sample rate of 8 k. Each speech frame is composed of 256 24-bit sample points. Data are sent to the Mel filter bank under the control of the control element, and the result of the calculation is reported back to the control element. The output of the Mel filter bank is passed to the logarithm computation unit and the DCT module to calculate the MFCC parameters. Meanwhile, endpoint detection is executed: the control element determines whether to output the MFCC parameters according to the output of the speech endpoint module. Data pass through the module in pipeline mode, which enhances the processing speed of the system.

3.2 Design of Vector quantization module hardware

The vector quantization module hardware is designed as shown in Fig. 4. The order number is stored in counter1. The index of the codebook is stored in counter2. The index of the nearest codeword is stored in register1. The value of the distance between ROM (codebook) and RAM (MFCC) is stored in the address module. The work flow is as follows:
1) Under the control of the controller, counter1 starts counting.
The MFCC of each frame and the codebook are read, subtracted, and sent to the accumulator.
2) Compare the value of register2 with the output of the accumulator: if the output of the accumulator is larger than the value of register2, the controller stops the computation for the current codeword and moves on to the next codeword. Counter1 and the accumulator are cleared, and the value of counter2 is incremented by 1.
3) If the output of the accumulator is less than the value of register2 when the value of counter1 is 12, the output of the accumulator is stored in register2 and the current value of counter2 is stored in register1. The index of the nearest codeword and the index of the codebook are thus renewed. Counter1 and the accumulator are cleared, and the value of counter2 is incremented by 1.
4) Repeat the above process until the value of counter2 is 256. The vector quantization of a speech frame is then complete.

3.3 Design of HMM recognition module hardware

A 4-state left-to-right HMM without skipping is adopted in this paper. The design of the HMM recognition module hardware is shown in Fig. 5. FSM is the state machine controller. The observation sequence is stored in RAM O. The initial probabilities are stored in RAM P. The state transition probabilities A are stored in RAM A. The output probabilities B are stored in RAM B. The addresses of RAM A and RAM B are generated by GENaddrA and GENaddrB respectively. CurrentMin preserves the smallest (negated logarithmic) probability value among the recognition models evaluated so far, and CounterIndex saves the label of the model with that smallest value.

The key point of the Viterbi algorithm is computing $\delta'_{t+1}(j)$ via (9); therefore, the PE unit has been designed to calculate $\delta'_{t+1}(j)$ in this paper. As shown in Figure 6, the PE unit consists of three adders and two data selectors. First, the values of $\delta'_t(j-1) + a'_{j-1,j}$ and $\delta'_t(j) + a'_{j,j}$ are calculated. Second, the smaller value is chosen through a data selector, and $b'_j(O_{t+1})$ is added to it to obtain $\delta'_{t+1}(j)$. In the initial state, when $t = 1$, it is only necessary to compare the value of $\delta' + a'$ with $\pi'$ and to take the smaller one as the smallest value.

4. Implementation and Results

The entire voice training and recognition process was implemented using the Stratix II EP2S60 DSP development board as the hardware platform of the voice processing module. Fig. 7 is the RTL view. The voice signal was acquired through microphones and a PC-in tape recorder. The sample rate was 11.025 kHz and the sample precision was 16 bits. 50 samples were collected for each Mandarin digit from 1 to 10 as the experimental subjects. The experimental results are shown in Table 1. The average recognition accuracy for speaker-independent Mandarin digits reaches 94% and the average recognition time is 0.669 s, which meets the recognition rate and real-time requirements.

5. Conclusions

In this paper, an FPGA-based Hidden Markov Model speech recognition system was designed. It completes the acquisition of voice by microphone and PC-in tape recorder and the generation of the codebook and training data. In the system, the MFCC feature vector is calculated, quantized and recognized by the Viterbi algorithm. In the HMM recognition, the traditional Viterbi algorithm was improved to enhance the recognition speed, which meets the needs of real-time voice recognition systems and the requirements for recognition accuracy.

Table 1. Experiment result

Number            1     2     3     4     5     6     7     8     9     10
Correct rate (%)  96    94    94    92    96    92    94    92    96    94
Time (s)          0.67  0.69  0.65  0.70  0.66  0.65  0.68  0.65  0.64  0.70
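
The improved recursion of Section 2.3, equations (8) and (9), can be sketched in software as follows. This is only an illustration under stated assumptions: it uses floating-point negated natural logarithms, whereas the hardware described in the paper works on integer values, and the function names `neg_log` and `viterbi_min_sum` and the two-state toy model are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def neg_log(p: float) -> float:
    """Negated natural logarithm; a zero probability maps to +infinity."""
    return math.inf if p == 0.0 else -math.log(p)


def viterbi_min_sum(pi: Sequence[float],
                    a: Sequence[Sequence[float]],
                    b: Sequence[Sequence[float]],
                    obs: Sequence[int]) -> Tuple[float, List[int]]:
    """Return (cost, path): the smallest negated log-probability and its states.

    pi[i] are initial probabilities, a[i][j] transition probabilities,
    b[j][k] output probabilities, obs the observed codebook indices.
    """
    n = len(pi)
    cost_pi = [neg_log(p) for p in pi]
    cost_a = [[neg_log(p) for p in row] for row in a]
    cost_b = [[neg_log(p) for p in row] for row in b]

    # Initialization, equation (8).
    delta = [cost_pi[i] + cost_b[i][obs[0]] for i in range(n)]
    back: List[List[int]] = []  # back-pointers, one list per recursion step

    # Recursion, equation (9): minimum over predecessors plus the output cost.
    for symbol in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            best_i = min(range(n), key=lambda i: delta[i] + cost_a[i][j])
            new_delta.append(delta[best_i] + cost_a[best_i][j] + cost_b[j][symbol])
            pointers.append(best_i)
        delta = new_delta
        back.append(pointers)

    # Termination and path backtracking, with min replacing max.
    state = min(range(n), key=lambda i: delta[i])
    best_cost = delta[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_cost, path


# Toy two-state left-to-right model (illustrative values only).
cost, path = viterbi_min_sum(
    pi=[1.0, 0.0],
    a=[[0.5, 0.5], [0.0, 1.0]],
    b=[[0.9, 0.1], [0.2, 0.8]],
    obs=[0, 0, 1],
)
```

For recognition, the same routine would be run once per word model and the model with the smallest cost selected, which mirrors the role of CurrentMin and CounterIndex in Section 3.3.
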

Figure 1. Speech recognition algorithm flow
Figure 2. MFCC feature analysis algorithm flow (signals: s(n), X(k), S(m), c(n))
Figure 3. MFCC feature analysis hardware structure
Figure 4. Vector quantization hardware structure (blocks: Controller, Counter1, Counter2, Register1, GEN addr, ROM (codebook), RAM (MFCC), SUB, MUL, Register2, ACCU, MIN?)
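
The work flow of Section 3.2 (the structure of Figure 4) can be read as a partial-distance search: the accumulation for a codeword is abandoned as soon as it exceeds the best distance found so far. The sketch below is one possible software reading of those steps, not the authors' design; the function name `quantize_early_abort` is hypothetical, and the comments map variables to the blocks of Figure 4 only by assumption.

```python
from typing import Optional, Sequence, Tuple


def quantize_early_abort(frame: Sequence[int],
                         codebook: Sequence[Sequence[int]]
                         ) -> Tuple[Optional[int], Optional[int]]:
    """Nearest-codeword search that abandons a codeword once it cannot win."""
    best_dist: Optional[int] = None    # assumed role of register2
    best_index: Optional[int] = None   # assumed role of register1
    for index, codeword in enumerate(codebook):   # counter2 walks the codebook
        acc = 0                                    # accumulator (ACCU)
        aborted = False
        for x, y in zip(frame, codeword):          # counter1 walks the components
            diff = x - y                           # SUB
            acc += diff * diff                     # MUL, then accumulate
            if best_dist is not None and acc > best_dist:
                aborted = True                     # step 2: skip to next codeword
                break
        if not aborted and (best_dist is None or acc < best_dist):
            best_dist, best_index = acc, index     # step 3: renew the minimum
    return best_index, best_dist


# Illustrative integer data: the third codeword is abandoned after one component.
result = quantize_early_abort([3, 1], [[0, 0], [3, 2], [10, 10], [3, 1]])
```

Because every squared term is non-negative, abandoning a codeword early never changes which codeword is selected; it only saves accumulation cycles.
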

Figure 5. HMM recognition hardware structure (blocks: FSM, Frame Counter, Model Counter, State Counter, Counter Index, RAM O, GEN addrB, GEN addrA, Current Min, RAM P, RAM A, RAM B, Buffer, PE1, PE2, PE3, PE4, MIN?)
Figure 6. Processing element (labels: δ and a terms, π, b(O), and two MUX blocks)
Figure 7. RTL view