Speech Recognition on Cell Broadband Engine UCRL-PRES-223890


1 Speech Recognition on Cell Broadband Engine (UCRL-PRES-223890)
Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory)
Michael Perrone, Borivoj Tydlitat, Ashwini Nanda (IBM)
Daniel May (Mississippi State)
GAIA: Graphics Architectures for Intelligence Applications

2 Multi-Channel Speech Recognition
Motivation
- Concurrent processing for thousands of speech channels! Each CPU can process only a few (10-30) channels in real time.
- Real-time speech processing of high-volume voice traffic
- Can streaming architectures help achieve this goal?
Approach
- Started with the Mississippi State open-source ISIP toolkit, ported feature extraction to the GPU and then to Cell, implemented an isolated-digit decoder for Cell, then a connected-digit decoder for Cell, and most recently connected phones.

3 Outline
- Speech system: feature extraction, speech decoding
- Cell implementation: isolated digits (Viterbi decoder), connected digits (Level Building Algorithm), phones (lexicon / grammar)
- Results
- Future work

4 Speech System Overview

5 Speech System Components
Feature extraction
- Encode an audio channel into a sequence of feature vectors
- Characterize spectral and temporal aspects of speech
- Mel Frequency Cepstral Coefficient (MFCC) features model the human auditory response
Viterbi decoding
- Hidden Markov Model (HMM) based pattern matching
- Model digits directly using HMMs
- Dynamic programming
Baum-Welch training
- HMM parameter estimation
- Requires a training corpus

6 Feature Extraction
Twelve signal processing stages:
- Window Extraction
- Zero Mean
- Energy Computation
- Preemphasis Filter
- Hamming Window
- Spectrum Computation (FFT)
- Mel Frequency Computation
- Cepstrum Computation
- Discrete Cosine Transform
- Lifter (Cepstral Filter)
- Cepstrum Energy Norm
- First and Second Derivatives
MFCC feature vector frame:
- Represents 25 ms of audio with 15 ms overlap (10 ms per frame)
- 250 floats per 25 ms of audio (padded to 256 for FFT/DCT)
- Reduced to 39 coefficients
- Real-time processing requires 100 frames per channel per second
- Cepstrum variance normalized for speaker independence
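
The framing arithmetic above (250 samples per 25 ms frame, i.e. a 10 kHz rate assumed here, with a 100-sample / 10 ms step) and the first two windowing stages can be sketched in plain Python. This is only an illustration of the math, not the SPE implementation; the function names are hypothetical.

```python
import math

def frames(signal, frame_len=250, step=100):
    """Split a signal into overlapping frames: 250 samples = 25 ms at
    10 kHz, and a 100-sample (10 ms) step gives the 15 ms overlap."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, step)]

def preemphasize(frame, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].
    The first sample is passed through unchanged."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]

def hamming(frame):
    """Apply a Hamming window to one frame before the FFT stage."""
    N = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, x in enumerate(frame)]
```

A 1000-sample signal yields 8 such frames; each would then be padded to 256 samples for the FFT/DCT stages.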

7 Viterbi Decoding
Hidden Markov Models
- Finite state machine that characterizes observation (MFCC) sequences
- Transition probabilities assigned to each directed edge
- Each state generates an observation drawn from its PDF
- Stochastically model the process of human speech synthesis
Viterbi algorithm
- How are HMMs used to recognize digits from an observation sequence?
- Summing the probability of all possible paths through the HMM that could have generated the observation sequence is too expensive; instead, approximate it with the maximum-likelihood path
- Use dynamic programming to cache intermediate path probabilities
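
The sum-over-all-paths versus best-path distinction can be made concrete on a toy HMM. The model below (two states, starting in state 0, with made-up transition and emission probabilities) is purely illustrative; it enumerates paths by brute force, which the Viterbi dynamic program avoids.

```python
from itertools import product

# Toy 2-state HMM (all numbers are illustrative assumptions):
# trans[i][j] = P(next state j | state i), emit[i][o] = P(observation o | state i)
trans = [[0.7, 0.3], [0.0, 1.0]]
emit = [{'a': 0.9, 'b': 0.1}, {'a': 0.2, 'b': 0.8}]
obs = ['a', 'b', 'b']

def path_prob(path):
    """Joint probability of one state path generating the observations."""
    p = emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    return p

# All state paths that start in state 0
paths = [p for p in product([0, 1], repeat=len(obs)) if p[0] == 0]
total = sum(path_prob(p) for p in paths)   # exact: sum over all paths
best = max(paths, key=path_prob)           # Viterbi: single most likely path
```

Here the best path (0, 1, 1) carries most of the total probability mass, which is why the maximum-likelihood approximation works well in practice.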

8 Cell Implementation

9 Decoding Isolated Digits
An easy problem
- Digit word models; exactly one digit per channel
- Generalizes to more complex speech recognition
First experiment
- Very fast prototyping
- Estimate best-case performance on Cell
Components
- Feature extraction
- Viterbi decoding

10 Feature Extraction
Straightforward parallel implementation
- Process 8 signals concurrently (packed into 2 arrays of quadwords)
- FFT is the most computationally intense stage
- Take advantage of IBM's highly optimized Cell FFT library
- Data format conversion required to interface with the FFT library
Hybrid implementation
- Process each audio channel in a single pass
- Data reduction computed on the SPEs
- First and second derivatives computed on the PPE
Enforce a streaming model
- Pipeline data communication and avoid branches

11 Viterbi Decoding
Computational bottleneck!
- Data-independent processing: an opportunity to leverage parallelism
- Profile and identify kernels
Simplifying assumptions:
- First-order HMM
- Strict left-right model; only self and forward transitions allowed
- PDF approximated by a linear combination of four Gaussian kernels (Gaussian Mixture Model, GMM)
For each HMM state:
- Calculate observation probability (evaluate the GMM)
- Select maximum likelihood (between self and forward transitions)
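
Evaluating the GMM per state can be sketched as follows, working in the log domain with precomputed inverse variances and log normalization constants (the sample-run slide later shows exactly these tables being loaded: means, inverse variances, weight factors, scale factors). A plain-Python illustration, not the SIMD kernel:

```python
import math

def log_gauss(x, mean, inv_var, log_scale):
    """Log-density of one diagonal-covariance Gaussian; log_scale holds
    the log of the normalization constant, precomputed offline."""
    return log_scale - 0.5 * sum(iv * (xi - m) ** 2
                                 for xi, m, iv in zip(x, mean, inv_var))

def gmm_log_prob(x, weights, means, inv_vars, log_scales):
    """log P_obs(s): log of a Gaussian mixture (4 components in the
    slides), combined with log-sum-exp for numerical stability."""
    logs = [math.log(w) + log_gauss(x, m, iv, ls)
            for w, m, iv, ls in zip(weights, means, inv_vars, log_scales)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))
```

With inverse variances stored instead of variances, the inner loop is multiply-add only, which maps naturally onto SPE SIMD arithmetic.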

12 Viterbi Decoding
Data representation
- Store HMM states in quadword vectors for SIMD parallelism
- Transition probabilities across model boundaries are zeroed out to prevent data contamination
- Store only two columns; perform the computation in place
Isolated digit recognition
- Seed with the initial probability
- Decode all HMMs in one pass
- Extract final probabilities and select the maximum

13 Computation Kernels
Observation probability
- Evaluate the feature vector frame x at each state s: P_obs(s) = p(x | s)
Maximum likelihood
- Find the likelihood of the best path through each HMM (log domain, so probabilities add):
  L_k(s_i) = P_obs(s_i) + max( L_{k-1}(s_i) + slf(s_i), L_{k-1}(s_{i-1}) + fwd(s_{i-1}) )
- Must shift the column vector up by one element to align data access
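
In the log domain the recurrence is L_k(s_i) = P_obs(s_i) + max(L_{k-1}(s_i) + slf(s_i), L_{k-1}(s_{i-1}) + fwd(s_{i-1})), where slf and fwd are the log self- and forward-transition probabilities. One frame of this update, with the one-element shift made explicit, might look like the following scalar sketch (the real kernel, spe_decode_max, operates on quadword vectors):

```python
NEG = float('-inf')

def viterbi_step(prev, obs_lp, slf_lp, fwd_lp):
    """One frame of the left-right Viterbi update; all inputs are
    per-state log-probabilities, prev = L_{k-1}."""
    shifted_L = [NEG] + prev[:-1]      # L_{k-1}(s_{i-1}); state 0 has no predecessor
    shifted_f = [NEG] + fwd_lp[:-1]    # fwd(s_{i-1}), shifted to align with s_i
    return [o + max(p + s, q + f)
            for o, p, s, q, f in zip(obs_lp, prev, slf_lp, shifted_L, shifted_f)]
```

Only the previous column and the current column are live at any time, which is why two columns updated in place suffice.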

14 SPE Programs
- spe_mfcc_frontend: convert an audio sequence to MFCC frames
- spe_decode_obs: convert MFCC frames to observation probabilities
- spe_decode_max: maximum-likelihood computation

15 Decoding Connected Digits
Multiple utterances in a frame sequence:
- How many digits are there?
- Where does one digit end and another begin?
- What if the speaker talks too fast? (co-articulation, e.g. "two-oh", "three-eight-two", "nine-nine", "six-seven-nine")
One naive approach:
- Construct a large HMM for every possible digit string (e.g. zero-zero-zero, zero-zero-one, ...)
- Too many combinatoric possibilities! Duplicate computation!

16 Decoding Connected Digits
Level Building Algorithm
- Explore all possible combinatoric arrangements of HMMs, but cache intermediate computations
Basic idea:
- The first digit starts at frame 0 but may terminate anywhere
- Decode the second digit at each possible ending frame of the first digit; now the second digit can also end anywhere
- Decoding results overlap, so select the best probability at each frame
- Each frame also maintains its starting-frame pointer and the label of the HMM with the best probability
- Organize the computation into levels; decode at most one digit per level
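
The level structure above can be sketched as a small dynamic program. This is a simplified illustration, not the PPE implementation: word_score is assumed to return the Viterbi log-score of one word model over a frame range (in the real system that comes from spe_decode_max), and all names are hypothetical.

```python
NEG = float('-inf')

def level_building(n_frames, words, word_score, max_levels, min_len, max_len):
    # best[l][t]: best log-score of decoding exactly l words over frames [0, t)
    best = [[NEG] * (n_frames + 1) for _ in range(max_levels + 1)]
    back = [[None] * (n_frames + 1) for _ in range(max_levels + 1)]
    best[0][0] = 0.0
    for l in range(1, max_levels + 1):                  # at most one word per level
        for end in range(min_len, n_frames + 1):
            # bound computation by minimum / maximum word-length expectations
            for start in range(max(0, end - max_len), end - min_len + 1):
                if best[l - 1][start] == NEG:
                    continue
                for w in words:
                    s = best[l - 1][start] + word_score(w, start, end)
                    if s > best[l][end]:                # keep best overlap per frame
                        best[l][end] = s
                        back[l][end] = (start, w)       # starting-frame pointer + label
    # traceback: pick the best final level, then decode words in reverse
    l = max(range(1, max_levels + 1), key=lambda lv: best[lv][n_frames])
    t, out = n_frames, []
    while l > 0:
        start, w = back[l][t]
        out.append(w)
        l, t = l - 1, start
    return list(reversed(out))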

17 Level Building Algorithm
Multiple passes over the MFCC frames
- Observation probabilities computed in a single pass
- Maximum-likelihood path computed over several passes
- Bound the computation using minimum and maximum digit-length expectations
- Opportunity for dynamic pruning
- A traceback step decodes the digits in reverse
PPE implementation
- Drives spe_decode_max, maintains bookkeeping, and performs the traceback

18 Decoding Phones
Linguistic phone
- Atomic unit of speech
- Supports arbitrary vocabulary sizes
- More efficient to decode than word models
- Triphone models provide left / right context
Implementation
- Not so different from word models
- Fewer states per phone model means fewer HMM state-pruning opportunities
- Apply a lexicon / language grammar to decode valid words from phones (more later)

19 Preliminary Results

20 Experiment Setup
Vocabulary
- Recognize digit phones
- 3-state phone HMMs: 20 classes, 57 total states
- Gaussian Mixture Model: 4 Gaussians per mixture, one mixture per state
Corpus
- TIDIGITS
- Connected phone recognition with the Level Building Algorithm; phones between 5 and 20 frames
Training
- HMMs trained off-line using Mississippi State ISIP's Baum-Welch
Platforms
- 3.2 GHz Pentium (CPU)
- 4.0 GHz Cell simulator (SIM)
- 2.0 GHz Cell beta hardware (CELL)

21 System Performance
[Performance table: rows CPU (3.2 GHz), SIM (4.0 GHz), CELL (2.0 GHz); columns Feature Extraction, Connected Phones, Total (FE + CP), Real-Time Channels (100 frames / sec); the numeric values were not preserved in this transcription]
Timing methodology:
- Ignored initialization and network I/O
- gettimeofday() wrapper around critical code
- Timed on a single SPE, extrapolated to all 8
Gained at least two orders of magnitude of performance on Cell, but this does not include lexicon / grammar / pruning optimizations!

22 Sample Run
speech]# ./decode.ppe
Allocating levels...
Loading Gaussian means...
Loading Gaussian inverse variances...
Loading Gaussian weight factors...
Loading Gaussian scale factors...
Loading transitional probabilities...
Loading MFCC file frames.
Computing observational probabilities...
Decoding maximum likelihood...
nlevels = 48
min_levels = 9
prob =
Decoded: [sil] f ay v n ay n ah ay n s eh v ih n s w ih k s [sil]
Obs time: seconds.
Max time: seconds.
Tot time: seconds.
Cleaning up...
speech]# _

23 Next Steps

24 Sample Decode
Speech decode using only acoustic phone models:
[sil] f ay v n ay n ah ay n s eh v ih n s w ih k s [sil]
([sil] five nine nine seven six [sil])
- What is "s w ih k s"?
- Apply lexicon constraints to decode actual words from phones
- Implement the lexicon at level boundaries in the LBA on the PPE

25 Lexicon
Phone transcriptions
- FOUR: f ow r
- FIVE: f ay v
- SIX: s ih k s
Lexical graph
- Words sharing an initial phone share graph prefixes (f branches to "ow r" for FOUR and "ay v" for FIVE; "s ih k s" forms its own branch for SIX)
Implementation
- Must maintain state for each decode
- Managed by the Level Building Algorithm on the PPE
- Opportunity to also apply phone bi-gram / tri-gram models
- Difficult to leverage SIMD parallelism
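
A lexical graph of this kind is just a prefix tree over phone transcriptions. A minimal sketch (the dictionary entries are the slide's own examples; the trie representation and the '#' end-of-word marker are assumptions for illustration):

```python
def build_lexicon_graph(lexicon):
    """Prefix tree (trie) over phone transcriptions, so that words
    sharing a prefix phone, e.g. FOUR and FIVE both starting with
    'f', share graph states."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})  # reuse an existing branch if present
        node['#'] = word                    # hypothetical end-of-word marker
    return root

lexicon = {'FOUR': ['f', 'ow', 'r'],
           'FIVE': ['f', 'ay', 'v'],
           'SIX':  ['s', 'ih', 'k', 's']}
graph = build_lexicon_graph(lexicon)
```

During decoding, each active hypothesis keeps a pointer into this graph; that per-hypothesis state is what makes the lexicon step hard to vectorize with SIMD.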

26 Larger Corpora
TIDIGITS (early prototyping)
- 326 speakers (111 men, 114 women, 50 boys, and 51 girls)
- Each pronounces 77 digit sequences; 11 words, no grammar
TIMIT (develop phonetic system)
- 630 speakers (representing 8 major dialects of US English)
- Each speaks 10 sentences; ~6000 words, limited grammar
Wall Street Journal (WSJ)
- Read from WSJ news text
- Also includes spontaneous dictation
Switchboard (real speech problem)
- 2430 spontaneous telephone conversations from over 500 speakers
- Noisy data, speech vs. non-speech (stuttering, incomplete words, etc.)
- Unknown vocabulary and/or language grammar

27 Future Work
- Larger vocabulary
- More challenging corpora (WSJ, Switchboard) for a performance / accuracy tradeoff study
- Lexicon / language grammar (bigram / trigram)
PPE optimizations
- Model- / word-level hierarchical pruning
- SPE system load balancing
SPE optimizations
- Chain feature extraction kernels (pending DMA issues)
- Tied Gaussian mixtures
- HMM-level pruning

28 Questions?
Contact: Yang Liu, Holger Jones


More information

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Chirag Gupta,Sumod Mohan K cgupta@clemson.edu, sumodm@clemson.edu Abstract In this project we propose a method to improve

More information

Recurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015

Recurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015 Recurrent (Neural) Networks: Part 1 Tomas Mikolov, Facebook AI Research Deep Learning course @ NYU, 2015 Structure of the talk Introduction Architecture of simple recurrent nets Training: Backpropagation

More information

Evaluation of Acoustic Model and Unsupervised Speaker Adaptation for Child Speech Recognition in Real Environment

Evaluation of Acoustic Model and Unsupervised Speaker Adaptation for Child Speech Recognition in Real Environment Vol. 47 No. 7 July 2006, 71.1% 23.9% 59,966 200 2.2% 1.7% 0.5% Evaluation of Acoustic Model and Unsupervised Speaker Adaptation for Child Speech Recognition in Real Environment Mitsuru Samejima, Randy

More information

(LDA). I. INTRODUCTION

(LDA). I. INTRODUCTION A Study on Speech Recognition Mridula Shanbhogue 1, Shreya Kulkarni 2, R Suprith 3, Tejas K I 4, Nagarathna N 5 Dept. Of Computer Science & Engineering, B.M.S. College of Engineering, Bangalore, India

More information

A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION

A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION ZAKI B. NOSSAIR AND STEPHEN A. ZAHORIAN Department of Electrical and Computer Engineering Old Dominion University, Norfolk,

More information

Signal Modeling for High-Performance Robust Isolated Word Recognition

Signal Modeling for High-Performance Robust Isolated Word Recognition 1 Signal Modeling for High-Performance Robust Isolated Word Recognition Montri Karnjanadecha and Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk, VA

More information

C E D A T 8 5. Innovating services and technologies for speech content management

C E D A T 8 5. Innovating services and technologies for speech content management C E D A T 8 5 Innovating services and technologies for speech content management Company profile 25 years experience in the market of transcription/reporting services; Cedat 85 Group: Cedat 85 srl Subtitle

More information

Analysis of GPU Parallel Computing based on Matlab

Analysis of GPU Parallel Computing based on Matlab Analysis of GPU Parallel Computing based on Matlab Mingzhe Wang, Bo Wang, Qiu He, Xiuxiu Liu, Kunshuai Zhu (School of Computer and Control Engineering, University of Chinese Academy of Sciences, Huairou,

More information

Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press

Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press -! Generative statistical models try to learn how to generate data from latent variables (for example class labels).

More information

COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING

COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING DEPARTMENT OF COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING by A. Ganapathiraju, L. Webster, J. Trimble, K. Bush, P. Kornman Department of Electrical and Computer Engineering

More information

An Automatic Real-time Synchronization of Live Speech with Its Transcription Approach

An Automatic Real-time Synchronization of Live Speech with Its Transcription Approach Article An Automatic Real-time Synchronization of Live Speech with Its Transcription Approach Nat Lertwongkhanakool a, Natthawut Kertkeidkachorn b, Proadpran Punyabukkana c, *, and Atiwong Suchato d Department

More information

Online Diarization of Telephone Conversations

Online Diarization of Telephone Conversations Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of

More information

Background. Linear Convolutional Encoders (2) Output as Function of Input (1) Linear Convolutional Encoders (1)

Background. Linear Convolutional Encoders (2) Output as Function of Input (1) Linear Convolutional Encoders (1) S-723410 Convolutional Codes (1) 1 S-723410 Convolutional Codes (1) 3 Background Convolutional codes do not segment the data stream, but convert the entire data stream into another stream (codeword) Some

More information

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31 Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the

More information

have more skill and perform more complex

have more skill and perform more complex Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such

More information

ECEN 5682 Theory and Practice of Error Control Codes

ECEN 5682 Theory and Practice of Error Control Codes ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.

More information

A bachelor of science degree in electrical engineering with a cumulative undergraduate GPA of at least 3.0 on a 4.0 scale

A bachelor of science degree in electrical engineering with a cumulative undergraduate GPA of at least 3.0 on a 4.0 scale What is the University of Florida EDGE Program? EDGE enables engineering professional, military members, and students worldwide to participate in courses, certificates, and degree programs from the UF

More information

Data driven design of filter bank for speech recognition

Data driven design of filter bank for speech recognition Data driven design of filter bank for speech recognition Lukáš Burget 12 and Hynek Heřmanský 23 1 Oregon Graduate Institute, Anthropic Signal Processing Group, 2 NW Walker Rd., Beaverton, Oregon 976-8921,

More information

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More information

L9: Cepstral analysis

L9: Cepstral analysis L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,

More information

Advanced Microprocessors RISC & DSP

Advanced Microprocessors RISC & DSP Advanced Microprocessors RISC & DSP RISC & DSP :: Slide 1 of 23 RISC Processors RISC stands for Reduced Instruction Set Computer Compared to CISC Simpler Faster RISC & DSP :: Slide 2 of 23 Why RISC? Complex

More information

Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System

Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Francis Kubala Ming-Whel Feng, John Makhoul, Richard Schwartz BBN Systems and Technologies Corporation 10 Moulton St.,

More information

Multi Grid for Multi Core

Multi Grid for Multi Core Multi Grid for Multi Core Harald Köstler, Daniel Ritter, Markus Stürmer and U. Rüde (LSS Erlangen, ruede@cs.fau.de) in collaboration with many more Lehrstuhl für Informatik 10 (Systemsimulation) Universität

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information

Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries

Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries F. Seydou *, T. Seppänen *, J. Peltola, P. Väyrynen * * University of Oulu, Information Processing Laboratory, Oulu, Finland VTT Electronics,

More information

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102

More information

Design and Implementation of a SoPC System for Speech Recognition

Design and Implementation of a SoPC System for Speech Recognition Design and Implementation of a SoPC System for Speech Recognition Tran Van Hoang 1, Nguyen Ly Thien Truong 1, Hoang Trang 1, Xuan-Tu Tran 2 1 University of Technology, Vietnam National University, HoChiMinh

More information

Automatic analysis of heart sounds using speech recognition techniques

Automatic analysis of heart sounds using speech recognition techniques Automatic analysis of heart sounds using speech recognition techniques Dan Ellis & Michael Mandel Laboratory for Recognition and Organization of Speech and Audio (LabROSA) Department of Electrical Engineering,

More information

SAYA FREE SPEECH RECOGNITION MINI PROJECT AVIAD OTMAZGIN AMIR BARON

SAYA FREE SPEECH RECOGNITION MINI PROJECT AVIAD OTMAZGIN AMIR BARON SAYA FREE SPEECH RECOGNITION MINI PROJECT AVIAD OTMAZGIN AMIR BARON INTRODUCTION Our mini project handles with the speech recognition part on saya. Currently, saya can recognize only a small vocabulary

More information

AUTOMATIC IDENTIFICATION OF BIRD CALLS USING SPECTRAL ENSEMBLE AVERAGE VOICE PRINTS. Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar

AUTOMATIC IDENTIFICATION OF BIRD CALLS USING SPECTRAL ENSEMBLE AVERAGE VOICE PRINTS. Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar AUTOMATIC IDENTIFICATION OF BIRD CALLS USING SPECTRAL ENSEMBLE AVERAGE VOICE PRINTS Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar Indian Institute of Technology Madras Chennai, 60006,

More information

Dragon NaturallySpeaking and citrix. A White Paper from Nuance Communications March 2009

Dragon NaturallySpeaking and citrix. A White Paper from Nuance Communications March 2009 Dragon NaturallySpeaking and citrix A White Paper from Nuance Communications March 2009 Introduction As the number of deployed enterprise applications increases, organizations are seeking solutions that

More information

Artificial Intelligence for Speech Recognition

Artificial Intelligence for Speech Recognition Artificial Intelligence for Speech Recognition Kapil Kumar 1, Neeraj Dhoundiyal 2 and Ashish Kasyap 3 1,2,3 Department of Computer Science Engineering, IIMT College of Engineering, GreaterNoida, UttarPradesh,

More information

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA Nitesh Kumar Chaudhary 1 and Shraddha Srivastav 2 1 Department of Electronics & Communication Engineering, LNMIIT, Jaipur, India 2 Bharti School Of Telecommunication,

More information