# Speech Recognition Basics

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Sp eec h A nalysis and Interp retatio n L ab Speech Recognition Basics Signal Processing Signal Processing Pattern Matching One Template Dictionary Signal Acquisition Speech is captured by a microphone. The analog signal is converted to a digital signal by sampling it 6000 times a second. Each value is quantized to 6 bits, a number between -768 and 767. A/D,,67,-6,...

2 " # " *. / *. / / / %. " " ( + / + / " + " / %=( C. %(! ". %(. %A( " %E (. " %" *(. %?( &. %>(. %B( *. %F( *. %=( ". %==(. %=( E. %=( \$ *. %=A( ". / " # C " " # " * % ( * Spectral Analysis Spectral Analysis Take a window of 0-0 ms (0 to 80 samples). Q FFT (Fast Fourier Transform) used to compute energy in several frequency bands, typically 0 to 0. Q Cepstrum computed, or so coefficients. A frame, a snapshot of the input s spectrum, is represented by a vector of cepstrum coefficients. Q To describe the spectral change with time, one frame is computed every 0 ms or so, i.e. 00 frames per second. Q A word that lasts second is described by 00 numbers. Q Q

3 Probability

4 Markov Models 0 Weather predictor example of a Markov model Weather predictor calculation State : State : State : rain c lo u d su n State-transitio n p ro b ab ilities, A = { } a ij = () Given today is sunny (i.e., x = ), w h at is th e p rob ab ility of sun-sun-rain-c loud-c loud-sun w ith m odel M? P (X M) = P (X = {,,,,, } M) = P (x = ) P (x = x = ) P (x = x = ) P (x = x = ) P (x = x = ) P (x 6 = x = ) = π a a a a a = (0.8 )(0.)(0.)(0.6)(0.) = w h ere th e initial state p rob ab ility for state i is π i = P (x = i). () 6 State duration probability Summary of Markov models As a consequence of the first-order Markov model, the p rob ab ility of occup y ing a state for a g iven duration, τ, is ex p onential: probability p(x M, x = i) = (a ii ) τ ( a ii ). ( ) a = duration Transition probabilities: A = { } a ij = and π = {π i } = [ ]. P robability of a g iv en state seq u enc e X: P (X M ) = π x a x x a x x a x x... a xt x t. () = π x T t= 7 8

5 Hidden Markov Models Hidden Markov Models Urns and balls example (Ferguson) b b b P rob ab ility of state i p rodu c ing an ob servation o t is: b i (o t ) = P (o t x t = i), ( 6 ) w h ic h c an b e discrete or co n tin u o u s in o. 9 Elements of a discrete HMM, λ Hidden Markov model example. u m b e r o f sta te s, x {,..., };. u m b e r o f e v e n ts K, k {,..., K};. In itia l-sta te p ro b a b ilitie s, π = {π i } = {P (x = i)} fo r i ; a a a a a a a π a b (o ) b (o ) b (o ) b (o ) b (o ) b (o 6). S ta te -tra n sitio n p ro b a b ilitie s, A = {a ij } = {P (x t = j x t = i)} fo r i, j ;. D isc re te o u tp u t p ro b a b ilitie s, B = {b i (k)} = {P (o t = k x t = i)} fo r i a n d k K. o o o o o o 6 with state sequence X = {,,,,, }, P (O X, λ) = b (o ) b (o ) b (o ) b (o ) b (o ) b (o 6 ) P (X λ) = π a a a a a (7 ) P (O, X λ) = π b (o b (o b (o )... (8 ) 0 Isolated digit recognition 0 templates: one template M i per digit. Compare input x with all templates. Select the most similar template j: j min{ f ( x, M )} where f is the comparison function. i Hidden Markov Models (HMM) HMM defined by: umber of states Transition probabilities a ij Output probabilities b i (x): ) i K ik k 0 ( x,, ) c ( x,, ) ik p / ik exp{ a p m0 a a a a ( xm m) m a a a } a a a a

6 Probability of a path P 7 a States a 8 Input Frames a a a a a a a a a a a a 6 7 ) 6 ) 8 ) Probability of a model Is the probability of the best path. Problem: We need to evaluate all possible paths, and there grow exponential with number of states and input frames. Solution: The Viterbi algorithm has complexity that grows linearly with number of states and input frames. Large vocabulary Problem: Huge number of models: 60,000 models Models are hard to train Cannot easily add new words. Solution: Use phoneme models Context Dependent Models To improve accuracy, phone models are context-dependent. Example: THIS = DH IH S DH(SIL,IH) IH(DH,S) S(IH,SIL) 0 =,000 possible models are clustered to about 8,000 generalized triphones Continuous speech Example: THIS IS A TEST SIL DH(SIL,IH) IH(DH,S) S(IH,SIL) SIL IH(SIL,Z) Z(IH,SIL) SIL A(SIL,SIL) SIL T(SIL,EH) EH(T,S) S(EH,T) T(S,SIL) SIL SIL DH(SIL,IH) IH(DH,S) S(IH,IH) IH(S,Z) Z(IH,A) A(Z,T) T(A,EH) EH(T,S) S(EH,T) T(S,SIL) SIL SIL DH IH S SIL IH Z SIL A SIL T EH S T SIL Baum-Welch Algorithm t ( i) P( x, x Lxt, st i ) t ( j) t ( i) aij b j ( xt ) i P( X ) ( i) i Exact solution. T s s s a j a j a j s j t t+

7 Speeding Search: Pruning Training Viterbi algorithm: replace sum with max. Speed-Accuracy Tradeoff: Computations can be reduced by eliminating paths that are not promising, at the expense of having a chance of eliminating the best path. Typically, computations can be reduced by more than a factor of 0, without affecting the error rate significantly. 8 HMM Training A lot of speech is needed to train the models. Done with an iterative algorithm: Take initial model (i.e. uniform probabilities). Segment Database (Run Viterbi algorithm). Update models. xij ( xij ˆ j ) i0 i0 ˆ j ˆ j If Converged stop, otherwise go to. HMM Training Rule of thumb: Each state has to appear in the training database at least 0 times. Speaker-Independent systems have lots of data (models well trained), but contain high variability. Speaker-Dependent systems do not have as much data (models not so well trained) but they are more consistent. Language Modeling Bayes Rule in ASR P( A W ) P( W ) P( W A) P( A) Wˆ max P( W A) max P( A W ) P( W ) where A is the Acoustics and W the sequence of words. P(W) is the language model.

8 Statistical Language Model P( W ) P( WW W LW P( W W W ) LP( W W ) P( W ) P( W P W ) P( W ) P( W ) LP( W ( LW W ) ) Unigram W ) P ( W ) P ( W W ) P ( W W ) LP ( W W ) Bigram P( W ) P( W W ) P( W WW ) LP( W W W ) Trigram Trigrams P(THIS IS A TEST) = P(THIS) P(IS THIS) P(A THIS IS) P(TEST IS A) These statistical language models predict the probability of the current word given the past history. While simplistic, they contain a lot of information P(IS THIS) >> P(IS HAVE) -Gram Performance Perplexity Error Rate Multiplier x.0 x.0 x. x.0 Trigrams Bigrams Unigrams Uniform Perplexity measures the average branching of a text when presented to a language LP / n model. PP Pˆ( W, W, LWn ) An empirical observation E k P where E is the error rate and P the perplexity. A language model with low perplexity is good for ASR. Tradeoff between perplexity and coverage. Context Free Grammar Applications Advantages: Low error rate (because of low perplexity). Compact. Easy for application developers. Disadvantages: Poor coverage (out of language sentences). 8

9 ASR problem Space ASR Accuracy in the Lab Error Rate Multiplier x0.0 x.0 x.0 More Constraint Unconstrained Task oisy Environment Large Vocabulary Speaker Variance Less Constraint Command &Control (00 words + CFG) % word error rate (speaker-independent) Discrete Dictation (60K words) % word error rate (speaker-dependent) Continuous Dictation (60K words) 7% word error rate (speaker-dependent) Telephone Digits 0.% word error rate (speaker-independent) C & C Types of Errors Pronunciation errors make some words unrecognizable. User says something outside the vocabulary/grammar. User speaks while machine is talking. Background noise fires recognizer (false alarm). Rejection needs improvement. Spontaneous speech (uhms, ahms) ASR in Real Applications ew word addition Rejection Barge-in. Robustness to noise. ASR is just a part of an application, User Interface is critical. Interface LP/ASR SAIL Lab: Loose coupling: LP selects correct input from a list of top candidate sentences: Easy to implement, not optimal Tight coupling. LP provides the language probabilities for the search: Optimal but hard to implement. Can a LP system reduce perplexity?

10

11 Sp eec h A nalysis and Interp retatio n L ab Sp eec h A nalysis and Interp retatio n L ab Sp eec h A nalysis and Interp retatio n L ab Sp eec h A nalysis and Interp retatio n L ab ote: Lectures put together from ow n m a teria l a s w ell a s m a teria l from tutoria ls of A lex A cero & P hilips J a ck son 66

### Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

### Hidden Markov Models. and. Sequential Data

Hidden Markov Models and Sequential Data Sequential Data Often arise through measurement of time series Snowfall measurements on successive days in Buffalo Rainfall measurements in Chirrapunji Daily values

### Lecture 10: Sequential Data Models

CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

### Lecture 12: An Overview of Speech Recognition

Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

### Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

### Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System

Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Francis Kubala Ming-Whel Feng, John Makhoul, Richard Schwartz BBN Systems and Technologies Corporation 10 Moulton St.,

### Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The

### Hidden Markov Models Fundamentals

Hidden Markov Models Fundamentals Daniel Ramage CS229 Section Notes December, 2007 Abstract How can we apply machine learning to data that is represented as a sequence of observations over time? For instance,

### Ericsson T18s Voice Dialing Simulator

Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of

### ISSN: [N. Dhanvijay* et al., 5 (12): December, 2016] Impact Factor: 4.116

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY HINDI SPEECH RECOGNITION SYSTEM USING MFCC AND HTK TOOLKIT Nikita Dhanvijay *, Prof. P. R. Badadapure Department of Electronics

### An Approach to Extract Feature using MFCC

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 08 (August. 2014), V1 PP 21-25 www.iosrjen.org An Approach to Extract Feature using MFCC Parwinder Pal Singh,

### Hardware Implementation of Probabilistic State Machine for Word Recognition

IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2

### Speech Recognition Software and Vidispine

Speech Recognition Software and Vidispine Tobias Nilsson April 2, 2013 Master s Thesis in Computing Science, 30 credits Supervisor at CS-UmU: Frank Drewes Examiner: Fredrik Georgsson Umeå University Department

### Signal Processing for Speech Recognition

Signal Processing for Speech Recognition Once a signal has been sampled, we have huge amounts of data, often 20,000 16 bit numbers a second! We need to find ways to concisely capture the properties of

### AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

### Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press

Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press -! Generative statistical models try to learn how to generate data from latent variables (for example class labels).

### Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

### SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS SYSTEM USING MLLR

SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS SYSTEM USING MLLR Masatsune Tamura y, Takashi Masuko y, Keiichi Tokuda yy, and Takao Kobayashi y ytokyo Institute of Technology, Yokohama, 6-8 JAPAN yynagoya

### Language Modeling. Chapter 1. 1.1 Introduction

Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

### Hidden Markov Models

Hidden Markov Models Phil Blunsom pcbl@cs.mu.oz.au August 9, 00 Abstract The Hidden Markov Model (HMM) is a popular statistical tool for modelling a wide range of time series data. In the context of natural

### Machine Learning I Week 14: Sequence Learning Introduction

Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher

### Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4. by Dr Philip Jackson

Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4 by Dr Philip Jackson Discrete & continuous HMMs - Gaussian & other distributions Gaussian mixtures -

### Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

### HMM Speech Recognition. Words: Pronunciations and Language Models. Pronunciation dictionary. Out-of-vocabulary (OOV) rate.

HMM Speech Recognition ords: Pronunciations and Language Models Recorded Speech Decoded Text (Transcription) Steve Renals Acoustic Features Acoustic Model Automatic Speech Recognition ASR Lecture 9 January-March

### BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS. Keith Vertanen

BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS Keith Vertanen University of Cambridge, Cavendish Laboratory Madingley Road, Cambridge, CB3 0HE, UK kv7@cam.ac.uk

### Statistical Natural Language Processing

Statistical Natural Language Processing Prasad Tadepalli CS430 lecture Natural Language Processing Some subproblems are partially solved Spelling correction, grammar checking Information retrieval with

### Hidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model

Hidden Markov Model Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Hidden Markov Model Hidden Markov models have close connection with mixture models. A mixture model

### Speech Signal Processing: An Overview

Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

### Digital image processing

Digital image processing The two-dimensional discrete Fourier transform and applications: image filtering in the frequency domain Introduction Frequency domain filtering modifies the brightness values

### Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,

### Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

### Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries

Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries F. Seydou *, T. Seppänen *, J. Peltola, P. Väyrynen * * University of Oulu, Information Processing Laboratory, Oulu, Finland VTT Electronics,

### Speech recognition technology for mobile phones

Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such

### Artificial Intelligence for Speech Recognition

Artificial Intelligence for Speech Recognition Kapil Kumar 1, Neeraj Dhoundiyal 2 and Ashish Kasyap 3 1,2,3 Department of Computer Science Engineering, IIMT College of Engineering, GreaterNoida, UttarPradesh,

### Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training

INTERSPEECH-2004 1 Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training G.L. Sarada, N. Hemalatha, T. Nagarajan, Hema A. Murthy Department of Computer Science and Engg.,

### Hidden Markov Models for biological systems

Hidden Markov Models for biological systems 1 1 1 1 0 2 2 2 2 N N KN N b o 1 o 2 o 3 o T SS 2005 Heermann - Universität Heidelberg Seite 1 We would like to identify stretches of sequences that are actually

### Speech Signal Processing introduction

Speech Signal Processing introduction Jan Černocký, Valentina Hubeika {cernocky,ihubeika}@fit.vutbr.cz DCGM FIT BUT Brno FIT BUT Brno Speech Signal Processing introduction. Valentina Hubeika, DCGM FIT

### Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

### An Introduction to Bioinformatics Algorithms Hidden Markov Models

Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

### Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

### Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

### Speech recognition for human computer interaction

Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

### L6: Short-time Fourier analysis and synthesis

L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude

### SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS

SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS Tiago P. Nascimento, Diego Stefano PDEEC, FEUP and INESC-Porto Rua Dr. Roberto Frias, Porto, 4200-465 Portugal Faculdade de Tecnologia e Ciências Avenida

### Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline

### An introduction to Hidden Markov Models

An introduction to Hidden Markov Models Christian Kohlschein Abstract Hidden Markov Models (HMM) are commonly defined as stochastic finite state machines. Formally a HMM can be described as a 5-tuple Ω

### Information Retrieval Statistics of Text

Information Retrieval Statistics of Text James Allan University of Massachusetts, Amherst (NTU ST770-A) Fall 2002 All slides copyright Bruce Croft and/or James Allan Outline Zipf distribution Vocabulary

About Speaker Recognition Techology Gerik Alexander von Graevenitz Bergdata Biometrics GmbH, Bonn, Germany gerik @ graevenitz.de Overview about Biometrics Biometrics refers to the automatic identification

### TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS

TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS Tobias Bocklet and Andreas Maier and Elmar Nöth University of Erlangen Nuremberg, Chair for Pattern Recognition, Martenstr.3, 91058 Erlangen,

### Chapter 3 Discrete-Time Fourier Series. by the French mathematician Jean Baptiste Joseph Fourier in the early 1800 s. The

Chapter 3 Discrete-Time Fourier Series 3.1 Introduction The Fourier series and Fourier transforms are mathematical correlations between the time and frequency domains. They are the result of the heat-transfer

### Phonetic Speaker Recognition with Support Vector Machines

Phonetic Speaker Recognition with Support Vector Machines W. M. Campbell, J. P. Campbell, D. A. Reynolds, D. A. Jones, and T. R. Leek MIT Lincoln Laboratory Lexington, MA 02420 wcampbell,jpc,dar,daj,tleek@ll.mit.edu

### GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL

GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL Md. Sadek Ali 1, Md. Shariful Islam 1 and Md. Alamgir Hossain 1 1 Dept. of Information & Communication Engineering Islamic University, Kushtia 7003, Bangladesh.

### Overview. Speech Signal Analysis. Speech signal analysis for ASR. Speech production model. Vocal Organs & Vocal Tract. Speech Signal Analysis for ASR

Overview Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 7/2 January 23 Speech Signal Analysis for ASR Reading: Features for ASR Spectral analysis

### Recent Progress in Robust Vocabulary-Independent Speech Recognition Hsiao-Wuen Hon and Kai.Fu Lee

Recent Progress in Robust Vocabulary-Independent Speech Recognition Hsiao-Wuen Hon and Kai.Fu Lee School of Computer Science Carnegie Mellon University Pittsburgh, Pennsylvania 15213 Abstract This paper

### Reinforcement Learning

Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Martin Lauer AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg martin.lauer@kit.edu

### (LDA). I. INTRODUCTION

A Study on Speech Recognition Mridula Shanbhogue 1, Shreya Kulkarni 2, R Suprith 3, Tejas K I 4, Nagarathna N 5 Dept. Of Computer Science & Engineering, B.M.S. College of Engineering, Bangalore, India

### Recurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015

Recurrent (Neural) Networks: Part 1 Tomas Mikolov, Facebook AI Research Deep Learning course @ NYU, 2015 Structure of the talk Introduction Architecture of simple recurrent nets Training: Backpropagation

### Speech Transcription

TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion

### Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

### L9: Cepstral analysis

L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### Signal Modeling for High-Performance Robust Isolated Word Recognition

1 Signal Modeling for High-Performance Robust Isolated Word Recognition Montri Karnjanadecha and Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk, VA

### Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot

Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot Moisa Claudia**, Silaghi Helga Maria**, Rohde L. Ulrich * ***, Silaghi Paul****, Silaghi Andrei****

### at Spoken Term Detection Task in NTCIR-10 SpokenDoc-2

YLAB@RU at Spoken Term Detection Task in NTCIR-10 SpokenDoc-2 ABSTRACT Iori Sakamoto is019083@ed Masanori Morise morise@media The development of spoken term detection (STD) techniques, which detect a given

### School Class Monitoring System Based on Audio Signal Processing

C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.

### Logic, Probability and Learning

Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

### Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues

Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries

### BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

### 1 IDSIA Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland

ACOUSTIC MODELS OF CV SYLLABLES FOR SPEECH RECOGNITION S. Fernández 1, S. Feijóo 2, and N. Barros 2 1 IDSIA Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland e-mail:

### Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems

### CONATION: English Command Input/Output System for Computers

CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India

### Identification of Exploitation Conditions of the Automobile Tire while Car Driving by Means of Hidden Markov Models

Identification of Exploitation Conditions of the Automobile Tire while Car Driving by Means of Hidden Markov Models Denis Tananaev, Galina Shagrova, Victor Kozhevnikov North-Caucasus Federal University,

### Hidden Markov Models 1

Hidden Markov Models 1 Hidden Markov Models A Tutorial for the Course Computational Intelligence http://www.igi.tugraz.at/lehre/ci Barbara Resch (modified by Erhard and Car line Rank) Signal Processing

### VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,

### Hidden Markov Models. Hidden Markov Models

Markov Chains Simplest possible information graph structure: just a linear chain X Y Z. Obeys Markov property: joint probability factors into conditional probabilities that only depend on the previous

### Automatic New Word Acquisition: Spelling from Acoustics

Automatic New Word Acquisition: Spelling from Acoustics Fil Alleva and Kai-Fu Lee School of Computer Science Carnegie Mellon University Pittsburgh, PA Abstract The problem of extending the lexicon of words

### Hidden Markov Models. Terminology and Basic Algorithms

Hidden Markov Models Terminology and Basic Algorithms Motivation We make predictions based on models of observed data (machine learning). A simple model is that observations are assumed to be independent

### Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

, Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

### princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks

princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks Lecturer: Sanjeev Arora Scribe:Elena Nabieva 1 Basics A Markov chain is a discrete-time stochastic process

### Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Final Year Project Progress Report Frequency-Domain Adaptive Filtering Myles Friel 01510401 Supervisor: Dr.Edward Jones Abstract The Final Year Project is an important part of the final year of the Electronic

### Examining the Influence of Speech Frame Size and Number of Cepstral Coefficients on the Speech Recognition Performance

Examining the Influence of Speech Frame Size and Number of Cepstral Coefficients on the Speech Recognition Performance Iosif Mporas, Todor Ganchev, Elias Kotinas, and Nikos Fakotakis Department of Electrical

### Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers Sung-won ark and Jose Trevino Texas A&M University-Kingsville, EE/CS Department, MSC 92, Kingsville, TX 78363 TEL (36) 593-2638, FAX

### A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION

A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION ZAKI B. NOSSAIR AND STEPHEN A. ZAHORIAN Department of Electrical and Computer Engineering Old Dominion University, Norfolk,

### Speech Synthesis by Artificial Neural Networks (AI / Speech processing / Signal processing)

Speech Synthesis by Artificial Neural Networks (AI / Speech processing / Signal processing) Christos P. Yiakoumettis Department of Informatics University of Sussex, UK (Email: c.yiakoumettis@sussex.ac.uk)

### A VOICE INTERFACE FOR THE VISUALLY IMPAIRED

SETIT 2005 3 rd International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 27-31, 2005 TUNISIA A VOICE INTERFACE FOR THE VISUALLY IMPAIRED Anand Arokia Raj,

### Reinforcement Learning

Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Joschka Bödecker AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg jboedeck@informatik.uni-freiburg.de

### PERSONAL COMPUTER SOFTWARE VOWEL TRAINING AID FOR THE HEARING IMPAIRED

PERSONAL COMPUTER SOFTWARE VOWEL TRAINING AID FOR THE HEARING IMPAIRED A. Matthew Zimmer, Bingjun Dai, Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk,

### Context Free Grammars

Context Free Grammars So far we have looked at models of language that capture only local phenomena, namely what we can analyze when looking at only a small window of words in a sentence. To move towards

### Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition

1 Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition Émilie CAILLAULT LASL, ULCO/University of Littoral (Calais) Christian VIARD-GAUDIN IRCCyN, University of

### Developing an Isolated Word Recognition System in MATLAB

MATLAB Digest Developing an Isolated Word Recognition System in MATLAB By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling

### Markov Chains, part I

Markov Chains, part I December 8, 2010 1 Introduction A Markov Chain is a sequence of random variables X 0, X 1,, where each X i S, such that P(X i+1 = s i+1 X i = s i, X i 1 = s i 1,, X 0 = s 0 ) = P(X

### Lecture 3: Quantization Effects

Lecture 3: Quantization Effects Reading: 6.7-6.8. We have so far discussed the design of discrete-time filters, not digital filters. To understand the characteristics of digital filters, we need first

### A Supervised Approach To Musical Chord Recognition

Pranav Rajpurkar Brad Girardeau Takatoki Migimatsu Stanford University, Stanford, CA 94305 USA pranavsr@stanford.edu bgirarde@stanford.edu takatoki@stanford.edu Abstract In this paper, we present a prototype

### A Study on Speaker-Adaptive Speech Recognition

A Study on Speaker-Adaptive Speech Recognition X.D. Huang School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ABSTRACT Speaker-independent system is desirable in many applications

### Autoregressive HMMs for speech synthesis. Interspeech 2009

William Byrne Interspeech 2009 Outline Introduction Highlights Background 1 Introduction Highlights Background 2 3 Experimental set-up Results 4 Highlights Background Highlights for speech synthesis: synthesis

### Why 64-bit processing?

Why 64-bit processing? The ARTA64 is an experimental version of ARTA that uses a 64-bit floating point data format for Fast Fourier Transform processing (FFT). Normal version of ARTA uses 32-bit floating

### COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING

DEPARTMENT OF COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING by A. Ganapathiraju, L. Webster, J. Trimble, K. Bush, P. Kornman Department of Electrical and Computer Engineering