Speech Recognition Basics


Speech Analysis and Interpretation Lab

Speech Recognition Basics

[Figure: block diagram of the recognizer: Signal Processing, Pattern Matching, One Template, Dictionary]

Signal Acquisition

Speech is captured by a microphone. The analog signal is converted to a digital signal by sampling it 16000 times a second. Each value is quantized to 16 bits, a number between -32768 and 32767.

[Figure: A/D converter turning the waveform into a sequence of integer sample values]
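The acquisition step can be illustrated with a minimal sketch. It assumes a 16 kHz sampling rate and 16-bit signed samples; the function names `quantize` and `sample_sine` are hypothetical and exist only for this illustration, with a sine wave standing in for the microphone signal.

```python
import math

def quantize(sample: float, bits: int = 16) -> int:
    """Map an analog value in [-1.0, 1.0] to a signed integer with `bits` bits."""
    lowest = -(1 << (bits - 1))        # -32768 for 16 bits
    highest = (1 << (bits - 1)) - 1    # 32767 for 16 bits
    scaled = round(sample * (1 << (bits - 1)))
    return max(lowest, min(highest, scaled))  # clip values that overflow the range

def sample_sine(freq_hz: float, duration_s: float, rate_hz: int = 16000) -> list:
    """Sample a unit-amplitude sine wave (a stand-in for speech) and quantize it."""
    n_samples = round(duration_s * rate_hz)
    return [quantize(math.sin(2 * math.pi * freq_hz * t / rate_hz))
            for t in range(n_samples)]
```

Under these assumptions, 10 ms of signal yields 160 integer samples, each within the 16-bit range.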

Spectral Analysis

- Take a window of 20-30 ms (320 to 480 samples).
- An FFT (Fast Fourier Transform) is used to compute the energy in several frequency bands.
- The cepstrum is computed and a small number of coefficients is kept.
- A frame, a snapshot of the input's spectrum, is represented by a vector of cepstrum coefficients.
- To describe the spectral change with time, one frame is computed every 10 ms or so, i.e. 100 frames per second.
- A word that lasts one second is therefore described by about 100 such vectors.
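The framing and cepstrum steps can be sketched as follows. This is a simplified illustration, not the front end used in the lecture: it assumes a 25 ms window (400 samples at 16 kHz), a 10 ms hop (160 samples), a plain real cepstrum with no windowing function or mel filter bank, and 12 retained coefficients. The names `split_frames`, `dft` and `real_cepstrum` are hypothetical, and the naive DFT stands in for a real FFT routine.

```python
import cmath
import math

def split_frames(signal, window=400, hop=160):
    """Cut the signal into overlapping frames of `window` samples every `hop` samples."""
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, hop)]

def dft(frame):
    """Naive O(n^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(frame, n_coeffs=12):
    """Inverse DFT of the log magnitude spectrum, truncated to `n_coeffs` values."""
    n = len(frame)
    log_mag = [math.log(abs(value) + 1e-10) for value in dft(frame)]
    # The log magnitude of a real signal is real and symmetric, so the inverse
    # transform reduces to a cosine sum.
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n_coeffs)]
```

With these settings one second of audio gives 98 complete frames, close to the 100 frames per second quoted above; the difference comes from the frames that would run past the end of the signal.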

Probability

Markov Models

Weather predictor example of a Markov model

State 1: rain
State 2: cloud
State 3: sun

[Figure: three-state weather model with state-transition probabilities $A = \{a_{ij}\}$]

Weather predictor calculation

Given today is sunny (i.e., $x_1 = 3$), what is the probability of sun-sun-rain-cloud-cloud-sun with model $M$?

$$
\begin{aligned}
P(X \mid M) &= P(X = \{3, 3, 1, 2, 2, 3\} \mid M) \\
&= P(x_1 = 3)\, P(x_2 = 3 \mid x_1 = 3)\, P(x_3 = 1 \mid x_2 = 3)\, P(x_4 = 2 \mid x_3 = 1)\, P(x_5 = 2 \mid x_4 = 2)\, P(x_6 = 3 \mid x_5 = 2) \\
&= \pi_3\, a_{33}\, a_{31}\, a_{12}\, a_{22}\, a_{23},
\end{aligned}
$$

where the initial state probability for state $i$ is $\pi_i = P(x_1 = i)$. Since today is known to be sunny, $\pi_3 = 1$; in this model $a_{33} = 0.8$ and $a_{22} = 0.6$.

State duration probability

As a consequence of the first-order Markov model, the probability of occupying a state for a given duration, $\tau$, is exponential:

$$
p(\tau \mid M, x_1 = i) = (a_{ii})^{\tau - 1} (1 - a_{ii}).
$$

[Figure: state duration probability plotted against duration]

Summary of Markov models

Transition probabilities: $A = \{a_{ij}\}$ and initial-state probabilities $\pi = \{\pi_i\}$.

Probability of a given state sequence $X$:

$$
P(X \mid M) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3} \cdots a_{x_{T-1} x_T} = \pi_{x_1} \prod_{t=2}^{T} a_{x_{t-1} x_t}.
$$
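The weather calculation and the state duration formula can be checked numerically with a small sketch. The transition matrix below is an assumption: it uses the values commonly quoted for this three-state weather example, because the full matrix does not survive in this transcript. States are numbered from 0 in the code (0 = rain, 1 = cloud, 2 = sun), and the function names are hypothetical.

```python
RAIN, CLOUD, SUN = 0, 1, 2

# Assumed transition matrix: A[i][j] = P(next state = j | current state = i).
A = [
    [0.4, 0.3, 0.3],   # from rain
    [0.2, 0.6, 0.2],   # from cloud
    [0.1, 0.1, 0.8],   # from sun
]
PI = [0.0, 0.0, 1.0]   # today is known to be sunny

def sequence_probability(states, pi, a):
    """P(X | M) = pi[x_1] * product over t of a[x_(t-1)][x_t]."""
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= a[prev][cur]
    return prob

def duration_probability(a_ii, tau):
    """Probability of staying in a state for exactly tau steps, then leaving."""
    return a_ii ** (tau - 1) * (1.0 - a_ii)

p = sequence_probability([SUN, SUN, RAIN, CLOUD, CLOUD, SUN], PI, A)
```

With the assumed matrix the product is 1 x 0.8 x 0.1 x 0.3 x 0.6 x 0.2, and the duration probabilities for any self-transition value sum to one over all durations, as a proper distribution should.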

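Hidden Markov Models

Urns and balls example (Ferguson)

[Figure: urns and balls example, one output distribution $b_i$ per urn]

The probability of state $i$ producing an observation $o_t$ is

$$
b_i(o_t) = P(o_t \mid x_t = i),
$$

which can be discrete or continuous in $o$.

Elements of a discrete HMM, $\lambda$

1. Number of states $N$, $x \in \{1, \ldots, N\}$;
2. Number of events $K$, $k \in \{1, \ldots, K\}$;
3. Initial-state probabilities, $\pi = \{\pi_i\} = \{P(x_1 = i)\}$ for $1 \le i \le N$;
4. State-transition probabilities, $A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$ for $1 \le i, j \le N$;
5. Discrete output probabilities, $B = \{b_i(k)\} = \{P(o_t = k \mid x_t = i)\}$ for $1 \le i \le N$ and $1 \le k \le K$.

Hidden Markov model example

[Figure: left-to-right HMM with initial probability $\pi$, transition probabilities $a_{ij}$ and output probabilities $b_i(o_t)$ for observations $o_1, \ldots, o_6$]

With state sequence $X = \{x_1, \ldots, x_6\}$,

$$
\begin{aligned}
P(O \mid X, \lambda) &= b_{x_1}(o_1)\, b_{x_2}(o_2)\, b_{x_3}(o_3)\, b_{x_4}(o_4)\, b_{x_5}(o_5)\, b_{x_6}(o_6) \\
P(X \mid \lambda) &= \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3}\, a_{x_3 x_4}\, a_{x_4 x_5}\, a_{x_5 x_6} \\
P(O, X \mid \lambda) &= \pi_{x_1}\, b_{x_1}(o_1)\, a_{x_1 x_2}\, b_{x_2}(o_2)\, a_{x_2 x_3}\, b_{x_3}(o_3) \cdots
\end{aligned}
$$

Isolated digit recognition

- 10 templates: one template $M_i$ per digit.
- Compare the input $x$ with all templates.
- Select the most similar template $j$:

$$
j = \arg\min_i \{ f(x, M_i) \},
$$

where $f$ is the comparison function.

Hidden Markov Models (HMM)

An HMM is defined by:

- Number of states $N$
- Transition probabilities $a_{ij}$
- Output probabilities $b_i(x)$, modeled as a mixture of Gaussians:

$$
b_i(x) = \sum_{k} c_{ik}\, \mathcal{N}(x; \mu_{ik}, \sigma_{ik}), \qquad
\mathcal{N}(x; \mu, \sigma) = \frac{1}{(2\pi)^{p/2} \prod_{m} \sigma_m} \exp\left\{ -\frac{1}{2} \sum_{m} \frac{(x_m - \mu_m)^2}{\sigma_m^2} \right\},
$$

where $p$ is the dimension of the feature vector $x$.

[Figure: left-to-right HMM topology with self-loops and forward transitions]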
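The joint probability of an observation sequence and a state sequence can be sketched for a discrete HMM as below. The two-state model and its numbers are invented purely for illustration and are not taken from the slides; states and events are numbered from 0, and `joint_probability` is a hypothetical helper name.

```python
def joint_probability(obs, states, pi, a, b):
    """P(O, X | lambda) = pi[x_1] b[x_1][o_1] * prod_t a[x_(t-1)][x_t] b[x_t][o_t]."""
    prob = pi[states[0]] * b[states[0]][obs[0]]
    for t in range(1, len(obs)):
        prob *= a[states[t - 1]][states[t]] * b[states[t]][obs[t]]
    return prob

# Toy left-to-right model (assumed values): 2 states, 2 discrete events.
TOY_PI = [1.0, 0.0]
TOY_A = [[0.5, 0.5],
         [0.0, 1.0]]
TOY_B = [[0.9, 0.1],    # state 0 mostly emits event 0
         [0.2, 0.8]]    # state 1 mostly emits event 1

joint = joint_probability([0, 0, 1], [0, 0, 1], TOY_PI, TOY_A, TOY_B)
```

Here the product is (1.0 x 0.9) x (0.5 x 0.9) x (0.5 x 0.8). A state sequence that uses a forbidden transition, such as moving from state 1 back to state 0 in this toy model, gets probability zero.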

Probability of a path

[Figure: trellis of states against input frames, with a path annotated by its transition probabilities $a_{ij}$]

Probability of a model

The probability of a model is taken to be the probability of its best path.

- Problem: we need to evaluate all possible paths, and their number grows exponentially with the number of states and input frames.
- Solution: the Viterbi algorithm has a complexity that grows linearly with the number of states and input frames.

Large vocabulary

- Problem: a huge number of models (60,000 models). The models are hard to train, and new words cannot easily be added.
- Solution: use phoneme models.

Context Dependent Models

To improve accuracy, phone models are context-dependent.

Example: THIS = DH IH S becomes DH(SIL,IH) IH(DH,S) S(IH,SIL)

$50^3 = 125{,}000$ possible models are clustered to about 8,000 generalized triphones.

Continuous speech

Example: THIS IS A TEST

SIL DH(SIL,IH) IH(DH,S) S(IH,SIL) SIL IH(SIL,Z) Z(IH,SIL) SIL A(SIL,SIL) SIL T(SIL,EH) EH(T,S) S(EH,T) T(S,SIL) SIL

SIL DH(SIL,IH) IH(DH,S) S(IH,IH) IH(S,Z) Z(IH,A) A(Z,T) T(A,EH) EH(T,S) S(EH,T) T(S,SIL) SIL

SIL DH IH S SIL IH Z SIL A SIL T EH S T SIL

Baum-Welch Algorithm

$$
\begin{aligned}
\alpha_t(i) &= P(x_1, x_2, \ldots, x_t, s_t = i) \\
\alpha_t(j) &= \sum_{i} \alpha_{t-1}(i)\, a_{ij}\, b_j(x_t) \\
P(X) &= \sum_{i} \alpha_T(i)
\end{aligned}
$$

Exact solution.

[Figure: trellis step from time $t$ to $t+1$, with states $s_i$ feeding state $s_j$ through transitions $a_{ij}$]
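The forward recursion and its Viterbi counterpart (the same recursion with the sum replaced by a max) can be sketched for a discrete HMM as follows. The toy parameters are invented for illustration, and the brute-force enumeration over all paths is included only to show what the two dynamic-programming routines avoid computing.

```python
from itertools import product

def forward(obs, pi, a, b):
    """Total probability of the observations, summed over all state paths."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)

def viterbi(obs, pi, a, b):
    """Probability of the single best state path, and the path itself."""
    n = len(pi)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        # Best predecessor of each state j, judged with the previous delta.
        prev = [max(range(n), key=lambda i: delta[i] * a[i][j]) for j in range(n)]
        delta = [delta[prev[j]] * a[prev[j]][j] * b[j][o] for j in range(n)]
        backpointers.append(prev)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    path.reverse()
    return delta[last], path

def brute_force(obs, pi, a, b):
    """Enumerate every path: exponential cost, used here only as a check."""
    total, best = 0.0, 0.0
    for states in product(range(len(pi)), repeat=len(obs)):
        p = pi[states[0]] * b[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[states[t - 1]][states[t]] * b[states[t]][obs[t]]
        total += p
        best = max(best, p)
    return total, best

# Toy model with assumed values: 2 states, 2 discrete events.
PI2 = [0.6, 0.4]
A2 = [[0.7, 0.3], [0.4, 0.6]]
B2 = [[0.5, 0.5], [0.1, 0.9]]
OBS = [0, 1, 1, 0]
```

On this toy model `forward` matches the sum over all 16 paths and `viterbi` matches the largest single path probability, while each visits every frame only once.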

Viterbi algorithm: replace the sum with a max.

Speeding Search: Pruning

Speed-accuracy tradeoff: computations can be reduced by eliminating paths that are not promising, at the expense of a chance of eliminating the best path. Typically, computations can be reduced by more than a factor of 10 without affecting the error rate significantly.

HMM Training

A lot of speech is needed to train the models. Training is done with an iterative algorithm:

1. Take an initial model (e.g. uniform probabilities).
2. Segment the database (run the Viterbi algorithm).
3. Update the models:

$$
\hat{\mu}_j = \frac{1}{N} \sum_{i=0}^{N-1} x_{ij}, \qquad
\hat{\sigma}_j^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_{ij} - \hat{\mu}_j)^2
$$

4. If converged, stop; otherwise go to step 2.

Rule of thumb: each state has to appear in the training database a minimum number of times (on the order of tens).

Speaker-independent systems have lots of data (models well trained), but the data contain high variability. Speaker-dependent systems do not have as much data (models not so well trained), but they are more consistent.

Language Modeling

Bayes Rule in ASR

$$
P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}, \qquad
\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W)\, P(W),
$$

where $A$ is the acoustics and $W$ the sequence of words. $P(W)$ is the language model.
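The model update in the training loop above amounts to re-estimating a mean and a variance per feature dimension from the frames that the segmentation assigned to one state. A minimal sketch, assuming diagonal covariances and a single Gaussian per state; `update_gaussian` is a hypothetical name:

```python
def update_gaussian(frames):
    """Maximum-likelihood mean and variance per dimension.

    `frames` is a list of equal-length feature vectors that the Viterbi
    segmentation assigned to a single state.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[j] for frame in frames) / n for j in range(dim)]
    var = [sum((frame[j] - mean[j]) ** 2 for frame in frames) / n
           for j in range(dim)]
    return mean, var

mean, var = update_gaussian([[1.0, 2.0], [3.0, 6.0]])
```

With only a handful of frames per state these estimates are unreliable, which is one way to read the rule of thumb about how often each state must appear in the training data.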

Statistical Language Model

$$
P(W) = P(W_1 W_2 \ldots W_n) = P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_1 \ldots W_{n-1})
$$

Unigram: $P(W) \approx P(W_1)\, P(W_2) \cdots P(W_n)$

Bigram: $P(W) \approx P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_2) \cdots P(W_n \mid W_{n-1})$

Trigram: $P(W) \approx P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_{n-2} W_{n-1})$

Trigrams

P(THIS IS A TEST) = P(THIS) P(IS | THIS) P(A | THIS IS) P(TEST | IS A)

These statistical language models predict the probability of the current word given the past history. While simplistic, they contain a lot of information:

P(IS | THIS) >> P(IS | HAVE)

N-Gram Performance

[Figure: perplexity and error rate multiplier for trigram, bigram, unigram and uniform language models]

Perplexity

Perplexity measures the average branching of a text when presented to a language model:

$$
PP = \hat{P}(W_1, W_2, \ldots, W_n)^{-1/n}
$$

An empirical observation:

$$
E = k \sqrt{P},
$$

where $E$ is the error rate and $P$ the perplexity.

A language model with low perplexity is good for ASR. There is a tradeoff between perplexity and coverage.

Context Free Grammar

Advantages:

- Low error rate (because of low perplexity).
- Compact.
- Easy for application developers.

Disadvantages:

- Poor coverage (out-of-language sentences).

Applications
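The bigram approximation and the perplexity formula can be tried on a toy corpus. The sketch below uses unsmoothed maximum-likelihood estimates, so any unseen bigram gives zero probability and infinite perplexity; real systems smooth these estimates. The two-sentence corpus, the `<s>` start marker and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, with <s> marking the start of a sentence."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return history_counts, bigram_counts

def bigram_prob(word, prev, history_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen histories."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

def perplexity(sentence, history_counts, bigram_counts):
    """PP = P(W_1 ... W_n) ** (-1/n), computed in the log domain."""
    words = ["<s>"] + sentence.split()
    log_prob = 0.0
    for prev, word in zip(words, words[1:]):
        p = bigram_prob(word, prev, history_counts, bigram_counts)
        if p == 0.0:
            return math.inf
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(words) - 1))

HIST, BIGRAMS = train_bigram(["THIS IS A TEST", "THIS IS A PEN"])
pp = perplexity("THIS IS A TEST", HIST, BIGRAMS)
```

In this corpus every word is fully predictable except the last one, which is a choice between two words, so P(THIS IS A TEST) = 0.5 and the perplexity over four words is 0.5 to the power -1/4.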

ASR Problem Space

[Figure: ASR problem space. The error rate multiplier grows from more constrained tasks (command and control) towards less constrained ones: unconstrained task, noisy environment, large vocabulary, speaker variance]

ASR Accuracy in the Lab

- Command & Control (a few hundred words + CFG), speaker-independent.
- Discrete Dictation (60K words), speaker-dependent.
- Continuous Dictation (60K words): 7% word error rate (speaker-dependent).
- Telephone Digits: below 1% word error rate (speaker-independent).

Types of Errors

- Pronunciation errors make some words unrecognizable.
- The user says something outside the vocabulary or grammar.
- The user speaks while the machine is talking.
- Background noise fires the recognizer (false alarm).
- Rejection needs improvement.
- Spontaneous speech (uhms, ahms).

ASR in Real Applications

- New word addition.
- Rejection.
- Barge-in.
- Robustness to noise.
- ASR is just a part of an application; the user interface is critical.

Interface NLP/ASR

- Loose coupling: NLP selects the correct input from a list of top candidate sentences. Easy to implement, not optimal.
- Tight coupling: NLP provides the language probabilities for the search. Optimal but hard to implement.

Can an NLP system reduce perplexity?
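Loose coupling can be pictured as rescoring an N-best list: the recognizer proposes candidate sentences with acoustic scores and a separate language component picks among them. The sketch below is a hypothetical illustration of that idea, not the interface used in the lecture; the candidate list, the scores, the `lm_weight` parameter and the lookup-table language model are all invented.

```python
def rescore_nbest(candidates, lm_logprob, lm_weight=1.0):
    """Pick the candidate with the best combined log score.

    `candidates` is a list of (sentence, acoustic_log_prob) pairs and
    `lm_logprob` maps a sentence to its language-model log probability.
    """
    return max(candidates,
               key=lambda cand: cand[1] + lm_weight * lm_logprob(cand[0]))

# Invented scores: the acoustically best candidate is not the most plausible sentence.
NBEST = [("THIS IS A TEXT", -10.0), ("THIS IS A TEST", -10.5)]
TOY_LM = {"THIS IS A TEST": -2.0, "THIS IS A TEXT": -6.0}

best_sentence, _ = rescore_nbest(NBEST, lambda s: TOY_LM.get(s, -100.0))
```

Because the language component only sees the sentences that survived the search, a correct sentence that was pruned earlier cannot be recovered, which is one reason this scheme is easy to implement but not optimal.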

Note: Lectures put together from own material as well as material from tutorials of Alex Acero & Philip Jackson.

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

Hidden Markov Models. and. Sequential Data

Hidden Markov Models. and. Sequential Data Hidden Markov Models and Sequential Data Sequential Data Often arise through measurement of time series Snowfall measurements on successive days in Buffalo Rainfall measurements in Chirrapunji Daily values

More information

Lecture 10: Sequential Data Models

Lecture 10: Sequential Data Models CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Time-series: stock

More information

Lecture 12: An Overview of Speech Recognition

Lecture 12: An Overview of Speech Recognition Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

More information

Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System

Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Francis Kubala Ming-Whel Feng, John Makhoul, Richard Schwartz BBN Systems and Technologies Corporation 10 Moulton St.,

More information

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The

More information

Hidden Markov Models Fundamentals

Hidden Markov Models Fundamentals Hidden Markov Models Fundamentals Daniel Ramage CS229 Section Notes December, 2007 Abstract How can we apply machine learning to data that is represented as a sequence of observations over time? For instance,

More information

Ericsson T18s Voice Dialing Simulator

Ericsson T18s Voice Dialing Simulator Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of

More information

ISSN: [N. Dhanvijay* et al., 5 (12): December, 2016] Impact Factor: 4.116

ISSN: [N. Dhanvijay* et al., 5 (12): December, 2016] Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY HINDI SPEECH RECOGNITION SYSTEM USING MFCC AND HTK TOOLKIT Nikita Dhanvijay *, Prof. P. R. Badadapure Department of Electronics

More information

An Approach to Extract Feature using MFCC

An Approach to Extract Feature using MFCC IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 08 (August. 2014), V1 PP 21-25 www.iosrjen.org An Approach to Extract Feature using MFCC Parwinder Pal Singh,

More information

Hardware Implementation of Probabilistic State Machine for Word Recognition

Hardware Implementation of Probabilistic State Machine for Word Recognition IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2

More information

Speech Recognition Software and Vidispine

Speech Recognition Software and Vidispine Speech Recognition Software and Vidispine Tobias Nilsson April 2, 2013 Master s Thesis in Computing Science, 30 credits Supervisor at CS-UmU: Frank Drewes Examiner: Fredrik Georgsson Umeå University Department

More information

Signal Processing for Speech Recognition

Signal Processing for Speech Recognition Signal Processing for Speech Recognition Once a signal has been sampled, we have huge amounts of data, often 20,000 16 bit numbers a second! We need to find ways to concisely capture the properties of

More information

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection

More information

Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press

Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press -! Generative statistical models try to learn how to generate data from latent variables (for example class labels).

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS SYSTEM USING MLLR

SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS SYSTEM USING MLLR SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS SYSTEM USING MLLR Masatsune Tamura y, Takashi Masuko y, Keiichi Tokuda yy, and Takao Kobayashi y ytokyo Institute of Technology, Yokohama, 6-8 JAPAN yynagoya

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Phil Blunsom pcbl@cs.mu.oz.au August 9, 00 Abstract The Hidden Markov Model (HMM) is a popular statistical tool for modelling a wide range of time series data. In the context of natural

More information

Machine Learning I Week 14: Sequence Learning Introduction

Machine Learning I Week 14: Sequence Learning Introduction Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher

More information

Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4. by Dr Philip Jackson

Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4. by Dr Philip Jackson Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4 by Dr Philip Jackson Discrete & continuous HMMs - Gaussian & other distributions Gaussian mixtures -

More information

Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

More information

HMM Speech Recognition. Words: Pronunciations and Language Models. Pronunciation dictionary. Out-of-vocabulary (OOV) rate.

HMM Speech Recognition. Words: Pronunciations and Language Models. Pronunciation dictionary. Out-of-vocabulary (OOV) rate. HMM Speech Recognition ords: Pronunciations and Language Models Recorded Speech Decoded Text (Transcription) Steve Renals Acoustic Features Acoustic Model Automatic Speech Recognition ASR Lecture 9 January-March

More information

BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS. Keith Vertanen

BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS. Keith Vertanen BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS Keith Vertanen University of Cambridge, Cavendish Laboratory Madingley Road, Cambridge, CB3 0HE, UK kv7@cam.ac.uk

More information

Statistical Natural Language Processing

Statistical Natural Language Processing Statistical Natural Language Processing Prasad Tadepalli CS430 lecture Natural Language Processing Some subproblems are partially solved Spelling correction, grammar checking Information retrieval with

More information

Hidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model

Hidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model Hidden Markov Model Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Hidden Markov Model Hidden Markov models have close connection with mixture models. A mixture model

More information

Speech Signal Processing: An Overview

Speech Signal Processing: An Overview Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

More information

Digital image processing

Digital image processing Digital image processing The two-dimensional discrete Fourier transform and applications: image filtering in the frequency domain Introduction Frequency domain filtering modifies the brightness values

More information

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,

More information

Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

More information

Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries

Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries Phoneme-based Recognition of Finnish Words with Dynamic Dictionaries F. Seydou *, T. Seppänen *, J. Peltola, P. Väyrynen * * University of Oulu, Information Processing Laboratory, Oulu, Finland VTT Electronics,

More information

Speech recognition technology for mobile phones

Speech recognition technology for mobile phones Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such

More information

Artificial Intelligence for Speech Recognition

Artificial Intelligence for Speech Recognition Artificial Intelligence for Speech Recognition Kapil Kumar 1, Neeraj Dhoundiyal 2 and Ashish Kasyap 3 1,2,3 Department of Computer Science Engineering, IIMT College of Engineering, GreaterNoida, UttarPradesh,

More information

Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training

Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training INTERSPEECH-2004 1 Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training G.L. Sarada, N. Hemalatha, T. Nagarajan, Hema A. Murthy Department of Computer Science and Engg.,

More information

Hidden Markov Models for biological systems

Hidden Markov Models for biological systems Hidden Markov Models for biological systems 1 1 1 1 0 2 2 2 2 N N KN N b o 1 o 2 o 3 o T SS 2005 Heermann - Universität Heidelberg Seite 1 We would like to identify stretches of sequences that are actually

More information

Speech Signal Processing introduction

Speech Signal Processing introduction Speech Signal Processing introduction Jan Černocký, Valentina Hubeika {cernocky,ihubeika}@fit.vutbr.cz DCGM FIT BUT Brno FIT BUT Brno Speech Signal Processing introduction. Valentina Hubeika, DCGM FIT

More information

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction : A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms  Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

More information

Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

More information

Speech recognition for human computer interaction

Speech recognition for human computer interaction Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

More information

L6: Short-time Fourier analysis and synthesis

L6: Short-time Fourier analysis and synthesis L6: Short-time Fourier analysis and synthesis Overview Analysis: Fourier-transform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlap-add (OLA) method STFT magnitude

More information

SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS

SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS Tiago P. Nascimento, Diego Stefano PDEEC, FEUP and INESC-Porto Rua Dr. Roberto Frias, Porto, 4200-465 Portugal Faculdade de Tecnologia e Ciências Avenida

More information

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke 1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline

More information

An introduction to Hidden Markov Models

An introduction to Hidden Markov Models An introduction to Hidden Markov Models Christian Kohlschein Abstract Hidden Markov Models (HMM) are commonly defined as stochastic finite state machines. Formally a HMM can be described as a 5-tuple Ω

More information

Information Retrieval Statistics of Text

Information Retrieval Statistics of Text Information Retrieval Statistics of Text James Allan University of Massachusetts, Amherst (NTU ST770-A) Fall 2002 All slides copyright Bruce Croft and/or James Allan Outline Zipf distribution Vocabulary

More information

About Speaker Recognition Techology

About Speaker Recognition Techology About Speaker Recognition Techology Gerik Alexander von Graevenitz Bergdata Biometrics GmbH, Bonn, Germany gerik @ graevenitz.de Overview about Biometrics Biometrics refers to the automatic identification

More information

TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS

TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS Tobias Bocklet and Andreas Maier and Elmar Nöth University of Erlangen Nuremberg, Chair for Pattern Recognition, Martenstr.3, 91058 Erlangen,

More information

Chapter 3 Discrete-Time Fourier Series. by the French mathematician Jean Baptiste Joseph Fourier in the early 1800 s. The

Chapter 3 Discrete-Time Fourier Series. by the French mathematician Jean Baptiste Joseph Fourier in the early 1800 s. The Chapter 3 Discrete-Time Fourier Series 3.1 Introduction The Fourier series and Fourier transforms are mathematical correlations between the time and frequency domains. They are the result of the heat-transfer

More information

Phonetic Speaker Recognition with Support Vector Machines

Phonetic Speaker Recognition with Support Vector Machines Phonetic Speaker Recognition with Support Vector Machines W. M. Campbell, J. P. Campbell, D. A. Reynolds, D. A. Jones, and T. R. Leek MIT Lincoln Laboratory Lexington, MA 02420 wcampbell,jpc,dar,daj,tleek@ll.mit.edu

More information

GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL

GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL Md. Sadek Ali 1, Md. Shariful Islam 1 and Md. Alamgir Hossain 1 1 Dept. of Information & Communication Engineering Islamic University, Kushtia 7003, Bangladesh.

More information

Overview. Speech Signal Analysis. Speech signal analysis for ASR. Speech production model. Vocal Organs & Vocal Tract. Speech Signal Analysis for ASR

Overview. Speech Signal Analysis. Speech signal analysis for ASR. Speech production model. Vocal Organs & Vocal Tract. Speech Signal Analysis for ASR Overview Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 7/2 January 23 Speech Signal Analysis for ASR Reading: Features for ASR Spectral analysis

More information

Recent Progress in Robust Vocabulary-Independent Speech Recognition Hsiao-Wuen Hon and Kai.Fu Lee

Recent Progress in Robust Vocabulary-Independent Speech Recognition Hsiao-Wuen Hon and Kai.Fu Lee Recent Progress in Robust Vocabulary-Independent Speech Recognition Hsiao-Wuen Hon and Kai.Fu Lee School of Computer Science Carnegie Mellon University Pittsburgh, Pennsylvania 15213 Abstract This paper

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Martin Lauer AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg martin.lauer@kit.edu

More information

(LDA). I. INTRODUCTION

(LDA). I. INTRODUCTION A Study on Speech Recognition Mridula Shanbhogue 1, Shreya Kulkarni 2, R Suprith 3, Tejas K I 4, Nagarathna N 5 Dept. Of Computer Science & Engineering, B.M.S. College of Engineering, Bangalore, India

More information

Recurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015

Recurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015 Recurrent (Neural) Networks: Part 1 Tomas Mikolov, Facebook AI Research Deep Learning course @ NYU, 2015 Structure of the talk Introduction Architecture of simple recurrent nets Training: Backpropagation

More information

Speech Transcription

Speech Transcription TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

L9: Cepstral analysis

L9: Cepstral analysis L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

More information

Signal Modeling for High-Performance Robust Isolated Word Recognition

Signal Modeling for High-Performance Robust Isolated Word Recognition 1 Signal Modeling for High-Performance Robust Isolated Word Recognition Montri Karnjanadecha and Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk, VA

More information

Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot

Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot Moisa Claudia**, Silaghi Helga Maria**, Rohde L. Ulrich * ***, Silaghi Paul****, Silaghi Andrei****

More information

at Spoken Term Detection Task in NTCIR-10 SpokenDoc-2

at Spoken Term Detection Task in NTCIR-10 SpokenDoc-2 YLAB@RU at Spoken Term Detection Task in NTCIR-10 SpokenDoc-2 ABSTRACT Iori Sakamoto is019083@ed Masanori Morise morise@media The development of spoken term detection (STD) techniques, which detect a given

More information

School Class Monitoring System Based on Audio Signal Processing

School Class Monitoring System Based on Audio Signal Processing C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.

More information

Logic, Probability and Learning

Logic, Probability and Learning Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern

More information

Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues

Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

ACOUSTIC MODELS OF CV SYLLABLES FOR SPEECH RECOGNITION

S. Fernández 1, S. Feijóo 2, and N. Barros 2 1 IDSIA Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland e-mail:

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems

CONATION: English Command Input/Output System for Computers

Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya's University, Faridabad, India

Identification of Exploitation Conditions of the Automobile Tire while Car Driving by Means of Hidden Markov Models

Denis Tananaev, Galina Shagrova, Victor Kozhevnikov North-Caucasus Federal University,

Hidden Markov Models 1

Hidden Markov Models A Tutorial for the Course Computational Intelligence http://www.igi.tugraz.at/lehre/ci Barbara Resch (modified by Erhard and Car line Rank) Signal Processing

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,

Hidden Markov Models

Markov Chains Simplest possible information graph structure: just a linear chain X → Y → Z. Obeys Markov property: joint probability factors into conditional probabilities that only depend on the previous

Automatic New Word Acquisition: Spelling from Acoustics

Fil Alleva and Kai-Fu Lee School of Computer Science Carnegie Mellon University Pittsburgh, PA Abstract The problem of extending the lexicon of words

Hidden Markov Models. Terminology and Basic Algorithms

Motivation We make predictions based on models of observed data (machine learning). A simple model is that observations are assumed to be independent

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Lisbon. Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

princeton university F 02 cos 597D: a theorist's toolkit Lecture 7: Markov Chains and Random Walks

Lecturer: Sanjeev Arora Scribe: Elena Nabieva 1 Basics A Markov chain is a discrete-time stochastic process

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr. Edward Jones

Myles Friel 01510401 Abstract The Final Year Project is an important part of the final year of the Electronic

Examining the Influence of Speech Frame Size and Number of Cepstral Coefficients on the Speech Recognition Performance

Iosif Mporas, Todor Ganchev, Elias Kotinas, and Nikos Fakotakis Department of Electrical

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

Sung-won Park and Jose Trevino Texas A&M University-Kingsville, EE/CS Department, MSC 92, Kingsville, TX 78363 TEL (36) 593-2638, FAX

A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION

ZAKI B. NOSSAIR AND STEPHEN A. ZAHORIAN Department of Electrical and Computer Engineering Old Dominion University, Norfolk,

Speech Synthesis by Artificial Neural Networks (AI / Speech processing / Signal processing)

Christos P. Yiakoumettis Department of Informatics University of Sussex, UK (Email: c.yiakoumettis@sussex.ac.uk)

A VOICE INTERFACE FOR THE VISUALLY IMPAIRED

SETIT 2005, 3rd International Conference: Sciences of Electronic, Technologies of Information and Telecommunications, March 27-31, 2005, TUNISIA. Anand Arokia Raj,

Reinforcement Learning

LU 2 - Markov Decision Problems and Dynamic Programming Dr. Joschka Bödecker AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg jboedeck@informatik.uni-freiburg.de

PERSONAL COMPUTER SOFTWARE VOWEL TRAINING AID FOR THE HEARING IMPAIRED

A. Matthew Zimmer, Bingjun Dai, Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk,

Context Free Grammars

So far we have looked at models of language that capture only local phenomena, namely what we can analyze when looking at only a small window of words in a sentence. To move towards

Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition

Émilie CAILLAULT LASL, ULCO/University of Littoral (Calais) Christian VIARD-GAUDIN IRCCyN, University of

Developing an Isolated Word Recognition System in MATLAB

MATLAB Digest By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling

Markov Chains, part I

December 8, 2010 1 Introduction A Markov Chain is a sequence of random variables $X_0, X_1, \ldots$, where each $X_i \in S$, such that $P(X_{i+1} = s_{i+1} \mid X_i = s_i, X_{i-1} = s_{i-1}, \ldots, X_0 = s_0)$ = P(X

Lecture 3: Quantization Effects

Reading: 6.7-6.8. We have so far discussed the design of discrete-time filters, not digital filters. To understand the characteristics of digital filters, we need first

A Supervised Approach To Musical Chord Recognition

Pranav Rajpurkar Brad Girardeau Takatoki Migimatsu Stanford University, Stanford, CA 94305 USA pranavsr@stanford.edu bgirarde@stanford.edu takatoki@stanford.edu Abstract In this paper, we present a prototype

A Study on Speaker-Adaptive Speech Recognition

X.D. Huang School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ABSTRACT Speaker-independent system is desirable in many applications

Autoregressive HMMs for speech synthesis. Interspeech 2009

William Byrne. Outline: Introduction, Highlights, Background, Experimental set-up, Results. Highlights for speech synthesis: synthesis

Why 64-bit processing?

The ARTA64 is an experimental version of ARTA that uses a 64-bit floating point data format for Fast Fourier Transform (FFT) processing. The normal version of ARTA uses 32-bit floating

COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING

by A. Ganapathiraju, L. Webster, J. Trimble, K. Bush, P. Kornman Department of Electrical and Computer Engineering

Application of Vocoders to Wireless Communications

The need for increased utilization of available wireless

ANALYZER BASICS: WHAT IS AN FFT SPECTRUM ANALYZER?

The SR760 FFT Spectrum Analyzer takes a time varying input signal, like you would see on an oscilloscope trace, and computes its frequency spectrum. Fourier's