Speech Recognition Basics


 Alexander Walsh
Transcription
Speech Analysis and Interpretation Lab

Speech Recognition Basics

[Figure: block diagram with Signal Processing, Pattern Matching, Template, and Dictionary blocks]

Signal Acquisition

Speech is captured by a microphone. The analog signal is converted to a digital signal (A/D conversion) by sampling it 16,000 times a second. Each value is quantized to 16 bits, a number between -32768 and 32767.
Spectral Analysis

- Take a window of 20 to 30 ms (320 to 480 samples).
- An FFT (Fast Fourier Transform) is used to compute the energy in several frequency bands (typically […] to […]).
- The cepstrum is computed, keeping […] or so coefficients.
- A frame, a snapshot of the input's spectrum, is represented by a vector of cepstrum coefficients.
- To describe the spectral change with time, one frame is computed every 10 ms or so, i.e. 100 frames per second.
- A word that lasts 1 second is described by […] numbers (100 frames times the number of coefficients per frame).
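The framing arithmetic above can be sketched in a few lines. This is only an illustration: the function name `frame_signal`, the 25 ms window, and the 10 ms hop are assumptions chosen to fall inside the ranges the slide mentions, and the input is a dummy one-second signal at 16 kHz.

```python
def frame_signal(samples, sample_rate=16000, window_ms=25, hop_ms=10):
    """Split a sequence of samples into overlapping analysis frames.

    window_ms and hop_ms are illustrative defaults, not values from the slides.
    """
    window = sample_rate * window_ms // 1000  # samples per analysis window
    hop = sample_rate * hop_ms // 1000        # samples between frame starts
    frames = []
    start = 0
    # Keep only frames that fit entirely inside the signal.
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += hop
    return frames


one_second = [0] * 16000          # one second of silence at 16 kHz
frames = frame_signal(one_second)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

With a 10 ms hop the frame count comes out just under 100 per second, because the last few windows would run past the end of the signal.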
Probability
Markov Models

Weather predictor example of a Markov model

[Figure: three-state Markov chain with State 1: rain, State 2: cloud, State 3: sun]

State-transition probabilities: $A = \{a_{ij}\}$.

Weather predictor calculation

Given today is sunny (i.e., $x_1 = 3$), what is the probability of sun-sun-rain-cloud-cloud-sun with model $M$?

$$P(X \mid M) = P(X = \{3, 3, 1, 2, 2, 3\} \mid M)$$
$$= P(x_1 = 3)\, P(x_2 = 3 \mid x_1 = 3)\, P(x_3 = 1 \mid x_2 = 3)\, P(x_4 = 2 \mid x_3 = 1)\, P(x_5 = 2 \mid x_4 = 2)\, P(x_6 = 3 \mid x_5 = 2)$$
$$= \pi_3\, a_{33}\, a_{31}\, a_{12}\, a_{22}\, a_{23} = (0.8)(0.[…])(0.[…])(0.6)(0.[…])$$

where the initial state probability for state $i$ is $\pi_i = P(x_1 = i)$.

State duration probability

As a consequence of the first-order Markov model, the probability of occupying a state for a given duration, $\tau$, is exponential:

$$p(\tau \mid M, x = i) = (a_{ii})^{\tau - 1} (1 - a_{ii})$$

[Figure: probability versus duration]

Summary of Markov models

Transition probabilities: $A = \{a_{ij}\}$ and $\pi = \{\pi_i\}$.

Probability of a given state sequence $X$:

$$P(X \mid M) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3}\, a_{x_3 x_4} \cdots a_{x_{T-1} x_T} = \pi_{x_1} \prod_{t=2}^{T} a_{x_{t-1} x_t}$$
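A minimal sketch of the sequence-probability and state-duration formulas follows. The transition matrix `A` is an assumption: it uses illustrative values of the kind found in classic weather-model tutorials, not values recovered from these slides, and states are indexed from 0 rather than 1.

```python
# Hypothetical 3-state weather model: index 0 = rain, 1 = cloud, 2 = sun.
STATES = {"rain": 0, "cloud": 1, "sun": 2}

# Assumed transition probabilities a_ij (each row sums to 1); illustrative only.
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]


def sequence_probability(sequence, initial, transitions):
    """P(X | M) = pi_{x1} * product over t of a_{x(t-1) x(t)}."""
    idx = [STATES[name] for name in sequence]
    p = initial[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= transitions[prev][cur]
    return p


def duration_probability(a_ii, tau):
    """Probability of staying in a state for exactly tau steps."""
    return a_ii ** (tau - 1) * (1 - a_ii)


# "Given today is sunny": all initial probability mass on the sun state.
pi = [0.0, 0.0, 1.0]
p = sequence_probability(["sun", "sun", "rain", "cloud", "cloud", "sun"], pi, A)
print(p)
```

The duration probabilities form a geometric distribution, so summing `duration_probability(a_ii, tau)` over all positive `tau` gives 1.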
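Hidden Markov Models

Urns and balls example (Ferguson)

[Figure: urns with output distributions $b_1$, $b_2$, $b_3$]

Probability of state $i$ producing an observation $o_t$ is:

$$b_i(o_t) = P(o_t \mid x_t = i),$$

which can be discrete or continuous in $o$.

Elements of a discrete HMM, $\lambda$

1. Number of states $N$, $x \in \{1, \ldots, N\}$;
2. Number of events $K$, $k \in \{1, \ldots, K\}$;
3. Initial-state probabilities, $\pi = \{\pi_i\} = \{P(x_1 = i)\}$ for $1 \le i \le N$;
4. State-transition probabilities, $A = \{a_{ij}\} = \{P(x_t = j \mid x_{t-1} = i)\}$ for $1 \le i, j \le N$;
5. Discrete output probabilities, $B = \{b_i(k)\} = \{P(o_t = k \mid x_t = i)\}$ for $1 \le i \le N$ and $1 \le k \le K$.

Hidden Markov model example

[Figure: HMM with transitions $a$ generating the observations $o_1, \ldots, o_6$ through output probabilities $b(o_1), \ldots, b(o_6)$]

With state sequence $X = \{x_1, \ldots, x_6\}$,

$$P(O \mid X, \lambda) = b_{x_1}(o_1)\, b_{x_2}(o_2)\, b_{x_3}(o_3)\, b_{x_4}(o_4)\, b_{x_5}(o_5)\, b_{x_6}(o_6)$$
$$P(X \mid \lambda) = \pi_{x_1}\, a_{x_1 x_2}\, a_{x_2 x_3}\, a_{x_3 x_4}\, a_{x_4 x_5}\, a_{x_5 x_6}$$
$$P(O, X \mid \lambda) = \pi_{x_1}\, b_{x_1}(o_1)\, a_{x_1 x_2}\, b_{x_2}(o_2)\, a_{x_2 x_3}\, b_{x_3}(o_3) \cdots$$

Isolated digit recognition

10 templates: one template $M_i$ per digit. Compare input $x$ with all templates. Select the most similar template $j$:

$$j = \arg\min_i \{ f(x, M_i) \}$$

where $f$ is the comparison function.

Hidden Markov Models (HMM)

HMM defined by:
- Number of states
- Transition probabilities $a_{ij}$
- Output probabilities $b_i(x)$, a mixture of Gaussians with diagonal covariance over the $p$-dimensional feature vector:

$$b_i(x) = \sum_{k} c_{ik}\, N(x, \mu_{ik}, \sigma_{ik}), \qquad N(x, \mu_{ik}, \sigma_{ik}) = \frac{1}{(2\pi)^{p/2} \prod_{m} \sigma_{ikm}} \exp\left\{ -\frac{1}{2} \sum_{m} \frac{(x_m - \mu_{ikm})^2}{\sigma_{ikm}^2} \right\}$$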
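The joint probability of an observation sequence and a state sequence can be computed directly from the elements of a discrete HMM. The sketch below assumes a hypothetical two-state, two-symbol model; the numbers in `pi`, `A`, and `B` are made up for illustration, and indices start at 0.

```python
def joint_probability(observations, states, pi, A, B):
    """P(O, X | lambda) = pi_{x1} b_{x1}(o1) * prod_t a_{x(t-1) x(t)} b_{x(t)}(o_t)."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p


# Hypothetical left-to-right model with 2 states and 2 output symbols.
pi = [1.0, 0.0]                 # always start in state 0
A = [[0.5, 0.5],                # a_ij: state-transition probabilities
     [0.0, 1.0]]
B = [[0.9, 0.1],                # b_i(k): output probabilities per state
     [0.2, 0.8]]

observations = [0, 0, 1]
states = [0, 0, 1]
p = joint_probability(observations, states, pi, A, B)
print(p)
```

Summing this quantity over every possible state sequence gives the total probability of the observations, which is what the forward recursion computes efficiently.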
Probability of a path

[Figure: trellis of states versus input frames, with transition probabilities $a$ on the arcs]

Probability of a model

The probability of a model is taken to be the probability of its best path.
Problem: We need to evaluate all possible paths, and these grow exponentially with the number of states and input frames.
Solution: The Viterbi algorithm, whose complexity grows linearly with the number of input frames and, for a fixed number of transitions per state, linearly with the number of states.

Large vocabulary

Problem: Huge number of models (60,000 models). Models are hard to train. Cannot easily add new words.
Solution: Use phoneme models.

Context-Dependent Models

To improve accuracy, phone models are context-dependent.
Example: THIS = DH IH S becomes DH(SIL,IH) IH(DH,S) S(IH,SIL)
$50^3 = 125{,}000$ possible models are clustered to about 8,000 generalized triphones.

Continuous speech

Example: THIS IS A TEST
SIL DH(SIL,IH) IH(DH,S) S(IH,SIL) SIL IH(SIL,Z) Z(IH,SIL) SIL A(SIL,SIL) SIL T(SIL,EH) EH(T,S) S(EH,T) T(S,SIL) SIL
SIL DH(SIL,IH) IH(DH,S) S(IH,IH) IH(S,Z) Z(IH,A) A(Z,T) T(A,EH) EH(T,S) S(EH,T) T(S,SIL) SIL
SIL DH IH S SIL IH Z SIL A SIL T EH S T SIL

Baum-Welch Algorithm

Forward probabilities:

$$\alpha_t(i) = P(x_1, x_2, \ldots, x_t, s_t = i)$$
$$\alpha_t(j) = \left[ \sum_i \alpha_{t-1}(i)\, a_{ij} \right] b_j(x_t)$$
$$P(X) = \sum_i \alpha_T(i)$$

Exact solution.

[Figure: one trellis step from time $t$ to $t+1$, with states $s_1, s_2, s_3$ feeding state $s_j$ through $a_{1j}, a_{2j}, a_{3j}$]
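The forward recursion and the Viterbi search differ only in whether incoming paths are summed or maximized. The following sketch shows both for a discrete HMM; the function names and the list-of-lists model layout are assumptions for illustration, and real recognizers work with log probabilities to avoid underflow.

```python
def forward(obs, pi, A, B):
    """Total probability P(O | lambda): sum over all state paths."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs, pi, A, B):
    """Probability and state sequence of the single best path (sum replaced by max)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # For each target state j, remember the best predecessor state.
        best = [max(range(n), key=lambda i, j=j: prev[i] * A[i][j])
                for j in range(n)]
        delta = [prev[best[j]] * A[best[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(best)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for best in reversed(backpointers):
        path.append(best[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical 2-state, 2-symbol model (values chosen only for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

total = forward(obs, pi, A, B)
best_prob, best_path = viterbi(obs, pi, A, B)
print(total, best_prob, best_path)
```

Both routines visit each (frame, state, predecessor) combination once, which is why their cost grows linearly with the number of frames instead of exponentially.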
Viterbi algorithm: replace sum with max.

Speeding Search: Pruning

Speed-accuracy tradeoff: Computations can be reduced by eliminating paths that are not promising, at the expense of having a chance of eliminating the best path. Typically, computations can be reduced by more than an order of magnitude without affecting the error rate significantly.

HMM Training

A lot of speech is needed to train the models. Training is done with an iterative algorithm:

1. Take an initial model (e.g. uniform probabilities).
2. Segment the database (run the Viterbi algorithm).
3. Update the models from the $N$ frames $x_{ij}$ assigned to each state $j$:

$$\hat{\mu}_j = \frac{1}{N} \sum_{i} x_{ij}, \qquad \hat{\sigma}_j^2 = \frac{1}{N} \sum_{i} (x_{ij} - \hat{\mu}_j)^2$$

4. If converged, stop; otherwise go to step 2.

HMM Training

Rule of thumb: Each state has to appear in the training database at least […] times.
Speaker-independent systems have lots of data (models well trained), but contain high variability.
Speaker-dependent systems do not have as much data (models not so well trained), but they are more consistent.

Language Modeling

Bayes Rule in ASR

$$P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}$$
$$\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W)\, P(W)$$

where $A$ is the acoustics and $W$ the sequence of words. $P(W)$ is the language model.
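The model-update step of the training loop can be sketched as follows, assuming the segmentation step has already produced a state label for every frame. For brevity the features are one-dimensional and the function name `update_gaussians` is hypothetical; a real system would update one mean and variance per feature dimension and per mixture component.

```python
def update_gaussians(frames, alignment, num_states):
    """Re-estimate per-state mean and variance from a frame-to-state alignment.

    frames: list of scalar feature values (1-D for illustration only)
    alignment: state index assigned to each frame by the segmentation step
    """
    counts = [0] * num_states
    sums = [0.0] * num_states
    for x, j in zip(frames, alignment):
        counts[j] += 1
        sums[j] += x
    # States that received no frames keep a placeholder mean of 0.0.
    means = [sums[j] / counts[j] if counts[j] else 0.0 for j in range(num_states)]

    sq_sums = [0.0] * num_states
    for x, j in zip(frames, alignment):
        sq_sums[j] += (x - means[j]) ** 2
    variances = [sq_sums[j] / counts[j] if counts[j] else 0.0
                 for j in range(num_states)]
    return means, variances


frames = [1.0, 3.0, 10.0, 14.0]
alignment = [0, 0, 1, 1]          # hypothetical Viterbi segmentation
means, variances = update_gaussians(frames, alignment, num_states=2)
print(means, variances)
```

States that receive very few frames get unreliable estimates, which is the motivation behind the rule of thumb about how often each state must appear in the training data.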
Statistical Language Model

$$P(W) = P(W_1 W_2 \ldots W_n) = P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_1 W_2 \ldots W_{n-1})$$

Unigram: $P(W) = P(W_1)\, P(W_2)\, P(W_3) \cdots P(W_n)$
Bigram: $P(W) = P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_2) \cdots P(W_n \mid W_{n-1})$
Trigram: $P(W) = P(W_1)\, P(W_2 \mid W_1)\, P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_{n-2} W_{n-1})$

Trigrams

P(THIS IS A TEST) = P(THIS) P(IS | THIS) P(A | THIS IS) P(TEST | IS A)

These statistical language models predict the probability of the current word given the past history. While simplistic, they contain a lot of information: P(IS | THIS) >> P(IS | HAVE).

Gram Performance

[Table: perplexity and error-rate multiplier for trigram, bigram, unigram, and uniform language models]

Perplexity

Perplexity measures the average branching of a text when presented to a language model:

$$PP = \hat{P}(W_1, W_2, \ldots, W_n)^{-1/n}$$

An empirical observation relates the error rate $E$ to the perplexity $P$ through a constant $k$.
A language model with low perplexity is good for ASR.
There is a tradeoff between perplexity and coverage.

Context Free Grammar Applications

Advantages: Low error rate (because of low perplexity). Compact. Easy for application developers.
Disadvantages: Poor coverage (out-of-language sentences).
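To make the bigram factorization and the perplexity formula concrete, here is a small sketch. The add-one smoothing, the `<s>` start token, and all function names are assumptions introduced for the example; the slides do not say how the probabilities are estimated.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, with a start-of-sentence token."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return history_counts, bigram_counts


def bigram_prob(prev, word, history_counts, bigram_counts, vocab_size):
    """P(word | prev) with add-one smoothing (an assumption, for illustration)."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + vocab_size)


def perplexity(sentence, history_counts, bigram_counts, vocab_size):
    """PP = P(W_1 ... W_n) ** (-1/n), computed in the log domain."""
    tokens = ["<s>"] + sentence.split()
    log_p = sum(
        math.log(bigram_prob(p, w, history_counts, bigram_counts, vocab_size))
        for p, w in zip(tokens, tokens[1:])
    )
    n = len(tokens) - 1
    return math.exp(-log_p / n)


vocab_size = 4  # THIS, IS, A, TEST
hist, bi = train_bigram(["THIS IS A TEST"] * 3)
print(perplexity("THIS IS A TEST", hist, bi, vocab_size))
print(perplexity("TEST A IS THIS", hist, bi, vocab_size))
```

With no training data every word gets probability 1/V, so the perplexity equals the vocabulary size V, which matches the reading of perplexity as an average branching factor.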
ASR Problem Space

[Figure: ASR problem space, ranging from more constraint to less constraint (unconstrained task, noisy environment, large vocabulary, speaker variance), with error-rate multipliers]

ASR Accuracy in the Lab

- Command & Control ([…]00 words + CFG): […]% word error rate (speaker-independent)
- Discrete Dictation (60K words): […]% word error rate (speaker-dependent)
- Continuous Dictation (60K words): 7% word error rate (speaker-dependent)
- Telephone Digits: 0.[…]% word error rate (speaker-independent)

C & C Types of Errors

- Pronunciation errors make some words unrecognizable.
- User says something outside the vocabulary/grammar.
- User speaks while the machine is talking.
- Background noise fires the recognizer (false alarm). Rejection needs improvement.
- Spontaneous speech (uhms, ahms).

ASR in Real Applications

- New word addition.
- Rejection.
- Barge-in.
- Robustness to noise.
- ASR is just a part of an application; the user interface is critical.

Interface NLP/ASR

SAIL Lab:
- Loose coupling: NLP selects the correct input from a list of top candidate sentences. Easy to implement, not optimal.
- Tight coupling: NLP provides the language probabilities for the search. Optimal but hard to implement.

Can an NLP system reduce perplexity?
Note: Lectures put together from own material as well as material from tutorials of Alex Acero & Philip Jackson.
Speech Recognition on Cell Broadband Engine UCRLPRES223890
Speech Recognition on Cell Broadband Engine UCRLPRES223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda
More informationHidden Markov Models. and. Sequential Data
Hidden Markov Models and Sequential Data Sequential Data Often arise through measurement of time series Snowfall measurements on successive days in Buffalo Rainfall measurements in Chirrapunji Daily values
More informationLecture 10: Sequential Data Models
CSC2515 Fall 2007 Introduction to Machine Learning Lecture 10: Sequential Data Models 1 Example: sequential data Until now, considered data to be i.i.d. Turn attention to sequential data Timeseries: stock
More informationLecture 12: An Overview of Speech Recognition
Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated
More informationTurkish Radiology Dictation System
Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr
More informationGrammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University
Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun > roller Verb thrills VP Verb NP S NP VP NP S VP
More informationSpeaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System
Speaker Adaptation from Limited Training in the BBN BYBLOS Speech Recognition System Francis Kubala MingWhel Feng, John Makhoul, Richard Schwartz BBN Systems and Technologies Corporation 10 Moulton St.,
More informationComp 14112 Fundamentals of Artificial Intelligence Lecture notes, 201516 Speech recognition
Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 201516 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The
More informationHidden Markov Models Fundamentals
Hidden Markov Models Fundamentals Daniel Ramage CS229 Section Notes December, 2007 Abstract How can we apply machine learning to data that is represented as a sequence of observations over time? For instance,
More informationEricsson T18s Voice Dialing Simulator
Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of
More informationISSN: [N. Dhanvijay* et al., 5 (12): December, 2016] Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY HINDI SPEECH RECOGNITION SYSTEM USING MFCC AND HTK TOOLKIT Nikita Dhanvijay *, Prof. P. R. Badadapure Department of Electronics
More informationAn Approach to Extract Feature using MFCC
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 22503021, ISSN (p): 22788719 Vol. 04, Issue 08 (August. 2014), V1 PP 2125 www.iosrjen.org An Approach to Extract Feature using MFCC Parwinder Pal Singh,
More informationHardware Implementation of Probabilistic State Machine for Word Recognition
IJECT Vo l. 4, Is s u e Sp l  5, Ju l y  Se p t 2013 ISSN : 22307109 (Online) ISSN : 22309543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2
More informationSpeech Recognition Software and Vidispine
Speech Recognition Software and Vidispine Tobias Nilsson April 2, 2013 Master s Thesis in Computing Science, 30 credits Supervisor at CSUmU: Frank Drewes Examiner: Fredrik Georgsson Umeå University Department
More informationSignal Processing for Speech Recognition
Signal Processing for Speech Recognition Once a signal has been sampled, we have huge amounts of data, often 20,000 16 bit numbers a second! We need to find ways to concisely capture the properties of
More informationAUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality texttospeech synthesis can be achieved by unit selection
More informationLiterature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press
Literature: (6) Pattern Recognition and Machine Learning Christopher M. Bishop MIT Press ! Generative statistical models try to learn how to generate data from latent variables (for example class labels).
More informationSpot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and
More informationSPEAKER ADAPTATION FOR HMMBASED SPEECH SYNTHESIS SYSTEM USING MLLR
SPEAKER ADAPTATION FOR HMMBASED SPEECH SYNTHESIS SYSTEM USING MLLR Masatsune Tamura y, Takashi Masuko y, Keiichi Tokuda yy, and Takao Kobayashi y ytokyo Institute of Technology, Yokohama, 68 JAPAN yynagoya
More informationLanguage Modeling. Chapter 1. 1.1 Introduction
Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set
More informationHidden Markov Models
Hidden Markov Models Phil Blunsom pcbl@cs.mu.oz.au August 9, 00 Abstract The Hidden Markov Model (HMM) is a popular statistical tool for modelling a wide range of time series data. In the context of natural
More informationMachine Learning I Week 14: Sequence Learning Introduction
Machine Learning I Week 14: Sequence Learning Introduction Alex Graves Technische Universität München 29. January 2009 Literature Pattern Recognition and Machine Learning Chapter 13: Sequential Data Christopher
More informationCentre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4. by Dr Philip Jackson
Centre for Vision Speech & Signal Processing University of Surrey, Guildford GU2 7XH. HMM tutorial 4 by Dr Philip Jackson Discrete & continuous HMMs  Gaussian & other distributions Gaussian mixtures 
More informationInformation Leakage in Encrypted Network Traffic
Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)
More informationHMM Speech Recognition. Words: Pronunciations and Language Models. Pronunciation dictionary. Outofvocabulary (OOV) rate.
HMM Speech Recognition ords: Pronunciations and Language Models Recorded Speech Decoded Text (Transcription) Steve Renals Acoustic Features Acoustic Model Automatic Speech Recognition ASR Lecture 9 JanuaryMarch
More informationBASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS. Keith Vertanen
BASELINE WSJ ACOUSTIC MODELS FOR HTK AND SPHINX: TRAINING RECIPES AND RECOGNITION EXPERIMENTS Keith Vertanen University of Cambridge, Cavendish Laboratory Madingley Road, Cambridge, CB3 0HE, UK kv7@cam.ac.uk
More informationStatistical Natural Language Processing
Statistical Natural Language Processing Prasad Tadepalli CS430 lecture Natural Language Processing Some subproblems are partially solved Spelling correction, grammar checking Information retrieval with
More informationHidden Markov Model. Jia Li. Department of Statistics The Pennsylvania State University. Hidden Markov Model
Hidden Markov Model Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Hidden Markov Model Hidden Markov models have close connection with mixture models. A mixture model
More informationSpeech Signal Processing: An Overview
Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech
More informationDigital image processing
Digital image processing The twodimensional discrete Fourier transform and applications: image filtering in the frequency domain Introduction Frequency domain filtering modifies the brightness values
More informationMembering T M : A Conference Call Service with SpeakerIndependent Name Dialing on AIN
PAGE 30 Membering T M : A Conference Call Service with SpeakerIndependent Name Dialing on AIN SungJoon Park, KyungAe Jang, JaeIn Kim, MyoungWan Koo, ChuShik Jhon Service Development Laboratory, KT,
More informationTagging with Hidden Markov Models
Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Partofspeech (POS) tagging is perhaps the earliest, and most famous,
More informationPhonemebased Recognition of Finnish Words with Dynamic Dictionaries
Phonemebased Recognition of Finnish Words with Dynamic Dictionaries F. Seydou *, T. Seppänen *, J. Peltola, P. Väyrynen * * University of Oulu, Information Processing Laboratory, Oulu, Finland VTT Electronics,
More informationSpeech recognition technology for mobile phones
Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such
More informationArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech Recognition Kapil Kumar 1, Neeraj Dhoundiyal 2 and Ashish Kasyap 3 1,2,3 Department of Computer Science Engineering, IIMT College of Engineering, GreaterNoida, UttarPradesh,
More informationAutomatic Transcription of Continuous Speech using Unsupervised and Incremental Training
INTERSPEECH2004 1 Automatic Transcription of Continuous Speech using Unsupervised and Incremental Training G.L. Sarada, N. Hemalatha, T. Nagarajan, Hema A. Murthy Department of Computer Science and Engg.,
More informationHidden Markov Models for biological systems
Hidden Markov Models for biological systems 1 1 1 1 0 2 2 2 2 N N KN N b o 1 o 2 o 3 o T SS 2005 Heermann  Universität Heidelberg Seite 1 We would like to identify stretches of sequences that are actually
More informationSpeech Signal Processing introduction
Speech Signal Processing introduction Jan Černocký, Valentina Hubeika {cernocky,ihubeika}@fit.vutbr.cz DCGM FIT BUT Brno FIT BUT Brno Speech Signal Processing introduction. Valentina Hubeika, DCGM FIT
More informationSpeech: A Challenge to Digital Signal Processing Technology for HumantoComputer Interaction
: A Challenge to Digital Signal Processing Technology for HumantoComputer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline CGislands The Fair Bet Casino Hidden Markov Model Decoding Algorithm ForwardBackward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training BaumWelch algorithm
More informationAdvanced Signal Processing and Digital Noise Reduction
Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New
More informationMarkov chains and Markov Random Fields (MRFs)
Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume
More informationSpeech recognition for human computer interaction
Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices
More informationL6: Shorttime Fourier analysis and synthesis
L6: Shorttime Fourier analysis and synthesis Overview Analysis: Fouriertransform view Analysis: filtering view Synthesis: filter bank summation (FBS) method Synthesis: overlapadd (OLA) method STFT magnitude
More informationSPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS
SPEECH RECOGNITION USING ARTIFICIAL NEURAL NETWORKS Tiago P. Nascimento, Diego Stefano PDEEC, FEUP and INESCPorto Rua Dr. Roberto Frias, Porto, 4200465 Portugal Faculdade de Tecnologia e Ciências Avenida
More informationOffline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke
1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline
More informationAn introduction to Hidden Markov Models
An introduction to Hidden Markov Models Christian Kohlschein Abstract Hidden Markov Models (HMM) are commonly defined as stochastic finite state machines. Formally a HMM can be described as a 5tuple Ω
More informationInformation Retrieval Statistics of Text
Information Retrieval Statistics of Text James Allan University of Massachusetts, Amherst (NTU ST770A) Fall 2002 All slides copyright Bruce Croft and/or James Allan Outline Zipf distribution Vocabulary
More informationAbout Speaker Recognition Techology
About Speaker Recognition Techology Gerik Alexander von Graevenitz Bergdata Biometrics GmbH, Bonn, Germany gerik @ graevenitz.de Overview about Biometrics Biometrics refers to the automatic identification
More informationTEXTINDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS
TEXTINDEPENDENT SPEAKER IDENTIFICATION USING TEMPORAL PATTERNS Tobias Bocklet and Andreas Maier and Elmar Nöth University of Erlangen Nuremberg, Chair for Pattern Recognition, Martenstr.3, 91058 Erlangen,
More informationChapter 3 DiscreteTime Fourier Series. by the French mathematician Jean Baptiste Joseph Fourier in the early 1800 s. The
Chapter 3 DiscreteTime Fourier Series 3.1 Introduction The Fourier series and Fourier transforms are mathematical correlations between the time and frequency domains. They are the result of the heattransfer
More informationPhonetic Speaker Recognition with Support Vector Machines
Phonetic Speaker Recognition with Support Vector Machines W. M. Campbell, J. P. Campbell, D. A. Reynolds, D. A. Jones, and T. R. Leek MIT Lincoln Laboratory Lexington, MA 02420 wcampbell,jpc,dar,daj,tleek@ll.mit.edu
More informationGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL Md. Sadek Ali 1, Md. Shariful Islam 1 and Md. Alamgir Hossain 1 1 Dept. of Information & Communication Engineering Islamic University, Kushtia 7003, Bangladesh.
More informationOverview. Speech Signal Analysis. Speech signal analysis for ASR. Speech production model. Vocal Organs & Vocal Tract. Speech Signal Analysis for ASR
Overview Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 7/2 January 23 Speech Signal Analysis for ASR Reading: Features for ASR Spectral analysis
More informationRecent Progress in Robust VocabularyIndependent Speech Recognition HsiaoWuen Hon and Kai.Fu Lee
Recent Progress in Robust VocabularyIndependent Speech Recognition HsiaoWuen Hon and Kai.Fu Lee School of Computer Science Carnegie Mellon University Pittsburgh, Pennsylvania 15213 Abstract This paper
More informationReinforcement Learning
Reinforcement Learning LU 2  Markov Decision Problems and Dynamic Programming Dr. Martin Lauer AG Maschinelles Lernen und Natürlichsprachliche Systeme AlbertLudwigsUniversität Freiburg martin.lauer@kit.edu
More information(LDA). I. INTRODUCTION
A Study on Speech Recognition Mridula Shanbhogue 1, Shreya Kulkarni 2, R Suprith 3, Tejas K I 4, Nagarathna N 5 Dept. Of Computer Science & Engineering, B.M.S. College of Engineering, Bangalore, India
More informationRecurrent (Neural) Networks: Part 1. Tomas Mikolov, Facebook AI Research Deep Learning NYU, 2015
Recurrent (Neural) Networks: Part 1 Tomas Mikolov, Facebook AI Research Deep Learning course @ NYU, 2015 Structure of the talk Introduction Architecture of simple recurrent nets Training: Backpropagation
More informationSpeech Transcription
TCSTAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription JeanLuc Gauvain LIMSI TCSTAR Final Review Luxembourg, 2931 May 2007 1 What Is Speech Recognition? Def: Automatic conversion
More informationRobust Methods for Automatic Transcription and Alignment of Speech Signals
Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background
More informationL9: Cepstral analysis
L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /
More informationSignal Modeling for HighPerformance Robust Isolated Word Recognition
1 Signal Modeling for HighPerformance Robust Isolated Word Recognition Montri Karnjanadecha and Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk, VA
More informationVoice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot
Voice Signal s Noise Reduction Using Adaptive/Reconfigurable Filters for the Command of an Industrial Robot Moisa Claudia**, Silaghi Helga Maria**, Rohde L. Ulrich * ***, Silaghi Paul****, Silaghi Andrei****
More informationat Spoken Term Detection Task in NTCIR10 SpokenDoc2
YLAB@RU at Spoken Term Detection Task in NTCIR10 SpokenDoc2 ABSTRACT Iori Sakamoto is019083@ed Masanori Morise morise@media The development of spoken term detection (STD) techniques, which detect a given
More informationSchool Class Monitoring System Based on Audio Signal Processing
C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.
More informationLogic, Probability and Learning
Logic, Probability and Learning Luc De Raedt luc.deraedt@cs.kuleuven.be Overview Logic Learning Probabilistic Learning Probabilistic Logic Learning Closely following : Russell and Norvig, AI: a modern
More informationPhrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues
Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries
More informationBLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be
More information1 IDSIA Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland
ACOUSTIC MODELS OF CV SYLLABLES FOR SPEECH RECOGNITION S. Fernández 1, S. Feijóo 2, and N. Barros 2 1 IDSIA Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno, Switzerland email:
More informationSecureAccess System via Fixed and Mobile Telephone Networks using Voice Biometrics
SecureAccess System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems
More informationCONATION: English Command Input/Output System for Computers
CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India
More informationIdentification of Exploitation Conditions of the Automobile Tire while Car Driving by Means of Hidden Markov Models
Identification of Exploitation Conditions of the Automobile Tire while Car Driving by Means of Hidden Markov Models Denis Tananaev, Galina Shagrova, Victor Kozhevnikov NorthCaucasus Federal University,
More informationHidden Markov Models 1
Hidden Markov Models 1 Hidden Markov Models A Tutorial for the Course Computational Intelligence http://www.igi.tugraz.at/lehre/ci Barbara Resch (modified by Erhard and Car line Rank) Signal Processing
More informationVOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications
VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,
More informationHidden Markov Models. Hidden Markov Models
Markov Chains Simplest possible information graph structure: just a linear chain X Y Z. Obeys Markov property: joint probability factors into conditional probabilities that only depend on the previous
More informationAutomatic New Word Acquisition: Spelling from Acoustics
Automatic New Word Acquisition: Spelling from Acoustics Fil Alleva and KaiFu Lee School of Computer Science Carnegie Mellon University Pittsburgh, PA Abstract The problem of extending the lexicon of words
More informationHidden Markov Models. Terminology and Basic Algorithms
Hidden Markov Models Terminology and Basic Algorithms Motivation We make predictions based on models of observed data (machine learning). A simple model is that observations are assumed to be independent
More informationInvestigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition
, Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology
More informationprinceton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks
princeton university F 02 cos 597D: a theorist s toolkit Lecture 7: Markov Chains and Random Walks Lecturer: Sanjeev Arora Scribe:Elena Nabieva 1 Basics A Markov chain is a discretetime stochastic process
More informationFinal Year Project Progress Report. FrequencyDomain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones
Final Year Project Progress Report FrequencyDomain Adaptive Filtering Myles Friel 01510401 Supervisor: Dr.Edward Jones Abstract The Final Year Project is an important part of the final year of the Electronic
More informationExamining the Influence of Speech Frame Size and Number of Cepstral Coefficients on the Speech Recognition Performance
Examining the Influence of Speech Frame Size and Number of Cepstral Coefficients on the Speech Recognition Performance Iosif Mporas, Todor Ganchev, Elias Kotinas, and Nikos Fakotakis Department of Electrical
More informationAutomatic Detection of Emergency Vehicles for Hearing Impaired Drivers
Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers Sungwon ark and Jose Trevino Texas A&M UniversityKingsville, EE/CS Department, MSC 92, Kingsville, TX 78363 TEL (36) 5932638, FAX
A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION
A NEURAL NETWORK CLUSTERING TECHNIQUE FOR TEXT-INDEPENDENT SPEAKER IDENTIFICATION ZAKI B. NOSSAIR AND STEPHEN A. ZAHORIAN Department of Electrical and Computer Engineering Old Dominion University, Norfolk,
Speech Synthesis by Artificial Neural Networks (AI / Speech processing / Signal processing)
Speech Synthesis by Artificial Neural Networks (AI / Speech processing / Signal processing) Christos P. Yiakoumettis Department of Informatics University of Sussex, UK (Email: c.yiakoumettis@sussex.ac.uk)
A VOICE INTERFACE FOR THE VISUALLY IMPAIRED
SETIT 2005 3rd International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 27-31, 2005 TUNISIA A VOICE INTERFACE FOR THE VISUALLY IMPAIRED Anand Arokia Raj,
Reinforcement Learning
Reinforcement Learning LU 2 - Markov Decision Problems and Dynamic Programming Dr. Joschka Bödecker AG Maschinelles Lernen und Natürlichsprachliche Systeme Albert-Ludwigs-Universität Freiburg jboedeck@informatik.uni-freiburg.de
PERSONAL COMPUTER SOFTWARE VOWEL TRAINING AID FOR THE HEARING IMPAIRED
PERSONAL COMPUTER SOFTWARE VOWEL TRAINING AID FOR THE HEARING IMPAIRED A. Matthew Zimmer, Bingjun Dai, Stephen A. Zahorian Department of Electrical and Computer Engineering Old Dominion University Norfolk,
Context Free Grammars
Context Free Grammars So far we have looked at models of language that capture only local phenomena, namely what we can analyze when looking at only a small window of words in a sentence. To move towards
Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition
Using Segmentation Constraints in an Implicit Segmentation Scheme for Online Word Recognition Émilie CAILLAULT LASL, ULCO/University of Littoral (Calais) Christian VIARD-GAUDIN IRCCyN, University of
Developing an Isolated Word Recognition System in MATLAB
MATLAB Digest Developing an Isolated Word Recognition System in MATLAB By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling
Markov Chains, part I
Markov Chains, part I December 8, 2010 1 Introduction A Markov Chain is a sequence of random variables $X_0, X_1, \ldots$, where each $X_i \in S$, such that $P(X_{i+1} = s_{i+1} \mid X_i = s_i, X_{i-1} = s_{i-1}, \ldots, X_0 = s_0) = P(X$
Lecture 3: Quantization Effects
Lecture 3: Quantization Effects Reading: 6.7-6.8. We have so far discussed the design of discrete-time filters, not digital filters. To understand the characteristics of digital filters, we need first
A Supervised Approach To Musical Chord Recognition
Pranav Rajpurkar Brad Girardeau Takatoki Migimatsu Stanford University, Stanford, CA 94305 USA pranavsr@stanford.edu bgirarde@stanford.edu takatoki@stanford.edu Abstract In this paper, we present a prototype
A Study on Speaker-Adaptive Speech Recognition
A Study on Speaker-Adaptive Speech Recognition X.D. Huang School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 ABSTRACT Speaker-independent system is desirable in many applications
Autoregressive HMMs for speech synthesis. Interspeech 2009
William Byrne Interspeech 2009 Outline Introduction Highlights Background 1 Introduction Highlights Background 2 3 Experimental setup Results 4 Highlights Background Highlights for speech synthesis: synthesis
Why 64-bit processing?
Why 64-bit processing? The ARTA64 is an experimental version of ARTA that uses a 64-bit floating point data format for Fast Fourier Transform processing (FFT). Normal version of ARTA uses 32-bit floating
COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING
DEPARTMENT OF COMPARISON OF ENERGY-BASED ENDPOINT DETECTORS FOR SPEECH SIGNAL PROCESSING by A. Ganapathiraju, L. Webster, J. Trimble, K. Bush, P. Kornman Department of Electrical and Computer Engineering
Application of Vocoders to Wireless Communications
Application of Vocoders to Wireless Communications The need for increased utilization of available wireless
ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER?
WHAT IS AN FFT SPECTRUM ANALYZER? ANALYZER BASICS The SR760 FFT Spectrum Analyzer takes a time varying input signal, like you would see on an oscilloscope trace, and computes its frequency spectrum. Fourier's