KEYWORD SPOTTING USING HIDDEN MARKOV MODELS. by Şevket Duran B.S. in E.E., Boğaziçi University, 1997


KEYWORD SPOTTING USING HIDDEN MARKOV MODELS

by Şevket Duran
B.S. in E.E., Boğaziçi University, 1997

Submitted to the Institute for Graduate Studies in Science and Engineering in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

Boğaziçi University, 2001

ACKNOWLEDGEMENTS

To Dr. Levent M. Arslan: Thank you for the sacrifices of your personal time that you have made unselfishly to help me prepare this thesis. Thank you for encouraging me to study in the area of speech processing. It is a privilege for me to be your student. Şevket Duran

ABSTRACT

KEYWORD SPOTTING USING HIDDEN MARKOV MODELS

The aim of a keyword spotting system is to detect a small set of keywords in continuous speech. It is important to obtain the highest possible keyword detection rate without increasing the number of false insertions. Modeling only the keywords is not enough: to separate keywords from non-keywords, models for out-of-vocabulary words are needed as well. This out-of-vocabulary modeling is done with garbage models, whose structure and type have a great effect on overall system performance. The subject of this M.S. thesis is to examine context-independent phonemes as garbage models and to evaluate the performance of different criteria as confidence measures for out-of-vocabulary word rejection. Two databases were collected over telephone lines, one for keyword spotting and one for isolated word recognition experiments. For keyword spotting, the use of monophone models together with a one-state general garbage model gives the best performance. For confidence measures, using average phoneme likelihoods together with phoneme durations performs best.

ÖZET

SAKLI MARKOV MODELLERİ KULLANILARAK ANAHTAR KELİME YAKALAMA

Anahtar kelime yakalama sisteminin amacı sürekli bir sesin içinde barınan küçük bir anahtar kelimeler gurubu ortaya çıkarmaktır. Bu sistemde önemli olan, kelime olmadığı halde hata verme oranını artırmaksızın olası en yüksek anahtar kelime bulma oranını elde etmektir. Bunun için sadece anahtar kelimeleri modelleme yapmak yeterli değildir. Anahtar kelimeleri, olmayanlardan ayırmak için, sözlük dışı kelimelerin modellemesi de gerekmektedir. Bu modelleme, yapısı ve türü itibarıyla tüm sistem performansı üzerinde büyük etkisi bulunan garbage modellemesi ile yapılmaktadır. Bu tezin konusu garbage modelleri olarak bağımsız içerikli sesbirimleri (monophone) incelemek ve sözlük dışı kelime dışlamaları için güvenilirlik oranları bazında değişik kriterlerin performansını değerlendirmektir. Anahtar kelime yakalama ve telefon üzerinden yalıtılmış ses tanıma denemeleri için iki veritabanı oluşturuldu. Anahtar kelime bulma için en iyi performansı tek fazlı genel garbage modelleme ile birlikte tek-sesbirimsel modellerin kullanılması verdi. Güvenilirlik oranları içinse süreleri ile ortalama sesbirim benzeşmelerinin birlikte kullanımı en iyi performansı gösterdi.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
ÖZET
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. BACKGROUND
   2.1. Speech Recognition Problem
   2.2. Speech Recognition Process
      Gathering Digital Speech Input
      Feature Extraction
   2.3. Hidden Markov Model
      Assumptions in the Theory of HMMs
         The Markov Assumption
         The Stationarity Assumption
         The Output Independence Assumption
      Three Basic Problems of HMMs
         The Evaluation Problem
         The Decoding Problem
         The Learning Problem
      The Evaluation Problem and the Forward Algorithm
      The Decoding Problem and the Viterbi Algorithm
      The Learning Problem
         Maximum Likelihood (ML) Criterion
         Baum-Welch Algorithm
      Types of Hidden Markov Models
   2.4. Use of HMMs in Speech Recognition
      Subword Unit Selection
      Word Networks
      Training of HMMs
      Recognition
         Viterbi Based Recognition
         N-Best Search
   2.5. Keyword Spotting Problem
3. PROPOSED KEYWORD SPOTTING ALGORITHM
   3.1. Introduction
   3.2. Experiment Data
   3.3. Performance of a System
   3.4. System Structure
   3.5. Performance of Monophone Models for Isolated Word Recognition
4. CONFIDENCE MEASURES FOR ISOLATED WORD RECOGNITION
   4.1. Introduction
   4.2. Experiment Data
   4.3. Minimum Edit Distance
   4.4. Phoneme Durations
   4.5. Garbage Model Using Same 512 Mixtures
   4.6. Comparison of Confidence Measures
5. CONCLUSION
APPENDIX A: SENTENCES USED FOR KEYWORD SPOTTING
APPENDIX B: MINIMUM EDIT DISTANCE ALGORITHM
REFERENCES

LIST OF FIGURES

Figure 2.1. The waveform and spectrogram of ev and ben eve
Figure 2.2. The waveform and spectrogram of okul and holding
Figure 2.3. Components of a typical recognition system
Figure 2.4. The spectrogram of /S/ sound and /e/ sound in word Sevket
Figure 2.5. Flowchart of deriving Mel Frequency Cepstrum Coefficients
Figure 2.6. A simple isolated speech unit recognizer that uses null-grammar
Figure 2.7. The expanded network using the best match triphones
Figure 2.8. The null-grammar network showing the underlying states
Figure 3.1. General structure of the proposed keyword spotter
Figure 3.2. ROC points for different alternatives for garbage model
Figure 3.3. ROC points for different number of keywords for keyword spotting
Figure 3.4. Network structure for the keyword spotter used as a post-processor for isolated word recognizer
Figure 3.5. ROC curves for monophone and garbage model based out-of-vocabulary word rejection
Figure 4.1. ROC curves before/after applying Minimum Edit Distance Revision
Figure 4.2. Forced alignment of the waveform for keyword iszbankasizkurucu
Figure 4.3. Forced alignment of the waveform for keyword milpa
Figure 4.4. ROC curves for phoneme duration based confidence measure
Figure 4.5. Likelihood profiles for ceylangiyim and the base garbage model proposed
Figure 4.6. ROC curves for different emphasis values for power value
Figure 4.7. ROC curves for different power values with emphasis set to
Figure 4.8. ROC curve for phoneme duration based confidence measure and confidence measure with likelihood ratio scoring included

LIST OF TABLES

Table 3.1. Database used for keyword spotting
Table 3.2. Number of occurrences of the keywords used for keyword spotting tests
Table 3.3. Results for monophone model based out-of-vocabulary word rejection for isolated word recognition
Table 3.4. Results for general garbage model based out-of-vocabulary word rejection for isolated word recognition
Table 4.1. Average phoneme durations in Turkish
Table 4.2. Computation time required with/without phoneme duration evaluation

1. INTRODUCTION

Communication between people and computers through more natural interfaces is an important issue if computers are to be part of our daily lives. To interact with a computer you normally have to use your hands, whether the device is a keyboard, a mouse, or the dialing pad on your phone when you access information on a computer over a telephone line. A more natural input interface is speech. Human-computer interaction via speech involves speech recognition [1, 2, 3] and speech synthesis [4]. Speech recognition is the conversion of a speech signal into text; synthesis is the opposite. Speech recognition may range from understanding simple commands to extracting all the information in the speech signal, such as all the words, the meaning, and the emotional state of the speaker. After many years of work, speech recognition is now at a level mature enough to be used in practical applications, thanks to the algorithms developed and the increase in computational power. Speech recognition may be speaker dependent or speaker independent. If the application is for home use, where the same person will use the same microphone in the same place, the problem is simple and a robust algorithm is not needed. But an application that must recognize speech over a public telephone network, where the speaker and the channel the speech passes through differ from call to call, requires a robust algorithm. If the task is recognition of isolated words or phrases, the problem is easier, as long as the speakers give only the required input. If the speakers also use other words in addition to the keywords you require, you need to perform keyword spotting, which means recognizing the keywords among other non-keyword filler words. Going further, recognition from a large vocabulary where all of the words must be recognized is called dictation, which is a harder task.
We will be dealing with the keyword-spotting problem in this thesis.

For speech recognition, the digitized speech signal, which is in the time domain, must be transformed into another domain. Generally, a portion of the speech is taken and a feature vector is derived to represent it. These feature vectors are then used to guess the sequence of words that generated the speech signal. We need algorithms that account for the variability in the speech signal. The most common technique for acoustic modeling is called hidden Markov modeling (HMM), and it is the model used in this thesis.

In order to have an operating-system-independent notation, we preferred not to use the non-ANSI characters in the Turkish character set. We use lower case letters for characters that are in the ANSI character set and upper case letters for Turkish characters: /S/ instead of /ş/, /U/ instead of /ü/, and so on. We use /Z/ for interword silence. So the phrase savaş alanı is represented as savaSZalanI.

In this thesis we investigate candidate garbage models for keyword spotting and try to find a good confidence measure for detection of out-of-vocabulary words in an isolated word recognizer. In Chapter 2, we give the theory of each step in the speech recognition process and give details of the techniques we have used. In Chapter 3, we study the keyword-spotting algorithm we have proposed and conclude that using the monophone models of the words together with a one-state, 16-mixture general garbage model with different bonus values gives the best performance; we also evaluate the performance of monophone models as garbage models for isolated word recognition. In Chapter 4, we evaluate several measures and decide that both likelihood and phoneme duration are important for obtaining a good confidence measure. Finally, in Chapter 5 we give our conclusions from these experiments and suggest some directions for future study.
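The notation above amounts to a small lookup table. The sketch below assumes Python; only the /ş/→/S/, /ü/→/U/ and space→/Z/ mappings are stated explicitly in the text, so the remaining Turkish letters are mapped by analogy and should be treated as an assumption:

```python
# ANSI-only notation used in this thesis: Turkish-specific letters become
# upper case, inter-word silence is written /Z/.  Only ş->S, ü->U and the
# /Z/ convention are given explicitly; the other mappings are assumed.
TURKISH_TO_ANSI = {
    "ş": "S", "ü": "U",   # stated in the text
    "ç": "C", "ğ": "G",   # assumed by analogy
    "ı": "I", "ö": "O",   # assumed by analogy
    " ": "Z",             # inter-word silence
}

def transliterate(text: str) -> str:
    """Rewrite a Turkish phrase in the ANSI notation of this thesis."""
    return "".join(TURKISH_TO_ANSI.get(ch, ch) for ch in text)

print(transliterate("savaş alanı"))  # -> savaSZalanI
```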

2. BACKGROUND

2.1. Speech Recognition Problem

The speech signal differs depending on whether the input is given as isolated words or as continuous speech. If the speaker knows that a computer will try to recognize the speech, he or she may pause between words. In continuous speech, however, some sounds disappear and sometimes there is no silence between words. A word may be hard to say in a particular context; an exaggerated example is a tongue twister like SemsiZpaSaZpasajIndaZsesiZbUzUSesiceler. Even in normal cases there are great differences in the characteristics of the speech signal. Figure 2.1 shows the same /e/ sound in ev and ben eve. The waveforms are shown at the top of the figure. The spectrograms at the bottom show the energy at different frequencies versus time, with darkness indicating amplitude. The effect of the context on the characteristics of the /e/ sound can be seen, and it leads us to model each phoneme according to its neighboring phonemes.

Figure 2.1. The waveform and spectrogram of ev (on the left) and ben eve (on the right).

Spontaneous speech may contain fillers that are not words, like ee or himm; this is another difficulty in continuous speech recognition. The task should be known while designing the algorithm: if a recognizer is to be used for continuous speech recognition, the training data should consist of continuous speech as well.

The main difficulty of the speech recognition problem comes from the variability of the source of the signal. First, the characteristics of phonemes, the smallest sound units, depend on the context in which they appear. An example of phonetic variability is the acoustic difference of the phoneme /o/ in the Turkish words okul and holding; see Figure 2.2, where the marked region corresponds to the /o/ sound.

Figure 2.2. The waveform and spectrogram of okul (on the left) and holding (on the right)

The environment also causes variability. The same speaker will say a word differently according to his or her physical and emotional state, speaking rate, or voice quality. Differences in the vocal tract size and shape of different people cause variability as well. The problem is to find the meaningful information in the speech signal. The meaningful information is different for speech recognition and for speaker recognition; the same information in the speech signal may be necessary for some application and redundant for

some other application. For speaker independent speech recognition we have to remove as many speaker-related features as possible.

2.2. Speech Recognition Process

Figure 2.3 shows the main components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 milliseconds. These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure 2.3. Components of a typical speech recognition system

Gathering Digital Speech Input

Speech recognition is the process of converting a digital speech signal into words. To capture the speech signal we need a device that converts the physical speech wave into a digital signal. This may be a microphone that converts the speech into an analog signal together with a sound card, an A/D converter that turns the analog signal into a digital one. Another way of obtaining digital speech input is to use a telephone card that converts the analog

signal that comes from the telephone line into a digital signal. There are also devices that can take the digital signal coming from E-1 or T-1 lines directly; Dialogic has the JCT LS240 and JCT LS300 for T-1 lines and E-1 lines, respectively. We have been using a JCT LS120 card, which is a speech-processing card for 12 analog lines. We used the 8 kHz sampling rate of the telephone lines and converted the µ-law encoded signal into a 16-bit linear encoded signal before processing.

Feature Extraction

To get rid of the redundancies in the speech signal mentioned earlier, we have to represent the signal by only the perceptually most important speaker-independent features [5]. The speech signal is produced by passing the excitation signal generated by the larynx through the vocal tract. We are interested in the properties of the speech determined by the overall shape of the vocal tract. To distinguish phonemes better (the voiced/unvoiced distinction), we examine whether the vocal folds are vibrating but ignore variations in the frequency of vibration. The spectrum of voiced sounds has several sharp peaks, which are called formant frequencies. The spectrum of unvoiced sounds looks like a white noise spectrum. Figure 2.4 shows the spectrum of the unvoiced sound /S/ and the voiced sound /e/.

Figure 2.4. The spectrum (found using 256 point FFT) of /S/ sound (on the left) and /e/ sound (on the right) in word Sevket

Since our ears are insensitive to phase effects, we use the power spectrum as the basis for the speech recognition front-end. The power spectrum is represented on a log scale. When

the overall gain of the signal varies, the shape of the log power spectrum stays the same but is shifted up or down. The convolutional effects of the telephone line multiply the signal in the linear power spectrum; in the log power spectrum the effect is additive. Since a voiced speech waveform corresponds to the convolution of a quasi-periodic excitation signal with a time-varying filter (the shape of the vocal tract), we can separate them in the log power spectrum. Assigning a lower limit to the log function solves the problem of low energy levels in some parts of the spectrum. Before computing short-term power spectra, the waveform is processed by a simple pre-emphasis filter that gives a 6 dB/octave increase in gain; this makes the average speech spectrum roughly flat.

We have to extract the effects caused by the shape of the vocal tract. One method is to predict the coefficients of the filter that corresponds to the shape of the vocal tract. The vocal tract is assumed to be a combination of lossless tubes of different radii, and the number of parameters derived corresponds to the number of tubes assumed. The filter is assumed to be an all-pole linear filter. The parameters are called Linear Predictive Coding (LPC) parameters and the procedure is known as LPC analysis; there are different methods to calculate these coefficients [6].

To calculate the short-term spectra we take overlapping portions of the waveform. We take a frame of 25 milliseconds and multiply it with a window function to avoid artificial high frequencies; we use a Hamming window. Then we apply the Fourier transform. We have to remove the harmonic structure at multiples of the fundamental frequency, f0, because it is the effect of the excitation signal. The smoothed spectrum without the effect of the excitation signal corresponds to the Fourier transform of the LPC parameters. We use a different method and group components of the power spectrum into frequency bands.
Grouping is not linear; the sensitivity of the human ear is taken into account. The bands are linear up to 1 kHz and logarithmic at higher frequencies, so the frequency bands are broader at higher frequencies. The positions of the bands are set according to the mel frequency scale [7].
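The framing steps described above can be sketched in a few lines of Python. The 25 ms frame length and Hamming window are from the text, and the 10 ms step follows the typical feature rate mentioned earlier; the pre-emphasis coefficient 0.97 is an assumed typical value, not one stated here:

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    # First-order high-pass filter y[n] = x[n] - alpha * x[n-1];
    # alpha near 1 gives roughly the +6 dB/octave tilt described above.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frames(signal, rate=8000, frame_ms=25, step_ms=10):
    # Overlapping 25 ms frames taken every 10 ms, each multiplied by a
    # Hamming window to suppress artificial high frequencies at the edges.
    flen = int(rate * frame_ms / 1000)
    step = int(rate * step_ms / 1000)
    win = np.hamming(flen)
    return [win * signal[i:i + flen]
            for i in range(0, len(signal) - flen + 1, step)]
```

At the 8 kHz telephone sampling rate this gives 200-sample frames with an 80-sample step.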

The relation between the mel frequency scale and the linear frequency scale is as follows:

    Mel(f) = 2595 log10(1 + f / 700)    (2.1)

Figure 2.5. Flowchart for deriving Mel Frequency Cepstrum Coefficients

To calculate the filterbank coefficients, the magnitude coefficients of the spectrum are accumulated after windowing with triangular windows. Triangular filters are

spread over the whole frequency range from zero up to the Nyquist frequency. We have chosen 16 filter banks. Since the shape of the spectrum imposed by the vocal tract is smooth, energy levels in adjacent bands are correlated. We have to remove this correlation, since in the further statistical analysis we assume that feature vector elements are uncorrelated and use a diagonal variance vector. Removing the correlation also allows the number of parameters to be reduced without loss of useful information. The discrete cosine transform (a version of the Fourier transform using only cosine basis functions) converts the set of log energies to a set of cepstral coefficients, which are largely uncorrelated. The formula for the Discrete Cosine Transform is:

    c_i = sqrt(2/N) * sum_{j=1..N} m_j cos(pi * i / N * (j - 0.5)),    i = 1, ..., P    (2.2)

where {m_j} are the log filter bank amplitudes, N is the number of filterbank channels, which we set to 16, and P is the required number of cepstral coefficients, which we set to 12. Figure 2.5 shows the steps in obtaining Mel Frequency Cepstrum Coefficients (MFCCs).

Many systems use the rate of change of the short-term power spectrum as additional information. The simplest way to obtain this dynamic information is to take the difference between consecutive frames, but this is too sensitive to random interframe variations. So linear trends are estimated over sequences of typically five or seven frames [8]. We use five frames, so there will be a delay of two times the step size in real-time operation:

    d_t = G * (2 c_{t+2} + c_{t+1} - c_{t-1} - 2 c_{t-2})    (2.3)

where d_t is the difference evaluated at time t; c_{t+2}, c_{t+1}, c_{t-1}, c_{t-2} are the coefficients at times t+2, t+1, t-1 and t-2, respectively; and G is a gain factor. Some systems use acceleration features as well as linear rates of change; these second-order dynamic features need longer sequences of frames for reliable estimation [9].
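Equations (2.1)-(2.3) can be written out directly. The sketch below uses N = 16 filterbank channels and P = 12 cepstral coefficients as above; the gain factor G is left as a plain parameter, since its selected value is not recoverable from this excerpt:

```python
import math

def mel(f):
    # Eq. (2.1): mel frequency corresponding to linear frequency f (Hz).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def cepstral_coefficients(log_energies, P=12):
    # Eq. (2.2): DCT of the N log filter-bank amplitudes -> P MFCCs.
    # enumerate() gives j = 0..N-1, so (j + 0.5) matches (j - 0.5)
    # for the 1-based index of the formula.
    N = len(log_energies)
    return [math.sqrt(2.0 / N) *
            sum(m * math.cos(math.pi * i / N * (j + 0.5))
                for j, m in enumerate(log_energies))
            for i in range(1, P + 1)]

def delta(cs, t, G=1.0):
    # Eq. (2.3): linear trend over five frames around time t.
    return G * (2 * cs[t + 2] + cs[t + 1] - cs[t - 1] - 2 * cs[t - 2])
```

Note that delta(cs, t) needs frames t-2 through t+2, which is the source of the two-step delay mentioned above.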

Since cepstral coefficients are largely uncorrelated, probability estimates are easier in further analysis; we can simply calculate Euclidean distances from reference model vectors. Statistically based methods weight the coefficients by the inverse of their standard deviations computed around their overall means. Current representations concentrate on the spectrum envelope and ignore fundamental frequency, although we know that even in isolated-word recognition fundamental frequency contours carry important information.

At the acoustic-phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of training data. Effects of context at the acoustic-phonetic level are handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling. Word-level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Another technique is to add the different pronunciations of a word directly to the network; after common nodes in the network are pruned, it represents the different pronunciations of the same word.

2.3. Hidden Markov Model

The most widely used recognition algorithm of the past fifteen years is the Hidden Markov Model (HMM) [10, 11, 12]. Although there have been some attempts at using neural networks, they have not been very successful. A Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transition probabilities are assigned to the transitions among the states. In a particular state, an outcome or observation can be generated according to the associated probability distribution. An external observer can only see the outcome, not the states; the states are therefore hidden from the outside. The following part is the theory of HMMs, taken from the tutorial [3]. The

advanced reader can skip this part. In order to define an HMM completely, the following elements are needed:

- The number of states of the model, N.

- The number of observation symbols in the alphabet, M. If the observations are continuous then M is infinite.

- A set of state transition probabilities, A = {a_ij}:

    a_ij = p{q_{t+1} = j | q_t = i},    1 ≤ i, j ≤ N    (2.4)

  where q_t denotes the state index at time t and a_ij is the transition probability from state i to state j. Transition probabilities must satisfy the normal stochastic constraints,

    a_ij ≥ 0,    1 ≤ i, j ≤ N    (2.5)

  and

    sum_{j=1..N} a_ij = 1,    1 ≤ i ≤ N    (2.6)

- A probability distribution in each of the states, B = {b_j(k)}:

    b_j(k) = p{o_t = v_k | q_t = j},    1 ≤ j ≤ N, 1 ≤ k ≤ M    (2.7)

  where v_k denotes the k-th observation symbol in the alphabet and o_t the current observation vector. The following stochastic constraints must be satisfied:

    b_j(k) ≥ 0,    1 ≤ j ≤ N, 1 ≤ k ≤ M    (2.8)

  and

    sum_{k=1..M} b_j(k) = 1,    1 ≤ j ≤ N    (2.9)

  If the observations are continuous, we have to use a continuous probability density function instead of a set of discrete probabilities, and we specify the parameters of that density. Usually the probability density is approximated by a weighted sum of M Gaussian distributions,

    b_j(o_t) = sum_{m=1..M} c_jm N(µ_jm, Σ_jm, o_t)    (2.10)

  where c_jm are the mixture weights for the m-th mixture of the j-th state, µ_jm the mean vectors, and Σ_jm the covariance matrices. The c_jm must satisfy the stochastic constraints

    c_jm ≥ 0,    1 ≤ j ≤ N, 1 ≤ m ≤ M    (2.11)

  and

    sum_{m=1..M} c_jm = 1,    1 ≤ j ≤ N    (2.12)

- The initial state distribution, π = {π_i}, where

    π_i = p{q_1 = i},    1 ≤ i ≤ N    (2.13)

Therefore we can use the compact notation

    λ = (A, B, π)    (2.14)

to denote an HMM with discrete probability distributions, while

    λ = (A, c_jm, µ_jm, Σ_jm, π)    (2.15)

denotes one with continuous densities.

Assumptions in the Theory of HMMs

For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.

The Markov Assumption. As given in the definition of HMMs, transition probabilities are defined as

    a_ij = p{q_{t+1} = j | q_t = i},    1 ≤ i, j ≤ N    (2.16)

In other words, it is assumed that the next state depends only upon the current state. This is called the Markov assumption, and the resulting model is actually a first-order HMM. The next state may, however, depend on the past k states; it is possible to obtain such a model, called a k-th order HMM, but a higher order HMM will have a higher complexity.

The Stationarity Assumption. Here it is assumed that the state transition probabilities are independent of the actual time at which the transitions take place. Mathematically,

    p{q_{t1+1} = j | q_{t1} = i} = p{q_{t2+1} = j | q_{t2} = i}    (2.17)

for any t1 and t2.

The Output Independence Assumption. This is the assumption that the current output (observation) is statistically independent of the previous outputs (observations). We can formulate this assumption mathematically by considering a sequence of observations,

    O = o_1, o_2, ..., o_T    (2.18)

Then, according to the assumption, for an HMM λ,

    p{O | q_1, q_2, ..., q_T, λ} = prod_{t=1..T} p(o_t | q_t, λ)    (2.19)

Unlike the other two, however, this assumption has very limited validity. In some cases it may not be fair, and it therefore becomes a severe weakness of HMMs.

Three Basic Problems of HMMs

Once we have an HMM, there are three problems of interest.

The Evaluation Problem. Given an HMM λ and a sequence of observations O = o_1, o_2, ..., o_T, what is the probability that the observations are generated by the model, p{O | λ}?

The Decoding Problem. Given a model λ and a sequence of observations O = o_1, o_2, ..., o_T, what is the most likely state sequence in the model that produced the observations?

The Learning Problem. Given a model λ and a sequence of observations O = o_1, o_2, ..., o_T, how should we adjust the model parameters (A, B, π) in order to maximize p{O | λ}?

The evaluation problem can be used for isolated (word) recognition. The decoding problem is related to continuous recognition as well as to segmentation. The learning problem must be solved if we want to train an HMM for subsequent use in recognition tasks.

The Evaluation Problem and the Forward Algorithm

We have a model λ = (A, B, π) and a sequence of observations O = o_1, o_2, ..., o_T, and p{O | λ} must be found. If we calculate this quantity directly using simple probabilistic arguments, the number of operations is on the order of N^T, which is very large even if the length of the sequence T is small. The idea of reusing the multiplications that are common leads to an auxiliary variable, called the forward variable and denoted α_t(i).

The forward variable is defined as the probability of the partial observation sequence o_1, o_2, ..., o_t when it terminates at state i. Mathematically,

    α_t(i) = p{o_1, o_2, ..., o_t, q_t = i | λ}    (2.20)

It is easy to see that the following recursive relationship holds:

    α_{t+1}(j) = b_j(o_{t+1}) sum_{i=1..N} α_t(i) a_ij,    1 ≤ j ≤ N, 1 ≤ t ≤ T-1    (2.21)

where

    α_1(j) = π_j b_j(o_1),    1 ≤ j ≤ N    (2.22)

Using this recursion we can calculate α_T(i), 1 ≤ i ≤ N, and the required probability is then given by

    p{O | λ} = sum_{i=1..N} α_T(i)    (2.23)

The complexity of this method, known as the forward algorithm, is proportional to N^2 T, which is linear with respect to T, whereas the direct calculation has exponential complexity. In a similar way, the backward variable β_t(i) is defined as the probability of the partial observation sequence o_{t+1}, o_{t+2}, ..., o_T, given that the current state is i.

Mathematically,

    β_t(i) = p{o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ}    (2.24)

As in the case of α_t(i), there is a recursive relationship which can be used to calculate β_t(i) efficiently:

    β_t(i) = sum_{j=1..N} β_{t+1}(j) a_ij b_j(o_{t+1}),    1 ≤ i ≤ N, 1 ≤ t ≤ T-1    (2.25)

where

    β_T(i) = 1,    1 ≤ i ≤ N    (2.26)

Further, we can see that

    α_t(i) β_t(i) = p{O, q_t = i | λ},    1 ≤ i ≤ N, 1 ≤ t ≤ T    (2.27)

Therefore this gives another way to calculate p{O | λ}, using both the forward and backward variables:

    p{O | λ} = sum_{i=1..N} p{O, q_t = i | λ} = sum_{i=1..N} α_t(i) β_t(i)    (2.28)

The last equation is very useful, especially in deriving the formulas required for gradient based training.
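The forward and backward recursions of Eqs. (2.20)-(2.28) can be sketched for a discrete-output HMM. The two-state model below is a hypothetical example, used only to check that α_t · β_t gives the same p{O | λ} at every t, as Eq. (2.28) promises:

```python
import numpy as np

def forward(A, B, pi, O):
    # Eqs. (2.21)-(2.22): list of alpha_t vectors for t = 1..T.
    alphas = [pi * B[:, O[0]]]
    for o in O[1:]:
        alphas.append(B[:, o] * (alphas[-1] @ A))
    return alphas

def backward(A, B, O):
    # Eqs. (2.25)-(2.26): list of beta_t vectors for t = 1..T.
    betas = [np.ones(A.shape[0])]
    for o in reversed(O[1:]):
        betas.insert(0, A @ (B[:, o] * betas[0]))
    return betas

# Hypothetical two-state model with a two-symbol alphabet.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1]                      # observation-symbol indices

alphas, betas = forward(A, B, pi, O), backward(A, B, O)
p = alphas[-1].sum()               # Eq. (2.23): p{O | lambda}
# Eq. (2.28): alpha_t . beta_t is the same p{O | lambda} at every t.
assert all(abs(a @ b - p) < 1e-12 for a, b in zip(alphas, betas))
```

The matrix product `alphas[-1] @ A` computes the inner sum of Eq. (2.21) for all j at once, which is where the N^2 T complexity comes from.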

The Decoding Problem and the Viterbi Algorithm

In this case we want to find the most likely state sequence for a given sequence of observations O = o_1, o_2, ..., o_T and a model λ = (A, B, π). The solution to this problem depends upon the way the most likely state sequence is defined. One approach is to find the most likely state q_t at each time t and to concatenate all such q_t's; but sometimes this method does not give a physically meaningful state sequence. We therefore need another method which has no such problems. In this method, commonly known as the Viterbi algorithm [13], the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable,

    δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} p{q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | λ}    (2.29)

which gives the highest probability that the partial observation sequence and state sequence up to time t can have, when the current state is i. It is easy to observe that the following recursive relationship holds:

    δ_{t+1}(j) = b_j(o_{t+1}) max_{1 ≤ i ≤ N} δ_t(i) a_ij,    1 ≤ j ≤ N, 1 ≤ t ≤ T-1    (2.30)

where

    δ_1(j) = π_j b_j(o_1),    1 ≤ j ≤ N    (2.31)

So the procedure to find the most likely state sequence starts from the calculation of δ_T(j), 1 ≤ j ≤ N, using the recursion in Eq. (2.30), while always keeping a pointer to the winning state in the maximum-finding operation. Finally the state j* is found, where

    j* = argmax_{1 ≤ j ≤ N} δ_T(j)    (2.32)

and, starting from this state, the sequence of states is back-tracked by following the pointer kept at each step.
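The Viterbi recursion and the back-tracking of Eqs. (2.29)-(2.32) can be sketched for a discrete-output HMM; the two-state model in the example call is hypothetical:

```python
import numpy as np

def viterbi(A, B, pi, O):
    # Eqs. (2.29)-(2.32): most likely state sequence for observation
    # indices O under a discrete-output HMM (A, B, pi).
    T = len(O)
    delta = pi * B[:, O[0]]                    # Eq. (2.31)
    psi = np.zeros((T, A.shape[0]), dtype=int) # back-pointers
    for t in range(1, T):
        trans = delta[:, None] * A             # delta_t(i) * a_ij
        psi[t] = trans.argmax(axis=0)          # winning predecessor per j
        delta = B[:, O[t]] * trans.max(axis=0) # Eq. (2.30)
    states = [int(delta.argmax())]             # Eq. (2.32): j*
    for t in range(T - 1, 0, -1):              # back-track the pointers
        states.insert(0, int(psi[t][states[0]]))
    return states

# A sticky two-state model: the decoded path follows the observations.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
print(viterbi(A, B, pi, [0, 0, 1, 1]))  # -> [0, 0, 1, 1]
```

In practice the recursion is run on log probabilities so that the products of Eq. (2.30) become sums and do not underflow for long observation sequences.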


Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction : A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

More information

Emotion Detection from Speech

Emotion Detection from Speech Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction

More information

Thirukkural - A Text-to-Speech Synthesis System

Thirukkural - A Text-to-Speech Synthesis System Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,

More information

CONATION: English Command Input/Output System for Computers

CONATION: English Command Input/Output System for Computers CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India

More information

Solutions to Exam in Speech Signal Processing EN2300

Solutions to Exam in Speech Signal Processing EN2300 Solutions to Exam in Speech Signal Processing EN23 Date: Thursday, Dec 2, 8: 3: Place: Allowed: Grades: Language: Solutions: Q34, Q36 Beta Math Handbook (or corresponding), calculator with empty memory.

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

Lecture 1-10: Spectrograms

Lecture 1-10: Spectrograms Lecture 1-10: Spectrograms Overview 1. Spectra of dynamic signals: like many real world signals, speech changes in quality with time. But so far the only spectral analysis we have performed has assumed

More information

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition

Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The

More information

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus Yousef Ajami Alotaibi 1, Mansour Alghamdi 2, and Fahad Alotaiby 3 1 Computer Engineering Department, King Saud University,

More information

Artificial Neural Network for Speech Recognition

Artificial Neural Network for Speech Recognition Artificial Neural Network for Speech Recognition Austin Marshall March 3, 2005 2nd Annual Student Research Showcase Overview Presenting an Artificial Neural Network to recognize and classify speech Spoken

More information

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA Nitesh Kumar Chaudhary 1 and Shraddha Srivastav 2 1 Department of Electronics & Communication Engineering, LNMIIT, Jaipur, India 2 Bharti School Of Telecommunication,

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Available from Deakin Research Online:

Available from Deakin Research Online: This is the authors final peered reviewed (post print) version of the item published as: Adibi,S 2014, A low overhead scaled equalized harmonic-based voice authentication system, Telematics and informatics,

More information

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song , pp.347-354 http://dx.doi.org/10.14257/ijmue.2014.9.8.32 A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song Myeongsu Kang and Jong-Myon Kim School of Electrical Engineering,

More information

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers

Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers Sung-won ark and Jose Trevino Texas A&M University-Kingsville, EE/CS Department, MSC 92, Kingsville, TX 78363 TEL (36) 593-2638, FAX

More information

Coding and decoding with convolutional codes. The Viterbi Algor

Coding and decoding with convolutional codes. The Viterbi Algor Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition

More information

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,

More information

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics:

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics: Voice Transmission --Basic Concepts-- Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics: Amplitude Frequency Phase Voice Digitization in the POTS Traditional

More information

Speech Signal Processing: An Overview

Speech Signal Processing: An Overview Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Lecture 12: An Overview of Speech Recognition

Lecture 12: An Overview of Speech Recognition Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

More information

RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA

RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA ABSTRACT Random vibration is becoming increasingly recognized as the most realistic method of simulating the dynamic environment of military

More information

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Spectrum Level and Band Level

Spectrum Level and Band Level Spectrum Level and Band Level ntensity, ntensity Level, and ntensity Spectrum Level As a review, earlier we talked about the intensity of a sound wave. We related the intensity of a sound wave to the acoustic

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

More information

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition , Lisbon Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition Wolfgang Macherey Lars Haferkamp Ralf Schlüter Hermann Ney Human Language Technology

More information

Data a systematic approach

Data a systematic approach Pattern Discovery on Australian Medical Claims Data a systematic approach Ah Chung Tsoi Senior Member, IEEE, Shu Zhang, Markus Hagenbuchner Member, IEEE Abstract The national health insurance system in

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

Introduction to Engineering System Dynamics

Introduction to Engineering System Dynamics CHAPTER 0 Introduction to Engineering System Dynamics 0.1 INTRODUCTION The objective of an engineering analysis of a dynamic system is prediction of its behaviour or performance. Real dynamic systems are

More information

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones Final Year Project Progress Report Frequency-Domain Adaptive Filtering Myles Friel 01510401 Supervisor: Dr.Edward Jones Abstract The Final Year Project is an important part of the final year of the Electronic

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Probability and Random Variables. Generation of random variables (r.v.)

Probability and Random Variables. Generation of random variables (r.v.) Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract

More information

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication

Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper

More information

Abant Izzet Baysal University

Abant Izzet Baysal University TÜRKÇE SESLERİN İSTATİSTİKSEL ANALİZİ Pakize ERDOGMUS (1) Ali ÖZTÜRK (2) Abant Izzet Baysal University Technical Education Faculty Electronic and Comp. Education Dept. Asit. Prof. Abant İzzet Baysal Üniversitesi

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY

SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY 3 th World Conference on Earthquake Engineering Vancouver, B.C., Canada August -6, 24 Paper No. 296 SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY ASHOK KUMAR SUMMARY One of the important

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS 1. Bandwidth: The bandwidth of a communication link, or in general any system, was loosely defined as the width of

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Speech recognition for human computer interaction

Speech recognition for human computer interaction Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices

More information

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3 Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web. By C.Moreno, A. Antolin and F.Diaz-de-Maria. Summary By Maheshwar Jayaraman 1 1. Introduction Voice Over IP is

More information

Developing acoustics models for automatic speech recognition

Developing acoustics models for automatic speech recognition Developing acoustics models for automatic speech recognition GIAMPIERO SALVI Master s Thesis at TMH Supervisor: Håkan Melin Examiner: Rolf Carlson TRITA xxx yyyy-nn iii Abstract This thesis is concerned

More information

Workshop Perceptual Effects of Filtering and Masking Introduction to Filtering and Masking

Workshop Perceptual Effects of Filtering and Masking Introduction to Filtering and Masking Workshop Perceptual Effects of Filtering and Masking Introduction to Filtering and Masking The perception and correct identification of speech sounds as phonemes depends on the listener extracting various

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

A Soft Computing Based Approach for Multi-Accent Classification in IVR Systems

A Soft Computing Based Approach for Multi-Accent Classification in IVR Systems A Soft Computing Based Approach for Multi-Accent Classification in IVR Systems by Sameeh Ullah A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of

More information

1 Example of Time Series Analysis by SSA 1

1 Example of Time Series Analysis by SSA 1 1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'-SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales

More information

Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

More information

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP

Department of Electrical and Computer Engineering Ben-Gurion University of the Negev. LAB 1 - Introduction to USRP Department of Electrical and Computer Engineering Ben-Gurion University of the Negev LAB 1 - Introduction to USRP - 1-1 Introduction In this lab you will use software reconfigurable RF hardware from National

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

5 Signal Design for Bandlimited Channels

5 Signal Design for Bandlimited Channels 225 5 Signal Design for Bandlimited Channels So far, we have not imposed any bandwidth constraints on the transmitted passband signal, or equivalently, on the transmitted baseband signal s b (t) I[k]g

More information

Speech recognition technology for mobile phones

Speech recognition technology for mobile phones Speech recognition technology for mobile phones Stefan Dobler Following the introduction of mobile phones using voice commands, speech recognition is becoming standard on mobile handsets. Features such

More information

Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUM OF REFERENCE SYMBOLS Benjamin R. Wiederholt The MITRE Corporation Bedford, MA and Mario A. Blanco The MITRE

More information

The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review. Pierre Granjon The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

More information

Non-Data Aided Carrier Offset Compensation for SDR Implementation

Non-Data Aided Carrier Offset Compensation for SDR Implementation Non-Data Aided Carrier Offset Compensation for SDR Implementation Anders Riis Jensen 1, Niels Terp Kjeldgaard Jørgensen 1 Kim Laugesen 1, Yannick Le Moullec 1,2 1 Department of Electronic Systems, 2 Center

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Title Transcription of polyphonic signals using fast filter bank( Accepted version ) Author(s) Foo, Say Wei;

More information

203.4770: Introduction to Machine Learning Dr. Rita Osadchy

203.4770: Introduction to Machine Learning Dr. Rita Osadchy 203.4770: Introduction to Machine Learning Dr. Rita Osadchy 1 Outline 1. About the Course 2. What is Machine Learning? 3. Types of problems and Situations 4. ML Example 2 About the course Course Homepage:

More information

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer

A Segmentation Algorithm for Zebra Finch Song at the Note Level. Ping Du and Todd W. Troyer A Segmentation Algorithm for Zebra Finch Song at the Note Level Ping Du and Todd W. Troyer Neuroscience and Cognitive Science Program, Dept. of Psychology University of Maryland, College Park, MD 20742

More information

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya (LKaya@ieee.org) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

The Effect of Network Cabling on Bit Error Rate Performance. By Paul Kish NORDX/CDT
