CHEN Y N et al.: Speech Rate Robustness in Speech Recognition


J. Comput. Sci. & Technol., Nov. 2003, Vol.18, No.6

Towards Robustness to Speech Rate in Mandarin All-Syllable Recognition

CHEN YiNing, ZHU Xuan, LIU Jia and LIU RunSheng
Department of Electronic Engineering, Tsinghua University, Beijing, P.R. China
chenyining99@mails.tsinghua.edu.cn
Received April 26, 2002; revised February 14.

Abstract In Mandarin all-syllable recognition, many insert errors occur due to the influence of non-consonant syllables. Introducing a duration model into the recognition process is a direct way to reduce these errors, but it usually works less well than expected because duration is sensitive to speech rate. To address this problem, this paper proposes a novel context-dependent duration distribution normalized by speech rate and applies it to a speech recognition system based on an improved Hidden Markov Model (HMM) framework. To realize the algorithm, the authors employ a new method to estimate the speech rate of a sentence, then compute the duration probability combined with the speech rate, and finally apply this duration information in the post-processing stage. The duration model is thus adopted efficiently, with little change to the recognition process or to the resource demand. The experimental results show that the syllable error rates decrease significantly on two different speech corpora. In particular, insert errors are reduced by about sixty to eighty percent.

Keywords speech recognition, speech rate, duration distribution

1 Introduction

It is well known that the performance of large-vocabulary continuous speech recognition (LVCSR) systems degrades dramatically as the speech rate varies, while the human auditory system remains robust under this condition [1]. It is therefore natural to bring speech rate information into the framework of speech recognition.
On the other hand, it is well known that properly introducing a duration model can remove a large share of the insert errors in an LVCSR system. However, the duration model has been shown to depend not only on context but also on speech rate, so both must be taken into account when the duration model is built. Several methods have been proposed recently to solve this problem in different ways [2-4]. The most popular are transition probability adaptation and speech-rate-dependent acoustic modeling. However, each of these methods sharply increases either the space complexity [2] or the time complexity [3,4] in exchange for about a 10% word error rate reduction. Our method is different. Since duration models are closely tied to speech rate, we build novel duration models normalized by the speech rate and apply them in the second decoding stage. The improved system also achieves about a 10 percent error reduction, while the decoding time rises by only about 10% and the memory demand is nearly unchanged. The new duration information thus improves performance at little extra cost.

This paper is organized as follows. Section 2 introduces our framework employing the duration models. Section 3 describes how the duration normalized by speech rate is used in the training and decoding processes. Section 4 presents experimental results. Finally, Section 5 gives our conclusions and future work.

The research is supported by the National Natural Science Foundation of China.

2 Duration Modeling

Our baseline system is quite different from the systems in [2-4]. It derives from the segmental HMM [5,6], which avoids a problem that transition probabilities cause in a traditional HMM. Our duration model is therefore distinct among methods of this kind.

2.1 Baseline Framework

In our framework, a segment of speech observation feature vectors $O = (o_1, o_2, \ldots, o_T)$ represented by the state sequence $S = (s_1, s_2, \ldots, s_N)$ is modeled as:

$$p(O, D \mid S) = p(O \mid D, S)\, P(D \mid S) \tag{1}$$

where $D = (d_1, d_2, \ldots, d_N)$ is the duration of each state, $T$ is the number of frames in this speech segment and $N$ is the number of states in the sequence $S$. Here $p(O, D \mid S)$ is the likelihood function, the first term on the right is the observation distribution and the second term is the duration distribution. Since the computational complexity of the segmental HMM is significantly higher than that of the conventional HMM [5], two assumptions are introduced:

• The duration and the observation vectors are independent. Then (1) simplifies to

$$p(O, D \mid S) = p(O \mid S)\, P(D \mid S) \tag{2}$$

• The feature vectors are mutually independent, and so are the state durations, thus

$$P(O \mid S) = \prod_{i=1}^{T} p(o_i \mid S) \tag{3}$$

$$P(D \mid S) = \prod_{k=1}^{N} p(d_k \mid s_k) \tag{4}$$

Therefore, our framework leaves no place for transition probabilities. The duration distribution information can be used in the first or the second stage of the recognition decoding process. In this paper, we use the duration models in the second pass.

2.2 Duration Model Estimation

In the training process, we first obtain supervised segmentations of the training set using only the observation probabilities $p(O \mid S)$, without considering the effect of the duration model. Then the histogram of durations for each state is collected. The 1-D Gaussian distribution

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{5}$$

is found to fit the empirical duration distribution satisfactorily, so we employ it in our system. Fig.1 shows the empirical duration distribution and its Gaussian fit for the first state of the semi-syllable [p]. Here the independent variable of the Gaussian distribution is limited to [1, 12], which is reasonable in real speech.

Fig.1. Duration distribution for the first state of [p].
(a: Empirical distribution (solid line); b: Gaussian fit (dashed line).)

To estimate the parameters of the Gaussian distribution, only the mean $\mu$ and the variance $\sigma^2$ are needed. They are calculated as follows:

$$\mu = \frac{\sum_{i=1}^{d_{\max}} h(i)\, i}{\sum_{i=1}^{d_{\max}} h(i)} \tag{6}$$

$$\sigma^2 = \frac{\sum_{i=1}^{d_{\max}} h(i)\, i^2}{\sum_{i=1}^{d_{\max}} h(i)} - \mu^2 \tag{7}$$

where $h(i)$ is the number of occurrences of duration $i$ and $d_{\max}$ is the largest duration allowed.

3 Duration Normalization with Speech Rate

To use the duration information normalized by speech rate, different methods are employed in the training and decoding processes. In both processes, we first make a robust estimate of the speaking rate and then normalize the duration with it.

3.1 Duration Normalization in Training Process

In the training process, the duration normalization is done with an EM-like (Expectation Maximization) iteration. Since the speech rate is hard to estimate directly, we treat it as the latent variable of the EM algorithm. This iteration yields a robust estimate of the speech-rate-independent duration distribution. The method consists of the following steps:

Step 1. Align the speech data using only the observation probabilities $p(O \mid S)$.
Step 2. Get the duration $d_{il}$ of each state $i$ in sentence $l$.
Step 3. Set the loop variable $n = 1$.
Step 4. Estimate the Gaussian duration distribution $N(\mu_i, \sigma_i^2)$ of each state $i$ with (6) and (7) (maximization step of the EM algorithm).
Step 5. Calculate the speech rate of each sentence $l$:

$$\mathrm{speed}_l = \frac{1}{M} \sum_{k=1}^{M} \frac{d_{kl}}{\mu_k} \tag{8}$$

Here the speed is the average ratio between a state's duration and the mean duration of that state, taken over one sentence; $d_{kl}$ is the duration of the state $s_{kl}$ in sentence $l$, $\mu_k$ is the mean duration of the state $s_k$ over the whole speech database, and $M$ is the number of states in the sentence, not counting silence states.
Step 6. Normalize the duration of each state in each sentence:

$$\hat{d}_{kl} = d_{kl} / \mathrm{speed}_l \tag{9}$$

Step 7. Calculate the duration probability of the whole corpus, $p(n)$, with (5) (expectation step of the EM algorithm).
Step 8. IF $(p(n) - p(n-1)) / p(n-1) <$ Threshold THEN end, ELSE set $n = n + 1$ and GOTO Step 4, where Threshold is an experimentally chosen parameter that stops the training.

In the training process, we first align the speech data using only the observation probabilities (Step 1) and obtain the duration of each state in all sentences at the same time (Step 2). With these durations and the mean duration of each state (Step 4), we obtain the speech rate estimate (Step 5). The durations are then normalized by the speech rate to give speech-rate-independent durations (Step 6). If the convergence test of Step 8 fails, another cycle starts from Step 4. This iterative algorithm yields the normalized duration probability distributions.

3.2 Duration Normalization in Decoding Process

In the decoding process, the method is different.
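For comparison with the decoding-side variant, the training iteration of Subsection 3.1 (Steps 2 to 8) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the variance floor, the iteration cap, and the use of a summed log-probability with an absolute relative change as the stopping test are all assumptions made here. The sketch also reads Step 6 literally, so durations are overwritten on every pass.

```python
import math

def fit_gaussian(durations, var_floor=1e-3):
    """Mean and variance of pooled durations; equivalent to (6) and (7).
    The variance floor is an assumption that keeps the sketch well defined."""
    n = len(durations)
    mu = sum(durations) / n
    var = sum(d * d for d in durations) / n - mu * mu
    return mu, max(var, var_floor)

def log_gaussian(x, mu, var):
    """Logarithm of the Gaussian density (5)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def sentence_speed(sentence, models):
    """Eq. (8): average ratio of state duration to state mean duration."""
    return sum(d / models[s][0] for s, d in sentence) / len(sentence)

def train_duration_models(sentences, threshold=1e-3, max_iter=20):
    """sentences: list of sentences, each a list of (state_id, duration)
    pairs from the alignment of Steps 1-2, silence states already removed."""
    data = [[(s, float(d)) for s, d in sent] for sent in sentences]
    models, prev = {}, None
    for _ in range(max_iter):
        # Step 4: re-estimate one Gaussian per state from current durations.
        pooled = {}
        for sent in data:
            for s, d in sent:
                pooled.setdefault(s, []).append(d)
        models = {s: fit_gaussian(ds) for s, ds in pooled.items()}
        # Steps 5-6: estimate each sentence's speed, then normalize by it.
        normalized = []
        for sent in data:
            speed = sentence_speed(sent, models)
            normalized.append([(s, d / speed) for s, d in sent])
        data = normalized
        # Step 7: corpus-level duration score (log domain, an assumption).
        total = sum(log_gaussian(d, *models[s]) for sent in data for s, d in sent)
        # Step 8: stop when the relative change is small.
        if prev is not None and abs(total - prev) <= threshold * abs(prev):
            break
        prev = total
    return models, data
```

With three toy sentences that contain the same two states at different tempi, for example `[("a", 4), ("b", 8)]`, `[("a", 8), ("b", 16)]` and `[("a", 6), ("b", 12)]`, the normalized durations of all three sentences collapse onto the same values, which is the speech-rate independence the iteration aims for.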
Because recognition errors are unavoidable, speech rate estimation is difficult at this stage. We use the following steps:

Step 1. Recognize the speech data using only the observation probabilities $p(O \mid S)$.
Step 2. Get the duration $d_{il}$ of each state $i$ in sentence $l$.
Step 3. Calculate the speech rate of each state for each sentence; the speech rate of state $k$ in sentence $l$ is $d_{kl} / \mu_k$.
Step 4. Sort the speech rates of the states within each sentence. In sentence $l$ the speech rates are reordered from $\{d_{kl} / \mu_k,\ k = 1, 2, \ldots, M\}$ to $\{d'_{kl} / \mu'_k,\ k = 1, 2, \ldots, M\}$.
Step 5. Compute the speech rate of each sentence. In sentence $l$ it is

$$\mathrm{speed}_l = \frac{1}{M} \sum_{k=M/4}^{3M/4} \frac{d'_{kl}}{\mu'_k} \tag{10}$$

Step 6. Normalize the duration of each state in each sentence with (9).
Step 7. Calculate the duration probability of each state, $p(d_{kl} \mid s_k)$, with (5).

In the first stage of the decoding process we obtain the best recognition results using only the observation distributions (Step 1) and collect the durations of all states (Step 2). With these durations and the mean duration of each state, we can estimate the speech rate of the sentence in a way similar to the training process. However, for inserted or deleted syllables, the speech rate $d_k / \mu_k$ of an erroneous state $s_k$ is much higher or lower than the real speech rate of the sentence, so an accurate speech rate is difficult to obtain. To solve this problem, we first sort the per-state speech rates (Steps 3 and 4) and then compute the sentence speed from only those items that fall in the range $[M/4, 3M/4]$ of the sorted list (Step 5). Finally the normalized durations are obtained (Step 6) and the duration probability of each state is calculated (Step 7). Because the disturbance from inserted and deleted syllables is eliminated, the duration probabilities can be computed reliably.
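Steps 3 to 6 amount to a trimmed average of the per-state ratios. The sketch below is illustrative only; the function names and the rounding of the quartile indices are assumptions. It also divides by the number of retained ratios rather than by $M$, a choice made here so that the estimate stays on the same scale as (8).

```python
def robust_sentence_speed(durations, means):
    """Estimate a sentence's speech rate from the middle half of the
    sorted per-state ratios d_k / mu_k (Steps 3-5)."""
    ratios = sorted(d / m for d, m in zip(durations, means))
    count = len(ratios)
    lo, hi = count // 4, (3 * count) // 4
    middle = ratios[lo:hi] or ratios  # fall back for very short sentences
    return sum(middle) / len(middle)

def normalize_durations(durations, speed):
    """Eq. (9): divide every state duration by the sentence speed."""
    return [d / speed for d in durations]

# Eight states with a mean duration of 5 frames each. Two of them come from
# misrecognized syllables and have implausible durations (1 and 20 frames).
durations = [5, 5, 5, 5, 5, 5, 1, 20]
means = [5] * 8
speed = robust_sentence_speed(durations, means)
plain = sum(d / m for d, m in zip(durations, means)) / len(durations)
```

In this toy case the trimmed estimate ignores the two outlying states, while the plain average of (8) is pulled upward by the 20-frame state.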
3.3 Utilize Normalized Duration in Decoding Process

Mandarin speech includes 408 different non-tone syllables, called all-syllables. The all-syllable list contains many highly confusable syllables. The recognition accuracy of all-syllables has a great effect on the performance of a dictation machine or a dialog system. The syllable error rate of all-syllables can also represent the performance of the acoustic models in a continuous speech recognition system.

In the first stage, our baseline speaker-independent speech recognition system produces N-best all-syllable recognition results with 3 different syllable numbers. In the second stage, the duration models are used to improve recognition accuracy. The whole process consists of the following two steps.

Step 1. With the assumptions in Subsection 2.1, the log-likelihood $LL(O, D \mid S) = \log p(O, D \mid S)$ can be computed by the equation:

$$LL(O, D \mid S) = \sum_{i=1}^{T} \log p(o_i \mid S) + \sum_{k=1}^{M} \log p(d_k \mid s_k) \tag{11}$$

where the duration probability $p(d_k \mid s_k)$ of state $k$ is obtained through the steps in Subsection 3.2 and $p(o_i \mid S)$ is already available from the first stage of our baseline system. Using these terms, the $LL(O, D \mid S)$ of each result can be calculated at low computational cost. The candidate with the maximum $LL(O, D \mid S)$ is taken as the result. Because results with the wrong syllable number usually have poor duration probabilities, using the duration information effectively improves recognition accuracy.

Step 2. The experiments reveal that some insert errors still remain after the previous step. For example, the syllable [tang] is sometimes recognized as [ta] and [ang], as shown in Fig.2. Errors of this kind account for about 70% of all insert errors, so further processing is required. In the best candidate, some syllables are easily split into two syllables, the second of which is often a non-consonant syllable. Accordingly, the speech data spanned by each pair of neighboring syllables is aligned again with the acoustic model of another, single syllable. This gives a new log-likelihood $LL(O, D \mid S)$, since the duration probability has changed. The new log-likelihood is then compared with the original one.
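Both steps of this second pass come down to comparing summed log-likelihoods. The sketch below illustrates that comparison under stated assumptions: the candidate dictionaries, their field names and the function names are hypothetical, acoustic log-likelihoods are taken as given from the first pass, and the re-alignment of a merged syllable is assumed to have been done elsewhere.

```python
import math

def duration_log_prob(d, mu, var):
    """Logarithm of the Gaussian duration density (5)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (d - mu) ** 2 / (2.0 * var)

def total_log_likelihood(candidate, models):
    """Eq. (11): acoustic log-likelihood plus the duration log-probabilities
    of the candidate's states (durations already normalized by (9))."""
    duration_term = sum(
        duration_log_prob(d, *models[s])
        for s, d in zip(candidate["states"], candidate["durations"])
    )
    return candidate["acoustic_ll"] + duration_term

def rescore_nbest(candidates, models):
    """Step 1: keep the N-best candidate with the highest LL(O, D | S)."""
    return max(candidates, key=lambda c: total_log_likelihood(c, models))

def prefer_merged(ll_merged, ll_first, ll_second):
    """Step 2 comparison: the merged syllable wins if its log-likelihood
    exceeds the sum of the two original syllables' log-likelihoods."""
    return ll_merged > ll_first + ll_second

# Hypothetical duration models: state id -> (mean, variance), in frames.
models = {"s1": (5.0, 4.0), "s2": (5.0, 4.0)}
nbest = [
    {"name": "A", "acoustic_ll": -100.0,
     "states": ["s1", "s2"], "durations": [5.0, 5.0]},
    {"name": "B", "acoustic_ll": -99.0,
     "states": ["s1", "s2", "s1"], "durations": [1.0, 1.0, 8.0]},
]
best = rescore_nbest(nbest, models)
```

Candidate B has the better acoustic score but an extra state and implausible durations, so the duration term tips the decision toward A.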
If the new one is better, the merged syllable replaces the original two syllables. Thus [tang] is selected as the correct result if $LL_{[tang]}(O, D \mid S)$ is higher than $LL_{[ta]}(O, D \mid S) + LL_{[ang]}(O, D \mid S)$. Since this method merges two neighboring syllables and then checks the result, we call it the Merge-Check algorithm.

Fig.2. Insert error example.

4 Experimental Results

The following experiments are based on two different speech corpora to test the robustness of our algorithm. The basic acoustic units adopted in our system are bi-phone units, which include 101 initials, 146 tonal finals, 1 pause and 1 silence. Each initial has 2 states and each final has 4, giving 788 states in total. Each state is modeled with a Gaussian mixture, and the number of components for each state is carefully selected.

4.1 Experiment with 863 Corpus

The first experiment is done with the National 863 Standard Mandarin Speech Corpus [7]. The training database includes 20 hours of data spoken by 34 female speakers. The testing database contains 3.6 hours of data spoken by 6 female speakers. The speech rates of these sentences range from about 0.6 to 1.5 times the normal speech rate. The speech is sampled at 16 kHz and quantized linearly to 16 bits. A 20 ms frame length and a 10 ms frame overlap are used. 15 mel-frequency perceptual linear predictive coefficients (MF-PLP) [8], including $C_0$, and their first and second order derivatives are adopted. The syllable error rate over the 408 non-tone Mandarin syllables is used to measure the system performance. The result is shown in Table 1.

Table 1. Word Error Rate Improvement for the 863 Corpus (%)
Columns: Baseline, Method A, Method B, Decrease from Baseline to Method B
Rows: Delete error rate, Insert error rate, Substitute error rate, Error rate

The error rate of our baseline system for the acoustic models is 22.69%, which is a good result among published Mandarin speech recognition systems. The training and decoding processes follow the steps described previously. In the baseline system we take the best candidate with the maximum $p(O \mid S)$. In the method A system, we use the candidate after Step 1 in Subsection 3.3, which achieves the best syllable number. In the method B system, we use the candidate after Step 2 in Subsection 3.3; it improves on method A through the Merge-Check algorithm. It is clear that both steps in Subsection 3.3 help to decrease the error rate. In particular, after Step 2 the syllable error rate is cut down to 12.58%, while the insert error rate is reduced by 80.50%. Although the delete error rate increases, it is only 0.27% higher.

4.2 Experiment of Microsoft Corpus

In this section the corpus is supplied by Microsoft Research Asia [9]. The training set is read by 100 male speakers, each speaking approximately 200 sentences, with a total of 19,688 sentences and 454,315 syllables. The test set involves 25 male speakers, with 20 test sentences per speaker. The speech rate of the test set is 0.8 to 1.3 times the normal rate. The speech is sampled at 16 kHz and quantized to 16 bits. A 20 ms frame length and a 10 ms frame overlap are used. 13 mel-frequency cepstral coefficients (MFCC), including $C_0$, and their first and second order derivatives are adopted. From the results of method B, we can see that our method achieves almost the same improvement on the Microsoft Corpus with no parameters changed. The results of method C show that employing the un-normalized duration model causes a sharp drop in system performance. The syllable error rate over the 408 non-tone Mandarin syllables is again adopted as the measure of system performance.
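The paper does not spell out how the delete, insert and substitute counts behind the syllable error rate are obtained. A common choice, assumed here rather than taken from the source, is a minimum-edit-distance alignment between the reference and the recognized syllable strings. The function names and the example syllables other than [tang], [ta] and [ang] are illustrative.

```python
def error_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) along one
    minimum-edit-distance alignment of two syllable sequences."""
    n, m = len(reference), len(hypothesis)
    # Each cell holds (cost, substitutions, deletions, insertions).
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, sub, dele, ins = table[i - 1][j - 1]
            if reference[i - 1] != hypothesis[j - 1]:
                cost, sub = cost + 1, sub + 1
            diagonal = (cost, sub, dele, ins)
            cost, sub, dele, ins = table[i - 1][j]
            deletion = (cost + 1, sub, dele + 1, ins)
            cost, sub, dele, ins = table[i][j - 1]
            insertion = (cost + 1, sub, dele, ins + 1)
            table[i][j] = min(diagonal, deletion, insertion)
    _, sub, dele, ins = table[n][m]
    return sub, dele, ins

def syllable_error_rate(reference, hypothesis):
    """Total errors divided by the number of reference syllables."""
    return sum(error_counts(reference, hypothesis)) / len(reference)

# The [tang] -> [ta] [ang] split from Subsection 3.3 counts as one
# substitution plus one insertion against the reference.
reference = ["wo", "tang", "hao"]
hypothesis = ["wo", "ta", "ang", "hao"]
counts = error_counts(reference, hypothesis)
```

Because insertions are counted against the reference length, a split syllable such as the one above is penalized twice, which is one reason insert errors weigh heavily in the reported rates.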
The error rate of our baseline system for the acoustic models is 26.91%, which is comparable with the performance announced by Microsoft Research Asia in 2001 [9]. The same experiment was done without changing any parameter in our system. Table 2 shows the comparative performance. Here, method C introduces the duration model into the system without speech rate normalization.

Table 2. The Improvement of Word Error Rates with the Microsoft Corpus (%)
Columns: Baseline, Method B, Method C, Decrease from Baseline to Method B
Rows: Delete error rate, Insert error rate, Substitute error rate, Error rate

In addition, the algorithms proposed in this paper have low computational complexity. The recognition time for method B increases by just 10%, and the memory demand is about 6K bytes more than in the baseline system.

4.3 Relationship Between Insert Error and Speech Rate

Fig.3 shows the probability of insert error versus speech rate collected from the Microsoft Corpus. A speech rate above 1.0 means that the sentence is spoken more slowly than normal. The insert probability is the percentage of sentences at that speech rate that contain insert errors. For example, when the speech rate is about 1.3, insert errors appear in every sentence. It is clear that the insert errors increase as the speech rate rises. Fig.4 shows the probability of insert error versus speech rate after using our method. The reduction of insert errors at high speech rates is clearly much larger than at low speech rates, and the insert error is now almost independent of the speech rate. However, when the speech rate is ultra-high, our algorithms help little.

Fig.3. Probability of insert error vs. speech rate of Microsoft Corpus.
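The insert probability plotted against speech rate can be tabulated by grouping sentences into speech-rate bins and counting how many contain at least one insert error. The sketch below is illustrative: the bin width of 0.1, the function name and the toy data are assumptions, not values from the paper.

```python
from collections import defaultdict

def insert_probability_by_rate(sentences, bin_width=0.1):
    """sentences: iterable of (speech_rate, insert_error_count) pairs.
    Returns {bin_center: fraction of sentences with >= 1 insert error}."""
    totals = defaultdict(int)
    with_insert = defaultdict(int)
    for speed, insert_count in sentences:
        # Snap the speed to the nearest multiple of bin_width.
        center = round(round(speed / bin_width) * bin_width, 10)
        totals[center] += 1
        if insert_count > 0:
            with_insert[center] += 1
    return {c: with_insert[c] / totals[c] for c in sorted(totals)}

# Toy data: two sentences near rate 0.8 and two near rate 1.3.
toy = [(0.8, 0), (0.82, 1), (1.3, 2), (1.31, 1)]
table = insert_probability_by_rate(toy)
```

Running the same tabulation on first-pass output and on second-pass output would give the two curves that the section compares.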

Fig.4. Probability of insert error vs. speech rate of Microsoft Corpus.

5 Conclusions

As shown in Subsections 4.1 and 4.2, our method achieves nearly the same improvements with different corpora and different features. With the speech-rate-independent duration model, the total word error rate decreases by about 10 percent, and the reduction of insert errors is the most remarkable: they are cut down to 20%-40% of their baseline level. This performance is much better than that of the method with the un-normalized duration model. Subsection 4.3 shows that our method also makes the insert error nearly independent of speech rate.

As with other methods, decreasing the insert errors also increases the delete errors. However, in most Mandarin speech recognition systems the delete error rate is very low, and in our baseline system it is no more than 1%. After applying the normalized duration model in our system, the increase in delete errors is less than 0.5%, which is considered acceptable.

Since our method is used in the second stage of the recognition process and does not need adaptation, the computational complexity is just 10% higher than that of the conventional HMM. Furthermore, the algorithm can be adopted naturally in any system that contains a duration model. The results in Subsection 4.3 show that our method works well when the speech rate varies within the common range. Future work will address ultra-high and ultra-low speech rates.

Acknowledgement We are grateful to Microsoft Research Asia for supplying the Corpus.

References

[1] David W Carroll. Psychology of Language. Third Edition, Brooks/Cole Publishing Company.
[2] Zheng J, Franco H, Stolcke A. Rate-of-speech modeling for large vocabulary conversational speech recognition. In Proc. the ISCA ITRW ASR2000, Paris, France, 2000.
[3] Martinez F, Tapias D, Alvarez J. Towards speech rate independence in large vocabulary continuous speech recognition. In Proc. ICASSP, vol. 2, New York, NY, USA, May 12-15, 1998.
[4] Kwon O W, Un C K. Context dependent word duration modeling for Korean connected digit recognition. Electronics Letters, 1995, 31(19).
[5] Steve Young. Statistical modeling in continuous speech recognition. In Proc. Int. Conf. Uncertainty in Artificial Intelligence, Seattle, WA, Aug. 2001.
[6] Liu Jia, Pan S X. A new robust telephone speech recognition algorithm with the multi-model structures. Chinese Journal of Electronics, Apr. 2000, 9(2).
[7] Wang R H. National performance assessment of speech recognition systems of Chinese. In Proc. Oriental COCOSDA Workshop'99, Taipei, 1999.
[8] Woodland P C, Gales M J F, Pye D et al. Broadcast news transcription using HTK. In Proc. ICASSP'97, Los Alamitos, CA, USA, 1997.
[9] Eric Chang, Yu Shi, Jianlai Zhou et al. Speech lab in a box: A Mandarin speech toolbox to jumpstart speech related research. In Proc. Eurospeech, Aalborg, Denmark, 2001.

CHEN YiNing received the B.S. and M.S. degrees in electronic engineering from Tsinghua University in 1999 and 2001, respectively. He is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition and spoken language processing.

ZHU Xuan received the B.S. degree in electronic engineering from Beijing University of Astronautics and Aeronautics in 1998 and the M.S. degree in electronic engineering from Tsinghua University. She is a Ph.D. candidate at the Department of Electronic Engineering of Tsinghua University. Her research interests are speech recognition and embedded signal processing system design.

LIU Jia received the B.S., M.S. and Ph.D. degrees in electronic engineering from Tsinghua University in 1983, 1986 and 1990, respectively, and was a postdoctoral researcher at Cambridge University from 1992. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, speech coding, speech synthesis and speech ASIC design. Dr. Liu is a member of IEEE and a senior member of China Institute of Electronics.

LIU RunSheng received the B.S. degree in electronic engineering from Tsinghua University. He is a professor at the Department of Electronic Engineering of Tsinghua University. His research interests are speech recognition, IC design and CAD.


More information

Artificial Neural Network for Speech Recognition

Artificial Neural Network for Speech Recognition Artificial Neural Network for Speech Recognition Austin Marshall March 3, 2005 2nd Annual Student Research Showcase Overview Presenting an Artificial Neural Network to recognize and classify speech Spoken

More information

Input Support System for Medical Records Created Using a Voice Memo Recorded by a Mobile Device

Input Support System for Medical Records Created Using a Voice Memo Recorded by a Mobile Device International Journal of Signal Processing Systems Vol. 3, No. 2, December 2015 Input Support System for Medical Records Created Using a Voice Memo Recorded by a Mobile Device K. Kurumizawa and H. Nishizaki

More information

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering

More information

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA

SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA SPEAKER IDENTIFICATION FROM YOUTUBE OBTAINED DATA Nitesh Kumar Chaudhary 1 and Shraddha Srivastav 2 1 Department of Electronics & Communication Engineering, LNMIIT, Jaipur, India 2 Bharti School Of Telecommunication,

More information

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

More information

Online Diarization of Telephone Conversations

Online Diarization of Telephone Conversations Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of

More information

PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL

PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Journal homepage: www.mjret.in ISSN:2348-6953 PULLING OUT OPINION TARGETS AND OPINION WORDS FROM REVIEWS BASED ON THE WORD ALIGNMENT MODEL AND USING TOPICAL WORD TRIGGER MODEL Utkarsha Vibhute, Prof. Soumitra

More information

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language

AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language AUDIMUS.media: A Broadcast News Speech Recognition System for the European Portuguese Language Hugo Meinedo, Diamantino Caseiro, João Neto, and Isabel Trancoso L 2 F Spoken Language Systems Lab INESC-ID

More information

Tagging with Hidden Markov Models

Tagging with Hidden Markov Models Tagging with Hidden Markov Models Michael Collins 1 Tagging Problems In many NLP problems, we would like to model pairs of sequences. Part-of-speech (POS) tagging is perhaps the earliest, and most famous,

More information

EFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION

EFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION EFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION By Ramasubramanian Sundaram A Thesis Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the

More information

L9: Cepstral analysis

L9: Cepstral analysis L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,

More information

METHODOLOGICAL CONSIDERATIONS OF DRIVE SYSTEM SIMULATION, WHEN COUPLING FINITE ELEMENT MACHINE MODELS WITH THE CIRCUIT SIMULATOR MODELS OF CONVERTERS.

METHODOLOGICAL CONSIDERATIONS OF DRIVE SYSTEM SIMULATION, WHEN COUPLING FINITE ELEMENT MACHINE MODELS WITH THE CIRCUIT SIMULATOR MODELS OF CONVERTERS. SEDM 24 June 16th - 18th, CPRI (Italy) METHODOLOGICL CONSIDERTIONS OF DRIVE SYSTEM SIMULTION, WHEN COUPLING FINITE ELEMENT MCHINE MODELS WITH THE CIRCUIT SIMULTOR MODELS OF CONVERTERS. Áron Szûcs BB Electrical

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information

Myanmar Continuous Speech Recognition System Based on DTW and HMM

Myanmar Continuous Speech Recognition System Based on DTW and HMM Myanmar Continuous Speech Recognition System Based on DTW and HMM Ingyin Khaing Department of Information and Technology University of Technology (Yatanarpon Cyber City),near Pyin Oo Lwin, Myanmar Abstract-

More information

Ultrasonic Detection Algorithm Research on the Damage Depth of Concrete after Fire Jiangtao Yu 1,a, Yuan Liu 1,b, Zhoudao Lu 1,c, Peng Zhao 2,d

Ultrasonic Detection Algorithm Research on the Damage Depth of Concrete after Fire Jiangtao Yu 1,a, Yuan Liu 1,b, Zhoudao Lu 1,c, Peng Zhao 2,d Advanced Materials Research Vols. 368-373 (2012) pp 2229-2234 Online available since 2011/Oct/24 at www.scientific.net (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/amr.368-373.2229

More information

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Kai Sun and Junqing Yu Computer College of Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China

More information

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications

The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications Forensic Science International 146S (2004) S95 S99 www.elsevier.com/locate/forsciint The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications A.

More information

Automatic Cross-Biometric Footstep Database Labelling using Speaker Recognition

Automatic Cross-Biometric Footstep Database Labelling using Speaker Recognition Automatic Cross-Biometric Footstep Database Labelling using Speaker Recognition Ruben Vera-Rodriguez 1, John S.D. Mason 1 and Nicholas W.D. Evans 1,2 1 Speech and Image Research Group, Swansea University,

More information

CHANWOO KIM (BIRTH: APR. 9, 1976) Language Technologies Institute School of Computer Science Aug. 8, 2005 present

CHANWOO KIM (BIRTH: APR. 9, 1976) Language Technologies Institute School of Computer Science Aug. 8, 2005 present CHANWOO KIM (BIRTH: APR. 9, 1976) 2602E NSH Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Phone: +1-412-726-3996 Email: chanwook@cs.cmu.edu RESEARCH INTERESTS Speech recognition system,

More information

An Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network

An Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network Proceedings of the 8th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING & DATA BASES (AIKED '9) ISSN: 179-519 435 ISBN: 978-96-474-51-2 An Energy-Based Vehicle Tracking System using Principal

More information

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan

7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan 7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation

More information

Developing an Isolated Word Recognition System in MATLAB

Developing an Isolated Word Recognition System in MATLAB MATLAB Digest Developing an Isolated Word Recognition System in MATLAB By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

More information

Lecture 12: An Overview of Speech Recognition

Lecture 12: An Overview of Speech Recognition Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

More information

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,

More information

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services

Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services Achieve Better Ranking Accuracy Using CloudRank Framework for Cloud Services Ms. M. Subha #1, Mr. K. Saravanan *2 # Student, * Assistant Professor Department of Computer Science and Engineering Regional

More information

THUTR: A Translation Retrieval System

THUTR: A Translation Retrieval System THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for

More information

CONATION: English Command Input/Output System for Computers

CONATION: English Command Input/Output System for Computers CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India

More information

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31

Implementing an In-Service, Non- Intrusive Measurement Device in Telecommunication Networks Using the TMS320C31 Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the

More information

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India

2695 P a g e. IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Integrity Preservation and Privacy Protection for Digital Medical Images M.Krishna Rani Dr.S.Bhargavi IV Semester M.Tech (DCN) SJCIT Chickballapur Karnataka India Abstract- In medical treatments, the integrity

More information

Automatic Recognition Algorithm of Quick Response Code Based on Embedded System

Automatic Recognition Algorithm of Quick Response Code Based on Embedded System Automatic Recognition Algorithm of Quick Response Code Based on Embedded System Yue Liu Department of Information Science and Engineering, Jinan University Jinan, China ise_liuy@ujn.edu.cn Mingjun Liu

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

Machine Learning in FX Carry Basket Prediction

Machine Learning in FX Carry Basket Prediction Machine Learning in FX Carry Basket Prediction Tristan Fletcher, Fabian Redpath and Joe D Alessandro Abstract Artificial Neural Networks ANN), Support Vector Machines SVM) and Relevance Vector Machines

More information

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features , pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of

More information

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER

HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER HSI BASED COLOUR IMAGE EQUALIZATION USING ITERATIVE n th ROOT AND n th POWER Gholamreza Anbarjafari icv Group, IMS Lab, Institute of Technology, University of Tartu, Tartu 50411, Estonia sjafari@ut.ee

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Subjective SNR measure for quality assessment of. speech coders \A cross language study

Subjective SNR measure for quality assessment of. speech coders \A cross language study Subjective SNR measure for quality assessment of speech coders \A cross language study Mamoru Nakatsui and Hideki Noda Communications Research Laboratory, Ministry of Posts and Telecommunications, 4-2-1,

More information

A Neural Network and Web-Based Decision Support System for Forex Forecasting and Trading

A Neural Network and Web-Based Decision Support System for Forex Forecasting and Trading A Neural Network and Web-Based Decision Support System for Forex Forecasting and Trading K.K. Lai 1, Lean Yu 2,3, and Shouyang Wang 2,4 1 Department of Management Sciences, City University of Hong Kong,

More information

A Digital Audio Watermark Embedding Algorithm

A Digital Audio Watermark Embedding Algorithm Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin Xianghong Tang, Yamei Niu, Hengli Yue, Zhongke Yin School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, Zhejiang, 3008, China tangxh@hziee.edu.cn,

More information

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm

Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm , pp. 99-108 http://dx.doi.org/10.1457/ijfgcn.015.8.1.11 Wireless Sensor Networks Coverage Optimization based on Improved AFSA Algorithm Wang DaWei and Wang Changliang Zhejiang Industry Polytechnic College

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS

SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS Mbarek Charhad, Daniel Moraru, Stéphane Ayache and Georges Quénot CLIPS-IMAG BP 53, 38041 Grenoble cedex 9, France Georges.Quenot@imag.fr ABSTRACT The

More information

Meeting Scheduling with Multi Agent Systems: Design and Implementation

Meeting Scheduling with Multi Agent Systems: Design and Implementation Proceedings of the 6th WSEAS Int. Conf. on Software Engineering, Parallel and Distributed Systems, Corfu Island, Greece, February 16-19, 2007 92 Meeting Scheduling with Multi Agent Systems: Design and

More information

Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

More information

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,

More information

A secure face tracking system

A secure face tracking system International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 10 (2014), pp. 959-964 International Research Publications House http://www. irphouse.com A secure face tracking

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

A STUDY OF ECHO IN VOIP SYSTEMS AND SYNCHRONOUS CONVERGENCE OF

A STUDY OF ECHO IN VOIP SYSTEMS AND SYNCHRONOUS CONVERGENCE OF A STUDY OF ECHO IN VOIP SYSTEMS AND SYNCHRONOUS CONVERGENCE OF THE µ-law PNLMS ALGORITHM Laura Mintandjian and Patrick A. Naylor 2 TSS Departement, Nortel Parc d activites de Chateaufort, 78 Chateaufort-France

More information

Discriminative Multimodal Biometric. Authentication Based on Quality Measures

Discriminative Multimodal Biometric. Authentication Based on Quality Measures Discriminative Multimodal Biometric Authentication Based on Quality Measures Julian Fierrez-Aguilar a,, Javier Ortega-Garcia a, Joaquin Gonzalez-Rodriguez a, Josef Bigun b a Escuela Politecnica Superior,

More information

Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

More information

An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis]

An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis] An Order-Invariant Time Series Distance Measure [Position on Recent Developments in Time Series Analysis] Stephan Spiegel and Sahin Albayrak DAI-Lab, Technische Universität Berlin, Ernst-Reuter-Platz 7,

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach*

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach* Int. J. Engng Ed. Vol. 22, No. 6, pp. 1281±1286, 2006 0949-149X/91 $3.00+0.00 Printed in Great Britain. # 2006 TEMPUS Publications. How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www.iasir.net

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www.iasir.net International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Online Tuning of Artificial Neural Networks for Induction Motor Control

Online Tuning of Artificial Neural Networks for Induction Motor Control Online Tuning of Artificial Neural Networks for Induction Motor Control A THESIS Submitted by RAMA KRISHNA MAYIRI (M060156EE) In partial fulfillment of the requirements for the award of the Degree of MASTER

More information

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke 1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline

More information

CONSTRUCTION PROJECT BUFFER MANAGEMENT IN SCHEDULING PLANNING AND CONTROL

CONSTRUCTION PROJECT BUFFER MANAGEMENT IN SCHEDULING PLANNING AND CONTROL CONSTRUCTION PROJECT BUFFER MANAGEMENT IN SCHEDULING PLANNING AND CONTROL Jan, Shu-Hui Ph.D. Student Construction Engineering & Management Program Department of Civil Engineering National Taiwan University

More information

Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb

Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Design and Implementation of Supermarket Management System Yongchang Rena, Mengyao Chenb College

More information

Research on the UHF RFID Channel Coding Technology based on Simulink

Research on the UHF RFID Channel Coding Technology based on Simulink Vol. 6, No. 7, 015 Research on the UHF RFID Channel Coding Technology based on Simulink Changzhi Wang Shanghai 0160, China Zhicai Shi* Shanghai 0160, China Dai Jian Shanghai 0160, China Li Meng Shanghai

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

FOREX TRADING PREDICTION USING LINEAR REGRESSION LINE, ARTIFICIAL NEURAL NETWORK AND DYNAMIC TIME WARPING ALGORITHMS

FOREX TRADING PREDICTION USING LINEAR REGRESSION LINE, ARTIFICIAL NEURAL NETWORK AND DYNAMIC TIME WARPING ALGORITHMS FOREX TRADING PREDICTION USING LINEAR REGRESSION LINE, ARTIFICIAL NEURAL NETWORK AND DYNAMIC TIME WARPING ALGORITHMS Leslie C.O. Tiong 1, David C.L. Ngo 2, and Yunli Lee 3 1 Sunway University, Malaysia,

More information

WEI CHEN. IT-enabled Innovation, Online Community, Open-Source Software, Startup Angel Funding, Interactive Marketing, SaaS Model

WEI CHEN. IT-enabled Innovation, Online Community, Open-Source Software, Startup Angel Funding, Interactive Marketing, SaaS Model WEI CHEN Rady School of Management University of California, San Diego 9500 Gilman Drive, MC 0553 La Jolla, CA 92093-0553 +1(858)337-5951 +1(858)534-0862 wei.chen@rady.ucsd.edu www.mrweichen.info RESEARCH

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

A Dynamic Approach to Extract Texts and Captions from Videos

A Dynamic Approach to Extract Texts and Captions from Videos Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

HMM-based Breath and Filled Pauses Elimination in ASR

HMM-based Breath and Filled Pauses Elimination in ASR HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science

More information

Multi-Lingual Display of Business Documents

Multi-Lingual Display of Business Documents The Data Center Multi-Lingual Display of Business Documents David L. Brock, Edmund W. Schuster, and Chutima Thumrattranapruk The Data Center, Massachusetts Institute of Technology, Building 35, Room 212,

More information

Multihopping for OFDM based Wireless Networks

Multihopping for OFDM based Wireless Networks Multihopping for OFDM based Wireless Networks Jeroen Theeuwes, Frank H.P. Fitzek, Carl Wijting Center for TeleInFrastruktur (CTiF), Aalborg University Neils Jernes Vej 12, 9220 Aalborg Øst, Denmark phone:

More information