A Direct Approach for Speech to Speech Translation Project Report
Kishore Prahallad
Language Technologies Institute, Carnegie Mellon University
skishore@cs.cmu.edu

I. Introduction to Speech Translation

The objective of a speech-to-speech translation (SST) system is to convert speech in one language into speech in another. A typical SST system consists of three components:

1. A speech recognition (ASR) system, which converts speech in the source language into the corresponding text,
2. A machine translation (MT) system, which translates text in the source language into text in the target language, and
3. A speech synthesis (SS) system, which converts text in the target language into its spoken form.

The conventional architecture of such an SST system is shown in Fig. 1. It is a cascade architecture in which the ASR, MT and SS systems are loosely coupled: the output of the ASR system is given as input to the MT system, and the output of the MT system is given as input to the SS system. Given perfect ASR and MT systems, this architecture may be sufficient to achieve the goal of an SST system. However, owing to the limitations of current state-of-the-art ASR and MT technology, there are errors and ambiguities in the output of these components which are propagated to the components that follow. The conventional cascade architecture provides no efficient coordination or feedback among the components to achieve optimal performance. To improve the performance of SST systems, many researchers are working on integrated models using finite state machines [1][2].

In this report, we adopt a different perspective on speech-to-speech translation. We view the source-language speech vectors as an observation sequence produced by the target-language word models. Our speech-to-speech translation system has only two components:

1. Source-speech to target-text recognition (including word ordering and syntax), and
2. Target-text to speech conversion.
To model source-speech to target-text translation we estimate P(T_S | S_S), where S_S is the sequence of source speech vectors and T_S is the target sentence. By Bayes' rule, P(T_S | S_S) ∝ P(S_S | T_S) P(T_S), where P(S_S | T_S) is the likelihood and P(T_S) is the language model of the target language. P(S_S | T_S) directly models the cross-language correspondence between target text and source speech, without any intermediate steps. The cross-language word models can be trained using a fully connected trellis and continuous distribution models. In this report we demonstrate how such a direct model can be built for an SST system, and we report the performance of the direct model for a limited-domain Telugu-English SST system. This report is organized as follows: Section II formulates the direct model approach
for the task of speech translation. Section III gives an intuitive account of why a direct model should work. Section IV describes the limited-domain Telugu-English system built using this direct model. Section V discusses the performance of the direct model for speech translation.

Fig. 1. Cascade architecture of a typical speech to speech translation system.

II. Formulation of a Direct Model

Given the acoustics A_s, the goal of an SST system is to obtain the acoustics A_t of the corresponding target-language utterance such that P(A_t | A_s) is maximized. To achieve this goal, conventional SST systems use three components: ASR, MT and TTS systems. The ASR system models P(W_s | A_s), where W_s is the source-language text corresponding to the acoustics A_s. P(W_s | A_s) ∝ P(A_s | W_s) P(W_s), where P(A_s | W_s) is the acoustic model and P(W_s) is the language model of the source language. The MT system models P(W_t | W_s), where W_t is the translation of the source text W_s. P(W_t | W_s) ∝ P(W_s | W_t) P(W_t), where P(W_s | W_t) is the translation model and P(W_t) is the language model of the target language. The TTS system models P(A_t | W_t), where A_t is the acoustic sequence corresponding to the target text W_t. Current state-of-the-art TTS systems use unit selection or HMM-based speech synthesis techniques to obtain natural speech. Using these three components, P(A_t | A_s) is obtained as a product of factored probabilities: P(A_t | A_s) = P(A_t | W_t) P(W_t | W_s) P(W_s | A_s).

Our proposal is to integrate the MT model P(W_t | W_s) and the ASR model P(W_s | A_s) into a direct model given by P(W_t | A_s) ∝ P(A_s | W_t) P(W_t), where P(A_s | W_t) is the cross-language acoustic model and P(W_t) is the language model of the target language. Using this direct model, P(A_t | A_s) = P(A_t | W_t) P(W_t | A_s). This approach uses one cross-language acoustic model and one language model, as opposed to the one acoustic model, one translation model and two language models used in a cascaded SST system.

III. How the Direct Model Works

To understand how a direct model can be built for speech translation, let us review the HMM-based approach used in Statistical Machine Translation (SMT) systems. Consider two example languages S1 and T1, where S1 has two words x1 and x2, and T1 has three words y1, y2 and y3. To build an SMT system translating S1 into T1, we need a set of parallel sentences, which could be as follows:

t1: y3 y1 y2    s1: x1 x2
t2: y1 y2       s2: x1
t3: y3          s3: x2

Each word (y1, y2 and y3) in the target language is represented by one HMM state and is referred to as a word model. These models are trained on the parallel data as follows. Given a translation pair, say t1 and s1, a sentence model is built for t1 from the word models y1, y2 and y3, which are fully connected, i.e., each word is connected to every other word with equal transition probabilities, as shown in Fig. 2(a). This sentence model is aligned with the source text s1 using the Baum-Welch training algorithm. It should be noted that we use a fully connected sentence model because the positional correspondence between target words and source words is not known a priori. Using such a fully connected sentence model, the SMT system gradually learns the association between target words and source words. For example, after observing the sentence pairs (t2, s2) and (t3, s3), the probability counts of y1 and y2 become biased towards x1, while y3 becomes biased towards x2.

Fig. 2. (a) A fully-connected trellis as used in SMT systems. (b) A sequentially connected trellis as used in typical ASR systems.
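As a toy illustration of the fully-connected sentence model described above, the following sketch (variable names are ours, not from the report) builds the trellis for t1 with one HMM state per target word and equal transition probabilities:

```python
import numpy as np

# Toy sentence model for t1 = "y3 y1 y2": one HMM state per target word.
# Because the positional correspondence between target and source words is
# unknown a priori, every state connects to every state (including itself)
# with equal transition probability, as in Fig. 2(a).
words = ["y3", "y1", "y2"]
n = len(words)
A = np.full((n, n), 1.0 / n)   # fully-connected, uniform transitions
pi = np.full(n, 1.0 / n)       # uniform initial-state distribution

# Every row of A is a valid probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
```

Baum-Welch training then sharpens these initially uniform transition and emission probabilities as associations between target words and source words are accumulated.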
A. Vector Representation for Source Language Words

Let us assume that each source-language word is represented by a set of vectors which have a direct correspondence with the word, either in the text domain or in some transformed domain. Let x1 = g_1, g_2, ..., g_n and x2 = h_1, h_2, ..., h_n. Using these vectors, the parallel data become:

t1: y3 y1 y2    s1: g_1..g_n h_1..h_n
t2: y1 y2       s2: g_1..g_n
t3: y3          s3: h_1..h_n

In this parallel data, each target word is associated with a sequence of source-language vectors. When the vector g_1 is observed on the source-language side, the corresponding target word (say y1 or y2) is likely to emit the next n-1 vectors as well. In other words, a sentence model would be built with word models that are fully connected, but whose self-transition probabilities are higher than the other transition probabilities. This sentence model is similar to the one used in standard SMT (Fig. 2(a)), with the following differences:

- The self-transition probabilities are higher than the other transition probabilities.
- Since g and h are sequences of vectors, the emission probabilities are obtained from a continuous distribution model such as a Gaussian mixture model.

So far, we have not defined the vectors g and h. They could be any features representing the words x1 and x2. For a speech translation system, these vectors are features representing the spoken forms of the words x1 and x2. The process of obtaining these vectors from a speech signal is explained in Section IV-A. However, given the nature of the speech signal, the major issues in building a direct model for a speech translation system are the following:

- The spoken form of a given word is of varying length.
- Each time a word is spoken, it does not yield the same sequence of vectors; rather, the vectors are drawn from an unknown distribution.
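The modified trellis for continuous source-speech vectors can be sketched in the same toy style; the boosted self-transition value and the single-Gaussian emission density below are illustrative choices standing in for the learned transitions and the Gaussian mixture model described above:

```python
import numpy as np

def make_transitions(n_states, p_self=0.8):
    # Fully-connected transition matrix with boosted self-transitions, so a
    # state tends to absorb the whole run of vectors for one spoken word.
    off = (1.0 - p_self) / (n_states - 1)  # remainder shared equally
    A = np.full((n_states, n_states), off)
    np.fill_diagonal(A, p_self)
    return A

def log_gaussian(x, mean, var):
    # Diagonal-covariance Gaussian log-density: a one-component stand-in
    # for the Gaussian mixture emission model used for continuous vectors.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

A = make_transitions(3)
assert np.allclose(A.sum(axis=1), 1.0)
assert A[0, 0] > A[0, 1]  # self-transition dominates the off-transitions
```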
To accommodate the varying observation length, each word model has its own self-transition probabilities, which can be learned during training. Each word may also be modeled with more than one state, but a method for determining the number of states remains to be explored. To show how this training differs from standard (mono-language) speech recognition training, Fig. 2(b) shows the trellis of the sentence model for t1 when trained on its own acoustics. In mono-language training the sentence model is built as a sequence of word models, so only sequential connections are allowed; moreover, the number of states in each word model differs and is known a priori.

IV. Building a Direct Model for Telugu-English SST

To explore the effectiveness of this approach, we have built a direct model for a limited-domain Telugu-English application. Since no benchmark corpus is available to evaluate the direct model approach, we have developed a limited-domain Telugu-English speech corpus recorded by a single speaker. The corpus contains 78 parallel sentences corresponding to a Telugu-English travel guide application. The 78 Telugu sentences were read out by a male speaker. Each utterance was recorded three times so
that there is more than one example with which to train a statistical model. The recording was done in a typical lab environment using the multimedia facilities available with a Linux desktop. The primary purpose of this corpus is to study whether a direct model can be trained and to explore its effectiveness.

A. Features Representing the Speech Signal

Given a speech signal, features are extracted by short-time (10-30 ms) processing of the signal and are referred to as segmental features. Examples of segmental features are linear prediction cepstral coefficients, Mel-cepstral coefficients, log spectral energy values, etc. [3]. These features represent the short-term spectrum of the speech signal. The spectrum of a speech segment is determined primarily by the shape of the vocal tract. In this work, the spectral features are represented by linear prediction cepstral coefficients [4].

A.1 Pre-processing of the Speech Signal

The speech signal x(n) is pre-emphasized to counteract the spectral roll-off due to glottal closure in voiced speech [5]: x'(n) = x(n) - α x(n-1), where α = 1. Differencing the speech signal in the time domain multiplies the signal spectrum by a linear filter that emphasizes the high-frequency components [3].

A.2 Extraction of Mel-Frequency Cepstral Coefficients

The characteristics of the speech signal are assumed to be stationary over a short duration of time (10-30 ms) [3]. The differenced speech signal is segmented into frames of 10 ms using a Hamming window with a shift of 5 ms. Each frame is passed through a set of Mel-frequency filters to obtain 13 cepstral coefficients. Thus each frame of speech data is represented by a vector of 13 coefficients.

A.3 Feature Scaling

Modeling and recognition are easier if all the acoustic features have roughly the same numerical range. A standard method of scaling the feature vectors is to normalize the features to have zero mean and a specified variance.
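The pre-emphasis, framing, and normalization steps of this section can be sketched as follows (the sampling rate and function names are illustrative, not from the report):

```python
import numpy as np

def preemphasize(x, alpha=1.0):
    # x'(n) = x(n) - alpha * x(n-1); with alpha = 1 this is plain
    # differencing, which emphasizes the high-frequency components.
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frames(x, fs, win_ms=10, shift_ms=5):
    # Segment into 10 ms Hamming-windowed frames with a 5 ms shift.
    win, shift = int(fs * win_ms / 1000), int(fs * shift_ms / 1000)
    n = 1 + (len(x) - win) // shift
    out = np.stack([x[i * shift:i * shift + win] for i in range(n)])
    return out * np.hamming(win)

def scale_features(feats, k=1.0):
    # Normalize each feature dimension to zero mean and a variance set by k.
    return k * (feats - feats.mean(axis=0)) / feats.std(axis=0)

f = frames(preemphasize(np.sin(np.arange(1600) / 10.0)), fs=16000)
print(f.shape)  # 19 frames of 160 samples each at a 16 kHz sampling rate
```

In an actual front end, each windowed frame would then pass through the Mel filter bank to yield the 13 cepstral coefficients per frame described above.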
If X_k is a feature vector, the normalized feature vector is given by X'_k = k (X_k - X̄)/σ, where X̄ is the mean vector, σ is the standard deviation, and k is a constant scaling factor.

B. Training Word Models

To build a direct model, we need to train the target word models with the source-language acoustics. The word models are trained as follows:

1. Given: a target sentence (sequence of words) and the corresponding source-language acoustics.
2. A target sentence model is constructed by concatenating the target word models.
- Each target word model has one or more states with left-to-right transitions and no skip states.
- All the word models are interconnected (i.e., a transition is allowed from any word to any other word). We therefore call the trellis fully connected.
3. The target sentence model is aligned with the source acoustics using the forward-backward algorithm.
4. Steps 2 and 3 are repeated for all the training pairs, and the probability counts are accumulated.
5. Steps 2, 3 and 4 constitute one iteration; the models are re-estimated using the accumulated probability counts.
6. The re-estimated models are used to repeat Steps 2-5 until the stopping criterion is met.
7. For all the experiments reported in this study, we used 10 iterations as the stopping criterion of the training process.

V. Results and Discussion

Once the target word models are trained, the simple evaluation criterion taken up in this project is to study the alignments produced by the direct model. Since this approach has not been studied in the literature, we want to inspect the alignments of the target word models with respect to the source-language acoustics. A sample alignment for the sentence "I want to know." is shown in Table 1.

Table 1: An Example Alignment (time stamps omitted)

Machine-Labeled   Hand-Labeled
#                 #
SIL               SIL
i                 i
know              know
want              want
to
SIL               SIL

In Table 1, the first column gives the machine-generated English labels for the Telugu acoustics, and the second column gives the hand-labeled words for the same acoustic signal. The machine-labeled data demonstrate the advantages of the direct model: the model is able to learn the acoustics across languages. Moreover, the time stamp generated for "to" indeed corresponds to a morpheme change of the word "want" in Telugu, where the marker is attached to the end of the word; hence "to" is aligned to the end of the speech segment corresponding to "want". To evaluate the alignments automatically, we have hand-labeled some of the training sentences.
It should be noted that this labeling is cross-language labeling, i.e., for a given Telugu speech segment the corresponding English word has to be written. Given the effort this takes, we could hand-label 30 sentences in this fashion. Thus the models are trained on 234 (78 × 3) utterances and evaluated on 30 utterances. The evaluation proceeds as follows.
Given the machine-generated time stamps for the words, the nearest time stamps in the reference labels are found. If the machine-labeled word matches the reference word, the score is incremented by one. Performance is reported as (score / total words) × 100.

Table 2 shows the performance of the direct model for different numbers of states and Gaussian components per target word model.

Table 2: Performance of the Direct Model in terms of alignment accuracy

States/Word   Gaussians/State   Accuracy (%)
(the numeric entries of this table did not survive transcription)

As can be observed from Table 2, the alignment accuracy increases with the number of states per word model. The highest accuracy is obtained using 5 states per word model. The last row of Table 2 refers to an experiment in which we used a non-uniform number of states per word model, assigned manually for each word; this setting proved suboptimal compared with using 5 states for every word model.

VI. Conclusions

This report proposed a direct model for a speech to speech translation system which integrates the ASR and MT components. A set of experiments was carried out on a limited-domain speech corpus to investigate whether such models can be trained. The empirical results have shown that they can. Quantitative analysis has shown that an alignment accuracy of about 30% (with reference to hand-labeled data) can be obtained. The evaluation metric used here is naive and the decision is binary (it does not take deviations into account). While this study has shown that the direct model can be trained, a number of issues remain to be resolved.

How to decide on the number of states per word model. In this report, we empirically used uniform or non-uniform numbers of states. A better method might be to train a predictor that predicts the number of states for each target word model.
How to compensate for word models (specifically articles and prepositions) whose acoustics need not be present in the source language. In this study, we took no specific measure to account for this phenomenon. Adding null observations (similar to the use of null states in statistical machine translation) could be explored.

How to build a decoder for this direct model. The direct model captures the acoustics of the target words but does not capture the word order of the target language. A decoder for this
approach would need to resolve this issue; word-spotting techniques, a stack decoder, and sentence-level decoding are some possible directions to explore.

VII. Software Resources

Recording and labeling (required for testing) of the speech data were done in the Festvox environment. Mel-frequency cepstral coefficients (MFCCs) were extracted from the speech signal using the software available from skishore > software. To train the word models, an HMM trainer and decoder were written in Perl specifically for this project. The HMM trainer uses the Baum-Welch algorithm to train the models, and the decoder uses the Viterbi algorithm to find the alignment. The software supports an arbitrary number of states per word model.

VIII. Acknowledgment

I would like to thank Stephan Vogel, Dr. Alan Black, Dr. Robert Frederking, Dr. Ravi Mosur and Dr. James K. Baker for useful discussions and suggestions which have led to the refinement of this idea.

References
[1] H. Ney, "Speech translation: Coupling of recognition and translation," in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, 1999.
[2] F. Casacuberta, E. Vidal, and J. M. Vilar, "Architectures for speech-to-speech translation using finite-state models," in Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems, Philadelphia, 2002.
[3] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, 1993.
[4] S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Processing, vol. 29, Apr. 1981.
[5] D. O'Shaughnessy, Speech Communication: Human and Machine. Addison-Wesley, 1987.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
More informationMiSeq: Imaging and Base Calling
MiSeq: Imaging and Page Welcome Navigation Presenter Introduction MiSeq Sequencing Workflow Narration Welcome to MiSeq: Imaging and. This course takes 35 minutes to complete. Click Next to continue. Please
More informationLess naive Bayes spam detection
Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems
More informationMachine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
More informationTracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationMachine Translation. Agenda
Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm
More informationOpen-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com
More informationEfficient Recovery of Secrets
Efficient Recovery of Secrets Marcel Fernandez Miguel Soriano, IEEE Senior Member Department of Telematics Engineering. Universitat Politècnica de Catalunya. C/ Jordi Girona 1 i 3. Campus Nord, Mod C3,
More informationVEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS
VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James
More informationSpeech recognition for human computer interaction
Speech recognition for human computer interaction Ubiquitous computing seminar FS2014 Student report Niklas Hofmann ETH Zurich hofmannn@student.ethz.ch ABSTRACT The widespread usage of small mobile devices
More informationSOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY
3 th World Conference on Earthquake Engineering Vancouver, B.C., Canada August -6, 24 Paper No. 296 SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY ASHOK KUMAR SUMMARY One of the important
More informationComp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition
Comp 14112 Fundamentals of Artificial Intelligence Lecture notes, 2015-16 Speech recognition Tim Morris School of Computer Science, University of Manchester 1 Introduction to speech recognition 1.1 The
More informationEM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
More informationMusic Genre Classification
Music Genre Classification Michael Haggblade Yang Hong Kenny Kao 1 Introduction Music classification is an interesting problem with many applications, from Drinkify (a program that generates cocktails
More informationAn Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
More informationNon-Data Aided Carrier Offset Compensation for SDR Implementation
Non-Data Aided Carrier Offset Compensation for SDR Implementation Anders Riis Jensen 1, Niels Terp Kjeldgaard Jørgensen 1 Kim Laugesen 1, Yannick Le Moullec 1,2 1 Department of Electronic Systems, 2 Center
More informationUnderstanding CIC Compensation Filters
Understanding CIC Compensation Filters April 2007, ver. 1.0 Application Note 455 Introduction f The cascaded integrator-comb (CIC) filter is a class of hardware-efficient linear phase finite impulse response
More informationREAL TIME TRAFFIC LIGHT CONTROL USING IMAGE PROCESSING
REAL TIME TRAFFIC LIGHT CONTROL USING IMAGE PROCESSING Ms.PALLAVI CHOUDEKAR Ajay Kumar Garg Engineering College, Department of electrical and electronics Ms.SAYANTI BANERJEE Ajay Kumar Garg Engineering
More informationCreating voices for the Festival speech synthesis system.
M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative
More information5. Binary objects labeling
Image Processing - Laboratory 5: Binary objects labeling 1 5. Binary objects labeling 5.1. Introduction In this laboratory an object labeling algorithm which allows you to label distinct objects from a
More information(Refer Slide Time: 01:52)
Software Engineering Prof. N. L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture - 2 Introduction to Software Engineering Challenges, Process Models etc (Part 2) This
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationMFCC-Based Voice Recognition System for Home Automation Using Dynamic Programming
International Journal of Science and Research (IJSR) MFCC-Based Voice Recognition System for Home Automation Using Dynamic Programming Sandeep Joshi1, Sneha Nagar2 1 PG Student, Embedded Systems, Oriental
More informationConditional Random Fields: An Introduction
Conditional Random Fields: An Introduction Hanna M. Wallach February 24, 2004 1 Labeling Sequential Data The task of assigning label sequences to a set of observation sequences arises in many fields, including
More informationCHANWOO KIM (BIRTH: APR. 9, 1976) Language Technologies Institute School of Computer Science Aug. 8, 2005 present
CHANWOO KIM (BIRTH: APR. 9, 1976) 2602E NSH Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Phone: +1-412-726-3996 Email: chanwook@cs.cmu.edu RESEARCH INTERESTS Speech recognition system,
More informationCODED SOQPSK-TG USING THE SOFT OUTPUT VITERBI ALGORITHM
CODED SOQPSK-TG USING THE SOFT OUTPUT VITERBI ALGORITHM Daniel Alam Department of Electrical Engineering & Computer Science University of Kansas Lawrence, KS 66045 danich@ku.edu Faculty Advisor: Erik Perrins
More informationAlgorithm & Flowchart & Pseudo code. Staff Incharge: S.Sasirekha
Algorithm & Flowchart & Pseudo code Staff Incharge: S.Sasirekha Computer Programming and Languages Computers work on a set of instructions called computer program, which clearly specify the ways to carry
More informationSpeech Processing Applications in Quaero
Speech Processing Applications in Quaero Sebastian Stüker www.kit.edu 04.08 Introduction! Quaero is an innovative, French program addressing multimedia content! Speech technologies are part of the Quaero
More informationMarathi Interactive Voice Response System (IVRS) using MFCC and DTW
Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Manasi Ram Baheti Department of CSIT, Dr.B.A.M. University, Aurangabad, (M.S.), India Bharti W. Gawali Department of CSIT, Dr.B.A.M.University,
More informationA MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS
A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationCROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES
Proceedings of the 2 nd Workshop of the EARSeL SIG on Land Use and Land Cover CROP CLASSIFICATION WITH HYPERSPECTRAL DATA OF THE HYMAP SENSOR USING DIFFERENT FEATURE EXTRACTION TECHNIQUES Sebastian Mader
More informationCONATION: English Command Input/Output System for Computers
CONATION: English Command Input/Output System for Computers Kamlesh Sharma* and Dr. T. V. Prasad** * Research Scholar, ** Professor & Head Dept. of Comp. Sc. & Engg., Lingaya s University, Faridabad, India
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationLog-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network
Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering
More informationTime Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication
Time Domain and Frequency Domain Techniques For Multi Shaker Time Waveform Replication Thomas Reilly Data Physics Corporation 1741 Technology Drive, Suite 260 San Jose, CA 95110 (408) 216-8440 This paper
More informationSemantic Video Annotation by Mining Association Patterns from Visual and Speech Features
Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering
More informationB. Raghavendhar Reddy #1, E. Mahender *2
Speech to Text Conversion using Android Platform B. Raghavendhar Reddy #1, E. Mahender *2 #1 Department of Electronics Communication and Engineering Aurora s Technological and Research Institute Parvathapur,
More informationDATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
More informationReconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio
Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio 1 Anuradha S. Deshmukh, 2 Prof. M. N. Thakare, 3 Prof.G.D.Korde 1 M.Tech (VLSI) III rd sem Student, 2 Assistant Professor(Selection
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationFace Model Fitting on Low Resolution Images
Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationFinding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
More informationA Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman
A Comparison of Speech Coding Algorithms ADPCM vs CELP Shannon Wichman Department of Electrical Engineering The University of Texas at Dallas Fall 1999 December 8, 1999 1 Abstract Factors serving as constraints
More information