Japanese Phoneme Recognition Based on Recurrent Neural Network Integrating Dynamic Parameters
Journal of Communication and Computer 9 (2012)
David Publishing

Mohammed Rokibul Alam Kotwal 1, Konica Bhowmik 1, Md. Merajul Islam 2 and Mohammad Nurul Huda 1
1. Department of Computer Science and Engineering, United International University, Dhaka-1209, Bangladesh
2. Department of Computer Science and Engineering, Military Institute of Science and Technology, Dhaka-1216, Bangladesh

Received: August 04, 2011 / Accepted: September 06, 2011 / Published: March 31,

Mohammad Nurul Huda, Ph.D., associate professor, research fields: phonetics, automatic speech recognition, neural networks, artificial intelligence, algorithms.
Corresponding author: Mohammed Rokibul Alam Kotwal, research assistant, lecturer, research fields: neural networks, phonetics, automatic speech recognition and data mining. rokib_kotwal@yahoo.com.

Abstract: This paper presents a method for Japanese phoneme recognition based on a recurrent neural network (RNN) integrating dynamic parameters (Δ and ΔΔ). Systems based on articulatory features (AFs) or distinctive phonetic features (DPFs) outperform systems based on acoustic features in automatic speech recognition (ASR). Their performance can be further improved by incorporating articulatory dynamic parameters. In this paper, we propose such a phoneme recognition system, which comprises three stages: (1) DPF extraction from acoustic features using an RNN, (2) incorporation of dynamic parameters into a multilayer neural network (MLN) for reducing DPF context effects, and (3) addition of an Inhibition/Enhancement (In/En) network for categorizing the DPF movement more accurately, followed by a Gram-Schmidt orthogonalization procedure for decorrelating the inhibited/enhanced data vector before it is passed to a hidden Markov model (HMM)-based classifier. Experiments on Japanese Newspaper Article Sentences (JNAS) show that the proposed method provides a higher phoneme correct rate than the method that does not incorporate dynamic articulatory parameters. Moreover, it reaches a higher performance with fewer mixture components in the HMMs.

Key words: Distinctive phonetic feature, multi-layer neural network, recurrent neural network, inhibition/enhancement network, local features.

1. Introduction

In automatic speech recognition (ASR), articulatory features (AFs) or distinctive phonetic features (DPFs) play an important role [1-3]. These features provide higher word recognition performance in both clean and noise-corrupted acoustic environments [4-5]. Moreover, they yield higher phoneme recognition performance in different acoustic environments [6-7]. The reason for the better recognition performance is that these features generate a wide margin of acoustic likelihood between two phonemes, which is not much affected by noisy environments. Besides, these methods resolve coarticulation effects with a context window of limited size instead of context-sensitive triphone models, which require a large-scale speech corpus and a large number of speech parameters. The context window in a multilayer neural network (MLN)-based speech recognition system reduces the coarticulation effect slightly and consequently provides reasonable performance with fewer mixture components in the hidden Markov models (HMMs). On the other hand, a recurrent neural network (RNN), which has feedback connections, models a context window of up to several frames and shows better performance [8]. This performance was further improved slightly by adding an MLN in the method we proposed earlier [9], which reduces DPF fluctuations at phoneme boundaries. The improvement remained small because the second-stage MLN cannot handle longer context.

In this paper, we propose a phoneme recognition method that incorporates dynamic articulatory parameters (Δ and ΔΔ) at the second stage to reduce the coarticulation effect further. The method comprises three stages: (i) DPF extraction from acoustic features using a recurrent neural network (RNN), (ii) incorporation of dynamic parameters into a multilayer neural network (MLN) for constraining the context, and (iii) addition of an Inhibition/Enhancement (In/En) network for categorizing the DPF movement more accurately, followed by a Gram-Schmidt (GS) orthogonalization procedure for decorrelating the inhibited/enhanced data vector before it is passed to an HMM-based classifier. The distinguishing feature of this paper is the incorporation of dynamic articulatory parameters to reduce the coarticulation effect further.

The paper is organized as follows: Section 2 discusses the articulatory features. Section 3 explains the system configuration of the existing method and of the proposed method. The experimental database and setup are described in Section 4, and the experimental results are analyzed in Section 5. Finally, Section 6 draws conclusions and remarks on future work.

2. Distinctive Phonetic Features

A phone can easily be identified by its unique set of articulatory features or distinctive phonetic features (DPFs) [10-11]. The Japanese balanced DPF set [4] for classifying Advanced Telecommunications Research Institute International (ATR) phonemes has 15 elements. These DPF elements are mora, high, low, intermediate between high and low <nil>, anterior, back, intermediate between anterior and back <nil>, coronal, plosive, affricate, continuant, voiced, unvoiced, nasal and semi-vowel. Table 1 shows a part of this balanced DPF set.
Here, present and absent elements of the DPFs are indicated by + and - signs, respectively.

Table 1 Japanese balanced DPF-set.
[Columns (DPF/Phone): a, e, f, r. Rows (DPF elements): mora, high, low, nil, anterior, back, nil.]

3. Phoneme Recognition Methods

3.1 The Existing Method

The existing method comprises two neural networks, (i) an RNN and (ii) an MLN, which together are called a hybrid neural network (HNN) and are shown in Fig. 1. The RNN represents the dynamics in a sequence of acoustic features to resolve coarticulation effects, and the MLN reduces the fluctuation of DPF patterns. The external input acoustic vector for the RNN at time t is formed by taking the preceding (t - 3)-th and succeeding (t + 3)-th frames together with the current t-th frame. Each frame is composed of 25 local features (LFs) [12], the same features as in the DPF-based phoneme recognition using an MLN [4]. The RNN outputs 45 DPF values, of which 15 are for the preceding frame, 15 for the current frame, and the remaining 15 for the succeeding frame. Next, the MLN outputs 45 DPF values for the current input frame by reducing DPF fluctuation. After that, the 45-dimensional DPF vector output by the MLN is fed into the In/En network, which is described in Section 3.2.2, to obtain categorical DPF movements. The elements of the inhibited/enhanced data vector are then decorrelated by the GS orthogonalization procedure [9] before being passed to the HMM-based classifier.

A fully recurrent neural network (FRNN), which has a hidden layer of 350 units and an output layer, is used for this approach. At each time step, the total input vector is formed by taking the output layer (OL) feedback values and the hidden layer (HL) feedback values together with the external input of (25 × 3) LF values for that time.
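The way the total input vector of the FRNN is put together can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function and constant names are hypothetical, the concatenation order (external input, then HL feedback, then OL feedback) is an assumption, and clamping the frame index at the utterance edges is an assumption as well, since the text does not say how the first and last frames are handled. The dimensions (25 LFs per frame, 350 hidden units, 45 outputs) and the initial feedback value of 0.1 come from the text.

```python
# Sketch: assembling the total input vector of the fully recurrent network.
# Names, concatenation order, and edge handling are illustrative assumptions.

LF_DIM = 25                  # local features per frame (from the text)
CONTEXT_OFFSETS = (-3, 0, 3) # frames t-3, t, t+3 (from the text)
HIDDEN_UNITS = 350           # hidden layer size (from the text)
OUTPUT_UNITS = 45            # 15 DPFs x 3 frames (from the text)
INITIAL_FEEDBACK = 0.1       # feedback value assumed at the first time step


def external_input(frames, t):
    """Concatenate the LF vectors of frames t-3, t, and t+3.

    Indices outside the utterance are clamped to the nearest valid frame
    (an assumption; the paper does not describe edge handling).
    """
    last = len(frames) - 1
    vec = []
    for offset in CONTEXT_OFFSETS:
        idx = min(max(t + offset, 0), last)
        vec.extend(frames[idx])
    return vec


def total_input(frames, t, hidden_feedback, output_feedback):
    """External input (75 values) plus HL and OL feedback values."""
    return external_input(frames, t) + list(hidden_feedback) + list(output_feedback)


# Toy utterance: 10 frames, frame i filled with the value i.
frames = [[float(i)] * LF_DIM for i in range(10)]
hidden_fb = [INITIAL_FEEDBACK] * HIDDEN_UNITS
output_fb = [INITIAL_FEEDBACK] * OUTPUT_UNITS

x = total_input(frames, 5, hidden_fb, output_fb)
print(len(x))  # 75 + 350 + 45 = 470
```

In an actual recurrent pass, the two feedback lists would be replaced after every frame by the hidden and output activations just computed, so that the network sees its own previous state alongside the three-frame acoustic context.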
Fig. 1 The existing method [9] without articulatory dynamic parameters.
[Diagram: speech → local feature extraction (x t-3, x t, x t+3: 25 LFs each) → RNN with a hidden layer of 350 units and OL/HL feedback (outputs y t-3, y t, y t+3: 15 DPFs each) → MLN (outputs y t-3, y t, y t+3: 15 DPFs each) → Inhibition/Enhancement network (45) → Gram-Schmidt orthogonalization (45) → HMM with acoustic models and phone list → phoneme strings.]

The feedback values of the hidden layer and the output layer at time t = 0 are assumed to be 0.1. The back-propagation through time algorithm is used for training the RNN. The MLN has three layers, namely two hidden layers and an output layer, and is trained using the standard back-propagation algorithm. The hidden layers have 180 and 90 units, respectively.

3.2 Proposed Method

The proposed method is depicted in Fig. 2 and comprises three stages: (1) DPF extraction from acoustic features using a recurrent neural network (RNN); (2) incorporation of dynamic parameters into a multilayer neural network (MLN) for constraining the context; (3) addition of an Inhibition/Enhancement (In/En) network for categorizing the DPF movement more accurately, followed by a Gram-Schmidt (GS) orthogonalization procedure for decorrelating the inhibited/enhanced data vector before it is passed to the HMM-based classifier.

Fig. 2 Proposed method with articulatory dynamic parameters.

3.2.1 DPF Extractor

The RNN, which has the same architecture and learning mechanism as described in Section 3.1, generates a 45-dimensional DPF vector (15 DPFs × 3) for the current input frame t. The 45-dimensional context-dependent DPF vector provided by the RNN at time t and its corresponding Δ and ΔΔ vectors, calculated by three-point linear regression (LR), are fed into the subsequent MLN, which has four layers including two hidden layers of 300 and 100 units, respectively. This MLN (MLNDyn) is trained using the standard back-propagation algorithm and outputs a 45-dimensional DPF vector in which context effects for the current t-th frame are reduced.

3.2.2 Inhibition/Enhancement Network

The In/En network is used to obtain modified DPF patterns from the patterns produced by the RNN + MLN. The algorithm for this network is given below:

Step 1: For each element of the DPF vectors, find the acceleration (ΔΔ) parameters by using three-point LR.
Step 2: Check whether ΔΔ is positive (concave pattern), negative (convex pattern), or zero (steady state).
Step 3: Calculate f(ΔΔ):
  if the pattern is convex, f(ΔΔ) = c1 / (1 + (c1 - 1) exp(β ΔΔ));
  if the pattern is concave, f(ΔΔ) = c2 + 2(1 - c2) / (1 + exp(β ΔΔ));
  if steady state, f(ΔΔ) = 1.0.
Step 4: Find the modified DPF patterns by multiplying the DPF patterns by f(ΔΔ).

4. Experiments

4.1 Speech Database

The following two clean data sets are used in our experiments.

D1. Training Data Set
A subset of the Acoustic Society of Japan (ASJ) Continuous Speech Database comprising 4,503 sentences uttered by 30 different male speakers (16 kHz, 16 bit) is used [13].

D2. Test Data Set
This test data set comprises 2,379 JNAS [14] sentences uttered by 16 different male speakers (16 kHz, 16 bit).

4.2 Experimental Setup

The frame length and frame rate (the frame shift between two consecutive frames) are set to 25 ms and 10 ms, respectively, to obtain acoustic features from the input speech. The LFs form a 25-dimensional vector consisting of 12 delta coefficients along the time axis, 12 delta coefficients along the frequency axis, and the delta coefficient of the log power of the raw speech signal [12].
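The four In/En steps of Section 3.2.2 can be sketched in Python for a single DPF element tracked over time. This is a minimal illustration, not the authors' implementation. The following points are assumptions: ΔΔ is obtained by applying the three-point regression slope twice; boundary frames reuse the nearest valid neighbour; the sign of the exponent, exp(β·ΔΔ), is chosen so that convex patterns are enhanced towards c1 and concave patterns are inhibited towards c2; and all function names are hypothetical. The values c1 = 4.0, c2 = 0.25 and β = 80 are the ones given in the experimental setup.

```python
import math

# In/En parameters as stated in the experimental setup.
C1, C2, BETA = 4.0, 0.25, 80.0


def delta(seq):
    """Three-point linear-regression slope at each frame.

    For three equally spaced points the least-squares slope reduces to
    (x[t+1] - x[t-1]) / 2. Edge frames reuse the boundary value
    (an assumption; the paper does not describe edge handling).
    """
    n = len(seq)
    return [(seq[min(t + 1, n - 1)] - seq[max(t - 1, 0)]) / 2.0 for t in range(n)]


def inen_factor(dd):
    """Multiplicative factor f(dd) of Step 3 (exponent sign is an assumption)."""
    # Cap the exponent so math.exp cannot overflow on unusually large inputs.
    e = math.exp(min(BETA * dd, 700.0))
    if dd < 0:   # convex pattern (peak): enhance, factor rises towards C1
        return C1 / (1.0 + (C1 - 1.0) * e)
    if dd > 0:   # concave pattern (dip): inhibit, factor falls towards C2
        return C2 + 2.0 * (1.0 - C2) / (1.0 + e)
    return 1.0   # steady state: leave the value unchanged


def inhibit_enhance(track):
    """Steps 1 to 4 for one DPF element over time."""
    acceleration = delta(delta(track))  # Step 1: delta-delta via two LR passes
    return [v * inen_factor(a) for v, a in zip(track, acceleration)]


# Toy DPF trajectory with a single peak in the middle.
track = [0.2, 0.5, 0.9, 0.5, 0.2]
modified = inhibit_enhance(track)
print(modified)
```

With these settings both branches equal 1.0 at ΔΔ = 0, so the factor is continuous across the steady state, and a large β makes the transition to the limits c1 and c2 steep. In the toy trajectory the peak frame is scaled up while the flanking frames with positive ΔΔ are scaled down.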
The phoneme correct rate (PCR) for the D2 data set is evaluated using an HMM-based classifier. The D1 data set is used to design 38 Japanese monophone HMMs, which are left-to-right models with five states and three loops. In the HMMs, the output probabilities are represented in the form of Gaussian mixtures, and diagonal matrices are used. The number of mixture components is set to 1, 2, 4, 8, and 16.

In our experiments with the RNN and MLN, the non-linear function for the hidden and output layers is a sigmoid from 0 to 1 (1/(1 + exp(-x))). For the In/En network, c1, c2, and β are set to 4.0, 0.25, and 80, respectively.

To evaluate PCRs on the D2 data set and observe the effects of the articulatory dynamic parameters (Δ and ΔΔ), the following six experiments are designed, where the input features for the HMM-based classifier are DPFs of 45 dimensions for both the existing and the proposed methods.

(1) DPF(RNN+Not-Δ.MLN,dim:45);
(2) DPF(RNN+Not-Δ.MLN+GS,dim:45);
(5) DPF(RNN+Δ.MLN,dim:45);
(9) DPF(RNN+Not-Δ.MLN+In/En+GS,dim:45);
(i) DPF(RNN+Δ.MLN+GS,dim:45);
(q) DPF(RNN+Δ.MLN+In/En+GS,dim:45) [Proposed].

5. Experimental Results and Analysis

Figs. 3 and 4 show the effects of the ΔDPF and ΔΔDPF parameters, which are input to the second-stage MLN of the hybrid neural network (HNN)-based phoneme recognizer. Fig. 3, in which GS orthogonalization is not used, shows that adding the Δ and ΔΔ parameters to method (1) increases PCR by 1.37% at 16 mixture components. Similarly, Fig. 4 shows that the Δ and ΔΔ parameters improve PCR by 2.34% at 16 mixture components for the HNN-based method (i) with the GS orthogonalization procedure.

Fig. 5 shows the effect of using ΔDPF and ΔΔDPF as input to the second-stage MLN in the hybrid neural network-based phoneme recognizers with In/En and GS. In the figure, adding the Δ and ΔΔ parameters always increases PCR. For example, at 16 mixture components, the proposed method (q), which incorporates the Δ and ΔΔ parameters, improves PCR by 0.73% in comparison with method (9).

The proposed method also needs fewer mixture components in the HMMs, and hence less computation time. For example, Fig. 5 shows that a phoneme correct rate of approximately 81.50% is obtained by method (9) at 16 mixture components and by method (q) at a single mixture component.

Fig. 3 Effects of articulatory dynamic parameters (Δ and ΔΔ) on the method (1), DPF(RNN+Not-Δ.MLN,dim:45).
[Plot: phoneme correct rate (%) versus number of mixture components, clean speech; curves for (1) DPF(RNN+Not-Δ.MLN,dim:45) and (5) DPF(RNN+Δ.MLN,dim:45).]

Fig. 4 Effects of articulatory dynamic parameters (Δ and ΔΔ) on the method (2) containing GS orthogonalization, DPF(RNN+Not-Δ.MLN+GS,dim:45).
[Plot: phoneme correct rate (%) versus number of mixture components, clean speech; curves for (2) DPF(RNN+Not-Δ.MLN+GS,dim:45) and (i) DPF(RNN+Δ.MLN+GS,dim:45).]

Fig. 5 Effects of articulatory dynamic parameters (Δ and ΔΔ) on the method (9) containing In/En and GS orthogonalization, DPF(RNN+Not-Δ.MLN+In/En+GS,dim:45).
[Plot: phoneme correct rate (%) versus number of mixture components, clean speech; curves for (9) DPF(RNN+Not-Δ.MLN+In/En+GS,dim:45) and (q) DPF(RNN+Δ.MLN+In/En+GS,dim:45).]

6. Conclusions

This paper has presented an articulatory feature-based phoneme recognition method for an ASR system that uses a hybrid neural network and integrates articulatory dynamic parameters into it. From the experiments on Japanese Newspaper Article Sentences (JNAS), the following conclusions are drawn:

(1) The proposed method provides a higher phoneme correct rate than the method that does not incorporate dynamic articulatory parameters.
(2) It needs fewer mixture components in the HMMs to obtain a higher phoneme recognition performance.

In the near future, the authors would like to run experiments evaluating Bangla phonemes spoken by Bangladeshi people. Moreover, we intend to evaluate word recognition performance using the proposed method.
References

[1] K. Kirchhoff, et al., Combining acoustic and articulatory feature information for robust speech recognition, Speech Commun. 37 (2002)
[2] K. Kirchhoff, Robust Speech Recognition Using Articulatory Information, Ph.D. thesis, University of Bielefeld, Germany, July
[3] K.Y. Leung, M.W. Mak, S.Y. Kung, Applying articulatory features to telephone-based speaker verification, Proc. IEEE ICASSP 04, 2004, pp
[4] T. Fukuda, W. Yamamoto, T. Nitta, Distinctive phonetic feature extraction for robust speech recognition, Proc. ICASSP 03, 2003, pp
[5] T. Fukuda, T. Nitta, Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition, The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems 5 (2004)
[6] Huda, et al., Distinctive Phonetic Feature (DPF) based phone segmentation using 2-stage multilayer neural network, NCSP 07, Shanghai, China,
[7] L. Ansary, et al., Modeling phones coarticulation effects in a neural network based speech recognition system, Proc. Interspeech,
[8] T. Robinson, An application of recurrent nets to phone probability estimation, IEEE Trans. Neural Networks 5 (1994).
[9] M.N. Huda, et al., Phoneme recognition based on hybrid neural network with inhibition/enhancement of distinctive phonetic feature (DPF) trajectories, InterSpeech 08, Brisbane, Australia,
[10] S. King, P. Taylor, Detection of phonological features in continuous speech using neural networks, Computer Speech and Language 14 (2000)
[11] E. Eide, Distinctive features for use in an automatic speech recognition system, Proc. Eurospeech 2001, pp
[12] T. Nitta, Feature extraction for speech recognition based on orthogonal acoustic-feature planes and LDA, Proc. ICASSP 99, 1999, pp
[13] T. Kobayashi, et al., ASJ Continuous speech corpus for research, Acoustic Society of Japan Trans. 48 (1992)
[14] JNAS: Japanese Newspaper Article Sentences, available online at:
More informationLOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING
LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,
More informationADVANCED APPLICATIONS OF ELECTRICAL ENGINEERING
Development of a Software Tool for Performance Evaluation of MIMO OFDM Alamouti using a didactical Approach as a Educational and Research support in Wireless Communications JOSE CORDOVA, REBECA ESTRADA
More informationIntrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. s.l.yasakethu@surrey.ac.uk J. Jiang Department
More informationAutomatic Evaluation Software for Contact Centre Agents voice Handling Performance
International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,
More informationHybrid Lossless Compression Method For Binary Images
M.F. TALU AND İ. TÜRKOĞLU/ IU-JEEE Vol. 11(2), (2011), 1399-1405 Hybrid Lossless Compression Method For Binary Images M. Fatih TALU, İbrahim TÜRKOĞLU Inonu University, Dept. of Computer Engineering, Engineering
More informationChapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network
Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the
More informationDysarthric Speech Recognition Using a Convolutive Bottleneck Network
ICSP24 Proceedings Dysarthric Speech Recognition Using a Convolutive Bottlenec Networ Toru Naashia, Toshiya Yoshioa, Tetsuya Taiguchi, Yasuo Arii, Stefan Duffner and Christophe Garcia Graduate School of
More informationQMeter Tools for Quality Measurement in Telecommunication Network
QMeter Tools for Measurement in Telecommunication Network Akram Aburas 1 and Prof. Khalid Al-Mashouq 2 1 Advanced Communications & Electronics Systems, Riyadh, Saudi Arabia akram@aces-co.com 2 Electrical
More informationDeveloping an Isolated Word Recognition System in MATLAB
MATLAB Digest Developing an Isolated Word Recognition System in MATLAB By Daryl Ning Speech-recognition technology is embedded in voice-activated routing systems at customer call centres, voice dialling
More informationPerceived Speech Quality Prediction for Voice over IP-based Networks
Perceived Speech Quality Prediction for Voice over IP-based Networks Lingfen Sun and Emmanuel C. Ifeachor Department of Communication and Electronic Engineering, University of Plymouth, Plymouth PL 8AA,
More informationAPPLYING DATA MINING TECHNIQUES TO FORECAST NUMBER OF AIRLINE PASSENGERS
APPLYING DATA MINING TECHNIQUES TO FORECAST NUMBER OF AIRLINE PASSENGERS IN SAUDI ARABIA (DOMESTIC AND INTERNATIONAL TRAVELS) Abdullah Omer BaFail King Abdul Aziz University Jeddah, Saudi Arabia ABSTRACT
More informationAn Energy-Based Vehicle Tracking System using Principal Component Analysis and Unsupervised ART Network
Proceedings of the 8th WSEAS Int. Conf. on ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING & DATA BASES (AIKED '9) ISSN: 179-519 435 ISBN: 978-96-474-51-2 An Energy-Based Vehicle Tracking System using Principal
More informationEffects of Pronunciation Practice System Based on Personalized CG Animations of Mouth Movement Model
Effects of Pronunciation Practice System Based on Personalized CG Animations of Mouth Movement Model Kohei Arai 1 Graduate School of Science and Engineering Saga University Saga City, Japan Mariko Oda
More informationNumerical Field Extraction in Handwritten Incoming Mail Documents
Numerical Field Extraction in Handwritten Incoming Mail Documents Guillaume Koch, Laurent Heutte and Thierry Paquet PSI, FRE CNRS 2645, Université de Rouen, 76821 Mont-Saint-Aignan, France Laurent.Heutte@univ-rouen.fr
More informationComparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances
Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter University of Sunderland, School of Computing and
More informationComparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
Volume 3, No. 8, August 2012 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info Comparison of Supervised and Unsupervised Learning Classifiers for Travel Recommendations
More informationA Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
More informationFPGA Implementation of Human Behavior Analysis Using Facial Image
RESEARCH ARTICLE OPEN ACCESS FPGA Implementation of Human Behavior Analysis Using Facial Image A.J Ezhil, K. Adalarasu Department of Electronics & Communication Engineering PSNA College of Engineering
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationAutomatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion
Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion Prasanta Kumar Ghosh a) and Shrikanth Narayanan Signal Analysis and Interpretation Laboratory,
More informationRecurrent Neural Networks
Recurrent Neural Networks Neural Computation : Lecture 12 John A. Bullinaria, 2015 1. Recurrent Neural Network Architectures 2. State Space Models and Dynamical Systems 3. Backpropagation Through Time
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationPerformance Evaluation of Artificial Neural. Networks for Spatial Data Analysis
Contemporary Engineering Sciences, Vol. 4, 2011, no. 4, 149-163 Performance Evaluation of Artificial Neural Networks for Spatial Data Analysis Akram A. Moustafa Department of Computer Science Al al-bayt
More informationNeural Networks and Support Vector Machines
INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationStock Data Analysis Based On Neural Network. 1Rajesh Musne, 2 Sachin Godse
Stock Analysis Based On Neural Network. 1Rajesh Musne, 2 Sachin Godse 1ME Research Scholar Department of Computer Engineering 2 Assistant Professor Department of Computer Engineering Sinhgad Academy Of
More informationA Stock Pattern Recognition Algorithm Based on Neural Networks
A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent
More informationEFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION
EFFECTS OF TRANSCRIPTION ERRORS ON SUPERVISED LEARNING IN SPEECH RECOGNITION By Ramasubramanian Sundaram A Thesis Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the
More informationTRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY
4 4th International Workshop on Acoustic Signal Enhancement (IWAENC) TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY Takuya Toyoda, Nobutaka Ono,3, Shigeki Miyabe, Takeshi Yamada, Shoji Makino University
More informationHow To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3
Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web. By C.Moreno, A. Antolin and F.Diaz-de-Maria. Summary By Maheshwar Jayaraman 1 1. Introduction Voice Over IP is
More informationGender Identification using MFCC for Telephone Applications A Comparative Study
Gender Identification using MFCC for Telephone Applications A Comparative Study Jamil Ahmad, Mustansar Fiaz, Soon-il Kwon, Maleerat Sodanil, Bay Vo, and * Sung Wook Baik Abstract Gender recognition is
More informationEFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE
EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate
More informationChapter 4: Artificial Neural Networks
Chapter 4: Artificial Neural Networks CS 536: Machine Learning Littman (Wu, TA) Administration icml-03: instructional Conference on Machine Learning http://www.cs.rutgers.edu/~mlittman/courses/ml03/icml03/
More informationMPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music
ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final
More informationArtificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier
International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing
More informationThe Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
More informationLow-resolution Character Recognition by Video-based Super-resolution
2009 10th International Conference on Document Analysis and Recognition Low-resolution Character Recognition by Video-based Super-resolution Ataru Ohkura 1, Daisuke Deguchi 1, Tomokazu Takahashi 2, Ichiro
More informationMUSICAL INSTRUMENT FAMILY CLASSIFICATION
MUSICAL INSTRUMENT FAMILY CLASSIFICATION Ricardo A. Garcia Media Lab, Massachusetts Institute of Technology 0 Ames Street Room E5-40, Cambridge, MA 039 USA PH: 67-53-0 FAX: 67-58-664 e-mail: rago @ media.
More informationA Novel Method to Improve Resolution of Satellite Images Using DWT and Interpolation
A Novel Method to Improve Resolution of Satellite Images Using DWT and Interpolation S.VENKATA RAMANA ¹, S. NARAYANA REDDY ² M.Tech student, Department of ECE, SVU college of Engineering, Tirupati, 517502,
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationLecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
More informationCIKM 2015 Melbourne Australia Oct. 22, 2015 Building a Better Connected World with Data Mining and Artificial Intelligence Technologies
CIKM 2015 Melbourne Australia Oct. 22, 2015 Building a Better Connected World with Data Mining and Artificial Intelligence Technologies Hang Li Noah s Ark Lab Huawei Technologies We want to build Intelligent
More informationCustomer Relationship Management using Adaptive Resonance Theory
Customer Relationship Management using Adaptive Resonance Theory Manjari Anand M.Tech.Scholar Zubair Khan Associate Professor Ravi S. Shukla Associate Professor ABSTRACT CRM is a kind of implemented model
More informationA MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS
A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University
More information