Outline. Motivation. Introduction. Bilingual speech synthesis. Previous research 26/11/2013
Application of speaker adaptation methods used in speech synthesis for speaker de-identification
Miran Pobar, Department of Informatics, University of Rijeka, Croatia
COST ACTION IC 1206 Meeting, Mataró, Spain, November

Outline
- Introduction
- Motivation
- Previous research
- The idea/planned work
- Conclusions and future work

Introduction
- Speaker de-identification: a natural and intelligible sounding voice that doesn't reveal the speaker's identity [Jin et al., 2009]
- Not concerned with identification based on message content
- Applications: call centers, anonymous help lines

Motivation
- Motivation: application of voice transformation (VT) for de-identification, without the need for previous enrollment of new speakers
- Previous research by [Jin et al., 2009]: Voice Convergin: Speaker De-Identification by Voice Transformation
  - Application of voice transformation to speaker de-identification
  - Goal: the voice sounds natural and intelligible, but doesn't reveal the speaker's identity
  - Results show successful de-identification while preserving intelligibility
  - The speaker enrollment phase limits applicability

Previous research
- Speaker adaptation in speech synthesis, esp. bilingual speech synthesis
- A Bilingual HMM-Based Speech Synthesis System for Closely Related Languages [Justin et al., 2012]
- A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis [Pobar et al., 2013]
- The basic concepts were already introduced in [Latorre et al., 2006], [Wu et al., 2009], [Liang et al., 2008]

Bilingual speech synthesis
- Bilingual speech system: synthesizing speech in two languages with the same speaker identity
- Quite a large number of languages have small database resources
- It is hard to recruit speakers fluent in both languages
- Building the synthetic voice in the target language, even though data in the target language does not exist for the target speaker
- Data from multiple speakers, adaptation to the target speaker
- Data from both languages treated as monolingual, labeled with a joined phoneme set
- Adapted to the target speaker using CMLLR adaptation

State mapping approach
- Speaker-independent models trained separately for both languages
- Mapping between states derived automatically from minimum KL divergence between states
- Mapping used to apply transforms calculated for one language to the other language

- How to automate finding the phoneme/state similarities and differences between spoken data in similar languages?
- Similar method as in de-identification: transformation of one voice to the target voice
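The minimum-KL-divergence state mapping mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used in the cited work: each state is reduced to a single diagonal-covariance Gaussian, the symmetric form of the KL divergence is used, and the names `kl_diag_gauss`, `symmetric_kl` and `map_states` are hypothetical.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians given as lists."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p, q):
    """Symmetric KL between two states, each a (mean, variance) pair.

    Using the symmetric form is an assumption of this sketch.
    """
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)


def map_states(source_states, target_states):
    """Map every source-language state to its nearest target-language state."""
    mapping = {}
    for s_name, s_state in source_states.items():
        mapping[s_name] = min(
            target_states,
            key=lambda t_name: symmetric_kl(s_state, target_states[t_name]),
        )
    return mapping
```

Calling `map_states` with two dictionaries of `name -> (mean, variance)` entries returns a dictionary from source state names to target state names. Real HMM synthesis states carry several streams (spectrum, F0, duration), which this sketch ignores.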
HMM bilingual system overview

Truly bilingual model
- Speaker-independent bilingual voice: average voice (training data from multiple speakers in both languages)
- Speaker adaptive training [Yamagishi et al., 2003]
- The obtained voice is used as the basis for adaptation to the target speaker identity
- CMLLR adaptation of average voice models to the target speaker
- Data labeled with a common phoneme set
  - Conducted with help from a phonetic expert who has knowledge of both languages
  - The Slovene and Croatian databases already had phonetic dictionaries closely related to the SAMPA phonetic alphabets of the respective languages, so we decided to map the phonemes with the help of the SAMPA alphabet
- Diagram adapted from [Yamagishi et al., 2009]

Database preparation

State mapping approach
- Train two Average Voice models, one in the source and one in the target language
- Target speaker: speaks one language, we want synthesis in both
- Intra-lingual CMLLR adaptation for the target speaker's language
- Application of transforms trained in one language to the other via state mapping

State mapping approach [Wu et al., 2009]
- Establish state mappings between the source state models and the target state models, i.e. find the nearest state model in the target language for each state model in the source language
- Train the transforms for the Average Voice model in the target language using the adaptation data
- Adapt each state model in the source language using the transform attached to the mapped state model in the target language
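The last step above, adapting each source-language state with the transform attached to its mapped target-language state, might look like the sketch below. It makes strong simplifying assumptions: only the mean update of an affine transform (matrix plus bias) is shown, the covariance update is omitted, and each target state has its own transform, whereas real systems usually tie transforms across regression classes. The function names and data layout are hypothetical.

```python
def apply_affine(matrix, bias, mean):
    """Return matrix * mean + bias for plain Python lists."""
    return [
        sum(a * m for a, m in zip(row, mean)) + b
        for row, b in zip(matrix, bias)
    ]


def adapt_source_states(source_means, state_mapping, target_transforms):
    """Adapt source-language state means with target-language transforms.

    source_means:      {source_state: mean vector}
    state_mapping:     {source_state: nearest target_state}
    target_transforms: {target_state: (matrix, bias)} estimated on the
                       target speaker's adaptation data
    """
    adapted = {}
    for name, mean in source_means.items():
        matrix, bias = target_transforms[state_mapping[name]]
        adapted[name] = apply_affine(matrix, bias, mean)
    return adapted
```

The point of the design is that no adaptation data in the source language is needed: the transforms are estimated once in the language the target speaker actually speaks, and the mapping decides which one each source-language state borrows.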
Test configurations

Evaluation results
- Figure: Mean opinion scores for systems. Left: Croatian synthesis language, right: Slovene synthesis language
- Figure: Similarity to target speaker (degradation mean opinion score). Left: Croatian synthesis language, right: Slovene synthesis language

Summary of results
- Results are similar for both approaches in the intra-lingual situation
- Asymmetric results in the cross-lingual situation
- The results suggest that although there are differences between the two approaches to phoneme mapping, the target speaker itself may play an important role in the performance of bilingual systems
- In the S1* systems the duration model for each phoneme is also shared within the trained model, which is not the case in S2*

De-identification using VT
- Source: Tanja Schultz, Qin Jin, Arthur Toth, Alan W Black: Speaker De-Identification, COST Action IC 1206 Meeting, Zagreb, July

Speaker de-identification via VT [Jin et al., 2009]
- Various strategies tested on GMM and Phonetic speaker identification systems
  - Standard Voice Transformation
  - De-Duration Voice Transformation
  - Double Voice Transformation: composition of de-duration + standard VT
  - Transterpolated VT: linear inter/extrapolation of MCEPs, x = s + f(v - s)
- Results range from 92% for GMM and 42% for Phonetic in the case of standard voice transformation to 100% for GMM and 87.5% for Phonetic in the case of transterpolated VT
- Better results on a large database (95 male and 102 female speakers)
- Human listening tests: intelligibility, quality of de-identification, comparison with a naïve pitch shifting approach
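The transterpolation formula x = s + f(v - s) listed above can be illustrated per frame as in the sketch below. It assumes that s is the source speaker's MCEP vector, v is the corresponding output of standard VT, and f is a scalar factor; this reading of the symbols and the function name `transterpolate` are assumptions, since the slide does not define them.

```python
def transterpolate(source_frame, converted_frame, factor):
    """Linear inter/extrapolation between source and converted MCEPs.

    factor = 0 returns the source frame, factor = 1 returns the
    standard VT output, 0 < factor < 1 interpolates between them and
    factor > 1 extrapolates beyond the converted voice.
    """
    return [
        s + factor * (v - s)
        for s, v in zip(source_frame, converted_frame)
    ]
```

Under these assumptions, a factor above 1 pushes the output further from the source speaker than the standard transformation does.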
Speaker de-identification via VT [Jin et al., 2009]
- Voice transformation strategies can be used to de-identify speech
- Standard VT performed well with GMM, but poorly with Phonetic
- Investigated different VT strategies: transterpolation worked best
- Human tests: de-identified voices are understandable while providing securely transmitted content
- VT targets frame-level and some prosodic characteristics, BUT SID is possible on higher-level aspects, like prosody and word choice
- Secure ways to re-identify a (selection of) speaker(s)
- The need for enrollment of new speakers limits applications

Speaker de-identification via VT: planned improvements
- Can we avoid the enrollment phase?
- Similar idea to state mapping in bilingual speech synthesis: apply a transform learned on one speaker pair to another speaker
- Automatic mapping of states (synthesis)
  - Synthesis: find a suitable transform via minimum KL divergence between states
  - De-identification: find a suitable transform via SID in a closed set of source speakers
- Learning phase:
  - Database of a number of male and female speakers
  - Train a SID system using the closed set of speakers in the database
  - Calculate transforms for each speaker in the database towards one target speaker for de-identification
- De-identification phase:
  - Forced classification of the new speaker into one of the speakers from the set in the SID system (source speaker)
  - VT of speech using the transform learned on the source-target speaker pair
  - If enough data is collected from the new speaker, learn the speaker GMM and transform and include them in the source speaker set, expanding the set of de-identification transforms

Planned setup
- Database: the multilingual COST278BN corpus (BCN)
- COST278BN: audio indexing of Broadcast News files in different languages; complete news shows that were broadcast by different TV stations in different countries
- 16 TV stations, 11 languages (Basque, Croatian, Czech, Dutch, Galician, Greek, Hungarian, Portuguese, Slovakian, Slovenian (2 sets), Spanish)
- Audio and video: the audio is 16 kHz, 16 bit, wave format; the video (only 6 data sets also come with video) is mostly in Real Media Video
- The aim is to provide data that can be used for testing, model adaptation and research on front-end processing algorithms, such as speech/nonspeech and speaker turn segmentation, background classification, speaker clustering, etc., that are supposed to be only weakly language dependent

Planned setup
- Speaker identification: ALIZE open source platform, GMM/UBM based
- Voice transformation: HTS? Matlab
- Evaluation
  - Automatic: speaker identification using ALIZE (baseline)
  - Subjective/human listening test

Conclusions
- De-identification of speakers using VT
- Suggested improvements increase possibilities for application
- Goal: eliminate the need for enrollment of new speakers to be de-identified, which significantly increases application possibilities for de-identification
- A pool of transformations; apply a transformation according to speaker identification results
- Expanding set of transformations using data collected from new speakers
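The idea of picking a transformation from a pool according to speaker identification results could be sketched as follows. This is a toy illustration only: it does not use ALIZE or a UBM, each enrolled speaker is a small diagonal-covariance GMM given as a list of (weight, mean, variance) components, and all function names are hypothetical.

```python
import math


def log_gauss_diag(frame, mean, var):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under a GMM."""
    total = 0.0
    for frame in frames:
        comp = [math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)


def pick_source_speaker(frames, speaker_gmms):
    """Forced classification: the closest enrolled speaker always wins."""
    return max(
        speaker_gmms,
        key=lambda name: gmm_log_likelihood(frames, speaker_gmms[name]),
    )


def select_transform(frames, speaker_gmms, transforms):
    """Return the stored transform of the best-matching source speaker."""
    return transforms[pick_source_speaker(frames, speaker_gmms)]
```

Because the classification is forced, an unseen speaker is always assigned to some speaker in the closed set, so a transform is available without enrolling the new speaker first.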
References
- Jin, Q., Toth, A. R., Schultz, T., & Black, A. W. (2009, April). Voice convergin: Speaker de-identification by voice transformation. In Acoustics, Speech and Signal Processing, ICASSP, IEEE International Conference on. IEEE.
- Pobar, M., Justin, T., Žibert, J., Mihelič, F., & Ipšić, I. (2013). A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. Text, Speech, and Dialogue, Lecture Notes in Computer Science, Volume 8082.
- Vandecatseye, A., Martens, J. P., Neto, J., Meinedo, H., Garcia-Mateo, C., Dieguez, J., Mihelic, F., Zibert, J., Nouza, J., David, P., Pleva, M., Cizmar, A., Papageorgiou, H., & Alexandris, C. (2004). The COST278 pan-European broadcast news database. Proceedings LREC (Lisbon).
- Zibert, J., Mihelic, F., Martens, J. P., Meinedo, H., Neto, J., Docio, L., Garcia-Mateo, C., David, P., Zdansky, J., Pleva, M., Cizmar, A., Zgank, A., Kacic, Z., Teleki, C., & Vicsi, K. (2005). The COST278 Broadcast News Segmentation and Speaker Clustering Evaluation - Overview, Methodology, Systems, Results. Proceedings Interspeech (Lisbon).
- Yamagishi, J., Kobayashi, T., Nakano, Y., Ogata, K., & Isogai, J. (2009). Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. Audio, Speech, and Language Processing, IEEE Transactions on, 17(1).
Jose Esteban Causo, European Commission Conference interpreting with information and communication technologies experiences from the European Commission DG Interpretation 1 Introduction In the European
More informationNavigation Aid And Label Reading With Voice Communication For Visually Impaired People
Navigation Aid And Label Reading With Voice Communication For Visually Impaired People A.Manikandan 1, R.Madhuranthi 2 1 M.Kumarasamy College of Engineering, mani85a@gmail.com,karur,india 2 M.Kumarasamy
More informationVOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications
VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,
More informationINTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA
COMM.ENG INTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA 9/6/2014 LECTURES 1 Objectives To give a background on Communication system components and channels (media) A distinction between analogue
More informationLuxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S 039-065697. Contract notice. Services
1/12 This notice in TED website: http://ted.europa.eu/udl?uri=ted:notice:65697-2015:text:en:html Luxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S 039-065697 Contract notice Services Directive
More informationFigure1. Acoustic feedback in packet based video conferencing system
Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents
More informationIndustry Guidelines on Captioning Television Programs 1 Introduction
Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline
More informationRecent advances in Digital Music Processing and Indexing
Recent advances in Digital Music Processing and Indexing Acoustics 08 warm-up TELECOM ParisTech Gaël RICHARD Telecom ParisTech (ENST) www.enst.fr/~grichard/ Content Introduction and Applications Components
More informationplacing people first SALARY REPORT Summary of 2014 Bratislava
placing people first SALARY REPORT Summary of 2014 1 / Cpl Jobs Salary Report Table of content 3 / Summary of 2014 / About Cpl Jobs 4 / IT 7 / Finance 9 / BPO/SSC 2 / Cpl Jobs Salary Report Summary of
More informationDirect Link Networks. Introduction. Physical Properties. Lecture - Ethernet 1. Areas for Discussion. Ethernet (Section 2.6)
reas for Discussion Direct Link Networks Joseph Spring School of Computer Science Sc - Computer Network Protocols & rch s ased on Chapter 2, Peterson & Davie, Computer Networks: Systems pproach, 5 th Ed
More informationTranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification
TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification Mahesh Viswanathan, Homayoon S.M. Beigi, Alain Tritschler IBM Thomas J. Watson Research Labs Research
More informationUSER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE
USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE DUET EXE OVERVIEW Control Button Panel Connector Panel Loudspeaker Microphone The Duet is a high performance speakerphone for desktop use that can cover
More informationMLLP Transcription and Translation Platform
MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino
More informationAM TRANSMITTERS & RECEIVERS
Reading 30 Ron Bertrand VK2DQ http://www.radioelectronicschool.com AM TRANSMITTERS & RECEIVERS Revision: our definition of amplitude modulation. Amplitude modulation is when the modulating audio is combined
More informationOnline Diarization of Telephone Conversations
Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of
More informationHMM-based Breath and Filled Pauses Elimination in ASR
HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science
More informationAutomatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations
Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ
More informationSPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems
SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems WILLIAM KUFFNER, M.A. Sc., P.Eng, PMP Senior Fire Protection Engineer Director Fire Protection Engineering October 30, 2013 Code Reference
More informationencoding compression encryption
encoding compression encryption ASCII utf-8 utf-16 zip mpeg jpeg AES RSA diffie-hellman Expressing characters... ASCII and Unicode, conventions of how characters are expressed in bits. ASCII (7 bits) -
More information