Outline. Motivation. Introduction. Bilingual speech synthesis. Previous research 26/11/2013

Size: px
Start display at page:

Download "Outline. Motivation. Introduction. Bilingual speech synthesis. Previous research 26/11/2013"

Transcription

1 Outline Application of speaker adaptation methods used in speech synthesis for speaker de-identification Miran Pobar, Department of Informatics, University of Rijeka, Croatia Introduction Motivation Previous research The idea/planned work Conclusions and future work COST ACTION IC 1206 Meeting, Mataró, Spain, November Introduction Speaker de-identification: natural and intelligible sounding voice, that doesn t reveal the speaker s identity [Jin et al., 2009.] Not concerned with identification based on message content Applications Call centers, anonymous help lines Motivation Motivation: application of VT for de-identification, without the need for previous enrollment of new speakers. Previous research by [Jin et al ]: Voice Convergin: Speaker De-Identification by Voice Transformation Application of Voice Transformation to speaker deidentification Goal: Voice sounds natural and intelligible, but doesn t reveal the speaker s identity Results show successful de-identification while preserving intelligibility Speaker enrollment phase limits applicability Previous research Speaker adaptation in speech synthesis, esp. bilingual speech synthesis A Bilingual HMM-Based Speech Synthesis System for Closely Related Languages [Justin et al., 2012] A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. [Pobar et al. 2013] The basic concepts were already introduced in [Latorre et al., 2006], [Wu et al., 2009], [Liang et al., 2008] Bilingual speech synthesis Bilingual speech system: synthesizing speech in two languages with the same speaker identity. Quite a large number of languages with small database resources hard to recruit speakers fluent in both languages building the synthetic voice in target language, even though data in the target language does not exist for the target speaker. Data from multiple speakers, adaptation to target speaker Data from both languages treated as monolingual, labeled with a joined phoneme set Adapted to the target speaker using the CMLLR adaptation State mapping approach Speaker independent models trained separately for both lanugages Mapping between states derived automatically from minimum KL divergence between states Mapping used to apply transforms calculated for one language to the other language How to automatize finding the phoneme/state similarities/differences between spoken data in similar languages? Similar method as in de-identification: transformation of one voice to the target voice 1

2 HMM bilingual system overview Truly bilingual model speaker independent bilingual voice average voice (training data from multiple speakers in both languages) Speaker adaptive training [Yamagishi et al., 2003.] Obtained voice used as basis for adaptation to the target speaker identity CMLLR adaptation of average voice models to the target speaker Data labeled with a common phoneme set Conducted with help from a phonetic expert, who has knowledge about both languages. Slovene and Croatian databases already had phonetic dictionaries closely related to SAMPA phonetic alphabets of respective languages, we decided to map the phonemes with a help of SAMPA alphabet. Diagram adopted from [Yamagishi et al., 2009] Database preparation State mapping approach State mapping approach Train two Average Voice models in both source and target languages. Target speaker: speaks one language, we want synthesis in both Intra-lingual CMLLR adaptation for the target speaker s language Application of transforms trained in one language to the other via state mapping Establish state mappings between the source state model and the target state models, i.e. find a nearest state model in target language for each state model in source language. Train the transforms for the Average Voice model in target language using the adaptation data. Adapt each state model in source language using the transform attached to the mapped state model in target language. [Wu et al., 2009] 2

3 Test configurations Evaluation results Figure: Mean opinion scores for systems. Left: Croatian synthesis language, right: Slovene synthesis language Figure: Similarity to target speaker (degradation mean opinion score). Left: Croatian ynthesis language, right: Slovene synthesis language Summary of results Results similar for both approaches in intralingual situation Asymmetric results in cross-lingual situation The results suggest that although there are differences in the two approaches of phoneme mapping,the target speaker itself may play important role in the performance of bilingual systems. The duration model in cases of S1* for each phoneme is also shared between the trained model, but that is not the case in S2*. De-identification using VT Source: Tanja Schultz, Qin Jin, Arthur Toth, Alan W Black: Speaker De-Identification, COST Action IC 1206 Meeting, Zagreb, July De-identification using VT Speaker de-identification via VT Source: Tanja Schultz, Qin Jin, Arthur Toth, Alan W Black: Speaker De-Identification, COST Action IC 1206 Meeting, Zagreb, July [Jin et al ] Various strategies tested on GMM and Phonetic speaker identification systems Standard Voice Transformation De-Duration Voice Transformation Double Voice Transformation - composition of de-duration + standard VT Transterpolated VT linear inter/extrapolation of MCEPs x=s+f(v-s) Results from 92% for GMM and 42% for Phonetic in case of standard voice transformation to 100% for GMM and 87.5% for Phonetic in case of transterpolated VT Better results on large database (95 male and 102 female speakers) Human listening tests: intelligibility, quality of de-identification, comparison with naïve pitch shifting approach 3

4 Speaker de-identification via VT [Jin et al ] Voice transformation strategies can be used to de-identify speech Standard VT performed well with GMM, but poor with Phonetic Investigated different VT strategies: Transterpolation worked best Human tests: de-identified voices are understandable while providing securely transmitted content VT targets frame level and some prosodic characteristics BUT SID possible on higher-level aspects, like prosody, word choice Secure ways to re-identify a (selection of) speaker(s) Need for enrollment of new speakers limits applications Speaker de-identification via VT: planned improvements Can we avoid enrollment phase? Similar idea to state mapping in bilingual speech synthesis: apply transform learned on one speaker pair to another speaker Automatic mapping of states (synthesis) Synthesis: find suitable transform via minimum KL divergence between states De-identification: find suitable transform via SID in a closed set of source speakers Learning phase: Database of a number of male and female speakers Train a SID system using the closed set of speakers in the database Calculate transforms for each speaker in the database towards one target speaker for deidentification De-identification phase: Forced classification of the new speaker into one of the speakers from the set in the SID system (source speaker) VT of speech using the transform learned on sorce-target speaker pair If enough data is collected from the new speaker, learn speaker GMM and transform and include in the source speaker set expanding set of deidentification transforms. Planned setup Database: The multilingual COST278BN corpus (BCN) COST278BN Audio indexing of Broadcast News files in different languages complete news shows that were broadcasted by different TV stations in different countries. 16 TV stations, 11 languages (Basque, Croatian, Czech, Dutch, Galician, Greek, Hungarian, Portuguese, Slovakian, Slovenian (2 sets), Spanish) Audio and video: the audio is (16 khz, 16 bit, wave format) and the video (only 6 data sets also come with video) is mostly in Real Media Video. The aim is more to provide data that can be used for testing, model adaptation and for research on front-end processing algorithms such as speech/nonspeech and speaker turn segmentation, background classification, speaker clustering, etc. that are supposed to be only weakly language dependent. Planned setup Speaker identification: ALIZE Open Source platform, GMM/UBM based Voice transformation: HTS? Matlab Evaluation Automatic: speaker identification using ALIZE (baseline) Subjective/human listening test Conclusions De-identification of speakers using VT Suggested improvements increase possibilities for application: Goal: eliminate the need for enrollment of new speakers to be de-identified, which significantly increases application possibilities for de-identification. A pool of transformations, apply transfomation according to speaker identification results Expanding set of transformation using data collected from new speakers Application of speaker adaptation methods used in speech synthesis for speaker de-identification Miran Pobar, Department of Informatics, University of Rijeka, Croatia COST ACTION IC 1206 Meeting, Mataró, Spain, November

5 References Jin, Q., Toth, A. R., Schultz, T., & Black, A. W. (2009, April). Voice convergin: Speaker de-identification by voice transformation. In Acoustics, Speech and Signal Processing, ICASSP IEEE International Conference on (pp ). IEEE. Miran Pobar, Tadej Justin, Janez Žibert, France Mihelič, Ivo Ipšić. A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis. Text, Speech, and Dialogue, Lecture Notes in Computer Science, Volume 8082, 2013, pp A. Vandecatseye, J.P. Martens, J. Neto, H. Meinedo, C. Garcia-Mateo, J.Dieguez, F. Mihelic, J. Zibert, J. Nouza, P. David, M. Pleva, A.Cizmar, H. Papageorgiou, C. Alexandris (2004). The COST278 pan- European broadcast news database, Proceedings LREC (Lissabon), J. Zibert, F. Mihelic, J.P. Martens, H. Meinedo, J. Neto, L.Docio, C. Garcia-Mateo, P. David, J. Zdansky, M. Pleva, A. Cizmar, A. Zgank, Z. Kacic, C. Teleki, K. Vicsi (2005). The COST278 Broadcast News Segmentation and Speaker Clustering Evaluation - Overview, Methodology, Systems, Results,Proceedings Interspeech (Lissabon), Yamagishi, J., Kobayashi, T., Nakano, Y., Ogata, K., and Isogai, J. (2009). Analysis of speaker adaptation algorithms for hmm-based speech synthesis and a constrained smaplr adaptation algorithm. Audio, Speech, and Language Processing, IEEE Transactions on, 17(1):

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS

SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University

More information

Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis

Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Experiments with Signal-Driven Symbolic Prosody for Statistical Parametric Speech Synthesis Fabio Tesser, Giacomo Sommavilla, Giulio Paci, Piero Cosi Institute of Cognitive Sciences and Technologies, National

More information

Thirukkural - A Text-to-Speech Synthesis System

Thirukkural - A Text-to-Speech Synthesis System Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,

More information

HMM-based Speech Synthesis with Various Degrees of Articulation: a Perceptual Study

HMM-based Speech Synthesis with Various Degrees of Articulation: a Perceptual Study HMM-based Speech Synthesis with Various Degrees of Articulation: a Perceptual Study Benjamin Picart, Thomas Drugman, Thierry Dutoit TCTS Lab, Faculté Polytechnique (FPMs), University of Mons (UMons), Belgium

More information

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN

Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,

More information

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser

Efficient diphone database creation for MBROLA, a multilingual speech synthesiser Efficient diphone database creation for, a multilingual speech synthesiser Institute of Linguistics Adam Mickiewicz University Poznań OWD 2010 Wisła-Kopydło, Poland Why? useful for testing speech models

More information

PhD students (1): Ing. Jozef Hámorský

PhD students (1): Ing. Jozef Hámorský prof. Ing. Jozef Juhár, PhD. Department of Electronics and Multimedia Communications Faculty of Electrical Engineering and Informatics Technical University of Košice 20.3.2015 1 Permanent academic staff

More information

An Arabic Text-To-Speech System Based on Artificial Neural Networks

An Arabic Text-To-Speech System Based on Artificial Neural Networks Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department

More information

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com

More information

Exploring the Structure of Broadcast News for Topic Segmentation

Exploring the Structure of Broadcast News for Topic Segmentation Exploring the Structure of Broadcast News for Topic Segmentation Rui Amaral (1,2,3), Isabel Trancoso (1,3) 1 Instituto Superior Técnico 2 Instituto Politécnico de Setúbal 3 L 2 F - Spoken Language Systems

More information

Using Adaptation to Improve Speech Transcription Alignment in Noisy and Reverberant Environments

Using Adaptation to Improve Speech Transcription Alignment in Noisy and Reverberant Environments Using Adaptation to Improve Speech Transcription Alignment in Noisy and Reverberant Environments Y. Mamiya 1, A. Stan 2, J. Yamagishi 1,3, P. Bell 1, O. Watts 1, R.A.J. Clark 1, S. King 1 1 Centre for

More information

Develop Software that Speaks and Listens

Develop Software that Speaks and Listens Develop Software that Speaks and Listens Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data

Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data Automatic Classification and Transcription of Telephone Speech in Radio Broadcast Data Alberto Abad, Hugo Meinedo, and João Neto L 2 F - Spoken Language Systems Lab INESC-ID / IST, Lisboa, Portugal {Alberto.Abad,Hugo.Meinedo,Joao.Neto}@l2f.inesc-id.pt

More information

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果

Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果 Effect of Captioning Lecture Videos For Learning in Foreign Language VERI FERDIANSYAH 1 SEIICHI NAKAGAWA 1 Toyohashi University of Technology, Tenpaku-cho, Toyohashi 441-858 Japan E-mail: {veri, nakagawa}@slp.cs.tut.ac.jp

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

Analysis and Synthesis of Hypo and Hyperarticulated Speech

Analysis and Synthesis of Hypo and Hyperarticulated Speech Analysis and Synthesis of and articulated Speech Benjamin Picart, Thomas Drugman, Thierry Dutoit TCTS Lab, Faculté Polytechnique (FPMs), University of Mons (UMons), Belgium {benjamin.picart,thomas.drugman,thierry.dutoit}@umons.ac.be

More information

Subjective SNR measure for quality assessment of. speech coders \A cross language study

Subjective SNR measure for quality assessment of. speech coders \A cross language study Subjective SNR measure for quality assessment of speech coders \A cross language study Mamoru Nakatsui and Hideki Noda Communications Research Laboratory, Ministry of Posts and Telecommunications, 4-2-1,

More information

Speech Transcription

Speech Transcription TC-STAR Final Review Meeting Luxembourg, 29 May 2007 Speech Transcription Jean-Luc Gauvain LIMSI TC-STAR Final Review Luxembourg, 29-31 May 2007 1 What Is Speech Recognition? Def: Automatic conversion

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

Emotion Detection from Speech

Emotion Detection from Speech Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction

More information

A Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman

A Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman A Comparison of Speech Coding Algorithms ADPCM vs CELP Shannon Wichman Department of Electrical Engineering The University of Texas at Dallas Fall 1999 December 8, 1999 1 Abstract Factors serving as constraints

More information

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

More information

Ericsson T18s Voice Dialing Simulator

Ericsson T18s Voice Dialing Simulator Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of

More information

Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System

Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Arun Soman, Sachin Kumar S., Hemanth V. K., M. Sabarimalai Manikandan, K. P. Soman Centre for Excellence in Computational

More information

Where to Find the Highest Audio Engineer Salary. How Education and Training can affect the Audio Engineer Salary

Where to Find the Highest Audio Engineer Salary. How Education and Training can affect the Audio Engineer Salary Where to Find the Highest Audio Engineer Salary The audio engineer salary will vary and be mostly based on the level of experience of the engineer. Other factors will include level of education, number

More information

CallAn: A Tool to Analyze Call Center Conversations

CallAn: A Tool to Analyze Call Center Conversations CallAn: A Tool to Analyze Call Center Conversations Balamurali AR, Frédéric Béchet And Benoit Favre Abstract Agent Quality Monitoring (QM) of customer calls is critical for call center companies. We present

More information

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction

Speech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction : A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)

More information

Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification

Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification Raphael Ullmann 1,2, Ramya Rasipuram 1, Mathew Magimai.-Doss 1, and Hervé Bourlard 1,2 1 Idiap Research Institute,

More information

Web-Conferencing System SAViiMeeting

Web-Conferencing System SAViiMeeting Web-Conferencing System SAViiMeeting Alexei Machovikov Department of Informatics and Computer Technologies National University of Mineral Resources Mining St-Petersburg, Russia amachovikov@gmail.com Abstract

More information

Luxembourg-Luxembourg: FL/RAIL16 Translation services 2016/S 054-089888. Contract notice. Services

Luxembourg-Luxembourg: FL/RAIL16 Translation services 2016/S 054-089888. Contract notice. Services 1 / 11 This notice in TED website: http://ted.europa.eu/udl?uri=ted:notice:89888-2016:text:en:html Luxembourg-Luxembourg: FL/RAIL16 Translation services 2016/S 054-089888 Contract notice Services Directive

More information

Speech Signal Processing: An Overview

Speech Signal Processing: An Overview Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech

More information

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software

Carla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis

More information

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations

Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and

More information

ivms-4500 HD (Android) Mobile Client Software User Manual (V3.4)

ivms-4500 HD (Android) Mobile Client Software User Manual (V3.4) ivms-4500 HD (Android) Mobile Client Software User Manual (V3.4) UD.6L0202D1597A01 Thank you for purchasing our product. This manual applies to ivms-4500 HD (Android) mobile client software; please read

More information

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering

More information

Central and South-East European Resources in META-SHARE

Central and South-East European Resources in META-SHARE Central and South-East European Resources in META-SHARE Tamás VÁRADI 1 Marko TADIĆ 2 (1) RESERCH INSTITUTE FOR LINGUISTICS, MTA, Budapest, Hungary (2) FACULTY OF HUMANITIES AND SOCIAL SCIENCES, ZAGREB

More information

Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets

Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets Social Semantic Emotion Analysis for Innovative Multilingual Big Data Analytics Markets D7.5 Dissemination Plan Project ref. no H2020 141111 Project acronym Start date of project (dur.) Document due Date

More information

Integration of Negative Emotion Detection into a VoIP Call Center System

Integration of Negative Emotion Detection into a VoIP Call Center System Integration of Negative Detection into a VoIP Call Center System Tsang-Long Pao, Chia-Feng Chang, and Ren-Chi Tsao Department of Computer Science and Engineering Tatung University, Taipei, Taiwan Abstract

More information

Program curriculum for graduate studies in Speech and Music Communication

Program curriculum for graduate studies in Speech and Music Communication Program curriculum for graduate studies in Speech and Music Communication School of Computer Science and Communication, KTH (Translated version, November 2009) Common guidelines for graduate-level studies

More information

Things to remember when transcribing speech

Things to remember when transcribing speech Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely

More information

ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION. Horacio Franco, Luciana Ferrer, and Harry Bratt

ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION. Horacio Franco, Luciana Ferrer, and Harry Bratt ADAPTIVE AND DISCRIMINATIVE MODELING FOR IMPROVED MISPRONUNCIATION DETECTION Horacio Franco, Luciana Ferrer, and Harry Bratt Speech Technology and Research Laboratory, SRI International, Menlo Park, CA

More information

Regionalized Text-to-Speech Systems: Persona Design and Application Scenarios

Regionalized Text-to-Speech Systems: Persona Design and Application Scenarios Regionalized Text-to-Speech Systems: Persona Design and Application Scenarios Michael Pucher, Gudrun Schuchmann, and Peter Fröhlich ftw., Telecommunications Research Center, Donau-City-Strasse 1, 1220

More information

From Concept to Production in Secure Voice Communications

From Concept to Production in Secure Voice Communications From Concept to Production in Secure Voice Communications Earl E. Swartzlander, Jr. Electrical and Computer Engineering Department University of Texas at Austin Austin, TX 78712 Abstract In the 1970s secure

More information

Music Mood Classification

Music Mood Classification Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

More information

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA

APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti** *Crime Laboratory, NBI, Finland **Dept. of Computer

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION MISSING FEATURE RECONSTRUCTION AND ACOUSTIC MODEL ADAPTATION COMBINED FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION Ulpu Remes, Kalle J. Palomäki, and Mikko Kurimo Adaptive Informatics Research Centre,

More information

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites

More information

Effects of Automated Transcription Delay on Non-native Speakers Comprehension in Real-time Computermediated

Effects of Automated Transcription Delay on Non-native Speakers Comprehension in Real-time Computermediated Effects of Automated Transcription Delay on Non-native Speakers Comprehension in Real-time Computermediated Communication Lin Yao 1, Ying-xin Pan 2, and Dan-ning Jiang 2 1 Institute of Psychology, Chinese

More information

IBM Research Report. CSR: Speaker Recognition from Compressed VoIP Packet Stream

IBM Research Report. CSR: Speaker Recognition from Compressed VoIP Packet Stream RC23499 (W0501-090) January 19, 2005 Computer Science IBM Research Report CSR: Speaker Recognition from Compressed Packet Stream Charu Aggarwal, David Olshefski, Debanjan Saha, Zon-Yin Shae, Philip Yu

More information

Text-To-Speech Technologies for Mobile Telephony Services

Text-To-Speech Technologies for Mobile Telephony Services Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary

More information

BOOTSTRAPPING TEXT-TO-SPEECH FOR SPEECH PROCESSING IN LANGUAGES WITHOUT AN ORTHOGRAPHY

BOOTSTRAPPING TEXT-TO-SPEECH FOR SPEECH PROCESSING IN LANGUAGES WITHOUT AN ORTHOGRAPHY BOOTSTRAPPING TEXT-TO-SPEECH FOR SPEECH PROCESSING IN LANGUAGES WITHOUT AN ORTHOGRAPHY Sunayana Sitaram Sukhada Palkar Yun-Nung Chen Alok Parlikar Alan W Black Language Technologies Institute, Carnegie

More information

SASSC: A Standard Arabic Single Speaker Corpus

SASSC: A Standard Arabic Single Speaker Corpus SASSC: A Standard Arabic Single Speaker Corpus Ibrahim Almosallam, Atheer AlKhalifa, Mansour Alghamdi, Mohamed Alkanhal, Ashraf Alkhairy The Computer Research Institute King Abdulaziz City for Science

More information

(Refer Slide Time: 4:45)

(Refer Slide Time: 4:45) Digital Voice and Picture Communication Prof. S. Sengupta Department of Electronics and Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 38 ISDN Video Conferencing Today we

More information

Lecture 12: An Overview of Speech Recognition

Lecture 12: An Overview of Speech Recognition Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated

More information

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics

Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Secure-Access System via Fixed and Mobile Telephone Networks using Voice Biometrics Anastasis Kounoudes 1, Anixi Antonakoudi 1, Vasilis Kekatos 2 1 The Philips College, Computing and Information Systems

More information

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur

Module 5. Broadcast Communication Networks. Version 2 CSE IIT, Kharagpur Module 5 Broadcast Communication Networks Lesson 1 Network Topology Specific Instructional Objectives At the end of this lesson, the students will be able to: Specify what is meant by network topology

More information

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,

More information

Problems and Prospects in Collection of Spoken Language Data

Problems and Prospects in Collection of Spoken Language Data Problems and Prospects in Collection of Spoken Language Data Kishore Prahallad+*, Suryakanth V Gangashetty*, B. Yegnanarayana*, D. Raj Reddy+ *Language Technologies Research Center (LTRC) International

More information

Perceived Speech Quality Prediction for Voice over IP-based Networks

Perceived Speech Quality Prediction for Voice over IP-based Networks Perceived Speech Quality Prediction for Voice over IP-based Networks Lingfen Sun and Emmanuel C. Ifeachor Department of Communication and Electronic Engineering, University of Plymouth, Plymouth PL 8AA,

More information

DVB-T 730. User s Manual

DVB-T 730. User s Manual EPG Program Reservation There are 10 program timers to bring up reminder for a reserved program. 20 seconds before the start of the reserved program, a pop-up window will remind viewer. If no further instruction,

More information

GV-FE420. Visítenos: www.dluxsecurity.com. Cámara IP Fisheye Megapixel

GV-FE420. Visítenos: www.dluxsecurity.com. Cámara IP Fisheye Megapixel GV-FE420 Introduction The GV FE420 is a fisheye camera that allows you to monitor all angles of a location using just one camera. The distorted hemispherical image of the fisheye camera will be converted

More information

ADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2

ADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA. Multimedia Communications Department, EURECOM, Sophia Antipolis, France 2 3rd European ignal Processing Conference (EUIPCO) ADAPTIVE AND ONLINE PEAKER DIARIZATION FOR MEETING DATA Giovanni oldi, Christophe Beaugeant and Nicholas Evans Multimedia Communications Department, EURECOM,

More information

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES

L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES L2 EXPERIENCE MODULATES LEARNERS USE OF CUES IN THE PERCEPTION OF L3 TONES Zhen Qin, Allard Jongman Department of Linguistics, University of Kansas, United States qinzhenquentin2@ku.edu, ajongman@ku.edu

More information

Attenuation (amplitude of the wave loses strength thereby the signal power) Refraction Reflection Shadowing Scattering Diffraction

Attenuation (amplitude of the wave loses strength thereby the signal power) Refraction Reflection Shadowing Scattering Diffraction Wireless Physical Layer Q1. Is it possible to transmit a digital signal, e.g., coded as square wave as used inside a computer, using radio transmission without any loss? Why? It is not possible to transmit

More information

TED-LIUM: an Automatic Speech Recognition dedicated corpus

TED-LIUM: an Automatic Speech Recognition dedicated corpus TED-LIUM: an Automatic Speech Recognition dedicated corpus Anthony Rousseau, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans, France firstname.lastname@lium.univ-lemans.fr

More information

ivms-4500 HD (ios) Mobile Client Software User Manual (V3.4)

ivms-4500 HD (ios) Mobile Client Software User Manual (V3.4) ivms-4500 HD (ios) Mobile Client Software User Manual (V3.4) UD.6L0202D1587A01 Thank you for purchasing our product. This manual applies to ivms-4500 HD (ios) mobile client software; please read it carefully

More information

IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS

IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS IMPROVING TTS BY HIGHER AGREEMENT BETWEEN PREDICTED VERSUS OBSERVED PRONUNCIATIONS Yeon-Jun Kim, Ann Syrdal AT&T Labs-Research, 180 Park Ave. Florham Park, NJ 07932 Matthias Jilka Institut für Linguistik,

More information

English for communication in the workplace

English for communication in the workplace English for communication in the workplace David Bonamy Introduction If you are teaching or planning to teach English to help your students communicate effectively in their present or future place of work,

More information

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

Evaluating grapheme-to-phoneme converters in automatic speech recognition context Evaluating grapheme-to-phoneme converters in automatic speech recognition context Denis Jouvet, Dominique Fohr, Irina Illina To cite this version: Denis Jouvet, Dominique Fohr, Irina Illina. Evaluating

More information

New Features SMART Sync 2009. Collaboration Feature Improvements

New Features SMART Sync 2009. Collaboration Feature Improvements P L E A S E T H I N K B E F O R E Y O U P R I N T New Features SMART Sync 2009 SMART Sync (formerly SynchronEyes ) is easy-to-use classroom management software that allows teachers to monitor and control

More information

TRUE: an Online Testing Platform for Multimedia Evaluation

TRUE: an Online Testing Platform for Multimedia Evaluation TRUE: an Online Testing Platform for Multimedia Evaluation Santiago Planet, Ignasi Iriondo, Elisa Martínez, José A. Montero GPMM Grup de Recerca en Processament Multimodal Enginyeria i Arquitectura La

More information

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE

TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,

More information

Luxembourg-Luxembourg: FL/TERM15 Translation services 2015/S 253-462303. Contract notice. Services

Luxembourg-Luxembourg: FL/TERM15 Translation services 2015/S 253-462303. Contract notice. Services 1 / 12 This notice in TED website: http://ted.europa.eu/udl?uri=ted:notice:462303-2015:text:en:html Luxembourg-Luxembourg: FL/TERM15 Translation services 2015/S 253-462303 Contract notice Services Directive

More information

Curriculum Vitae. Alison M. Trude December 2014. Website: https://sites.google.com/site/alisontrude/ Department of Psychology, University of Chicago

Curriculum Vitae. Alison M. Trude December 2014. Website: https://sites.google.com/site/alisontrude/ Department of Psychology, University of Chicago Curriculum Vitae Alison M. Trude December 2014 Dept. of Psychology University of Chicago 5848 S. University Ave. Chicago, IL 60637 Email: trude@uchicago.edu Website: https://sites.google.com/site/alisontrude/

More information

Creating voices for the Festival speech synthesis system.

Creating voices for the Festival speech synthesis system. M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids

Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids Synergies and Distinctions Peter Vary RWTH Aachen University Institute of Communication Systems WASPAA, October 23, 2013 Mohonk Mountain

More information

EUROPASS DIPLOMA SUPPLEMENT

EUROPASS DIPLOMA SUPPLEMENT EUROPASS DIPLOMA SUPPLEMENT TITLE OF THE DIPLOMA (ES) Técnico Superior en Sistemas de Telecomunicaciones e Informáticos TRANSLATED TITLE OF THE DIPLOMA (EN) (1) Higher Technician in Telecommunications

More information

Conference interpreting with information and communication technologies experiences from the European Commission DG Interpretation

Conference interpreting with information and communication technologies experiences from the European Commission DG Interpretation Jose Esteban Causo, European Commission Conference interpreting with information and communication technologies experiences from the European Commission DG Interpretation 1 Introduction In the European

More information

Navigation Aid And Label Reading With Voice Communication For Visually Impaired People

Navigation Aid And Label Reading With Voice Communication For Visually Impaired People Navigation Aid And Label Reading With Voice Communication For Visually Impaired People A.Manikandan 1, R.Madhuranthi 2 1 M.Kumarasamy College of Engineering, mani85a@gmail.com,karur,india 2 M.Kumarasamy

More information

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications

VOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,

More information

INTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA

INTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA COMM.ENG INTRODUCTION TO COMMUNICATION SYSTEMS AND TRANSMISSION MEDIA 9/6/2014 LECTURES 1 Objectives To give a background on Communication system components and channels (media) A distinction between analogue

More information

Luxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S 039-065697. Contract notice. Services

Luxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S 039-065697. Contract notice. Services 1/12 This notice in TED website: http://ted.europa.eu/udl?uri=ted:notice:65697-2015:text:en:html Luxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S 039-065697 Contract notice Services Directive

More information

Figure1. Acoustic feedback in packet based video conferencing system

Figure1. Acoustic feedback in packet based video conferencing system Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents

More information

Industry Guidelines on Captioning Television Programs 1 Introduction

Industry Guidelines on Captioning Television Programs 1 Introduction Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline

More information

Recent advances in Digital Music Processing and Indexing

Recent advances in Digital Music Processing and Indexing Recent advances in Digital Music Processing and Indexing Acoustics 08 warm-up TELECOM ParisTech Gaël RICHARD Telecom ParisTech (ENST) www.enst.fr/~grichard/ Content Introduction and Applications Components

More information

placing people first SALARY REPORT Summary of 2014 Bratislava

placing people first SALARY REPORT Summary of 2014 Bratislava placing people first SALARY REPORT Summary of 2014 1 / Cpl Jobs Salary Report Table of content 3 / Summary of 2014 / About Cpl Jobs 4 / IT 7 / Finance 9 / BPO/SSC 2 / Cpl Jobs Salary Report Summary of

More information

Direct Link Networks. Introduction. Physical Properties. Lecture - Ethernet 1. Areas for Discussion. Ethernet (Section 2.6)

Direct Link Networks. Introduction. Physical Properties. Lecture - Ethernet 1. Areas for Discussion. Ethernet (Section 2.6) reas for Discussion Direct Link Networks Joseph Spring School of Computer Science Sc - Computer Network Protocols & rch s ased on Chapter 2, Peterson & Davie, Computer Networks: Systems pproach, 5 th Ed

More information

TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification

TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification Mahesh Viswanathan, Homayoon S.M. Beigi, Alain Tritschler IBM Thomas J. Watson Research Labs Research

More information

USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE

USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE USER MANUAL DUET EXECUTIVE USB DESKTOP SPEAKERPHONE DUET EXE OVERVIEW Control Button Panel Connector Panel Loudspeaker Microphone The Duet is a high performance speakerphone for desktop use that can cover

More information

MLLP Transcription and Translation Platform

MLLP Transcription and Translation Platform MLLP Transcription and Translation Platform Alejandro Pérez González de Martos, Joan Albert Silvestre-Cerdà, Juan Daniel Valor Miró, Jorge Civera, and Alfons Juan Universitat Politècnica de València, Camino

More information

AM TRANSMITTERS & RECEIVERS

AM TRANSMITTERS & RECEIVERS Reading 30 Ron Bertrand VK2DQ http://www.radioelectronicschool.com AM TRANSMITTERS & RECEIVERS Revision: our definition of amplitude modulation. Amplitude modulation is when the modulating audio is combined

More information

Online Diarization of Telephone Conversations

Online Diarization of Telephone Conversations Odyssey 2 The Speaker and Language Recognition Workshop 28 June July 2, Brno, Czech Republic Online Diarization of Telephone Conversations Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman Department of

More information

HMM-based Breath and Filled Pauses Elimination in ASR

HMM-based Breath and Filled Pauses Elimination in ASR HMM-based Breath and Filled Pauses Elimination in ASR Piotr Żelasko 1, Tomasz Jadczyk 1,2 and Bartosz Ziółko 1,2 1 Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science

More information

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations

Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Automatic Detection of Laughter and Fillers in Spontaneous Mobile Phone Conversations Hugues Salamin, Anna Polychroniou and Alessandro Vinciarelli University of Glasgow - School of computing Science, G128QQ

More information

SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems

SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems WILLIAM KUFFNER, M.A. Sc., P.Eng, PMP Senior Fire Protection Engineer Director Fire Protection Engineering October 30, 2013 Code Reference

More information

encoding compression encryption

encoding compression encryption encoding compression encryption ASCII utf-8 utf-16 zip mpeg jpeg AES RSA diffie-hellman Expressing characters... ASCII and Unicode, conventions of how characters are expressed in bits. ASCII (7 bits) -

More information