Efficient diphone database creation for MBROLA, a multilingual speech synthesiser



Institute of Linguistics, Adam Mickiewicz University, Poznań
OWD 2010, Wisła-Kopydło, Poland

Why MBROLA?
- useful for testing speech models in linguistic work
- easy manipulation of duration and pitch values
- easy creation of new synthetic voices
- recently used for:
  - expressive speech
  - dialogue synthesis
  - voice quality
  - under-resourced languages
  - evaluation of large speech corpora (ACCS)

Ph.D. thesis context
- to model different speech styles, based on the phonetic and linguistic characteristics of the speaker's speech, that align with the speaker:
  - in a consultation situation
  - in a stress situation
- to design and build a speech synthesis component and a style selection module for an adaptive dialogue system

Ph.D. thesis context: adaptive dialogue system
- adapts its speech by selecting a speech style appropriate to the speaker's level of speech arousal
- aims to improve human-computer interaction at emergency unit control centres and call-centre help desks by making the dialogue more natural

Objectives
- minimisation of the material to be recorded and annotated for creating a synthetic voice
- automation of the synthetic voice creation process

MBROLA voice creation (Dutoit et al. 1996)
1. Creating the text corpus
   - list of phones with allophones (PL)
   - list of diphones (DL), where $|DL| = |PL|^2$
   - list of words
   - words in carrier sentences
2. Recording the corpus with monotonous intonation
3. Segmenting the corpus
   - at the phone level, automatically and/or manually
   - extracting diphones
4. Equalising the corpus (mbrolation)
   - energy level normalisation
   - pitch normalisation
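The relation $|DL| = |PL|^2$ simply counts ordered phone pairs. A minimal Python sketch of building the diphone list, assuming diphone labels are written as "left-right" and using a hypothetical toy phone list (both are illustrations, not the conventions of the original tools):

```python
from itertools import product


def diphone_list(phones: list[str]) -> list[str]:
    """Return every ordered phone pair as a diphone label 'left-right'.

    With n phone symbols this yields n * n labels, i.e. |DL| = |PL|^2.
    Many pairs may never occur in the language, so this is an upper bound
    on the inventory that has to be recorded.
    """
    return [f"{a}-{b}" for a, b in product(phones, repeat=2)]


# Hypothetical toy phone list: three Polish SAMPA phones plus the pause "_".
toy_phones = ["_", "a", "t", "s"]
dl = diphone_list(toy_phones)
print(len(dl))   # 16
print(dl[:5])    # ['_-_', '_-a', '_-t', '_-s', 'a-_']
```

Generating the full product first and then filtering against the corpus keeps the theoretical inventory explicit, which is what the coverage figures reported later are measured against.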


Mbrolation
- The Mbrolator is a software suite for voice creation
- database file in the SEG format, giving for each diphone:
  - filename
  - start and end
  - label
  - subsplitting
- restrictions on the diphone files:
  - 16000 Hz sampling rate
  - no longer than 10000 samples
  - context of 800 samples on the left and right sides
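At 16 kHz, 10000 samples is 0.625 s, and the two 800-sample context margins add 100 ms in total. The restrictions can be checked before a diphone is cut, as in this minimal sketch. The `DiphoneEntry` record is hypothetical: it mirrors the fields listed for the SEG file but is not the real SEG syntax, the subsplitting field is left out because the slides do not define it, and it is an assumption that the 10000-sample limit covers the cut file including context.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000   # Hz, required sampling rate of diphone files
MAX_SAMPLES = 10_000   # maximum length of a diphone file, in samples
CONTEXT = 800          # samples of context kept on each side of the diphone


@dataclass
class DiphoneEntry:
    """Hypothetical record for one diphone (not the actual SEG file syntax)."""
    filename: str
    start: int   # diphone start, in samples
    end: int     # diphone end, in samples
    label: str


def cut_window(entry: DiphoneEntry, signal_len: int) -> tuple[int, int]:
    """Sample range to cut: the diphone plus CONTEXT samples on each side,
    clipped to the bounds of the recording."""
    return max(0, entry.start - CONTEXT), min(signal_len, entry.end + CONTEXT)


def check_entry(entry: DiphoneEntry, signal_len: int, sample_rate: int) -> list[str]:
    """List violations of the restrictions (an empty list means the entry is fine).

    Assumption: the 10000-sample limit applies to the cut file, context included.
    """
    problems = []
    if sample_rate != SAMPLE_RATE:
        problems.append(f"sampling rate is {sample_rate} Hz, expected {SAMPLE_RATE} Hz")
    if not 0 <= entry.start < entry.end <= signal_len:
        problems.append("diphone boundaries fall outside the recording")
    first, last = cut_window(entry, signal_len)
    if last - first > MAX_SAMPLES:
        problems.append(f"cut file has {last - first} samples, limit is {MAX_SAMPLES}")
    return problems


entry = DiphoneEntry("sent001.wav", start=12_000, end=14_400, label="a-t")
print(cut_window(entry, signal_len=48_000))   # (11200, 15200)
print(check_entry(entry, 48_000, 16_000))     # []
```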

[Figure: Mbrolation]

Phonetically rich sentence extractor
- aim: to select from a text corpus the smallest possible set of sentences that contains the largest number of diphones
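Selecting the fewest sentences that cover the most diphones is a set-cover problem. The slides do not spell out the extractor's algorithm, so the sketch below swaps in the standard greedy approximation: repeatedly take the sentence that adds the most not-yet-covered diphones. It assumes each sentence is padded with a pause symbol, taken here to be "_"; the toy corpus is hypothetical.

```python
PAUSE = "_"  # assumed pause/silence symbol padded around every sentence


def sentence_diphones(phones: list[str]) -> set[tuple[str, str]]:
    """Set of diphones (adjacent phone pairs) in one transcribed sentence.

    Assumption: each sentence starts and ends in silence, so the
    pause-to-phone and phone-to-pause diphones are counted as well.
    """
    padded = [PAUSE, *phones, PAUSE]
    return set(zip(padded, padded[1:]))


def select_sentences(corpus: dict[str, list[str]]) -> list[str]:
    """Greedily pick sentences until no remaining sentence adds a new diphone."""
    remaining = {sid: sentence_diphones(ph) for sid, ph in corpus.items()}
    covered: set[tuple[str, str]] = set()
    selected: list[str] = []
    while remaining:
        # Sentence contributing the most not-yet-covered diphones; iterating
        # over sorted ids makes max() break ties in favour of the smallest id.
        best = max(sorted(remaining), key=lambda sid: len(remaining[sid] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining sentence is redundant
        selected.append(best)
        covered |= gain
    return selected


# Hypothetical toy corpus: sentence id -> phone transcription.
corpus = {
    "s1": ["a", "t", "a"],
    "s2": ["a", "t"],
    "s3": ["t", "a"],
    "s4": ["a"],
}
print(select_sentences(corpus))  # ['s1', 's2', 's3']
```

Greedy selection covers every diphone present in the corpus but is not guaranteed to find the smallest such set; it is a common choice because exact set cover is NP-hard.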

Available text resources
- 1623 sentences from the BOSS corpus
- 8828 sentences from the Jurisdict database
- 10451 sentences altogether
- transcription in:
  - Polish SAMPA: 37 phonemes
  - Polish Extended-SAMPA (PE-SAMPA): 40 phonemes

[Figure: Sentence extraction procedure]

Results
- SAMPA (38 × 38 = 1444 possible diphones): 1008 diphones in 211 of 10451 sentences
- PE-SAMPA (41 × 41 = 1681 possible diphones): 1095 diphones in 201 of 10451 sentences
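The results can be put in proportion with a little arithmetic. The slide squares 38 and 41, one more than the 37 and 40 phonemes listed for the text resources; the extra symbol is presumably the pause, though the slides do not say so. Because not every ordered pair is phonotactically possible in Polish, the percentages below are a lower bound on coverage of the diphones that actually occur. The figures are taken from the slide; the helper name is illustrative.

```python
def coverage(found: int, n_symbols: int) -> float:
    """Share of the theoretical inventory of n_symbols ** 2 diphones that was found."""
    return found / n_symbols ** 2


TOTAL_SENTENCES = 10_451
results = [  # (transcription, symbols, diphones found, sentences selected), from the slide
    ("SAMPA", 38, 1008, 211),
    ("PE-SAMPA", 41, 1095, 201),
]
for name, n_symbols, found, sentences in results:
    print(f"{name}: {found}/{n_symbols ** 2} diphones = {coverage(found, n_symbols):.1%}, "
          f"using {sentences}/{TOTAL_SENTENCES} sentences = {sentences / TOTAL_SENTENCES:.1%}")
# SAMPA: 1008/1444 diphones = 69.8%, using 211/10451 sentences = 2.0%
# PE-SAMPA: 1095/1681 diphones = 65.1%, using 201/10451 sentences = 1.9%
```

In both transcriptions, about 2% of the corpus carries all the diphones the extractor found.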

Diphone extractor
- aim: to cut diphones out of the recordings automatically, based on the phone-level annotations of those recordings
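A minimal sketch of deriving diphone boundaries from a phone-level segmentation. It assumes the common convention that a diphone runs from the middle of one phone to the middle of the next, so the transition lies inside the unit; the original extractor may place boundaries differently. Times are in samples, and the segmentation and the `Phone`/`Diphone` types are hypothetical.

```python
from typing import NamedTuple


class Phone(NamedTuple):
    label: str
    start: int  # in samples
    end: int    # in samples


class Diphone(NamedTuple):
    label: str
    start: int  # middle of the left phone
    end: int    # middle of the right phone


def extract_diphones(phones: list[Phone]) -> list[Diphone]:
    """Cut each pair of adjacent phones at their midpoints."""
    diphones = []
    for left, right in zip(phones, phones[1:]):
        mid_left = (left.start + left.end) // 2
        mid_right = (right.start + right.end) // 2
        diphones.append(Diphone(f"{left.label}-{right.label}", mid_left, mid_right))
    return diphones


def first_occurrences(diphones: list[Diphone]) -> dict[str, Diphone]:
    """Keep one token per diphone type: the first one encountered."""
    chosen: dict[str, Diphone] = {}
    for d in diphones:
        chosen.setdefault(d.label, d)
    return chosen


# Hypothetical segmentation of "_ a t a _" at 16 kHz.
segmentation = [
    Phone("_", 0, 3200), Phone("a", 3200, 4800), Phone("t", 4800, 5760),
    Phone("a", 5760, 8000), Phone("_", 8000, 11200),
]
for d in extract_diphones(segmentation):
    print(d.label, d.start, d.end)
# _-a 1600 4000
# a-t 4000 5280
# t-a 5280 6880
# a-_ 6880 9600
```

Keeping the first token per type is the simplest policy; a real extractor might instead pick the token with the most stable pitch or energy.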

Available material
- 1580 sentences from the BOSS corpus
- recorded in a professional recording studio
- male voice, monotonous intonation
- annotated in Polish Extended-SAMPA: automatic annotation with manual correction

[Figure: Diphone extractor architecture]

Diphone extraction results
- SAMPA: 1039 diphones from 1580 sentences
- PE-SAMPA: 1058 diphones from 1580 sentences

Tools combination and evaluation
- 226 sentences recorded by a male speaker
- sentences annotated automatically
- 1002 diphones extracted
- voice created
- total time: ca. 5 hours

Tools combination and evaluation
- original
- fully automatic
- manual correction (micro-voice)

Conclusions
- The phonetically rich sentence extractor and the diphone extractor seem to be indispensable in voice creation.

Acknowledgements
This work was partly funded by:
- the research supervisor project grant No. N N104 119838 to Prof. Grażyna Demenko and the author
- the international cooperation scholarship funded by Bielefeld University, Germany
- the scholarship for scientific achievements funded by the Kulczyk Family Foundation

The author is very grateful to Prof. Grażyna Demenko for providing the text and speech corpora and to Prof. Dafydd Gibbon for his invaluable advice on the system design and implementation.

Thank you!