SPEECH DATA ANALYSIS FOR DIPHONE CONSTRUCTION OF A MÄORI ONLINE TEXT-TO-SPEECH SYNTHESIZER
Mark R. Laws
Visiting Post-Doctoral Research Fellow
Center for Advanced Computing and Virtual Experiments, College of Engineering, University of Hawaii, USA

Abstract

One of the main types of speech processing technologies today is text-to-speech (TTS) synthesis. A well-established synthesis technique called diphone concatenation uses a speaker's processed speech examples to give the TTS system a more human-like response. This methodology has been used to construct diphone databases for many languages, and was the basis for building the first Mäori diphone database. The database was then tested using a number of TTS tools. Integrating the TTS system with an existing internet-based English-Mäori Word Translator has advantages for language learners [1]. The prime motivation is to help people hear the proper pronunciation of translated Mäori words. Utilizing internet technology to distribute the written and spoken language forms widely reflects the multi-disciplinary, cross-purpose nature of this research. Propagation of the Mäori language via the internet can assist its long-term survival through greater external exposure, awareness and usage. This paper brings together theoretical contributions from the literature and practical applications from research projects to construct and test a working model for synthesizing Mäori speech.

Key Words: text-to-speech; synthesis; diphone; database; Mäori language; translator.

1. Introduction

TTS is the artificial conversion of a language from one state (i.e. text) to another (i.e. speech) under the complex conditions associated with human languages and computing processes [2, 3].
These complex issues relate to linguistics, psychology, biology and acoustics, and include mathematical modeling, structured computational programming, system performance, and distribution [4]. Research on TTS synthesis has provided a wealth of documentation, software, tools and systems for theoretical and applied practice. The prime objective of researchers, developers and engineers in this field is high-quality synthetic speech that retains many facets of the language's linguistic, acoustic and articulatory properties, so that it closely resembles meaningful human speech [5]. Recent research suggests that speech synthesis has been one of the fastest growing speech technologies in the computing industry [6]. Demand by a large cross-section of end-users for applications available across all system platforms fuels the drive to improve on current technologies [7]. Many dedicated international conferences, laboratories and collaborative research teams working on TTS synthesis have also contributed to that growth. All the world's major languages have been taken through some such process in one form or another, providing numerous working TTS models [2, 3, 5, 6].

2. Mäori TTS

Unfortunately, minor languages have not had the same TTS treatment as the mega-languages; this is especially true of the Polynesian languages, including Mäori [1, 8, 9]. Identification of the core components required for a Mäori TTS system has therefore only recently been undertaken, by a small number of researchers working in isolation. The results have been small experimental projects testing a number of methods and principles, such as TTS conversion for Mäori [9], a computer pronunciation training system [10], and an experimental Mäori TTS system [11]. To hasten the development of a fully functional Mäori TTS synthesizer, joint collaborative research with several renowned international projects was established [4, 6, 8, 11].
This has allowed the Mäori language to be presented to the wider scientific speech community well in advance of other languages of similar size. The TTS synthesis technique in this research concatenates indexed digitized speech examples from a native speaker, applying a more natural, human-like speech capability to the system. The methodology was specifically designed for the construction of a Mäori diphone database containing speech parameters for pitch and duration.

3. Mäori Diphone Analysis

This section describes the methodology designed for the construction of a Mäori diphone database used with the MBROLA project [3, 6, 8, 11]: a speech synthesizer based on the concatenation of diphones, which reads text via a phoneme-transcribed list appended with
parameters for pitch and duration, to reproduce the best possible synthesized output [3, 6]. The speech data analysis and the construction of a complete Mäori diphone database began with a small test diphone experiment with the MBROLA project, as described in [8, 11]. This section expands on that description to cover the development of the complete diphone database of the Mäori language, outlining the process involved and the guidelines to consider before and during full construction.

4. What is a diphone?

The term seems to be used only within the speech synthesis research community; it is not a common term among phoneticians or phonologists, and there is no entry in the Encyclopedic Dictionary of Language and Linguistics [12] or in the Collins English Dictionary [13] to describe what the word actually means. Still, it is perhaps best described as the set of phonemic transitions between all the possible phoneme pairs used in a particular language. For example, there are over 1,400 diphones for English, 1,200 for French, 1,800 for German and 800 for Spanish, yet just over 230 for Mäori [6, 8]. Each language thus has a different number of diphones, determined by its possible phoneme combinations; as a general rule, the number of diphones in a language is at most the number of phonemes squared. The main feature of the diphone is the phonemic transitional boundary between two adjacent phonemes. This boundary holds the rich co-articulation information that is an important feature of natural continuous speech [3, 4, 6, 14]. "Their main interest in synthesis is that they minimize concatenation problems, since they involve most of the transitions and coarticulations between phones..." [6]. The standard method of deriving all the diphones from a speech data set is to hand-segment the transitions between two neighboring phonemes.
The cut runs from one phoneme's stable-state position (usually its middle), across the articulation transition, to the middle of the next phoneme's stable position. Silence is also classed as a consonant phoneme (labeled "_"), usually taken at the beginning and end of an utterance to match the silences before, between and after words; silence diphones are therefore an important unit within the diphone inventory. Segmenting all the possible diphones of any language is a time-consuming process, so preparation is the key to successfully undertaking this task. Preparing a complete inventory of the diphones took the following points into consideration. The first phase involved transcribing all the consonant-vowel combinations: CV, VC, VV and CC. Identifying all the diphones in Mäori meant mapping all the possible combinations of vowels and consonants, noting that the only CC pairs are those formed by the initial silence before each consonant (see Tables 1, 2 and 3). The phonetic notation used in this paper is SAMPA, as this notation is the standard used by the MBROLA project when working with diphone databases; more details on this and other symbolic coding systems are given in [15].

Table 1: The Mäori diphone inventory matrix showing all the possible CV combinations. Note that these three tables are also used to identify the impossible diphone combinations that are redundant in the language; these are left blank.

CV:   e     i     a     o     u     ei     ai     oi     @u
p     p-e   p-i   p-a   p-o   p-u   p-ei   p-ai   p-oi   p-@u
t     t-e   t-i   t-a   t-o   t-u   t-ei   t-ai   t-oi   t-@u
k     k-e   k-i   k-a   k-o   k-u   k-ei   k-ai   k-oi   k-@u
f     f-e   f-i   f-a   f-o   f-u          f-ai          f-@u
h     h-e   h-i   h-a   h-o   h-u   h-ei   h-ai   h-oi   h-@u
m     m-e   m-i   m-a   m-o   m-u   m-ei   m-ai          m-@u
n     n-e   n-i   n-a   n-o   n-u   n-ei   n-ai   n-oi   n-@u
N     N-e   N-i   N-A   N-o   N-u          N-aI   N-OI   N-@U
r     r-e   r-i   r-a   r-o   r-u   r-ei   r-ai   r-oi   r-@u
w     w-e   w-i   w-a                      w-ai          w-@u

Table 2: The Mäori diphone inventory matrix showing all the possible VV combinations.
VV:   e     i     a     o     u     ei     ai     @U     _
e     e-e         e-a   e-o   e-u                        e-_
i     i-e   i-i   i-a   i-o   i-u          i-ai          i-_
A     A-e         A-A   A-o   A-u          A-aI   A-@U   A-_
o     o-e         o-a   o-o   o-u                        o-_
u     u-e   u-i   u-a   u-o   u-u                        u-_
ei                                                       ei-_
ai    ai-e  ai-i  ai-a  ai-o                             ai-_
OI                OI-A
@U                      @U-o
_     _-e   _-i   _-A   _-o   _-u   _-ei   _-ai   _-@U   _-_
Table 3: All the possible VC combinations of the Mäori diphone inventory matrix.

VC:   p     t     k     f     h     m     n     N     r     w
e     e-p   e-t   e-k   e-f   e-h   e-m   e-n   e-N   e-r   e-w
i     i-p   i-t   i-k   i-f   i-h   i-m   i-n   i-N   i-r   i-w
A     A-p   A-t   A-k   A-f   A-h   A-m   A-n   A-N   A-r   A-w
o     o-p   o-t   o-k   o-f   o-h   o-m   o-n   o-N   o-r   o-w
u     u-p   u-t   u-k   u-f   u-h   u-m   u-n   u-N   u-r   u-w
ei          ei-t  ei-k        ei-h  ei-m  ei-n  ei-N  ei-r
ai    ai-p  ai-t  ai-k  ai-f  ai-h  ai-m  ai-n  ai-N  ai-r  ai-w
OI    OI-p  OI-t  OI-k              OI-m  OI-n
@U                                                          @U-w
_     _-p   _-t   _-k   _-f   _-h   _-m   _-n   _-N   _-r   _-w

The second part of the diphone inventory process was to compile a set of Mäori words, each containing a predetermined diphone; a list of words was therefore created (see Table 4) [8]. The words used for the inventory (or corpus) were randomly selected from the English-Mäori Word Translator database [1]; any selected word containing one of the diphones derived from the three consonant and vowel combination tables above was accepted. As there were 230 diphones (including silence), a minimum of 230 separate words was assigned, one for each individual diphone. Many single words contained more than one diphone, but to reduce confusion the one-word-for-one-diphone criterion was upheld.

Table 4: Examples of the Mäori word inventory with associated diphones and recorded speech files.

Word Number:  Left Phoneme:  Right Phoneme:  Word Utterance:  Wave File:
1             _              _               Silence          ##2216.wav
2             _              @U              auahi            #@u2216.wav
3             _              A               a                #a2216.wav
35            A              _               huia             a#2216.wav
36            A              @U              whakaauraki      a@u2216.wav
37            A              A               a                aa2216.wav
142           n              A               na               na2216.wav
143           N              A               nga              Nga2216.wav
144           n              ai              naihi            nai2216.wav
228           w              ai              wai              wai2216.wav
229           w              e               wero             we2216.wav
230           w              i               wiki             wi2216.wav

The third phase required the 230 selected Mäori words to be recorded as speech files in an environment with very good acoustics (low background noise, with no reverberation or echo), using high-quality, industry-standard sound equipment (i.e. Digital Audio Tape (DAT) technology).
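As a rough sketch of the inventory-building step described in the combination tables above, the candidate CV, VC, VV and silence pairs can be enumerated programmatically; this is illustrative only (the actual inventory was finalized by hand), the SAMPA phoneme lists are taken from Tables 1-3, and the count is an upper bound before the impossible combinations are pruned.

```python
from itertools import product

# SAMPA phoneme sets used in this paper (cf. Tables 1-3).
consonants = ["p", "t", "k", "f", "h", "m", "n", "N", "r", "w"]
vowels = ["e", "i", "A", "o", "u", "ei", "ai", "OI", "@U"]
silence = "_"

diphones = set()

# CV and VC pairs.
for c, v in product(consonants, vowels):
    diphones.add(f"{c}-{v}")
    diphones.add(f"{v}-{c}")

# VV pairs.
for v1, v2 in product(vowels, repeat=2):
    diphones.add(f"{v1}-{v2}")

# Silence pairs: silence before every phoneme (the only "CC" case
# is silence-consonant), silence after every vowel, and _-_ itself.
for p in consonants + vowels:
    diphones.add(f"{silence}-{p}")
for v in vowels:
    diphones.add(f"{v}-{silence}")
diphones.add(f"{silence}-{silence}")

print(len(diphones))  # 290 candidates before pruning impossible pairs
```

The enumeration yields 290 candidate pairs; removing the combinations that never occur in Mäori (the blank cells of Tables 1-3) brings the inventory down to the "just over 230" diphones reported in the paper.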
The native speaker first practiced the intelligibility and pronunciation of all the words. The next step required the speaker to pronounce all the words at a constant pitch, to enhance the analysis-resynthesis procedure undertaken by MBROLA [6]. To maintain constant pitch throughout the entire session, a recording of a single vowel utterance (e.g. "A") on a separate recorder was used as an audible reference; this was played before every fifth word utterance to help the speaker keep a steady pitch. The speaker had to speak very intelligibly, but not so slowly that the co-articulation between the phonemes was lost. Because Mäori vowels are very stable in all positions within a word (initial, medial and final), no severe co-articulation problems were encountered. It was also very important to use only one recording session, for consistency with the equalization steps involved in the segmentation process. The speech utterances were saved directly to DAT at 48 kHz, 16-bit mono. Once the recording session was completed, the speech was transferred to a computer-readable file, which was normalized and then down-sampled to 22,050 Hz, 16-bit mono. The file was then divided into individually labeled word units and saved as separate, manageable wave files, which made the diphone segmentation process an easier task to perform. The actual segmentation and labeling of the diphones was the most painstaking part of this work. Figure 1 shows how the diphones were processed using an application called Diphone Studio, which was specifically designed to assist with the construction of diphone databases for MBROLA [16]. Each file was hand-segmented with three sampling points indicating the left, middle and right boundaries: the start of the first phoneme sample, the crossover point between the two phonemes, and the end of the second phoneme sample (see Table 5).
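The three-point segmentation just described can be expressed as a small data structure; this is an illustrative sketch, not Diphone Studio's actual format, and the label, boundary values and dummy waveform below are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class DiphoneSegment:
    """One hand-segmented diphone: three boundary points, in samples."""
    label: str   # e.g. "A-f"
    left: int    # start: stable region of the first phoneme
    middle: int  # crossover point between the two phonemes
    right: int   # end: stable region of the second phoneme

def extract(samples: list[int], seg: DiphoneSegment) -> list[int]:
    """Cut the diphone span [left, right) out of a word's waveform.
    The middle boundary is kept as metadata: the synthesizer joins
    diphones at their stable outer edges, not at the crossover."""
    if not (0 <= seg.left < seg.middle < seg.right <= len(samples)):
        raise ValueError(f"inconsistent boundaries for {seg.label}")
    return samples[seg.left:seg.right]

# Hypothetical 0.5 s word recording at 22,050 Hz (dummy sample values)
# and an invented segmentation of the diphone A-f.
word = list(range(11025))
seg = DiphoneSegment("A-f", 2000, 4500, 7000)
cut = extract(word, seg)
print(len(cut))  # prints 5000
```

Validating that left < middle < right on every cut is a cheap way to catch mislabeled boundary points before they reach the database.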
Figure 1: An example of the diphone unit A-f being segmented from the word awhina; the sampling points are shown in milliseconds.

Table 5: Examples of the Mäori diphone data text file generated by the Diphone Studio segmentation tool. Note, a full listing is published in [8].

Diphone File:  Diphone Label:  Left Boundary:  Middle Boundary:  Right Boundary:
d0.raw         _-_
d1.raw         _-@U
d2.raw         _-A
d100.raw       h-ei
d101.raw       h-i
d102.raw       h-o
d227.raw       w-ai
d228.raw       w-e
d229.raw       w-i

To monitor the quality of the diphone segmentation, the pitch and the energy, each diphone was checked by testing it in a number of words/phrases (e.g. A haka mana para tawaa NA fa); the level and tone of each diphone should not vary too far in pitch and energy. This first evaluation step gave only a very crude reproduction of the speech, but as a check of all adjoining diphones it was very successful: the diphones performed well within the specifications required by the MBROLA and Diphone Studio documentation, and were approved by the native speaker. Post-processing compiled the entire inventory into: i) a data text file containing all the diphone details; ii) the individual diphones, extracted from their original word examples and saved in .raw format (i.e. headerless wave files containing no other linguistic information). The diphone database was then sent to the MBROLA project team for further processing, where the data analysis and re-synthesis procedures were performed; the compiled database was then returned for testing and evaluation.

5. Testing the Mäori Diphone Database

The compiled (binary) Mäori diphone database, named mb1, was initially tested using the MBROLIGN speech analysis and synthesis tool [6]. MBROLIGN is a speech and phonetic alignment program that generates files based on prosody (i.e. mainly pitch and duration).
This tool can automatically extract prosodic features from digitized speech examples and align the phoneme labels from the text example with pitch and duration vectors. These prosodic files can be read and, by calling the correct diphones, used to produce the appropriate high-quality speech synthesis output [6]. This tool is ideal for testing all the
diphones in the database in various diphone-to-diphone combinations, including manual adjustments to the prosodic vectors to fine-tune the speech output for each utterance. For example, Figure 2 and Table 6 illustrate the process of testing all the diphones required from the database to construct the phrase kia ora. First, a speech file and the phonetic transcription of the diphones are loaded; then a synthesized version of the phrase is generated and saved as an output file, along with the prosodic file containing the duration and pitch points. The prosodic information for each diphone contains its duration in milliseconds, and an optional series of pitch pattern points, each composed of two floating-point numbers: the position of the pitch pattern point within the diphone (as a percentage of its total duration), and the pitch value (in Hz) at that position [14].

Figure 2: The MBROLIGN test toolbox used for the phrase kia ora, showing the original speech file example (kiaora.wav), the phonetic transcription, the synthesised output file (kiaoras.wav) and the pitch rise/fall analysis.

Table 6: The phrase kia ora requires the following diphone examples and prosodic features. MBROLIGN generates parameters for each diphone based on the pitch analysis of the original speech file example and saves these in an output file (kiaora.pho).

Diphone  Diphone  Duration  Pitch Point-1  Pitch Value-1  Pitch Point-2  Pitch Value-2
Name:    Units:   (msec)    (%)            (Hz)           (%)            (Hz)
_        _-_
k        _-k
i        k-i
A        i-a
_        A-_
o        _-o
r        o-r
A        r-a
_        A-_

These initial test results were very acceptable, given that the duration and pitch pattern points were adjusted to compensate for slight variations in the quality of some diphone units. The synthesized output wave files all bore a very close resemblance to the original Mäori speaker's voice, a reproduction that far exceeds all other attempts made with current TTS synthesis tools to pronounce Mäori with the correct intonation [8].
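The per-phoneme prosodic commands just described can be written out in the textual form MBROLA-style tools read: one phoneme per line, giving its name, its duration in milliseconds, and optional position/pitch pairs. The phoneme sequence below follows Table 6, but every duration and pitch value is invented for illustration.

```python
# Phoneme sequence for "kia ora" (SAMPA, as in Table 6). Each entry is
# (phoneme, duration in ms, [(position %, pitch Hz), ...]); all numeric
# values here are hypothetical, not the paper's measured parameters.
phonemes = [
    ("_", 100, []),
    ("k", 80, []),
    ("i", 120, [(50, 130)]),
    ("A", 150, [(20, 135), (80, 120)]),
    ("_", 60, []),
    ("o", 130, [(50, 125)]),
    ("r", 70, []),
    ("A", 160, [(30, 120), (90, 100)]),
    ("_", 100, []),
]

def to_pho(seq):
    """Render one .pho-style line per phoneme: name, duration (ms),
    then flattened (position %, pitch Hz) pairs."""
    lines = []
    for name, dur, pitch in seq:
        parts = [name, str(dur)]
        for pos, hz in pitch:
            parts += [str(pos), str(hz)]
        lines.append(" ".join(parts))
    return "\n".join(lines)

print(to_pho(phonemes))
```

With a compiled voice such as mb1, a file in this format is what drives the synthesizer to produce the output waveform for the utterance.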
Although the MBROLA synthesizer implements a cross-platform working system that can access many diphone databases to synthesize those languages with high-quality speech, it has one setback: it is not a complete TTS package, as every word, phrase or sentence requires a prosodic file to be generated before it can be synthesized. A text-to-phonetic processing system is therefore also required to enable the automatic generation of the prosodic commands.

6. Festival Speech Synthesis System

The Festival Speech Synthesis System [17] integrates with MBROLA to deliver a complete TTS system ready for installation with other applications, such as the English-Mäori Online Word Translator [1]. Many MBROLA users prefer the Festival system because it offers full TTS compatibility with their current diphone databases. Festival now also offers the option of building new voices based on either an already supported language or a new one [18]. Experiments to build a Festival diphone database from the existing Mäori segmented and labeled diphones have already begun [8].

7. Conclusion

Fundamental computational linguistics provided the model for building a working diphone database for a TTS system based on the MBROLA project. Utilizing a variety of specialized tools to create, analyze and then test the collection of customized files and scripts sped up the development cycle, and the technical assistance of a team of local and international researchers, when appropriate or when requested, was extremely valuable. It is apparent that the entire process of building a functional TTS system for Mäori has opened up further research issues that need to be addressed in the near future: issues associated with advancing the prosodic parameters of duration and pitch, with appropriate tokens for breaks, stress placement, syllable prediction, intonation and lexical rules.
All these are rich in the domain of computational linguistic problem-solving tasks [19]. The construction of the Mäori diphone database is the first step towards achieving this.

8. Acknowledgements

The New Zealand Foundation for Research, Science and Technology, for the Connectionist-based Intelligent Information Systems grants (OU0808 and AUTX0201) and the Post-Doctoral Science and Technology Fellowship (AITX0205).

References

[1] M. Laws, R. Kilgour, M. Watts, "Analysis of the New Zealand and Mäori On-Line Translator," Proceedings of the Fifth Joint Conference on Information Sciences, Atlantic City, NJ, USA.
[2] B. Möbius, J. Schroeter, J. van Santen, R. Sproat, J. Olive, "Recent Advances in Multilingual Text-To-Speech Synthesis," AT&T Bell Laboratories, Murray Hill, New Jersey.
[3] T. Dutoit, An Introduction to Text-to-Speech Synthesis, Dordrecht, Kluwer Academic Publishers.
[4] R. Sproat, "Multilingual Text Analysis for Text-to-Speech Synthesis," 12th European Conference on Artificial Intelligence.
[5] J.P.H. van Santen, R. Sproat, J. Olive (Eds), Speech Synthesis, Heidelberg, Springer Verlag.
[6] T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. Van Der Vrecken, "The MBROLA Project: Towards a Set of High-Quality Speech Synthesizers Free of Use for Non-Commercial Purposes," ICSLP 96, Philadelphia.
[7] D.G. Stork, HAL's Legacy: 2001's Computer as Dream and Reality, Cambridge, Massachusetts, The MIT Press.
[8] M.R. Laws (2001), Mäori Language Integration in the Age of Information Technology: A Computational Approach, Doctor of Philosophy Thesis, University of Otago, Dunedin, New Zealand.
[9] K.P.H. Sullivan, "Text-to-speech conversion for Mäori: a choice of methods?," Department of Phonetics, Umeå University.
[10] P. Kingi (1992), Te whakahuatia reo Mäori / A Mäori Speech Training System, Unpublished BSc Dissertation, University of Waikato, New Zealand.
[11] M.R.
Laws, "Development of a Mäori Database for Speech Perception and Generation," 5th Joint Conference on Information Sciences, Atlantic City, NJ.
[12] D. Crystal, An Encyclopedic Dictionary of Language and Linguistics, London, Penguin.
[13] P. Hanks, T.H. Long, L. Urdang (Eds), Collins Dictionary of the English Language: An Extensive Coverage of Contemporary International and Australian English, Collins, Sydney.
[14] F. Malfrère, T. Dutoit, "Speech Synthesis for Text-to-Speech Alignment and Prosodic Feature Extraction," Proceedings of the International Symposium on Circuits and Systems, 1997.
[15] J. Wells, "SAMPA: Computer Readable Phonetic Alphabet," Department of Phonetics and Linguistics, University College London.
[16] A. Dirksen, L. Menert, Diphone Studio Manual, Utrecht, Netherlands.
[17] A. Black, P. Taylor, R. Caley, The Festival Speech Synthesis System, Centre for Speech Technology Research, University of Edinburgh, Scotland.
[18] A. Black, K. Lenzo, Building Voices in the Festival Speech Synthesis System, Carnegie Mellon University.
[19] R. Sproat, "Algorithms for Speech Recognition and Language Processing: Part III, Finite State Methods in Language Processing," COLING '96.
More informationWinPitch LTL II, a Multimodal Pronunciation Software
WinPitch LTL II, a Multimodal Pronunciation Software Philippe MARTIN UFRL Université Paris 7 92, Ave. de France 75013 Paris, France philippe.martin@linguist.jussieu.fr Abstract We introduce a new version
More informationPhonetic Perception and Pronunciation Difficulties of Russian Language (From a Canadian Perspective) Alyssa Marren
The Arbutus Review, Vol. 2, No. 1 (2011) 75 Phonetic Perception and Pronunciation Difficulties of Russian Language (From a Canadian Perspective) Alyssa Marren Abstract: This study looked at the most important
More informationAnalysis and Synthesis of Hypo and Hyperarticulated Speech
Analysis and Synthesis of and articulated Speech Benjamin Picart, Thomas Drugman, Thierry Dutoit TCTS Lab, Faculté Polytechnique (FPMs), University of Mons (UMons), Belgium {benjamin.picart,thomas.drugman,thierry.dutoit}@umons.ac.be
More informationReading Competencies
Reading Competencies The Third Grade Reading Guarantee legislation within Senate Bill 21 requires reading competencies to be adopted by the State Board no later than January 31, 2014. Reading competencies
More informationSOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS
SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University
More informationCURRICULUM VITAE. Toby Macrae, Ph.D., CCC-SLP
CURRICULUM VITAE Toby Macrae, Ph.D., CCC-SLP Assistant Professor School of Communication Science and Disorders Florida State University 201 W. Bloxham Street Tallahassee, Florida 32306-1200 toby.macrae@cci.fsu.edu
More informationPronunciation in English
The Electronic Journal for English as a Second Language Pronunciation in English March 2013 Volume 16, Number 4 Title Level Publisher Type of product Minimum Hardware Requirements Software Requirements
More informationFunctional Auditory Performance Indicators (FAPI)
Functional Performance Indicators (FAPI) An Integrated Approach to Skill FAPI Overview The Functional (FAPI) assesses the functional auditory skills of children with hearing loss. It can be used by parents,
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationEnterprise Voice Technology Solutions: A Primer
Cognizant 20-20 Insights Enterprise Voice Technology Solutions: A Primer A successful enterprise voice journey starts with clearly understanding the range of technology components and options, and often
More informationVOICE RECOGNITION KIT USING HM2007. Speech Recognition System. Features. Specification. Applications
VOICE RECOGNITION KIT USING HM2007 Introduction Speech Recognition System The speech recognition system is a completely assembled and easy to use programmable speech recognition circuit. Programmable,
More informationCanalis. CANALIS Principles and Techniques of Speaker Placement
Canalis CANALIS Principles and Techniques of Speaker Placement After assembling a high-quality music system, the room becomes the limiting factor in sonic performance. There are many articles and theories
More informationOn the distinction between 'stress-timed' and 'syllable-timed' languages Peter Roach
(paper originally published in Linguistic Controversies, ed. D. Crystal, 1982, pp. 73-79. The paper is now badly out of date, but since it is still frequently cited I feel it is worth making it available
More informationTOOLS FOR RESEARCH AND EDUCATION IN SPEECH SCIENCE
TOOLS FOR RESEARCH AND EDUCATION IN SPEECH SCIENCE Ronald A. Cole Center for Spoken Language Understanding, Univ. of Colorado, Boulder ABSTRACT The Center for Spoken Language Understanding (CSLU) provides
More informationSASSC: A Standard Arabic Single Speaker Corpus
SASSC: A Standard Arabic Single Speaker Corpus Ibrahim Almosallam, Atheer AlKhalifa, Mansour Alghamdi, Mohamed Alkanhal, Ashraf Alkhairy The Computer Research Institute King Abdulaziz City for Science
More informationMembering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN
PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,
More informationBBC Learning English - Talk about English July 18, 2005
BBC Learning English - July 18, 2005 About this script Please note that this is not a word for word transcript of the programme as broadcast. In the recording and editing process changes may have been
More informationAspects of North Swedish intonational phonology. Bruce, Gösta
Aspects of North Swedish intonational phonology. Bruce, Gösta Published in: Proceedings from Fonetik 3 ; Phonum 9 Published: 3-01-01 Link to publication Citation for published version (APA): Bruce, G.
More informationTeaching and Learning Mandarin Tones. 19 th May 2012 Rob Neal
Teaching and Learning Mandarin Tones 19 th May 2012 Rob Neal Aims of the presentation Reflect on why tones are so challenging for Anglophone learners Review several empirical studies which have examined
More informationBachelors of Science Program in Communication Disorders and Sciences:
Bachelors of Science Program in Communication Disorders and Sciences: Mission: The SIUC CDS program is committed to multiple complimentary missions. We provide support for, and align with, the university,
More informationIndiana Department of Education
GRADE 1 READING Guiding Principle: Students read a wide range of fiction, nonfiction, classic, and contemporary works, to build an understanding of texts, of themselves, and of the cultures of the United
More informationTHE MEASUREMENT OF SPEECH INTELLIGIBILITY
THE MEASUREMENT OF SPEECH INTELLIGIBILITY Herman J.M. Steeneken TNO Human Factors, Soesterberg, the Netherlands 1. INTRODUCTION The draft version of the new ISO 9921 standard on the Assessment of Speech
More informationNATURAL SOUNDING TEXT-TO-SPEECH SYNTHESIS BASED ON SYLLABLE-LIKE UNITS SAMUEL THOMAS MASTER OF SCIENCE
NATURAL SOUNDING TEXT-TO-SPEECH SYNTHESIS BASED ON SYLLABLE-LIKE UNITS A THESIS submitted by SAMUEL THOMAS for the award of the degree of MASTER OF SCIENCE (by Research) DEPARTMENT OF COMPUTER SCIENCE
More informationTHE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE*
THE COLLECTION AND PRELIMINARY ANALYSIS OF A SPONTANEOUS SPEECH DATABASE* Victor Zue, Nancy Daly, James Glass, David Goodine, Hong Leung, Michael Phillips, Joseph Polifroni, Stephanie Seneff, and Michal
More informationGenetic Algorithms and Sudoku
Genetic Algorithms and Sudoku Dr. John M. Weiss Department of Mathematics and Computer Science South Dakota School of Mines and Technology (SDSM&T) Rapid City, SD 57701-3995 john.weiss@sdsmt.edu MICS 2009
More informationMathematical modeling of speech acoustics D. Sc. Daniel Aalto
Mathematical modeling of speech acoustics D. Sc. Daniel Aalto Inst. Behavioural Sciences / D. Aalto / ORONet, Turku, 17 September 2013 1 Ultimate goal Predict the speech outcome of oral and maxillofacial
More informationStrand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting
More informationMergers in Produc.on and Percep.on. Ka.e Drager (University of Hawai i at Mānoa) Jennifer Hay (University of Canterbury)
Mergers in Produc.on and Percep.on Ka.e Drager (University of Hawai i at Mānoa) Jennifer Hay (University of Canterbury) Big huge thank you to: Our collaborators: Paul Warren, Bryn Thomas, and Rebecca Clifford
More informationEmpowering. American Translation Partners. We ll work with you to determine whether your. project needs an interpreter or translator, and we ll find
American Translation Partners Empowering Your Organization Through Language. We ll work with you to determine whether your project needs an interpreter or translator, and we ll find the right language
More informationThe. Languages Ladder. Steps to Success. The
The Languages Ladder Steps to Success The What is it? The development of a national recognition scheme for languages the Languages Ladder is one of three overarching aims of the National Languages Strategy.
More informationCircuits and Boolean Expressions
Circuits and Boolean Expressions Provided by TryEngineering - Lesson Focus Boolean logic is essential to understanding computer architecture. It is also useful in program construction and Artificial Intelligence.
More informationSPEECH AUDIOMETRY. @ Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology)
1 SPEECH AUDIOMETRY Pure tone Audiometry provides only a partial picture of the patient s auditory sensitivity. Because it doesn t give any information about it s ability to hear and understand speech.
More informationNFL Quarterback Bernie Kosar told
RESEARCH PAPER VOLUME 1 Why It Is Important to Teach Phonemic Awareness and Alphabet Recognition by Dr. Cathy Collins Block Professor of Education Texas Christian University NFL Quarterback Bernie Kosar
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationInformation and Communication Technology and the Criminal Justice System in 10 years time
Information and Communication Technology and the Criminal Justice System in 10 years time Laurens Cloete CSIR Information Society Technologies Centre (ISTC) Presented at: A New Decade of Criminal Justice
More informationFrom Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs
From Portuguese to Mirandese: Fast Porting of a Letter-to-Sound Module Using FSTs Isabel Trancoso 1,Céu Viana 2, Manuela Barros 2, Diamantino Caseiro 1, and Sérgio Paulo 1 1 L 2 F - Spoken Language Systems
More informationMultichannel Audio Line-up Tones
EBU Tech 3304 Multichannel Audio Line-up Tones Status: Technical Document Geneva May 2009 Tech 3304 Multichannel audio line-up tones Contents 1. Introduction... 5 2. Existing line-up tone arrangements
More informationLecture 12: An Overview of Speech Recognition
Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated
More informationCHARTES D'ANGLAIS SOMMAIRE. CHARTE NIVEAU A1 Pages 2-4. CHARTE NIVEAU A2 Pages 5-7. CHARTE NIVEAU B1 Pages 8-10. CHARTE NIVEAU B2 Pages 11-14
CHARTES D'ANGLAIS SOMMAIRE CHARTE NIVEAU A1 Pages 2-4 CHARTE NIVEAU A2 Pages 5-7 CHARTE NIVEAU B1 Pages 8-10 CHARTE NIVEAU B2 Pages 11-14 CHARTE NIVEAU C1 Pages 15-17 MAJ, le 11 juin 2014 A1 Skills-based
More informationAudio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
More informationDegree of highness or lowness of the voice caused by variation in the rate of vibration of the vocal cords.
PITCH Degree of highness or lowness of the voice caused by variation in the rate of vibration of the vocal cords. PITCH RANGE The scale of pitch between its lowest and highest levels. INTONATION The variations
More informationSDL BeGlobal: Machine Translation for Multilingual Search and Text Analytics Applications
INSIGHT SDL BeGlobal: Machine Translation for Multilingual Search and Text Analytics Applications José Curto David Schubmehl IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200
More informationWhat Is Linguistics? December 1992 Center for Applied Linguistics
What Is Linguistics? December 1992 Center for Applied Linguistics Linguistics is the study of language. Knowledge of linguistics, however, is different from knowledge of a language. Just as a person is
More informationPerceptual experiments sir-skur-spur-stir
Perceptual experiments sir-skur-spur-stir Amy Beeston & Guy Brown 19 May 21 1 Introduction 2 Experiment 1: cutoff Set up Results 3 Experiment 2: reverse Set up Results 4 Discussion Introduction introduction
More informationSYSTEM DESIGN AND THE IMPORTANCE OF ACOUSTICS
SYSTEM DESIGN AND THE IMPORTANCE OF ACOUSTICS n Will your communication or emergency notification system broadcast intelligible speech messages in addition to alarm tones? n Will your system include multiple
More informationAlignment of the Hawaii Preschool Content Standards With HighScope s Preschool Child Observation Record (COR), 2nd edition
Alignment of the Hawaii Preschool Content Standards With HighScope s Preschool Child Observation Record (COR), 2nd edition The following chart shows how items from the Hawaii Preschool Content Standards
More informationReading Specialist (151)
Purpose Reading Specialist (151) The purpose of the Reading Specialist test is to measure the requisite knowledge and skills that an entry-level educator in this field in Texas public schools must possess.
More informationCBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC
CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC 1. INTRODUCTION The CBS Records CD-1 Test Disc is a highly accurate signal source specifically designed for those interested in making
More informationTEACHER CERTIFICATION STUDY GUIDE LANGUAGE COMPETENCY AND LANGUAGE ACQUISITION
DOMAIN I LANGUAGE COMPETENCY AND LANGUAGE ACQUISITION COMPETENCY 1 THE ESL TEACHER UNDERSTANDS FUNDAMENTAL CONCEPTS AND KNOWS THE STRUCTURE AND CONVENTIONS OF THE ENGLISH LANGUAGE Skill 1.1 Understand
More informationSchool of Computer Science
School of Computer Science Computer Science - Honours Level - 2014/15 October 2014 General degree students wishing to enter 3000- level modules and non- graduating students wishing to enter 3000- level
More information(in Speak Out! 23: 19-25)
1 A SMALL-SCALE INVESTIGATION INTO THE INTELLIGIBILITY OF THE PRONUNCIATION OF BRAZILIAN INTERMEDIATE STUDENTS (in Speak Out! 23: 19-25) Introduction As a non-native teacher of English as a foreign language
More informationKeywords academic writing phraseology dissertations online support international students
Phrasebank: a University-wide Online Writing Resource John Morley, Director of Academic Support Programmes, School of Languages, Linguistics and Cultures, The University of Manchester Summary A salient
More informationPTE Academic. Score Guide. November 2012. Version 4
PTE Academic Score Guide November 2012 Version 4 PTE Academic Score Guide Copyright Pearson Education Ltd 2012. All rights reserved; no part of this publication may be reproduced without the prior written
More informationA CHINESE SPEECH DATA WAREHOUSE
A CHINESE SPEECH DATA WAREHOUSE LUK Wing-Pong, Robert and CHENG Chung-Keng Department of Computing, Hong Kong Polytechnic University Tel: 2766 5143, FAX: 2774 0842, E-mail: {csrluk,cskcheng}@comp.polyu.edu.hk
More informationPERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS
The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS Dan Wang, Nanjie Yan and Jianxin Peng*
More information20 by Renaissance Learning, Inc. All rights reserved. Printed in the United States of America.
R4 Advanced Technology for, Renaissance, Renaissance Learning, Renaissance Place, STAR Early Literacy, STAR Math, and STAR Reading, are trademarks of Renaissance Learning, Inc., and its subsidiaries, registered,
More informationRunning head: PROCESSING REDUCED WORD FORMS. Processing reduced word forms: the suffix restoration effect. Rachèl J.J.K. Kemps, Mirjam Ernestus
Running head: PROCESSING REDUCED WORD FORMS Processing reduced word forms: the suffix restoration effect Rachèl J.J.K. Kemps, Mirjam Ernestus Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands
More informationModule Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
More informationBBC Learning English - Talk about English July 11, 2005
BBC Learning English - July 11, 2005 About this script Please note that this is not a word for word transcript of the programme as broadcast. In the recording and editing process changes may have been
More information