SASSC: A Standard Arabic Single Speaker Corpus
Ibrahim Almosallam, Atheer AlKhalifa, Mansour Alghamdi, Mohamed Alkanhal, Ashraf Alkhairy
The Computer Research Institute, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
{ialmosallam,aalkhalifa,mghamdi,alkanhal,alkhairy}@kacst.edu.sa

Abstract
This paper describes the process of collecting and recording a large-scale Arabic single-speaker speech corpus. The collection and recording of the corpus were supervised by professional linguists; the speech was read by a professional speaker in a soundproof studio, captured with specialized equipment and stored in high-quality formats. The pitch signal of the speaker (EGG) was also recorded and synchronized with the speech signal. Care was taken to ensure the quality and diversity of the read text, so as to maximize the coverage and combination of words and phonemes. The corpus consists of 51 thousand words that required 7 hours of recording, and it is freely available for academic and research purposes.

Index Terms: Text-to-Speech, Arabic Speech Corpus

1. Introduction
In the last few years, an increasing number of research projects in Natural Language Processing (NLP) have developed an interest in the Arabic language. Arabic is the official language in more than 21 countries and has two major forms: Standard Arabic and Dialectal Arabic. While Dialectal Arabic is the native language of most Arabic speakers today, Standard Arabic is the formal language used in education, culture and media. Standard Arabic comprises Modern Standard Arabic (MSA) and Classical Arabic. Though MSA is derived from Classical Arabic, the language of the Qur'an (Islam's Holy Book), it is more simplified [1]. This paper describes the process of building an Arabic speech corpus, with the aim of collecting large amounts of Arabic speech data for research purposes.
The availability of such corpora without copyright restrictions is important for Arab and non-Arab researchers looking for speech resources of contemporary Arabic. MSA was the language of choice because it represents the literary standard across the Arab world, while the supply of Arabic corpora designed for speech synthesis and recognition research remains insufficient. Despite the existence of several Arabic speech corpora developed in the last few years, a diverse, fully transcribed and segmented MSA speech corpus is a necessity for Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) research and development.

The presented corpus was designed to contain a large selection of text collected from different modern text resources, chosen to capture the phonetic distribution of the Arabic language. Although the overwhelming majority of the corpus is in MSA, a designated subcategory was recorded in traditional, or Classical, Arabic. The text was then revised and manually diacritized (Tashkeel) by linguists. Afterward, the voice and pitch of a professional speaker were captured using specialized equipment and recorded in high-quality formats. The transcriptions were revised again to match the pronunciation of the spoken recordings. The recordings were then divided into phrases at each sentence's pauses and stored in separate files. Finally, the data was phonetically segmented and labeled in preparation for TTS and ASR use.

The paper is organized as follows. In section 2, we present and discuss related work. Section 3 outlines the constitution of the corpus, and section 4 describes the recording procedure. The corpus evaluation procedure is described in section 5. Finally, in section 6 we conclude and discuss the availability of the dataset.

2. Related Work
The quality of speech synthesis systems depends heavily on the phoneme-set distribution of the underlying corpus.
According to [1], a survey of existing Arabic speech datasets shows the lack of a public-domain, raw-data, single-speaker MSA corpus. Similar corpora are either copyrighted, accented [2] [3], or multi-speaker [4]. Most available datasets are designed for speech recognition purposes, whereas TTS systems mostly rely on commercial corpora or record a dedicated speech database according to the system's needs [5]. The Linguistic Data Consortium (LDC) has a collection of MSA and dialectal Arabic speech and text datasets for research and development purposes. The content varies between annotated news, conversational telephone speech [6] [7] and others, scripted and unscripted, recorded under different conditions. On the other hand, Speech Ocean offers an Arabic speech synthesis database (King-TTS-004) recorded in a studio [8]. It contains recordings of a single native professional broadcaster at a 16 kHz sampling rate with two channels, one for the speech signal and one for the Electroglottograph (EGG) signal. It includes 12 hours of pure recording time covering 3930 sentences, with sub-databases for currency, time, numbers, the English alphabet, and so on. However, these databases are licensed and can only be used for a fee.

Access to fully available, transcribed and segmented speech corpora is essential in speech and language research. As an example, [9] built speech databases for 7 Indian languages and released them without restriction for commercial and noncommercial use. The text was selected from Wikipedia articles in the Indian languages, due to the lack of public-domain text corpora. Afterwards, a set of 1000 phonetically balanced sentences was nominated using a Festvox script that applies certain
criteria to achieve the optimal selection. The data was recorded by native speakers of these languages in a professional recording studio, and the transcriptions were altered to accommodate any mistakes made during recording. The databases were between one and two hours long each, and the recordings were automatically segmented afterward using the Zero Frequency Filtering (ZFF) technique.

Another important source of data in speech and language research is the Electroglottograph (EGG) signal. The EGG, also known as the Laryngograph signal, is a measurement of vocal fold vibration during speech, sensed by electrodes attached to the speaker's throat. It is important in medical research, for example on voice disorders and laryngeal mechanisms. Moreover, in NLP it plays an important role in speech recognition and synthesis: it can act as an additional source of information in word recognition [10], or as a means of detecting glottal pulses (pitch marks) to ensure that speech is synthesized in a consistent manner. Likewise, [11] introduced a high-quality Romanian speech corpus called RSS, also available for academic use to help promote Romanian speech technology research. The data was recorded at a 96 kHz sampling frequency and then down-sampled to 48 kHz using an over-sampling method with the intention of noise reduction. However, the total length of the recorded data is only about 3.5 hours, based on 3500 sentences. Additionally, the article discussed the effect of sampling frequency on the speaker similarity of synthesized voices; it found that using lower sampling frequencies such as 16 kHz lowers the similarity of the synthesized voice to the original speaker.

3. Constitution of the Corpus

Text Selection
The text corpus included a variety of genres and writing styles from multiple sources to ensure a representative sample.
Furthermore, diversity of pronunciation was also taken into consideration during the text selection process, to capture the entire phoneme spectrum of the Arabic language. The corpus is divided into sub-categories to capture a wide variety of vocabulary and to enable users of the dataset to extract exactly the relevant phrases if necessary. For example, numbers, dates and times are very common in most applications. The dataset is divided into nine such categories:

1. Dates and Time: Phrases about time, such as dates, time of day, days of the week and months of the year, in both the Hijri and Gregorian calendars.
2. Numbers: The speaker uttered several numbers in isolation as well as in combined forms. For example, the number 123 was pronounced as "one two three" and as "one hundred twenty-three".
3. Financial: Phrases about money and currency.
4. Customer Service: Phrases common in IVR systems (question-and-answer format).
5. Names: A list of common Arabic names (first, middle and last).
6. Story: Excerpts from novels and stories.
7. Traditional: Text selected from classical sources such as old books.
8. Miscellaneous: A collection of phrases from various domains and writing styles.
9. News: Excerpts from local and foreign news.

Table 3: Total, unique and median frequencies of syllables (333,981 total, 627 unique), tri-phones (324,225 total, 8,689 unique, median 7), bi-phones (335,769 total) and mono-phones.

The text corpus consists of 51,432 words, of which 21,556 are unique, and required 7 hours and 20 minutes of audio recording. It contains 627 unique syllables out of a total of 333,981 syllables. The breakdown by category is provided in Table 1. All of the above-mentioned categories were recorded in a normal tone of voice.
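Word-level statistics like those above (total tokens versus unique types) reduce to simple frequency counting. A minimal sketch over toy stand-in utterances, not actual SASSC data:

```python
# Sketch: total word count and unique-word count for a list of
# utterances, the kind of statistic reported for the corpus.
from collections import Counter

def corpus_stats(utterances):
    """Return (total tokens, unique types, per-word counts)."""
    counts = Counter(word for utt in utterances for word in utt.split())
    total = sum(counts.values())
    return total, len(counts), counts

utterances = [
    "one two three",
    "one hundred twenty three",
]
total, unique, counts = corpus_stats(utterances)
print(total, unique)          # 7 total tokens, 5 unique types
print(counts.most_common(2))  # most frequent words first
```

The same counter extends directly to syllables or phoneme n-grams once the transcriptions are tokenized at that level.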
However, a part of the miscellaneous text was additionally recorded with different expressions, namely sadness, joy, surprise and questioning, for 15 minutes each.

Transcription
A professional linguist was closely involved during both text selection and recording to verify the quality of the text and its pronunciation. Furthermore, unlike most other languages, Arabic can only be pronounced unambiguously when diacritized. Because of this, most applications rely on automatic Arabic diacritization as a pre-processing step to transcription [12]. Here, the text was manually diacritized and revised to ensure unambiguity. An automatic text-to-transcription algorithm was used to transcribe the text [13]; the entire output was then revised manually. The total number of phonemes used to transcribe the corpus is 38, shown in Table 2. Statistics on the phoneme and syllable distributions in the corpus are provided in Table 3, and the frequency distribution of the phonemes is shown in Figure 1.

4. Recording Procedure

Speaker Selection
Due to the nature and size of the corpus, a professional speaker who could maintain his performance over long recording sessions was hired. Although diacritization reduces ambiguity, it makes reading more challenging: diacritized text forces the reader to pay close attention to the symbols at the character level, which slows down reading and makes it more difficult, whereas in normal situations readers predict the words they are reading from context and experience. Ten candidates were screened and tested to verify their proficiency in the Arabic language, clarity of pronunciation, and voice. A panel of linguists voted on the first two criteria, while the pleasantness of the voice was assessed with the aid of online voting.
The speaker, selected based on these criteria, is a professional male news anchor who has worked on several TV programs and has experience in recording poetry books.

Recording Environment
The recording sessions took place in a soundproof studio to minimize undesired noise. Both the speech and the EGG signals were recorded and stored, synchronized, in a stereo wave file (left = speech, right = EGG) at a 96 kHz sampling rate and 16-bit resolution. The intensity of the signal was carefully monitored to remain equal across sessions by asking the speaker to repeat a phrase and comparing it with a reference phrase recorded in the first session. The positions of the microphone, the speaker and the EGG electrodes were adjusted in every session to match the reference phrase. A linguist was present to follow the speaker and correct him in case he skipped, mispronounced or could not recognize a word. Each session was 15 minutes long, and no more than four sessions were held per day.

Table 1: General statistics about the text and audio corpora in each sub-category (number of words, unique words, recording time and number of utterances for Dates & Time, Numbers, Financial, Customer Service, Names, Story, Traditional, Miscellaneous, News, and the full corpus).

Table 2: The 38 phonemes and their corresponding labels in the transcription files.

Editing and Preparation
The recorded data was further edited to remove unwanted segments such as pronunciation errors and long pauses. The sessions were then split at pauses, and the EGG signals were exported into separate files. The corpus is thus divided into utterances delimited by pauses, and an HMM aligner was built to label and segment each utterance at the phoneme level. The Cambridge University HMM Toolkit (HTK) was used to build the speaker-specific aligner [14]. The phoneme symbols and the beginning and ending timestamps are provided in the transcription files. Although the corpus was segmented and aligned automatically by the HMM system, over half an hour of speech was manually aligned by computational linguists. The final corpus is divided into four sets of files:

1. Wave: The speech waveform for each utterance, at 96 kHz and 16-bit resolution.
2.
EGG: The corresponding EGG signal, at 96 kHz and 16-bit resolution.
3. Text: The corresponding diacritized text.
4. Label: The corresponding transcription along with the phoneme boundary segmentation.

The above sets of files are provided for each expressive version, and each utterance is labeled with its corresponding category as discussed in section 3. The labels are provided in two formats, mono and full, in accordance with the HTS standard format. The mono labels provide only the mono-phoneme sequence along with the beginning and ending timestamps. The full labels provide the penta-phone sequence, i.e. each phone with its two preceding and two following phonemes. Moreover, they are more detailed and provide additional information such as the number of phonemes, syllables and words in each utterance. The position and number of phonemes, syllables and words are also indicated in the full label file; in the case of syllables, the files also indicate whether a syllable is stressed or accented. It is worth mentioning here that there are two types of syllables in the Arabic language: open syllables (CV and CV:) and closed syllables (CVC, CV:C and CVCC), where C stands for a consonant, V for a vowel and V: for a long vowel.

5. Corpus Evaluation
In order to evaluate the quality of the final dataset, the corpus was used to build a text-to-speech model. This model puts to the test various quality aspects of the data, such as size, structure, format and consistency. This section briefly describes the TTS model generation and output evaluation procedures. It is important to emphasize that the purpose of the test and evaluation is not to compete with other Arabic TTS systems, but to evaluate the dataset and to illustrate its usage. The HMM-based Speech Synthesis System (HTS) was used to build the model. HTS is a free tool for building HMM-based TTS systems, implemented essentially as a patch to HTK.
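The mono label files described above can be read with a few lines of code. This sketch assumes the common HTK-style layout of one "start end phone" line per segment, with times in 100-nanosecond units; the example lines are hypothetical, not taken from SASSC:

```python
# Sketch: parse an HTK/HTS-style "mono" label file into
# (start_sec, end_sec, phone) tuples. Assumes one segment per line,
# times in 100 ns units, as in the HTK label convention.

def parse_mono_labels(lines):
    """Return a list of (start_sec, end_sec, phone) tuples."""
    segments = []
    for line in lines:
        if not line.strip():
            continue
        start, end, phone = line.split()
        # HTK label times are expressed in units of 100 nanoseconds.
        segments.append((int(start) / 1e7, int(end) / 1e7, phone))
    return segments

example = [
    "0 1500000 pau",      # 0.00-0.15 s of silence
    "1500000 2400000 m",
    "2400000 3900000 aa",
]
for start, end, phone in parse_mono_labels(example):
    print(f"{phone}: {start:.2f}-{end:.2f} s")
```

Full-context labels carry the same timestamps but replace the bare phone name with the penta-phone context string, so a parser for them would split the third field further.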
It provides scripts and code that use HTK, as well as other free speech tools, to train TTS models. The version used to evaluate the dataset in this paper was HTS 2.2. The format of the dataset was designed to be compatible with HTS, and the default parameters were tuned for the data. The training procedure completed successfully, and five sentences were synthesised for evaluation.

Figure 1: Distribution of the Arabic phonemes in the corpus.

Two well-known metrics were measured to evaluate the synthesized speech, namely naturalness and intelligibility. Some of the group members as well as other researchers (10 in total) were asked to listen to the five generated samples and evaluate each one. Evaluators were asked individually to write down what they heard, and to rate each audio clip (on a scale from 1 to 5) on the following qualities:

1. Naturalness: How close was the synthesised speech to being natural? (1: very robotic, 5: close to natural)
2. Intelligibility: How much effort was needed to understand the content of the sample? (1: had to focus, 5: easy to understand)

The average naturalness score was 3.58 and the average intelligibility score was 3.9. The original sentences were compared to what the evaluators had written, and the word error rate was 2.13%. These numbers suggest that the performance of the TTS system on the provided corpus was good on naturalness and very good on intelligibility. The synthesized samples used in the evaluation process are available online.

SASSC, as a large repository of Arabic speech, provides several avenues for future work. These may include a study of the minimum amount of speech required to produce an acceptable voice, or further investment in recording different speech expressions and evaluating their results. Moreover, linguists may find the corpus a useful resource for studying different Arabic structures and the pronunciation of MSA.
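The 2.13% word error rate reported above is the standard edit-distance measure: the minimum number of word insertions, deletions and substitutions needed to turn the transcript into the reference, divided by the reference length. A minimal sketch:

```python
# Sketch: word error rate (WER) between a reference sentence and a
# listener's transcript, via Levenshtein distance over words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error over 6 words
```

Averaging this over all evaluators and samples yields a corpus-level intelligibility figure comparable to the one reported here.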
In order to promote and facilitate research and development on the Arabic language in the NLP field, the corpus is freely available for research and educational purposes only; permission must be obtained from KACST for any commercial or non-academic use of the corpus. The dataset is available online.

6. Conclusion and Future Work
In this paper, the process of collecting and recording the SASSC dataset was described. The text was designed to capture a wide variety of usages and as much of the Arabic phonetic spectrum as possible. The text is fully diacritized and transcribed, and was recorded in a high-quality format in a soundproof studio by a professional speaker, along with the pitch (EGG) signal. Moreover, a TTS system based on the corpus was built and evaluated, achieving promising results.

7. Acknowledgment
The authors would like to acknowledge and thank Dr. Mohsen Rashwan, Dr. Sherif Abdou, Dr. Ahmad Ragheb and Mostafa Shahin from Research and Development International (RDI) in Cairo, Egypt, for providing their help and expertise in various phases of the corpus collection and recording procedures.
8. References
[1] N. Habash, Introduction to Arabic Natural Language Processing. Morgan and Claypool Publishers.
[2] M. Alghamdi, F. Alhargan, M. Alkanhal, A. Alkhairy, M. Eldesouki, and A. Alenazi, "Saudi accented Arabic voice bank," Experimental Linguistics ExLing, p. 9.
[3] G. Droua-Hamdani, S. A. Selouani, and M. Boudraa, "Algerian Arabic speech database (ALGASD): Corpus design and automatic speech recognition application," Arabian Journal for Science and Engineering, vol. 35, no. 2C, p. 158.
[4] M. Alghamdi, "KACST Arabic phonetics database," in The 15th International Congress of Phonetic Sciences, 2003.
[5] F. Chouireb and M. Guerti, "Towards a high quality Arabic speech synthesis system based on neural networks and residual excited vocal tract model," Signal, Image and Video Processing, vol. 2, no. 1.
[6] M. Maamouri, D. Graff, and C. Cieri, "Arabic broadcast news transcripts," Linguistic Data Consortium (LDC), catalog no. LDC2006T20.
[7] Appen Ltd., "Levantine Arabic conversational telephone speech, transcripts," Linguistic Data Consortium (LDC), catalog no. LDC2007T01.
[8] SpeechOcean, "Arabic speech synthesis database I (Male)." [Online]. Available: speechocean.com/en-tts-corpora/371.html
[9] K. Prahallad, N. Kumar, V. Keri, R. S, and A. W. Black, "The IIIT-H Indic speech databases," in INTERSPEECH.
[10] P. S. Dikshit and R. W. Schubert, "Electroglottograph as an additional source of information in isolated word recognition," in Proceedings of the Fourteenth Southern Biomedical Engineering Conference, 1995.
[11] A. Stan, J. Yamagishi, S. King, and M. Aylett, "The Romanian speech synthesis (RSS) corpus: building a high quality HMM-based speech synthesis system using a high sampling rate," Speech Communication, vol. 53, no. 3.
[12] M. Rashwan, M. Al-Badrashiny, M. Attia, S. Abdou, and A. Rafea, "A stochastic Arabic diacritizer based on a hybrid of factorized and unfactorized textual features," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1.
[13] M. Attia, "Theory and implementation of a large-scale Arabic phonetic transcriptor, and applications," Ph.D. dissertation, Dept. of Electronics and Electrical Communications, Cairo University, Cairo, Egypt.
[14] S. Young, G. Evermann, M. Gales, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (version 3.4). Cambridge University Engineering Department.
LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH
LMELECTURES: A MULTIMEDIA CORPUS OF ACADEMIC SPOKEN ENGLISH K. Riedhammer, M. Gropp, T. Bocklet, F. Hönig, E. Nöth, S. Steidl Pattern Recognition Lab, University of Erlangen-Nuremberg, GERMANY [email protected]
Lecture 12: An Overview of Speech Recognition
Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated
Study of Humanoid Robot Voice Q & A System Based on Cloud
Send Orders for Reprints to [email protected] 1014 The Open Cybernetics & Systemics Journal, 2015, 9, 1014-1018 Open Access Study of Humanoid Robot Voice Q & A System Based on Cloud Tong Chunya
Dr. Wei Wei Royal Melbourne Institute of Technology, Vietnam Campus January 2013
Research Summary: Can integrated skills tasks change students use of learning strategies and materials? A case study using PTE Academic integrated skills items Dr. Wei Wei Royal Melbourne Institute of
Advice for Class Teachers. Moderating pupils reading at P 4 NC Level 1
Advice for Class Teachers Moderating pupils reading at P 4 NC Level 1 Exemplars of writing at P Scales and into National Curriculum levels. The purpose of this document is to provide guidance for class
Speech Technology Project 2004 Building an HMM Speech Recogniser for Dutch
Speech Technology Project 2004 Building an HMM Speech Recogniser for Dutch Frans Adriaans Markus Heukelom Marijn Koolen Tom Lentz Ork de Rooij Daan Vreeswijk Supervision: Rob van Son 9th July 2004 Contents
The Pitch-Tracking Database from Graz University of Technology
The Pitch-Tracking Database from Graz University of Technology Author: Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf Date: Graz, August 22, 2012 Rev.: alpha 1.1 Abstract The Pitch Tracking
Proceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production
Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics
Unlocking Value from Patanjali V, Lead Data Scientist, Anand B, Director Analytics Consulting, EXECUTIVE SUMMARY Today a lot of unstructured data is being generated in the form of text, images, videos
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
3-2 Language Learning in the 21st Century
3-2 Language Learning in the 21st Century Advanced Telecommunications Research Institute International With the globalization of human activities, it has become increasingly important to improve one s
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects David Graff, Mohamed Maamouri Linguistic Data Consortium University of Pennsylvania E-mail: [email protected], [email protected]
Minnesota K-12 Academic Standards in Language Arts Curriculum and Assessment Alignment Form Rewards Intermediate Grades 4-6
Minnesota K-12 Academic Standards in Language Arts Curriculum and Assessment Alignment Form Rewards Intermediate Grades 4-6 4 I. READING AND LITERATURE A. Word Recognition, Analysis, and Fluency The student
A secure face tracking system
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 10 (2014), pp. 959-964 International Research Publications House http://www. irphouse.com A secure face tracking
Common European Framework of Reference for Languages: learning, teaching, assessment. Table 1. Common Reference Levels: global scale
Common European Framework of Reference for Languages: learning, teaching, assessment Table 1. Common Reference Levels: global scale C2 Can understand with ease virtually everything heard or read. Can summarise
Longman English Interactive
Longman English Interactive Level 2 Orientation (English version) Quick Start 2 Microphone for Speaking Activities 2 Translation Setting 3 Goals and Course Organization 4 What is Longman English Interactive?
MODERN WRITTEN ARABIC. Volume I. Hosted for free on livelingua.com
MODERN WRITTEN ARABIC Volume I Hosted for free on livelingua.com TABLE OF CcmmTs PREFACE. Page iii INTRODUCTICN vi Lesson 1 1 2.6 3 14 4 5 6 22 30.45 7 55 8 61 9 69 10 11 12 13 96 107 118 14 134 15 16
Specialty Answering Service. All rights reserved.
0 Contents 1 Introduction... 2 1.1 Types of Dialog Systems... 2 2 Dialog Systems in Contact Centers... 4 2.1 Automated Call Centers... 4 3 History... 3 4 Designing Interactive Dialogs with Structured Data...
Understanding Impaired Speech. Kobi Calev, Morris Alper January 2016 Voiceitt
Understanding Impaired Speech Kobi Calev, Morris Alper January 2016 Voiceitt Our Problem Domain We deal with phonological disorders They may be either - resonance or phonation - physiological or neural
The LENA TM Language Environment Analysis System:
FOUNDATION The LENA TM Language Environment Analysis System: The Interpreted Time Segments (ITS) File Dongxin Xu, Umit Yapanel, Sharmi Gray, & Charles T. Baer LENA Foundation, Boulder, CO LTR-04-2 September
Establishing the Uniqueness of the Human Voice for Security Applications
Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.
PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS
The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS Dan Wang, Nanjie Yan and Jianxin Peng*
Reading Competencies
Reading Competencies The Third Grade Reading Guarantee legislation within Senate Bill 21 requires reading competencies to be adopted by the State Board no later than January 31, 2014. Reading competencies
C E D A T 8 5. Innovating services and technologies for speech content management
C E D A T 8 5 Innovating services and technologies for speech content management Company profile 25 years experience in the market of transcription/reporting services; Cedat 85 Group: Cedat 85 srl Subtitle
Figure1. Acoustic feedback in packet based video conferencing system
Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents
Effect of Captioning Lecture Videos For Learning in Foreign Language 外 国 語 ( 英 語 ) 講 義 映 像 に 対 する 字 幕 提 示 の 理 解 度 効 果
Effect of Captioning Lecture Videos For Learning in Foreign Language VERI FERDIANSYAH 1 SEIICHI NAKAGAWA 1 Toyohashi University of Technology, Tenpaku-cho, Toyohashi 441-858 Japan E-mail: {veri, nakagawa}@slp.cs.tut.ac.jp
Points of Interference in Learning English as a Second Language
Points of Interference in Learning English as a Second Language Tone Spanish: In both English and Spanish there are four tone levels, but Spanish speaker use only the three lower pitch tones, except when
Colaboradores: Contreras Terreros Diana Ivette Alumna LELI N de cuenta: 191351. Ramírez Gómez Roberto Egresado Programa Recuperación de pasantía.
Nombre del autor: Maestra Bertha Guadalupe Paredes Zepeda. [email protected] Colaboradores: Contreras Terreros Diana Ivette Alumna LELI N de cuenta: 191351. Ramírez Gómez Roberto Egresado Programa
Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
Lecture 1-10: Spectrograms
Lecture 1-10: Spectrograms Overview 1. Spectra of dynamic signals: like many real world signals, speech changes in quality with time. But so far the only spectral analysis we have performed has assumed
MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music
ISO/IEC MPEG USAC Unified Speech and Audio Coding MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music The standardization of MPEG USAC in ISO/IEC is now in its final
(in Speak Out! 23: 19-25)
1 A SMALL-SCALE INVESTIGATION INTO THE INTELLIGIBILITY OF THE PRONUNCIATION OF BRAZILIAN INTERMEDIATE STUDENTS (in Speak Out! 23: 19-25) Introduction As a non-native teacher of English as a foreign language
Industry Guidelines on Captioning Television Programs 1 Introduction
Industry Guidelines on Captioning Television Programs 1 Introduction These guidelines address the quality of closed captions on television programs by setting a benchmark for best practice. The guideline
The Use of Software Project Management Tools in Saudi Arabia: An Exploratory Survey
The Use of Software Project Management Tools in Saudi Arabia: An Exploratory Survey Nouf AlMobarak, Rawan AlAbdulrahman, Shahad AlHarbi and Wea am AlRashed Software Engineering Department King Saud University
NICE MULTI-CHANNEL INTERACTION ANALYTICS
NICE MULTI-CHANNEL INTERACTION ANALYTICS Revealing Customer Intent in Contact Center Communications CUSTOMER INTERACTIONS: The LIVE Voice of the Customer Every day, customer service departments handle
Creating voices for the Festival speech synthesis system.
M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative
Strand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting
CAMBRIDGE FIRST CERTIFICATE Listening and Speaking NEW EDITION. Sue O Connell with Louise Hashemi
CAMBRIDGE FIRST CERTIFICATE SKILLS Series Editor: Sue O Connell CAMBRIDGE FIRST CERTIFICATE Listening and Speaking NEW EDITION Sue O Connell with Louise Hashemi PUBLISHED BY THE PRESS SYNDICATE OF THE
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper
Materials Software Systems Inc (MSSI). Enabling Speech on Touch Tone IVR White Paper Reliable Customer Service and Automation is the key for Success in Hosted Interactive Voice Response Speech Enabled
