EVALUATION OF KANNADA TEXT-TO-SPEECH [KTTS] SYSTEM
|
|
- Magnus Calvin Harrington
- 7 years ago
- Views:
Transcription
1 Volume 2, Issue 1, January 2012 ISSN: X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: EVALUATION OF KANNADA TEXT-TO-SPEECH [KTTS] SYSTEM D.J.Ravi 1 Department of Electronics and Communication 1 Research Scholar, JSS Research Foundation Mysore, India. Sudarshan Patilkulkarni 2 Department of Electronics and Communication 2 Assistant Professor, SJ College of Engg., Mysore, India. Abstract The paper discusses evaluation tests namely Mean Opinion Score (MOS) test - carried out to examine the naturalness of the synthesized output of Kannada Text-To-Speech [KTTS] system, Diagnostic Rhyme Test (DRT) and Comprehension Test (CT) carried out to measure the intelligibility of KTTS system. A comparative study between recorded and synthesized speech carried out using two techniques namely DTW (Dynamic Time Warping) and Linear Predictor Co-efficient (LPC) Spectral Distance to measure how close the synthesized output is to a naturally spoken phrase. The tests helped in concluding that the synthesized speech output of KTTS is almost natural. Keywords-component; Mean Opinion Score (MOS); Diagnostic Rhyme Test (DRT); Comprehension Test (CT); Dynamic Time Warping (DWT); Linear Predictor Co-efficient (LPC); Kannada Text-To-Speech [KTTS] System. I. EVALUATION TESTS The basic criteria for measuring the performance of a TTS system can be listed as the similarity to the human voice (naturalness) and the ability to be understood (intelligibility). The ideal speech synthesizer is both natural and intelligible, or at least try to maximize both characteristics. Therefore, the aim of TTS is also determined as to synthesize the speeches in accordance with natural human speech and clarify the sounds as much as possible. For overall quality evaluation, the International Telecommunication Union (ITU) recommends a specific method that, in this author s opinion, are suitable for also testing naturalness (ITU-T Recommendation P ) [1,2]. Several methods have been developed to evaluate the overall quality or acceptability of synthetic speech [3,4]. Diagnostic Rhyme Test (DRT), Comprehension Test (CT) and Mean Opinion Score (MOS) are the most frequently used techniques for the evaluation of the naturalness and the intelligibility of TTS systems. Naturalness and intelligibility of the Kannada TTS system is tested by MOS and CT-DRT respectively. II. MEAN OPINION SCORE [MOS] The study was tested by making use of the Mean Opinion Score (MOS). The MOS that is expressed as a single number in the range 1 to 5, where 1 is lowest perceived quality and 5 is the highest perceived quality [3]. MOS tests for voice are specified by ITU-T recommendation. The MOS is generated by averaging the results of a set of standard, subjective tests where a number of listeners rate the perceived audio quality of test sentences read aloud by both male and female speakers over the communications medium being tested. A listener is required to give each sentence a rating using the rating scheme in Table I. The perceptual score of the method MOS is calculated by taking the mean of the all scores of each sentence. TABLE I. MEAN OPINION SCORE In the context of this study, 10 sentences that are provided in Table II are used for tests and 25 native Kannada speakers employed in the evaluation. For each sentence, MOS ranges that are assigned by the listeners are shown in Table III. Distribution of MOS values for test sentences by testers and average MOS values for each sentence are given in Figure 1 and Figure 2 respectively. TABLE II. TEST SENTENCES
2 TABLE III. MOS RANGE AND SCORES FOR EACH SENTENCE III. MEAN OPINION SCORE OF SYNTHESIZED EMOTIONAL SPEECH Quality of the speech is subjective in nature, speech which appears good to one person may not appear good to other, and hence it was decided to collect Mean Opinion Score (MOS) about the speech quality [3]. Some sentences neutral and emotional were synthesized. These synthesized speech were made to hear by 25 randomly selected people. The method of synthesis was not revealed to them, they were asked to assign a score on the scale of 0 to 10 for neutral as well as emotional speech. The results are tabulated in Table IV. Distribution of MOS percentage for Test Sentences in different emotions is shown in Figure 3. These results show that the synthesized speech quality is very high. TABLE IV. EMOTION RECOGNITION RATE OF SYNTHESIZED SPEECH IN % (MEAN OPINION SCORE) Figure 3. Distribution of MOS percentage for Test Sentences in different Emotions Figure 1. Distribution of MOS values for test sentences assigned by testers Figure 2. Distribution Average MOS values for each sentence and system average IV. COMPREHENSION TEST In the comprehension tests, a subject hears a few sentences or paragraphs and answers the questions about the content of the text, so some of the items may be missed [4]. It is not important to recognize one single phoneme, but the intended meaning. If the meaning of the sentence is understood, the 100% segmental intelligibility is not crucial for text comprehension and sometimes even long sections may be missed. In the comprehension tests three subtests are applied. In all three cases, the testers are allowed to listen to the sentences twice. In the majority of the tests, success is achieved for the first listening trial, and second one also improves the results.
3 First comprehension subtest has 5 sentences and 5 questions that are shown Table V about the content. Listeners answer the question about content of each sentence. First listening trial accuracy is calculated as the ratio of number of correct answers given by the testers to the whole set of correct answer as (T/N=118/125) =0.944 and second trial accuracy is obtained as 1. Second subtest is about answering common questions. It contains 5 sentences, S1-S5. Listeners answer the questions. Question sentences and number of correct answers at first and second listening are shown in Table VI. The results indicate that the understandability of the system is very high. Additionally, the accuracies of the first and second listening trials are 1. Third subtest is applied as a filling in the blanks test. There are 5 noun phrases, one word of the phrase is provided to the listener and other is left as blank. The testers listened to the speech and filled in the blanks. Noun phrases and number of correct answers at first and second listening trial are shown in Table VII. The words that are underlined and given in capital letters are the blanks in the test. The accuracies of the first and second trial are 0.96 and 1, respectively by achieving a high understandability rate. TABLE V. LISTENING COMPREHENSION V. DIAGNOSTIC RHYME TEST Diagnostic rhyme test (DRT) [4] is an American National Standards Institute (ANSI) standard for measuring speech intelligibility (ANSI S ). DRT, introduced by Fairbanks in 1958, uses a set of isolated words to test for consonant intelligibility in initial position. DRT is used how the initial consonant is recognized properly. In the DRT test of the current system, the consonants that are similar to each other are selected and the listeners are asked to distinguish the correct consonant among the similar sounding alternatives [7]. The letters that have the same way out such as b and p are plosive and bilabial consonant and can be easily misunderstood. The similar sounding words that are used for DRT and number of correct answers for the first and the second listening trials are shown in Table VIII. Bold words indicate the correct answers. Listeners choose one word from the table they hear. The accuracies are calculated above 0.93 and mostly close to 1 especially after the second trial. The summary of accuracies of the CT and DRT tests are given in Table IX. It shows system intelligibility rate very high and satisfactory. TABLE VIII. WORDS FOR DIAGNOSTIC RHYME TEST (DRT) TABLE VI. ANSWERING COMMON QUESTIONS TABLE VII. FILLING IN THE BLANKS TABLE IX. COMPREHENSION TESTS AND DIAGNOSTIC RHYME TEST (DRT) ACCURACY
4 Amplitude Amplitude Volume 2, issue 1, January 2012 Original signals Warped signals VI. RECORDED VERSUS SYNTHESIZED SPEECH A 0.8 signal 1 signal signal 1 signal 2 COMPARATIVE STUDY Analysis is one of the important tasks of any work. It does the evaluation of the work. Analysis helps the developer to know how effective the system is and also helps in understanding the flaws. The results of the analysis can lead the development of the work in a completely new way. Since speech quality is subjective in nature, absolute measurements cannot be made. But it is possible to measure some relative quantities, which indirectly shows the quality of synthesized speech [5,6]. Following are the two analysis methods, which we used in our work. 1. Dynamic Time Warping distance measure 2. Linear Predictor Co-efficient Spectral distance measure Since, Speech quality measurement is relative; it becomes necessary to have a reference object. The reference object in our work is a directly recorded wave. The creator of the database records a phrase, which acts as our reference. The aim of the work is to produce a synthetic wave, which resembles the recorded wave perfectly. The quality of the synthesized wave is said to be very high, if it resembles the recorded wave more. Above mentioned two methods are used to determine resemblance of two waves. A. Dynamic Time Warping (DTW) distance measure A person cannot speak a word twice, exactly same. The rate of speech varies. But the synthesized speech rate does not vary no matter how many times it is synthesized. Finding out Euclidean distance between two waves, of which one is spoken very fast is not possible and unscientific. The Euclidean distance between two similar speech waves, of which one is relatively fast may show complete mismatch. Hence, before calculating distance, two waves must be aligned. This technique is used to find out distances for phrase. Calculating DTW distance for complete phrase requires a large amount of memory in general purpose computer. Hence, it is applied word by word. Table X shows the DTW distances obtained. TABLE X. RESULTS OF DTW ANALYSIS Samples Samples Figure 4. DTW algorithm for Reference and Synthesized wave DTW algorithm is applied for measuring distance between Reference and Synthesized waves. Blue colored waves (Signal 1) represent synthesized waveforms. Before applying DTW algorithm the similarities are hidden. This is illustrated in the graph on the left side of Figure 4. After applying DTW all hidden similarities are visible and it is shown in right side of Figure 4. The distances are tabulated in Table X which clearly shows the distance between Reference and Synthesized are much lesser, which is the desired result. Similar analysis is made for other word. B. Linear Predictor Co-efficient (LPC) Spectral Distance Linear predictive coding (LPC) is a tool used in speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate and provides extremely accurate estimates of speech parameters. The DTW algorithm finds distance between two signals using time domain method. Only point to point distance is calculated in DTW. But in LPC spectral distance measurements, even frequency components are also taken care of. Similar to DTW analysis, LPC spectral envelope is calculated for Recorded Reference and Synthesize waves for complete phrase. Then Euclidean distance is calculated these values are tabulated in Table XI. Figure 5 shows the Linear Predictor Spectral Estimation. The graphs for Recorded and Synthesized waves are almost same, this indicates the frequency contents are matching. The value in the Table 11 for synthesized wave is much nearer to 0. TABLE XI. LPC SPECTRAL DISTANCE
5 Rhyme Test (DRT) and Comprehension Test (CT) are carried out to measure the intelligibility of KTTS system. These evaluations have provided promising results. Further two techniques DTW (Dynamic Time Warping) and Linear Predictor Co-efficient (LPC) Spectral Distance are applied for comparison of synthesized output versus naturally recorded speech of a phrase. Comparison concludes that the distance between recorded (reference) and synthesized are much lesser, which is the desired result. These tests helped in concluding that the synthesized speech output of KTTS is almost natural. Figure 5. LPC Spectral estimation VII. CONCLUSION A number of subjective tests are used to measure the success of KTTS system. Naturalness defined as closeness to human speech and intelligibility defined as the ability to be understood are two measures used to evaluate the performance of any Text-to-speech system. Mean Opinion Score (MOS) test is carried out to examine the naturalness of the synthesized output of KTTS system. Diagnostic REFERENCES [1] Dmitry Sityaev, Katherine Knill and Tina Burrows, Comparison of the ITU-T P.85 Standard to Other Methods for the Evaluation of Text-to-Speech Systems, INTERSPEECH 2006 ICSLP, pp [2] Yolanda Vazquez Alvarez & Mark Huckvale, The Reliability of the ITU-T P.85 Standard for the Evaluation of Text-To-Speech Systems. Department of Phonetics and Linguistics University College London, U.K., 2002, pp [3] Mean opinion score[mos] : wikipedia : Opinion_Score [4] Speech Quality and Evaluation: wikipedia : ap10.html [5] S ergio Paulo, Lu ıs C. Oliveira, DTW-based Phonetic Alignment Using Multiple Acoustic Features, EUROSPEECH 2003 GENEVA, pp [6] Steven Bo Davis and Paul Mermelstein, Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980, pp [7] B.B.Rajapurohit, Acoustic Characteristics of Kannada, CIIL, Mysore.
Thirukkural - A Text-to-Speech Synthesis System
Thirukkural - A Text-to-Speech Synthesis System G. L. Jayavardhana Rama, A. G. Ramakrishnan, M Vijay Venkatesh, R. Murali Shankar Department of Electrical Engg, Indian Institute of Science, Bangalore 560012,
More informationAn Arabic Text-To-Speech System Based on Artificial Neural Networks
Journal of Computer Science 5 (3): 207-213, 2009 ISSN 1549-3636 2009 Science Publications An Arabic Text-To-Speech System Based on Artificial Neural Networks Ghadeer Al-Said and Moussa Abdallah Department
More informationPERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS
The 21 st International Congress on Sound and Vibration 13-17 July, 2014, Beijing/China PERCENTAGE ARTICULATION LOSS OF CONSONANTS IN THE ELEMENTARY SCHOOL CLASSROOMS Dan Wang, Nanjie Yan and Jianxin Peng*
More informationDevelop Software that Speaks and Listens
Develop Software that Speaks and Listens Copyright 2011 Chant Inc. All rights reserved. Chant, SpeechKit, Getting the World Talking with Technology, talking man, and headset are trademarks or registered
More informationMyanmar Continuous Speech Recognition System Based on DTW and HMM
Myanmar Continuous Speech Recognition System Based on DTW and HMM Ingyin Khaing Department of Information and Technology University of Technology (Yatanarpon Cyber City),near Pyin Oo Lwin, Myanmar Abstract-
More informationSpeech: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction
: A Challenge to Digital Signal Processing Technology for Human-to-Computer Interaction Urmila Shrawankar Dept. of Information Technology Govt. Polytechnic, Nagpur Institute Sadar, Nagpur 440001 (INDIA)
More informationTech Note. Introduction. Definition of Call Quality. Contents. Voice Quality Measurement Understanding VoIP Performance. Title Series.
Tech Note Title Series Voice Quality Measurement Understanding VoIP Performance Date January 2005 Overview This tech note describes commonly-used call quality measurement methods, explains the metrics
More informationA Comparison of Speech Coding Algorithms ADPCM vs CELP. Shannon Wichman
A Comparison of Speech Coding Algorithms ADPCM vs CELP Shannon Wichman Department of Electrical Engineering The University of Texas at Dallas Fall 1999 December 8, 1999 1 Abstract Factors serving as constraints
More informationEstablishing the Uniqueness of the Human Voice for Security Applications
Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.
More informationSchneps, Leila; Colmez, Coralie. Math on Trial : How Numbers Get Used and Abused in the Courtroom. New York, NY, USA: Basic Books, 2013. p i.
New York, NY, USA: Basic Books, 2013. p i. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=2 New York, NY, USA: Basic Books, 2013. p ii. http://site.ebrary.com/lib/mcgill/doc?id=10665296&ppg=3 New
More informationA TOOL FOR TEACHING LINEAR PREDICTIVE CODING
A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering
More informationAnalog-to-Digital Voice Encoding
Analog-to-Digital Voice Encoding Basic Voice Encoding: Converting Analog to Digital This topic describes the process of converting analog signals to digital signals. Digitizing Analog Signals 1. Sample
More informationSchool Class Monitoring System Based on Audio Signal Processing
C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.
More informationSOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS
SOME ASPECTS OF ASR TRANSCRIPTION BASED UNSUPERVISED SPEAKER ADAPTATION FOR HMM SPEECH SYNTHESIS Bálint Tóth, Tibor Fegyó, Géza Németh Department of Telecommunications and Media Informatics Budapest University
More informationFigure1. Acoustic feedback in packet based video conferencing system
Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents
More informationReading Competencies
Reading Competencies The Third Grade Reading Guarantee legislation within Senate Bill 21 requires reading competencies to be adopted by the State Board no later than January 31, 2014. Reading competencies
More informationFrom Concept to Production in Secure Voice Communications
From Concept to Production in Secure Voice Communications Earl E. Swartzlander, Jr. Electrical and Computer Engineering Department University of Texas at Austin Austin, TX 78712 Abstract In the 1970s secure
More informationEmotion Detection from Speech
Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction
More informationSpeech Signal Processing: An Overview
Speech Signal Processing: An Overview S. R. M. Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati December, 2012 Prasanna (EMST Lab, EEE, IITG) Speech
More informationOpen-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania oananicolae1981@yahoo.com
More informationObjective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification
Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification Raphael Ullmann 1,2, Ramya Rasipuram 1, Mathew Magimai.-Doss 1, and Hervé Bourlard 1,2 1 Idiap Research Institute,
More informationCorpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System
Corpus Driven Malayalam Text-to-Speech Synthesis for Interactive Voice Response System Arun Soman, Sachin Kumar S., Hemanth V. K., M. Sabarimalai Manikandan, K. P. Soman Centre for Excellence in Computational
More informationObjective Speech Quality Measures for Internet Telephony
Objective Speech Quality Measures for Internet Telephony Timothy A. Hall National Institute of Standards and Technology 100 Bureau Drive, STOP 8920 Gaithersburg, MD 20899-8920 ABSTRACT Measuring voice
More informationPacket Loss Concealment Algorithm for VoIP Transmission in Unreliable Networks
Packet Loss Concealment Algorithm for VoIP Transmission in Unreliable Networks Artur Janicki, Bartłomiej KsięŜak Institute of Telecommunications, Warsaw University of Technology E-mail: ajanicki@tele.pw.edu.pl,
More informationDigital Speech Coding
Digital Speech Processing David Tipper Associate Professor Graduate Program of Telecommunications and Networking University of Pittsburgh Telcom 2720 Slides 7 http://www.sis.pitt.edu/~dtipper/tipper.html
More informationApplication Note. Introduction. Definition of Call Quality. Contents. Voice Quality Measurement. Series. Overview
Application Note Title Series Date Nov 2014 Overview Voice Quality Measurement Voice over IP Performance Management This Application Note describes commonlyused call quality measurement methods, explains
More informationFormant Bandwidth and Resilience of Speech to Noise
Formant Bandwidth and Resilience of Speech to Noise Master Thesis Leny Vinceslas August 5, 211 Internship for the ATIAM Master s degree ENS - Laboratoire Psychologie de la Perception - Hearing Group Supervised
More informationTHE MEASUREMENT OF SPEECH INTELLIGIBILITY
THE MEASUREMENT OF SPEECH INTELLIGIBILITY Herman J.M. Steeneken TNO Human Factors, Soesterberg, the Netherlands 1. INTRODUCTION The draft version of the new ISO 9921 standard on the Assessment of Speech
More informationTEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE
TEXT TO SPEECH SYSTEM FOR KONKANI ( GOAN ) LANGUAGE Sangam P. Borkar M.E. (Electronics)Dissertation Guided by Prof. S. P. Patil Head of Electronics Department Rajarambapu Institute of Technology Sakharale,
More informationMembering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN
PAGE 30 Membering T M : A Conference Call Service with Speaker-Independent Name Dialing on AIN Sung-Joon Park, Kyung-Ae Jang, Jae-In Kim, Myoung-Wan Koo, Chu-Shik Jhon Service Development Laboratory, KT,
More informationSubjective SNR measure for quality assessment of. speech coders \A cross language study
Subjective SNR measure for quality assessment of speech coders \A cross language study Mamoru Nakatsui and Hideki Noda Communications Research Laboratory, Ministry of Posts and Telecommunications, 4-2-1,
More informationFOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS. UIUC Physics 193 POM
FOURIER TRANSFORM BASED SIMPLE CHORD ANALYSIS Fanbo Xiang UIUC Physics 193 POM Professor Steven M. Errede Fall 2014 1 Introduction Chords, an essential part of music, have long been analyzed. Different
More informationEdith Gotesman, PhD Ashkelon Academic College Levinsky College of Education
Edith Gotesman, PhD Ashkelon Academic College Levinsky College of Education Technological literacy allows a person to function efficiently in a global, computerized and competitive environment. Students
More informationTrigonometric functions and sound
Trigonometric functions and sound The sounds we hear are caused by vibrations that send pressure waves through the air. Our ears respond to these pressure waves and signal the brain about their amplitude
More informationApplication Note. Using PESQ to Test a VoIP Network. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN
Using PESQ to Test a VoIP Network Application Note Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com
More informationCreating voices for the Festival speech synthesis system.
M. Hood Supervised by A. Lobb and S. Bangay G01H0708 Creating voices for the Festival speech synthesis system. Abstract This project focuses primarily on the process of creating a voice for a concatenative
More informationCBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC
CBS RECORDS PROFESSIONAL SERIES CBS RECORDS CD-1 STANDARD TEST DISC 1. INTRODUCTION The CBS Records CD-1 Test Disc is a highly accurate signal source specifically designed for those interested in making
More informationComputer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction
Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya (LKaya@ieee.org) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper
More informationAN-007 APPLICATION NOTE MEASURING MAXIMUM SUBWOOFER OUTPUT ACCORDING ANSI/CEA-2010 STANDARD INTRODUCTION CEA-2010 (ANSI) TEST PROCEDURE
AUDIOMATICA AN-007 APPLICATION NOTE MEASURING MAXIMUM SUBWOOFER OUTPUT ACCORDING ANSI/CEA-2010 STANDARD by Daniele Ponteggia - dp@audiomatica.com INTRODUCTION The Consumer Electronics Association (CEA),
More informationQuarterly Progress and Status Report. Measuring inharmonicity through pitch extraction
Dept. for Speech, Music and Hearing Quarterly Progress and Status Report Measuring inharmonicity through pitch extraction Galembo, A. and Askenfelt, A. journal: STL-QPSR volume: 35 number: 1 year: 1994
More informationComparative Analysis on the Armenian and Korean Languages
Comparative Analysis on the Armenian and Korean Languages Syuzanna Mejlumyan Yerevan State Linguistic University Abstract It has been five years since the Korean language has been taught at Yerevan State
More informationAudio Loudness Normalization and Measurement of Broadcast Programs
Audio Loudness Normalization and Measurement of Broadcast Programs Martin Simaliak, Miroslav Uhrina, Jan Hlubik, Martin Vaculik, Martin Cap Department of Telecommunication and Multimedia Faculty of Electrical
More informationSpot me if you can: Uncovering spoken phrases in encrypted VoIP conversations
Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations C. Wright, L. Ballard, S. Coull, F. Monrose, G. Masson Talk held by Goran Doychev Selected Topics in Information Security and
More informationSpeech Analytics. Whitepaper
Speech Analytics Whitepaper This document is property of ASC telecom AG. All rights reserved. Distribution or copying of this document is forbidden without permission of ASC. 1 Introduction Hearing the
More informationUser Acceptance of Social Robots. Ibrahim A. Hameed, PhD, Associated Professor NTNU in Ålesund, Norway ibib@ntnu.no
User Acceptance of Social Robots Ibrahim A. Hameed, PhD, Associated Professor NTNU in Ålesund, Norway ibib@ntnu.no Paper: Hameed, I.A., Tan, Z.-H., Thomsen, N.B. and Duan, X. (2016). User Acceptance of
More informationMarathi Interactive Voice Response System (IVRS) using MFCC and DTW
Marathi Interactive Voice Response System (IVRS) using MFCC and DTW Manasi Ram Baheti Department of CSIT, Dr.B.A.M. University, Aurangabad, (M.S.), India Bharti W. Gawali Department of CSIT, Dr.B.A.M.University,
More informationStrand: Reading Literature Topics Standard I can statements Vocabulary Key Ideas and Details
Strand: Reading Literature Key Ideas and Craft and Structure Integration of Knowledge and Ideas RL.K.1. With prompting and support, ask and answer questions about key details in a text RL.K.2. With prompting
More informationRobust Methods for Automatic Transcription and Alignment of Speech Signals
Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background
More informationTechnical Report. Overview. Revisions in this Edition. Four-Level Assessment Process
Technical Report Overview The Clinical Evaluation of Language Fundamentals Fourth Edition (CELF 4) is an individually administered test for determining if a student (ages 5 through 21 years) has a language
More informationArtificial Neural Network for Speech Recognition
Artificial Neural Network for Speech Recognition Austin Marshall March 3, 2005 2nd Annual Student Research Showcase Overview Presenting an Artificial Neural Network to recognize and classify speech Spoken
More informationAudio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
More informationDRA2 Word Analysis. correlated to. Virginia Learning Standards Grade 1
DRA2 Word Analysis correlated to Virginia Learning Standards Grade 1 Quickly identify and generate words that rhyme with given words. Quickly identify and generate words that begin with the same sound.
More informationMusic technology. Draft GCE A level and AS subject content
Music technology Draft GCE A level and AS subject content July 2015 Contents The content for music technology AS and A level 3 Introduction 3 Aims and objectives 3 Subject content 4 Recording and production
More informationQuick Start Guide: Read & Write 11.0 Gold for PC
Quick Start Guide: Read & Write 11.0 Gold for PC Overview TextHelp's Read & Write Gold is a literacy support program designed to assist computer users with difficulty reading and/or writing. Read & Write
More informationPhonemic Awareness. Section III
Section III Phonemic Awareness Rationale Without knowledge of the separate sounds that make up words, it is difficult for children to hear separate sounds, recognize the sound s position in a word, and
More informationRegionalized Text-to-Speech Systems: Persona Design and Application Scenarios
Regionalized Text-to-Speech Systems: Persona Design and Application Scenarios Michael Pucher, Gudrun Schuchmann, and Peter Fröhlich ftw., Telecommunications Research Center, Donau-City-Strasse 1, 1220
More informationSPEECH AUDIOMETRY. @ Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology)
1 SPEECH AUDIOMETRY Pure tone Audiometry provides only a partial picture of the patient s auditory sensitivity. Because it doesn t give any information about it s ability to hear and understand speech.
More informationAN1200.04. Application Note: FCC Regulations for ISM Band Devices: 902-928 MHz. FCC Regulations for ISM Band Devices: 902-928 MHz
AN1200.04 Application Note: FCC Regulations for ISM Band Devices: Copyright Semtech 2006 1 of 15 www.semtech.com 1 Table of Contents 1 Table of Contents...2 1.1 Index of Figures...2 1.2 Index of Tables...2
More informationAcademic Standards for Reading, Writing, Speaking, and Listening
Academic Standards for Reading, Writing, Speaking, and Listening Pre-K - 3 REVISED May 18, 2010 Pennsylvania Department of Education These standards are offered as a voluntary resource for Pennsylvania
More informationInternational Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www.iasir.net
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
More informationIndiana Department of Education
GRADE 1 READING Guiding Principle: Students read a wide range of fiction, nonfiction, classic, and contemporary works, to build an understanding of texts, of themselves, and of the cultures of the United
More informationSPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems
SPEECH INTELLIGIBILITY and Fire Alarm Voice Communication Systems WILLIAM KUFFNER, M.A. Sc., P.Eng, PMP Senior Fire Protection Engineer Director Fire Protection Engineering October 30, 2013 Code Reference
More informationText-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
More informationBilingual Education Assessment Urdu (034) NY-SG-FLD034-01
Bilingual Education Assessment Urdu (034) NY-SG-FLD034-01 The State Education Department does not discriminate on the basis of age, color, religion, creed, disability, marital status, veteran status, national
More informationAPPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA
APPLYING MFCC-BASED AUTOMATIC SPEAKER RECOGNITION TO GSM AND FORENSIC DATA Tuija Niemi-Laitinen*, Juhani Saastamoinen**, Tomi Kinnunen**, Pasi Fränti** *Crime Laboratory, NBI, Finland **Dept. of Computer
More informationBLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION
BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be
More informationLinear Predictive Coding
Linear Predictive Coding Jeremy Bradbury December 5, 2000 0 Outline I. Proposal II. Introduction A. Speech Coding B. Voice Coders C. LPC Overview III. Historical Perspective of Linear Predictive Coding
More informationINCREASE YOUR PRODUCTIVITY WITH CELF 4 SOFTWARE! SAMPLE REPORTS. To order, call 1-800-211-8378, or visit our Web site at www.pearsonassess.
INCREASE YOUR PRODUCTIVITY WITH CELF 4 SOFTWARE! Report Assistant SAMPLE REPORTS To order, call 1-800-211-8378, or visit our Web site at www.pearsonassess.com In Canada, call 1-800-387-7278 In United Kingdom,
More informationGrade 1 LA. 1. 1. 1. 1. Subject Grade Strand Standard Benchmark. Florida K-12 Reading and Language Arts Standards 27
Grade 1 LA. 1. 1. 1. 1 Subject Grade Strand Standard Benchmark Florida K-12 Reading and Language Arts Standards 27 Grade 1: Reading Process Concepts of Print Standard: The student demonstrates knowledge
More informationMFL skills map. Year 3 Year 4 Year 5 Year 6 Develop understanding of the sounds of Individual letters and groups of letters (phonics).
listen attentively to spoken language and show understanding by joining in and responding explore the patterns and sounds of language through songs and rhymes and link the spelling, sound and meaning of
More informationWhite Paper. PESQ: An Introduction. Prepared by: Psytechnics Limited. 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN
PESQ: An Introduction White Paper Prepared by: Psytechnics Limited 23 Museum Street Ipswich, Suffolk United Kingdom IP1 1HN t: +44 (0) 1473 261 800 f: +44 (0) 1473 261 880 e: info@psytechnics.com September
More informationConstruct User Guide
Construct User Guide Contents Contents 1 1 Introduction 2 1.1 Construct Features..................................... 2 1.2 Speech Licenses....................................... 3 2 Scenario Management
More informationELPS TELPAS. Proficiency Level Descriptors
ELPS TELPAS Proficiency Level Descriptors Permission to copy the ELPS TELPAS Proficiency Level Descriptors is hereby extended to Texas school officials and their agents for their exclusive use in determining
More informationPC Postprocessing Technologies: A Competitive Analysis
PC Postprocessing Technologies: A Competitive Analysis Home Theater v4 SRS Premium Sound Waves MaxxAudio 3 Abstract In a scientifically rigorous analysis of audio postprocessing technologies for laptop
More informationL9: Cepstral analysis
L9: Cepstral analysis The cepstrum Homomorphic filtering The cepstrum and voicing/pitch detection Linear prediction cepstral coefficients Mel frequency cepstral coefficients This lecture is based on [Taylor,
More informationhave more skill and perform more complex
Speech Recognition Smartphone UI Speech Recognition Technology and Applications for Improving Terminal Functionality and Service Usability User interfaces that utilize voice input on compact devices such
More informationThis document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Title Transcription of polyphonic signals using fast filter bank( Accepted version ) Author(s) Foo, Say Wei;
More informationKindergarten Common Core State Standards: English Language Arts
Kindergarten Common Core State Standards: English Language Arts Reading: Foundational Print Concepts RF.K.1. Demonstrate understanding of the organization and basic features of print. o Follow words from
More informationEvaluation of a Segmental Durations Model for TTS
Speech NLP Session Evaluation of a Segmental Durations Model for TTS João Paulo Teixeira, Diamantino Freitas* Instituto Politécnico de Bragança *Faculdade de Engenharia da Universidade do Porto Overview
More informationBroadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
More informationLecture 12: An Overview of Speech Recognition
Lecture : An Overview of peech Recognition. Introduction We can classify speech recognition tasks and systems along a set of dimensions that produce various tradeoffs in applicability and robustness. Isolated
More informationDirac Live & the RS20i
Dirac Live & the RS20i Dirac Research has worked for many years fine-tuning digital sound optimization and room correction. Today, the technology is available to the high-end consumer audio market with
More informationFREE TV AUSTRALIA OPERATIONAL PRACTICE OP- 59 Measurement and Management of Loudness in Soundtracks for Television Broadcasting
Page 1 of 9 1. SCOPE This Operational Practice is recommended by Free TV Australia and refers to the measurement of audio loudness as distinct from audio level. It sets out guidelines for measuring and
More informationAutomatic Detection of Emergency Vehicles for Hearing Impaired Drivers
Automatic Detection of Emergency Vehicles for Hearing Impaired Drivers Sung-won ark and Jose Trevino Texas A&M University-Kingsville, EE/CS Department, MSC 92, Kingsville, TX 78363 TEL (36) 593-2638, FAX
More informationSection 8 Foreign Languages. Article 1 OVERALL OBJECTIVE
Section 8 Foreign Languages Article 1 OVERALL OBJECTIVE To develop students communication abilities such as accurately understanding and appropriately conveying information, ideas,, deepening their understanding
More informationUnderstanding the Transition From PESQ to POLQA. An Ascom Network Testing White Paper
Understanding the Transition From PESQ to POLQA An Ascom Network Testing White Paper By Dr. Irina Cotanis Prepared by: Date: Document: Dr. Irina Cotanis 6 December 2011 NT11-22759, Rev. 1.0 Ascom (2011)
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationFunctional Communication for Soft or Inaudible Voices: A New Paradigm
The following technical paper has been accepted for presentation at the 2005 annual conference of the Rehabilitation Engineering and Assistive Technology Society of North America. RESNA is an interdisciplinary
More informationANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1
WHAT IS AN FFT SPECTRUM ANALYZER? ANALYZER BASICS The SR760 FFT Spectrum Analyzer takes a time varying input signal, like you would see on an oscilloscope trace, and computes its frequency spectrum. Fourier's
More informationHardware Implementation of Probabilistic State Machine for Word Recognition
IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2
More informationAUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS
AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS PIERRE LANCHANTIN, ANDREW C. MORRIS, XAVIER RODET, CHRISTOPHE VEAUX Very high quality text-to-speech synthesis can be achieved by unit selection
More informationChild-speak Reading Level 1 APP AF1 AF2 AF3 AF4 AF5 AF6 AF7 Use a range of strategies, including accurate decoding text, to read for meaning
Child-speak Reading Level 1 APP In some usually I can break down and blend consonant vowel consonant words e.g. cat (1c) I can recognise some familiar words in the I read. (1c) When aloud, I know the sentences
More informationVoice Encoding Methods for Digital Wireless Communications Systems
SOUTHERN METHODIST UNIVERSITY Voice Encoding Methods for Digital Wireless Communications Systems BY Bryan Douglas Street address city state, zip e-mail address Student ID xxx-xx-xxxx EE6302 Section 324,
More informationA secure face tracking system
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 10 (2014), pp. 959-964 International Research Publications House http://www. irphouse.com A secure face tracking
More informationACOUSTICAL CONSIDERATIONS FOR EFFECTIVE EMERGENCY ALARM SYSTEMS IN AN INDUSTRIAL SETTING
ACOUSTICAL CONSIDERATIONS FOR EFFECTIVE EMERGENCY ALARM SYSTEMS IN AN INDUSTRIAL SETTING Dennis P. Driscoll, P.E. and David C. Byrne, CCC-A Associates in Acoustics, Inc. Evergreen, Colorado Telephone (303)
More informationHoughton Mifflin Harcourt StoryTown Grade 1. correlated to the. Common Core State Standards Initiative English Language Arts (2010) Grade 1
Houghton Mifflin Harcourt StoryTown Grade 1 correlated to the Common Core State Standards Initiative English Language Arts (2010) Grade 1 Reading: Literature Key Ideas and details RL.1.1 Ask and answer
More informationSpectral and Time-Domain Analysis of Recorded Wave Files on the Audio Analyzer R&S UPV Application Note
1GA66 Thomas Lechner, 1GPM 19.Aug.2014-0e Spectral and Time-Domain Analysis of Recorded Wave Files on the Audio Analyzer R&S UPV Application Note Products: R&S UPV R&S UPV-B2 R&S UPV-K61 R&S UPV-K63 The
More informationCarla Simões, t-carlas@microsoft.com. Speech Analysis and Transcription Software
Carla Simões, t-carlas@microsoft.com Speech Analysis and Transcription Software 1 Overview Methods for Speech Acoustic Analysis Why Speech Acoustic Analysis? Annotation Segmentation Alignment Speech Analysis
More informationThe PESQ Algorithm as the Solution for Speech Quality Evaluation on 2.5G and 3G Networks. Technical Paper
The PESQ Algorithm as the Solution for Speech Quality Evaluation on 2.5G and 3G Networks Technical Paper 1 Objective Metrics for Speech-Quality Evaluation The deployment of 2G networks quickly identified
More informationLinear Parameter Measurement (LPM)
(LPM) Module of the R&D SYSTEM FEATURES Identifies linear transducer model Measures suspension creep LS-fitting in impedance LS-fitting in displacement (optional) Single-step measurement with laser sensor
More information