Volume 2, Issue 9, September 2012. ISSN: 2277 128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper. Available online at: www.ijarcsse.com

Syllable Concatenation for Text to Speech Synthesis for Devnagari Script

Mrs. Minakshee Patil, Department of E & TC, Sinhgad Academy of Engineering, Pune, India
Dr. R. S. Kawitkar, Department of E & TC, Sinhgad College of Engineering, Pune, India

Abstract: TTS, an abbreviation for Text to Speech, is a system which converts input text into a corresponding sound output. Various techniques are available for TTS. Of these, this paper uses a combination of a database lookup and the breaking of words into syllables and, if necessary, barakhadi. A simple and limited database of words is the beauty of this project; a failed database search invokes barakhadi breaking.

Keywords: TTS, wav file, database, barakhadi, synthesis, joint point, syllables

I. INTRODUCTION
Speech synthesis is the artificial production of human speech, usually by means of computers. Speech synthesis systems are often called text-to-speech (TTS) systems in reference to their ability to convert text into speech. In speech synthesis the input is standard text or a phonetic spelling, and the output is a spoken version of that text. Speech generation is the process which transforms a string of phonetic and prosodic symbols into a synthetic speech signal; the quality of the result is a function of the quality of the string as well as of the quality of the generation process itself.

Text-to-speech synthesis is a research field that has received a lot of attention and resources during the last couple of decades, for excellent reasons. One of the most interesting observations is that a workable TTS system, combined with a workable speech recognition device, would be an extremely efficient method of speech coding.
It is necessary to establish what is expected of a text-to-speech system today. Two quality criteria are usually proposed. The first is intelligibility, which can be measured by taking several kinds of units into account (phonemes, words, phrases). The second, more difficult to define, is often labelled pleasantness or naturalness. The concept of naturalness may be related to the concept of realism in the field of image synthesis: the goal is not to reproduce reality but to suggest it. Listening to a synthetic voice should therefore allow the listener to attribute the voice to some pseudo-speaker, and to perceive some kind of expressivity as well as indices characterising the speaking style and the particular situation of elocution. For this purpose the corresponding extra-linguistic information must be supplied to the system.

Most present TTS systems produce an acceptable level of intelligibility, but the naturalness dimension, the ability to control expressivity, speech style and pseudo-speaker identity, is still poorly mastered. Users' demands, however, vary to a large extent with the field of application: general-public applications such as telephonic information retrieval need maximal realism and naturalness, whereas applications involving professionals (process or vehicle control) or highly motivated persons (the visually impaired, applications in hostile environments) demand intelligibility with the highest priority.

II. BLOCK DIAGRAM
Figure 1: Block diagram of speech synthesis

© 2012, IJARCSSE. All Rights Reserved.

III. TEXT ENCODING
The text entered through the keyboard is saved in the file output.doc. The text encoding stage assigns each character a predefined code, so the entire text is converted into a string of code values. Codes are assigned to characters with the purpose of the software, speech synthesis, in mind, so the full and half forms of a consonant receive separate codes: for example, the code of H (full consonant) is 72 while the code of H² (half consonant) is 145. Separate ranges of codes are assigned to consonants and vowels, as shown below.

Full consonants: 72 to 105
Half consonants: 145 to 178
Vowels: 65 to 71 and 117 to 139

The two vowel ranges separate the actual vowels from the suffixes used with them: the range 65 to 71 is given to the vowels themselves, and the range 117 to 139 to all the suffixes (matras) used for the vowels. Each word is then represented as a string of code values separated by a delimiter. Words converted into code strings are shown in the following table.

Table 1: Code strings for different words

Sr. No. | Word      | Assigned code
1       | MH«Ymas   | 77 145 97 89 116 97 120
2       | H îugma   | 145 105 174 85 102 116 170
3       | Amg_mZ    | 65 116 102 95 116 163
4       | H m ñvw^  | 72 116 123 175 86 138 167

The next step in the text encoding stage is inserting the rafars, kanas, anuswars and matras at their proper positions. A rafar is pronounced before the character on which it is written, whereas an anuswar is pronounced after the character on which it is written. For example, in the word A&VZ&X the rafar is pronounced immediately after V while the anuswar is pronounced after A&, so the word is actually stored as A&V Z&X. The pronunciation of an anuswar depends on the character that follows it, so when an anuswar occurs it is replaced by an appropriate half consonant chosen according to the next character in the word; when an anuswar is the last character of a word, however, it has no nasal sound. Table 2 shows the replacements for some of the characters, with suitable examples.
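As a quick illustration of these ranges, a code value can be classified as follows. This is a minimal sketch: the numeric ranges are the ones listed above, but the function name and the category labels are ours.

```python
# Sketch of the code-range classification described in Section III.
# The ranges come from the paper; the labels are illustrative.

def classify(code: int) -> str:
    """Map a character code to its Section III category."""
    if 72 <= code <= 105:
        return "full consonant"
    if 145 <= code <= 178:
        return "half consonant"
    if 65 <= code <= 71:
        return "vowel"
    if 117 <= code <= 139:
        return "vowel suffix (matra)"
    return "unknown"

# The example from the text: code 72 is a full consonant,
# code 145 the corresponding half consonant.
print(classify(72))   # full consonant
print(classify(145))  # half consonant
```

Because the ranges do not overlap, a single pass over a word's code string is enough to tell consonants, vowels and matras apart.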
Table 2: Anuswar rules

If the character is one of «, ç or ², the previous consonant is made half and the character itself is replaced by the appropriate full consonant. Some examples are illustrated below:

d is replaced by d² and é
à is replaced by n² and a
öçm is replaced by ö² and `m

IV. DATABASE MAINTAINED
Two databases are maintained: an audio database, which stores the audio files, and a textual database, which stores the text entries corresponding to the audio files in the audio database. The textual database is required to locate a given word within the audio database.

Audio database

The WAVE file format is used for the recorded files. WAVE files are probably the simplest of the common formats for storing audio samples: unlike MPEG and other compressed formats, WAVE files store the samples raw, and no processing is required other than formatting of the data. A separate sound file is created for each word. The details of the recorded .wav files are given below; note that the figures are consistent (11,025 samples/s × 8 bits × 1 channel = 88,200 bit/s ≈ 88 kbps).

Table 3: Details of the recorded .wav files

Audio format    | PCM (Pulse Code Modulation)
Sampling rate   | 11,025 Hz
Bits per sample | 8
Channel type    | Mono
Bit rate        | 88 kbps

Textual database
It consists of four entities:
1) the code string of a word or syllable;
2) the word or syllable itself;
3) the name of the wave file that contains that word/syllable;
4) the from and to positions of the word/syllable within that audio file.
An MS Access database is used for its simplicity and ease of use. Tables 4 and 5 show the format of the tables in the textual database; the Access tables contain the last four columns, i.e. code, word, wave file name, and from and to positions.

Table 4: Snapshot of the textual database (words)
Table 5: Snapshot of the textual database (syllables)

V. DATABASE SEARCH
The database search is carried out under the following conditions.
1. The word is present in the database: the output of the text encoding stage is a string of code values separated by a delimiter. This string is searched in the textual database; if a match is found, the search returns the name of the corresponding audio file along with the from and to positions. This information is stored in an intermediate file which is later used by the concatenation module. The method is repeated for the entire text, so the output of this module is a file containing the wave file names, with from and to positions, for every word found in the textual database.
2. The word is not present in the database: the word is split into syllables. Syllables are maintained in the database in the same way as words.
Syllables are formed from the word using the CV (consonant-vowel) rules of the Marathi language. For each syllable the database maintains its code, its start and end positions within the word, and the file name, e.g. n&w = n&a² + W.
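Putting Sections III to V together, the word-to-syllable-to-barakhadi fallback search can be sketched as below. An in-memory dict stands in for the MS Access tables; the entries, file names and the split_into_syllables helper are illustrative, not the paper's actual data.

```python
# Sketch of the Section V lookup with fallback. Code strings map to
# (wave file, from-position, to-position); all entries are made up.
WORD_DB = {"77 145 97": ("words.wav", 0, 4000)}
SYLLABLE_DB = {"77 145": ("syll.wav", 0, 1500), "97": ("syll.wav", 1500, 2200)}

def split_into_syllables(code_string):
    # Placeholder for the CV-rule splitter of the Marathi language;
    # here we naively split after every two codes.
    codes = code_string.split()
    return [" ".join(codes[i:i + 2]) for i in range(0, len(codes), 2)]

def lookup(code_string):
    """Return a list of (wave file, from, to) segments for one word."""
    if code_string in WORD_DB:                       # 1. whole word present
        return [WORD_DB[code_string]]
    segments = []
    for syll in split_into_syllables(code_string):   # 2. syllable fallback
        if syll in SYLLABLE_DB:
            segments.append(SYLLABLE_DB[syll])
        else:                                        # 3. barakhadi fallback (stub)
            segments.append(("barakhadi.wav", 0, 0))
    return segments

print(lookup("77 145 97"))  # -> [('words.wav', 0, 4000)]
```

A whole-word hit yields a single segment with no joint point; only missing words are decomposed, which is what keeps the output comparatively natural.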

An&aXe H = A + n&a² + Xa² + eh; An&a = A + n&a. These syllables are then searched in the database.
3. The syllable is not present in the database: the syllable is broken into barakhadi.

VI. DIGIT PROCESSING
Numerical figures are an unavoidable part of any text. At present the algorithm can convert numerical figures up to 1,00,000 into speech; the same algorithm can be applied to extend this range. The numbers from 0 to 100 are recorded and stored consecutively in a wave file named digits.wav, together with the separately recorded place words; this constitutes the audio database for digit processing. The textual database stores the from and to positions of each number in digits.wav, in a table named digits. To convert a figure in the text into speech, the software first checks whether the separated word is a numerical figure; if it is, the digit processor takes over. The place of each digit group is identified and the appropriate place-word suffix is attached, e.g. consider the synthesis of 12350.

Table 6: Digit processing example

Algorithm:
1. Check whether the number is greater than 1,00,000. If yes, continue; otherwise go to step 12.
2. Get the quotient of no/100000 in variable quo. Store the remainder in variable rem for further processing.
3. Get the from and to positions of quo by querying the digits table in the textual database.
4. Also get the from and to positions of the corresponding place word.
5. Get the quotient of rem/1000 in quo. Store the new remainder in rem for further processing.
6. Get the from and to positions of quo by querying the digits table in the database.
7. Also get the from and to positions of the corresponding place word.
8. Get the quotient of rem/100 in quo. Again store the new remainder in rem for further processing.
9. Get the from and to positions of quo by querying the digits table in the database.
10. Also get the from and to positions of the corresponding place word.
11. Finally, get the from and to positions of the number stored in rem by querying the digits table in the database.
12. If the number is less than 1,00,000 and greater than 1,000, follow steps 5 to 11.
13. If the number is less than 1,000 and greater than 100, follow steps 8 to 11.
14. If the number is less than 100, get its from and to positions by querying the digits table in the database.
15. Stop.

VII. CONCATENATION AND PLAY
This stage concatenates the wave files for the different words in the text into a new wave file. The program receives as input a text document (filenms.doc) containing the names of all the files which need to be concatenated in order to generate the speech corresponding to the text. The program concatenates the wave files listed in filenms.doc to generate target.wav: the samples from the individual wave files are read one at a time and appended to the new target.wav file, and the process is repeated until all the wave files in filenms.doc have been concatenated. The data length for the new file

target.wav is updated on every loop and finally written to target.wav. All the wave files are opened in binary mode, because the data in these files can be accessed only in binary mode.

VIII. CONCLUSION
A word that is present in the database is not broken into syllables, so no joint point is present and the word sounds natural. If the word is not present in the database it is broken into syllables, and only if a syllable is not present is it broken further into barakhadi. Owing to this syllable approach the number of joint points is reduced, and the naturalness of the speech output is comparatively increased.
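The digit-processing algorithm of Section VI amounts to repeated division by the place values 1,00,000, 1,000 and 100. The following is a minimal sketch; the Marathi place words are garbled in the source, so the strings "lakh", "hazar" and "shambhar" here are our assumptions, standing in for the separately recorded place-word clips whose from and to positions live in the digits table.

```python
# Sketch of the Section VI digit processor. The place-word strings are
# assumed (the originals are garbled in the source) and stand for the
# recorded clips indexed by the `digits` table.

def decompose(n):
    """Split a non-negative integer into the spoken units of the
    algorithm: lakhs, thousands, hundreds, then the final 0-99 clip."""
    parts = []
    if n >= 100000:
        parts += [n // 100000, "lakh"]      # steps 2-4
        n %= 100000
    if n >= 1000:
        parts += [n // 1000, "hazar"]       # steps 5-7
        n %= 1000
    if n >= 100:
        parts += [n // 100, "shambhar"]     # steps 8-10
        n %= 100
    if n or not parts:
        parts.append(n)                     # steps 11/14: the 0-99 clip
    return parts

print(decompose(12350))  # -> [12, 'hazar', 3, 'shambhar', 50]
```

Each element of the result is then looked up in the digits table to obtain a (from, to) span inside digits.wav, and the spans are handed to the concatenation stage.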
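The concatenation loop of Section VII can be sketched with Python's wave module in place of the paper's raw binary I/O. File names are illustrative, and the clips are assumed to share the common format of Table 3 (8-bit mono PCM at 11,025 Hz), so the first file's parameters can simply be reused for the output.

```python
# Sketch of the Section VII concatenation stage. Instead of updating
# the RIFF data length by hand, the wave module recomputes it when the
# output file is closed.
import wave

def concatenate(input_names, output_name="target.wav"):
    """Append the samples of each input wave file to one output file."""
    out = None
    for name in input_names:
        with wave.open(name, "rb") as src:
            if out is None:
                out = wave.open(output_name, "wb")
                # All clips are assumed to share one format (Table 3),
                # so copy the first file's parameters.
                out.setparams(src.getparams())
            out.writeframes(src.readframes(src.getnframes()))
    if out is not None:
        out.close()
```

As in the paper, the samples are read from each input file in turn and appended to target.wav; closing the output file writes the final data length into the header.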