Design and Implementation of Text-to-Speech Conversion for the Visually Impaired Using the i-Novel Algorithm

M. Arun, S.S. Salvadiswar, J. Sibidharan
Department of Electronics and Communication, Mepco Schlenk Engineering College, Virudhunagar-626005
Email: salvadiswar@gmail.com

Abstract
Character recognition has become a central problem in computer vision. Optical Character Recognition (OCR) is a method by which characters are recognized from digital images. Many algorithms exist for this purpose, but the i-Novel algorithm performs the task faster and more efficiently, based on a unique segmentation technique. The algorithm has been implemented in MATLAB R2010a on a set of test images, and an accuracy of 98% has been obtained. Text-to-speech conversion is an artificial tool that converts text into speech sounds. There are many ways to generate speech sounds, but selective speech synthesis provides naturalness in the synthesized speech based on phonemes, diphones and syllables.

Keywords: Image processing based on OCR, Segment extraction, Text-to-speech conversion, Selective speech synthesis, Feature extraction.

I. Introduction
Text-to-speech conversion is an artificial tool that converts text into speech sounds. There are many ways to generate speech sounds, but selective speech synthesis provides naturalness in the synthesized speech based on phonemes, diphones and syllables. Our ultimate aim is to enable visually impaired people to read books, newspapers and articles. Braille is the system that already exists to aid the visually impaired: text is converted into raised dots so that a visually challenged reader can feel and understand each letter or word. Its main disadvantage, however, is that reading it is time-consuming. Our whole design relies on image processing techniques; any image with text embedded in it can be taken as the input image.
This image is processed with image processing techniques to identify each character. The characters are then converted into speech sounds using a Voice Shield box and an Arduino.

Journal on Today's Ideas - Tomorrow's Technologies, Vol. 2, No. 1, June 2014, pp. 1-9. © 2014 by Chitkara University. All Rights Reserved.
An algorithm called the i-Novel algorithm is used to recognize the characters in the image; the main module in character recognition is segment extraction. The programming is done in MATLAB using the character recognition algorithm, and a BeagleBone Black runs the program, producing as output a text file that holds the character contents of the image. This text file is sent from the BeagleBone Black to the Arduino over a serial UART link. The Arduino receives the text file and sends each character to the Voice Shield box. The Voice Shield box consists of a TTS256 processor (text-to-speech converter) and a SpeakJet IC. The TTS256 outputs the allophone codes for the corresponding letters; these codes are sent to the SpeakJet IC, where they are converted into allophone sounds. The allophone sounds are delivered as electrical signals to the output jack, where they are converted into audible sound.

II. i-NOVEL ALGORITHM
The proposed algorithm for optical character recognition is explained in detail in this section.

A. OVERVIEW OF THE i-NOVEL ALGORITHM
The algorithm is represented in the block diagram shown in Figure 1.

Figure 1
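As a rough sketch of the serial hand-off from the BeagleBone Black to the Voice Shield described above, the following hypothetical Python code frames recognized text for the UART; the TTS256 speaks a buffered line when it receives a newline (0x0A) and buffers at most 128 characters. Here `io.BytesIO` stands in for the serial port, and the function name is our own; a real setup would open the hardware UART (9600 baud, 8N1) instead.

```python
import io

def send_text_over_uart(port, text, max_len=128):
    """Hypothetical framing helper: send newline-terminated chunks of at
    most 127 characters, since the chip buffers up to 128 bytes and
    speaks when it sees a newline (0x0A)."""
    for start in range(0, len(text), max_len - 1):
        chunk = text[start:start + max_len - 1]
        port.write(chunk.encode("ascii") + b"\x0a")

# io.BytesIO stands in for the UART; a real setup would open the
# hardware serial port instead.
fake_uart = io.BytesIO()
send_text_over_uart(fake_uart, "HELLO WORLD")
# fake_uart now holds b"HELLO WORLD\n"
```

This only illustrates the framing rule; the actual byte stream between the boards is whatever the Arduino firmware forwards to the TTS256.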
First, an image is acquired through any of the standard image acquisition techniques. The input image is assumed to be in the YCbCr color format. The algorithm works on the Y component of the input image, which is a gray-level image. Using an appropriate thresholding algorithm, the gray-level image is thresholded to obtain a binary image that quantizes the alphabets and the background to black and white respectively. The binary image is then passed to a specific edge detection process, performed such that only the right-sided edges of each alphabet are obtained and all other edges are eliminated. After edge detection, the image is segmented and feature extraction is performed. For this algorithm, a segment is defined as a continuous path of black pixels in the edge-detected image. In this step, the details of the segments required for further processing are stored. The next step is to profile the stored segments: profiling categorizes them into different types, such as short, long, line or curve.

B. THRESHOLDING
The Y component of the input image is thresholded in this block. The results of thresholding and edge detection for the alphabet Q are shown in Figure 2. Conventional thresholding with a threshold value of 64 is used for this particular input image; this threshold was obtained after experimentation on over 500 images of alphabets.

Figure 2
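The thresholding and right-edge rules just described can be illustrated with a short Python sketch. The paper's implementation is in MATLAB; the helper names and the toy 3x3 image below are our own, and only the two rules (binarize the Y channel at 64; keep a black pixel only when its right-hand neighbour is background or the image border) are taken from the text.

```python
THRESHOLD = 64  # threshold value reported above for this input image

def threshold_y(gray):
    """Binarize the Y (luma) channel: dark pixels become text (0),
    bright pixels become background (255)."""
    return [[0 if px < THRESHOLD else 255 for px in row] for row in gray]

def right_edges(binary):
    """Keep only right-sided edges: black pixels whose right-hand
    neighbour is background (or the image border)."""
    h, w = len(binary), len(binary[0])
    out = [[255] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 and (c == w - 1 or binary[r][c + 1] == 255):
                out[r][c] = 0
    return out

# Toy 3x3 gray-level image: a dark vertical stroke two pixels wide
# on the left, bright background on the right.
gray = [[10, 10, 200],
        [10, 10, 200],
        [10, 10, 200]]
binary = threshold_y(gray)   # each row becomes [0, 0, 255]
edges = right_edges(binary)  # only the stroke's right column survives
```

On the toy image, only the second column of the stroke survives the edge pass, which is exactly the "right-sided edges only" behaviour the segmentation stage relies on.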
C. CREATING BOUNDING BOXES
The thresholded input image is then subjected to a bounding operation. A bounding box is drawn around every character in the text image using a built-in MATLAB library function, which encloses each character by examining the pixel values of the input text image. The result of creating bounding boxes around a text image is shown in Figure 3.

D. IDENTIFICATION OF SPACES
To identify the space between two words, an array named the space vector is initialized. This array stores the pixel values of the white background; the pixel value of white is 255. Whenever the scan encounters the value 255, a counter is incremented. When the counter reaches a limit of about 7 or 8 consecutive values of 255, a space is assumed. If fewer than 4 or 5 consecutive values of 255 are seen, the run is neglected and the counter is reset to zero.

E. CROPPING AND SEPARATION OF LINES
The recreation of the entire image in a text file is obtained through cropping and separation. The whole input image is cropped to a predefined character size. The i-Novel algorithm identifies each character
in the image, compares it with a set of predefined character images, and produces the output in a text file. The entire image is cropped at each step of the iterative character recognition process, and the characters identified at each step are stored in a new text file for further processing.

III. TEXT-TO-SPEECH CONVERSION
The proposed method for text-to-speech conversion is explained in detail in this section.

A. OVERVIEW OF THE TEXT-TO-SPEECH CONVERSION ALGORITHM
The algorithm that converts the identified text characters into speech sounds is represented in the block diagram shown in Figure 3.

Figure 3

B. BEAGLEBONE BLACK
The main component of the design is the BeagleBone Black. Its main advantage is that it contains a real-time operating system capable of running in any environment; it can also be accessed remotely from any environment provided with Ethernet or Wi-Fi connectivity. The BeagleBone Black holds the character recognition algorithm: it runs the algorithm over the input text images, identifies the characters, and stores them in a text file. This text file is given as input to the Arduino board. A further reason for using this board is its 2 GB of built-in memory, which can store multiple books in the form of images.

C. TTS256 PROCESSOR
The TTS256 is an 8-bit microprocessor programmed with 600 letter-to-sound rules for the automatic, real-time translation of English text to allophone addresses. The TTS256 will read English words, numbers, currency, time,
mathematical expressions and some punctuation characters. Because the rule set is constrained by the amount of memory in the device, the TTS256 can translate and pronounce correctly roughly 90% of the text sent to it. The translation quality is adequate for many embedded applications, but it is guaranteed to mispronounce some common words from time to time.

D. OPERATION OF THE TTS256
The serial port is configured for 9600 baud, N81. The host processor sends up to 128 characters to the TTS256's serial port on pin 18 (RX). Text is spoken when the TTS256 receives a newline character (0x0A) or when the maximum of 128 characters has been received. The TTS256 then converts the English text to SpeakJet allophone codes, which are sent to the SpeakJet via the serial port. The TTS256 monitors the SpeakJet's BUFFER_HALF_FULL line (pin 15) and waits before sending more data. Once the SpeakJet starts speaking, it lowers its ready line (pin 16); after it has finished speaking and is ready to accept more data, the ready line is raised again. The host processor should monitor the ready line to determine when to send more text, and should wait at least 12 ms after sending text before checking the status of the speaking flag.

E. SPEAKJET
The SpeakJet is a completely self-contained, single-chip voice and complex-sound synthesizer. It uses Mathematical Sound Architecture (MSA) technology, which controls an internal five-channel sound synthesizer to generate on-the-fly, unlimited-vocabulary speech synthesis and complex sounds. The SpeakJet is preconfigured with 72 speech elements (allophones), 43 sound effects, and 12 DTMF touch tones.
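The TTS256 input rule described above (speak when a newline 0x0A arrives or when the 128-character buffer fills) can be modeled with a toy Python class. This is an illustration of the rule only, not the chip's firmware; the class and attribute names are hypothetical.

```python
MAX_BUFFER = 128  # the TTS256 accepts at most 128 characters at a time

class TTS256Model:
    """Toy model of the input rule above: buffered text is 'spoken'
    when a newline (0x0A) arrives or the buffer fills up."""

    def __init__(self):
        self.buffer = ""
        self.spoken = []  # utterances handed on to the (modeled) SpeakJet

    def receive(self, char):
        if char == "\n":
            self._flush()
        else:
            self.buffer += char
            if len(self.buffer) == MAX_BUFFER:
                self._flush()

    def _flush(self):
        if self.buffer:
            self.spoken.append(self.buffer)
            self.buffer = ""

tts = TTS256Model()
for ch in "HELLO\nWORLD\n":
    tts.receive(ch)
# tts.spoken is now ["HELLO", "WORLD"]
```

In the real device each flushed utterance is translated into SpeakJet allophone codes rather than stored as text.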
Through the selection of these MSA components, combined with control of the pitch, rate, bend and volume parameters, the user can produce unlimited phrases and sound effects, with thousands of variations, at any time. These are not recorded waveforms or sound fragments but truly synthetic sound. The SpeakJet can be controlled simultaneously by logic changes on any of its eight event input lines and/or by a serial data line from a CPU (such as an OOPic, Basic Stamp or PC), allowing both CPU-controlled and stand-alone operation. Other features include an internal 64-byte input buffer, internal programmable EEPROM, three programmable outputs, and direct user access to the internal five-channel sound synthesizer.
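The handshake described above, where the sender must watch the SpeakJet's ready line before pushing more allophone codes into its 64-byte input buffer, can be sketched as a hypothetical Python simulation. The 64-byte buffer size comes from the feature list above; the class and function names are our own, and the model ignores timing.

```python
from collections import deque

class SpeakJetModel:
    """Toy model of the SpeakJet's 64-byte input buffer and ready
    line as described above (illustrative only, not the real chip)."""
    BUFFER_SIZE = 64

    def __init__(self):
        self.buffer = deque()

    @property
    def ready(self):
        # The ready line is high while the buffer can accept more data.
        return len(self.buffer) < self.BUFFER_SIZE

    def write(self, code):
        if not self.ready:
            raise IOError("host must wait for the ready line")
        self.buffer.append(code)

    def speak_one(self):
        # Speaking consumes a code, which frees space and raises ready.
        return self.buffer.popleft() if self.buffer else None

def send_codes(codes, chip):
    """Host-side flow control: write only while the chip is ready,
    otherwise let it drain (in hardware: wait ~12 ms and poll again)."""
    for code in codes:
        while not chip.ready:
            chip.speak_one()
        chip.write(code)
```

The point of the sketch is the flow-control pattern: the sender never overruns the 64-byte buffer because it blocks on the ready condition, just as the TTS256 blocks on the real ready line.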
IV. RESULTS
The developed algorithm was designed and implemented in MATLAB 7.10.0.499 (R2010a) with a thresholding module and a simple alphabet segmentation module. It was tested on 20 different computer-generated images of three major fonts. Figure 4 shows the character recognition process applied to images with different font styles and sizes in the MATLAB environment.

Figure 4

Figure 5
Figure 5 shows the result of the character recognition process: a text file containing the text content of the processed image.

V. CONCLUSION
The motivation for developing this algorithm is the simple fact that the English alphabet consists of fixed glyphs that do not change. Because of this, artificial neural networks and vector-based data training give nearly exact results, but they perform a great deal of redundant work. Moreover, most OCR processes today involve images from high-resolution scanners and cameras; OCR systems can exploit this advance and revisit techniques that were earlier abandoned for lack of adequate imaging technology. Our algorithm is one such approach. It has advantages in speed, power, memory and area, since it includes no training or learning mechanism and needs no image database of the kind some OCR techniques require. This algorithm is also the first to be both a multiple-font OCR and a no-training OCR. To conclude, the developed algorithm can readily be used as a kernel within a complete OCR solution, recognizing each alphabet after segmentation and the adjustment operations that align the alphabet horizontally. On the current test set of alphabets, it achieves 100% accuracy for most fonts in the three font families mentioned above. Future versions of this algorithm can be expected to extract more types of features from the input alphabet image, to improve accuracy and to provide support for other alphabet sets.