
Design and Implementation of Text To Speech Conversion for Visually Impaired Using i Novel Algorithm

M. Arun, S.S. Salvadiswar, J. Sibidharan
Department of Electronics and Communication, Mepco Schlenk Engineering College, Virudhunagar-626005
Email: salvadiswar@gmail.com

Journal on Today's Ideas - Tomorrow's Technologies, Vol. 2, No. 1, June 2014, pp. 1-9. (c) 2014 by Chitkara University. All Rights Reserved.

Abstract

Character recognition, or pattern recognition, has become a central concern of computer vision. Optical Character Recognition (OCR) is a method by which characters are recognized from digital images. Many algorithms exist for this purpose, but the i-Novel algorithm performs the task faster and more efficiently by relying on a unique segmentation technique. The algorithm has been implemented in MATLAB R2010a on a set of test images, and an accuracy of 98% has been obtained. Text-to-speech conversion is an artificial tool that converts text into speech sounds. There are quite a number of methods for generating speech, but selective speech synthesis provides naturalness in the generated speech by building it from phonemes, diphones and syllables.

Keywords: Image processing based on OCR, Segment extraction, Text-to-speech conversion, Selective speech synthesis, Feature extraction.

I. INTRODUCTION

Text-to-speech conversion is an artificial tool that converts text into speech sounds. There are quite a number of methods for generating speech, but selective speech synthesis provides naturalness in the generated speech by building it from phonemes, diphones and syllables. Our ultimate aim is to enable visually impaired people to read books, newspapers and articles. Braille is the system that already exists to aid the visually impaired: the text is converted into raised dots so that a visually challenged reader can feel and understand each letter or word. Its main disadvantage, however, is that it is time-consuming. Our design, in contrast, relies entirely on image processing. Any image with text embedded in it can be taken as the input. The image is processed to identify each character, and the recognized characters are converted into speech sounds using a Voice Shield and an Arduino.

An algorithm called the i-Novel algorithm is used to recognize the characters in the image; its main module is segment extraction. The character recognition program is written in MATLAB and runs on a BeagleBone Black, producing a text file that holds the character contents of the image. This text file is sent from the BeagleBone Black to the Arduino over a serial UART link. The Arduino receives the text and forwards it, character by character, to the Voice Shield. The Voice Shield consists of a TTS256 processor (a text-to-speech converter) and a SpeakJet IC. The TTS256 outputs the allophone codes for each letter; these codes are sent to the SpeakJet, which converts them into allophone sounds. The allophone sounds are delivered as electrical signals to the output jack, where they are reproduced as audio.
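As an illustration, this data path can be sketched in a few lines of MATLAB. The sketch is not code from the paper: ocrImageToText is a placeholder for the i-Novel recognition routine, and '/dev/ttyO1' stands in for whichever UART the Arduino is actually wired to.

    % Minimal sketch of the recognition-to-speech data path (assumptions:
    % ocrImageToText is a placeholder for the i-Novel recognition routine,
    % and '/dev/ttyO1' is an assumed UART device name on the BeagleBone Black).
    img  = imread('page.jpg');             % acquire the book/newspaper image
    text = ocrImageToText(img);            % run the OCR step, obtain a string

    fid = fopen('page.txt', 'w');          % store the recognized characters
    fprintf(fid, '%s', text);
    fclose(fid);

    port = serial('/dev/ttyO1', 'BaudRate', 9600);   % UART link to the Arduino
    fopen(port);
    fprintf(port, '%s\n', text);           % Arduino forwards this text to the TTS256
    fclose(port);
    delete(port);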

II. i-NOVEL ALGORITHM

The proposed algorithm for optical character recognition is explained in detail in this section.

A. OVERVIEW OF THE i-NOVEL ALGORITHM

The algorithm is represented as a block diagram in Figure 1.

Figure 1

First, an image is acquired through any of the standard image acquisition techniques. The input image is assumed to be in the YCbCr colour format, and the algorithm works on its Y component, which is a grey-level image. Using an appropriate thresholding algorithm, the grey-level image is thresholded to obtain a binary image that quantizes alphabets and background to black and white respectively. The binary image is then passed to a specific edge detection process, performed such that only the right-sided edge of each alphabet is retained and all other edges are eliminated. After edge detection, the image is segmented and feature extraction is performed. For this algorithm, a segment is defined as a continuous path of black pixels in the edge-detected image; at this step the details of each segment that are required for further processing are stored. The final step is to profile the stored segments, that is, to categorize them into different types such as short, long, line or curve.

B. THRESHOLDING

The Y component of the input image is thresholded in this block. The results of thresholding and edge detection for the alphabet Q are shown in Figure 2. Conventional thresholding with a threshold value of 64 is used for this particular input image; the threshold factor was obtained after experimentation on over 500 images of alphabets.

Figure 2
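A minimal MATLAB sketch of the Y-extraction, fixed-level thresholding and right-edge-only detection steps is given below. The right-edge rule used here (a character pixel whose right-hand neighbour is background) is our reading of the description above, and the input file name is illustrative only.

    % Sketch of thresholding and right-sided edge detection (assumptions:
    % the test image is an RGB file, and the right edge is taken as any
    % character pixel whose right neighbour is background).
    rgb = imread('alphabet_Q.jpg');
    ycc = rgb2ycbcr(rgb);
    Y   = ycc(:, :, 1);                   % grey-level (luma) component

    bw    = Y > 64;                       % threshold value of 64: background = 1
    chars = ~bw;                          % character pixels = 1 (black)

    rightShift = [chars(:, 2:end) false(size(chars, 1), 1)];
    rightEdge  = chars & ~rightShift;     % character pixel with background to its right

    imshow(~rightEdge);                   % display the retained edges as black on white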

C. CREATING BOUNDING BOXES

The thresholded input image is then subjected to a bounding operation. A bounding box is drawn around each character in the text image using a library function in MATLAB, which encloses every character by inspecting the pixel values of the input text image. The results of creating bounding boxes around a text image are shown in Figure 3.

D. IDENTIFICATION OF SPACE

To identify the space between two words, an array named the space vector is initialized. This array stores the pixel values of the white background, and the pixel value of white is 255. Whenever a value of 255 is encountered, it is added to a run counter. When the counter reaches a limit of about 7 or 8 consecutive values of 255, a word space is assumed. If the counter reaches fewer than 4 or 5 consecutive values of 255, the run is discarded and the counter is reset to zero.
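These two steps can be pictured with the following MATLAB sketch. The use of regionprops for the bounding boxes is our assumption (the paper refers only to a MATLAB library function), the image is assumed to be a clean computer-generated text line with a pure white (255) background, and the scan row is chosen arbitrarily for illustration.

    % Sketch of bounding-box creation and word-space identification
    % (assumptions: regionprops stands in for the unnamed library function,
    % and row 50 is an arbitrary scan line through the text).
    gray     = rgb2gray(imread('text_line.jpg'));   % white background pixels are 255
    charMask = gray < 64;                           % threshold of 64: character pixels = 1
    boxes    = regionprops(bwlabel(charMask), 'BoundingBox');   % one box per character

    scanRow   = gray(50, :);                 % grey values along one text row
    runLength = 0;
    spaceCols = [];                          % column where each word space starts
    for col = 1:numel(scanRow)
        if scanRow(col) == 255               % white background pixel
            runLength = runLength + 1;
        else
            if runLength >= 7                % 7-8 consecutive whites -> word space
                spaceCols(end+1) = col - runLength; %#ok<AGROW>
            end                              % runs shorter than 4-5 are neglected
            runLength = 0;
        end
    end
    fprintf('Found %d character boxes and %d word spaces.\n', numel(boxes), numel(spaceCols));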

E. CROPPING AND SEPARATION OF LINES

The recreation of the entire image as a text file is achieved through cropping and separation. The whole input image is cropped to a predefined character size. The i-Novel algorithm identifies each character in the image, compares it with a set of predefined character images, and produces the output in a text file. The image is cropped at every step of the iterative character recognition process, and the characters identified at each step are appended to a new text file for further processing.
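The comparison against the predefined character set can be illustrated with the following MATLAB sketch. The paper does not state the matching criterion, so the pixel-agreement score used here is purely illustrative, and templates.mat, its fields and the cropped-character file name are hypothetical.

    % Illustrative comparison of a cropped character against stored templates.
    % Assumptions: templates.mat holds a struct array with fields 'glyph'
    % (logical character image) and 'label' (character); the pixel-agreement
    % score is our own choice, not the criterion used in the paper.
    load('templates.mat', 'templates');                  % predefined character image set
    crop = rgb2gray(imread('cropped_char.png')) < 64;    % binarized cropped character
    crop = imresize(double(crop), size(templates(1).glyph)) > 0.5;

    bestScore = -Inf;
    bestLabel = '?';
    for k = 1:numel(templates)
        score = sum(sum(crop == templates(k).glyph));    % number of agreeing pixels
        if score > bestScore
            bestScore = score;
            bestLabel = templates(k).label;
        end
    end
    fprintf('Recognized character: %c\n', bestLabel);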

III. TEXT TO SPEECH CONVERSION

The proposed algorithm for text-to-speech conversion is explained in detail in this section.

A. OVERVIEW OF THE TEXT TO SPEECH CONVERSION ALGORITHM

The algorithm that converts the identified text characters into speech sounds is represented as a block diagram in Figure 3.

Figure 3

B. BEAGLE BONE BLACK

The main component of the project is the BeagleBone Black. Its main advantage is that it runs a real-time operating system capable of working in any environment; it can also be accessed remotely wherever Ethernet or Wi-Fi connectivity is available. The board holds the character recognition algorithm, runs it over the input text images, identifies the characters and stores them in a text file, which is then given as input to the Arduino board. A further reason for choosing this board is its 2 GB of built-in memory, which can store multiple books in the form of images.

C. TTS256 PROCESSOR

The TTS256 is an 8-bit microprocessor programmed with 600 letter-to-sound rules for the automatic, real-time translation of English text to allophone addresses. The TTS256 will read English words, numbers, currency, time, mathematical expressions and some punctuation characters. As the rule set is constrained by the amount of memory in the device, the TTS256 is able to translate and pronounce correctly roughly 90% of the text sent to it. The translation quality is adequate for many embedded applications, but it will mispronounce some common words from time to time.

D. OPERATION OF TTS256

The communication parameters of the serial port are set to 9600 baud, N81. The host processor sends up to 128 characters to the TTS256's serial port on pin 18 (RX). Text is spoken when the TTS256 receives a new-line character (0x0A) or when the maximum of 128 characters has been received. The TTS256 then converts the English text into SpeakJet allophone codes, which are sent to the SpeakJet via the serial port. The TTS256 monitors the SpeakJet's pin 15 (BUFFER_HALF_FULL) line and waits before sending more data. Once the SpeakJet starts speaking, it lowers its ready line (pin 16 of the SpeakJet); after it has finished speaking and is ready to accept more data, the ready line is raised again. The host processor should monitor the SpeakJet ready line to determine when to send more text, and should wait at least 12 ms after sending text before checking the status of the SpeakJet speaking flag.

E. SPEAK JET

The SpeakJet is a completely self-contained, single-chip voice and complex sound synthesizer. It uses Mathematical Sound Architecture (MSA) technology, which controls an internal five-channel sound synthesizer to generate on-the-fly, unlimited-vocabulary speech synthesis and complex sounds. The SpeakJet is preconfigured with 72 speech elements (allophones), 43 sound effects, and 12 DTMF touch tones. By selecting these MSA components and controlling the pitch, rate, bend and volume parameters, the user can produce unlimited phrases and sound effects with thousands of variations at any time. These are not recorded waveforms or sound fragments but truly synthetic sound. The SpeakJet can be controlled simultaneously by logic changes on any of its eight Event Input lines and/or by a serial data line from a CPU (such as the OOPic, Basic Stamp or PC), allowing both CPU-controlled and stand-alone operation. Other features include an internal 64-byte input buffer, internal programmable EEPROM, three programmable outputs, and direct user access to the internal five-channel sound synthesizer.
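For bench testing, the host-side protocol of Section D can be exercised directly from MATLAB over a USB-serial adapter instead of the Arduino. This sketch is only an approximation of the real host behaviour: 'COM3' is a placeholder port name, and the pause stands in for monitoring the SpeakJet ready line, which a microcontroller host would poll on a digital pin.

    % Bench-test sketch (assumption): driving the TTS256 from a PC serial
    % port rather than the Arduino, to illustrate the 9600-baud N81 protocol.
    tts = serial('COM3', 'BaudRate', 9600, 'DataBits', 8, ...
                 'Parity', 'none', 'StopBits', 1);    % 9600 baud, N81
    fopen(tts);

    sentence  = 'Hello, this is a test of the text to speech module.';
    chunkSize = 128;                                  % TTS256 input limit per burst
    for k = 1:chunkSize:numel(sentence)
        chunk = sentence(k:min(k + chunkSize - 1, numel(sentence)));
        fprintf(tts, '%s\n', chunk);                  % 0x0A triggers speaking
        pause(0.012);                                 % wait at least 12 ms before polling
        pause(3);                                     % crude stand-in for the ready-line handshake
    end

    fclose(tts);
    delete(tts);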

IV. RESULTS

The developed algorithm was implemented in MATLAB 7.10.0.499 (R2010a) with a thresholding module and a simple alphabet segmentation module. It was tested on 20 different computer-generated images of the three major fonts. Figure 4 shows the character recognition process applied to images with different font styles and font sizes in the MATLAB environment.

Figure 4

Figure 5

Figure 5 shows the result of the character recognition process: a text file containing the text content of the processed image.

V. CONCLUSION

The motivation for developing this algorithm was the simple fact that English alphabets are fixed glyphs that will never change. Because of this, artificial neural networks and vector-based training provide nearly accurate results, but they perform a great deal of redundant work. Moreover, most OCR processes today involve images from high-resolution scanners and cameras; OCR technologies can exploit this advance and reconsider techniques that were abandoned only because of the limits of earlier imaging technology. Our algorithm is one such approach. It has advantages in speed, power, memory and area, since it includes no training or learning mechanism and requires no image database of the kind some OCR techniques depend on. It is also the first OCR that is both multiple-font and training-free. In summary, the developed algorithm can readily be used as a kernel within a complete OCR solution to recognize each alphabet after segmentation and the adjustment operations that align the alphabet horizontally. The algorithm currently gives an accuracy of 100% on the test set of alphabets for most of the fonts in the three font families mentioned above. Future versions can be expected to extract more types of features from the input alphabet image to improve accuracy and to support other alphabet sets.

REFERENCES

Ramanathan, R., et al., "A Novel Technique for English Font Recognition Using Support Vector Machines," in Advances in Recent Technologies in Communication and Computing, Kottayam, Kerala, 2009, pp. 766-769. DOI 10.1109/ARTCom.2009.89

Sushruth Shastry, Gunasheela G., et al., "i - A novel Algorithm for Optical Character Recognition," in International Multi-Conference on Automation, Computing, Control, Communication and Compressed Sensing, March 2013, pp. 389-393. DOI 10.1109/iMac4s.2013.6526442

J. Zhang, "Language generation and speech synthesis in dialogues for language learning," Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, May 20, 2004.

Farig Yousuf Sadeque, "Bangla Text to Speech Conversion: A Syllabic Unit Selection Approach," Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET), Dhaka-1000, Bangladesh. DOI 10.1109/ICIEV.2013.6572593

S. P. Kishore and A. W. Black, "Unit size in unit selection speech synthesis," Proceedings of EUROSPEECH, pp. 1317-1320, 2003.

M. Usman Raza, et al., "Text Extraction Using Artificial Neural Networks," in Networked Computing and Advanced Information Management (NCM), 7th International Conference, Gyeongju, North Gyeongsang, 2011, pp. 134-137.

Fonseca, J. M., et al., "Optical Character Recognition Using Automatically Generated Fuzzy Classifiers," in Eighth International Conference on Fuzzy Systems and Knowledge Discovery, Shanghai, 2011, pp. 448-452. DOI 10.1109/FSKD.2011.6019585