OCR-ANN Back-Propagation Based Classifier

Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320-088X
IJCSMC, Vol. 4, Issue 1, January 2015, pg. 307-313
RESEARCH ARTICLE

Asmaa Qasim Shareef ¹, Sukaina M. Altayar ²
¹,² College of Science, University of Baghdad, Iraq
¹ asama_sal@yahoo.com, ² sukaina_altayar@yahoo.com

Abstract
Optical character recognition (OCR) using a neural network is a prototype system for recognizing characters. Pre-processing the document is a preliminary step for accurate recognition: it transforms the data into a format that can be processed easily and effectively. The main task of pre-processing is to decrease the variation that reduces the recognition rate and increases complexity. In this paper, several pre-processing techniques are used to improve OCR accuracy by pre-processing the original image and its characters separately; these include noise filtering, smoothing, thresholding, and skew correction. The experimental results show that recognition improves for many types of noisy and non-uniform character documents. Efficiency and recognition tests for the training method have been performed and are reported in this paper. The paper shows how an artificial neural network can be used in an optical character recognition application, and how applying multiple image processing techniques improves recognition quality and performance.

Keywords: Optical Character Recognition; Back-Propagation; Neural Network; Image Analysis

I. INTRODUCTION
Optical Character Recognition (OCR) is the automatic recognition of characters from a document image. It provides full alphanumeric recognition of printed or handwritten characters, text, numerals, letters, and symbols, converting them into a form that a computer can process.
OCR has gained considerable impetus from its applications in computer vision, intelligent text recognition, and text-based decision-making systems [1,2]. The approach taken here to the OCR problem is based on the psychology of characters as perceived by humans; thus, the geometrical features of a character and its variants are considered for recognition. Adding a neural network improves the recognition of arbitrary font styles, as opposed to a single standard font. Differences in font and size within a scanned image make recognition difficult if no pre-processing is applied. Most document images obtained by scanning contain noisy pixels, and stroke width is a further factor that affects recognition. Therefore, to achieve good recognition accuracy, the system must eliminate noise after reading the binary image data, smooth the image, extract features efficiently, train the system, and classify patterns [3,4].

2015, IJCSMC All Rights Reserved

II. RELATED WORK
Reference [6] showed that success in OCR depends on two factors: feature extraction and classification algorithms. Reference [7] applied a neural network with back-propagation to recognize musical scores with high accuracy. Reference [8] presented a scheme for a complete OCR system covering five different fonts and sizes of characters, and implemented the steps of pre-processing, feature extraction, segmentation, and classification, using an artificial neural network (ANN) for classification. Reference [9] explained classification methods based on learning from examples and their application to character recognition.

III. PROPOSED SYSTEM ARCHITECTURE
This paper presents a procedure for designing an OCR-ANN system that recognizes text from a scanned image. The model was built in C# with the OpenCV library. The system consists of two parts, shown in Fig.1: the ANN training part and the pattern recognition part.

Fig.1 OCR-ANN system scheme

In the first part, a database of characters is created by training the ANN on character images with back-propagation (BP), generating the data matrix used for classification. In the second part, the scanned image passes through several processing steps, namely pre-processing and classification, to recognize its characters.

ANN Training Part
This part generates the data used in the recognition part; it is represented by the flow chart in Fig.2. The BP algorithm is applied to a feed-forward multilayer (FFML) neural network. The nodes are organized in layers and send their signals forward, while errors are propagated backwards. The network receives inputs through the neurons of the input layer and produces its output through the neurons of the output layer, with one hidden layer in between.
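
The training scheme described above can be sketched as a small feed-forward network with one hidden layer trained by back-propagation. This is an illustrative sketch in plain Python, not the C# implementation used for the system; the 3x3 glyphs, the layer sizes, the sigmoid activation, the learning rate, and the names `TinyBPNet`, `train`, and `classify` are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNet:
    """Feed-forward net with one hidden layer, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Training starts from random weights; the extra column is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, o

    def train_step(self, x, target, lr):
        h, o = self.forward(x)
        xb, hb = list(x) + [1.0], h + [1.0]
        # Output-layer error terms.
        d_out = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]
        # Errors propagated backwards to the hidden layer (bias column skipped).
        d_hid = [hj * (1.0 - hj) *
                 sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        # Squared error for this sample, measured before the weight update.
        return sum((ok - tk) ** 2 for ok, tk in zip(o, target))


# Hypothetical 3x3 binary glyphs standing in for character images.
GLYPHS = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
}
LABELS = list(GLYPHS)


def train(net, epochs=5000, lr=0.5):
    history = []
    for _ in range(epochs):
        total = 0.0
        for idx, label in enumerate(LABELS):
            target = [1.0 if k == idx else 0.0 for k in range(len(LABELS))]
            total += net.train_step(GLYPHS[label], target, lr)
        history.append(total)
    return history


def classify(net, pixels):
    _, o = net.forward(pixels)
    return LABELS[o.index(max(o))]


net = TinyBPNet(n_in=9, n_hidden=4, n_out=3)
history = train(net)
print("first/last epoch error:", history[0], history[-1])
print({label: classify(net, GLYPHS[label]) for label in LABELS})
```

The per-epoch error in `history` plays the role of the stopping criterion: a real system would stop once it falls below a chosen tolerance rather than after a fixed number of epochs.
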
Each layer is fully connected to the next layer. Training starts with random weights, and the aim is to adjust them to reach minimal error; learning continues until the error is sufficiently reduced, i.e. until the ANN has learned the training data. The number of layers and the number of neurons per layer are important decisions when applying this architecture. The complexity of the relationship between the input data and the desired output determines the number of nodes in the hidden layer. The amount of training data also sets an upper bound on the number of hidden nodes: divide the number of input-output example pairs in the training set by the total number of input and output nodes in the network, then divide the result by a scaling factor between five and ten [10].

Pattern Recognition Part
In this part, the scanned image is imported and passed through pre-processing operations to the recognition operation, where it is compared with the trained images in order to classify each character. Pre-processing enhances and segments the characters to prepare them for recognition. Image binarization (thresholding) is used for edge/boundary detection [11].

Fig.2 Input/output data matrix generation scheme

IV. EXPERIMENTAL RESULTS
The first step is to generate the DataMatrix file that is used later in the recognition process. The program requests a font from the operating system and converts each character into an image, which serves as input data to the BP-NN. Optionally, noise can be added to the character images so that the ANN is trained on non-uniform characters, which improves recognition results at the cost of a longer training time. The second step is to initialize the FFML network and adjust its weights with BP. After training is completed, the weights are saved in the DataMatrix file. Fig.3 shows the BP-ANN front page.
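
The optional noise step mentioned above can be illustrated with a short sketch. The paper does not state which kind of noise is added, so salt-and-pepper noise (randomly flipped pixels) is assumed here; the function names `add_salt_and_pepper` and `augment`, the flip probability, and the sample glyph are hypothetical.

```python
import random


def add_salt_and_pepper(pixels, flip_prob, rng):
    """Flip each binary pixel (0 or 1) with probability flip_prob."""
    return [1 - p if rng.random() < flip_prob else p for p in pixels]


def augment(glyph, copies, flip_prob=0.05, seed=0):
    """Return `copies` noisy variants of one binary character image."""
    rng = random.Random(seed)
    return [add_salt_and_pepper(glyph, flip_prob, rng) for _ in range(copies)]


# Hypothetical 3x3 glyph; a real character image would be much larger.
glyph_i = [0, 1, 0,
           0, 1, 0,
           0, 1, 0]
noisy_variants = augment(glyph_i, copies=4)
for variant in noisy_variants:
    print(variant)
```

Each noisy variant enlarges the training set, which is consistent with the longer training time reported when the noise option is enabled.
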
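
The upper bound on the number of hidden nodes given in Section III can be worked through numerically. The sketch below assumes the rule reads as: hidden nodes <= (training pairs) / (input nodes + output nodes) / (scaling factor), with the scaling factor between five and ten. The function name and the sample figures (36,000 training pairs, 16x16 input images, 36 character classes) are hypothetical and are not taken from the paper.

```python
import math


def hidden_node_upper_bound(n_pairs, n_inputs, n_outputs, scaling=5.0):
    """Largest whole number of hidden nodes allowed by the rule of thumb."""
    if not 5.0 <= scaling <= 10.0:
        raise ValueError("scaling factor is expected to lie between 5 and 10")
    return math.floor(n_pairs / (n_inputs + n_outputs) / scaling)


# Hypothetical setup: 36 classes x 1000 variants, 16x16 pixel inputs.
n_pairs, n_inputs, n_outputs = 36_000, 16 * 16, 36
print(hidden_node_upper_bound(n_pairs, n_inputs, n_outputs, scaling=5))   # 24
print(hidden_node_upper_bound(n_pairs, n_inputs, n_outputs, scaling=10))  # 12
```

With these assumed figures, 36,000 / 292 is about 123.3, so the bound falls between 12 and 24 hidden nodes depending on the scaling factor.
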

Fig.3 BP-ANN training dialog

After the training part is completed, the recognition part starts. First, the document image is loaded either from an image file or through an optical device, and the trained weights are loaded for use in recognition. Fig.4 shows the two sample images used to compare results: one with a white background and one with gradient lighting.

Fig.4 Binarization of document images: (a) character image with white background, (b) its binarized image, (c) character image with gradient-lighting background, (d) its binarized image
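
The binarization step compared in Fig.4 can be sketched with a single global threshold. The paper does not give its threshold value or method, so the fixed cut-off of 128, the function name `binarize`, and the tiny sample images below are assumptions for illustration only.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0..255) to 1 for ink, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical dark "T" on a clean white background.
white_bg = [
    [255, 255, 255, 255, 255],
    [255,  20,  20,  20, 255],
    [255, 255,  20, 255, 255],
    [255, 255,  20, 255, 255],
    [255, 255, 255, 255, 255],
]
binary = binarize(white_bg)
for row in binary:
    print(row)

# Background only, darkening from left to right (no ink at all).
gradient_row = [240, 200, 160, 120, 90]
print(binarize([gradient_row]))  # [[0, 0, 0, 1, 1]]
```

On the white background the character is separated cleanly, while on the darkening background the last two pixels are marked as ink although they are only shadow, which is the kind of noisy result the gradient-lighting sample produces.
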

Fig.5 compares the two documents with different backgrounds: the first is a character image with a white background, and the second is a character image with a gradient-lighting background. The first image has no noisy background (Fig.6-b) because it has no gradient lighting, so it is recognized with good accuracy. The problem appears with coloured and gradient-lighting backgrounds, where the noisy background makes classification difficult, as in Fig.6-d. The recognition result for both is shown in Fig.7.

Fig.5 Recognition of characters for the character image with white background and the character image with gradient-lighting background

Fig.6 Applying pre-processing algorithms to the gradient-lighting background

A sample character image with a white background was also used in testing. As shown in Fig.7, its background has gradient lighting due to the scanning process, and it also contains mirrored characters; both add noise and unwanted features that cause many recognition errors.

Fig.7 Scanned character image: original scanned image and its binarization
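
One common way to cope with gradient lighting of this kind is a local (adaptive) threshold, where each pixel is compared with the mean of its neighbourhood instead of one global value. The sketch below is an assumption about how such a step might look, not the system's actual routine (which relied on OpenCV from C#); the window radius, the offset, and the synthetic test image are hypothetical.

```python
def adaptive_threshold(gray, radius=2, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean by `offset`."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Window is clipped at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


def make_gradient_image(rows=5, cols=10):
    """Background darkens left to right; two vertical strokes are 80 darker."""
    image = []
    for _ in range(rows):
        row = [240 - 15 * x for x in range(cols)]
        for stroke_col in (2, 7):
            row[stroke_col] -= 80
        image.append(row)
    return image


image = make_gradient_image()
local_result = adaptive_threshold(image)
global_row = [1 if px < 128 else 0 for px in image[0]]
print("global:", global_row)       # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print("local: ", local_result[0])  # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```

In this synthetic case the global threshold misses the stroke in the bright region and mislabels the dark background on the right, while the local threshold recovers both strokes and nothing else.
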

When the ANN is used to recognize this image directly, recognition fails to identify the actual characters and produces many errors, as shown in Fig.8. After pre-processing enhancement (smoothing, adaptive thresholding, and noise filtering), the noisy background is removed and only the characters are recognized, with good accuracy. Table 1 shows the recognition results for all the previously tested images, with and without pre-processing.

Fig.8 Recognition results: recognition by ANN without pre-processing and recognition by ANN with pre-processing

Table 1: Recognition results for trained document images

                                                 Number of    No pre-processing          With pre-processing
No.  Type of image                               characters   Detect  Failed  Eff. %     Detect  Failed  Eff. %
1    Computerized, clear white background              36         35       1   97.22         36       0  100
2    Computerized, gradient-lighting background        71         64       7   90.14         71       0  100
3    Scanned image with complex background            396        334     773   43.20        334      62   84.34

V. CONCLUSION
Using BP-ANN in OCR gives good results. However, OCR is affected by many factors, such as noise, brightness, and coloured backgrounds. Pre-processing of document images is therefore a necessary initial step in character recognition systems, to remove the effects of these factors. Each image requires different pre-processing techniques, depending on which factors degrade its quality.

ACKNOWLEDGEMENTS
Our thanks to the referee who read our paper and to the experts who contributed towards development of the template; our thanks also go to the Editorial Support Team, International Journal of Computer Science and Mobile Computing.

References
[1] Sandeep S., Nabarag P., Sayam K. D. and Sandip K., Optical Character Recognition using 40-point Feature Extraction and Artificial Neural Network, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 4, ISSN: 2277-128X, pp. 495-502, October 2012.
[2] Vivek S. and Navdeep S., Optical Character Recognition using Artificial Neural Networks, Signal & Image Processing: An International Journal (SIPIJ), Vol.3, No.5, October 2012.
[3] Rakesh B., Artificial Neural Network Based Optical Character Recognition, BLB-International Journal of Science & Technology, Vol.1, No.2, pp. 143-152, ISSN 0976-3074, 2010.
[4] Sameeksha B., Optical Character Recognition Using Artificial Neural Network, International Journal of Advanced Research in Computer Engineering & Technology, Vol.1, Issue 4, June 2012.
[5] Marinai S., Gori M. and Soda G., Artificial neural networks for document analysis and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.27, No.1, pp. 23-35, ISSN: 0162-8828, Jan. 2005.
[6] Ivan D., Machine Learning Methods for Optical Character Recognition, Chair of Computer Science, Department of Mathematics and Informatics, Novi Sad, Serbia, December 2005.
[7] Akinwonmi A. E., Adewale O. S., Alese B. K., and Adetunmbi O. S., Design of a Neural Network Based Optical Character Recognition System for Musical Notes, The Pacific Journal of Science and Technology, Vol.9, No.1, 2008.
[8] Raghuraj S., Yadav C. S., Prabhat V., and Vibhash Y., Optical Character Recognition for Printed Devnagari Script Using Artificial Neural Network, International Journal of Computer Science & Communication, Vol.1, No.1, pp. 91-95, 2010.
[9] Vijay L. S. and Babita K., Offline Handwritten Character Recognition Techniques using Neural Network: A Review, International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064, Vol.2, Issue 1, January 2013.
[10] Jon M
[11] Khorsheed O. K., Produce Low-Pass and High-Pass Image Filter In Java, International Journal of Advances in Engineering & Technology (IJAET), Vol.7, Issue 3, pp. 712-722, ISSN: 2231-1963, July 2014.
[12] Dike U. I. and Adoghe U. A., Back-Propagation Artificial Neural Network Techniques for Optical Character Recognition: A Survey, International Journal of Computers and Distributed Systems, Vol. No.3, Issue 2, ISSN: 2278-5183, Jun-July 2013.