Machine Learning Classification Algorithms to Recognize Chart Types in Portable Document Format (PDF) Files

International Journal of Computer Applications (0975-8887), Volume 39, No. 2, February 2012

V. Karthikeyan
Department of Computer Science, Government Arts College, Salem-8, Tamilnadu, India

S. Nagarajan
Department of Computer Science, K.S.R College of Arts and Science, Tiruchengode, Namakkal (Dist)-17, Tamil Nadu, India

ABSTRACT

Chart recognition from PDF files is a relatively young research field in which techniques and algorithms are proposed to identify the type of a chart and interpret it. This paper focuses on recognizing the type of a chart that is part of a PDF document, using texture features and classification algorithms. Eleven types of texture features and three classifiers, namely Multilayer Perceptron, Support Vector Machine and K Nearest Neighbour, are used. Performance analysis of the proposed chart type recognition systems shows that texture features have a promising future for chart type recognition and produce the best results with the KNN and SVM algorithms.

Keywords

Chart Classification, Texture Feature, Neural Network, Support Vector Machine, K Nearest Neighbour Classifier.

1. INTRODUCTION

Portable Document Format, commonly referred to as PDF, is an open standard for document exchange created by Adobe Systems in 1993. A typical PDF file encapsulates many objects, which contain text (in different fonts and sizes), graphics, tables, figures and other information needed to display the content of a document. Usage of PDF files offers two main advantages. The first is that the format preserves the layout and design of the document as determined by the author. The second is that it is entirely self-contained, that is, all information needed to display the file, such as the various fonts, is integrated inside the format itself. Moreover, a PDF file represents documents in a format that is independent of the application, operating system, software and hardware. These advantages have made PDF one of the most widely used formats, and it is now considered a universal document format.
Document analysis is a field of research which discovers knowledge from a scanned document image. Owing to the wide usage of PDF files by common people, researchers and industries, document analysis has also been extended to PDF files. As a consequence, the need for conversion tools that can extract text, tables, figures and graphs from a PDF file is also growing. This need has arisen because many devices, such as embedded devices, cannot handle PDF formats, and online users often have difficulty reading multi-column documents.

Several researchers have focused on text knowledge extraction (data mining) from PDF and image documents ([10], [13]). This field is termed text mining, and many organizations internationally have already realized its potential. The process of text mining extracts useful business knowledge from unstructured documents by first converting them into structured text and then applying data mining techniques like clustering and classification to derive valuable insights. The accuracy of these converters depends on the efficiency of the segmentation algorithms that separate the different objects in a PDF. On the other hand, only a few studies have been devoted to extracting images from PDF or image documents, as this is more complicated and challenging. The difficulty arises because graph objects consist of several small components whose features are similar to those of text [2].

Identification of graphs in a PDF file is composed of three steps. The first step is to locate the chart object, the second is to extract the graph object and the third is to identify the type of chart. The first two steps are dealt with in [11]. This paper focuses on the third step, that is, identifying the type of a graph located in a PDF file using machine learning classification algorithms. Graph or chart classification is an area of image processing where the primary goal is to assign each image in a set of chart images, according to its visual content, to one of a number of predefined categories.
Eight types of charts are considered, namely 2D and 3D bar charts, 2D and 3D pie charts, 2D and 3D doughnut charts, line charts and mixed charts. The present work analyzes the applicability of three classifiers, namely Multi Layer Perceptron (MLP), Support Vector Machine (SVM) and K Nearest Neighbour (KNN), for the recognition of the eight chart types. The visual content of the graph image is identified in a feature extraction step, where the texture features that best represent the graph image are extracted and stored as a feature vector. These feature vectors are then used to train and test the selected classifiers.

The rest of the paper is organized as follows. Section 2 provides a brief discussion of previous work in the related area, Section 3 presents the proposed methodology and Section 4 presents the results of experimentation. The study is concluded with future research ideas in Section 5.

2. PREVIOUS STUDIES

Chart recognition is an area of research that has gained attention only in the past few decades. The literature review showed that studies related to scientific chart recognition are minimal, even though the topic has been studied since as early as 1990. In 1992, mining of figure information from x-y data graphs and gene diagrams was proposed by [6]. Later, [19] presented a schema-based model that extracts bar charts using horizontal and vertical layout projection and relationship information. Zhou and Tan [20] analyzed the usage of the Hough transform with a Hidden Markov Model for recognizing bar charts in document images. Other segmentation techniques like the Hough transformation [3], curvature estimation [15] and vector-based techniques [5] are also used for line graph recognition. It is a well-known fact that the Hough transformation is computationally expensive and does not work well with all types of charts. To solve this problem, a raster-to-vector conversion algorithm was used to identify three types of charts, namely 2D bar chart, 2D pie chart and 2D line chart [18].

Futrelle et al. [7] and [16] proposed a scheme for recognizing and classifying vector format graphics in PDF documents using techniques like spatial analysis, and classified graphs into five categories, namely line, bar, curve, tree and other charts. Another method is based on pattern discovery algorithms that find frequently appearing local structures ([12], [9]), and these structures are used as features. The pattern-discovery-based method has the advantage that it can make use of unlabelled data. Yet another approach is to use kernel methods such as Support Vector Machines (SVMs).

The literature study showed that existing algorithms have two main drawbacks. First, most of the methods are designed for a specific chart type only. Moreover, the existing techniques assume the availability of predefined structural models and constraints for all types of charts. To solve these problems, classifiers that use texture features are used to recognize the chart type. The methodology used is discussed in the following section.

3. METHODOLOGY

A chart classification system involves two main tasks: feature extraction (which extracts image features and forms feature vectors) and classification (which uses the extracted features to discriminate between the classes). Both of these processes take place after locating the chart image in the PDF document. The feature extraction task identifies a set of texture features from the located graph images.
It is a well-known fact that when small portions of a bigger unit are processed independently, texture features provide a better description of the selected region [8]. Texture captures the spatial variations in the intensities of an image that form certain repeated patterns. These features are extracted for all chart images in a PDF database. The three classifiers analyzed in this paper are MLP, SVM and KNN, which are used to perform a multi-class classification of chart images. Each of these steps is explained below.

3.1. Features Extracted

The GLCM (Gray Level Co-Occurrence Matrix) features were used as texture features in this study. The selected features are area, median, minimum and maximum intensity, contrast, homogeneity, energy, entropy, mean, variance, standard deviation and correlation. A brief explanation of each of these features is given in this section.

Since its invention, the GLCM has played a vital role in many texture-based image analysis applications ([14], [1]). The GLCM approach uses a co-occurrence matrix to extract texture features of an image using statistical equations. A co-occurrence matrix is a matrix or distribution that is calculated from the distribution of co-occurring values of an image at a given offset. Features generated using this technique are usually called Haralick features, named after their originator.

The area of an image in square pixels is calculated by multiplying the number of rows by the number of columns of the image. The minimum intensity, maximum intensity and median values are calculated by considering all the pixels in the image. Equation (1) is used to calculate the contrast of an image.

$$ \text{Contrast} = \sum_{i,j=0}^{N-1} P_{i,j}\,(i-j)^2 \tag{1} $$

Here P_{i,j} is the entry of the normalized co-occurrence matrix for gray levels i and j, and N is the number of gray levels. In this equation three conditions arise. The first is when i and j are equal, indicating that the entry lies on the diagonal of the matrix and the neighbouring pixels are similar; here (i-j) = 0 and the weight is 0. The second is when i and j differ by 1. This indicates a small contrast difference between the pixels, and a weight of 1 is used. The third is when the difference between i and j is 2. This indicates that the contrast is increasing, in which case the weight is assigned the value 4.
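As a rough illustration of the co-occurrence matrix and of the contrast in Equation (1), the following C sketch may help. It is not the authors' implementation: the function names `glcm_horizontal` and `glcm_contrast`, the flat row-by-row array layout, the single horizontal offset (each pixel paired with its right-hand neighbour), the non-symmetric counting and the assumption that the image is already quantized to gray levels 0..levels-1 are all choices made here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill P (levels x levels, stored row by row) with the normalized
 * co-occurrence matrix of img for the horizontal offset "one pixel to
 * the right". img holds rows x cols gray levels in the range 0..levels-1.
 * Assumption of this sketch: one offset only, pairs counted in one direction. */
void glcm_horizontal(const unsigned char *img, size_t rows, size_t cols,
                     size_t levels, double *P)
{
    size_t pairs = 0;

    for (size_t k = 0; k < levels * levels; k++)
        P[k] = 0.0;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c + 1 < cols; c++) {
            size_t i = img[r * cols + c];      /* reference pixel */
            size_t j = img[r * cols + c + 1];  /* right-hand neighbour */
            assert(i < levels && j < levels);
            P[i * levels + j] += 1.0;
            pairs++;
        }
    }

    if (pairs == 0)
        return;  /* image too narrow to contain a pixel pair */

    /* Turn counts into probabilities so that all entries sum to 1. */
    for (size_t k = 0; k < levels * levels; k++)
        P[k] /= (double)pairs;
}

/* Contrast as in Equation (1): each entry is weighted by (i - j)^2. */
double glcm_contrast(const double *P, size_t levels)
{
    double contrast = 0.0;

    for (size_t i = 0; i < levels; i++) {
        for (size_t j = 0; j < levels; j++) {
            double diff = (double)i - (double)j;
            contrast += P[i * levels + j] * diff * diff;
        }
    }
    return contrast;
}
```

For a 2 x 3 image with rows (0, 0, 2) and (1, 2, 2) and three gray levels, the four horizontal pairs (0,0), (0,2), (1,2) and (2,2) each receive probability 0.25, so the contrast is 0.25·0 + 0.25·4 + 0.25·1 + 0.25·0 = 1.25. The pair with a gray-level difference of 2 dominates the result because of its weight of 4.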
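In Equation (1) the weights therefore continue to grow with the square of (i-j) as the difference increases. The Homogeneity feature is calculated as

$$ \text{Homogeneity} = \sum_{i,j=0}^{N-1} \frac{P_{i,j}}{1+(i-j)^2} \tag{2} $$

When the contrast in an image window is low, the texture is best characterized by the measure called Homogeneity. The energy of an image is calculated as described below. To calculate energy (also called uniformity), first the Angular Second Moment (ASM) is calculated. Both ASM and Energy use each P_{i,j} as a weight for itself.

$$ \text{ASM} = \sum_{i,j=0}^{N-1} P_{i,j}^{2} \tag{3} $$

Energy is then calculated as the square root of the ASM (Equation 4), and the entropy is calculated using the formula given in Equation 5.

$$ \text{Energy} = \sqrt{\text{ASM}} \tag{4} $$

$$ \text{Entropy} = \sum_{i,j=0}^{N-1} P_{i,j}\,(-\ln P_{i,j}) \tag{5} $$

The GLCM mean, variance and standard deviation for the horizontal and vertical directions are calculated as below.

$$ \mu_i = \sum_{i,j=0}^{N-1} i\,P_{i,j}, \qquad \mu_j = \sum_{i,j=0}^{N-1} j\,P_{i,j} \tag{6} $$

$$ \sigma_i^{2} = \sum_{i,j=0}^{N-1} P_{i,j}\,(i-\mu_i)^2, \qquad \sigma_j^{2} = \sum_{i,j=0}^{N-1} P_{i,j}\,(j-\mu_j)^2 \tag{7} $$

$$ \sigma_i = \sqrt{\sigma_i^{2}}, \qquad \sigma_j = \sqrt{\sigma_j^{2}} \tag{8} $$

The Correlation feature is calculated using Equation (9).

$$ \text{Correlation} = \sum_{i,j=0}^{N-1} P_{i,j}\,\frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^{2}\,\sigma_j^{2}}} \tag{9} $$

The features thus extracted are stored in a 2-dimensional matrix data structure having 13 columns and n rows, where n is the number of images in the dataset. The first 12 columns are used to store the features, while the last one indicates the target (label) of the chart type. The structure used is given below: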

    struct FeatureVector {
        float feature1;
        float feature2;
        float feature3;
        float feature4;
        float feature5;
        float feature6;
        float feature7;
        float feature8;
        float feature9;
        float feature10;
        float feature11;
        float feature12;
        int target;    /* label of the chart type */
    };

3.2. Classifiers

As mentioned earlier, three classifiers are used to perform a multi-class classification during the chart recognition process. The working of the three classifiers is discussed in this section.

3.2.1 SVM Classification

SVM is a classification algorithm based on optimization theory, initially developed by [4]. Here an object is viewed as an n-dimensional vector, and such objects are separated with an (n-1)-dimensional hyperplane. This is called a linear classifier. Many hyperplanes can classify the data, and the emphasis is on finding the hyperplane with the maximum margin between the two data sets (Figure 1). The figure shows three hyperplanes in 2-dimensional space. H3 does not separate the two classes; H1 does, with a small margin, and H2 does with the maximum margin.

[Figure 1: Example of SVM]

3.2.2 MLP Classification

The MLP neural network has a feedforward architecture with an input layer, a hidden layer and an output layer. A Multi-Layer Perceptron (MLP) with a back propagation learning algorithm is chosen for the proposed system because of its simplicity, robustness and high computation rates. It is assumed that the training dataset consists of l pairs (x_i, y_i), where x_i is a vector containing the pattern and y_i is the class of the corresponding pattern. In our case, an 8-class task, y_i can be coded 1 to 8 (for identifying eight different charts) [17]. The MLP model consists of an input layer that accepts the input neurons used in the classification, hidden layers and an output layer. Each neuron j in the hidden layer sums its inputs x_i, each multiplied by the connection weight w_{ij}, and gives the output y_j as an activation function of the sum, that is

$$ y_j = f\Big(\sum_i w_{ij}\,x_i\Big) \tag{10} $$

where f is the sigmoid or hyperbolic tangent transfer function. Using the back propagation training algorithm, the weights are adjusted to minimize the squared differences between the actual and desired output values in the output neurons, given by

$$ E = \frac{1}{2}\sum_j (d_j - y_j)^2 \tag{11} $$

where y_j is the actual output of neuron j and d_j is the desired output of neuron j.

3.2.3 KNN Classification

The K-Nearest Neighbour (KNN) machine learning algorithm is among the most frequently used algorithms in many applications. This algorithm uses distance measures during classification and assigns a data object to the category that is closest to the data being examined. When K is 1, the KNN algorithm works like the nearest neighbour algorithm. In the general scenario, the Euclidean distance measure is used to calculate the distance between two data points, as given in Equation (12).

$$ d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{12} $$

where d is the distance, n is the number of dimensions and p_i (or q_i) is the coordinate of p (or q) in dimension i.

3.3. Chart Classification System

The schematic block diagram of the scientific chart image recognition system consists of various stages, as shown in Figure 2.

[Figure 2: Proposed Chart Classification Model. Blocks: PDF Document Database, Input PDF, Locate Chart Image, Extract Features, Create Feature Vector, Training Set, Learning Algorithm, Learning Model, Classification Result]

The proposed chart classification system thus considers the use of the three machine learning algorithms to classify the charts into eight types. The input data for a classification task is a set of 11 texture features arranged in row-wise fashion (records). Each record, otherwise termed an instance or example, is described by a tuple (X, y), where X is the attribute set and y is a special attribute designated as the class label (also known as the category or target attribute). The classification step is then defined as the task of learning a target function f that maps each attribute set X to one of the predefined class labels y. The target function is also known informally as a classification model. The classifier then uses a systematic approach to build the classification model from an input data set using a learning algorithm.
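To make the mapping from an attribute set X to a class label y concrete, the following C sketch shows a K-nearest-neighbour vote over stored feature vectors. It is only an illustration under stated assumptions, not the system evaluated in the paper: the names `squared_distance` and `knn_classify`, the flat row-by-row array layout, the bound `MAX_K` and the tie-breaking rule (the lowest label wins) are hypothetical choices. The sketch also ranks neighbours by squared Euclidean distance, which gives the same ordering as the Euclidean distance while avoiding the square root.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_K 15        /* assumed upper bound on K for this sketch */
#define NUM_CLASSES 8   /* eight chart types, labelled 1..8 */

/* Squared Euclidean distance between two feature vectors of length dim.
 * The square root is omitted: it does not change which rows are nearest. */
static double squared_distance(const float *p, const float *q, size_t dim)
{
    double sum = 0.0;
    for (size_t i = 0; i < dim; i++) {
        double diff = (double)p[i] - (double)q[i];
        sum += diff * diff;
    }
    return sum;
}

/* Classify query by majority vote among its k nearest training rows.
 * train holds n rows of dim features each, stored row by row;
 * labels[r] is the class (1..NUM_CLASSES) of row r.
 * Ties between classes are resolved in favour of the lowest label. */
int knn_classify(const float *train, const int *labels, size_t n, size_t dim,
                 const float *query, size_t k)
{
    double best_dist[MAX_K];
    int best_label[MAX_K];
    size_t found = 0;  /* number of neighbours currently kept (at most k) */

    assert(k >= 1 && k <= MAX_K && k <= n);

    for (size_t r = 0; r < n; r++) {
        double d = squared_distance(train + r * dim, query, dim);

        /* Find the insertion position in the list sorted by distance. */
        size_t pos = found;
        while (pos > 0 && best_dist[pos - 1] > d)
            pos--;
        if (pos >= k)
            continue;  /* not closer than any of the k kept neighbours */

        /* Shift farther neighbours right, dropping the last if full. */
        size_t last = (found < k) ? found : k - 1;
        for (size_t m = last; m > pos; m--) {
            best_dist[m] = best_dist[m - 1];
            best_label[m] = best_label[m - 1];
        }
        best_dist[pos] = d;
        best_label[pos] = labels[r];
        if (found < k)
            found++;
    }

    /* Count votes per class and return the most frequent label. */
    int votes[NUM_CLASSES + 1] = {0};
    for (size_t m = 0; m < found; m++) {
        assert(best_label[m] >= 1 && best_label[m] <= NUM_CLASSES);
        votes[best_label[m]]++;
    }
    int winner = 1;
    for (int c = 2; c <= NUM_CLASSES; c++)
        if (votes[c] > votes[winner])
            winner = c;
    return winner;
}
```

With K set to 1 the function reduces to the plain nearest neighbour rule. In practice the features would usually be brought to comparable scales first, since a feature with a large numeric range (such as area) would otherwise dominate the distance; that preprocessing step is an addition suggested here, not something the paper describes.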
The main goal of the learning algorithm is to identify a model that best captures the relationship between the feature sets and the class categories of the input data. Satisfying this goal provides two advantages. The first is that it makes sure that the input data and the learning algorithm fit each other in an efficient manner, and the second is that it improves prediction performance when new records are supplied. The classifier is trained using a data set (training set) that consists of records with the target category provided. The test dataset consists of records with no knowledge of the target category. The classifier uses the trained knowledge to perform the classification.

4. EXPERIMENTAL RESULTS

Experiments were carried out with a dataset of 155 images belonging to all eight kinds of charts (Table 1). All the images are 256*256 RGB color images. Experiments were conducted using a Pentium IV dual processor with 512 MB RAM. Zhou and Tan [21] used a feed forward back-propagation neural network for chart type recognition. This model, referred to as the Zhou Model, used a model-based matching algorithm for chart recognition. The performance of the classifiers proposed in this paper is compared with the Zhou Model.

Table 1: Details on Dataset

    Chart Type      No. of Charts    Chart Type      No. of Charts
    2D Bar chart    40               Doughnut 2D     7
    3D Bar chart    16               Doughnut 3D     11
    2D Pie chart    13               Line            35
    3D Pie chart    20               Mixed chart     13

The performance of the system is analyzed based on error rate, classification accuracy and speed of classification. During the experiments a 10-fold cross-validation method is used, and the average results are taken as the final outcome. As a preprocessing step, all the image features were calculated prior to classification and converted to a feature vector, which was given as input to the classifiers. The formula for calculating the error rate is given below.

$$ \text{Error Rate} = \frac{\text{No. of incorrect predictions}}{\text{Training Size}} \times 100 $$

The accuracy of the classifiers is calculated as 1 - Error Rate. An effective classifier should reduce the error rate while increasing the accuracy. The time taken by the classifiers to classify an input chart image into any one of the selected eight chart types is taken as the speed of the classifier.

4.1. Error Rate

Table 2 shows the error rates obtained by the selected classifiers using the 11 derived texture features.

Table 2: Error Rate

    Classifier    Error Rate (%)
    MLP           0.30
    K-NN          0.22
    SVM           0.23
    Zhou          0.19

One of the primary aims of automatic chart recognition systems is to achieve low error rates. In this regard, the results show that among the proposed classifiers the K-NN classifier produces the lowest error rate, followed by SVM and then MLP. In terms of the efficiency gain obtained with respect to error rate, the KNN classifier achieved a gain of 26.67% over MLP, while the gain was 4.35% when comparing KNN and SVM.

4.2. Classification Accuracy

The next performance metric used to evaluate the proposed classification models is accuracy. Figure 3 shows the results obtained by all the proposed chart classifier systems.

[Figure 3: Classification Accuracy. Bar chart of accuracy (%) for the MLP, K-NN, SVM and Zhou classifiers]

The results with regard to classification accuracy again show that the KNN classifier produces better classification results than MLP and SVM when using texture features. The accuracy obtained by the KNN classifier (78.06%) is higher than that of MLP (69.68%) and SVM (76.77%), which makes it the most attractive of the three for use in a chart recognition system.

4.3. Classification Speed

The classification time of a model is calculated as the sum of the training and testing time. The results obtained with respect to classification time are shown in Table 3.

Table 3: Classification Speed (Seconds)

    Classifier    Time Taken
    MLP           8.38
    K-NN          0.26
    SVM           0.31
    Zhou          0.5

In line with the previous results, the execution time of the KNN classifier based system is lower than that of MLP and SVM. Moreover, the experimental results further indicate that the MLP, KNN and SVM algorithms showed significant improvement when compared with the base model (Zhou Model). Thus, from the various results, it can be understood that KNN classifiers using texture features produce the best PDF chart classification results.

5. CONCLUSION

Research on chart recognition is a relatively young field, and this paper analyzes the use of texture features with three frequently used classifiers.
While all three classifiers produce high accuracy and a low error rate, the performance of the KNN classifier shows the most promising results. In future, more features with respect to shape and text are to be considered, and methods for ensemble classification in chart classification are also to be probed.

6. REFERENCES

[1] Caylak E. (2010) The studies about phonological deficit theory in children with developmental dyslexia: Review. Am. J. Neurosci. Vol. 1 Pp. 1-12.
[2] Chowdhury S.P., Mandal S., Das A.K. and Chanda B. (2007) Segmentation of Text and Graphics from Document Images. Ninth International Conference on Document Analysis and Recognition, ICDAR 2007, Pp. 619-623.
[3] Conker R.S. (1988) Dual Plane Variation of the Hough Transform for Detecting Non-Concentric Circles of Different Radii. CVGIP Vol. 43 Pp. 115-132.
[4] Cortes C. and Vapnik V. (1995) Support Vector Networks. Machine Learning Vol. 20 Pp. 273-297.
[5] Dori D. (1995) Vector-Based Arc Segmentation in the Machine Drawing Understanding System Environment. IEEE Transactions on PAMI Vol. 17 No. 11 Pp. 1057-1068.
[6] Futrelle R.P., Kakadiaris I.A., Alexander J., Carriero C.M., Nikolakis N. and Futrelle J.M. (1992) Understanding diagrams in technical documents. IEEE Computer Vol. 25 Issue 7 Pp. 75-78.
[7] Futrelle R.P., Shao M., Cieslik C. and Grimes A.E. (2003) Extraction, layout analysis and classification of diagrams in PDF documents. Intl. Conf. Document Analysis & Recognition, Edinburgh, Scotland, Pp. 1007-1014.
[8] Haralick R.M., Shanmugam K. and Dinstein I. (1973) Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics Vol. SMC-3 No. 6 Pp. 610-621.
[9] Inokuchi A., Washio T. and Motoda H. (2000) An Apriori-based algorithm for mining frequent substructures from graph data. Proceedings of the 4th PKDD, Pp. 13-23.
[10] Islam R., Saha R.S. and Hossain A.R. (2009) Automatic Reading from Bangla PDF Document Using Rule Based Concatenative Synthesis. International Conference on Signal Processing Systems, IEEE Computer Society, Pp. 51-55.
[11] Karthikeyan V. and Nagarajan S. (2011) Scientific Chart Image Property Identification using Connected Component Labeling in PDF document. 3rd International Conference on Electronics Computer Technology, Kanyakumari, India, Vol. 4 Pp. 09-1.
[12] Kramer S. and Raedt L.D. (2001) Feature construction with version spaces for biochemical application. Proceedings of the 18th ICML Conference.
[13] Martinez-Alvarez R.P., Costas-Rodriguez S., Gonzalez-Castaño F.J. and Gil-Castiñeira F. (2010) Automated Document Conversion System for Simple Multimedia Platforms. 7th IEEE Consumer Communications and Networking Conference (CCNC), Pp. 1-2.
[14] Omaima N.A. (2010) Improving the performance of backpropagation neural network algorithm for image compression/decompression system. J. Comput. Sci. Vol. 6 Pp. 1347-1354.
[15] Rosin P.L. and West G.A. (1989) Segmentation of Edges into Lines and Arcs. Image and Vision Computing Vol. 7 No. 2 Pp. 109-114.
[16] Shao M. and Futrelle R.P. (2006) Recognition and Classification of Figures in PDF Documents. W. Liu and J. Lladós (Eds.): Selected papers from Workshop on Graphics Recognition GREC 2005, LNCS 3926, Springer, Pp. 231-242.
[17] Smach F., Atri M., Miteran J. and Abid M. (2005) Design of a Neural Networks Classifier for Face Detection. World Academy of Science, Engineering and Technology Vol. 11 Pp. 13-17.
[18] Song J., Su F., Chen J., Tai C.L. and Cai S. (2000) Line net global vectorization: an algorithm and its performance analysis. IEEE Conference on Computer Vision and Pattern Recognition, South Carolina, Pp. 383-388.
[19] Yokokura N. and Watanabe T. (1997) Layout-Based Approach for extracting constructive elements of bar-charts. GREC'97, Pp. 163-174.
[20] Zhou Y. and Tan C.L. (2001a) Hough-based Model for Recognizing Bar Charts in Document Images. SPIE conference on Document image and retrieval, Vol. 4307 Pp. 333-340.
[21] Zhou Y. and Tan C.L. (2001b) Learning-based scientific chart recognition. 4th International Workshop on Graphics Recognition GREC2001, Pp. 482-492.