STRETCH : A System for Document Storage and Retrieval by Content
E. Appiani, L. Boato, S. Bruzzo, A.M. Colla, M. Davite and D. Sciarra
RES Department, Elsag spa, Via G. Puccini, Genova (Italy)
{enrico.appiani,luisa.boato,sandra.bruzzo,annamaria.colla,marco.davite,donatella.sciarra}@elsag.it

Abstract

This paper describes a system for storing and retrieving imaged multimedia documents by content. The system is being developed within the Esprit project STRETCH (STorage and RETrieval by Content of imaged documents). The core of the STRETCH system is a powerful Archiving and Retrieval Engine, based on a structured document representation and capable of activating appropriate methods to characterise and automatically index heterogeneous documents with variable layout, and subsequently to retrieve them in answer to complex queries. The resulting document base, or Docu-base, relies on an object-oriented internal representation and on the related characterisation and search methods. A prototype was implemented and successfully tested, in particular in the creation of an invoice archive.

1. Introduction

STRETCH (STorage and RETrieval by Content of imaged documents), ESPRIT Project n , aims at developing a system for storing and retrieving imaged multimedia documents based on their content. STRETCH addresses the Archive Reference System (ARS) market, which covers heterogeneous applications involving mass documentary databases. New technology developments, together with the growing need to improve information diffusion and communication efficiency for enterprises, specialised user communities and the public, are driving an ever increasing demand for tools that automatically convert information held on paper into digital information (the "zero-paper" option). The objective of STRETCH is twofold.
First, STRETCH aims at combining direct digitalisation, mostly based on the location of information fields and OCR, with image indexing open to multimedia, by applying advanced techniques derived from Image Analysis and Pattern Recognition. STRETCH aims at developing a common Archiving and Retrieval shell based on a structured document representation and capable of activating appropriate functions to characterise multimedia documents and subsequently retrieve them on user demand. To make such a system effective, the bottleneck of document profiling must be avoided, in particular by overcoming the existing limitations of pre-defined indexing schemes.

Second, STRETCH must overcome the main limitations of current ARS systems, offering in particular ease of use and programming, and the ability to adapt dynamically to generic multimedia documents.

STRETCH's goal of realizing a document archive according to the user's view requires the integration of innovative modeling techniques with well-established automatic indexing techniques to enable content-based retrieval. The core technology employed is document processing, in terms of image enhancement, layout analysis, field location, logo location and recognition, tag identification, and Intelligent Character Recognition (ICR). The structure of each document is derived, the document is classified according to the user specification, and the information content is extracted from the relevant fields. A document representation suited to this complex processing has been designed accordingly, along with the corresponding database representation.

In this setting, STRETCH may be regarded as a document meta-engine [1] that introduces document logic into existing databases focused on document fields. The document logic, based on the STRETCH Document Internal Representation (DIR), is employed for document analysis and classification, information extraction, indexing and retrieval.
STRETCH is being tested in three different environments that exemplify the application fields addressed: an accounts payable archive (invoices and related documents: bills of entry, transport documents, and so on), a document archive for the Public Administration (circular letters, statistical reports, and so on), and a medical image archive (myocardial SPECT maps, thorax radiographs).

STRETCH started on December 1st, At the time of writing, STRETCH has successfully completed the user requirement capture phase and has consolidated the technical specifications of the data model and of the overall system architecture. The detailed system design and implementation, based on an object-oriented (OO) approach and incremental prototype production, have been progressively revised and refined during the development stage, following an iterative assessment of the level of user satisfaction. A demonstrative system, endowed with most of the planned functionality, has been produced.

This paper schematically describes the STRETCH architecture, functionality and achievements to date. In the following we briefly present the system architecture and components (Section 2), the data model (Section 3), and some preliminary experiments in the invoice domain (Section 4). Finally, Section 5 presents some conclusions.

2. Architecture and main components

From an architectural point of view, the project relies on a client/server solution, networked through corporate Intranets and (possibly) the Internet. The main achievement is the development of a powerful Archiving and Retrieval Engine (ARE), based on a general document representation and capable of activating appropriate methods to characterise and retrieve imaged documents. The user interface is a portable thin client. The STRETCH architecture consists of:

Client layer: a portable, user-friendly and intuitive Graphical User Interface (GUI);
Server layer: a scalable Archiving and Retrieval Engine (ARE);
Database layer: a structured document database, or Docu-base.

Figure 1 summarizes the main functional components in the Client, Server and Docu-base layers. A standard Corba middleware links all the components in the Client and Server layers, ensuring interoperability among heterogeneous platforms. This choice makes it possible to plug in any Corba-compliant commercial component directly, while a Corba wrapper may be implemented for components that are not Corba compliant.
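The wrapping idea can be illustrated independently of any particular middleware. The following is a minimal sketch in plain Python, not Corba code: `RetrievalService`, `LegacyTextEngine` and `LegacyTextEngineWrapper` are hypothetical names introduced only for illustration, and the abstract class merely stands in for an interface that compliant components would publish.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RetrievalService(ABC):
    """Hypothetical common interface that every server component exposes."""

    @abstractmethod
    def search(self, query: str) -> List[str]:
        """Return the identifiers of the documents matching the query."""


class LegacyTextEngine:
    """Stand-in for an external component with its own, non-compliant API."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = docs

    def find_documents(self, words: List[str]) -> List[str]:
        # Keep the documents whose text contains every requested word.
        return [
            doc_id
            for doc_id, text in self._docs.items()
            if all(word in text.lower() for word in words)
        ]


class LegacyTextEngineWrapper(RetrievalService):
    """Wrapper that adapts the legacy API to the common interface."""

    def __init__(self, engine: LegacyTextEngine) -> None:
        self._engine = engine

    def search(self, query: str) -> List[str]:
        # Translate the common call into the call the legacy engine expects.
        return sorted(self._engine.find_documents(query.lower().split()))
```

A client written against `RetrievalService` can then use the wrapped engine without knowing its native API, which is the property the middleware wrapping is meant to provide.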
The STRETCH Client, also called Docu-client, consists of the STRETCH GUI with related tools. Any other External Application client (for instance, the GUI of an external application extended with STRETCH access) can employ the STRETCH server components through their Corba interfaces. The GUI provides all the ARE services in a friendly way through extensive use of windows, pop-up menus, buttons, icons and thumbnails, as well as context-sensitive menus and help windows. In particular, the GUI is used to manage and monitor user sessions and profiles; to allow the user to define new applications and document classes (Maintenance and Definition Tool); to acquire new documents directly from a scanner or other acquisition sources; to visualize archived or retrieved documents; and to query the archive.

The STRETCH Server, also called Docu-server, is centered on the STRETCH Archiving and Retrieval Engine (ARE), which uses the Document Internal Representation (DIR). Because most of the functions to be supported have high performance requirements and must scale, the ARE can rely on distributed configurations including, when necessary, parallel machines. The ARE may also interoperate with specialized External Archiving/Retrieval Engines and External Applications, integrated as Corba components.

The STRETCH Database, also called Docu-base, includes the main DBMS storing the DIR instances. These may contain references to External Databases, if required by application constraints, or to external specialized Archiving/Retrieval Engines. For example, the Docu-base can archive the document description while the external databases store the document images or specific fields.

The STRETCH Maintenance and Definition Tool (MDT) consists of both Client and Server components, in charge of application definition, user profile management and system configuration.

STRETCH is an open system, thanks to the standard middleware and the interoperability with external components and applications.
Examples of supporting or external components are a text retrieval engine or an ERP system.

The basic implementation choices are: Java for Client objects and for top-level Server session objects; C++ for Server internal service objects (DIR Objects) and the ARE Manager, chosen in particular for code efficiency; and an OODB schema based on the DIR and User Profile classes, which are the permanent objects in the system. DIR methods relevant for retrieval are defined both in the server objects and in the OODB.

3. The data model

Any document can be described with respect to two different aspects: the physical one and the logical one. The physical structure of a document, also called layout structure, is the collection of objects obtained by repeatedly partitioning the document content into increasingly smaller parts (down to basic objects) on the basis of the layout appearance. An object of the layout structure is also called a physical object. Similarly, the logical structure of a document is the collection of objects obtained by repeatedly dividing the document content into increasingly smaller parts on the basis of the human-perceptible meaning of the content. An object of the logical structure is called a logical object.
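The two views defined above can be pictured as two trees over the same page. The sketch below is only an illustration of that idea and is not the DIR class design: the class names, the attributes (`box`, `kind`, `name`, `value`, `region`) and the helper `basic_objects` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class PhysicalObject:
    """A node of the layout structure: a rectangular region of the page."""
    box: Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels
    kind: str = "block"             # e.g. "page", "text line", "image"
    children: List["PhysicalObject"] = field(default_factory=list)


@dataclass
class LogicalObject:
    """A node of the logical structure: a unit of meaning in the document."""
    name: str
    value: Optional[str] = None               # extracted content, if any
    region: Optional[PhysicalObject] = None   # where it appears on the page
    children: List["LogicalObject"] = field(default_factory=list)


def basic_objects(obj: Union[PhysicalObject, LogicalObject]) -> list:
    """Return the leaves of either structure, i.e. its smallest parts."""
    if not obj.children:
        return [obj]
    return [leaf for child in obj.children for leaf in basic_objects(child)]
```

Linking a logical object to a physical one through an attribute such as `region` is one simple way to keep the two descriptions of the same document connected.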
A domain of documents can be defined as a group of documents that can be clustered with respect to their subject or use, according to the user's view: for example, journals, tax forms, business letters, invoices and check forms can be regarded as different domains. Since the documents of a domain share the same main subject and are used for similar or related functions, they are characterized by some logical and physical similarities. Some logical objects are common to all the documents of the domain. For example, in an invoice the logical object Total, which contains the amount due to the issuer, is always present. Similarly, a logical object that occurs in different types of documents of the same domain usually keeps some physical features related to its position within the document. Documents belonging to a given domain can be further characterized by different layout or logical structures. Thus documents that share physical or contextual similarities can be clustered into classes.

The internal data structure used to represent documents, the so-called Document Internal Representation (DIR), is defined in terms of class objects, with their relevant information and the methods to process it. The DIR objects describe a document from both the physical and the logical viewpoint. For most of the applications considered by STRETCH, the DIR data structure is based on Modified X-Y trees (see the subsection on the Modified X-Y tree). The DIR represents the core system data, since the Archiving and Retrieval Engine uses it during archiving (DIR generation) and retrieval (access and feature matching).

The domain representation is based on a similar approach: domain objects are template structures and fields whose methods are the strategies to process new documents and extract information from them. To implement domain knowledge relevant to specific documents and types of physical structures, we make use of a template structure named correlation graph (see the subsection on the correlation graph).
The correlation graph makes it possible to implement information extraction strategies based on advanced image recognition and reading technologies.

Modified X-Y tree

The Modified X-Y tree (M-X-Y tree) [2] is derived from the X-Y tree [3,4], a well-known data-driven method for page layout analysis. The M-X-Y tree is well suited to the physical representation of documents with complex layout. The basic assumption behind this approach is that structured elements of the page (columns, paragraphs, titles, figures, lines of text, printed symbols) are generally laid out in rectangular blocks, which can almost always be divided into groups in such a way that blocks that are adjacent to one another within a group have one dimension in common [3].

The method uses thresholded projection profiles to split the document into successively smaller rectangular blocks [4]. A projection profile is the histogram of the number of black pixels along parallel lines through the document; depending on the direction of the lines, it is a horizontal or a vertical projection profile. A thresholded projection profile is obtained by comparing the values of a projection profile with a given threshold. The blocks are split by alternately making horizontal and vertical cuts, either along white spaces, found by using the thresholded projection profile, or along horizontal or vertical ruler lines.

The result of this segmentation can be represented as a tree in which the root stands for the whole page, the leaves stand for blocks of the page, and successive levels alternately represent the results of horizontal (X-cut) and vertical (Y-cut) segmentation. To keep the data representation consistent, the ruler lines, although used as separators, are also stored as leaves in the M-X-Y tree. The tree structure is enriched with descriptions of the relationships among leaves.
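The recursive splitting along white spaces described above can be sketched as follows. This is a minimal illustration of the classic X-Y cut on a binary image, not the M-X-Y tree of [2]: it does not cut along ruler lines or store them as leaves, and the names `Node`, `projection_profile`, `runs_above_threshold`, `xy_cut` and `leaves`, as well as the `threshold` and `min_gap` parameters, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class Node:
    box: Box
    cut: str = "leaf"  # "horizontal", "vertical" or "leaf"
    children: List["Node"] = field(default_factory=list)


def projection_profile(img: List[List[int]], box: Box, axis: str) -> List[int]:
    """Count black pixels (value 1) per row ("rows") or per column ("cols")."""
    top, left, bottom, right = box
    if axis == "rows":
        return [sum(img[r][left:right]) for r in range(top, bottom)]
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]


def runs_above_threshold(profile: List[int], threshold: int, min_gap: int):
    """Find index runs where profile > threshold; gaps under min_gap are bridged."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # white gap too narrow to be a cut
        else:
            merged.append((s, e))
    return merged


def xy_cut(img, box: Optional[Box] = None, horizontal: bool = True,
           threshold: int = 0, min_gap: int = 2, tried_other: bool = False) -> Node:
    """Split a block by alternating horizontal and vertical cuts along white space."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box
    profile = projection_profile(img, box, "rows" if horizontal else "cols")
    runs = runs_above_threshold(profile, threshold, min_gap)
    if len(runs) <= 1:
        if tried_other:
            return Node(box)  # no cut possible in either direction: a leaf
        return xy_cut(img, box, not horizontal, threshold, min_gap, True)
    node = Node(box, "horizontal" if horizontal else "vertical")
    for s, e in runs:
        child = (top + s, left, top + e, right) if horizontal \
            else (top, left + s, bottom, left + e)
        node.children.append(xy_cut(img, child, not horizontal, threshold, min_gap))
    return node


def leaves(node: Node) -> List[Box]:
    """Collect the boxes of the leaf blocks in reading order of the cuts."""
    if not node.children:
        return [node.box]
    return [box for child in node.children for box in leaves(child)]
```

Raising `threshold` makes the split tolerant of speckle noise in the white gaps, while `min_gap` prevents narrow gaps, such as the spacing between characters, from being treated as block separators.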
Adjacency links among leaves of the tree can be seen as an adjacency graph, where the nodes of the graph correspond to the leaves of the tree. An adjacency graph [5] describes the structure of a document by giving the position of the nearest objects in the horizontal and vertical directions (above, below, left, right relations).

Correlation graph

The correlation graph is a template structure used to implement domain knowledge, possibly extracted automatically from document samples [6]. This representation is suited to variable-layout documents with some spatial structure, or to documents whose semantics can be recognized by looking for textual tags in the image. It can be applied either to a full document or to a part of it. The correlation graph describes a document understanding strategy that uses both predefined template elements (which implement, for each field type, field reading inside a search area in the image) and search area computations, in which the area for a field to be read is derived from the position of other fields already found. Meaningful fields are of three main types: (i) fields to be read as ASCII strings by the recognition strategy; (ii) textual or geometric tags used by the recognition strategy to understand the document structure; and (iii) image fields that can be recognized by suitable methods (e.g. logos).

4. The invoice application

4.1. Passive invoice management

In the scenario of passive invoice management, STRETCH aims at providing, on the one hand, data entry automation for VAT recording purposes, interfacing an ERP system, and, on the other hand, the invoice acquisition, archiving and retrieval capabilities that make the electronic copy immediately available to all authorized users.

In the STRETCH environment, new invoices can be grouped into batches to be scanned. The acquisition process produces the electronic copy of the invoices in a suitable format, which can vary from binary up to colour images depending on user requirements. New invoices can be input automatically to the information extraction procedure, which mainly consists of document classification and automatic ICR reading. The extracted data are used for indexing and as input to the VAT recording procedure, so they must be error free; a supervision phase before archiving and VAT registration is therefore advisable. The electronic copy of each invoice is then archived in the Docu-base with the previously extracted archiving indexes. The ICR recognition results also provide automatic data entry to the VAT registration procedure, which is usually part of the ERP system. The retrieval function is reserved for authorized users and makes all archived documents available for immediate consultation, with the advantage of eliminating the circulation of paper copies. Content-based retrieval makes it possible to find invoices from any partial information.

The prototype is centered on the INFORMATION EXTRACTION (document classification and reading) and ARCHIVING processes. The SUPERVISION procedure simply consists of presenting the invoice image together with the recognized fields. The user can correct any information and then confirm the data, which can no longer be modified after archiving. The ARCHIVING process directly demonstrates how the recognition results map into the STRETCH internal knowledge representation structure, at the moment stored in a relational database. The RETRIEVAL functions are based on the presentation of a form for Query Definition.
SQL queries are allowed on the values of known fields, with standard AND-OR expressions.

Information extraction

Three document processing steps are activated in order to extract information for indexing and for the VAT registration procedure (see Figure 2). First, the M-X-Y tree generation produces the M-X-Y tree representation of the invoice (see the subsection on the Modified X-Y tree). Then the classification procedure based on the M-X-Y tree produces the document classification, i.e. the supplier identification. Last, the reading strategy [6] for that supplier is applied; it is based on ICR techniques including field finding, neural character reading [7], tag finding and logo recognition [8].

The ICR reader locates and reads the information written on invoices issued by a given supplier. If the supplier identification fails, a general reading strategy can be applied. The ICR result is a set of text strings used as indexes by the archiving procedure and a set of data used as input by the VAT recording procedure. Each information field is assigned a basic type that defines how its value is interpreted during retrieval. For example, a query can search for date fields older than a given threshold, or for string fields similar to a given word.

The reading strategy internally employs a set of tags that have a significant spatial relation with the information fields, such as Date, Invoice Number, Total and VAT. A set of the most relevant fields from the user requirements was selected for the prototype.
This set consists of:

Supplier (string): the supplier name inherited from the MXY-based classifier, used both as an archiving index and for VAT registration; the supplier logo is located as accessory information;
Date (date): date of issue, used both as an index and for VAT registration;
Invoice number (string): used as an index and for VAT registration;
Total (integer): the total amount of the invoice, used for VAT registration;
IVA (integer): the total amount of Italian VAT.

Preliminary experimental results

The test set for the demo system consisted of 250 real passive invoices of a company of the Finmeccanica Group. They show different layouts, various styles and many different fonts and font sizes. All the invoices show a company logo, usually in one-to-one correspondence with the supplier, but the invoices issued by one supplier have neither a fixed layout nor a unique standard writing style. All the documents in the test set consist of a single page. The acquisition produced binary (black and white) images at a resolution of 300 x 300 DPI. No specific filtering or enhancement was applied to the images. The performance of the information extraction stage was as follows:

the M-X-Y tree-based classification achieved 97.8% correct classification in top position;
fields were correctly located in 98.4% of cases;
automatic reading of the field values produced a total of 31 misclassification errors (96.9% correct on fields, 100% on tags);
problems were mainly encountered with very noisy images, dot matrix and italic fonts.

5. Conclusions

This paper has presented a short architectural and functional description of the STRETCH system, along with the current achievements. The demonstrative system implemented for automated invoice processing has been briefly described and some experimental results presented. The system relies on an open three-tiered architecture, with the capability to interoperate with external applications, engines and databases. This openness takes into account the fact that advanced technology for document processing is already available. What is expected from STRETCH is to provide the document logic viewpoint on top of archives of text or images that are either flat or explicitly indexed. The openness of STRETCH, together with the adopted standard middleware and object-oriented approach, will make it possible to integrate future technology innovations.

Acknowledgements

We would like to acknowledge the contributions of the whole STRETCH work team, in particular P. Penna (AET, Genova), E. Francesconi and S. Marinai (DSI University of Firenze), and M. Diligenti (DI University of Siena).

6. References

[1] M. Beigi et al., MetaSEEK: A Content-Based Meta-Search Engine for Images, SPIE Proceedings on Storage and Retrieval for Image and Video Databases, vol. 3312, Jan
[2] F. Cesarini, M. Gori, S. Marinai, G. Soda, Structured document segmentation and representation by the modified X-Y tree, Proc. ICDAR 99 (to appear)
[3] G. Nagy and S. Seth, Hierarchical representation of optically scanned documents, in Proc. of the International Conference on Pattern Recognition, pp ,
[4] G. Nagy and M. Viswanathan, Dual representation of segmented technical documents, in Proc. First Int'l Conf. Document Anal. Recog., pp ,
[5] J. Yuan, Y. Y. Tang, and C. Y. Suen, Four directional adjacency graphs (FDAG) and their application in locating fields in forms, in Proc. Third Int'l Conf. Document Anal. Recog., (Montreal, Canada), pp ,
[6] L. Boato, E. Cattani, M. Davite, B. Villa, Automatic Programming of Variable Layout Image Documents Reading Applications based on Minimum Description Length Induction, AI*IA Workshop on Automatic Learning and Natural Language, Turin, Italy, Dec
[7] A.M. Colla, P. Pedrazzi, Single and Coupled Neural Handprinted Character Classifiers, in M. Marinaro and P.G. Morasso (Eds.), ICANN 94 Proc. Intl. Conf. on Artificial Neural Networks, Sorrento, Italy, May , vol. II, pp , Springer-Verlag (1994).
[8] M. Corvi, E. Ottaviani, Multiresolution logo recognition, Proc. Int. Workshop on Visual Form, Capri,

Figure 1. Main layers with functional modules and data. [Diagram labels: Docu-client; Docu-server; Docu-base; Acquisition; GUI; Maintenance & Definition Tool (C); Maintenance & Definition Tool (S); Enhancement; Segmentation; Layout Analysis; Archiving / Retrieval Engine; Content-based Image Search; Content-based Document Analysis; Information Retrieval; OCR / ICR; Image Analysis; Document Internal Representation; DBMS; Database Instances]

Figure 2. The recognition process for the invoice application. [Diagram labels: Image; MXY Generation; MXY Tree; MXY-based Classifier; Supplier Name; Document Class; Reading Strategy; Information from Fields]
More informationSIPAC. Signals and Data Identification, Processing, Analysis, and Classification
SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is
More informationCourse Syllabus For Operations Management. Management Information Systems
For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third
More informationWeb. Studio. Visual Studio. iseries. Studio. The universal development platform applied to corporate strategy. Adelia. www.hardis.
Web Studio Visual Studio iseries Studio The universal development platform applied to corporate strategy Adelia www.hardis.com The choice of a CASE tool does not only depend on the quality of the offer
More informationSOFT FLOW 2012 PRODUCT OVERVIEW
SOFT FLOW 2012 PRODUCT OVERVIEW Copyright 2010-2012 Soft Click 1 About Soft Flow Platform Welcome to Soft Flow, the most flexible and easiest to use document management and business process management
More informationDATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
More informationIFS-8000 V2.0 INFORMATION FUSION SYSTEM
IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence
More informationNAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju
NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised
More informationifinder ENTERPRISE SEARCH
DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality
More informationChapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives
Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved
More informationSoftware Life-Cycle Management
Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema
More informationM3039 MPEG 97/ January 1998
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationThe Role of Size Normalization on the Recognition Rate of Handwritten Numerals
The Role of Size Normalization on the Recognition Rate of Handwritten Numerals Chun Lei He, Ping Zhang, Jianxiong Dong, Ching Y. Suen, Tien D. Bui Centre for Pattern Recognition and Machine Intelligence,
More informationFast and Easy Delivery of Data Mining Insights to Reporting Systems
Fast and Easy Delivery of Data Mining Insights to Reporting Systems Ruben Pulido, Christoph Sieb rpulido@de.ibm.com, christoph.sieb@de.ibm.com Abstract: During the last decade data mining and predictive
More informationCourse 103402 MIS. Foundations of Business Intelligence
Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:
More informationCommon Questions and Concerns About Documentum at NEF
LES/NEF 220 W Broadway Suite B Hobbs, NM 88240 Documentum FAQ Common Questions and Concerns About Documentum at NEF Introduction...2 What is Documentum?...2 How does Documentum work?...2 How do I access
More informationCode Generation for Mobile Terminals Remote Accessing to the Database Based on Object Relational Mapping
, pp.35-44 http://dx.doi.org/10.14257/ijdta.2013.6.5.04 Code Generation for Mobile Terminals Remote Accessing to the Database Based on Object Relational Mapping Wen Hu and Yan li Zhao School of Computer
More informationProc. of the 3rd Intl. Conf. on Document Analysis and Recognition, Montreal, Canada, August 1995. 1
Proc. of the 3rd Intl. Conf. on Document Analysis and Recognition, Montreal, Canada, August 1995. 1 A Map Acquisition, Storage, Indexing, and Retrieval System Hanan Samet Aya Soer Computer Science Department
More informationOracle8i Spatial: Experiences with Extensible Databases
Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction
More informationData Analytics and Reporting in Toll Management and Supervision System Case study Bosnia and Herzegovina
Data Analytics and Reporting in Toll Management and Supervision System Case study Bosnia and Herzegovina Gordana Radivojević 1, Gorana Šormaz 2, Pavle Kostić 3, Bratislav Lazić 4, Aleksandar Šenborn 5,
More informationFile Magic 5 Series. The power to share information PRODUCT OVERVIEW. Revised November 2004
File Magic 5 Series The power to share information PRODUCT OVERVIEW Revised November 2004 Copyrights, Legal Notices, Trademarks and Servicemarks Copyright 2004 Westbrook Technologies Incorporated. All
More informationManaging Large Imagery Databases via the Web
'Photogrammetric Week 01' D. Fritsch & R. Spiller, Eds. Wichmann Verlag, Heidelberg 2001. Meyer 309 Managing Large Imagery Databases via the Web UWE MEYER, Dortmund ABSTRACT The terramapserver system is
More informationMaster s Program in Information Systems
The University of Jordan King Abdullah II School for Information Technology Department of Information Systems Master s Program in Information Systems 2006/2007 Study Plan Master Degree in Information Systems
More informationSERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) SERVICE-ORIENTED SOFTWARE ARCHITECTURE MODEL LANGUAGE SPECIFICATIONS
SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) VERSION 2.1 SERVICE-ORIENTED SOFTWARE ARCHITECTURE MODEL LANGUAGE SPECIFICATIONS 1 TABLE OF CONTENTS INTRODUCTION... 3 About The Service-Oriented Modeling Framework
More informationHOW TO DO A SMART DATA PROJECT
April 2014 Smart Data Strategies HOW TO DO A SMART DATA PROJECT Guideline www.altiliagroup.com Summary ALTILIA s approach to Smart Data PROJECTS 3 1. BUSINESS USE CASE DEFINITION 4 2. PROJECT PLANNING
More informationFoundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management
More informationIntelligent Agents Serving Based On The Society Information
Intelligent Agents Serving Based On The Society Information Sanem SARIEL Istanbul Technical University, Computer Engineering Department, Istanbul, TURKEY sariel@cs.itu.edu.tr B. Tevfik AKGUN Yildiz Technical
More informationCHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL
CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter
More informationAHUDesigner. The Air Handling Units selection software. Product description
AHUDesigner The Air Handling Units selection software Product description Table of contents INTRODUCTION... 4 AHU SELECTION SOFTWARE FUNCTIONAL SPECIFICATIONS... 5 Definition of unit configuration... 5
More informationThe Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)
The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit
More informationImplementation of OCR Based on Template Matching and Integrating it in Android Application
International Journal of Computer Sciences and EngineeringOpen Access Technical Paper Volume-04, Issue-02 E-ISSN: 2347-2693 Implementation of OCR Based on Template Matching and Integrating it in Android
More informationInternational Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,
More informationMODEL OF SOFTWARE AGENT FOR NETWORK SECURITY ANALYSIS
MODEL OF SOFTWARE AGENT FOR NETWORK SECURITY ANALYSIS Hristo Emilov Froloshki Department of telecommunications, Technical University of Sofia, 8 Kliment Ohridski st., 000, phone: +359 2 965 234, e-mail:
More informationCONDIS. IT Service Management and CMDB
CONDIS IT Service and CMDB 2/17 Table of contents 1. Executive Summary... 3 2. ITIL Overview... 4 2.1 How CONDIS supports ITIL processes... 5 2.1.1 Incident... 5 2.1.2 Problem... 5 2.1.3 Configuration...
More informationPROCESSING & MANAGEMENT OF INBOUND TRANSACTIONAL CONTENT
PROCESSING & MANAGEMENT OF INBOUND TRANSACTIONAL CONTENT IN THE GLOBAL ENTERPRISE A BancTec White Paper SUMMARY Reducing the cost of processing transactions, while meeting clients expectations, protecting
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationOperations Research and Knowledge Modeling in Data Mining
Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp
More information01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.
(International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models
More informationCHAPTER 6: TECHNOLOGY
Chapter 6: Technology CHAPTER 6: TECHNOLOGY Objectives Introduction The objectives are: Review the system architecture of Microsoft Dynamics AX 2012. Describe the options for making development changes
More informationCONFIOUS * : Managing the Electronic Submission and Reviewing Process of Scientific Conferences
CONFIOUS * : Managing the Electronic Submission and Reviewing Process of Scientific Conferences Manos Papagelis 1, 2, Dimitris Plexousakis 1, 2 and Panagiotis N. Nikolaou 2 1 Institute of Computer Science,
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationDatabase Optimizing Services
Database Systems Journal vol. I, no. 2/2010 55 Database Optimizing Services Adrian GHENCEA 1, Immo GIEGER 2 1 University Titu Maiorescu Bucharest, Romania 2 Bodenstedt-Wilhelmschule Peine, Deutschland
More informationVISUALIZATION APPROACH FOR SOFTWARE PROJECTS
Canadian Journal of Pure and Applied Sciences Vol. 9, No. 2, pp. 3431-3439, June 2015 Online ISSN: 1920-3853; Print ISSN: 1715-9997 Available online at www.cjpas.net VISUALIZATION APPROACH FOR SOFTWARE
More informationANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR
ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR By: Dmitri Ilkaev, Stephen Pearson Abstract: In this paper we analyze the concept of grid programming as it applies to
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationExtracting Business. Value From CAD. Model Data. Transformation. Sreeram Bhaskara The Boeing Company. Sridhar Natarajan Tata Consultancy Services Ltd.
Extracting Business Value From CAD Model Data Transformation Sreeram Bhaskara The Boeing Company Sridhar Natarajan Tata Consultancy Services Ltd. GPDIS_2014.ppt 1 Contents Data in CAD Models Data Structures
More informationHow To Develop Software
Software Engineering Prof. N.L. Sarda Computer Science & Engineering Indian Institute of Technology, Bombay Lecture-4 Overview of Phases (Part - II) We studied the problem definition phase, with which
More informationMasters in Information Technology
Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101
More informationBig Data: Rethinking Text Visualization
Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationOLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
More informationModern Databases. Database Systems Lecture 18 Natasha Alechina
Modern Databases Database Systems Lecture 18 Natasha Alechina In This Lecture Distributed DBs Web-based DBs Object Oriented DBs Semistructured Data and XML Multimedia DBs For more information Connolly
More informationRequirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao
Requirements Analysis Concepts & Principles Instructor: Dr. Jerry Gao Requirements Analysis Concepts and Principles - Requirements Analysis - Communication Techniques - Initiating the Process - Facilitated
More informationQuality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationChapter 11 Mining Databases on the Web
Chapter 11 Mining bases on the Web INTRODUCTION While Chapters 9 and 10 provided an overview of Web data mining, this chapter discusses aspects of mining the databases on the Web. Essentially, we use the
More informationB.Sc (Computer Science) Database Management Systems UNIT-V
1 B.Sc (Computer Science) Database Management Systems UNIT-V Business Intelligence? Business intelligence is a term used to describe a comprehensive cohesive and integrated set of tools and process used
More informationWEB APPLICATION FOR TIMETABLE PLANNING IN THE HIGHER TECHNICAL COLLEGE OF INDUSTRIAL AND TELECOMMUNICATIONS ENGINEERING
WEB APPLICATION FOR TIMETABLE PLANNING IN THE HIGHER TECHNICAL COLLEGE OF INDUSTRIAL AND TELE ENGINEERING Dra. Marta E. Zorrilla Pantaleón Dpto. Applied Mathematics and Computer Science Avda. Los Castros
More informationProf. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/
Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence 2014 http://www.iet.unipi.it/p.ducange/esercitazionibi/ Email: p.ducange@iet.unipi.it Office: Dipartimento di Ingegneria
More informationLost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
More informationMicrosoft Office 2010: Access 2010, Excel 2010, Lync 2010 learning assets
Microsoft Office 2010: Access 2010, Excel 2010, Lync 2010 learning assets Simply type the id# in the search mechanism of ACS Skills Online to access the learning assets outlined below. Titles Microsoft
More information