Combining Data Integration and Information Extraction Techniques


Dean Williams
School of Computer Science and Information Systems, Birkbeck College, University of London

Abstract

We describe a class of applications built on databases that comprise some structured data and some free text. Conventional database management systems have proved ineffective for these applications, and the applications are rarely well suited to current text and data mining techniques. We argue that combining Information Extraction and Data Integration techniques is a promising direction for research, and we outline how our ESTEST system demonstrates this approach.

1. Introduction

A class of applications exists that can be characterised by the way they combine data conforming to a schema with related free text. We describe this application class in Section 2. Our approach is to combine Data Integration (DI) and Information Extraction (IE) techniques to better exploit the text data; Section 3 summarises related areas of research and shows how our method relates to them. Section 4 explains why we believe text is used in these applications and, as a result, why we believe combining DI and IE techniques will benefit them. Section 5 describes our system, Experimental Software to Extract Structure from Text (ESTEST), and shows how we plan to realise our goals. Finally, we give our conclusions and plans for future work in Section 6.

2. Partially Structured Data

In [1] King and Poulovassilis define a distinct category of data: partially structured data (PSD). Many database applications rely on storing significant amounts of data as free text. Recent developments in database technology have improved the facilities available for storing large amounts of text. However, the provision for making use of this text largely relies on searching it for keywords.
A class of applications exists where the information to be stored consists partly of structured data conforming to a schema, with the remainder left as free text. We consider this data to be partially structured. PSD is distinct from semistructured data, which is generally taken to mean data that is self-describing: there may be no schema defined, but the data itself contains some structural information, e.g. XML tags.
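To make the definition concrete, the following is a minimal sketch of a partially structured record and of the keyword search that conventional systems offer over its text part. The class name, field names and sample reports are invented for illustration and are not taken from any real system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentReport:
    """A partially structured record: schema-conforming fields plus free text."""
    officer: str            # structured: conforms to the schema
    occurred_at: datetime   # structured
    location: str           # structured
    narrative: str          # unstructured: free text outside the schema


def keyword_search(reports, keyword):
    """Case-insensitive substring match over the free-text part only."""
    needle = keyword.lower()
    return [r for r in reports if needle in r.narrative.lower()]


# Invented sample data, for illustration only.
reports = [
    IncidentReport("PC Smith", datetime(2004, 3, 1, 14, 30), "High Street",
                   "A red saloon car was seen leaving the scene at speed."),
    IncidentReport("PC Jones", datetime(2004, 3, 2, 9, 15), "Mill Lane",
                   "Witness reports a crimson vehicle parked outside all night."),
]

matches = keyword_search(reports, "red")
print([r.officer for r in matches])
```

Only the first report matches: the second describes a similar vehicle in different words, which is the kind of information a keyword search over the text part cannot reach.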

An example of an application based on PSD is operational intelligence gathering, which is used in serious crime investigations. The data collected in this application area takes the form of a report containing some structured data, such as the name of the Police Officer making the report, the time and location of the incident, and details of subjects and locations mentioned in the report. This is combined with the actual report of the sighting or information received, which is captured as text. A number of other text-based applications exist in the crime domain, e.g. for witness statements and scene of crime reports. Other application domains we are familiar with that have partially structured data include Road Traffic Accident reports, where standard-format statistics are combined with free text accounts in a formalised subset of English. In Bioinformatics, structured databases such as SWISS-PROT [2] include comment fields that contain related unstructured information.

A common theme of many of these applications, including crime and SWISS-PROT, is a requirement for expert users to annotate the text, trying to use standard terms to assist with queries, reduce duplication and highlight important facts. This is often a time-consuming, demanding task with results less effective than would be desired, and applications to assist with this work are being developed both as academic research projects, e.g. [3], and as commercial software, e.g. [4].

3. Related Areas

A number of active areas of research deal with text in databases, and we use the following definitions to establish how our approach relates to these.

Data Integration: Providing a single schema over a collection of data sources that facilitates queries across the sources [5].

Information Extraction: Finding pre-defined entities from text and using the extracted data to fill slots in a template using shallow NLP techniques [6].
Data Mining / Knowledge Discovery in Databases: Finding patterns in structured data, discovering new deep knowledge embedded in data.

Text Mining: Application of data mining to text (often some NLP process creates a structured dataset from the text, and this is then used for data mining [7]).

Graph Based Data Models: Current industry standard databases are essentially record based (e.g. the relational model or some form of object data model), where the schema must be determined in advance of populating the database. Graph-based data models offer finer semantic granularity and greater flexibility [8].

We are not proposing the use of a text mining technique that finds patterns in very large collections of text, such as that of Nahm and Mooney [9], who combine IE with Text Mining. For many of the PSD applications we have described this is unlikely to be effective, as there are no very large static datasets to be mined (although there are some exceptions, e.g. SWISS-PROT); rather, over time new query requirements arise and extensions to the schema are required.

We propose an evolutionary system where the user iterates through the steps as new information sources and new query requirements arise. First, an initial integrated

schema is built from a variety of sources including structured data schemas, domain ontologies and natural language ontologies. Then information extraction rules are semi-automatically generated from this schema to be used as input for the IE processor. The data extracted from the text is added to the integrated schema and is available to answer queries. The schema may then be extended by new data sources being added or new schema elements being identified, and the process repeats. Figure 1 shows how the user will use the ESTEST system in this evolutionary manner. Because of the evolutionary approach, we suggest a graphical workbench will be required for end users of ESTEST, and we intend to consider the requirements of such a workbench.

[Figure 1: flow diagram. Labels: Integrate Datasources; Create Data to assist the IE process; Information Extraction (IE); Integrate Results of IE; Enhance Schema; Query Global Schema; Global Schema; Extracted Data. Legend: Control Flow, Data Flow.]

Fig. 1. Evolutionary Use of the ESTEST System

4. Combining Data Integration and Information Extraction

We believe that the data collected in the form of free text is important to PSD applications and is not stored as text because it is of secondary value. Rather, there are two main reasons for storing data as text in PSD applications:

It is not possible to know in advance all of the queries that will be required in the future. The text captured represents an intuitive attempt by the user to provide all information that could possibly be relevant. The Road Traffic Accident reports are

a good example of this. The schema of the structured part of the data covers all currently known requirements in a format known as STATS20 [10], and the text part is used when new reporting requirements arise.

Data is captured as text because of the difficulty of dynamically building a schema in a conventional DBMS, where simply adding a column to an existing table can be a major task in production systems. For example, in systems storing witness statements in crime reports, entities and relationships are mentioned for the first time; it is not possible to expand the underlying data schema dynamically, so the new information is stored only in its text form.

Furthermore, the real-world entities and relationships described in the text are related to the entities in the structured part of the data. An application combining IE and Data Integration will provide advantages in these applications for a number of reasons:

Information Extraction is based on the idea of filling pre-defined templates, and Data Integration can provide a global schema to be used as a template.

Combining the schema of the structured data with ontologies and other metadata sources can create the global schema / template.

Metadata from the data sources can be used to assist the IE process by semi-automatically creating the required input to the IE modules.

Data Integration systems that use a low-level graph-based common data model (e.g. AutoMed [11]) are able to extend schemas as new entities become known without the overhead associated with a conventional DBMS, as they are not based on record-based structures such as tables in relational databases.

The templates filled by the IE process will provide a new data source to be added to the global schema, supporting new queries which could not previously be answered.

5. The ESTEST System

Our ESTEST system makes use of the AutoMed heterogeneous data integration system being developed at Birkbeck and Imperial Colleges [12].
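Two of the points made in Section 4, that a graph-based schema can be extended as new entities become known and that the schema can serve as an IE template, can be illustrated with a toy sketch. The `GraphSchema` class, its methods and the road-accident names below are hypothetical and are not AutoMed's data model or API; the sketch only assumes a schema made of nodes and labelled edges.

```python
class GraphSchema:
    """Toy graph-based schema: named nodes plus labelled edges between them."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()  # each edge is (label, source node, target node)

    def add_node(self, name):
        self.nodes.add(name)

    def add_edge(self, label, source, target):
        # Edges may only connect nodes that are already in the schema.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both ends of an edge must be existing nodes")
        self.edges.add((label, source, target))

    def template_for(self, concept):
        """Slots of an IE template for a concept: its outgoing edge labels, sorted."""
        return sorted(label for (label, source, _) in self.edges
                      if source == concept)


schema = GraphSchema()
for node in ("accident", "vehicle", "date"):
    schema.add_node(node)
schema.add_edge("involves", "accident", "vehicle")
schema.add_edge("occurred_on", "accident", "date")
slots_before = schema.template_for("accident")

# A new kind of entity is mentioned in the text: extend the schema in place,
# with no table to alter and no existing data to migrate.
schema.add_node("weather")
schema.add_edge("weather_conditions", "accident", "weather")
slots_after = schema.template_for("accident")

print(slots_before)
print(slots_after)
```

In this sketch the template for a concept is derived from the schema each time it is requested, so extending the schema extends the template without further work.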
In data integration systems, several data sources, each with an associated local schema, are integrated to form a single virtual database with an associated global schema. If the data sources conform to different data models, then these need to be transformed into a common data model as part of the integration process. The AutoMed system uses a low-level graph-based data model, the HDM, as its common data model; this is suitable for incrementally extending a global schema as new requirements arise. We have developed an AutoMed HDM data store [13] to store instance data and intermediate results for ESTEST. AutoMed implements bi-directional schema transformation pathways to transform and integrate heterogeneous schemas [11], which is a flexible approach amenable to including new domain knowledge dynamically.

In summary, the ESTEST system works as follows. The data sources are first identified and integrated into a single global schema. In AutoMed, each data model that can be integrated is defined in terms of the HDM. Each construct in the external data model has an associated set of HDM nodes and edges. In the ESTEST system, some features of data models are required to be preserved across all the integrated data sources. These features include an IS-A concept hierarchy, support for attributes, identification of the text data to be mined, and the ability to attach word forms to concepts. To facilitate the

automatic creation of the global schema, all the data sources used by ESTEST will be transformed to an ESTEST data model. Each construct in the external model also has a set of transformations to map onto the ESTEST data model. Once all the data sources have been transformed to this standard representation and mappings between schema elements obtained, it will be possible to integrate the schemas.

ESTEST then takes the metadata in the global schema and uses this to suggest input into the IE process. The user confirms, corrects and appends to this configuration data, and the IE process is run. We make use of the GATE [14] IE architecture to build the ESTEST IE processor. As well as reusing standard IE components such as Named Entity gazetteers, sentence splitters and pattern matching grammars (with configured inputs semi-automatically created by ESTEST), a number of new IE components are being developed:

TemplateFromSchema: Takes an ESTEST global schema and creates templates to be filled by the IE engine, and creates input to the standard IE components.

NE-DB: Named Entity recognition in IE is typically driven by flat file lists. The NE-DB component will associate a query on the global schema with an annotation type. A list of word forms will be materialised in the HDM store for use when the IE process is running (GATE NE gazetteers generate Finite State Machines for possible transitions of tokens).

WordForm: For a given concept, gets relevant word forms from the WordNet natural language ontology. It will be possible to generate more words by increasing the number of traversals allowed through the WordNet hierarchy, ordered by an approximation of semantic distance.

The templates filled by the IE process will then be used to add to the extent of the concept in the global schema.
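The bounded traversal described for the WordForm component can be sketched as a breadth-first search. The sketch below makes two loud assumptions: a small hand-written hierarchy (`TOY_LINKS`) stands in for WordNet, and the number of links traversed is used as the approximation of semantic distance, which the text does not specify. It is not the ESTEST implementation.

```python
from collections import deque

# Hypothetical toy hierarchy standing in for WordNet; links are listed in
# both directions so the traversal can move up and down the hierarchy.
TOY_LINKS = {
    "car": ["vehicle", "saloon"],
    "vehicle": ["car", "lorry", "conveyance"],
    "saloon": ["car"],
    "lorry": ["vehicle"],
    "conveyance": ["vehicle"],
}


def word_forms(concept, max_traversals):
    """Return (word, distance) pairs reachable within max_traversals links.

    Distance is the number of links traversed from the concept, used here as
    an assumed stand-in for semantic distance. Results are ordered nearest
    first, then alphabetically.
    """
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        word = queue.popleft()
        if distance[word] == max_traversals:
            continue  # traversal budget used up on this branch
        for neighbour in TOY_LINKS.get(word, []):
            if neighbour not in distance:
                distance[neighbour] = distance[word] + 1
                queue.append(neighbour)
    return sorted(distance.items(), key=lambda item: (item[1], item[0]))


print(word_forms("car", 1))
print(word_forms("car", 2))
```

Raising `max_traversals` from 1 to 2 adds the more distant words to the end of the list, so a user could cut the list off at whatever distance gives acceptable precision.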
Extracted annotations that match objects in the global schema will be put in the HDM store. The global query facilities of AutoMed are then available to the user, who can query the global schema using the IQL query language [15, 16]. For more detailed information on the design of the ESTEST system we refer the reader to [17], and for an example of its operation in the Road Traffic Accident domain to [18].

Recent work within the Tristarp group [19] has resulted in advanced visualisation tools for graph-based databases becoming available [20] that may be of assistance in the proposed user workbench. This research interest is also reflected in recent products developed in industry: the Sentences [21] DBMS from Lazysoft is based on a quadruple store and sets out to challenge the dominance of the relational model.

6. Conclusions and Future Work

We have discussed how a class of applications based on partially structured data is not adequately supported by current database and data mining techniques. We have stated why we believe combining Information Extraction and Data Integration techniques is a promising direction for research. We are currently completing an initial implementation of the ESTEST system, which we will test in the Road Traffic Accident reporting and Crime Investigation domains.

ESTEST extends the facilities offered by data integration systems by moving towards handling text, and extends IE systems by attempting to use schema information to semi-automatically configure the IE process.

References

1. P.J.H. King and A. Poulovassilis. Enhancing database technology to better manage and exploit partially structured data. Technical report, Birkbeck College, University of London.
2. A. Bairoch, B. Boeckmann, S. Ferro, and E. Gasteiger. Swiss-Prot: Juggling between evolution and stability. Brief. Bioinform., 5:39-55.
3. SOCIS: Scene of Crime Information System.
4. QUENZA.
5. Alon Y. Halevy. Data integration: A status report. In Gerhard Weikum, Harald Schöning, and Erhard Rahm, editors, BTW, volume 26 of LNI. GI.
6. D. Appelt. An introduction to information extraction. Artificial Intelligence Communications.
7. A.H. Tan. Text mining: The state of the art and the challenges. Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases, pages 65-70.
8. William Kent. Limitations of record-based information models. ACM Transactions on Database Systems, 4(1), March.
9. U.Y. Nahm and R. Mooney. Using information extraction to aid the discovery of prediction rules from text. Proceedings of the KDD-2000 Workshop on Text Mining, pages 51-58.
10. UK Government Department for Transport. Instructions for the completion of road accident report form stats19. transstats/documents/page/dft transstats pdf.
11. P.J. McBrien and A. Poulovassilis. Data integration by bi-directional schema transformation rules. In Proc. ICDE 03.
12. AutoMed Project.
13. D. Williams. The AutoMed HDM data store. Technical report, AutoMed Project.
14. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. Experience with a language engineering architecture: Three years of GATE. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL 02).
15. A. Poulovassilis. The AutoMed Intermediate Query Language. Technical report, AutoMed Project.
16. E. Jasper. Global query processing in the AutoMed heterogeneous database environment. In Proc. BNCOD02, LNCS 2405, pages 46-49.
17. D. Williams and A. Poulovassilis. Combining data integration with natural language technology for the semantic web. In Proc. Workshop on Human Language Technology for the Semantic Web and Web Services, at ISWC 03, page TBC.
18. Dean Williams and Alexandra Poulovassilis. An example of the ESTEST approach to combining unstructured text and structured data. In DEXA Workshops. IEEE Computer Society.
19. Tristarp Project.
20. M.N. Smith and P.J.H. King. Database support for exploring criminal networks. Intelligence and Security Informatics: First NSF/NIJ Symposium.
21. Lazysoft (maker of Sentences).


More information

Deriving Business Intelligence from Unstructured Data

Deriving Business Intelligence from Unstructured Data International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 9 (2013), pp. 971-976 International Research Publications House http://www. irphouse.com /ijict.htm Deriving

More information

A Framework for the Delivery of Personalized Adaptive Content

A Framework for the Delivery of Personalized Adaptive Content A Framework for the Delivery of Personalized Adaptive Content Colm Howlin CCKF Limited Dublin, Ireland colm.howlin@cckf-it.com Danny Lynch CCKF Limited Dublin, Ireland colm.howlin@cckf-it.com Abstract

More information

Software Architecture Document

Software Architecture Document Software Architecture Document Natural Language Processing Cell Version 1.0 Natural Language Processing Cell Software Architecture Document Version 1.0 1 1. Table of Contents 1. Table of Contents... 2

More information

Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description)

Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description) Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description) David Aumueller, Erhard Rahm University of Leipzig {david, rahm}@informatik.uni-leipzig.de

More information

Visionet IT Modernization Empowering Change

Visionet IT Modernization Empowering Change Visionet IT Modernization A Visionet Systems White Paper September 2009 Visionet Systems Inc. 3 Cedar Brook Dr. Cranbury, NJ 08512 Tel: 609 360-0501 Table of Contents 1 Executive Summary... 4 2 Introduction...

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Using NLP and Ontologies for Notary Document Management Systems

Using NLP and Ontologies for Notary Document Management Systems Outline Using NLP and Ontologies for Notary Document Management Systems Flora Amato, Antonino Mazzeo, Antonio Penta and Antonio Picariello Dipartimento di Informatica e Sistemistica Universitá di Napoli

More information

WHITE PAPER DATA GOVERNANCE ENTERPRISE MODEL MANAGEMENT

WHITE PAPER DATA GOVERNANCE ENTERPRISE MODEL MANAGEMENT WHITE PAPER DATA GOVERNANCE ENTERPRISE MODEL MANAGEMENT CONTENTS 1. THE NEED FOR DATA GOVERNANCE... 2 2. DATA GOVERNANCE... 2 2.1. Definition... 2 2.2. Responsibilities... 3 3. ACTIVITIES... 6 4. THE

More information

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Amar-Djalil Mezaour 1, Julien Law-To 1, Robert Isele 3, Thomas Schandl 2, and Gerd Zechmeister

More information

Design and Implementation of Domain based Semantic Hidden Web Crawler

Design and Implementation of Domain based Semantic Hidden Web Crawler Design and Implementation of Domain based Semantic Hidden Web Crawler Manvi Department of Computer Engineering YMCA University of Science & Technology Faridabad, India Ashutosh Dixit Department of Computer

More information

CitationBase: A social tagging management portal for references

CitationBase: A social tagging management portal for references CitationBase: A social tagging management portal for references Martin Hofmann Department of Computer Science, University of Innsbruck, Austria m_ho@aon.at Ying Ding School of Library and Information Science,

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

INTEROPERABILITY IN DATA WAREHOUSES

INTEROPERABILITY IN DATA WAREHOUSES INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content

More information

Service Road Map for ANDS Core Infrastructure and Applications Programs

Service Road Map for ANDS Core Infrastructure and Applications Programs Service Road Map for ANDS Core and Applications Programs Version 1.0 public exposure draft 31-March 2010 Document Target Audience This is a high level reference guide designed to communicate to ANDS external

More information

CorHousing. CorHousing provides performance indicator, risk and project management templates for the UK Social Housing sector including:

CorHousing. CorHousing provides performance indicator, risk and project management templates for the UK Social Housing sector including: CorHousing CorHousing provides performance indicator, risk and project management templates for the UK Social Housing sector including: Corporate, operational and service based scorecards Housemark indicators

More information

Ontology-based Domain Modelling for Consistent Content Change Management

Ontology-based Domain Modelling for Consistent Content Change Management Ontology-based Domain Modelling for Consistent Content Change Management Muhammad Javed 1, Yalemisew Abgaz 2, Claus Pahl 3 Centre for Next Generation Localization (CNGL), School of Computing, Dublin City

More information

The Ontology and Architecture for an Academic Social Network

The Ontology and Architecture for an Academic Social Network www.ijcsi.org 22 The Ontology and Architecture for an Academic Social Network Moharram Challenger Computer Engineering Department, Islamic Azad University Shabestar Branch, Shabestar, East Azerbaijan,

More information

Test Data Management Concepts

Test Data Management Concepts Test Data Management Concepts BIZDATAX IS AN EKOBIT BRAND Executive Summary Test Data Management (TDM), as a part of the quality assurance (QA) process is more than ever in the focus among IT organizations

More information

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Semantification of Query Interfaces to Improve Access to Deep Web Content

Semantification of Query Interfaces to Improve Access to Deep Web Content Semantification of Query Interfaces to Improve Access to Deep Web Content Arne Martin Klemenz, Klaus Tochtermann ZBW German National Library of Economics Leibniz Information Centre for Economics, Düsternbrooker

More information

A Data Browsing from Various Sources Driven by the User s Data Models

A Data Browsing from Various Sources Driven by the User s Data Models A Data Browsing from Various Sources Driven by the User s Data Models Guntis Arnicans, Girts Karnitis University of Latvia, Raina blvd. 9, Riga, Latvia {Guntis.Arnicans, Girts.Karnitis}@lu.lv Abstract.

More information

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Information Services for Smart Grids

Information Services for Smart Grids Smart Grid and Renewable Energy, 2009, 8 12 Published Online September 2009 (http://www.scirp.org/journal/sgre/). ABSTRACT Interconnected and integrated electrical power systems, by their very dynamic

More information

Ontology and automatic code generation on modeling and simulation

Ontology and automatic code generation on modeling and simulation Ontology and automatic code generation on modeling and simulation Youcef Gheraibia Computing Department University Md Messadia Souk Ahras, 41000, Algeria youcef.gheraibia@gmail.com Abdelhabib Bourouis

More information

Automatic Timeline Construction For Computer Forensics Purposes

Automatic Timeline Construction For Computer Forensics Purposes Automatic Timeline Construction For Computer Forensics Purposes Yoan Chabot, Aurélie Bertaux, Christophe Nicolle and Tahar Kechadi CheckSem Team, Laboratoire Le2i, UMR CNRS 6306 Faculté des sciences Mirande,

More information

A Contribution to Expert Decision-based Virtual Product Development

A Contribution to Expert Decision-based Virtual Product Development A Contribution to Expert Decision-based Virtual Product Development László Horváth, Imre J. Rudas Institute of Intelligent Engineering Systems, John von Neumann Faculty of Informatics, Óbuda University,

More information

Resource Management on Computational Grids

Resource Management on Computational Grids Univeristà Ca Foscari, Venezia http://www.dsi.unive.it Resource Management on Computational Grids Paolo Palmerini Dottorato di ricerca di Informatica (anno I, ciclo II) email: palmeri@dsi.unive.it 1/29

More information

A Mind Map Based Framework for Automated Software Log File Analysis

A Mind Map Based Framework for Automated Software Log File Analysis 2011 International Conference on Software and Computer Applications IPCSIT vol.9 (2011) (2011) IACSIT Press, Singapore A Mind Map Based Framework for Automated Software Log File Analysis Dileepa Jayathilake

More information

SCADE System 17.0. Technical Data Sheet. System Requirements Analysis. Technical Data Sheet SCADE System 17.0 1

SCADE System 17.0. Technical Data Sheet. System Requirements Analysis. Technical Data Sheet SCADE System 17.0 1 SCADE System 17.0 SCADE System is the product line of the ANSYS Embedded software family of products and solutions that empowers users with a systems design environment for use on systems with high dependability

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Survey Results: Requirements and Use Cases for Linguistic Linked Data Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group

More information

Information Discovery on Electronic Medical Records

Information Discovery on Electronic Medical Records Information Discovery on Electronic Medical Records Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, MD Anthony F. Rossi, MD Jeffrey A. White, FIU FIU Miami Children s Hospital Miami Children s Hospital

More information

Patterns of Information Management

Patterns of Information Management PATTERNS OF MANAGEMENT Patterns of Information Management Making the right choices for your organization s information Summary of Patterns Mandy Chessell and Harald Smith Copyright 2011, 2012 by Mandy

More information

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova 1 A.P.Ershov s Institute

More information

Annotea and Semantic Web Supported Collaboration

Annotea and Semantic Web Supported Collaboration Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve

More information

A METHOD FOR REWRITING LEGACY SYSTEMS USING BUSINESS PROCESS MANAGEMENT TECHNOLOGY

A METHOD FOR REWRITING LEGACY SYSTEMS USING BUSINESS PROCESS MANAGEMENT TECHNOLOGY A METHOD FOR REWRITING LEGACY SYSTEMS USING BUSINESS PROCESS MANAGEMENT TECHNOLOGY Gleison Samuel do Nascimento, Cirano Iochpe Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre,

More information

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many

More information

A semantic extension of a hierarchical storage management system for small and medium-sized enterprises.

A semantic extension of a hierarchical storage management system for small and medium-sized enterprises. Faculty of Computer Science Institute of Software- and Multimedia Technology, Chair of Multimedia Technology A semantic extension of a hierarchical storage management system for small and medium-sized

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

A generic approach for data integration using RDF, OWL and XML

A generic approach for data integration using RDF, OWL and XML A generic approach for data integration using RDF, OWL and XML Miguel A. Macias-Garcia, Victor J. Sosa-Sosa, and Ivan Lopez-Arevalo Laboratory of Information Technology (LTI) CINVESTAV-TAMAULIPAS Km 6

More information

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment www.ijcsi.org 434 A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment V.THAVAVEL and S.SIVAKUMAR* Department of Computer Applications, Karunya University,

More information

Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1

Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1 Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1 Ivo Marinchev Abstract: The paper introduces approach to semantic lifting of unstructured data with the help of natural language

More information

SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY

SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY M. EHRIG, C. TEMPICH AND S. STAAB Institute AIFB University of Karlsruhe 76128 Karlsruhe, Germany E-mail: {meh,cte,sst}@aifb.uni-karlsruhe.de

More information

Recognition and Privacy Preservation of Paper-based Health Records

Recognition and Privacy Preservation of Paper-based Health Records Quality of Life through Quality of Information J. Mantas et al. (Eds.) IOS Press, 2012 2012 European Federation for Medical Informatics and IOS Press. All rights reserved. doi:10.3233/978-1-61499-101-4-751

More information

Ontology-based Archetype Interoperability and Management

Ontology-based Archetype Interoperability and Management Ontology-based Archetype Interoperability and Management Catalina Martínez-Costa, Marcos Menárguez-Tortosa, J. T. Fernández-Breis Departamento de Informática y Sistemas, Facultad de Informática Universidad

More information

Financial Events Recognition in Web News for Algorithmic Trading

Financial Events Recognition in Web News for Algorithmic Trading Financial Events Recognition in Web News for Algorithmic Trading Frederik Hogenboom Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands fhogenboom@ese.eur.nl Abstract. Due to

More information

SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS

SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS Irwan Bastian, Lily Wulandari, I Wayan Simri Wicaksana {bastian, lily, wayan}@staff.gunadarma.ac.id Program Doktor Teknologi

More information

Reverse Engineering in Data Integration Software

Reverse Engineering in Data Integration Software Database Systems Journal vol. IV, no. 1/2013 11 Reverse Engineering in Data Integration Software Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro Integrated applications

More information

Fogbeam Vision Series - The Modern Intranet

Fogbeam Vision Series - The Modern Intranet Fogbeam Labs Cut Through The Information Fog http://www.fogbeam.com Fogbeam Vision Series - The Modern Intranet Where It All Started Intranets began to appear as a venue for collaboration and knowledge

More information

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609. Data Integration using Agent based Mediator-Wrapper Architecture Tutorial Report For Agent Based Software Engineering (SENG 609.22) Presented by: George Shi Course Instructor: Dr. Behrouz H. Far December

More information