Combining Data Integration and Information Extraction Techniques
Dean Williams
School of Computer Science and Information Systems, Birkbeck College, University of London

Abstract

We describe a class of applications which are built using databases comprising some structured data and some free text. Conventional database management systems have proved ineffective for these applications, and the applications are rarely suitable for current text and data mining techniques. We argue that combining Information Extraction and Data Integration techniques is a promising direction for research, and we outline how our ESTEST system demonstrates this approach.

1. Introduction

A class of applications exists which can be characterised by the way in which they combine data conforming to a schema with some related free text. We describe this application class in Section 2. Our approach is to combine Data Integration (DI) and Information Extraction (IE) techniques to better exploit the text data. Section 3 summarises related areas of research and shows how our method relates to them. Section 4 details why we believe text is used in these applications and, as a result, why we believe combining DI and IE techniques will be beneficial to them. Section 5 gives details of our system, Experimental Software to Extract Structure from Text (ESTEST), and shows how we plan to realise our goals. Finally, we give our conclusions and plans for future work in Section 6.

2. Partially Structured Data

In [1] King and Poulovassilis define a distinct category of data: partially structured data (PSD). Many database applications rely on storing significant amounts of data in the form of free text. Recent developments in database technology have improved the facilities available for storing large amounts of text. However, the provision for making use of this text data largely relies on searching the text for keywords.
A class of applications exists where the information to be stored consists partly of structured data conforming to a schema, with the remainder left as free text. We consider this data to be partially structured. This idea of PSD is distinct from semistructured data, which is generally taken to mean data that is self-describing. In semistructured data there may be no schema defined, but the data itself contains some structural information, e.g. XML tags.
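The shape of a partially structured record can be illustrated with a minimal Python sketch. The class name, field names and sample reports are invented for illustration and do not come from any of the applications discussed; the keyword search stands in for the text facilities that conventional systems typically provide.

```python
from dataclasses import dataclass


@dataclass
class PartiallyStructuredRecord:
    """Hypothetical record: schema-conforming fields plus uninterpreted free text."""
    structured: dict   # the part of the data that conforms to a schema
    free_text: str     # the remainder, stored as plain text


def keyword_search(records, keyword):
    """Return the records whose free text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r.free_text.lower()]


# Invented sample data, for illustration only.
reports = [
    PartiallyStructuredRecord(
        structured={"officer": "PC Example", "location": "High Street"},
        free_text="A red van was seen leaving the car park at speed.",
    ),
    PartiallyStructuredRecord(
        structured={"officer": "PC Sample", "location": "Station Road"},
        free_text="Witness reports a maroon lorry parked outside the shop.",
    ),
]

matches = keyword_search(reports, "van")
```

In this sketch a search for "van" finds the first report, but a search for "vehicle" finds nothing, even though both texts describe vehicles. The structured fields can be queried precisely, while the information in the text is reachable only through the exact words the author happened to use.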
An example of an application based on the use of PSD is operational intelligence gathering, which is used in serious crime investigations. The data collected in this application area takes the form of a report that contains some structured data, such as the name of the police officer making the report and the time and location of the incident, as well as details of subjects and locations mentioned in the report. This is combined with the actual report of the sighting or information received, which is captured as text. A number of other text-based applications exist in crime, e.g. for witness statements and scene of crime reports. Other application domains we are familiar with which have partially structured data include Road Traffic Accident reports, where the standard format statistics are combined with free text accounts in a formalised subset of English. In Bioinformatics, structured databases such as SWISS-PROT [2] include comment fields that contain related unstructured information.

A common theme of many of these applications, including crime and SWISS-PROT, is a requirement for expert users to annotate the text, trying to use standard terms to assist with queries, reduce duplication and highlight important facts. This is often a time-consuming and demanding task, with results less effective than would be desired. Applications to assist with this work are being developed both as academic research projects, e.g. [3], and as commercial software, e.g. [4].

3. Related Areas

A number of active areas of research deal with text in databases, and we use the following definitions to establish how our approach relates to them.

Data Integration: Providing a single schema over a collection of data sources that facilitates queries across the sources [5].
Information Extraction: Finding pre-defined entities in text and using the extracted data to fill slots in a template, using shallow NLP techniques [6].
Data Mining / Knowledge Discovery in Databases: Finding patterns in structured data, discovering new deep knowledge embedded in data.
Text Mining: Application of data mining to text (often some NLP process creates a structured dataset from the text, and this is then used for data mining [7]).
Graph Based Data Models: Current industry standard databases are essentially record based (e.g. the relational model or some form of object data model), where the schema must be determined in advance of populating the database. Graph-based data models offer finer semantic granularity and greater flexibility [8].

We are not proposing the use of a text mining technique which finds patterns in very large collections of text, such as that of Nahm and Mooney [9], who combine IE with text mining. For many of the PSD applications we have described this is unlikely to be effective, as there are no very large static datasets to be mined (although there are some exceptions, e.g. SWISS-PROT). Rather, over time new query requirements arise and extensions to the schema are required.

We propose an evolutionary system where the user iterates through the following steps as new information sources and new query requirements arise. Firstly, an initial integrated schema is built from a variety of sources, including structured data schemas, domain ontologies and natural language ontologies. Then information extraction rules are semi-automatically generated from this schema to be used as input for the IE processor. The data extracted from the text is added to the integrated schema and is available to answer queries. The schema may then be extended by new data sources being added or new schema elements being identified, and the process repeats. Figure 1 shows how the user will use the ESTEST system in this evolutionary manner. Because of the evolutionary approach, we suggest that a graphical workbench will be required for end users of ESTEST, and we intend to consider the requirements of such a workbench.

[Figure: flowchart of the steps Integrate Datasources, Create Data to assist the IE process, Information Extraction (IE), Integrate Results of IE, Query Global Schema and Enhance Schema, showing control flow and data flow around the Global Schema.]
Fig. 1. Evolutionary Use of the ESTEST System

4. Combining Data Integration and Information Extraction

We believe that the data collected in the form of free text is important to PSD applications and is not stored as text merely because it is of secondary value. There are two main reasons for storing data as text in PSD applications:

- It is not possible to know in advance all of the queries that will be required in the future. The text captured represents an intuitive attempt by the user to provide all information that could possibly be relevant. The Road Traffic Accident reports are a good example of this. The schema of the structured part of the data covers all currently known requirements in a format known as STATS20 [10], and the text part is used when new reporting requirements arise.
- Data is captured as text because of the difficulty of dynamically building a schema in a conventional DBMS, where simply adding a column to an existing table can be a major task in production systems. For example, in systems storing witness statements in crime reports, as entities and relationships are mentioned for the first time it is not possible to dynamically expand the underlying data schema, and so the new information is only stored in its text form.

Furthermore, the real world entities and relationships described in the text are related to the entities in the structured part of the data. An application combining IE and Data Integration will provide advantages in these applications for a number of reasons:

- Information Extraction is based on the idea of filling pre-defined templates, and Data Integration can provide a global schema to be used as a template.
- Combining the schema of the structured data with ontologies and other metadata sources can create the global schema / template.
- Metadata from the data sources can be used to assist the IE process by semi-automatically creating the required input to the IE modules.
- Data Integration systems which use a low-level graph-based common data model (e.g. AutoMed [11]) are able to extend schemas as new entities become known without the overhead associated with a conventional DBMS, as they are not based on record-based structures such as tables in relational databases.
- The templates filled by the IE process will provide a new data source to be added to the global schema, supporting new queries which could not previously be answered.

5. The ESTEST System

Our ESTEST system makes use of the AutoMed heterogeneous data integration system being developed at Birkbeck and Imperial Colleges [12].
In data integration systems, several data sources, each with an associated local schema, are integrated to form a single virtual database with an associated global schema. If the data sources conform to different data models, then these need to be transformed into a common data model as part of the integration process. The AutoMed system uses a low-level graph-based data model, the HDM, as its common data model; this is suitable for incremental extension of a global schema as new requirements arise. We have developed an AutoMed HDM data store [13] to store instance data and intermediate results for ESTEST. AutoMed implements bi-directional schema transformation pathways to transform and integrate heterogeneous schemas [11], which is a flexible approach amenable to including new domain knowledge dynamically.

In summary, the ESTEST system works as follows. The data sources are first identified and integrated into a single global schema. In AutoMed, each data model which can be integrated is defined in terms of the HDM. Each construct in the external data model has an associated set of HDM nodes and edges. In the ESTEST system some features of data models are required to be preserved across all the integrated data sources. These features include an IS-A concept hierarchy, support for attributes, identification of the text data to be mined, and the ability to attach word forms to concepts. To facilitate the automatic creation of the global schema, all the data sources used by ESTEST will be transformed to an ESTEST data model. Each construct in the external model also has a set of transformations to map onto the ESTEST data model. Once all the data sources have been transformed to this standard representation and mappings between schema elements have been obtained, it will be possible to integrate the schemas.

ESTEST then takes the metadata in the global schema and uses this to suggest input into the IE process. The user confirms, corrects and appends to this configuration data, and the IE process is run. We make use of the GATE [14] IE architecture to build the ESTEST IE processor. As well as reusing standard IE components such as Named Entity gazetteers, sentence splitters and pattern matching grammars (with configured inputs semi-automatically created by ESTEST), a number of new IE components are being developed:

TemplateFromSchema: Takes an ESTEST global schema and creates templates to be filled by the IE engine, together with input to the standard IE components.
NE-DB: Named Entity recognition in IE is typically driven by flat file lists; the NE-DB component will instead associate a query on the global schema with an annotation type. A list of word forms will be materialised in the HDM store for use when the IE process is running (GATE NE gazetteers generate Finite State Machines for possible transitions of tokens).
WordForm: For a given concept, obtains relevant word forms from the WordNet natural language ontology. It will be possible to generate more words by increasing the number of traversals allowed through the WordNet hierarchy, with the words ordered by an approximation of semantic distance.

The templates filled by the IE process will then be used to add to the extent of the concept in the global schema.
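The behaviour described for the WordForm component, where allowing more traversals yields more word forms, can be sketched with a breadth-first walk. This is an illustrative toy only: it uses a small hand-written dictionary in place of WordNet, the function name `word_forms` is hypothetical, and the number of links traversed is an assumed stand-in for the approximation of semantic distance. It is not the ESTEST or GATE implementation.

```python
from collections import deque

# Toy stand-in for a natural language ontology: concept -> directly related terms.
# The entries are invented for illustration; ESTEST itself draws on WordNet.
TOY_ONTOLOGY = {
    "vehicle": ["car", "lorry"],
    "car": ["taxi", "saloon"],
    "lorry": ["truck"],
    "taxi": ["minicab"],
}


def word_forms(concept, max_traversals):
    """Collect word forms reachable within max_traversals links of the concept.

    Returns (word, distance) pairs ordered by distance, where distance is the
    number of links followed (an assumed proxy for semantic distance).
    """
    seen = {concept}
    results = [(concept, 0)]
    queue = deque([(concept, 0)])
    while queue:
        word, dist = queue.popleft()
        if dist == max_traversals:
            continue  # traversal budget used up along this path
        for neighbour in TOY_ONTOLOGY.get(word, []):
            if neighbour not in seen:
                seen.add(neighbour)
                results.append((neighbour, dist + 1))
                queue.append((neighbour, dist + 1))
    return results


one_step = word_forms("vehicle", 1)
two_steps = word_forms("vehicle", 2)
```

With one traversal the sketch yields only the concept and its direct neighbours; raising the limit adds more distant terms at the end of the list, so a user could cut the list off at whatever distance still gives acceptable precision.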
Extracted annotations which match objects in the global schema will be put in the HDM store. The global query facilities of AutoMed are then available to the user, who can query the global schema using the IQL query language [15, 16]. For more detailed information on the design of the ESTEST system we refer the reader to [17], and for an example of its operation in the Road Traffic Accident domain to [18].

Recent work within the Tristarp group [19] has resulted in advanced visualisation tools for graph-based databases becoming available [20], which may be of assistance in the proposed user workbench. This research interest is also reflected in recent products developed in industry: the Sentences [21] DBMS from Lazysoft is based on a quadruple store and sets out to challenge the dominance of the relational model.

6. Conclusions and Future Work

We have discussed how a class of applications based on partially structured data is not adequately supported by current database and data mining techniques. We have stated why we believe combining Information Extraction and Data Integration techniques is a promising direction for research. We are currently completing an initial implementation of the ESTEST system, which we will test in the Road Traffic Accident reporting and Crime Investigation domains.
ESTEST extends the facilities offered by data integration systems by moving towards handling text, and extends IE systems by attempting to use schema information to semi-automatically configure the IE process.

References

1. P.J.H. King and A. Poulovassilis. Enhancing database technology to better manage and exploit partially structured data. Technical report, Birkbeck College, University of London.
2. A. Bairoch, B. Boeckmann, S. Ferro, and E. Gasteiger. Swiss-Prot: Juggling between evolution and stability. Brief. Bioinform., 5:39-55.
3. SOCIS: Scene of Crime Information System.
4. QUENZA.
5. Alon Y. Halevy. Data integration: A status report. In Gerhard Weikum, Harald Schöning, and Erhard Rahm, editors, BTW, volume 26 of LNI. GI.
6. D. Appelt. An introduction to information extraction. Artificial Intelligence Communications.
7. A.H. Tan. Text mining: The state of the art and the challenges. Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases, pages 65-70.
8. William Kent. Limitations of record-based information models. ACM Transactions on Database Systems, 4(1), March.
9. R. Mooney and U.Y. Nahm. Using information extraction to aid the discovery of prediction rules from text. Proceedings of the KDD-2000 Workshop on Text Mining, pages 51-58.
10. UK Government Department for Transport. Instructions for the completion of road accident report form STATS19. transstats/documents/page/dft transstats pdf.
11. P.J. McBrien and A. Poulovassilis. Data integration by bi-directional schema transformation rules. In Proc. ICDE 03.
12. AutoMed Project.
13. D. Williams. The AutoMed HDM data store. Technical report, AutoMed Project.
14. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. Experience with a language engineering architecture: Three years of GATE. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL 02).
15. A. Poulovassilis. The AutoMed Intermediate Query Language. Technical report, AutoMed Project.
16. E. Jasper. Global query processing in the AutoMed heterogeneous database environment. In Proc. BNCOD02, LNCS 2405, pages 46-49.
17. D. Williams and A. Poulovassilis. Combining data integration with natural language technology for the semantic web. In Proc. Workshop on Human Language Technology for the Semantic Web and Web Services, at ISWC 03, page TBC.
18. Dean Williams and Alexandra Poulovassilis. An example of the ESTEST approach to combining unstructured text and structured data. In DEXA Workshops. IEEE Computer Society.
19. Tristarp Project.
20. M.N. Smith and P.J.H. King. Database support for exploring criminal networks. Intelligence and Security Informatics: First NSF/NIJ Symposium.
21. Lazysoft (maker of Sentences).
INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content
More informationService Road Map for ANDS Core Infrastructure and Applications Programs
Service Road Map for ANDS Core and Applications Programs Version 1.0 public exposure draft 31-March 2010 Document Target Audience This is a high level reference guide designed to communicate to ANDS external
More informationCorHousing. CorHousing provides performance indicator, risk and project management templates for the UK Social Housing sector including:
CorHousing CorHousing provides performance indicator, risk and project management templates for the UK Social Housing sector including: Corporate, operational and service based scorecards Housemark indicators
More informationOntology-based Domain Modelling for Consistent Content Change Management
Ontology-based Domain Modelling for Consistent Content Change Management Muhammad Javed 1, Yalemisew Abgaz 2, Claus Pahl 3 Centre for Next Generation Localization (CNGL), School of Computing, Dublin City
More informationThe Ontology and Architecture for an Academic Social Network
www.ijcsi.org 22 The Ontology and Architecture for an Academic Social Network Moharram Challenger Computer Engineering Department, Islamic Azad University Shabestar Branch, Shabestar, East Azerbaijan,
More informationTest Data Management Concepts
Test Data Management Concepts BIZDATAX IS AN EKOBIT BRAND Executive Summary Test Data Management (TDM), as a part of the quality assurance (QA) process is more than ever in the focus among IT organizations
More informationUsing Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources
Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
More informationSemantification of Query Interfaces to Improve Access to Deep Web Content
Semantification of Query Interfaces to Improve Access to Deep Web Content Arne Martin Klemenz, Klaus Tochtermann ZBW German National Library of Economics Leibniz Information Centre for Economics, Düsternbrooker
More informationA Data Browsing from Various Sources Driven by the User s Data Models
A Data Browsing from Various Sources Driven by the User s Data Models Guntis Arnicans, Girts Karnitis University of Latvia, Raina blvd. 9, Riga, Latvia {Guntis.Arnicans, Girts.Karnitis}@lu.lv Abstract.
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationInformation Services for Smart Grids
Smart Grid and Renewable Energy, 2009, 8 12 Published Online September 2009 (http://www.scirp.org/journal/sgre/). ABSTRACT Interconnected and integrated electrical power systems, by their very dynamic
More informationOntology and automatic code generation on modeling and simulation
Ontology and automatic code generation on modeling and simulation Youcef Gheraibia Computing Department University Md Messadia Souk Ahras, 41000, Algeria youcef.gheraibia@gmail.com Abdelhabib Bourouis
More informationAutomatic Timeline Construction For Computer Forensics Purposes
Automatic Timeline Construction For Computer Forensics Purposes Yoan Chabot, Aurélie Bertaux, Christophe Nicolle and Tahar Kechadi CheckSem Team, Laboratoire Le2i, UMR CNRS 6306 Faculté des sciences Mirande,
More informationA Contribution to Expert Decision-based Virtual Product Development
A Contribution to Expert Decision-based Virtual Product Development László Horváth, Imre J. Rudas Institute of Intelligent Engineering Systems, John von Neumann Faculty of Informatics, Óbuda University,
More informationResource Management on Computational Grids
Univeristà Ca Foscari, Venezia http://www.dsi.unive.it Resource Management on Computational Grids Paolo Palmerini Dottorato di ricerca di Informatica (anno I, ciclo II) email: palmeri@dsi.unive.it 1/29
More informationA Mind Map Based Framework for Automated Software Log File Analysis
2011 International Conference on Software and Computer Applications IPCSIT vol.9 (2011) (2011) IACSIT Press, Singapore A Mind Map Based Framework for Automated Software Log File Analysis Dileepa Jayathilake
More informationSCADE System 17.0. Technical Data Sheet. System Requirements Analysis. Technical Data Sheet SCADE System 17.0 1
SCADE System 17.0 SCADE System is the product line of the ANSYS Embedded software family of products and solutions that empowers users with a systems design environment for use on systems with high dependability
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationSurvey Results: Requirements and Use Cases for Linguistic Linked Data
Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group
More informationInformation Discovery on Electronic Medical Records
Information Discovery on Electronic Medical Records Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, MD Anthony F. Rossi, MD Jeffrey A. White, FIU FIU Miami Children s Hospital Miami Children s Hospital
More informationPatterns of Information Management
PATTERNS OF MANAGEMENT Patterns of Information Management Making the right choices for your organization s information Summary of Patterns Mandy Chessell and Harald Smith Copyright 2011, 2012 by Mandy
More informationONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY
ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova 1 A.P.Ershov s Institute
More informationAnnotea and Semantic Web Supported Collaboration
Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve
More informationA METHOD FOR REWRITING LEGACY SYSTEMS USING BUSINESS PROCESS MANAGEMENT TECHNOLOGY
A METHOD FOR REWRITING LEGACY SYSTEMS USING BUSINESS PROCESS MANAGEMENT TECHNOLOGY Gleison Samuel do Nascimento, Cirano Iochpe Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre,
More informationScalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many
More informationA semantic extension of a hierarchical storage management system for small and medium-sized enterprises.
Faculty of Computer Science Institute of Software- and Multimedia Technology, Chair of Multimedia Technology A semantic extension of a hierarchical storage management system for small and medium-sized
More informationFlattening Enterprise Knowledge
Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationA generic approach for data integration using RDF, OWL and XML
A generic approach for data integration using RDF, OWL and XML Miguel A. Macias-Garcia, Victor J. Sosa-Sosa, and Ivan Lopez-Arevalo Laboratory of Information Technology (LTI) CINVESTAV-TAMAULIPAS Km 6
More informationA generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment
www.ijcsi.org 434 A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment V.THAVAVEL and S.SIVAKUMAR* Department of Computer Applications, Karunya University,
More informationSemantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1
Semantic Lifting of Unstructured Data Based on NLP Inference of Annotations 1 Ivo Marinchev Abstract: The paper introduces approach to semantic lifting of unstructured data with the help of natural language
More informationSWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY
SWAP: ONTOLOGY-BASED KNOWLEDGE MANAGEMENT WITH PEER-TO-PEER TECHNOLOGY M. EHRIG, C. TEMPICH AND S. STAAB Institute AIFB University of Karlsruhe 76128 Karlsruhe, Germany E-mail: {meh,cte,sst}@aifb.uni-karlsruhe.de
More informationRecognition and Privacy Preservation of Paper-based Health Records
Quality of Life through Quality of Information J. Mantas et al. (Eds.) IOS Press, 2012 2012 European Federation for Medical Informatics and IOS Press. All rights reserved. doi:10.3233/978-1-61499-101-4-751
More informationOntology-based Archetype Interoperability and Management
Ontology-based Archetype Interoperability and Management Catalina Martínez-Costa, Marcos Menárguez-Tortosa, J. T. Fernández-Breis Departamento de Informática y Sistemas, Facultad de Informática Universidad
More informationFinancial Events Recognition in Web News for Algorithmic Trading
Financial Events Recognition in Web News for Algorithmic Trading Frederik Hogenboom Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands fhogenboom@ese.eur.nl Abstract. Due to
More informationSEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS
SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS Irwan Bastian, Lily Wulandari, I Wayan Simri Wicaksana {bastian, lily, wayan}@staff.gunadarma.ac.id Program Doktor Teknologi
More informationReverse Engineering in Data Integration Software
Database Systems Journal vol. IV, no. 1/2013 11 Reverse Engineering in Data Integration Software Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro Integrated applications
More informationFogbeam Vision Series - The Modern Intranet
Fogbeam Labs Cut Through The Information Fog http://www.fogbeam.com Fogbeam Vision Series - The Modern Intranet Where It All Started Intranets began to appear as a venue for collaboration and knowledge
More informationData Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.
Data Integration using Agent based Mediator-Wrapper Architecture Tutorial Report For Agent Based Software Engineering (SENG 609.22) Presented by: George Shi Course Instructor: Dr. Behrouz H. Far December
More information