Instance-Based Learning and Information Extraction for the Generation of Metadata
Andreas D. Lattner (Center for Computing Technologies, TZI, University of Bremen, Germany)
Otthein Herzog (Center for Computing Technologies, TZI, University of Bremen, Germany)

Abstract: Knowledge Management has recently become popular among enterprises hoping to achieve a competitive advantage. Describing information by metadata makes it possible to perform detailed queries uniformly over information items from different information sources, and it enables goal-directed search and the automatic provision of relevant information. As the manual acquisition of metadata is very costly, support for this task is desirable. This work presents two extractors for the creation of metadata. The first applies Instance-Based Learning to adopt metadata from similar objects. The second extracts information by applying regular expressions. Both extractors have been integrated into the metadata generation framework of the KnowWork system and have been evaluated in experiments on two real-world data sets from the engineering domain.

Key Words: Metadata Generation, Instance-Based Learning, Information Extraction, Knowledge Management
Category: H.3

1 Introduction

Knowledge Management has recently become popular among enterprises hoping to achieve a competitive advantage. Uniform access to all information within a company allows users to find information faster. Information can be structured by setting up an ontology for the existing information items. An ontology is an explicit specification of a conceptualization [Gruber 1993]. All information classes and their properties can be defined in an ontology. Instances of these classes represent business information objects (e.g., the bill of materials of a certain product). Their properties, i.e., their attributes and their associations to other objects, are described by metadata.
Metadata makes it possible to perform detailed queries uniformly over information items from different information sources. It enables goal-directed search and the automatic provision of relevant information. But how does the metadata get into the system? The manual acquisition of metadata is very costly, and since users see no direct benefit in it, their motivation for entering metadata is usually quite low. Therefore, support by semi-automated metadata generation is needed to overcome this situation. Semi-automated means that the created metadata has to be understood as a suggestion that should be verified by the user. As no metadata generation approach will provide perfect metadata, there is a trade-off between the effort of checking the created metadata and the risk of letting erroneous data into the system.

2 Related Work

Many different fields are related to metadata generation. If unstructured documents have to be processed, text classification or information extraction can be used to create metadata (e.g., [Yang and Liu 1999, Hobbs et al. 1996]). In both cases, automated and manual approaches can be distinguished. [Sebastiani 2002] gives a good survey of machine learning in automated text categorization. Examples of learning information extraction rules can be found in [Soderland 1999] and [Junker et al. 1999]. Internet and intranet web pages may also be enhanced by metadata, e.g., in the Semantic Web context. Various approaches address the classification of web pages, e.g., [Pierre 2001], metadata generation for web pages [Jenkins et al. 1999, Stuckenschmidt and van Harmelen 2001], or the creation of knowledge bases from the World Wide Web [Craven et al. 2000]. If metadata has to be created for databases, other approaches can be applied: database contents can be adopted directly via database wrappers or mapped to the defined vocabulary of an appropriate ontology (e.g., [Tork Roth and Schwarz 1997, Bergamaschi et al. 1999]). The technologies to be applied strongly depend on the information sources managed by the system.

[Figure 1: Mapping between attributes and metadata generation extractors. The ontology shown contains the classes Person, Document, Change Report and Travel Application (the latter two linked to Document by is-a relations), with attributes such as Name, Phone, authors, Creation date, URI, Software variant, Area of validity, Document-ID and Checked by; these attributes are connected to Metadata Extractors 1 to n.]
3 Metadata Generation with KnowWork

The metadata generation framework MetaGen is a module of the KnowWork system [Tönshoff et al. 2001] and was introduced in [Lattner and Apitz 2002]. The KnowWork system manages information classes and information items within its domain model, an ontology representation. Information items can be described by attributes and linked to other items via metadata. With the metadata generation framework, metadata can be created for arbitrary information items. Its flexible structure allows metadata extraction modules to be integrated as needed by implementing an extractor interface. The different extractors can be connected to any defined attribute for creating metadata. Fig. 1 illustrates the mapping between metadata generation extractors and attributes of different information types. Two extractors have been implemented and evaluated so far: the TextSimilarityExtractor and the RegExExtractor. Both are briefly described in the next two subsections.

3.1 TextSimilarityExtractor

The TextSimilarityExtractor uses an instance-based learning approach to create metadata. It adopts metadata from the k-nearest neighbors (k-nn) by applying a similarity measure based on text content. An example is given in Fig. 2. If values for attribute a of object o are to be created, the following steps are performed:

1. Collect the k nearest neighbors of object o (default setting: k = 5) that also have the attribute a, i.e., that are instances of the class (or one of its subclasses) in which attribute a is defined.
2. Collect and count all values for attribute a of the neighbors n_1, n_2, ..., n_k.
3. Take over the values:
   o If a is a single-valued attribute, take the value with the most appearances as the created metadata.
   o If a is set-valued, take the l most frequent values as the created metadata, where l is the average number of values for attribute a among the k neighbors.

[Figure 2: Adoption of metadata from the k-nearest neighbors. The values of the most similar annotated documents (e.g., "Contact person: Mrs. Green", "Contact person: Mr. Blue") are collected, and the value "Mrs. Green" is annotated to the new document.]
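The adoption steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the KnowWork implementation: the paper computes similarities with the mindaccess SDK, whereas the sketch substitutes plain TF-IDF weights (whitespace tokenization, term frequency times log(N/df)) and the cosine measure described in the next paragraph. The function names `tfidf_vectors`, `cosine` and `adopt_values` are hypothetical, and rounding the average number of values l to the nearest integer (at least 1) is an assumption, since the text does not say how fractional averages are handled. Only the default k = 5 is taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) for every document in docs.

    Assumption: whitespace tokenization, weight = tf * log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors (0.0 for a zero vector)."""
    dot = sum(w * v2.get(term, 0.0) for term, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def adopt_values(query_vec, candidates, k=5, set_valued=False):
    """Adopt values for attribute a from the k nearest neighbors.

    candidates: (vector, values) pairs for objects that have attribute a,
    i.e. instances of the defining class or one of its subclasses.
    """
    # Step 1: keep the k candidates most similar to the query object.
    neighbors = sorted(candidates, key=lambda c: cosine(query_vec, c[0]), reverse=True)[:k]
    # Step 2: collect and count all values of these neighbors.
    counts = Counter(value for _, values in neighbors for value in values)
    if not counts:
        return []
    # Step 3a: single-valued attribute -> the most frequent value.
    if not set_valued:
        return [counts.most_common(1)[0][0]]
    # Step 3b: set-valued attribute -> the l most frequent values, where l is
    # the average number of values per neighbor (rounded, at least 1).
    l = max(1, round(sum(len(values) for _, values in neighbors) / len(neighbors)))
    return [value for value, _ in counts.most_common(l)]


if __name__ == "__main__":
    # Hypothetical toy data: three annotated documents and one new document.
    train = ["pump motor wiring diagram", "pump motor maintenance report", "travel expense form hotel"]
    vectors = tfidf_vectors(train + ["pump motor inspection"])
    contact = [["Mrs. Green"], ["Mrs. Green"], ["Mr. Blue"]]
    print(adopt_values(vectors[-1], list(zip(vectors[:3], contact))))  # ['Mrs. Green']
```

In this sketch the new document is included in the corpus when document frequencies are computed, so that its own terms receive IDF weights; a production system would more likely maintain an index over the existing documents.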
In the case of the TextSimilarityExtractor, the similarity measure is computed from vector representations of text documents. Our implementation uses the mindaccess SDK, a software development kit for integrating the mindaccess system. Among other things, mindaccess provides search and classification techniques for text documents [insiders 00]. Each document d_j is represented by its term vector

$$\vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$$

where the term weights $w_{i,j}$ follow the TF-IDF scheme. The similarity between two documents d_1 and d_2 is computed as the cosine of the angle between their vectors (cf. [Baeza-Yates and Ribeiro-Neto 1999]):

$$\mathrm{sim}(d_1, d_2) = \frac{\vec{d}_1 \cdot \vec{d}_2}{|\vec{d}_1| \times |\vec{d}_2|} = \frac{\sum_{i=1}^{t} w_{i,1} \, w_{i,2}}{\sqrt{\sum_{i=1}^{t} w_{i,1}^2} \times \sqrt{\sum_{i=1}^{t} w_{i,2}^2}}$$

3.2 RegExExtractor

The RegExExtractor allows regular expressions to be defined for extracting information from texts. It uses the Jakarta ORO package, which provides Perl5-compatible regular expressions.

[Figure 3: Information extraction with regular expressions. Simplified extraction rules (Document-ID: "Doc. No.: ($value) [\n]"; Author: "Applicant: ($value) [\n]"; Checked by: "checked by: ($value) (") are matched against the text content of a travel application, and the matched values (e.g., Author: M. Meyer, Checked by: K. Müller) are annotated to the document.]

If certain patterns that indicate where attribute values can be found appear frequently in texts, they can be exploited for information extraction. Many documents contain such patterns, e.g., pointing out the creation date or the author. In these cases, extraction rules can be defined. In the following example, the author name is expected after the string "Created by:", and the text following it up to the end of the line is extracted as the value. After the colon, at least one tab or space is expected. The corresponding extraction rule is Created by:[\t ]+([\w0-9-_]+)$. Note that this character class admits only letters, digits, hyphens and underscores, so a value containing spaces or periods (such as "M. Meyer") would not be matched by this particular rule. Fig. 3 illustrates the use of regular expressions for information extraction; the figure shows simplified extraction rules for better understanding.
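A minimal Python sketch of rule-based extraction follows. Python's `re` module stands in for the Java-based Jakarta ORO package, and `re.MULTILINE` is set so that `$` matches at every line end, which is an assumption about how the original rules were compiled. The rule table `RULES`, the function `extract` and the sample text are hypothetical; only the "Created by:" pattern is quoted from the text, with the class `[\w0-9-_]` written as the equivalent `[\w-]`.

```python
import re

# Hypothetical rule set in the spirit of Fig. 3: attribute -> pattern whose
# first group captures the value. The "Created by:" rule is the one quoted
# in the text; the other two are illustrative guesses at the figure's rules.
RULES = {
    "Document-ID": re.compile(r"Doc\. No\.:[\t ]*(.+?)[\t ]*$", re.MULTILINE),
    "Author": re.compile(r"Created by:[\t ]+([\w-]+)$", re.MULTILINE),
    "Checked by": re.compile(r"checked by:[\t ]+([^(\n]+?)[\t ]*\("),
}


def extract(text, rules=RULES):
    """Return {attribute: [values]} for every rule that matches the text."""
    result = {}
    for attribute, pattern in rules.items():
        # Collect the first capturing group of every match of this rule.
        values = [match.group(1).strip() for match in pattern.finditer(text)]
        if values:
            result[attribute] = values
    return result


if __name__ == "__main__":
    sample = "Doc. No.: 4711\nCreated by: mmeyer\nchecked by: K. Müller (12.03.02)\n"
    print(extract(sample))
```

With the sample text, the sketch yields "4711" for Document-ID, "mmeyer" for Author and "K. Müller" for Checked by. A line such as "Created by: M. Meyer" produces no Author value, which illustrates the restriction of the quoted character class.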
4 Evaluation

Both extractors have been applied to real-world data sets from the engineering domain, which were provided by two of our application partners in the KnowWork project. Due to non-disclosure agreements, we are not permitted to show any of the original data used in our experiments. The data set from the first company consists of 0 documents of three different document types. All documents have six attributes: five single-valued attributes and one set-valued attribute (Tab. 1). The second data set was provided by another company. It contains 95 documents of a single document type (Change Report). Each document has been assigned 21 attributes: eight single-valued and 13 set-valued (Tab. 2). For each data set, three independent experiments were performed. In each experiment, the data set was randomly divided into a training set (ca. 60% of the documents) and a test set (ca. 40% of the documents). The test sets contained 4 documents in the first case and 38 in the second. For each document in the test sets, values were created for all attributes. The quality of the metadata generation was evaluated by computing the precision and recall of the created metadata. Precision is the ratio of correctly created values to all created values; recall is the ratio of correctly created values to all actual values of an attribute. These values were calculated for each attribute separately and for all attributes together. For the first data set, only the TextSimilarityExtractor was used. As these documents were very homogeneous, taking over attribute values from similar documents worked very well. The precision was on average 9.7% at a recall of 9.7%. These results should not be overrated, because for many attributes only a few possible values existed (e.g., file format). The most challenging attribute here was the set-valued keywords attribute.
But even for this attribute, the TextSimilarityExtractor returned good results, with a precision of 79.4% and a recall of 76.% (Tab. 1). The data set from the second company was more complex. Some attribute values could not be determined by the k-nn approach (e.g., the creation date). For the first seven attributes (see Table 2), the RegExExtractor was used with manually created extraction rules. The TextSimilarityExtractor was applied to the remaining fourteen attributes. The overall average precision and recall on this data set were 67.7% and 67.9%, respectively. For some attributes, the precision and recall of the TextSimilarityExtractor were quite low. This happens if some attribute values appear only sporadically, or if text similarity is only a weak predictor of the attribute values. The TextSimilarityExtractor performed poorly at creating values for the attributes product type, change reason, hardware version, and software version; in the worst case, precision and recall were 5.7% and 7.7%. Nevertheless, quite good results were achieved in many cases: for twelve of the attributes, the average precision and recall were higher than 75% (Tab. 2).
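The evaluation procedure can be made concrete with a short Python sketch. The function names and toy data are hypothetical, values are compared as exact strings, and pooling all (document, attribute) pairs for the overall figures (micro-averaging) is an assumption, since the text does not state how the overall values were aggregated.

```python
import random


def precision_recall(created, actual):
    """Precision and recall of created metadata against the actual metadata.

    Both arguments map a key to a set of values. Use document ids as keys to
    score one attribute, or (document id, attribute) pairs to pool all
    attributes together (micro-averaging, an assumption).
    """
    n_created = sum(len(values) for values in created.values())
    n_actual = sum(len(values) for values in actual.values())
    # A created value counts as correct if it is among the actual values.
    n_correct = sum(len(created.get(key, set()) & values) for key, values in actual.items())
    precision = n_correct / n_created if n_created else 0.0
    recall = n_correct / n_actual if n_actual else 0.0
    return precision, recall


def random_split(doc_ids, train_share=0.6, seed=None):
    """Randomly divide documents into a training set and a test set."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_share * len(ids))
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    # Hypothetical values for one attribute on three test documents.
    created = {"d1": {"Mrs. Green"}, "d2": {"Mr. Blue"}, "d3": {"a", "b"}}
    actual = {"d1": {"Mrs. Green"}, "d2": {"Mrs. Green"}, "d3": {"a", "c", "d"}}
    print(precision_recall(created, actual))  # (0.5, 0.4)
    train, test = random_split(range(95), seed=1)
    print(len(train), len(test))  # 57 38
```

With 95 documents, a 60/40 split yields 38 test documents, which matches the size of the second test set reported above.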
Table 1: Evaluation results of data set from company 1

Attribute        Set-valued  Extractor
Contact person               TextSim
File type                    TextSim
Project                      TextSim
Customer                     TextSim
Keywords         X           TextSim
Author                       TextSim
Overall

5 Conclusion

The experiments on the two data sets show quite promising results. They indicate that good results can be achieved on real-world data even with fairly simple approaches. Even though the first data set was quite homogeneous, the experiments showed the practicability of the two integrated metadata generation extractors. For some attributes, applying these extractors is not recommended, because the quality of the created metadata is not good enough. But if the extractors are applied only to suitable attributes, they can be a great help to the user during the metadata acquisition phase. Depending on the user's requirements, the recall of the created metadata can be increased by taking over more values. This has the advantage that more values (probably including the correct ones) are presented to the user. As taking over more values may also introduce erroneous ones, such a modification could lead to lower precision.

Acknowledgements

The content of this paper is a partial result of the KnowWork project, which is funded by the German Ministry for Education and Research (BMBF) under grant 0 IN 00 D. We wish to express our gratitude to the KnowWork colleagues and students at TZI for their contributions to the development and implementation of some of the ideas and concepts presented in this paper. We also want to acknowledge the efforts of the KnowWork project partners, especially the enterprises that provided the data sets for the evaluation of the extractors, and insiders Wissensbasierte Systeme GmbH for the provision and integration of their technologies into the KnowWork system.
Table 2: Evaluation results of data set from company 2

Attribute          Set-valued  Extractor
Report ID                      RegEx
SAP number                     RegEx
Product series     X           RegEx
Author                         RegEx
Checked by                     RegEx
Approved by                    RegEx
Creation date                  RegEx
Power/kW           X           TextSim
Product type       X           TextSim
OEM variant        X           TextSim
Mech. Constr.      X           TextSim
Mounting form      X           TextSim
Hardware variant   X           TextSim
Software variant   X           TextSim
Change reason      X           TextSim
Hardware version   X           TextSim
Software version   X           TextSim
Area of validity   X           TextSim
Categories         X           TextSim
File type                      TextSim
Paper format                   TextSim
Overall
References

[Baeza-Yates and Ribeiro-Neto 1999] Baeza-Yates, R.; Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York; Addison-Wesley, 1999.
[Bergamaschi et al. 1999] Bergamaschi, S.; Castano, S.; Vincini, M.: Semantic Integration of Semistructured and Structured Data Sources. SIGMOD Record, 28(1):54-59, 1999.
[Craven et al. 2000] Craven, M.; DiPasquo, D.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; Slattery, S.: Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, 118(1-2), 2000.
[Gruber 1993] Gruber, T. R.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2), 1993.
[Hobbs et al. 1996] Hobbs, J.; Appelt, D.; Bear, J.; Israel, D.; Kameyama, M.; Stickel, M.; Tyson, M.: FASTUS: Extracting Information from Natural Language Texts. In: E. Roche and Y. Schabes (Eds.): Finite State Devices for Natural Language Processing, MIT Press, 1996.
[insiders 00] mindaccess Overview and Concepts, Release.7, Technical Report, insiders Wissensbasierte Systeme GmbH.
[Jenkins et al. 1999] Jenkins, C.; Jackson, M.; Burden, P.; Wallis, J.: Automatic RDF Metadata Generation for Resource Discovery. Computer Networks, 31, 1999.
[Junker et al. 1999] Junker, M.; Sintek, M.; Rinck, M.: Learning for Text Categorization and Information Extraction with ILP. Proceedings of the Workshop on Learning Language in Logic, 1999.
[Lattner and Apitz 2002] Lattner, A. D.; Apitz, R.: A Metadata Generation Framework for Heterogeneous Information Sources. Proceedings of the 2nd International Conference on Knowledge Management (I-KNOW '02), Graz, Austria, July 2002.
[Pierre 2001] Pierre, J. M.: On the Automated Classification of Web Sites. Linköping Electronic Articles in Computer and Information Science, Vol. 6, 2001.
[Sebastiani 2002] Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1), 2002.
[Soderland 1999] Soderland, S.: Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34(1-3):233-272, 1999.
[Stuckenschmidt and van Harmelen 2001] Stuckenschmidt, H.; van Harmelen, F.: Ontology-based Metadata Generation from Semi-Structured Information. Proceedings of the 1st International Conference on Knowledge Capture (K-CAP 2001), Morgan Kaufmann, 2001.
[Tönshoff et al. 2001] Tönshoff, H. K.; Apitz, R.; Lattner, A. D.; Schlieder, C.: KnowWork: An Approach to Co-ordinate Knowledge within Technical Sales, Design and Process Planning Departments. Proceedings of the 7th International Conference on Concurrent Enterprising, Bremen, Germany, 27-29 June 2001.
[Tork Roth and Schwarz 1997] Tork Roth, M.; Schwarz, P.: Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Sources. In: Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.
[Yang and Liu 1999] Yang, Y.; Liu, X.: A Re-examination of Text Categorization Methods. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), 1999.
ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: hasni.neji63@laposte.net;
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More informationReport on the Dagstuhl Seminar Data Quality on the Web
Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,
More informationSecure Semantic Web Service Using SAML
Secure Semantic Web Service Using SAML JOO-YOUNG LEE and KI-YOUNG MOON Information Security Department Electronics and Telecommunications Research Institute 161 Gajeong-dong, Yuseong-gu, Daejeon KOREA
More informationQuery Recommendation employing Query Logs in Search Optimization
1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish
More informationComparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances
Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter University of Sunderland, School of Computing and
More informationSemantic Concept Based Retrieval of Software Bug Report with Feedback
Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop
More informationElectronic Document Management Using Inverted Files System
EPJ Web of Conferences 68, 0 00 04 (2014) DOI: 10.1051/ epjconf/ 20146800004 C Owned by the authors, published by EDP Sciences, 2014 Electronic Document Management Using Inverted Files System Derwin Suhartono,
More informationIntelligent interoperable application for employment exchange system using ontology
1 Webology, Volume 10, Number 2, December, 2013 Home Table of Contents Titles & Subject Index Authors Index Intelligent interoperable application for employment exchange system using ontology Kavidha Ayechetty
More informationThe Semantic Desktop - a Basis for Personal Knowledge Management
Proceedings of I-KNOW 05 Graz, Austria, June 29 - July 1, 2005 The Semantic Desktop - a Basis for Personal Knowledge Management Leo Sauermann Knowledge Management Department, DFKI GmbH, Erwin-Schrödinger-Straße
More informationSupporting Change-Aware Semantic Web Services
Supporting Change-Aware Semantic Web Services Annika Hinze Department of Computer Science, University of Waikato, New Zealand a.hinze@cs.waikato.ac.nz Abstract. The Semantic Web is not only evolving into
More informationAutomatic Indexing of Scanned Documents - a Layout-based Approach
Automatic Indexing of Scanned Documents - a Layout-based Approach Daniel Esser a,danielschuster a, Klemens Muthmann a, Michael Berger b, Alexander Schill a a TU Dresden, Computer Networks Group, 01062
More informationUIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications
UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications Gaël de Chalendar CEA LIST F-92265 Fontenay aux Roses Gael.de-Chalendar@cea.fr 1 Introduction The main data sources
More informationData Warehousing and OLAP Technology for Knowledge Discovery
542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
More informationAcquisition of User Profile for Domain Specific Personalized Access 1
Acquisition of User Profile for Domain Specific Personalized Access 1 Plaban Kumar Bhowmick, Samiran Sarkar, Sudeshna Sarkar, Anupam Basu Department of Computer Science & Engineering, Indian Institute
More informationOntology and automatic code generation on modeling and simulation
Ontology and automatic code generation on modeling and simulation Youcef Gheraibia Computing Department University Md Messadia Souk Ahras, 41000, Algeria youcef.gheraibia@gmail.com Abdelhabib Bourouis
More informationA Semantic Web of Know-How: Linked Data for Community-Centric Tasks
A Semantic Web of Know-How: Linked Data for Community-Centric Tasks Paolo Pareti Edinburgh University p.pareti@sms.ed.ac.uk Ewan Klein Edinburgh University ewan@inf.ed.ac.uk Adam Barker University of St
More informationChapter 10 Practical Database Design Methodology and Use of UML Diagrams
Chapter 10 Practical Database Design Methodology and Use of UML Diagrams Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Outline The Role of Information Systems in
More informationOntological Model of Educational Programs in Computer Science (Bachelor and Master Degrees)
Ontological Model of Educational Programs in Computer Science (Bachelor and Master Degrees) Sharipbay A., Razakhova B., Bekmanova G., Omarbekova A., Khassenov Ye., and Turebayeva R. Abstract In this work
More informationIntelligent Human Machine Interface Design for Advanced Product Life Cycle Management Systems
Intelligent Human Machine Interface Design for Advanced Product Life Cycle Management Systems Zeeshan Ahmed Vienna University of Technology Getreidemarkt 9/307, 1060 Vienna Austria Email: zeeshan.ahmed@tuwien.ac.at
More informationA GENERALIZED APPROACH TO CONTENT CREATION USING KNOWLEDGE BASE SYSTEMS
A GENERALIZED APPROACH TO CONTENT CREATION USING KNOWLEDGE BASE SYSTEMS By K S Chudamani and H C Nagarathna JRD Tata Memorial Library IISc, Bangalore-12 ABSTRACT: Library and information Institutions and
More informationSEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
More informationBuilding a Spanish MMTx by using Automatic Translation and Biomedical Ontologies
Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies Francisco Carrero 1, José Carlos Cortizo 1,2, José María Gómez 3 1 Universidad Europea de Madrid, C/Tajo s/n, Villaviciosa
More informationBlog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
More informationSmartLink: a Web-based editor and search environment for Linked Services
SmartLink: a Web-based editor and search environment for Linked Services Stefan Dietze, Hong Qing Yu, Carlos Pedrinaci, Dong Liu, John Domingue Knowledge Media Institute, The Open University, MK7 6AA,
More informationSemantic Information Retrieval from Distributed Heterogeneous Data Sources
Semantic Information Retrieval from Distributed Heterogeneous Sources K. Munir, M. Odeh, R. McClatchey, S. Khan, I. Habib CCS Research Centre, University of West of England, Frenchay, Bristol, UK Email
More informationAn Ontology-Based Knowledge Management Platform
An Ontology-Based Knowledge Management Platform A.Aldea 2, R.Bañares-Alcántara 1, J.Bocio 1, J.Gramajo 2, D.Isern 2, A.Kokossis 3, L.Jiménez 1, A.Moreno 2, D.Riaño 2 1 Universitat Rovira i Virgili, Dept.
More informationAutomatic Annotation Wrapper Generation and Mining Web Database Search Result
Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India
More informationMULTI AGENT-BASED DISTRIBUTED DATA MINING
MULTI AGENT-BASED DISTRIBUTED DATA MINING REECHA B. PRAJAPATI 1, SUMITRA MENARIA 2 Department of Computer Science and Engineering, Parul Institute of Technology, Gujarat Technology University Abstract:
More informationWeb Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
More informationFIPA agent based network distributed control system
FIPA agent based network distributed control system V.Gyurjyan, D. Abbott, G. Heyes, E. Jastrzembski, C. Timmer, E. Wolin TJNAF, Newport News, VA 23606, USA A control system with the capabilities to combine
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationSemantics and Ontology of Logistic Cloud Services*
Semantics and Ontology of Logistic Cloud s* Dr. Sudhir Agarwal Karlsruhe Institute of Technology (KIT), Germany * Joint work with Julia Hoxha, Andreas Scheuermann, Jörg Leukel Usage Tasks Query Execution
More informationHow To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
More informationInverted files and dynamic signature files for optimisation of Web directories
s and dynamic signature files for optimisation of Web directories Fidel Cacheda, Angel Viña Department of Information and Communication Technologies Facultad de Informática, University of A Coruña Campus
More informationLDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,
More informationA Tool for Searching the Semantic Web for Supplies Matching Demands
A Tool for Searching the Semantic Web for Supplies Matching Demands Zuzana Halanová, Pavol Návrat, Viera Rozinajová Abstract: We propose a model of searching semantic web that allows incorporating data
More informationA HUMAN RESOURCE ONTOLOGY FOR RECRUITMENT PROCESS
A HUMAN RESOURCE ONTOLOGY FOR RECRUITMENT PROCESS Ionela MANIU Lucian Blaga University Sibiu, Romania Faculty of Sciences mocanionela@yahoo.com George MANIU Spiru Haret University Bucharest, Romania Faculty
More informationKnowledge-Based Validation, Aggregation and Visualization of Meta-data: Analyzing a Web-Based Information System
Knowledge-Based Validation, Aggregation and Visualization of Meta-data: Analyzing a Web-Based Information System Heiner Stuckenschmidt 1 and Frank van Harmelen 2,3 1 Center for Computing Technologies,
More informationAnnotea and Semantic Web Supported Collaboration
Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve
More informationUsing Ontology Search in the Design of Class Diagram from Business Process Model
Using Ontology Search in the Design of Class Diagram from Business Process Model Wararat Rungworawut, and Twittie Senivongse Abstract Business process model describes process flow of a business and can
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More information