Hybrid Similarity Measure for XML Data Integration and Transformation


Thesis for the Degree of Doctor of Philosophy
Hybrid Similarity Measure for XML Data Integration and Transformation
Pham Thi Thu Thuy
Department of Computer Engineering
Graduate School
Kyung Hee University
Seoul, Korea
August, 2012


Hybrid Similarity Measure for XML Data Integration and Transformation
by Pham Thi Thu Thuy
Advised by
Professor Young-Koo Lee
Professor Sungyoung Lee
Submitted to the Department of Computer Engineering and the Faculty of the Graduate School of Kyung Hee University in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Dissertation Committee:
Professor Byeong-soo Jeong, Ph.D.
Professor Brian J. d'Auriol, Ph.D.
Professor Jin-Ho Kim, Ph.D.
Professor Donghai Guan, Ph.D.
Professor Young-Koo Lee, Ph.D.


Hybrid Similarity Measure for XML Data Integration and Transformation
by Pham Thi Thu Thuy
Submitted to the Department of Computer Engineering on July 8, 2012, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

XML (Extensible Markup Language) is widely used as a standard for sharing data between web-based applications. To share XML data with another XML application system, the various XML data sources must first be integrated into a coherent XML data set. To share XML data with a system that supports semantics, such as one based on the Web Ontology Language (OWL), the XML data must also be transformed into an OWL ontology.

However, XML data are heterogeneous. The same information can be published in many different ways in terms of tag names and structures, and the same tag name can represent different content. As a result, the sharing of XML data is not yet fully automatic. This heterogeneity has motivated research on measuring the similarity of elements between XML schemas and of elements within a single schema. The similarity measure for XML schemas therefore plays a crucial role in both integration and transformation.

In this thesis, we address data transformation and integration for XML data sources. This data format raises several challenges that require XML-specific solutions:
- An XML document is not required to have a schema.
- When a schema exists, it may be expressed in one of several schema languages, such as XML Schema (XSD) or Document Type Definition (DTD).
- Resolving schema heterogeneity is not straightforward because of the hierarchical nature of XML data.

We propose an approach based on a hybrid similarity measure that handles the distinct problems of syntactic, semantic, and schematic heterogeneity of XML data.
Our similarity measure addresses both structural and semantic components and can be applied to both XML schema types. Integration and transformation of XML data have different targets, so we propose two types of similarity measure:
- similarity of elements between two schemas, for data integration;
- similarity of elements within a schema, for data transformation.

Accordingly, the thesis is divided into two main parts, both aimed at improving the sharing of XML data.

The first part focuses on the similarity measure between schemas for data integration. We propose a novel similarity measure that considers the structural and semantic information of two XML Schemas concurrently. Specifically, we introduce new metrics for data type and cardinality constraint similarities, which improve the quality of current semantic assessments. Based on the similarity between element pairs, we put forward an algorithm that calculates the similarity between two XML Schema trees. Building on this measure, we propose an integration method that merges two or more disparate XML data sources into a single coherent data set to support the information needs of the target business or enterprise. Experimental results show that our method yields better similarity values than the compared approaches with regard to the accuracy of semantic and structural similarity.

The second part concerns the similarity of duplicate elements within a schema and the transformation of XML Schema into OWL (Web Ontology Language). This part is itself divided into two sub-parts.

The first sub-part addresses duplicate elements in XML Schema. Recent studies on transforming XML Schema into OWL solve the associated duplicates problem by creating a unique identifier for each element. However, this solution treats duplicate elements as different nodes, whereas most duplicates represent the same information. We present a novel method to measure the semantic similarity between duplicate elements within an XML Schema. This semantic similarity combines declaration and context features, which capture the descriptions and relationships of the duplicate elements. Based on the similarity values, we classify the duplicates into two groups, similar and non-similar, and then propose a suitable strategy for transforming each group into appropriate OWL concepts.
In the second sub-part, we present a mechanism, S-Trans, that eases interpretation and automates the semantic transformation of XML data into an OWL ontology. This enables easier and better semantic communication among information systems. Starting from the XML schemas (XSD or DTD), we extract the document structure and add further descriptions for the XML elements. Experimental results show that the proposed method reliably predicts the semantic similarity of duplicates and produces a higher-quality OWL ontology.

Thesis Supervisor: Young-Koo Lee
Title: Professor
Thesis Co-Supervisor: Sungyoung Lee
Title: Professor

Acknowledgments

Countless people have supported, directed, assisted, and encouraged me in completing this PhD, and I would like to thank them.

First of all, I would like to express my deepest gratitude to my supervisor, Professor Young-Koo Lee, for supervising my work and for always having the right suggestions in every discussion we had. He led me to the research area of semantic similarity measures for data integration and offered many insightful suggestions on which I developed and completed my dissertation. I would like to thank Kyung Hee University and the IITA scholarship for giving me the opportunity, and providing the funds, to carry out this PhD. I am also grateful to my co-supervisor, Professor Sungyoung Lee, who guided me in the right direction with his inquisitive questions and helpful comments. I would also like to thank Professor Brian J. d'Auriol for his advice on my presentation and visualization skills. That advice has stayed with me until now and will continue to guide my future research career.

I would like to thank a number of professors in the Department of Computer Engineering for their excellent lectures: Professor Tae-Choong Chung, Professor Ok-Sam Chae, Professor Byeong-Soo Jeong, Professor Choong-Seon Hong, and Professor Eui-Nam Huh. Their wisdom greatly helped to consolidate and widen my knowledge of computer engineering, which is also an important background for my dissertation.

My thanks also go to the many colleagues who helped and encouraged me during my stay in Korea. They include Prof. Dr. Donghai Guan, Dr. Phan Tran Ho Truc, Dr. Dang Viet Hung, senior Le Tuan Anh, senior Nguyen Hoang Viet, senior Vo Thi Luu Phuong, and the couple Nguyen Van Mui and Tran Thi Kim Loc. They also include my Korean friends Yongkoo Han, Jinseung Kim, and Kisung Park, my lab-mates La The Vinh, Dinh Dong Luong, Pham The Anh, and Iram Fatima, and many others who shared their knowledge and technical expertise with me. Without their suggestions, my life and my research would have been much harder. I would also like to express my deep thanks to the dissertation committee members, whose helpful comments helped me to improve and complete this dissertation.

Last but not least, my thanks go to my family, to whom I dedicate this achievement. My parents and my parents-in-law, with love and understanding, have always pushed me to pursue a PhD. I send my love to my sweet husband and my two lovely daughters, Bin and Su, who always stayed beside me during a somewhat stressful time. Thanks to you all.

Contents

Table of Contents
List of Figures
List of Tables
1 Introduction
    Introduction
    XML data sharing scenario
    Motivation and contributions
    Thesis Organization
2 Background and Related Work
    Background on XML Data and OWL Ontology
        XML data
        Ontology
        OWL fundamental constructs
        Term definitions
    Related Work
        Similarity between documents and XML integration
        Duplicate similarity and XML schema transformation
3 ESim: Element Similarity measure for XML integration
    Similarity Measure Framework
    Semantic Similarity Measurement (SeSim)
        Name similarity (NSim)
        Data type similarity
        Constraint similarity
    Structural Similarity Measurement (StSim)
        Ancestor similarity
        Sibling similarity
        Children similarity
    Similarity between Two Schema Trees
    XML Schema Integration
4 S-Trans: Duplicate Similarity Measure for XML2OWL
    General modules of XML2OWL Transformation
    Semantic Similarity of Duplicate Elements
        Motivating example
        Ancestor similarity (ASim)
        Sibling similarity (SbSim)
        Children similarity (ChSim)
    Transforming DTD/XSD into the OWL Ontology
5 Experimental Results
    Experiments on XML Schemas Similarity Measure
        Determining of parameter values
        Results based on real-world XSDs
    Experiments on Duplicate Similarity in XML Transformation
        Experimental setup
        Results
    Experimental Summarization
6 Conclusion and Future Researches
    Conclusion
        Thesis summary
        Contributions
    Future Researches
Appendix A: ESim - Evaluation Results
Appendix B: Sample of XML Schema for Transformation
Appendix C: OWL Ontology Result
References

List of Figures

1.1 Semantic Web stack architecture
1.2 Different solutions to integration and transformation of XML data
1.3 Thesis organization
2.1 Example of an XML document and its respective DTD
2.2 Example of the respective XSD of document in Figure 2.1
OWL root classes
OWL subclass definition
OWL class individual
OWL Datatype property definition
OWL Class instance with datatype property
General framework of similarity measure method
Tree representation for Schema Patient A
Tree representation for Schema Patient B
Expressions for Schema Patient A
A fragment of WordNet
The structure similarity algorithm
XML Schema integration framework architecture
General syntactic to semantic architecture
Example of a DTD document, prescription.dtd
4.3 Example of a DTD and a part of its corresponding XSD document
The corresponding tree of XML schema (XSD/DTD) in Figure
The ancestor similarities at different ancestor levels with five candidate values
The ancestor similarity algorithm
Transforming framework from XML into OWL
The transforming correspondences between DTD/XSD and OWL
OWL results of duplicates which are highly similar
OWL results of duplicates which are less similar
Tree representation for Schema Patient C
Tree representation for Schema Patient D
Determining weights of ESim function
Determining weights of SeSim function
Matching comparisons of ESim to COMA, XMLSim, and XClust
Quality of name measure
Quality of data type measure
Quality of cardinality constraint measure
Quality of structure measure
F measure comparison
The error rate of classification at different thresholds
Evaluation results, drug medicament schema
Evaluation results, patient admission schema
Evaluation results, healthcaremetadata schema
Evaluation results, pathology.report schema
Quality of S-Trans, PrSim, ChSim, and CaSim
A.1 Evaluation results of matching system for schemas in Table

List of Tables

3.1 Data type compatibility table
Cardinality constraint similarity table
The similarity of synthetic XSDs
The characteristics of the tested schemas
Element similarity result of the two schemas (Patient A and Patient B)
The characteristics of the tested schemas

Chapter 1
Introduction

1.1 Introduction

Many web applications and services now publish their data in XML, the standard format for data sharing, because a common data representation format makes it easier to share data with other applications and services. To share XML data among applications that use the same XML vocabulary, the XML data sources are usually integrated into a coherent data set that supports the information needs of the target applications. To share XML data with semantics-supporting systems that use OWL, the XML data are transformed into a target OWL ontology.

However, XML data are heterogeneous. The same information can be published in many different ways in terms of tag names and structures, and the same tag name can represent different content. Consequently, the exchange of XML data is not fully automatic. To address this heterogeneity, many studies have proposed similarity measures that compute the similarity of heterogeneous XML data before integrating or transforming them. Algorithms that automate these similarity computations reduce the time and effort spent on creating and maintaining data sharing in many applications [90]. Examples include e-business [12], [100], [93], e-government [70], [73], [48], [25], [33], e-learning [18], [11], [98], and e-health [95], [88], [68].

Figure 1.1: Semantic Web stack architecture (layers from syntax to higher semantics: Unicode and URI; XML; XML Schema; RDF/RDF Schema; OWL; inference engine; reasoning/proof; trust; with security/identity alongside, and the Pragmatic Web and intelligent domain services and applications on top)

To illustrate the importance of XML data integration, consider an e-health system. Such a system holds a variety of XML healthcare data. These data are collected from a large number of environmental and patient sensors, and from actuators that monitor and improve patients' physical and mental conditions [86]. The volume of XML healthcare data keeps growing, so healthcare providers need to integrate these data to maintain them as electronic health records (EHRs) [32]. The integration of XML healthcare data therefore plays an important role in improving the quality of patient care and the exchange of information among medical systems.

In general, heterogeneous XML sources may have similar content yet be described using different tag names and structures.

Figure 1.2: Different solutions to integration and transformation of XML data (XML data sharing through schema matching/mapping, covering schema integration (integrate DTDs, integrate XSDs) and schema transformation (XML2RDF, XML2OWL), both supported by similarity measures between documents and within a document)

Integrating similar XML documents from different data sources benefits applications that use the same XML language. These applications gain access to more complete and useful information, and query systems can retrieve information from a single integrated source instead of many sources.

The Semantic Web has also been developed recently and is widely used by semantic applications. This creates a need to share existing XML data with those applications. However, XML is weak at semantic interoperability because it focuses primarily on syntax and has no way to describe the semantics of the data [34]. This lack of semantic description causes problems when semantic agents try to understand and reason about XML data. Therefore, to share XML data with semantics-supporting systems, XML data must be mapped or transformed into a semantics-supporting language.

In this thesis, we choose OWL as the target language for the transformation, since OWL occupies a higher semantic layer of the Semantic Web stack architecture [36]. Duplicate elements in heterogeneous XML data may represent either different information or the same information. To improve the semantics of the transformation, we therefore propose a pre-processing step that computes the semantic similarity of XML elements, specifically duplicate elements, before the transformation.

In general, this thesis tackles two kinds of XML data sharing: between applications that use the same XML language, and between XML applications and semantics-supporting applications. In particular, we have developed an approach to the integration and transformation of heterogeneous XML data sources. The approach is based on similarity measures, and its output is a set of similarity scores:
- in the integration scenario, scores for elements between XML schema documents;
- in the transformation scenario, scores for duplicate elements within an XML schema.

Figure 1.2 gives an overview of the different solutions for improving data sharing and shows the focus of our research.

The rest of this chapter is organized as follows. Section 1.2 introduces the different scenarios in the broad area of data sharing. Section 1.3 presents the motivation and contributions of the work described in this thesis. Section 1.4 gives an overview of the thesis organization.

1.2 XML data sharing scenario

The sharing of XML data across applications and services may involve several scenarios, including XML schema integration and XML schema transformation. All of these scenarios share a similarity-measurement step: similarity between documents for integration, and similarity within a document for transformation. The major scenarios and processes in schema integration and transformation are introduced below.

XML schema integration is an XML data sharing scenario in which XML data from multiple sources are combined to give users a single integrated source. This task may retain all of the original logical structures and tag names of the source XML schemas (XSD or DTD), because it generates a union, or global, XML schema that combines the data sources in more complex ways.

XML schema transformation is an XML data sharing scenario in which one defines rules for transforming a source XML schema S1 and its associated XML instances DS1 into the structure of a target schema S2. The target schema S2 is defined in a modeling language different from that of S1. The purpose is query processing or materializing S2 from the data DS1. XML data exchange is a stricter form of XML data transformation that also respects the constraints defined within the target XML schema, not just its structure.

Element similarity between XML schemas is the automatic or semi-automatic process of determining similarity scores between the elements of an XML schema S1 and those of another XML schema S2. The next step is classification, in which highly similar element pairs are combined into an integrated source. The choice of the classification value is discussed in the experimental results (Chapter 5).

Similarity of elements within an XML schema is the automatic or semi-automatic process of determining similarity scores between elements within a single schema S1. The results can then be used to transform data from the data source of S1 into S2. In this thesis, we compute the similarity values of duplicate elements in an XML Schema and then classify them into a similar or a non-similar group for the transformation.
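To make the two similarity settings concrete, here is a minimal Python sketch of their general shape. One function scores every element pair across two schemas (the integration setting). Another scores duplicate elements within one schema (the transformation setting). A threshold then splits the scores into similar and non-similar groups. The scoring function `name_sim` is a hypothetical placeholder built on difflib's string ratio, not the hybrid measure proposed in this thesis, and all function names are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations, product


def name_sim(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]: difflib's matching-characters ratio on
    lower-cased strings. This is NOT the thesis's name-similarity metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def between_schemas(elems1, elems2):
    """Integration setting: score every element of S1 against every element of S2."""
    return {(a, b): name_sim(a, b) for a, b in product(elems1, elems2)}


def duplicates_within(paths):
    """Transformation setting: 'paths' are root-to-element paths of ONE schema,
    e.g. 'Patient/Address'. Tags occurring under more than one path are
    duplicates; each duplicate pair is scored by comparing the parent paths,
    a crude stand-in for context similarity."""
    by_tag = defaultdict(list)
    for p in paths:
        by_tag[p.rsplit("/", 1)[-1]].append(p)
    scores = {}
    for same_tag in by_tag.values():
        for p, q in combinations(same_tag, 2):
            scores[(p, q)] = name_sim(p.rsplit("/", 1)[0], q.rsplit("/", 1)[0])
    return scores


def classify(scores, threshold):
    """Split scored pairs into (similar, non_similar) at the given threshold."""
    similar = {pair for pair, s in scores.items() if s >= threshold}
    return similar, set(scores) - similar
```

For example, `between_schemas(["Name", "Address"], ["Name", "Phone"])` yields four scored pairs, with `("Name", "Name")` scored 1.0. `duplicates_within(["Patient/Address", "Doctor/Address", "Patient/Name"])` reports only the two `Address` paths as a duplicate pair.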

1.3 Motivation and contributions

The above overview raises a number of research questions about XML data sharing, which motivate our research:
- How can we improve data sharing between applications that use the same XML system, or between an XML system and a language with richer semantic support, such as OWL?
- Different XML data sources may be associated with different XML schema types, or may have no schema at all. Can a single transformation or integration approach cover all types of XML data sources?
- How can we solve the heterogeneity problem of XML data during integration or transformation?
- Which aspects of XML data transformation and integration can be automated? Are they clearly distinguishable from the manual aspects? Can we minimize the manual aspects?
- XML data sources may be structurally incompatible, which may lead to loss of information when transforming or integrating them. How can this problem be solved automatically?
- Have existing approaches performed the integration or transformation of XML data? If so, what problems remain to be resolved?

With these research questions as a starting point, this thesis proposes a similarity-measure-based approach to the integration and transformation of heterogeneous XML data sources and makes the following contributions:
1. We propose an integration method based on a similarity measure to improve data sharing between applications that use the same XML language. For sharing data with applications that require richer semantics, we propose a method for transforming XML into an OWL ontology that takes the similarity of duplicates in the XML schema into account.
2. Our approach can be applied to any type of XML data source, regardless of the schema type used (XSD or DTD).
3. We propose a hybrid similarity measure to compute both the semantic and the structural similarity of XML elements.
4. We automate the similarity measurement process for data integration and transformation by providing metrics to compute all similarity factors, so users do not need to supply any similarity values. Our proposed metrics generate more precise similarity values than manually assigned ones. We also minimize manual effort by proposing a method to determine the weights that balance the roles of the similarity factors.
5. To avoid loss of information during integration, our integrator takes the union of all elements in the XML schemas instead of retaining only the common elements. During transformation, we follow the structural descriptions of the XML schemas to transform all elements, and their relationships with other elements, into appropriate OWL concepts.
6. Several approaches have been proposed to integrate and transform XML data, and our methods improve on them in two respects. First, in most related integration approaches, the data type, cardinality constraint, and weight parameter values are given manually, whereas we provide novel metrics to determine those values. Second, most existing transformation methods solve the duplicate problem of XML data by simply giving each XML element a unique identifier. This may cause data redundancy when duplicates represent the same information. We resolve the problem by measuring duplicate similarity and giving an appropriate strategy for transforming duplicates.
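Contribution 5 contrasts a union-based integrator with one that keeps only common elements. The difference can be sketched in a few lines of Python. The sketch assumes a hypothetical `matches` dictionary produced by some earlier similarity step, and flat element names stand in for full schema subtrees. Keeping the S1 name for a matched pair is an arbitrary choice for this illustration, not necessarily the integrator described in Chapter 3.

```python
def integrate_union(elems1, elems2, matches):
    """Merge two schemas' element lists. 'matches' maps an S1 element to the S2
    element judged similar to it (e.g. by a similarity threshold). Each matched
    pair is kept once, under its S1 name, and every unmatched element of
    either schema is carried over, so no element is lost."""
    matched_in_s2 = set(matches.values())
    merged = list(elems1)
    merged += [e for e in elems2 if e not in matched_in_s2]
    return merged


def integrate_common(elems1, matches):
    """Intersection-style alternative: only matched elements survive, so
    anything unique to one schema is dropped."""
    return [e for e in elems1 if e in matches]
```

Take `s1 = ["Name", "Address", "Phone"]`, `s2 = ["PatientName", "Addr", "Email"]`, and matches `{"Name": "PatientName", "Address": "Addr"}`. The union keeps `Phone` and `Email`, giving `["Name", "Address", "Phone", "Email"]`. The common-elements variant returns only `["Name", "Address"]`.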

With respect to existing approaches to XML schema transformation and integration, our approach makes the following contributions:
1. We propose a new metric to measure the data type similarity between two attribute types, whereas related work gives the data type similarity value manually.
2. We present a novel metric to measure the similarity of cardinality constraints, which related work also leaves for the user to give manually.
3. Two nodes may have the same structure but different names. To handle this case, we compute the structural similarity of two concepts from their semantic similarity and from the similarity of each pair of their neighboring elements.
4. We present an algorithm to calculate the similarity between two schema trees based on the similarity values of the element pairs.
5. We propose a method to determine the weight parameters used to balance the roles of the similarity factors.
6. We identify the semantic problem that arises when duplicate elements in an XML schema are transformed into an ontology.
7. We propose a method to measure the semantic similarity between repeated elements that considers both the similarity of their relationships and the internal descriptions of each duplicate node.
8. We propose a method to formally determine the classification value for duplicates.
9. We propose a strategy to transform XML schemas and their duplicates into an ontology.
10. Finally, our approach reduces human intervention during integration and avoids data redundancy in the transformation of XML data.

Experimental results show that our method outperforms the related work in terms of semantics and accuracy.
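Contributions 5 and 8 concern choosing weights and a classification value instead of asking the user for them. Both steps can be framed simply: combine factor scores with a normalized weighted average, then pick, from a list of candidate thresholds, the one with the lowest error rate on manually labelled pairs. The Python sketch below illustrates that framing only; it is not the method developed in later chapters. The factor names, labels, and candidate values are hypothetical.

```python
def combine(factors, weights):
    """Weighted average of similarity factors (e.g. semantic and structural).
    Dividing by the weight total keeps the result in [0, 1] whenever every
    factor is in [0, 1]."""
    total = sum(weights[name] for name in factors)
    return sum(weights[name] * value for name, value in factors.items()) / total


def error_rate(labelled, threshold):
    """Fraction of (score, truly_similar) pairs misclassified at 'threshold'."""
    wrong = sum((score >= threshold) != truth for score, truth in labelled)
    return wrong / len(labelled)


def best_threshold(labelled, candidates):
    """Pick the candidate threshold with the lowest error rate; ties go to the
    earliest candidate because min() returns the first minimum it finds."""
    return min(candidates, key=lambda t: error_rate(labelled, t))
```

With equal weights, semantic 0.8 and structural 0.6 combine to about 0.7. For labelled scores `[(0.9, True), (0.8, True), (0.6, True), (0.4, False), (0.3, False)]`, the candidate 0.5 classifies every pair correctly and is selected over 0.2, 0.7, and 0.95.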

1.4 Thesis Organization

This section gives the road map for the entire thesis. Figure 1.3 shows the thesis organization, and each chapter is briefly summarized below.

Chapter 1 Introduction. This chapter briefly introduces the popularity of XML data and gives an example of XML in an e-health system. It addresses two challenges: the flexibility XML allows in creating new documents, and XML's lack of semantic support. It then states the focus and contributions of the dissertation.

Chapter 2 Background and Related Work. This chapter has two sections. First, we review background knowledge on XML data and OWL ontologies. Second, we survey existing work related to two problems: measuring the similarity between XML Schema documents, and transforming XML into OWL ontologies. The state of the art and the limitations of existing work are discussed.

Chapter 3 Semantic and Structural Similarity between XML Schemas. This chapter describes in detail the proposed solution to measuring semantic and structural similarity.

Chapter 4 Duplicate and Transforming XML schemas into OWL ontology. This chapter details the semantic similarity measure for duplicate elements in XML schemas and proposes a solution for each similarity level. It then describes how all XML schema elements are transformed into an OWL ontology.

Chapter 5 Experimental results and discussions. Comprehensive experiments are conducted, and the results are analyzed to highlight the advantages of the proposed algorithms.

Chapter 6 Conclusion and future work. This chapter gives the conclusion. It also points out some limitations of the work and suggests potential solutions that may require further research.

Figure 1.3: Thesis organization. (Chap. 1: Introduction, motivations of the proposed similarity-based integration and transformation of XML data; Chap. 2: Related Work, overview of XML and ontology, XML integration and similarity between documents (structure-based, semantics-based, and hybrid approaches), XML transformation and similarity within a document (XML2OWL, element similarity within a single document); Chap. 3: XML similarity measure for data integration, a complete hybrid similarity framework, novel data type and constraint similarity metrics, and a method to balance similarity factors; Chap. 4: Duplicate similarity measure for transforming XML into OWL, duplicate similarity metrics, classification value, and transformation strategy; Chap. 5: Experiments and Discussions; Chap. 6: Conclusion and Future Research, covering the similarity between different data models, the matching of different data models, and the similarity between Web pages.)

Chapter 2
Background and Related Work

XML data and ontologies are the two main objects of this dissertation, so this chapter first gives a brief introduction to their characteristics and technologies. We then discuss research related to our work.

2.1 Background on XML Data and OWL Ontology

XML data

XML (Extensible Markup Language) is a flexible representation language. There are two varieties of XML data: XML documents and XML schemas. An XML schema provides the data definitions and structure of XML documents [65]. XML documents are instances of an XML schema, each giving a snapshot of what a conforming document may contain. A schema specifies, among other things, which elements are allowed, which attributes each element may have, and how many times each element may occur. A schema can be declared internally, within the XML document itself, or externally, in a separate file referenced by the document.
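Because an XML document is a tree, the hierarchical relationships of each element (its ancestors, children, and siblings) can be read directly off an instance. The sketch below uses Python's standard `xml.etree.ElementTree` module to walk the Figure 2.1 instance and list, for each element, its root-to-element path and its child tags. The instance is copied without its DOCTYPE because ElementTree does not validate against a DTD. `element_paths` is a hypothetical helper, not part of the thesis's tooling.

```python
import xml.etree.ElementTree as ET

# Instance document from Figure 2.1 (DOCTYPE omitted: ElementTree does not validate).
DOC = """<Companies>
  <Company>
    <Symbol>Eagle.img</Symbol>
    <Name>EagleFarm</Name>
    <Industry>Dairy</Industry>
    <Profile>
      <MarketCap>1000</MarketCap>
      <EmployeeNo>20</EmployeeNo>
      <Address><State>QLD</State></Address>
      <Description>gdsfkls</Description>
    </Profile>
  </Company>
</Companies>"""


def element_paths(elem, prefix=""):
    """Yield (root-to-element path, list of child tags) for every element,
    visiting the tree depth first in document order."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    yield path, [child.tag for child in elem]
    for child in elem:
        yield from element_paths(child, path)


if __name__ == "__main__":
    for path, children in element_paths(ET.fromstring(DOC)):
        print(path, children)
```

Each path gives the ancestors of an element, and elements listed under the same parent are siblings. A schema describes exactly this structure for every conforming document.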

There are several XML schema languages, but only two are commonly used: DTD (Document Type Definition) and XML Schema, also called XML Schema Definition (XSD). Both describe how XML documents are constructed and constrain their contents [79].

A DTD specifies the structure of an XML element by naming its subelements and attributes. Subelement structure is specified with operators:
- * (zero or more occurrences)
- + (one or more occurrences)
- ? (optional)
- | (choice, "or")

A DTD also uses a small set of types such as #PCDATA, ID, IDREF, and enumerations. Compared with XSD, the DTD language is limited: it supports only a small set of data types, has loose structural constraints, and uses a syntax different from XML. XSD overcomes these limitations with features such as simple and complex types, a rich set of data types, occurrence constraints, and, notably, the same syntax as XML. An XML Schema usually consists of a set of schema components, such as data type definitions and cardinality constraint declarations. These components can be used to check the validity of well-formed element information items. XSD is expected to replace DTD because of its flexibility [41].

Throughout this thesis, we use the term "XML schema" for both DTD and XSD, while "XML Schema" refers to XSD only. Figure 2.1 shows a simple example of an XML document and its corresponding DTD, and Figure 2.2 shows the corresponding XML Schema.

Ontology

In computer science, an ontology is an explicit specification of a conceptualization [31]. That is, an ontology is a model that describes the concepts of a problem domain and the relationships between those concepts. An ontology can serve as an interface to one or more data sources, which means it can be used as a schema, or it can be used to reason about the problem domain.
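The DTD occurrence operators listed above correspond to XSD occurrence bounds in the standard way. No operator means exactly once, ? means 0..1, * means 0..unbounded, and + means 1..unbounded. The minimal Python sketch below maps the particles of a flat DTD sequence, such as the content models in Figure 2.1, to (name, minOccurs, maxOccurs) triples. It is only an illustration, not the thesis's DTD-to-XSD handling: `particle` and `sequence_model` are hypothetical names, `None` stands in for "unbounded", and choice groups (|) and operators applied to whole groups are out of scope.

```python
UNBOUNDED = None  # stands in for XSD maxOccurs="unbounded"

# DTD occurrence indicator -> (minOccurs, maxOccurs) in XSD terms.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, UNBOUNDED), "+": (1, UNBOUNDED)}


def particle(token):
    """Map one content particle such as 'Sector?' or '(Profile)' to
    (element name, minOccurs, maxOccurs)."""
    token = token.strip().strip("()")
    suffix = token[-1] if token and token[-1] in "?*+" else ""
    name = token[:-1] if suffix else token
    low, high = OCCURRENCE[suffix]
    return name.strip("()"), low, high


def sequence_model(model):
    """Split a flat sequence content model like '(State, City?)' into particles.
    Choice groups ('|') are rejected rather than silently mis-parsed."""
    model = model.strip()
    if "|" in model:
        raise ValueError("choice groups are not handled in this sketch")
    if model.startswith("(") and model.endswith(")"):
        model = model[1:-1]
    return [particle(tok) for tok in model.split(",")]
```

For instance, `sequence_model("(State, City?)")` returns `[("State", 1, 1), ("City", 0, 1)]`. These bounds match the `minOccurs="0"` on `City` in an XSD that corresponds to this DTD.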

<?xml version="1.0" encoding="UTF-8"?>
<Companies>
  <Company>
    <Symbol>Eagle.img</Symbol>
    <Name>EagleFarm</Name>
    <Industry>Dairy</Industry>
    <Profile>
      <MarketCap>1000</MarketCap>
      <EmployeeNo>20</EmployeeNo>
      <Address>
        <State>QLD</State>
      </Address>
      <Description>gdsfkls</Description>
    </Profile>
  </Company>
  <!-- Some more instances -->
</Companies>

<!DOCTYPE Companies [
  <!ELEMENT Companies (Company+)>
  <!ELEMENT Company (Symbol, Name, Sector?, Industry, (Profile))>
  <!ELEMENT Profile (MarketCap, EmployeeNo, (Address), Description)>
  <!ELEMENT Address (State, City?)>
  <!ELEMENT Symbol (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Sector (#PCDATA)>
  <!ELEMENT Industry (#PCDATA)>
  <!ELEMENT MarketCap (#PCDATA)>
  <!ELEMENT EmployeeNo (#PCDATA)>
  <!ELEMENT Description (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
]>
Figure 2.1: Example of an XML document and its respective DTD

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Companies">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Company" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Symbol" type="xsd:string"/>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="Sector" type="xsd:string"/>
              <xsd:element name="Industry" type="xsd:string"/>
              <xsd:element name="Profile">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="MarketCap" type="xsd:string"/>
                    <xsd:element name="EmployeeNumber" type="xsd:unsignedInt"/>
                    <xsd:element name="Address">
                      <xsd:complexType>
                        <xsd:sequence>
                          <xsd:element name="State" type="xsd:string"/>
                          <xsd:element name="City" type="xsd:string"/>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element name="Description" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Figure 2.2: Example of the XSD corresponding to the document in Figure 2.1

RDF (Resource Description Framework) [64] is a family of W3C specifications used primarily to describe information about a problem domain. Each RDF statement is a subject-predicate-object triple; therefore, a set of RDF statements forms a labeled, directed graph. RDF Schema is one of the W3C RDF specifications and allows RDF vocabularies to be defined. Note that RDF can also be used as a data format for exchanging and integrating data from different information systems.

OWL (Web Ontology Language) [37], like RDF Schema, is used to define ontologies. OWL is a Semantic Web language designed to represent richer and more complex knowledge about things, groups of things, and relations between things than RDF can. Because OWL is a logic-based language, knowledge expressed in OWL can be processed by computer programs, either to verify the consistency of that knowledge or to make implicit knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide Web and may refer to, or be referred to from, other OWL ontologies. The OWL language has three increasingly expressive sublanguages:

OWL Lite [59], [1] supports users who primarily need a classification hierarchy and simple constraint features. For example, cardinality constraints in OWL Lite only permit the values 0 or 1. OWL Lite thus provides a quick migration path for thesauri and other taxonomies.

OWL DL [71], [60] supports users who want maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations of the reasoning systems finish in finite time). OWL DL includes all OWL language constructs, but with restrictions such as type separation (for instance, a class cannot also be an individual or a property, and a property cannot also be an individual or a class). OWL DL is so named because of its correspondence with description logics, a field of research that has studied a particular decidable fragment of first-order logic. OWL DL was designed to support the existing description logic business segment and has desirable computational properties for reasoning systems.

OWL Full [44], [15] is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. Another significant difference from OWL DL is that an OWL Full datatype property may be declared inverse functional. OWL Full allows an ontology to augment the meaning of the predefined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support every feature of OWL Full.

Each of these sublanguages is an extension of its simpler predecessor, both in what can be legally expressed and in what can be validly concluded. The following relations hold:
Every legal OWL Lite ontology is a legal OWL DL ontology.
Every legal OWL DL ontology is a legal OWL Full ontology.
Every valid OWL Lite conclusion is a valid OWL DL conclusion.
Every valid OWL DL conclusion is a valid OWL Full conclusion.

OWL fundamental constructs

In this section, we present the fundamental elements of OWL: classes, properties, and individuals. In the examples of this section, each OWL construct is introduced with a unique rdf:ID.

OWL classes describe sets of individuals that share common properties and belong to the same group. Classes are the most basic OWL concept and form the roots of the various taxonomic trees. Every individual in an OWL document is a member of the class owl:Thing; thus, each user-defined class is implicitly a subclass of owl:Thing. Domain-specific root classes are defined simply by declaring a named class. OWL also defines the empty class, owl:Nothing. Figure 2.3 shows two declarations of root classes inside an OWL ontology. OWL classes are declared with the <owl:Class> element; the declarations in Figure 2.3 give only the unique IDs of the classes, without further detail.

<owl:Class rdf:ID="RedWine"/>
<owl:Class rdf:ID="Winery"/>
Figure 2.3: OWL root classes

A class can also be defined as the union, intersection, or complement of other classes using the constructs owl:unionOf, owl:intersectionOf, and owl:complementOf, respectively, or as an enumeration of its members using owl:oneOf. The fundamental taxonomic constructor for classes, however, is rdfs:subClassOf, which relates a more specific class to a more general one. If X is a subclass of Y, then every instance of X is also an instance of Y. The rdfs:subClassOf relation is also transitive: if X is a subclass of Y and Y is a subclass of Z, then X is a subclass of Z.

An OWL class can also be given further descriptions that extend its definition, for example by using rdf:about, as shown in Figure 2.4.

<owl:Class rdf:about="#RedWine">
  <rdfs:subClassOf rdf:resource="#Wine"/>
</owl:Class>
Figure 2.4: OWL subclass definition

Figure 2.4 shows how the class RedWine is derived from the general class Wine. The rdf:about construct is used because RedWine has already been declared; here we extend it by relating it to a more general class through the subclass mechanism, so that it inherits the properties and characteristics of Wine. Furthermore, two OWL classes may be declared equivalent or disjoint using owl:equivalentClass and owl:disjointWith, respectively.

OWL individuals are the instances of classes; see the example in Figure 2.5. Individuals are declared either by using the class name as the name of the element in which the individual is defined, or with the rdf:type construct. Individuals may have properties and must satisfy all the constraints predefined for their OWL class.

OWL properties state general facts about classes and specific facts about class individuals. There are two categories of properties: object properties and datatype properties.
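The two rdfs:subClassOf rules described above (instances of a subclass are instances of its superclass, and the relation is transitive), together with the fact that every individual belongs to owl:Thing, can be illustrated by treating an ontology as a set of RDF triples. The Python sketch below is a naive forward computation, not an OWL reasoner. The triple list, the helper names, and the class PotableLiquid are hypothetical; PotableLiquid is added only so that transitivity has something to show.

```python
from collections import defaultdict

# Hypothetical ontology fragment as RDF-style (subject, predicate, object) triples.
# RedWine, Wine and Syrah follow Figures 2.3 to 2.5; PotableLiquid is illustrative.
TRIPLES = [
    ("RedWine", "rdfs:subClassOf", "Wine"),
    ("Wine", "rdfs:subClassOf", "PotableLiquid"),
    ("Syrah", "rdf:type", "RedWine"),
]


def build_index(triples, predicate):
    """Group the objects of all triples with the given predicate by subject."""
    index = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            index[subject].add(obj)
    return index


def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitive closure)."""
    seen, stack = set(), list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen set also guards against cycles
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def inferred_types(individual, triples):
    """Asserted types, all of their superclasses, and owl:Thing."""
    subclass_of = build_index(triples, "rdfs:subClassOf")
    asserted = build_index(triples, "rdf:type").get(individual, set())
    types = set(asserted) | {"owl:Thing"}
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types


print(sorted(inferred_types("Syrah", TRIPLES)))
```

Syrah is asserted only as a RedWine, yet it is inferred to be a Wine, a PotableLiquid, and an owl:Thing.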

<RedWine rdf:ID="Syrah"/>

OR

<owl:Thing rdf:ID="Syrah"/>
<owl:Thing rdf:about="#Syrah">
  <rdf:type rdf:resource="#RedWine"/>
</owl:Thing>
Figure 2.5: OWL class individual

Object properties are relations between the instances of two classes. An object property is declared with the owl:ObjectProperty construct, which connects individuals of the domain class with individuals of the range class. Datatype properties are relations between class instances and RDF literals or XML Schema data types. A datatype property is declared with the owl:DatatypeProperty construct, which relates individuals of the domain class to values of the range datatype, as illustrated in Figure 2.6.

<owl:Class rdf:ID="VintageYear"/>
<owl:DatatypeProperty rdf:ID="yearValue">
  <rdfs:domain rdf:resource="#VintageYear"/>
  <rdfs:range rdf:resource="&xsd;positiveInteger"/>
</owl:DatatypeProperty>
Figure 2.6: OWL datatype property definition

Figure 2.6 defines a datatype property that relates the vintage years of wine production to positive integers. An instance of the VintageYear class is shown in Figure 2.7.
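To make the domain and range of a datatype property concrete, the Python sketch below checks a single assertion, such as the yearValue of Figure 2.7, against a definition like the one in Figure 2.6. It is a closed-world validity check with hypothetical names (DatatypeProperty, check_assertion, XSD_CHECKS). An OWL reasoner treats rdfs:domain differently: rather than reporting an error, it would infer that the subject is a VintageYear, whereas a literal outside the declared range would typically make the ontology inconsistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatatypeProperty:
    name: str
    domain: str   # class whose individuals carry the property
    range_: str   # XSD datatype of the literal values (rdfs:range)


def _is_positive_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Only the two datatypes needed for this example are covered.
XSD_CHECKS = {
    "xsd:positiveInteger": _is_positive_integer,
    "xsd:string": lambda value: isinstance(value, str),
}


def check_assertion(prop, individual_types, value):
    """Closed-world check of one property assertion; returns a list of problems."""
    problems = []
    if prop.domain not in individual_types:
        problems.append(f"{prop.name}: subject is not a {prop.domain}")
    check = XSD_CHECKS.get(prop.range_)
    if check is None:
        problems.append(f"{prop.name}: no checker for {prop.range_}")
    elif not check(value):
        problems.append(f"{prop.name}: {value!r} is not a valid {prop.range_}")
    return problems


year_value = DatatypeProperty("yearValue", "VintageYear", "xsd:positiveInteger")
print(check_assertion(year_value, {"VintageYear"}, 1998))  # no problems
print(check_assertion(year_value, {"Winery"}, -5))         # two problems
```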

<VintageYear rdf:ID="Year1998">
  <yearValue rdf:datatype="&xsd;positiveInteger">1998</yearValue>
</VintageYear>
Figure 2.7: OWL class instance with a datatype property

Term definitions

Since this thesis frequently uses the terms structure and semantics, we restate their definitions here. According to the Business Dictionary [30], structure is a construction of identifiable elements in which each element is functionally connected to the others, and the interrelationships between elements are fixed or change only occasionally or slowly. Based on this definition, we infer that the structure of an XML element is its relation to its ancestor, sibling, and descendant elements. Therefore, the structural similarity of XML elements is a combination of the similarity scores of these related elements. According to Kamil [99], semantics is the scientific study of the meaning of words, where meaning is analyzed in terms of semantic features, that is, the way a word is used in a document. From this definition, we take the semantic similarity between XML elements to be a combination of the meaning similarity of the element names and the similarity of their other characteristics, such as data type and cardinality constraints.
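Both definitions describe a similarity as a combination of component scores. One simple way to realize such a combination is a weighted average, and the Python sketch below assumes that choice. The function name, component names, scores, and weights are hypothetical and are not taken from the measure proposed in this thesis.

```python
def weighted_combination(scores, weights):
    """Weighted average of component similarity scores, each in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same components")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical component scores for one pair of XML elements.
structural = weighted_combination(
    {"ancestor": 0.8, "sibling": 0.5, "descendant": 1.0},
    {"ancestor": 1.0, "sibling": 1.0, "descendant": 2.0},
)
semantic = weighted_combination(
    {"name": 0.9, "datatype": 1.0, "cardinality": 0.5},
    {"name": 1.0, "datatype": 1.0, "cardinality": 1.0},
)
print(round(structural, 3), round(semantic, 3))
```

With these numbers the structural score is (0.8 + 0.5 + 2 × 1.0) / 4 = 0.825 and the semantic score is (0.9 + 1.0 + 0.5) / 3 = 0.8. Normalizing by the weight total keeps the result in [0, 1] whatever scale the weights use.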

Related Work

As mentioned in the previous chapter, our goal is to enhance data sharing between XML applications through the integration of XML Schemas (XSDs) and the transformation of XML schemas (XSD/DTD) into OWL ontologies, both based on similarity measures. Both tasks require measuring the similarity of elements in XML schemas. The main difference between them is that integration relies on the similarity of elements in two different documents, whereas transformation relies on the similarity of elements within a single document. Therefore, this section has two subsections: XML integration with element similarity between different documents, and XML transformation with element similarity within a document.

Similarity between documents and XML integration

Much work has addressed the similarity between XML documents. Similarity can be computed at different layers of abstraction: at the instance layer (i.e., similarity between instance documents), at the schema type layer (i.e., similarity between data types, also referred to as schemas, models, or structures depending on the application domain), or across both layers. XML similarity approaches fall into three categories: (1) structural similarity, (2) semantic (content) similarity, and (3) hybrid approaches that combine semantic (content) and structural similarity.

Structural similarity

Structural similarity focuses mainly on the similarity of the relationships between elements in schema graphs. David Buttler [14] summarized three approaches to structural similarity: (1) tag similarity, (2) tree edit distance (TED), and (3) Fourier transform similarity.

Tag similarity

This is the simplest way to measure the structural similarity between XML documents. It measures how close the element names of two XML documents are: documents that use similar element names are likely to have similar schemas. The measure divides the number of element names shared by the compared documents by the number of element names in their union. However, this approach is unsuitable for several reasons. One critical problem is that some XML documents derived from the same schema may use only a limited number of its element names, whereas other documents may contain a large number of occurrences of a particular element name. In addition, tag similarity completely ignores the relationships between elements, which yields low similarity quality.

Tree edit distance (TED)

According to Bille [9], the tree edit distance between two labeled trees $T_1$ and $T_2$ is the cost of the optimal sequence of edit operations that turns $T_1$ into $T_2$. The edit operations are insertion, deletion, and substitution (relabeling). Early approaches applied these operations only to single nodes. A typical example is Chawathe's method [17], which performs insertion and deletion at the leaf level and substitution of node labels anywhere in the tree, but does not consider a move operation. The overall complexity of Chawathe's algorithm is $O(N^2)$, where $N$ is the maximum number of nodes in the compared trees. This cost leads to long run times, so Chawathe's approach is not practical for measuring the similarity of large XML data. A more flexible approach is proposed by Shasha et al. [103], whose TED metric permits the insertion and deletion of single nodes anywhere in the tree, not just at the leaf level. However, entire subtrees cannot be inserted or deleted in one step. The complexity of this approach is $O(|T_1|\,|T_2|\,\mathrm{depth}(T_1)\,\mathrm{depth}(T_2))$, where $|T_1|$ and $|T_2|$ denote the number of nodes in the labeled trees $T_1$ and $T_2$, respectively.
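The tag similarity measure described above, read as a set-based Jaccard coefficient over element names, can be sketched in Python with the standard xml.etree.ElementTree module. For contrast, the sketch also computes an edge-based distance in the spirit of Lian et al.'s Equation 2.1, assuming that s-graph edges are the distinct parent/child tag pairs. The function names are hypothetical, and the edge-based variant is a simplification, not Lian et al.'s implementation. The two toy documents use the same element names with different nesting.

```python
import xml.etree.ElementTree as ET


def tag_set(xml_text):
    """Distinct element names occurring in a document."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


def tag_similarity(doc_a, doc_b):
    """Shared element names divided by all element names (Jaccard coefficient)."""
    tags_a, tags_b = tag_set(doc_a), tag_set(doc_b)
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 1.0


def edge_set(xml_text):
    """Distinct (parent tag, child tag) pairs, i.e. the edges of an s-graph."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def sgraph_distance(doc_a, doc_b):
    """1 - |common edges| / max(|edges of a|, |edges of b|)."""
    edges_a, edges_b = edge_set(doc_a), edge_set(doc_b)
    largest = max(len(edges_a), len(edges_b))
    return 1 - len(edges_a & edges_b) / largest if largest else 0.0


# Same element names, different nesting (Name and State swap places).
doc_a = "<Company><Name/><Address><State/></Address></Company>"
doc_b = "<Company><Address><Name/></Address><State/></Company>"
doc_c = "<Company><Name/><Sector/></Company>"

print(tag_similarity(doc_a, doc_b), sgraph_distance(doc_a, doc_b))
print(tag_similarity(doc_a, doc_c))
```

Here tag_similarity(doc_a, doc_b) is 1.0 even though the documents nest Name and State differently, which is exactly the weakness noted above, whereas sgraph_distance(doc_a, doc_b) is 2/3 because only the Company/Address edge is shared. For doc_a and doc_c, two of five distinct names are shared, giving a tag similarity of 0.4.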

Nierman and Jagadish [69] focus on the structural similarity of subtrees. Their edit operations are similar to Chawathe's, but they add two new operations: insert tree and delete tree. To determine subtree similarities, they introduce the notion of containment between trees or subtrees: a labeled tree $T_1$ is contained in a labeled tree $T_2$ if all nodes of $T_1$ occur in $T_2$ with the same parent/child edge relationships and node order. The overall complexity of this algorithm is $O(N^2)$. This approach proved more accurate in detecting XML structural similarities than those of either Chawathe or Shasha.

Also building on Chawathe's method, Dalamagas et al. [23] introduce a framework for clustering XML documents on the basis of structural similarity. They represent XML documents as rooted, ordered, labeled trees and study the use of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. Wei Li et al. [51] extend Dalamagas' method to cluster dynamic XML documents based on frequent changes in their structures.

Three other approaches are also based on structural similarity but achieve higher accuracy than the TED methods. First, Lian et al. [53] represent XML document structures as directed graphs called s-graphs and define a distance metric based on the number of edges common to the graph representations of two XML documents:

$$\mathrm{Dist}(G_1, G_2) = 1 - \frac{|\mathrm{Edges}(G_1) \cap \mathrm{Edges}(G_2)|}{\max\{|\mathrm{Edges}(G_1)|,\ |\mathrm{Edges}(G_2)|\}} \qquad (2.1)$$

Equation 2.1 is more effective than TED-based measures at separating documents that are structurally different. It can be applied not only to tree-structured documents but also to document collections of arbitrary (graph) structure. Second, Bertino et al. [8] proposed a matching algorithm for measuring the structural sim-


More information

Music domain ontology applications for intelligent web searching

Music domain ontology applications for intelligent web searching Music domain ontology applications for intelligent web searching María Clara Vallés y Pablo R. Fillottrani Departamento de Ciencias e Ingeniería de la Computación Universidad Nacional del Sur Baha Blanca,

More information

Applying OWL to Build Ontology for Customer Knowledge Management

Applying OWL to Build Ontology for Customer Knowledge Management JOURNAL OF COMPUTERS, VOL. 5, NO. 11, NOVEMBER 2010 1693 Applying OWL to Build Ontology for Customer Knowledge Management Yalan Yan School of Management, Wuhan University of Science and Technology, Wuhan,

More information

ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS

ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS ONTOLOGY FOR MOBILE PHONE OPERATING SYSTEMS Hasni Neji and Ridha Bouallegue Innov COM Lab, Higher School of Communications of Tunis, Sup Com University of Carthage, Tunis, Tunisia. Email: hasni.neji63@laposte.net;

More information

Efficient Data Structures for Decision Diagrams

Efficient Data Structures for Decision Diagrams Artificial Intelligence Laboratory Efficient Data Structures for Decision Diagrams Master Thesis Nacereddine Ouaret Professor: Supervisors: Boi Faltings Thomas Léauté Radoslaw Szymanek Contents Introduction...

More information

A Framework for Personalized Healthcare Service Recommendation

A Framework for Personalized Healthcare Service Recommendation A Framework for Personalized Healthcare Service Recommendation Choon-oh Lee, Minkyu Lee, Dongsoo Han School of Engineering Information and Communications University (ICU) Daejeon, Korea {lcol, niklaus,

More information

Semantic Web based e-learning System for Sports Domain

Semantic Web based e-learning System for Sports Domain Semantic Web based e-learning System for Sports Domain S.Muthu lakshmi Research Scholar Dept.of Information Science & Technology Anna University, Chennai G.V.Uma Professor & Research Supervisor Dept.of

More information

Annotea and Semantic Web Supported Collaboration

Annotea and Semantic Web Supported Collaboration Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve

More information

Chapter 5. Regression Testing of Web-Components

Chapter 5. Regression Testing of Web-Components Chapter 5 Regression Testing of Web-Components With emergence of services and information over the internet and intranet, Web sites have become complex. Web components and their underlying parts are evolving

More information

Ontology and automatic code generation on modeling and simulation

Ontology and automatic code generation on modeling and simulation Ontology and automatic code generation on modeling and simulation Youcef Gheraibia Computing Department University Md Messadia Souk Ahras, 41000, Algeria youcef.gheraibia@gmail.com Abdelhabib Bourouis

More information

Object Database on Top of the Semantic Web

Object Database on Top of the Semantic Web WSS03 Applications, Products and Services of Web-based Support Systems 97 Object Database on Top of the Semantic Web Jakub Güttner Graduate Student, Brno Univ. of Technology, Faculty of Information Technology,

More information

Intelligent interoperable application for employment exchange system using ontology

Intelligent interoperable application for employment exchange system using ontology 1 Webology, Volume 10, Number 2, December, 2013 Home Table of Contents Titles & Subject Index Authors Index Intelligent interoperable application for employment exchange system using ontology Kavidha Ayechetty

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Model Driven Interoperability through Semantic Annotations using SoaML and ODM

Model Driven Interoperability through Semantic Annotations using SoaML and ODM Model Driven Interoperability through Semantic Annotations using SoaML and ODM JiuCheng Xu*, ZhaoYang Bai*, Arne J.Berre*, Odd Christer Brovig** *SINTEF, Pb. 124 Blindern, NO-0314 Oslo, Norway (e-mail:

More information

OWL Ontology Translation for the Semantic Web

OWL Ontology Translation for the Semantic Web OWL Ontology Translation for the Semantic Web Luís Mota and Luís Botelho We, the Body and the Mind Research Lab ADETTI/ISCTE Av. das Forças Armadas, 1649-026 Lisboa, Portugal luis.mota@iscte.pt,luis.botelho@we-b-mind.org

More information

Secure Semantic Web Service Using SAML

Secure Semantic Web Service Using SAML Secure Semantic Web Service Using SAML JOO-YOUNG LEE and KI-YOUNG MOON Information Security Department Electronics and Telecommunications Research Institute 161 Gajeong-dong, Yuseong-gu, Daejeon KOREA

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Co-Creation of Models and Metamodels for Enterprise. Architecture Projects.

Co-Creation of Models and Metamodels for Enterprise. Architecture Projects. Co-Creation of Models and Metamodels for Enterprise Architecture Projects Paola Gómez pa.gomez398@uniandes.edu.co Hector Florez ha.florez39@uniandes.edu.co ABSTRACT The linguistic conformance and the ontological

More information

A Pattern-based Framework of Change Operators for Ontology Evolution

A Pattern-based Framework of Change Operators for Ontology Evolution A Pattern-based Framework of Change Operators for Ontology Evolution Muhammad Javed 1, Yalemisew M. Abgaz 2, Claus Pahl 3 Centre for Next Generation Localization (CNGL), School of Computing, Dublin City

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

A Tool for Searching the Semantic Web for Supplies Matching Demands

A Tool for Searching the Semantic Web for Supplies Matching Demands A Tool for Searching the Semantic Web for Supplies Matching Demands Zuzana Halanová, Pavol Návrat, Viera Rozinajová Abstract: We propose a model of searching semantic web that allows incorporating data

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

Ontological Identification of Patterns for Choreographing Business Workflow

Ontological Identification of Patterns for Choreographing Business Workflow University of Aizu, Graduation Thesis. March, 2010 s1140042 1 Ontological Identification of Patterns for Choreographing Business Workflow Seiji Ota s1140042 Supervised by Incheon Paik Abstract Business

More information

Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks

Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks Application of XML Tools for Enterprise-Wide RBAC Implementation Tasks Ramaswamy Chandramouli National Institute of Standards and Technology Gaithersburg, MD 20899,USA 001-301-975-5013 chandramouli@nist.gov

More information

Logic and Reasoning in the Semantic Web (part I RDF/RDFS)

Logic and Reasoning in the Semantic Web (part I RDF/RDFS) Logic and Reasoning in the Semantic Web (part I RDF/RDFS) Fulvio Corno, Laura Farinetti Politecnico di Torino Dipartimento di Automatica e Informatica e-lite Research Group http://elite.polito.it Outline

More information

Modeling an Ontology for Managing Contexts in Smart Meeting Space

Modeling an Ontology for Managing Contexts in Smart Meeting Space Modeling an Ontology for Managing Contexts in Smart Meeting Space Mohammad Rezwanul Huq, Nguyen Thi Thanh Tuyen, Young-Koo Lee, Byeong-Soo Jeong and Sungyoung Lee Department of Computer Engineering Kyung

More information

GUMO The General User Model Ontology

GUMO The General User Model Ontology GUMO The General User Model Ontology Dominik Heckmann, Tim Schwartz, Boris Brandherm, Michael Schmitz, and Margeritta von Wilamowitz-Moellendorff Saarland University, Saarbrücken, Germany {dominik, schwartz,

More information

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim Hybrid Storage Scheme for RDF Data Management in Semantic Web Sung Wan Kim Department of Computer Information, Sahmyook College Chungryang P.O. Box118, Seoul 139-742, Korea swkim@syu.ac.kr ABSTRACT: With

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity

More information

Application of ontologies for the integration of network monitoring platforms

Application of ontologies for the integration of network monitoring platforms Application of ontologies for the integration of network monitoring platforms Jorge E. López de Vergara, Javier Aracil, Jesús Martínez, Alfredo Salvador, José Alberto Hernández Networking Research Group,

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model 22 October 2014 Tony Hammond Michele Pasin Background About Macmillan

More information

George McGeachie Metadata Matters Limited. ER SIG June 9th, 2010 1

George McGeachie Metadata Matters Limited. ER SIG June 9th, 2010 1 George McGeachie Metadata Matters Limited ER SIG June 9th, 2010 1 an industry-leading data modeling tool that enables companies to discover, document, and re-use data assets. With round-trip database support,

More information

Combining RDF and Agent-Based Architectures for Semantic Interoperability in Digital Libraries

Combining RDF and Agent-Based Architectures for Semantic Interoperability in Digital Libraries Combining RDF and Agent-Based Architectures for Semantic Interoperability in Digital Libraries Norbert Fuhr, Claus-Peter Klas University of Dortmund, Germany {fuhr,klas}@ls6.cs.uni-dortmund.de 1 Introduction

More information

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany Information Systems University of Koblenz Landau, Germany Exploiting Spatial Context in Images Using Fuzzy Constraint Reasoning Carsten Saathoff & Agenda Semantic Web: Our Context Knowledge Annotation

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2013 Vol I, IMECS 2013, March 13-15, 2013, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2013 Vol I, IMECS 2013, March 13-15, 2013, Hong Kong , March 13-15, 2013, Hong Kong Risk Assessment for Relational Database Schema-based Constraint Using Machine Diagram Kanjana Eiamsaard 1, Nakornthip Prompoon 2 Abstract Information is a critical asset

More information

Peculiarities of semantic web-services cloud runtime

Peculiarities of semantic web-services cloud runtime Procedia Computer Science Volume 71, 2015, Pages 208 214 2015 Annual International Conference on Biologically Inspired Cognitive Architectures Peculiarities of semantic web-services cloud runtime National

More information

Characterizing Knowledge on the Semantic Web with Watson

Characterizing Knowledge on the Semantic Web with Watson Characterizing Knowledge on the Semantic Web with Watson Mathieu d Aquin, Claudio Baldassarre, Laurian Gridinoc, Sofia Angeletou, Marta Sabou, and Enrico Motta Knowledge Media Institute (KMi), The Open

More information

A Design of Onto-ACM(Ontology based Access Control Model) in Cloud Computing Environments

A Design of Onto-ACM(Ontology based Access Control Model) in Cloud Computing Environments A Design of Onto-ACM(Ontology based Access Control Model) in Cloud Computing Environments Chang Choi Chosun University Gwangju, Republic of Korea enduranceaura@gmail.com Junho Choi Chosun University Gwangju,

More information

A Framework for Ontology-Based Knowledge Management System

A Framework for Ontology-Based Knowledge Management System A Framework for Ontology-Based Knowledge Management System Jiangning WU Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: jnwu@dlut.edu.cn Abstract Knowledge

More information

Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu

Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu Semistructured data and XML Institutt for Informatikk 1 Unstructured, Structured and Semistructured data Unstructured data e.g., text documents Structured data: data with a rigid and fixed data format

More information

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,

More information

BUSINESS VALUE OF SEMANTIC TECHNOLOGY

BUSINESS VALUE OF SEMANTIC TECHNOLOGY BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director

More information

Fuzzy Duplicate Detection on XML Data

Fuzzy Duplicate Detection on XML Data Fuzzy Duplicate Detection on XML Data Melanie Weis Humboldt-Universität zu Berlin Unter den Linden 6, Berlin, Germany mweis@informatik.hu-berlin.de Abstract XML is popular for data exchange and data publishing

More information

Service Oriented Architecture

Service Oriented Architecture Service Oriented Architecture Charlie Abela Department of Artificial Intelligence charlie.abela@um.edu.mt Last Lecture Web Ontology Language Problems? CSA 3210 Service Oriented Architecture 2 Lecture Outline

More information