A Tutorial on Data Integration
|
|
|
- Emily Wilkins
- 10 years ago
- Views:
Transcription
1 A Tutorial on Data Integration Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Antonio Ruberti, Sapienza Università di Roma DEIS 10 - Data Exchange, Integration, and Streaming November 7-12, 2010, Schloss Dagstuhl, GI-Dagstuhl Seminar M. Lenzerini A tutorial on Data Integration 1 / 132
2 Structure of the course 1 Introduction to data integration Motivations Logical formalization Mappings 2 Query answering for relational data Approaches to query answering Canonical database Query rewriting Counterexamples Query containment 3 Beyond relational data Semi-structured data integration Ontology-based data integration M. Lenzerini A tutorial on Data Integration 2 / 132
3 Structure of the course 1 Introduction to data integration Motivations Logical formalization Mappings 2 Query answering for relational data Approaches to query answering Canonical database Query rewriting Counterexamples Query containment 3 Beyond relational data Semi-structured data integration Ontology-based data integration M. Lenzerini A tutorial on Data Integration 2 / 132
4 Structure of the course 1 Introduction to data integration Motivations Logical formalization Mappings 2 Query answering for relational data Approaches to query answering Canonical database Query rewriting Counterexamples Query containment 3 Beyond relational data Semi-structured data integration Ontology-based data integration M. Lenzerini A tutorial on Data Integration 2 / 132
5 Motivations Data integration: Logical formalization Mappings Part 1: Introduction to data integration Part I Introduction to data integration M. Lenzerini A tutorial on Data Integration 3 / 132
6 Motivations Data integration: Logical formalization Mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 4 / 132
7 Motivations Data integration: Logical formalization Mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 5 / 132
8 Motivations Data integration: Logical formalization Mappings What is data integration? Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 6 / 132
9 Motivations Data integration: Logical formalization Mappings What is data integration? Integration in data management: evolution Part 1: Introduction to data integration client application Data manager data layer Centralized system with three-tier architecture Implicit integration: integration supported by the Data Base Management System (DBMS), i.e., the data manager M. Lenzerini A tutorial on Data Integration 7 / 132
10 Motivations Data integration: Logical formalization Mappings What is data integration? Integration in data management: evolution Part 1: Introduction to data integration client application Data manager Data manager Centralized system with three-tier architecture and multiple stores Application-hidden integration: integration embedded within application M. Lenzerini A tutorial on Data Integration 8 / 132
11 Motivations Data integration: Logical formalization Mappings What is data integration? Integration in data management: evolution Part 1: Introduction to data integration client applica*on Data Integrator Global schema Data manager Data manager Data manager data layer Centralized system with four-tier architecture and multiple, distributed stores (Centralized) data integration: the global schema is mapped to the different data sources, which are heterogeneous, distributed and autonomous M. Lenzerini A tutorial on Data Integration 9 / 132
12 Motivations Data integration: Logical formalization Mappings What is data integration? Integration in data management: evolution Part 1: Introduction to data integration client application Global schema Data manager Data manager client client application Global schema Data manager Data manager application Global schema Data manager Data manager Decentralized system Peer-to-peer data integration: distributed data integration realized with no unique, central global schema M. Lenzerini A tutorial on Data Integration 10 / 132
13 Motivations Data integration: Logical formalization Mappings Variants of data integration Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 11 / 132
14 Motivations Data integration: Logical formalization Mappings Variants of data integration Approaches to data integration Part 1: Introduction to data integration Centralized, virtual data integration... is the main topic of this tutorial Data warehousing... not dealt with in this tutorial P2P data integration... not dealt with in this tutorial M. Lenzerini A tutorial on Data Integration 12 / 132
15 Motivations Data integration: Logical formalization Mappings Variants of data integration Centralized data integration Part 1: Introduction to data integration Centralized data integration is the problem of providing unified and transparent view to a collection of data stored in multiple, autonomous, and heterogeneous data sources. The unified view is achieved through a global (or target) schema, linked to the data sources by means of mappings. Answer(Q) Query Global Schema Sources M. Lenzerini A tutorial on Data Integration 13 / 132
16 Motivations Data integration: Logical formalization Mappings Variants of data integration Data warehousing Part 1: Introduction to data integration materialization of the global database allows for OLAP without accessing the sources similar to data exchange Materialize Global Schema Sources M. Lenzerini A tutorial on Data Integration 14 / 132
17 Motivations Data integration: Logical formalization Mappings Variants of data integration Peer-to-peer data integration Part 1: Introduction to data integration P 1 P 2 P 3 P 4 P 5 Peer Local source External source Peer schema Local mapping P2P mapping Talk 10 Armin Roth, Peer data management systems Talk 11 Sebastian Skritek, Theory of Peer Data Management M. Lenzerini A tutorial on Data Integration 15 / 132
18 Motivations Data integration: Logical formalization Mappings Issues in data integration Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 16 / 132
19 Motivations Data integration: Logical formalization Mappings Issues in data integration Main issues in data integration Part 1: Introduction to data integration 1 Data extraction, cleaning, and reconciliation Talk 9 Ekaterini Ioannou, Data cleaning for data integration 2 How to discover and specify the mappings between sources and global schema Talk 22 Marie Jacob, Learning and discovering queries and mappings 3 How to model and specify the global schema 4 How to answer queries expressed on the global schema Talk 2 Piotr Wieczorek, Query answering in data integration 5 How to deal with limitations in mechanisms for accessing sources 6 How to optimize query answering 7... M. Lenzerini A tutorial on Data Integration 17 / 132
20 Motivations Data integration: Logical formalization Mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 18 / 132
21 Motivations Data integration: Logical formalization Mappings Syntax and semantics of a data integration system Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 19 / 132
22 Motivations Data integration: Logical formalization Mappings Syntax and semantics of a data integration system Formal framework for data integration Part 1: Introduction to data integration Definition A data integration system I is a triple G, S, M, where G is the global schema a logical theory over an alphabet A G S is the source schema an alphabet A S disjoint from A G M is the mapping between S and G We consider different approaches to the specification of mappings M. Lenzerini A tutorial on Data Integration 20 / 132
23 Motivations Data integration: Logical formalization Mappings Syntax and semantics of a data integration system Semantics of a data integration system Part 1: Introduction to data integration Which are the dbs that satisfy I, i.e., the logical models of I? We refer only to dbs over a fixed infinite domain of elements We start from the data present in the sources: these are modeled through a (finite) source database C over (also called source model), fixing the extension of the predicates of A S The dbs for I are logical interpretations for A G, called global dbs Definition The semantics of I relative to C is: sem C (I) = { B B is a global database that satisfies G and that satisfies M wrt C } To satisfy G means to satisfy all axioms of G, i.e., being a model of G What it means to satisfy M wrt C depends on the nature of M M. Lenzerini A tutorial on Data Integration 21 / 132
24 Motivations Data integration: Logical formalization Mappings Syntax and semantics of a data integration system Semantics of a data integration system Part 1: Introduction to data integration Which are the dbs that satisfy I, i.e., the logical models of I? We refer only to dbs over a fixed infinite domain of elements We start from the data present in the sources: these are modeled through a (finite) source database C over (also called source model), fixing the extension of the predicates of A S The dbs for I are logical interpretations for A G, called global dbs Definition The semantics of I relative to C is: sem C (I) = { B B is a global database that satisfies G and that satisfies M wrt C } To satisfy G means to satisfy all axioms of G, i.e., being a model of G What it means to satisfy M wrt C depends on the nature of M M. Lenzerini A tutorial on Data Integration 21 / 132
25 Motivations Data integration: Logical formalization Mappings Syntax and semantics of a data integration system Part 1: Introduction to data integration Comparison between data integration and data exchange Data integration system I = G, S, M Data exchange setting M = S, T, Σ I = G, S, M M = S, T, Σ S G M finite source database C global database with no variable global database satisfying G and M wrt C S T Σ finite source instance I target instance is finite and may contain variables solution J M. Lenzerini A tutorial on Data Integration 22 / 132
26 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 23 / 132
27 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Queries to a data integration system I Part 1: Introduction to data integration The domain is fixed, and we do not distinguish an element of from the constant denoting it standard names Queries to I are expressions (of a certain arity) over the alphabet A G ; the evaluation of a query of arity n to I relative to a source database C returns a set of tuples of elements, each of arity n When evaluating q over I = G, S, M, we have to consider that for a given source database C, there may be many global databases B satisfying G and M wrt C, i.e., many global databases B in sem C (I) We consider those answers to q that hold for all global databases in sem C (I) certain answers M. Lenzerini A tutorial on Data Integration 24 / 132
28 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Semantics of queries to I Part 1: Introduction to data integration Definition Given q, I, and C, the set of certain answers to q wrt I and C is cert(q, I, C) = { q B B sem C (I) } Query answering in information integration means to compute the certain answers, i.e., it corresponds to logical implication Complexity is measured mainly wrt the size of the source db C, i.e., we consider data complexity When we want to look at query answering as a decision problem, we consider the problem of deciding whether a given tuple c is a certain answer to q wrt I and C, i.e., whether c cert(q, I, C) M. Lenzerini A tutorial on Data Integration 25 / 132
29 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Part 1: Introduction to data integration Databases with incomplete information, or knowledge bases Traditional database: one model of a first-order theory. Query answering means evaluating a formula in the model Database with incomplete information, or knowledge base: set of models (specified, for example, as a restricted first-order theory). Query answering means computing the tuples that satisfy the query in all the models in the set There is a strong connection between query answering in information integration and query answering in databases with incomplete information under constraints (or, query answering in knowledge bases) M. Lenzerini A tutorial on Data Integration 26 / 132
30 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Part 1: Introduction to data integration Databases with incomplete information, or knowledge bases Traditional database: one model of a first-order theory. Query answering means evaluating a formula in the model Database with incomplete information, or knowledge base: set of models (specified, for example, as a restricted first-order theory). Query answering means computing the tuples that satisfy the query in all the models in the set There is a strong connection between query answering in information integration and query answering in databases with incomplete information under constraints (or, query answering in knowledge bases) M. Lenzerini A tutorial on Data Integration 26 / 132
31 Motivations Data integration: Logical formalization Mappings Queries to a data integration system Query answering: problem space Part 1: Introduction to data integration Global schema Relational data without constraints (i.e., empty theory) with constraints Non-relational data Graph-databases Talk 18 Paolo Guagliardo View-based query processing XML-data Talk 14 Lucja Kot, XML data integration Ontologies Talk 8 Yazmin A. Ibanez, Description logics for data integration Mapping GAV, LAV, or GLAV Semantics arbitrary vs. finite databases Standard logic vs. Inconsistency-tolerant semantics Talk 7 Slawomir Staworko, Consistent query answering M. Lenzerini A tutorial on Data Integration 27 / 132
32 Motivations Data integration: Logical formalization Mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 28 / 132
33 Motivations Data integration: Logical formalization Mappings Types of mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 29 / 132
34 Motivations Data integration: Logical formalization Mappings Types of mappings The mapping Part 1: Introduction to data integration In this tutorial, we mainly consider sound mappings, i.e., mapping assertions stating that the presence of certain data in the sources implies the presence of certain data in the virtual global database. How is the mapping M between S and G specified? Are the sources defined in terms of the global schema? Approach called source-centric, or local-as-view, or LAV Is the global schema defined in terms of the sources? Approach called global-schema-centric, or global-as-view, or GAV A mixed approach? Approach called GLAV M. Lenzerini A tutorial on Data Integration 30 / 132
35 Motivations Data integration: Logical formalization Mappings Types of mappings GAV vs. LAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) Source 1: r 1 (Title, Year, Director) Source 2: r 2 (Title, Critique) since 1990 since 1960, european directors Query: Title and critique of movies in 1998 { (t, r) d. movie(t, 1998, d) review(t, r) }, abbreviated { (t, r) movie(t, 1998, d), review(t, r) } M. Lenzerini A tutorial on Data Integration 31 / 132
36 Motivations Data integration: Logical formalization Mappings GAV mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 32 / 132
37 Motivations Data integration: Logical formalization Mappings GAV mappings Formalization of GAV Part 1: Introduction to data integration In GAV (with sound sources), the mapping M is a set of assertions: x. φ S ( x) g( x) one for each element g in A G, with φ S a query over S of the arity of g Given a source db C, a db B for G satisfies M wrt C if for each g G: φ C S gb Given a source database C, M provides direct information about which data in C satisfy the elements of the global schema Elements in the global schema G can be considered as views over the sources. This is why this approach is called global as view M. Lenzerini A tutorial on Data Integration 33 / 132
38 Motivations Data integration: Logical formalization Mappings GAV mappings GAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) GAV: to each relation in the global schema, M associates a view over the sources: t, y, d r 1 (t, y, d) movie(t, y, d) d, t, y r 1 (t, y, d) european(d) t, r r 2 (t, r) review(t, r) M. Lenzerini A tutorial on Data Integration 34 / 132
39 Motivations Data integration: Logical formalization Mappings GAV mappings GAV Example of query processing Part 1: Introduction to data integration The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by expanding each atom according to its associated definition in M, so as to come up with a query over the source relations In particular: { (t, r) movie(t, 1998, d), review(t, r) } { (t, r) r 1 (t, 1998, d), r 2 (t, r) } M. Lenzerini A tutorial on Data Integration 35 / 132
40 Motivations Data integration: Logical formalization Mappings GAV mappings GAV Example of constraints Part 1: Introduction to data integration Global schema containing constraints: movie(title, Year, Director) european(director) review(title, Critique) x, c review(x, c) y, d movie(x, y, d) GAV mappings: t, y, d r 1 (t, y, d) movie(t, y, d) d, t, y r 1 (t, y, d) european(d) t, r r 2 (t, r) review(t, r) M. Lenzerini A tutorial on Data Integration 36 / 132
41 Motivations Data integration: Logical formalization Mappings LAV mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 37 / 132
42 Motivations Data integration: Logical formalization Mappings LAV mappings Formalization of LAV Part 1: Introduction to data integration In LAV (with sound sources), the mapping M is a set of assertions: x. s( x) φ G ( x) one for each source element s in A S, with φ G a query over G of the arity of s. Given source db C, a db B for G satisfies M wrt C if for each s S: s C φ B G The mapping M and the source database C do not provide direct information about which data satisfy the global schema Sources, i.e., elements in S, can be considered as views over the global schema. This is why this approach is called local-as-views. M. Lenzerini A tutorial on Data Integration 38 / 132
43 Motivations Data integration: Logical formalization Mappings LAV mappings LAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) LAV: to each source relation, M associates a view over the global schema: r 1 (t, y, d) { (t, y, d) movie(t, y, d), european(d), y 1960 } r 2 (t, r) { (t, r) movie(t, y, d), review(t, r), y 1990 } The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by means of an inference mechanism that aims at re-expressing the atoms of the global schema in terms of atoms at the sources. In this case: { (t, r) r 2 (t, r), r 1 (t, 1998, d) } M. Lenzerini A tutorial on Data Integration 39 / 132
44 Motivations Data integration: Logical formalization Mappings LAV mappings GAV and LAV Comparison Part 1: Introduction to data integration GAV: (e.g., Carnot, SIMS, Tsimmis, IBIS, Momis, DisAtDis,... ) Quality depends on how well we have compiled the sources into the global schema through the mapping Whenever a source changes or a new one is added, the global schema needs to be reconsidered Query processing can be based on some sort of unfolding (query answering looks easier without constraints) LAV: (e.g., Information Manifold, DWQ, Picsel) Quality depends on how well we have characterized the sources High modularity and extensibility (if the global schema is well designed, when a source changes, only its definition is affected) Query processing needs reasoning (query answering complex) M. Lenzerini A tutorial on Data Integration 40 / 132
45 Motivations Data integration: Logical formalization Mappings GLAV mappings Outline Part 1: Introduction to data integration 1 Motivations What is data integration? Variants of data integration Issues in data integration 2 Data integration: Logical formalization Syntax and semantics of a data integration system Queries to a data integration system 3 Mappings Types of mappings GAV mappings LAV mappings GLAV mappings M. Lenzerini A tutorial on Data Integration 41 / 132
46 Motivations Data integration: Logical formalization Mappings GLAV mappings Beyond GAV and LAV: GLAV Part 1: Introduction to data integration In GLAV (with sound sources), the mapping M is a set of assertions: x. φ S ( x) φ G ( x) with φ S a query over S, and φ G a query over G of the same arity as φ S Given source db C, a db B for G satisfies M wrt C if for each x. φ S ( x) φ G ( x) in M: φ C S φb G As for LAV, the mapping M does not provide direct information about which data satisfy the global schema, and, therefore, to answer a query q over G, we have to infer how to use M in order to access the source database C M. Lenzerini A tutorial on Data Integration 42 / 132
47 Motivations Data integration: Logical formalization Mappings GLAV mappings GLAV Example Part 1: Introduction to data integration Global schema: work(person, Project), area(project, Field) Source 1: hasjob(person, Field) Source 2: teaches(professor, Course), in(course, Field) Source 3: get(researcher, Grant), for(grant, Project) GLAV mapping: {(r, f) hasjob(r, f)} {(r, f) work(r, p), area(p, f)} {(r, f) teaches(r, c), in(c, f)} {(r, f) work(r, p), area(p, f)} {(r, p) get(r, g), for(g, p)} {(r, f) work(r, p)} M. Lenzerini A tutorial on Data Integration 43 / 132
48 Motivations Data integration: Logical formalization Mappings GLAV mappings Exact mappings Part 1: Introduction to data integration Although we consider only sound mappings in this tutorial, exact mappings have also been studied in data integration. An exact GLAV mapping assertion have the form: x. φ S ( x) φ G ( x) with φ S a query over S, and φ G a query over G of the same arity as φ S Given source db C, a db B for G satisfies the exact mapping assertion x. φ S ( x) φ G ( x) if φ C S = φb G GAV and LAV exact mapping assertions are defined in the obvious way M. Lenzerini A tutorial on Data Integration 44 / 132
49 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Part 2: Query answering for relational data Part II Query answering for relational data M. Lenzerini A tutorial on Data Integration 45 / 132
50 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 46 / 132
51 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 47 / 132
52 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Query answering in different settings Part 2: Query answering for relational data The problem of query answering comes in different forms, depending on Global schema relational without constraints (i.e., empty theory) with constraints non-relational data Mapping GAV LAV (or GLAV) Queries user queries queries in the mapping If not otherwise stated, we will assume that both the user queries and the queries in the mappings are conjunctive queries M. Lenzerini A tutorial on Data Integration 48 / 132
53 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Incompleteness and inconsistency Part 2: Query answering for relational data Query answering heavily depends upon whether incompleteness/inconsistency shows up Incompleteness: the cardinality of sem C (I) is greater than 1 Inconsistency: the cardinality of sem C (I) is 0 Constraints in G Type of mapping Incompleteness Inconsistency no GAV very limited no no (G)LAV yes no yes GAV yes yes yes (G)LAV yes yes M. Lenzerini A tutorial on Data Integration 49 / 132
54 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Main approaches to query answering Part 2: Query answering for relational data Based on canonical database Based on query rewriting Based on counterexample Based on query containment M. Lenzerini A tutorial on Data Integration 50 / 132
55 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 51 / 132
56 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment The notion of canonical database Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 52 / 132
57 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment The notion of canonical database The canonical database Part 2: Query answering for relational data Given data integration system I, and source database C, a canonical database (or, canonical model) for I and C is global database B sem C (I), possibly with variables, such that for each query q on A G, and each tuple t, t cert(q, I, C) if and only if t q B (or, t q1 B for a suitable query q 1 ) Note the similarity with the notion of universal solution in data exchange In what follows, we discuss the approach based on canonical database by referring to GAV without constraints, and by limiting the attention to positive user queries M. Lenzerini A tutorial on Data Integration 53 / 132
58 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 54 / 132
59 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Part 2: Query answering for relational data GAV without constraints Retrieved global database Definition Given a GAV data integration system I = G, S, M, and a source database C for S, we call retrieved global database (for I wrt C), denoted M(C), the global database obtained by applying the queries in the mapping, and transferring to the elements of G the corresponding tuples retrieved from C Note that, since mappings are of type GAV, the tuples to be tranferred to the global schema are definite (they do not contain existentially quantified elements) M. Lenzerini A tutorial on Data Integration 55 / 132
60 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Part 2: Query answering for relational data GAV without constraints Example Consider I = G, S, M, with Global schema G: student(code, Name, City) university(code, Name) enrolled(scode, Ucode) Source schema S: relations s 1 (Scode, Sname, City, Age), s 2 (Ucode, Uname), s 3 (Scode, Ucode) Mapping M: c, n, ci s 1 (c, n, ci, a) student(c, n, ci) c, n s 2 (c, n) university(c, n) s, u s 3 (s, u) enrolled(s, u) M. Lenzerini A tutorial on Data Integration 56 / 132
61 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Example of retrieved global database Part 2: Query answering for relational data university student Code Name Code Name City AF bocconi 12 anne florence BN ucla 15 bill oslo s C 1 s C 2 12 anne florence 21 AF bocconi 15 bill oslo 24 BN ucla enrolled Scode Ucode 12 AF 16 BN s C 3 12 AF 16 BN Example of source database C and corresponding retrieved global database M(C) M. Lenzerini A tutorial on Data Integration 57 / 132
62 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints GAV without constraints Canonical database Part 2: Query answering for relational data GAV mapping assertions have the form x. φ S ( x) g( x) where φ S is a query over the source relations, and g is an element of G In general, given a source database C, there are several databases in sem C (I) However, it is easy to see that, when G has no axiom, M(C) is the intersection of all such databases, and therefore, is finite, and is the only minimal model of I For positive queries, M(C) is a canonical database of I wrt C: If q is a positive query, then t cert(q, I, C) iff t q M(C) M. Lenzerini A tutorial on Data Integration 58 / 132
63 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Exercise 1 Part 2: Query answering for relational data Is the following problem decidable? Given a GAV data integration system I without constraints, a source database C, a first order logic query q over A G, compute the certain answers cert(q, I, C) M. Lenzerini A tutorial on Data Integration 59 / 132
64 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV without constraints Extensions to other cases Part 2: Query answering for relational data (G)LAV without constraints the chase constructs a universal solution (with variables) GAV and (G)LAV with constraints a finite universal solution may not exist M. Lenzerini A tutorial on Data Integration 60 / 132
65 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 61 / 132
66 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment What is a rewriting Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 62 / 132
67 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment What is a rewriting Query answering based on query rewriting Part 2: Query answering for relational data Given data integration system I, and a user query q, compute a query q 1 over A S, and then compute q C 1 Thus, query answering is divided in two steps: 1 Reformulate the user query in terms of a new query over the alphabet of A S, called source rewriting, or simply rewriting expressed in a given query language 2 Evaluate the rewriting over the source database C M. Lenzerini A tutorial on Data Integration 63 / 132
68 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment What is a rewriting Query rewriting Part 2: Query answering for relational data q Reformulation rew(q, I) I C (under OWA) Query evaluation (under CWA) ans(q, I, C) The language of rew(q, I) is chosen a priori! M. Lenzerini A tutorial on Data Integration 64 / 132
69 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment What is a rewriting What is a rewriting? Part 2: Query answering for relational data Definition A query q 1 over the alphabet A S is a sound rewriting of q with respect to I if for all source database C and for all global database B sem C (I), we have that q C 1 qb From the above definition, it follows that a sound rewriting computes only certain answers: indeed, if q 1 is a sound rewriting, then for all source database C, q C 1 {q B B sem C (I)} M. Lenzerini A tutorial on Data Integration 65 / 132
70 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 66 / 132
71 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Perfect rewriting Part 2: Query answering for relational data What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: Let us consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C) cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I, i.e., it computes only certain answers No sound rewriting exists that is better than cert [q,i], i.e., if r is a sound rewriting of q wrt I, then r cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini A tutorial on Data Integration 67 / 132
72 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Perfect rewriting Part 2: Query answering for relational data What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: Let us consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C) cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I, i.e., it computes only certain answers No sound rewriting exists that is better than cert [q,i], i.e., if r is a sound rewriting of q wrt I, then r cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini A tutorial on Data Integration 67 / 132
73 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Perfect rewriting Part 2: Query answering for relational data What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: Let us consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C) cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I, i.e., it computes only certain answers No sound rewriting exists that is better than cert [q,i], i.e., if r is a sound rewriting of q wrt I, then r cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini A tutorial on Data Integration 67 / 132
74 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Query answering: reformulation + evaluation Part 2: Query answering for relational data q I C Perfect reformulation (under OWA) cert [q,i] Query evaluation (under CWA) cert(q, I, C) In principle, we need an arbitrary query language to express cert [q,i] M. Lenzerini A tutorial on Data Integration 68 / 132
75 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting More about rewriting Part 2: Query answering for relational data We are interested in rewritings r of q wrt I that are: sound, i.e., compute only tuples in cert(q, I, C) for every C (i.e., r cert [q,i] ) expressed in a given query language L sound, and maximal for a class of queries L perfect A sound rewriting r of q wrt I is maximal for L if for all r L, r cert [q,i] implies r r M. Lenzerini A tutorial on Data Integration 69 / 132
76 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting More about rewriting Part 2: Query answering for relational data We are interested in rewritings r of q wrt I that are: sound, i.e., compute only tuples in cert(q, I, C) for every C (i.e., r cert [q,i] ) expressed in a given query language L sound, and maximal for a class of queries L perfect A sound rewriting r of q wrt I is maximal for L if for all r L, r cert [q,i] implies r r M. Lenzerini A tutorial on Data Integration 69 / 132
77 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting More about rewriting Part 2: Query answering for relational data We are interested in rewritings r of q wrt I that are: sound, i.e., compute only tuples in cert(q, I, C) for every C (i.e., r cert [q,i] ) expressed in a given query language L sound, and maximal for a class of queries L perfect A sound rewriting r of q wrt I is maximal for L if for all r L, r cert [q,i] implies r r M. Lenzerini A tutorial on Data Integration 69 / 132
78 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting More about rewriting Part 2: Query answering for relational data We are interested in rewritings r of q wrt I that are: sound, i.e., compute only tuples in cert(q, I, C) for every C (i.e., r cert [q,i] ) expressed in a given query language L sound, and maximal for a class of queries L perfect A sound rewriting r of q wrt I is maximal for L if for all r L, r cert [q,i] implies r r M. Lenzerini A tutorial on Data Integration 69 / 132
79 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting More about rewriting Part 2: Query answering for relational data We are interested in rewritings r of q wrt I that are: sound, i.e., compute only tuples in cert(q, I, C) for every C (i.e., r cert [q,i] ) expressed in a given query language L sound, and maximal for a class of queries L perfect A sound rewriting r of q wrt I is maximal for L if for all r L, r cert [q,i] implies r r M. Lenzerini A tutorial on Data Integration 69 / 132
80 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Perfect rewriting Properties of the perfect rewriting Part 2: Query answering for relational data Can the perfect rewriting be expressed in a certain query language? For a given class of queries, what is the relationship between a maximal rewriting and the perfect rewriting? From a semantical point of view From a computational point of view Which is the computational complexity of finding the perfect rewriting, and how big is it? Which is the computational complexity of evaluating the perfect rewriting? M. Lenzerini A tutorial on Data Integration 70 / 132
81 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 71 / 132
82 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Query answering via rewriting Given a LAV data integration system I = G, S, M, and a query q over S, exp(q ) is the query over G that is obtained by substituting every atom with the view that M associates to it. Let q be a conjunctive query over G, and q a conjunctive query over S. q is a sound rewriting of q if and only if exp(q ) q. We may be interested in exact rewritings, i.e., rewritings q that are logically equivalent to the query, modulo M (i.e., exp(q ) q). However, exact rewritings may not exist. M. Lenzerini A tutorial on Data Integration 72 / 132
83 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Query answering via rewriting Given a LAV data integration system I = G, S, M, and a query q over S, exp(q ) is the query over G that is obtained by substituting every atom with the view that M associates to it. Let q be a conjunctive query over G, and q a conjunctive query over S. q is a sound rewriting of q if and only if exp(q ) q. We may be interested in exact rewritings, i.e., rewritings q that are logically equivalent to the query, modulo M (i.e., exp(q ) q). However, exact rewritings may not exist. M. Lenzerini A tutorial on Data Integration 72 / 132
84 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Exercise 2 Part 2: Query answering for relational data Prove the following: Let I be a LAV data integration system without constraints in the global schema, let q be a conjunctive query over G, and let q be a conjunctive query over S. q is a sound rewriting of q if and only if exp(q ) q. Exhibit a LAV data integration system and a query q such that no exact rewriting of q exists with respect to I. M. Lenzerini A tutorial on Data Integration 73 / 132
85 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Rewriting for conjunctive queries Consider a LAV data integration system I = G, S, M, and a query q over G. Let q and the queries in M be conjunctive queries. Theorem If the body of q has n atoms, and q is a maximal rewriting in the class of conjunctive queries, then q has at most n atoms. Sketch of the proof: Since q is a rewriting of q, we have that exp(q ) q. Consider the homomorphism h from q to exp(q ). Each atom in q is mapped by h to at most one atom in exp(q ). If there are more than n atoms in q, then the expansion of some atom in q is disjoint from the image of h, and then this atom can be removed from q while preserving containment (i.e., q is not maximal). This provides us with an algorithm for computing the set of maximal conjunctive rewritings. M. Lenzerini A tutorial on Data Integration 74 / 132
86 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Rewriting for conjunctive queries Let q be the union of all maximal rewritings of q for the class of CQs Theorem (Levy & al. PODS 95, Abiteboul & Duschka PODS 98) q is the maximal rewriting for the class of unions of conjunctive queries (UCQs) q is the perfect rewriting of q wrt I q is a PTIME query (actually, LogSpace) q is an exact rewriting (equivalent to q for each database B of I), if an exact rewriting exists Does this ideal situation carry on to cases where q and M allow for union? M. Lenzerini A tutorial on Data Integration 75 / 132
87 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Rewriting for conjunctive queries Let q be the union of all maximal rewritings of q for the class of CQs Theorem (Levy & al. PODS 95, Abiteboul & Duschka PODS 98) q is the maximal rewriting for the class of unions of conjunctive queries (UCQs) q is the perfect rewriting of q wrt I q is a PTIME query (actually, LogSpace) q is an exact rewriting (equivalent to q for each database B of I), if an exact rewriting exists Does this ideal situation carry on to cases where q and M allow for union? M. Lenzerini A tutorial on Data Integration 75 / 132
88 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Part 2: Query answering for relational data LAV without constraints Rewriting for positive views When queries over the global schema in the mapping contain union: Computing certain answering is conp-complete in data complexity [van der Meyden TCS 93] Hence, the perfect rewriting cert [q,i] is a conp-complete query, and therefore cannot be expressed as a union of conjunctive query We do not have the ideal situation we had for conjunctive queries M. Lenzerini A tutorial on Data Integration 76 / 132
89 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Exercise 3 Part 2: Query answering for relational data Prove the following: When queries over the global schema of a LAV data integration system without constraints contain union, computing certain answering is conp-complete in data complexity M. Lenzerini A tutorial on Data Integration 77 / 132
90 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment LAV without constraints Exercise 4 Part 2: Query answering for relational data Define an algorithm based on rewriting for computing the certain answers to conjunctive queries in GLAV data integration systems without constraints. M. Lenzerini A tutorial on Data Integration 78 / 132
91 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 79 / 132
92 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Inclusion dependencies (IDs) Part 2: Query answering for relational data An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini A tutorial on Data Integration 80 / 132
93 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Inclusion dependencies (IDs) Part 2: Query answering for relational data An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini A tutorial on Data Integration 80 / 132
94 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Inclusion dependencies (IDs) Part 2: Query answering for relational data An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini A tutorial on Data Integration 80 / 132
95 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Inclusion dependencies Example Part 2: Query answering for relational data Global schema G: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) Constraints: Sources S: team[tleader, Tname] player[pname, Pteam] s 1 and s 3 store players s 2 stores teams Mapping M: x, y, z s 1 (x, y, z) s 3 (x, y, z) player(x, y, z) x, y, z s 2 (x, y, z) team(x, y, z) M. Lenzerini A tutorial on Data Integration 81 / 132
96 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Inclusion dependencies Example retrieved global db Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve Retrieved global database M(C): player: Totti 1971 Roma Buffon 1978 Juve team: Juve Torino Del Piero M. Lenzerini A tutorial on Data Integration 82 / 132
97 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Inclusion dependencies Example retrieved global db Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve Retrieved global database M(C): player: Totti 1971 Roma Buffon 1978 Juve team: Juve Torino Del Piero M. Lenzerini A tutorial on Data Integration 82 / 132
98 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini A tutorial on Data Integration 83 / 132
99 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini A tutorial on Data Integration 83 / 132
100 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini A tutorial on Data Integration 83 / 132
101 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini A tutorial on Data Integration 84 / 132
102 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini A tutorial on Data Integration 84 / 132
103 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Part 2: Query answering for relational data Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini A tutorial on Data Integration 84 / 132
104 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints The ID-chase rule Part 2: Query answering for relational data The chase for IDs has only one rule, the ID-chase rule Let D be a database: if the schema contains the ID r[i 1,..., i k ] s[j 1,..., j k ] and there is a fact in D of the form r(a 1,..., a n ) and there are no facts in D of the form s(b 1,..., b m ) such that a il = b jl for each l {1,..., k}, then add to D the fact s(c 1,..., c m ), where for each h {1,..., m}, if h = j l for some l then c h = a il otherwise c h is a new constant symbol (not in D yet) Notice: New existential symbols are introduced (skolem terms) M. Lenzerini A tutorial on Data Integration 85 / 132
105 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Properties of the chase Part 2: Query answering for relational data Bad news: the chase is in general infinite Good news: the chase identifies a canonical database (with variables) We can use the chase to prove soundness and completeness of a query processing method... but only for positive queries! M. Lenzerini A tutorial on Data Integration 86 / 132
106 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Limiting the chase Part 2: Query answering for relational data Why don t we use a finite number of existential constants in the chase? Example Consider r[1] s[1], and s[2] r[1] and suppose M(C) = { r(a, b) } Compute chase(m(c)) with only one new constant c 1 : 0) r(a, b); 1) add s(a, c 1 ) 2) add r(c 1, c 1 ) 3) add s(c 1, c 1 ) This database is not a canonical database for I wrt C E.g., for query q = { (x) r(x, y), s(y, y) }, we have a q chase(m(c)) while a cert(q, I, C) Arbitrarily limiting the chase is unsound, for any finite number of new constants M. Lenzerini A tutorial on Data Integration 87 / 132
107 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Rewriting: Chasing the query Part 2: Query answering for relational data Instead of chasing the data, we chase the query Is the dual notion of the database chase IDs are applied from right to left to the query atoms Advantage: much easier termination conditions, which imply: decidability properties efficiency This technique provides an algorithm for rewriting UCQs under IDs M. Lenzerini A tutorial on Data Integration 88 / 132
108 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Rewriting rule for inclusion dependencies Part 2: Query answering for relational data Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini A tutorial on Data Integration 89 / 132
109 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Rewriting rule for inclusion dependencies Part 2: Query answering for relational data Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini A tutorial on Data Integration 89 / 132
110 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Rewriting rule for inclusion dependencies Part 2: Query answering for relational data Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini A tutorial on Data Integration 89 / 132
111 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Query Rewriting for IDs Algorithm ID-rewrite Part 2: Query answering for relational data Iterative execution of: 1 Reduction: Atoms that unify with other atoms are eliminated and the unification is applied Variables that appear only once are marked 2 Basic rewriting step A rewriting step is applicable to an atom if it does not eliminate variables that appear somewhere else May introduce fresh variables Note: The algorithm works directly for unions of conjunctive queries (UCQs), and produces an UCQ as result M. Lenzerini A tutorial on Data Integration 90 / 132
112 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Query Rewriting for IDs Algorithm ID-rewrite Part 2: Query answering for relational data Iterative execution of: 1 Reduction: Atoms that unify with other atoms are eliminated and the unification is applied Variables that appear only once are marked 2 Basic rewriting step A rewriting step is applicable to an atom if it does not eliminate variables that appear somewhere else May introduce fresh variables Note: The algorithm works directly for unions of conjunctive queries (UCQs), and produces an UCQ as result M. Lenzerini A tutorial on Data Integration 90 / 132
113 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints The algorithm ID-rewrite Part 2: Query answering for relational data Input: relational schema G, set Ψ ID of IDs, UCQ Q Output: perfect rewriting of Q Q := Q; repeat Q aux := Q ; for each q Q aux do (a) for each g 1, g 2 body(q) do if g 1 and g 2 unify then Q := Q {τ(reduce(q, g 1, g 2 ))}; (b) for each g body(q) do for each ID Ψ ID do if ID is applicable to g then Q := Q { q[g/rewrite(g, ID)] } until Q aux = Q ; return Q M. Lenzerini A tutorial on Data Integration 91 / 132
114 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Query answering in GAV under IDs Part 2: Query answering for relational data Properties of ID-rewrite ID-rewrite terminates ID-rewrite produces a perfect rewriting of the input query More precisely, let unf M (q) be the unfolding of the query q wrt the GAV mapping M Theorem unf M (ID-rewrite(q)) is a perfect rewriting of the query q Theorem Query answering in GAV systems under IDs is in PTime in data complexity (actually in LogSpace) M. Lenzerini A tutorial on Data Integration 92 / 132
115 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Query answering in GAV under IDs Part 2: Query answering for relational data Properties of ID-rewrite ID-rewrite terminates ID-rewrite produces a perfect rewriting of the input query More precisely, let unf M (q) be the unfolding of the query q wrt the GAV mapping M Theorem unf M (ID-rewrite(q)) is a perfect rewriting of the query q Theorem Query answering in GAV systems under IDs is in PTime in data complexity (actually in LogSpace) M. Lenzerini A tutorial on Data Integration 92 / 132
116 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment GAV with constraints Exercise 5 Part 2: Query answering for relational data An exclusion dependency (ED) states that the presence of a tuple t 1 in a relation implies the absence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of exclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] = with i 1,..., i k components of r, and j 1,..., j k components of s Find an algorithm for computing certain answers to conjunctive queries in GAV with inclusion and exclusion dependencies. M. Lenzerini A tutorial on Data Integration 93 / 132
117 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 94 / 132
118 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Part 2: Query answering for relational data Query answering based on counterexample Given I, C, q, and t, a counterexample to t cert(q, I, C) is a database B sem C (I) such that t q B Thus, query answering based on counterexample can be described as follows: Given I, C, q, and t, check whether there exists a counterexample to t cert(q, I, C) M. Lenzerini A tutorial on Data Integration 95 / 132
119 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Exercise 6 Part 2: Query answering for relational data Consider the case of LAV with positive views. t cert(q, I, C) iff there is a database B 1 sem C (I) such that t q B1 In LAV with positive views, the mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Find an algorithm for computing certain answers to conjuntive queries in LAV with positive views M. Lenzerini A tutorial on Data Integration 96 / 132
120 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Outline Part 2: Query answering for relational data 4 Approaches to query answering 5 Canonical database The notion of canonical database GAV without constraints 6 Query rewriting What is a rewriting Perfect rewriting LAV without constraints GAV with constraints 7 Counterexamples 8 Query containment M. Lenzerini A tutorial on Data Integration 97 / 132
121 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Query containment under constraints Part 2: Query answering for relational data Definition Query containment (under constraints) is the problem of checking whether q1 D is contained in qd 2 for every database D (satisfying the constraints), where q 1, q 2 are queries of the same arity M. Lenzerini A tutorial on Data Integration 98 / 132
122 Approaches to query answering Canonical database Query rewriting Counterexamples Query containment Exercise 7 Part 2: Query answering for relational data How can we solve the problem of computing the certain answers in terms of containment? M. Lenzerini A tutorial on Data Integration 99 / 132
123 Semi-structured data integration Ontology-based data integration Part 3: Beyond relational data Part III Beyond relational data M. Lenzerini A tutorial on Data Integration 100 / 132
124 Semi-structured data integration Ontology-based data integration Outline Part 3: Beyond relational data 9 Semi-structured data integration Semi-structured data and queries Graph databases 10 Ontology-based data integration M. Lenzerini A tutorial on Data Integration 101 / 132
125 Semi-structured data integration Ontology-based data integration Outline Part 3: Beyond relational data 9 Semi-structured data integration Semi-structured data and queries Graph databases 10 Ontology-based data integration M. Lenzerini A tutorial on Data Integration 102 / 132
126 Semi-structured data integration Ontology-based data integration Semi-structured data and queries Outline Part 3: Beyond relational data 9 Semi-structured data integration Semi-structured data and queries Graph databases 10 Ontology-based data integration M. Lenzerini A tutorial on Data Integration 103 / 132
127 Semi-structured data integration Ontology-based data integration Semi-structured data and queries Introduction to semi-structured data integration Part 3: Beyond relational data The global schema (and possibly the sources) is expressed in a formalism aimed at modeling data with more flexibility wrt the relational model There are at least two types of semi-structured data models Graph databases Talk 18 Paolo Guagliardo View-based query processing XML data Talk 14 Lucja Kot, XML data integration M. Lenzerini A tutorial on Data Integration 104 / 132
128 Semi-structured data integration Ontology-based data integration Graph databases Outline Part 3: Beyond relational data 9 Semi-structured data integration Semi-structured data and queries Graph databases 10 Ontology-based data integration M. Lenzerini A tutorial on Data Integration 105 / 132
129 Semi-structured data integration Ontology-based data integration Graph databases Graph databases Part 3: Beyond relational data A graph database is a finite directed graph whose edges are labeled with a given finite alphabet Σ. Each node represents an object, and an edge from x to y labeled r represents the fact that the relation r holds between x and y. The basic query language for graph databases is the language of regular path queries. A regular path query (RPQ) over Σ is defined in terms of a regular language over Σ. The answer Q(D) to an RPQ Q over a grapg database D is the set of pairs of objects connected in D by a path traversing a sequence of edges forming a word in the regular language L(Q) defined by Q. M. Lenzerini A tutorial on Data Integration 106 / 132
130 Semi-structured data integration Ontology-based data integration Graph databases Global semi-structured database Part 3: Beyond relational data sub sub calls sub var sub sub sub calls sub var calls sub var sub var var M. Lenzerini A tutorial on Data Integration 107 / 132
131 Semi-structured data integration Ontology-based data integration Graph databases Global semi-structured databases and queries Part 3: Beyond relational data sub sub a calls sub var sub sub sub calls b sub var calls sub var sub var var Regular Path Query (RPQ): (sub) (sub (calls sub)) var M. Lenzerini A tutorial on Data Integration 108 / 132
132 Semi-structured data integration Ontology-based data integration Graph databases Global semi-structured databases and queries Part 3: Beyond relational data sub sub calls sub var sub sub sub calls b sub var calls sub var sub var var a 2RPQ: (sub ) (var sub) M. Lenzerini A tutorial on Data Integration 109 / 132
133 Semi-structured data integration Ontology-based data integration Graph databases The case of RPQ with LAV mappings Part 3: Beyond relational data Given I = G, S, M, where G simply fixes the labels (alphabet Σ) of a semi-structured database the sources in S are binary relations the mapping M is of type LAV, and associates to each source s a 2RPQ w over Σ x, y s(x, y) x w y a source database C a 2RPQ Q over Σ a pair of objects t we want to determine whether t cert(q, I, C). M. Lenzerini A tutorial on Data Integration 110 / 132
134 Semi-structured data integration Ontology-based data integration Graph databases Query answering: Technique Part 3: Beyond relational data We search for a counterexample to t cert(q, I, C), i.e., a database B sem C (I) such that t cert(q, I, C) Crucial point: it is sufficient to restrict our attention to canonical databases, i.e., databases B that can be represented by a word w B $ d 1 w 1 d 2 $ d 3 w 2 d 4 $ $ d 2m 1 w m d 2m $ where d 1,..., d 2m are constants in C, w i Σ +, and $ acts as a separator Use word-automata theoretic techniques! [Calvanese & al. PODS 2000] M. Lenzerini A tutorial on Data Integration 111 / 132
135 Semi-structured data integration Ontology-based data integration Graph databases Query answering: Technique Part 3: Beyond relational data To check whether (c, d) cert(q, I, C), we check for nonemptiness of A, that is the intersection of the one-way automaton A 0 that accepts words that represent databases, i.e., words of the form ($ C Σ + C) $ the one-way automata corresponding to the various A (Si,a,b) (for each source S i and for each pair (a, b) S C i ) the one-way automaton corresponding to the complement of A (Q,c,d) Indeed, any word accepted by such intersection automaton represents a counterexample to (c, d) cert(q, I, C). M. Lenzerini A tutorial on Data Integration 112 / 132
136 Semi-structured data integration Ontology-based data integration Graph databases Query answering: Complexity Part 3: Beyond relational data All two-way automata constructed above are of linear size in the size of Q, the queries associated to S 1,..., S k, and S C 1,..., SC k. Hence, the corresponding one-way automata would be exponential. However, we do not need to construct A explicitly. Instead, we can construct it on the fly while checking for nonemptiness. Query answering for 2RPQs is PSPACE-complete in combined complexity, and conp-complete in data complexity. M. Lenzerini A tutorial on Data Integration 113 / 132
137 Semi-structured data integration Ontology-based data integration Outline Part 3: Beyond relational data 9 Semi-structured data integration Semi-structured data and queries Graph databases 10 Ontology-based data integration M. Lenzerini A tutorial on Data Integration 114 / 132
138 Semi-structured data integration Ontology-based data integration The use of ontologies in data integration Part 3: Beyond relational data The global schema is expressed as an ontology, aimed at modeling the domain of discourse from a conceptual point of view, in turn expressed in termis of logic. Description Logics (DLs) [Baader & al. 2003] are logics specifically designed to represent and reason on structured knowledge. The domain of interest is composed of objects and is structured into: concepts, which correspond to classes, and denote sets of objects roles, which correspond to (binary) relationships, and denote binary relations on objects The knowledge is asserted through so-called assertions, i.e., logical axioms. M. Lenzerini A tutorial on Data Integration 115 / 132
139 Semi-structured data integration Ontology-based data integration Brief history of Description Logics Part 3: Beyond relational data 1977 KL-ONE Workshop: from Semantic Networks and Frames to Description Logics 1984 Trade-off expressiveness complexity of inference [Brachman & al. 1984] 1986 Description logics for conceptual modeling 1989 Classic system polynomial inference, but no assertions 1990 Expressive DLs tableaux correspondence with modal logic and PDLs automata 1995 Conceptual models fully captured in DLs 1998 Optimized tableaux make expressive DLs practical Query answering in DLs 2000 Standardization efforts OIL, DAML+OIL, OWL, OWL Polynomial DLs with assertions EL, DL-Lite M. Lenzerini A tutorial on Data Integration 116 / 132
140 Semi-structured data integration Ontology-based data integration Ingredients of a Description Logic Part 3: Beyond relational data A DL is characterized by: 1 A description language: how to form concepts and roles Human Male haschild haschild.(doctor Lawyer) 2 A mechanism to specify knowledge about concepts and roles (i.e., a TBox) T = { Father Human Male haschild, HappyFather Father haschild.(doctor Lawyer) } 3 A mechanism to specify properties of objects (i.e., an ABox) A = { HappyFather(john), haschild(john, mary) } 4 A set of inference services: how to reason on a given KB T = HappyFather haschild.(doctor Lawyer) T A = (Doctor Lawyer)(mary) M. Lenzerini A tutorial on Data Integration 117 / 132
141 Semi-structured data integration Ontology-based data integration Description language Part 3: Beyond relational data A description language provides the means for defining: concepts, corresponding to classes: interpreted as sets of objects; roles, corresponding to relationships: interpreted as binary relations on objects. To define concepts and roles: We start from a (finite) alphabet of atomic concepts and atomic roles, i.e., simply names for concept and roles. Then, by applying specific constructors, we can build complex concepts and roles, starting from the atomic ones. A description language is characterized by the set of constructs that are available for that. M. Lenzerini A tutorial on Data Integration 118 / 132
142 Semi-structured data integration Ontology-based data integration Semantics of a description language Part 3: Beyond relational data The formal semantics of DLs is given in terms of interpretations. An interpretation I = ( I, I) consists of: a nonempty set I, the domain of I an interpretation function I, which maps each individual a to an element a I of I each atomic concept A to a subset A I of I each atomic role P to a subset P I of I I The interpretation function is extended to complex concepts and roles according to their syntactic structure. M. Lenzerini A tutorial on Data Integration 119 / 132
143 Semi-structured data integration Ontology-based data integration Concept constructors Part 3: Beyond relational data Construct Syntax Example Semantics atomic concept A Doctor A I I atomic role P haschild P I I I atomic negation A Doctor I \ A I conjunction C D Hum Male C I D I (unqual.) exist. res. R haschild { a b. (a, b) R I } value restriction R.C haschild.male {a b. (a, b) R I b C I } bottom (C, D denote arbitrary concepts and R an arbitrary role) The above constructs form the basic language AL of the family of AL languages. M. Lenzerini A tutorial on Data Integration 120 / 132
144 Semi-structured data integration Ontology-based data integration Concept constructors Part 3: Beyond relational data Construct Syntax Example Semantics atomic concept A Doctor A I I atomic role P haschild P I I I atomic negation A Doctor I \ A I conjunction C D Hum Male C I D I (unqual.) exist. res. R haschild { a b. (a, b) R I } value restriction R.C haschild.male {a b. (a, b) R I b C I } bottom (C, D denote arbitrary concepts and R an arbitrary role) The above constructs form the basic language AL of the family of AL languages. M. Lenzerini A tutorial on Data Integration 120 / 132
145 Semi-structured data integration Ontology-based data integration Further examples of DL constructs Part 3: Beyond relational data Disjunction U: Doctor Lawyer Qualified existential restriction E: haschild.doctor Full negation C: (Doctor Lawyer) Number restrictions N : ( 2 haschild) ( 1 sibling) Qualified number restrictions Q: ( 2 haschild. Doctor) Inverse role I: haschild.doctor Reflexive-transitive role closure reg : haschild.doctor M. Lenzerini A tutorial on Data Integration 121 / 132
146 Semi-structured data integration Ontology-based data integration Structural properties vs. asserted properties Part 3: Beyond relational data We have seen how to build complex concept and roles expressions, which allow one to denote classes with a complex structure. However, in order to represent real world domains, one needs the ability to assert properties of classes and relationships between them (e.g., as done in UML class diagrams). The assertion of properties is done in DLs by means of an ontology. M. Lenzerini A tutorial on Data Integration 122 / 132
147 Semi-structured data integration Ontology-based data integration Description Logics ontology Part 3: Beyond relational data Is a pair O = T, A, where T is a TBox and A is an ABox: The TBox consists of a set of assertions on concepts and roles: Inclusion assertions on concepts: C 1 C 2 Inclusion assertions on roles: R 1 R 2 Property assertions on (atomic) roles: (transitive P ) (symmetric P ) (domain P C) (functional P ) (reflexive P ) (range P C) The ABox consists of a set of membership assertions on individuals: for concepts: A(c) for roles: P (c 1, c 2 ) (we use c i to denote individuals) M. Lenzerini A tutorial on Data Integration 123 / 132
148 Semi-structured data integration Ontology-based data integration Description Logics ontology Part 3: Beyond relational data Is a pair O = T, A, where T is a TBox and A is an ABox: The TBox consists of a set of assertions on concepts and roles: Inclusion assertions on concepts: C 1 C 2 Inclusion assertions on roles: R 1 R 2 Property assertions on (atomic) roles: (transitive P ) (symmetric P ) (domain P C) (functional P ) (reflexive P ) (range P C) The ABox consists of a set of membership assertions on individuals: for concepts: A(c) for roles: P (c 1, c 2 ) (we use c i to denote individuals) M. Lenzerini A tutorial on Data Integration 123 / 132
149 Semi-structured data integration Ontology-based data integration Description Logics ontology Example Part 3: Beyond relational data Note: We use C 1 C 2 as an abbreviation for C 1 C 2, C 2 C 1. TBox assertions: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) ABox membership assertions: Teacher(mary), hasfather(mary, john), HappyAnc(john) M. Lenzerini A tutorial on Data Integration 124 / 132
150 Semi-structured data integration Ontology-based data integration Description Logics ontology Example Part 3: Beyond relational data Note: We use C 1 C 2 as an abbreviation for C 1 C 2, C 2 C 1. TBox assertions: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) ABox membership assertions: Teacher(mary), hasfather(mary, john), HappyAnc(john) M. Lenzerini A tutorial on Data Integration 124 / 132
151 Semi-structured data integration Ontology-based data integration Semantics of a Description Logics ontology Part 3: Beyond relational data The semantics is given by specifying when an interpretation I satisfies an assertion: C 1 C 2 is satisfied by I if C I 1 CI 2. R 1 R 2 is satisfied by I if R I 1 RI 2. A property assertion (prop P ) is satisfied by I if P I is a relation that has the property prop. A(c) is satisfied by I if c I A I. P (c 1, c 2 ) is satisfied by I if (c I 1, ci 2 ) P I. This leads to the notion of model of a DL ontology. An interpretation I is a model of O = T, A if it satisfies all assertions in T and all assertions in A. M. Lenzerini A tutorial on Data Integration 125 / 132
152 Semi-structured data integration Ontology-based data integration Semantics of a Description Logics ontology Part 3: Beyond relational data The semantics is given by specifying when an interpretation I satisfies an assertion: C 1 C 2 is satisfied by I if C I 1 CI 2. R 1 R 2 is satisfied by I if R I 1 RI 2. A property assertion (prop P ) is satisfied by I if P I is a relation that has the property prop. A(c) is satisfied by I if c I A I. P (c 1, c 2 ) is satisfied by I if (c I 1, ci 2 ) P I. This leads to the notion of model of a DL ontology. An interpretation I is a model of O = T, A if it satisfies all assertions in T and all assertions in A. M. Lenzerini A tutorial on Data Integration 125 / 132
153 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
154 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
155 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
156 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
157 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
158 Semi-structured data integration Ontology-based data integration Example Part 3: Beyond relational data boss 1..* 1..1 AreaManager Employee empcode: Integer salary: Integer {disjoint, complete} Manager 3..* TopManager 1..* Project projectname: String worksfor 1..1 manages 1..1 Note: Domain and range of associations are expressed by means of concept inclusions. Manager Employee AreaManager Manager TopManager Manager Manager AreaManager TopManager AreaManager TopManager Employee salary salary Integer worksfor Employee worksfor Project Employee worksfor Project ( 3 worksfor ) (funct manages) (funct manages ) manages worksfor M. Lenzerini A tutorial on Data Integration 126 / 132
159 Semi-structured data integration Ontology-based data integration TBox reasoning Part 3: Beyond relational data Concept Satisfiability: C is satisfiable wrt T, if C I is not empty for some model I of T. Subsumption: C 1 is subsumed by C 2 wrt T, if C I 1 CI 2 for every model I of T. Equivalence: C 1 and C 2 are equivalent wrt T, if C1 I = CI 2 T. for every model I of Disjointness: C 1 and C 2 are disjoint wrt T, if C1 I CI 2 = for every model I of T. Analogous definitions hold for role satisfiability, subsumption, equivalence, and disjointness. M. Lenzerini A tutorial on Data Integration 127 / 132
160 Semi-structured data integration Ontology-based data integration Reasoning over an ontology Part 3: Beyond relational data Ontology Satisfiability: Verify whether an ontology O is satisfiable, i.e., whether O admits at least one model. Concept Instance Checking: Verify whether an individual c is an instance of a concept C in every model of O. Role Instance Checking: Verify whether a pair (c 1, c 2 ) of individuals is an instance of a role R in every model of O. Query Answering: see later... M. Lenzerini A tutorial on Data Integration 128 / 132
161 Semi-structured data integration Ontology-based data integration Reasoning in Description Logics Example Part 3: Beyond relational data TBox: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) The above TBox logically implies: HappyAncestor Father. ABox: Teacher(mary), hasfather(mary, john), HappyAnc(john) The above TBox and ABox logically imply: Happy(mary) M. Lenzerini A tutorial on Data Integration 129 / 132
162 Semi-structured data integration Ontology-based data integration Reasoning in Description Logics Example Part 3: Beyond relational data TBox: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) The above TBox logically implies: HappyAncestor Father. ABox: Teacher(mary), hasfather(mary, john), HappyAnc(john) The above TBox and ABox logically imply: Happy(mary) M. Lenzerini A tutorial on Data Integration 129 / 132
163 Semi-structured data integration Ontology-based data integration Reasoning in Description Logics Example Part 3: Beyond relational data TBox: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) The above TBox logically implies: HappyAncestor Father. ABox: Teacher(mary), hasfather(mary, john), HappyAnc(john) The above TBox and ABox logically imply: Happy(mary) M. Lenzerini A tutorial on Data Integration 129 / 132
164 Semi-structured data integration Ontology-based data integration Reasoning in Description Logics Example Part 3: Beyond relational data TBox: Inclusion assertions on concepts: Father Human Male haschild HappyFather Father haschild.(doctor Lawyer Happy) HappyAnc descendant.happyfather Teacher Doctor Lawyer Inclusion assertions on roles: haschild descendant hasfather haschild Property assertions on roles: (transitive descendant), (reflexive descendant), (functional hasfather) The above TBox logically implies: HappyAncestor Father. ABox: Teacher(mary), hasfather(mary, john), HappyAnc(john) The above TBox and ABox logically imply: Happy(mary) M. Lenzerini A tutorial on Data Integration 129 / 132
165 Semi-structured data integration Ontology-based data integration Complexity of reasoning over DL ontologies Part 3: Beyond relational data TBox reasoning over DL ontologies is in general complex: TBox reasoning over ontologies in virtually all traditional DLs is ExpTime-hard Stays in ExpTime even in the most expressive DLs (except when using nominals, i.e., ObjectOneOf). There are TBox reasoners that perform reasonably well in practice for such DLs (e.g, Racer, Pellet, Fact++,... ) M. Lenzerini A tutorial on Data Integration 130 / 132
166 Semi-structured data integration Ontology-based data integration Queries over Description Logics ontologies Part 3: Beyond relational data If we want to use ontologies as global schemas in data integration, we have to allow for queries expressed over a DL ontology A conjunctive query q( x) over an ontology O = T, A has the form q( x) y. conj ( x, y) where conj ( x, y) is a conjunction of atoms which has as predicate symbol an atomic concept or role of T, and may use variables and constants that are individuals in A The certain answers to q( x) over O = T, A, denoted cert(q, O) are the tuples c of constants such that c q I, for every model I of O. DLs must be restricted considerably if we want tractable conjunctive query answering (even when the complexity is measured wrt the size of the ABox only) M. Lenzerini A tutorial on Data Integration 131 / 132
167 Semi-structured data integration Ontology-based data integration Related talks at DEIS 10 Part 3: Beyond relational data Talk 2 Piotr Wieczorek, Query answering in data integration Talk 7 Slawomir Staworko, Consistent query answering Talk 8 Yazmin A. Ibanez, Description logics for data integration Talk 9 Ekaterini Ioannou, Data cleaning for data integration Talk 10 Armin Roth, Peer data management systems Talk 11 Sebastian Skritek, Theory of Peer Data Management Talk 14 Lucja Kot, XML data integration Talk 18 Paolo Guagliardo View-based query processing Talk 22 Marie Jacob, Learning and discovering queries and mappings M. Lenzerini A tutorial on Data Integration 132 / 132
Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza
Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213 Structure
Query Processing in Data Integration Systems
Query Processing in Data Integration Systems Diego Calvanese Free University of Bozen-Bolzano BIT PhD Summer School Bressanone July 3 7, 2006 D. Calvanese Data Integration BIT PhD Summer School 1 / 152
Data Integration: A Theoretical Perspective
Data Integration: A Theoretical Perspective Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Via Salaria 113, I 00198 Roma, Italy [email protected] ABSTRACT
Chapter 17 Using OWL in Data Integration
Chapter 17 Using OWL in Data Integration Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, and Marco Ruzzi Abstract One of the outcomes of the research work carried
Structured and Semi-Structured Data Integration
UNIVERSITÀ DEGLI STUDI DI ROMA LA SAPIENZA DOTTORATO DI RICERCA IN INGEGNERIA INFORMATICA XIX CICLO 2006 UNIVERSITÉ DE PARIS SUD DOCTORAT DE RECHERCHE EN INFORMATIQUE Structured and Semi-Structured Data
Enterprise Modeling and Data Warehousing in Telecom Italia
Enterprise Modeling and Data Warehousing in Telecom Italia Diego Calvanese Faculty of Computer Science Free University of Bolzano/Bozen Piazza Domenicani 3 I-39100 Bolzano-Bozen BZ, Italy Luigi Dragone,
Repair Checking in Inconsistent Databases: Algorithms and Complexity
Repair Checking in Inconsistent Databases: Algorithms and Complexity Foto Afrati 1 Phokion G. Kolaitis 2 1 National Technical University of Athens 2 UC Santa Cruz and IBM Almaden Research Center Oxford,
XML Data Integration
XML Data Integration Lucja Kot Cornell University 11 November 2010 Lucja Kot (Cornell University) XML Data Integration 11 November 2010 1 / 42 Introduction Data Integration and Query Answering A data integration
Robust Module-based Data Management
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. V, NO. N, MONTH YEAR 1 Robust Module-based Data Management François Goasdoué, LRI, Univ. Paris-Sud, and Marie-Christine Rousset, LIG, Univ. Grenoble
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases Andrea Calì Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Università di Roma
Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange
Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact
A Framework and Architecture for Quality Assessment in Data Integration
A Framework and Architecture for Quality Assessment in Data Integration Jianing Wang March 2012 A Dissertation Submitted to Birkbeck College, University of London in Partial Fulfillment of the Requirements
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan
Integrating and Exchanging XML Data using Ontologies
Integrating and Exchanging XML Data using Ontologies Huiyong Xiao and Isabel F. Cruz Department of Computer Science University of Illinois at Chicago {hxiao ifc}@cs.uic.edu Abstract. While providing a
CHAPTER 7 GENERAL PROOF SYSTEMS
CHAPTER 7 GENERAL PROOF SYSTEMS 1 Introduction Proof systems are built to prove statements. They can be thought as an inference machine with special statements, called provable statements, or sometimes
Peer Data Exchange. ACM Transactions on Database Systems, Vol. V, No. N, Month 20YY, Pages 1 0??.
Peer Data Exchange Ariel Fuxman 1 University of Toronto Phokion G. Kolaitis 2 IBM Almaden Research Center Renée J. Miller 1 University of Toronto and Wang-Chiew Tan 3 University of California, Santa Cruz
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
Query Answering in Peer-to-Peer Data Exchange Systems
Query Answering in Peer-to-Peer Data Exchange Systems Leopoldo Bertossi and Loreto Bravo Carleton University, School of Computer Science, Ottawa, Canada {bertossi,lbravo}@scs.carleton.ca Abstract. The
Peer Data Management Systems Concepts and Approaches
Peer Data Management Systems Concepts and Approaches Armin Roth HPI, Potsdam, Germany Nov. 10, 2010 Armin Roth (HPI, Potsdam, Germany) Peer Data Management Systems Nov. 10, 2010 1 / 28 Agenda 1 Large-scale
Semantic Technologies for Data Integration using OWL2 QL
Semantic Technologies for Data Integration using OWL2 QL ESWC 2009 Tutorial Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Sapienza Università di Roma, Italy 6th European Semantic
Logical and categorical methods in data transformation (TransLoCaTe)
Logical and categorical methods in data transformation (TransLoCaTe) 1 Introduction to the abbreviated project description This is an abbreviated project description of the TransLoCaTe project, with an
OWL based XML Data Integration
OWL based XML Data Integration Manjula Shenoy K Manipal University CSE MIT Manipal, India K.C.Shet, PhD. N.I.T.K. CSE, Suratkal Karnataka, India U. Dinesh Acharya, PhD. ManipalUniversity CSE MIT, Manipal,
Data Integration Using ID-Logic
Data Integration Using ID-Logic Bert Van Nuffelen 1, Alvaro Cortés-Calabuig 1, Marc Denecker 1, Ofer Arieli 2, and Maurice Bruynooghe 1 1 Department of Computer Science, Katholieke Universiteit Leuven,
Data Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
Virtual Data Integration
Virtual Data Integration Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada www.scs.carleton.ca/ bertossi [email protected] Chapter 1: Introduction and Issues 3 Data
Data exchange. L. Libkin 1 Data Integration and Exchange
Data exchange Source schema, target schema; need to transfer data between them. A typical scenario: Two organizations have their legacy databases, schemas cannot be changed. Data from one organization
Completing Description Logic Knowledge Bases using Formal Concept Analysis
Completing Description Logic Knowledge Bases using Formal Concept Analysis Franz Baader, 1 Bernhard Ganter, 1 Barış Sertkaya, 1 and Ulrike Sattler 2 1 TU Dresden, Germany and 2 The University of Manchester,
Formalization of the CRM: Initial Thoughts
Formalization of the CRM: Initial Thoughts Carlo Meghini Istituto di Scienza e Tecnologie della Informazione Consiglio Nazionale delle Ricerche Pisa CRM SIG Meeting Iraklio, October 1st, 2014 Outline Overture:
On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper)
On Rewriting and Answering ueries in OBDA Systems for Big Data (Short Paper) Diego Calvanese 2, Ian Horrocks 3, Ernesto Jimenez-Ruiz 3, Evgeny Kharlamov 3, Michael Meier 1 Mariano Rodriguez-Muro 2, Dmitriy
Scalable Automated Symbolic Analysis of Administrative Role-Based Access Control Policies by SMT solving
Scalable Automated Symbolic Analysis of Administrative Role-Based Access Control Policies by SMT solving Alessandro Armando 1,2 and Silvio Ranise 2, 1 DIST, Università degli Studi di Genova, Italia 2 Security
A first step towards modeling semistructured data in hybrid multimodal logic
A first step towards modeling semistructured data in hybrid multimodal logic Nicole Bidoit * Serenella Cerrito ** Virginie Thion * * LRI UMR CNRS 8623, Université Paris 11, Centre d Orsay. ** LaMI UMR
An introduction to data integration. Prof. Letizia Tanca Technologies for Information Systems
An introduction to data integration Prof. Letizia Tanca Technologies for Information Systems 1 Motivation In modern Information Systems there is a growing quest for achieving integration of SW applications,
Schema Mediation in Peer Data Management Systems
Schema Mediation in Peer Data Management Systems Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov University of Washington Seattle, WA, USA 98195-2350 {alon,zives,suciu,igor}@cs.washington.edu Abstract
Fixed-Point Logics and Computation
1 Fixed-Point Logics and Computation Symposium on the Unusual Effectiveness of Logic in Computer Science University of Cambridge 2 Mathematical Logic Mathematical logic seeks to formalise the process of
! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs
,!0((,.+#$),%$(-&.& *,2(-$)%&2.'3&%!&, Logics for Data and Knowledge Representation Alessandro Agostini [email protected] University of Trento Fausto Giunchiglia [email protected] The Logic of Descriptions!$%&'()*$#)
Algorithmic Software Verification
Algorithmic Software Verification (LTL Model Checking) Azadeh Farzan What is Verification Anyway? Proving (in a formal way) that program satisfies a specification written in a logical language. Formal
Schema Mappings and Data Exchange
Schema Mappings and Data Exchange Phokion G. Kolaitis University of California, Santa Cruz & IBM Research-Almaden EASSLC 2012 Southwest University August 2012 1 Logic and Databases Extensive interaction
Advanced Relational Database Design
APPENDIX B Advanced Relational Database Design In this appendix we cover advanced topics in relational database design. We first present the theory of multivalued dependencies, including a set of sound
ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS
PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Mathematical Sciences 2012 1 p. 43 48 ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS I nf or m at i cs L. A. HAYKAZYAN * Chair of Programming and Information
Static Program Transformations for Efficient Software Model Checking
Static Program Transformations for Efficient Software Model Checking Shobha Vasudevan Jacob Abraham The University of Texas at Austin Dependable Systems Large and complex systems Software faults are major
Designing and Refining Schema Mappings via Data Examples
Designing and Refining Schema Mappings via Data Examples Bogdan Alexe UCSC [email protected] Balder ten Cate UCSC [email protected] Phokion G. Kolaitis UCSC & IBM Research - Almaden [email protected]
Monitoring Metric First-order Temporal Properties
Monitoring Metric First-order Temporal Properties DAVID BASIN, FELIX KLAEDTKE, SAMUEL MÜLLER, and EUGEN ZĂLINESCU, ETH Zurich Runtime monitoring is a general approach to verifying system properties at
SOLUTIONS TO ASSIGNMENT 1 MATH 576
SOLUTIONS TO ASSIGNMENT 1 MATH 576 SOLUTIONS BY OLIVIER MARTIN 13 #5. Let T be the topology generated by A on X. We want to show T = J B J where B is the set of all topologies J on X with A J. This amounts
Logic Programs for Consistently Querying Data Integration Systems
IJCAI 03 1 Acapulco, August 2003 Logic Programs for Consistently Querying Data Integration Systems Loreto Bravo Computer Science Department Pontificia Universidad Católica de Chile [email protected] Joint
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Logic in general. Inference rules and theorem proving
Logical Agents Knowledge-based agents Logic in general Propositional logic Inference rules and theorem proving First order logic Knowledge-based agents Inference engine Knowledge base Domain-independent
Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.
DBMS Architecture INSTRUCTION OPTIMIZER Database Management Systems MANAGEMENT OF ACCESS METHODS BUFFER MANAGER CONCURRENCY CONTROL RELIABILITY MANAGEMENT Index Files Data Files System Catalog BASE It
Computational Methods for Database Repair by Signed Formulae
Computational Methods for Database Repair by Signed Formulae Ofer Arieli ([email protected]) Department of Computer Science, The Academic College of Tel-Aviv, 4 Antokolski street, Tel-Aviv 61161, Israel.
(LMCS, p. 317) V.1. First Order Logic. This is the most powerful, most expressive logic that we will examine.
(LMCS, p. 317) V.1 First Order Logic This is the most powerful, most expressive logic that we will examine. Our version of first-order logic will use the following symbols: variables connectives (,,,,
Aikaterini Marazopoulou
Imperial College London Department of Computing Tableau Compiled Labelled Deductive Systems with an application to Description Logics by Aikaterini Marazopoulou Submitted in partial fulfilment of the requirements
Constraint-Based XML Query Rewriting for Data Integration
Constraint-Based XML Query Rewriting for Data Integration Cong Yu Department of EECS, University of Michigan [email protected] Lucian Popa IBM Almaden Research Center [email protected] ABSTRACT
A Secure Mediator for Integrating Multiple Level Access Control Policies
A Secure Mediator for Integrating Multiple Level Access Control Policies Isabel F. Cruz Rigel Gjomemo Mirko Orsini ADVIS Lab Department of Computer Science University of Illinois at Chicago {ifc rgjomemo
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Software Modeling and Verification
Software Modeling and Verification Alessandro Aldini DiSBeF - Sezione STI University of Urbino Carlo Bo Italy 3-4 February 2015 Algorithmic verification Correctness problem Is the software/hardware system
Integrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
Principles of Distributed Database Systems
M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition
