Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza
|
|
|
- Aileen Stanley
- 9 years ago
- Views:
Transcription
1 Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213
2 Structure of the course 1 Introduction to data integration Basic issues in data integration Logical formalization 2 Query answering in the absence of constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 3 Query answering in the presence of constraints The role of integrity constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting Inconsistency tolerance 4 Concluding remarks M. Lenzerini Data Integration DASI 06 2 / 213
3 Structure of the course 1 Introduction to data integration Basic issues in data integration Logical formalization 2 Query answering in the absence of constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 3 Query answering in the presence of constraints The role of integrity constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting Inconsistency tolerance 4 Concluding remarks M. Lenzerini Data Integration DASI 06 2 / 213
4 Structure of the course 1 Introduction to data integration Basic issues in data integration Logical formalization 2 Query answering in the absence of constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 3 Query answering in the presence of constraints The role of integrity constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting Inconsistency tolerance 4 Concluding remarks M. Lenzerini Data Integration DASI 06 2 / 213
5 Structure of the course 1 Introduction to data integration Basic issues in data integration Logical formalization 2 Query answering in the absence of constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 3 Query answering in the presence of constraints The role of integrity constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting Inconsistency tolerance 4 Concluding remarks M. Lenzerini Data Integration DASI 06 2 / 213
6 Basic issues in data integration Data integration: Logical formalization Part 1: Introduction to data integration Part I Introduction to data integration M. Lenzerini Data Integration DASI 06 3 / 213
7 Basic issues in data integration Data integration: Logical formalization Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI 06 4 / 213
8 Basic issues in data integration Data integration: Logical formalization Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI 06 5 / 213
9 Basic issues in data integration Data integration: Logical formalization The problem of data integration Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI 06 6 / 213
10 Basic issues in data integration Data integration: Logical formalization The problem of data integration What is data integration? Part 1: Introduction to data integration Data integration is the problem of providing unified and transparent access to a collection of data stored in multiple, autonomous, and heterogeneous data sources Answer(Q) Query Global Schema Sources M. Lenzerini Data Integration DASI 06 7 / 213
11 Basic issues in data integration Data integration: Logical formalization The problem of data integration Part 1: Introduction to data integration Conceptual architecture of a data integration system Query Global Schema A R B C S D T E Mapping U V W u 1 v 1 w 1 u 2 v 2 w 2 Source Schema 1 Source Schema 2 Source 1 Source 2 M. Lenzerini Data Integration DASI 06 8 / 213
12 Basic issues in data integration Data integration: Logical formalization The problem of data integration Relevance of data integration Part 1: Introduction to data integration Growing market One of the major challenges for the future of IT At least two contexts Intra-organization data integration (e.g., EIS) Inter-organization data integration (e.g., integration on the Web) M. Lenzerini Data Integration DASI 06 9 / 213
13 Basic issues in data integration Data integration: Logical formalization The problem of data integration Data integration: Available industrial efforts Part 1: Introduction to data integration Distributed database systems Information on demand Tools for source wrapping Tools based on database federation, e.g., DB2 Information Integrator Distributed query optimization M. Lenzerini Data Integration DASI / 213
14 Basic issues in data integration Data integration: Logical formalization Variants of data integration Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
15 Basic issues in data integration Data integration: Logical formalization Variants of data integration Part 1: Introduction to data integration Architectures for integrated access to distributed data Distributed databases data sources are homogeneous databases under the control of the distributed database management system Multidatabase or federated databases data sources are autonomous, heterogeneous databases; procedural specification (Mediator-based) data integration access through a global schema mapped to autonomous and heterogeneous data sources; declarative specification Peer-to-peer data integration network of autonomous systems mapped one to each other, without a global schema; declarative specification M. Lenzerini Data Integration DASI / 213
16 Basic issues in data integration Data integration: Logical formalization Variants of data integration Database federation tools: Characteristics Part 1: Introduction to data integration Physical transparency, i.e., masking from the user the physical characteristics of the sources Heterogeinity, i.e., federating highly diverse types of sources Extensibility Autonomy of data sources Performance, through distributed query optimization However, current tools do not (directly) support logical (or conceptual) transparency M. Lenzerini Data Integration DASI / 213
17 Basic issues in data integration Data integration: Logical formalization Variants of data integration Database federation tools: Characteristics Part 1: Introduction to data integration Physical transparency, i.e., masking from the user the physical characteristics of the sources Heterogeinity, i.e., federating highly diverse types of sources Extensibility Autonomy of data sources Performance, through distributed query optimization However, current tools do not (directly) support logical (or conceptual) transparency M. Lenzerini Data Integration DASI / 213
18 Basic issues in data integration Data integration: Logical formalization Variants of data integration DB2 Information integrator Part 1: Introduction to data integration DB2 Database Server Local SQL Engine Wrapper... Wrapper Server Server... Server Server Nickname Nickname Nickname Server Local Tables External Source External Source External Source M. Lenzerini Data Integration DASI / 213
19 Basic issues in data integration Data integration: Logical formalization Variants of data integration Logical transparency Part 1: Introduction to data integration Basic ingredients for achieving logical transparency: The global schema (ontology) provides a conceptual view that is independent from the sources The global schema is described with a semantically rich formalism The mappings are the crucial tools for realizing the independence of the global schema from the sources Obviously, the formalism for specifying the mapping is also a crucial point All the above aspects are not appropriately dealt with by current tools. This means that data integration cannot be simply addressed on a tool basis M. Lenzerini Data Integration DASI / 213
20 Basic issues in data integration Data integration: Logical formalization Variants of data integration Approaches to data integration Part 1: Introduction to data integration (Mediator-based) data integration Data exchange [Fagin & al. TCS 05, Kolaitis PODS 05] materialization of the global view allows for query answering without accessing the sources P2P data integration [Halevy & al. ICDE 03, Calvanese & al. PODS 04, Calvanese & al. DBPL 05] several peers each peer with local and external sources queries over one peer M. Lenzerini Data Integration DASI / 213
21 Basic issues in data integration Data integration: Logical formalization Variants of data integration Approaches to data integration Part 1: Introduction to data integration (Mediator-based) data integration... is the topic of this course Data exchange [Fagin & al. TCS 05, Kolaitis PODS 05] materialization of the global view allows for query answering without accessing the sources P2P data integration [Halevy & al. ICDE 03, Calvanese & al. PODS 04, Calvanese & al. DBPL 05] several peers each peer with local and external sources queries over one peer M. Lenzerini Data Integration DASI / 213
22 Basic issues in data integration Data integration: Logical formalization Variants of data integration Mediator based data integration Part 1: Introduction to data integration Queries are expressed over a global schema (a.k.a. mediated schema, enterprise model,... ) Data are stored in a set of sources Wrappers access the sources (provide a view in a uniform data model of the data stored in the sources) Mediators combine answers coming from wrappers and/or other mediators Answer(Q) Query Global Schema Sources M. Lenzerini Data Integration DASI / 213
23 Basic issues in data integration Data integration: Logical formalization Variants of data integration Data exchange Part 1: Introduction to data integration Materialization of the global schema Materialize Global Schema Sources M. Lenzerini Data Integration DASI / 213
24 Basic issues in data integration Data integration: Logical formalization Variants of data integration Peer-to-peer data integration Part 1: Introduction to data integration P 1 P 2 P 3 P 4 P 5 Peer Local source External source Peer schema Local mapping P2P mapping Operations: Answer(Q,P i ) Materialize(P i ) M. Lenzerini Data Integration DASI / 213
25 Basic issues in data integration Data integration: Logical formalization Problems in data integration Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
26 Basic issues in data integration Data integration: Logical formalization Problems in data integration Main problems in data integration Part 1: Introduction to data integration 1 How to construct the global schema 2 (Automatic) source wrapping 3 How to discover mappings between sources and global schema 4 Limitations in mechanisms for accessing sources 5 Data extraction, cleaning, and reconciliation 6 How to process updates expressed on the global schema and/or the sources ( read/write vs. read-only data integration) 7 How to model the global schema, the sources, and the mappings between the two 8 How to answer queries expressed on the global schema 9 How to optimize query answering M. Lenzerini Data Integration DASI / 213
27 Basic issues in data integration Data integration: Logical formalization Problems in data integration Main problems in data integration Part 1: Introduction to data integration 1 How to construct the global schema 2 (Automatic) source wrapping 3 How to discover mappings between sources and global schema 4 Limitations in mechanisms for accessing sources 5 Data extraction, cleaning, and reconciliation 6 How to process updates expressed on the global schema and/or the sources ( read/write vs. read-only data integration) 7 How to model the global schema, the sources, and the mappings between the two 8 How to answer queries expressed on the global schema 9 How to optimize query answering M. Lenzerini Data Integration DASI / 213
28 Basic issues in data integration Data integration: Logical formalization Problems in data integration The modeling problem Part 1: Introduction to data integration Basic questions: How to model the global schema data model constraints How to model the sources data model (conceptual and logical level) access limitations data values (common vs. different domains) How to model the mapping between global schemas and sources How to verify the quality of the modeling process A word of caution: Data modeling (in data integration) is an art. Theoretical frameworks can help humans, not replace them M. Lenzerini Data Integration DASI / 213
29 Basic issues in data integration Data integration: Logical formalization Problems in data integration The modeling problem Part 1: Introduction to data integration Basic questions: How to model the global schema data model constraints How to model the sources data model (conceptual and logical level) access limitations data values (common vs. different domains) How to model the mapping between global schemas and sources How to verify the quality of the modeling process A word of caution: Data modeling (in data integration) is an art. Theoretical frameworks can help humans, not replace them M. Lenzerini Data Integration DASI / 213
30 Basic issues in data integration Data integration: Logical formalization Problems in data integration The querying problem Part 1: Introduction to data integration A query expressed in terms of the global schema must be reformulated in terms of (a set of) queries over the sources and/or materialized views The computed sub-queries are shipped to the sources, and the results are collected and assembled into the final answer The computed query plan should guarantee completeness of the obtained answers wrt the semantics efficiency of the whole query answering process efficiency in accessing sources This process heavily depends on the approach adopted for modeling the data integration system This is the problem that we want to address in this course M. Lenzerini Data Integration DASI / 213
31 Basic issues in data integration Data integration: Logical formalization Problems in data integration The querying problem Part 1: Introduction to data integration A query expressed in terms of the global schema must be reformulated in terms of (a set of) queries over the sources and/or materialized views The computed sub-queries are shipped to the sources, and the results are collected and assembled into the final answer The computed query plan should guarantee completeness of the obtained answers wrt the semantics efficiency of the whole query answering process efficiency in accessing sources This process heavily depends on the approach adopted for modeling the data integration system This is the problem that we want to address in this course M. Lenzerini Data Integration DASI / 213
32 Basic issues in data integration Data integration: Logical formalization Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
33 Basic issues in data integration Data integration: Logical formalization Semantics of a data integration system Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
34 Basic issues in data integration Data integration: Logical formalization Semantics of a data integration system Formal framework for data integration Part 1: Introduction to data integration Definition A data integration system I is a triple G, S, M, where G is the global schema i.e., a logical theory over a relational alphabet A G S is the source schema i.e., simply a relational alphabet A S disjoint from A G M is the mapping between S and G We consider different approaches to the specification of mappings M. Lenzerini Data Integration DASI / 213
35 Basic issues in data integration Data integration: Logical formalization Semantics of a data integration system Semantics of a data integration system Part 1: Introduction to data integration Which are the dbs that satisfy I, i.e., the logical models of I? We refer only to dbs over a fixed infinite domain of elements We start from the data present in the sources: these are modeled through a source database C over (also called source model), fixing the extension of the predicates of A S The dbs for I are logical interpretations for A G, called global dbs Definition The set of databases for A G that satisfy I relative to C is: sem C (I) = { B B is a global database that is legal wrt G and that satisfies M wrt C } What it means to satisfy M wrt C depends on the nature of M M. Lenzerini Data Integration DASI / 213
36 Basic issues in data integration Data integration: Logical formalization Semantics of a data integration system Semantics of a data integration system Part 1: Introduction to data integration Which are the dbs that satisfy I, i.e., the logical models of I? We refer only to dbs over a fixed infinite domain of elements We start from the data present in the sources: these are modeled through a source database C over (also called source model), fixing the extension of the predicates of A S The dbs for I are logical interpretations for A G, called global dbs Definition The set of databases for A G that satisfy I relative to C is: sem C (I) = { B B is a global database that is legal wrt G and that satisfies M wrt C } What it means to satisfy M wrt C depends on the nature of M M. Lenzerini Data Integration DASI / 213
37 Basic issues in data integration Data integration: Logical formalization Relational calculus Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
38 Basic issues in data integration Data integration: Logical formalization Relational calculus Relational calculus: the basics Part 1: Introduction to data integration Basic idea: we use the language of first-order logic to express which tuples should be in the result to a query We assume to have a domain and a set Σ of constants, one for each element of Let A be a relational alphabet, i.e., a set of predicates, each with an associated arity (we assume a positional notation) A database D over A and is a set of relations, one for each predicate in A, over the constants in Σ (in turn interpreted as elements of ) Let L A be the first-order language over the constants in Σ the predicates of A plus the built-in predicates of relational algebra (e.g., <, >,... ) no function symbols M. Lenzerini Data Integration DASI / 213
39 Basic issues in data integration Data integration: Logical formalization Relational calculus Relational calculus: Syntax Part 1: Introduction to data integration Definition An (domain) relational calculus query over alphabet A has the form { (x 1,..., x n ) ϕ }, where n 0 is the arity of the query x 1,..., x n are (not necessarily distinct) variables ϕ is the body of the query, i.e., a formula of L A whose free variables are exactly x 1,..., x n (x 1,..., x n ) is called the target list of the query If r is a predicate of arity k, an atom with predicate r has the form r(y 1,..., y k ), where y 1,..., y k are variables or constants M. Lenzerini Data Integration DASI / 213
40 Basic issues in data integration Data integration: Logical formalization Relational calculus Relational calculus: Semantics Part 1: Introduction to data integration Relational calculus queries are evaluated on particular interpretations Definition A correct interpretation for relational calculus queries over A is a pair I =, D, where is a domain, and D is a database over A and Definition The value of a relational calculus query q = {(x 1,..., x n ) ϕ} in an interpretation I =, D is the set of tuples (c 1,..., c n ) of constants in Σ such that I, V = ϕ, where V is the variable assignment that assigns c i to x i When the domain is clear, we can omit it, and write directly D, V = ϕ, instead of, D, V = ϕ M. Lenzerini Data Integration DASI / 213
41 Basic issues in data integration Data integration: Logical formalization Relational calculus Relational calculus: Semantics Part 1: Introduction to data integration Relational calculus queries are evaluated on particular interpretations Definition A correct interpretation for relational calculus queries over A is a pair I =, D, where is a domain, and D is a database over A and Definition The value of a relational calculus query q = {(x 1,..., x n ) ϕ} in an interpretation I =, D is the set of tuples (c 1,..., c n ) of constants in Σ such that I, V = ϕ, where V is the variable assignment that assigns c i to x i When the domain is clear, we can omit it, and write directly D, V = ϕ, instead of, D, V = ϕ M. Lenzerini Data Integration DASI / 213
42 Basic issues in data integration Data integration: Logical formalization Relational calculus Relational calculus: Semantics Part 1: Introduction to data integration Relational calculus queries are evaluated on particular interpretations Definition A correct interpretation for relational calculus queries over A is a pair I =, D, where is a domain, and D is a database over A and Definition The value of a relational calculus query q = {(x 1,..., x n ) ϕ} in an interpretation I =, D is the set of tuples (c 1,..., c n ) of constants in Σ such that I, V = ϕ, where V is the variable assignment that assigns c i to x i When the domain is clear, we can omit it, and write directly D, V = ϕ, instead of, D, V = ϕ M. Lenzerini Data Integration DASI / 213
43 Basic issues in data integration Data integration: Logical formalization Relational calculus Result of relational calculus queries Part 1: Introduction to data integration Definition The result of the evaluation of a relational calculus query q = {(x 1,..., x n ) ϕ} on a database D over A and is the relation q D such that the arity of q D is n the extension of q D is the set of constants that constitute the value of the query q in the interpretation, D M. Lenzerini Data Integration DASI / 213
44 Basic issues in data integration Data integration: Logical formalization Relational calculus Conjunctive queries Part 1: Introduction to data integration are the most common kind of relational calculus queries also known as select-project-join SQL queries allow for easy optimization in relational DBMSs Definition A conjunctive query (CQ) is a relational calculus query of the form { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } where x is the union of the x i s, and y is the union of the y i s r 1,..., r m are relation symbols (not built-in predicates) We use the following abbreviation: { ( x) r 1 ( x 1, y 1 ),..., r m ( x m, y m ) } M. Lenzerini Data Integration DASI / 213
45 Basic issues in data integration Data integration: Logical formalization Relational calculus Conjunctive queries Part 1: Introduction to data integration are the most common kind of relational calculus queries also known as select-project-join SQL queries allow for easy optimization in relational DBMSs Definition A conjunctive query (CQ) is a relational calculus query of the form { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } where x is the union of the x i s, and y is the union of the y i s r 1,..., r m are relation symbols (not built-in predicates) We use the following abbreviation: { ( x) r 1 ( x 1, y 1 ),..., r m ( x m, y m ) } M. Lenzerini Data Integration DASI / 213
46 Basic issues in data integration Data integration: Logical formalization Relational calculus Complexity of relational calculus Part 1: Introduction to data integration We consider the complexity of the recognition problem, i.e., checking whether a tuple of constants is in the answer to a query: measured wrt the size of the database data complexity measured wrt the size of the query and the database combined complexity Complexity of relational calculus data complexity: polynomial, actually in LogSpace combined complexity: PSpace-complete Complexity of conjunctive queries data complexity: in LogSpace combined complexity: NP-complete M. Lenzerini Data Integration DASI / 213
47 Basic issues in data integration Data integration: Logical formalization Relational calculus Complexity of relational calculus Part 1: Introduction to data integration We consider the complexity of the recognition problem, i.e., checking whether a tuple of constants is in the answer to a query: measured wrt the size of the database data complexity measured wrt the size of the query and the database combined complexity Complexity of relational calculus data complexity: polynomial, actually in LogSpace combined complexity: PSpace-complete Complexity of conjunctive queries data complexity: in LogSpace combined complexity: NP-complete M. Lenzerini Data Integration DASI / 213
48 Basic issues in data integration Data integration: Logical formalization Relational calculus Complexity of relational calculus Part 1: Introduction to data integration We consider the complexity of the recognition problem, i.e., checking whether a tuple of constants is in the answer to a query: measured wrt the size of the database data complexity measured wrt the size of the query and the database combined complexity Complexity of relational calculus data complexity: polynomial, actually in LogSpace combined complexity: PSpace-complete Complexity of conjunctive queries data complexity: in LogSpace combined complexity: NP-complete M. Lenzerini Data Integration DASI / 213
49 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
50 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Queries to a data integration system I Part 1: Introduction to data integration The domain is fixed, and we do not distinguish an element of from the constant denoting it standard names Queries to I are relational calculus queries over the alphabet A G of the global schema When evaluating q over I, we have to consider that for a given source database C, there may be many global databases B in sem C (I) We consider those answers to q that hold for all global databases in sem C (I) certain answers M. Lenzerini Data Integration DASI / 213
51 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Semantics of queries to I Part 1: Introduction to data integration Definition Given q, I, and C, the set of certain answers to q wrt I and C is cert(q, I, C) = { (c 1,..., c n ) q B for all B sem C (I) } Query answering is logical implication Complexity is measured mainly wrt the size of the source db C, i.e., we consider data complexity We consider the problem of deciding whether c cert(q, I, C), for a given c M. Lenzerini Data Integration DASI / 213
52 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Part 1: Introduction to data integration Databases with incomplete information, or knowledge bases Traditional database: one model of a first-order theory Query answering means evaluating a formula in the model Database with incomplete information, or knowledge base: set of models (specified, for example, as a restricted first-order theory) Query answering means computing the tuples that satisfy the query in all the models in the set There is a strong connection between query answering in data integration and query answering in databases with incomplete information under constraints (or, query answering in knowledge bases) M. Lenzerini Data Integration DASI / 213
53 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Part 1: Introduction to data integration Databases with incomplete information, or knowledge bases Traditional database: one model of a first-order theory Query answering means evaluating a formula in the model Database with incomplete information, or knowledge base: set of models (specified, for example, as a restricted first-order theory) Query answering means computing the tuples that satisfy the query in all the models in the set There is a strong connection between query answering in data integration and query answering in databases with incomplete information under constraints (or, query answering in knowledge bases) M. Lenzerini Data Integration DASI / 213
54 Basic issues in data integration Data integration: Logical formalization Queries to a data integration system Query answering with incomplete information Part 1: Introduction to data integration [Reiter 84]: relational setting, databases with incomplete information modeled as a first order theory [Vardi 86]: relational setting, complexity of reasoning in closed world databases with unknown values Several approaches both from the DB and the KR community [van der Meyden 98]: survey on logical approaches to incomplete information in databases M. Lenzerini Data Integration DASI / 213
55 Basic issues in data integration Data integration: Logical formalization Formalizing the mapping Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
56 Basic issues in data integration Data integration: Logical formalization Formalizing the mapping The mapping Part 1: Introduction to data integration How is the mapping M between S and G specified? Are the sources defined in terms of the global schema? Approach called source-centric, or local-as-view, or LAV Is the global schema defined in terms of the sources? Approach called global-schema-centric, or global-as-view, or GAV A mixed approach? Approach called GLAV M. Lenzerini Data Integration DASI / 213
57 Basic issues in data integration Data integration: Logical formalization Formalizing the mapping GAV vs. LAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) Source 1: r 1 (Title, Year, Director) Source 2: r 2 (Title, Critique) since 1990 since 1960, european directors Query: Title and critique of movies in 1998 { (t, r) d. movie(t, 1998, d) review(t, r) }, abbreviated { (t, r) movie(t, 1998, d), review(t, r) } M. Lenzerini Data Integration DASI / 213
58 Basic issues in data integration Data integration: Logical formalization Formalizing GAV data integration systems Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
59 Basic issues in data integration Data integration: Logical formalization Formalizing GAV data integration systems Formalization of GAV Part 1: Introduction to data integration In GAV (with sound sources), the mapping M is a set of assertions: φ S g one for each element g in A G, with φ S a query over S of the arity of g Given a source db C, a db B for G satisfies M wrt C if for each g G: φ C S In other words, the assertion means gb x. φ S ( x) g( x) Given a source database, M provides direct information about which data satisfy the elements of the global schema Relations in G are views, and queries are expressed over the views. Thus, it seems that we can simply evaluate the query over the data satisfying the global relations (as if we had a single database at hand) M. Lenzerini Data Integration DASI / 213
60 Basic issues in data integration Data integration: Logical formalization Formalizing GAV data integration systems GAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) GAV: to each relation in the global schema, M associates a view over the sources: { (t, y, d) r 1 (t, y, d) } movie(t, y, d) { (d) r 1 (t, y, d) } european(d) { (t, r) r 2 (t, r) } review(t, r) Logical formalization: t, y, d. r 1 (t, y, d) movie(t, y, d) d. ( t, y. r 1 (t, y, d)) european(d) t, r. r 2 (t, r) review(t, r) M. Lenzerini Data Integration DASI / 213
61 Basic issues in data integration Data integration: Logical formalization Formalizing GAV data integration systems GAV Example of query processing Part 1: Introduction to data integration The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by means of unfolding, i.e., by expanding each atom according to its associated definition in M, so as to come up with source relations In this case: { (t, r) movie(t, 1998, d), review(t, r) } unfolding { (t, r) r 1 (t, 1998, d), r 2 (t, r) } M. Lenzerini Data Integration DASI / 213
62 Basic issues in data integration Data integration: Logical formalization Formalizing GAV data integration systems GAV Example of constraints Part 1: Introduction to data integration Global schema containing constraints: movie(title, Year, Director) european(director) review(title, Critique) european movie 60s(Title, Year, Director) t, y, d. european movie 60s(t, y, d) movie(t, y, d) d. t, y. european movie 60s(t, y, d) european(d) GAV mappings: { (t, y, d) r 1 (t, y, d) } european movie 60s(t, y, d) { (d) r 1 (t, y, d) } european(d) { (t, r) r 2 (t, r) } review(t, r) M. Lenzerini Data Integration DASI / 213
63 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems Outline Part 1: Introduction to data integration 1 Basic issues in data integration The problem of data integration Variants of data integration Problems in data integration 2 Data integration: Logical formalization Semantics of a data integration system Relational calculus Queries to a data integration system Formalizing the mapping Formalizing GAV data integration systems Formalizing LAV data integration systems M. Lenzerini Data Integration DASI / 213
64 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems Formalization of LAV Part 1: Introduction to data integration In LAV (with sound sources), the mapping M is a set of assertions: s φ G one for each source element s in A S, with φ G a query over G Given source db C, a db B for G satisfies M wrt C if for each s S: s C φ B G In other words, the assertion means x. s( x) φ G ( x) The mapping M and the source database C do not provide direct information about which data satisfy the global schema Sources are views, and we have to answer queries on the basis of the available data in the views M. Lenzerini Data Integration DASI / 213
65 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems LAV Example Part 1: Introduction to data integration Global schema: movie(title, Year, Director) european(director) review(title, Critique) LAV: to each source relation, M associates a view over the global schema: r 1 (t, y, d) { (t, y, d) movie(t, y, d), european(d), y 1960 } r 2 (t, r) { (t, r) movie(t, y, d), review(t, r), y 1990 } The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by means of an inference mechanism that aims at re-expressing the atoms of the global schema in terms of atoms at the sources. In this case: { (t, r) r 2 (t, r), r 1 (t, 1998, d) } M. Lenzerini Data Integration DASI / 213
66 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems GAV and LAV Comparison Part 1: Introduction to data integration GAV: (e.g., Carnot, SIMS, Tsimmis, IBIS, Momis, DisAtDis,... ) Quality depends on how well we have compiled the sources into the global schema through the mapping Whenever a source changes or a new one is added, the global schema needs to be reconsidered Query processing can be based on some sort of unfolding (query answering looks easier without constraints) LAV: (e.g., Information Manifold, DWQ, Picsel) Quality depends on how well we have characterized the sources High modularity and extensibility (if the global schema is well designed, when a source changes, only its definition is affected) Query processing needs reasoning (query answering complex) M. Lenzerini Data Integration DASI / 213
67 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems Beyond GAV and LAV: GLAV Part 1: Introduction to data integration In GLAV (with sound sources), the mapping M is a set of assertions: φ S φ G with φ S a query over S, and φ G a query over G of the same arity as φ S Given source db C, a db B for G satisfies M wrt C if for each φ S φ G in M: φ C S In other words, the assertion means φb G x. φ S ( x) φ G ( x) As for LAV, the mapping M does not provide direct information about which data satisfy the global schema To answer a query q over G, we have to infer how to use M in order to access the source database C M. Lenzerini Data Integration DASI / 213
68 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems GLAV Example Part 1: Introduction to data integration Global schema: work(person, Project), area(project, Field) Source 1: hasjob(person, Field) Source 2: teaches(professor, Course), in(course, Field) Source 3: get(researcher, Grant), for(grant, Project) GLAV mapping: {(r, f) hasjob(r, f)} {(r, f) work(r, p), area(p, f)} {(r, f) teaches(r, c), in(c, f)} {(r, f) work(r, p), area(p, f)} {(r, p) get(r, g), for(g, p)} {(r, f) work(r, p)} M. Lenzerini Data Integration DASI / 213
69 Basic issues in data integration Data integration: Logical formalization Formalizing LAV data integration systems GLAV A technical observation Part 1: Introduction to data integration In GLAV (with sound sources), the mapping M is constituted by a set of assertions: φ S φ G Each such assertion can be rewritten wlog by introducing a new predicate r (not to be used in the queries) of the same arity as the two queries and replace the assertion with the following two: φ S r r φ G In other words, we replace x. φ S ( x) φ G ( x) with x. φ S ( x) r( x) and x. r( x) φ G ( x) M. Lenzerini Data Integration DASI / 213
70 Query answering QA in GAV without constraints QA in (G)LAV without constraints Part 2: Query answering without constraints Part II Query answering in the absence of constraints M. Lenzerini Data Integration DASI / 213
71 Query answering QA in GAV without constraints QA in (G)LAV without constraints Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
72 Query answering QA in GAV without constraints QA in (G)LAV without constraints Part 2: Query answering without constraints Query answering in different approaches The problem of query answering comes in different forms, depending on several parameters: Global schema without constraints (i.e., empty theory) with constraints Mapping GAV LAV (or GLAV) Queries user queries queries in the mapping M. Lenzerini Data Integration DASI / 213
73 Query answering QA in GAV without constraints QA in (G)LAV without constraints Conjunctive queries Part 2: Query answering without constraints We recall the definition of a conjunctive query: Definition A conjunctive query (CQ) is a relational calculus query of the form { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } where x is the union of the x i s, and y is the union of the y i s r 1,..., r m are relation symbols (not built-in predicates) Unless otherwise specified, we consider conjunctive queries, both as user queries and as queries in the mapping M. Lenzerini Data Integration DASI / 213
74 Query answering QA in GAV without constraints QA in (G)LAV without constraints Incompleteness and inconsistency Part 2: Query answering without constraints Query answering heavily depends upon whether incompleteness/inconsistency shows up Constraints in G Type of mapping Incompleteness Inconsistency no GAV yes / no no no (G)LAV yes no yes GAV yes yes yes (G)LAV yes yes M. Lenzerini Data Integration DASI / 213
75 Query answering QA in GAV without constraints QA in (G)LAV without constraints Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
76 Query answering QA in GAV without constraints QA in (G)LAV without constraints Part 2: Query answering without constraints GAV data integration systems without constraints Constraints in G Type of mapping Incompleteness Inconsistency no GAV yes / no no no (G)LAV yes no yes GAV yes yes yes (G)LAV yes yes M. Lenzerini Data Integration DASI / 213
77 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
78 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database GAV Retrieved global database Part 2: Query answering without constraints Definition Given a source database C, we call retrieved global database, denoted M(C), the global database obtained by applying the queries in the mapping, and transferring to the elements of G the corresponding retrieved tuples M. Lenzerini Data Integration DASI / 213
79 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database Part 2: Query answering without constraints GAV Example Consider I = G, S, M, with Global schema G: student(code, Name, City) university(code, Name) enrolled(scode, Ucode) Source schema S: relations s 1 (Scode, Sname, City, Age), s 2 (Ucode, Uname), s 3 (Scode, Ucode) Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) M. Lenzerini Data Integration DASI / 213
80 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database GAV Example of retrieved global database Part 2: Query answering without constraints university student Code Name Code Name City AF bocconi 12 anne florence BN ucla 15 bill oslo s C 1 s C 2 12 anne florence 21 AF bocconi 15 bill oslo 24 BN ucla enrolled Scode Ucode 12 AF 16 BN s C 3 12 AF 16 BN Example of source database C and corresponding retrieved global database M(C) M. Lenzerini Data Integration DASI / 213
81 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database GAV Minimal model Part 2: Query answering without constraints GAV mapping assertions φ S g have the logical form: x. φ S ( x) g( x) where φ S is a conjunctive query over the source relations, and g is an element of G. In general, given a source database C, there are several databases legal wrt G that satisfy M wrt C. However, it is easy to see that M(C) is the intersection of all such databases, and therefore, is the only minimal model of I. M. Lenzerini Data Integration DASI / 213
82 Query answering QA in GAV without constraints QA in (G)LAV without constraints Retrieved global database GAV without constraints Part 2: Query answering without constraints M. Lenzerini Data Integration DASI / 213
83 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
84 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV Query answering via unfolding Part 2: Query answering without constraints The unfolding wrt M of a query q over G: is the query obtained from q by substituting every symbol g in q with the query φ S that M associates to g. We denote the unfolding of q wrt M with unf M (q) Observations: unf M (q) is a query over S Evaluating q over M(C) is equivalent to evaluating unf M (q) over C. i.e., t q M(C) iff t unf M (q) C If q is a conjunctive query, then t cert(q, I, C) iff t q M(C) Hence, t cert(q, I, C) iff t q M(C) iff t unf M (q) C Unfolding suffices for query answering in GAV without constraints Data complexity of query answering is polynomial (actually LogSpace ): the query unf M (q) is first-order (in fact conjunctive) M. Lenzerini Data Integration DASI / 213
85 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV Example of unfolding Part 2: Query answering without constraints university Code Name AF bocconi BN ucla student Code Name City 12 anne florence 15 bill oslo s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla M. Lenzerini Data Integration DASI / 213
86 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV Example of unfolding Part 2: Query answering without constraints university Code Name AF bocconi BN ucla student Code Name City 12 anne florence 15 bill oslo { x student(15, x, y) } s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla M. Lenzerini Data Integration DASI / 213
87 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV Example of unfolding Part 2: Query answering without constraints university Code Name AF bocconi BN ucla student Code Name City 12 anne florence 15 bill oslo { x student(15, x, y) } unfolding s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla { x s 1 (15, x, y, z) } M. Lenzerini Data Integration DASI / 213
88 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV More expressive queries? Part 2: Query answering without constraints Do the results extend to the case of more expressive queries? With more expressive queries in the mapping? Same results hold if we use any computable query in the mapping With more expressive user queries? Same results hold if we use Datalog queries as user queries Same results hold if we use union of conjunctive queries with inequalities as user queries [van der Meyden TCS 93] M. Lenzerini Data Integration DASI / 213
89 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV More expressive queries? Part 2: Query answering without constraints Do the results extend to the case of more expressive queries? With more expressive queries in the mapping? Same results hold if we use any computable query in the mapping With more expressive user queries? Same results hold if we use Datalog queries as user queries Same results hold if we use union of conjunctive queries with inequalities as user queries [van der Meyden TCS 93] M. Lenzerini Data Integration DASI / 213
90 Query answering QA in GAV without constraints QA in (G)LAV without constraints Query answering via unfolding GAV More expressive queries? Part 2: Query answering without constraints Do the results extend to the case of more expressive queries? With more expressive queries in the mapping? Same results hold if we use any computable query in the mapping With more expressive user queries? Same results hold if we use Datalog queries as user queries Same results hold if we use union of conjunctive queries with inequalities as user queries [van der Meyden TCS 93] M. Lenzerini Data Integration DASI / 213
91 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
92 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions Homomorphisms Part 2: Query answering without constraints Let D 1 and D 2 be two databases with values in Var Definition A homomorphism h : D 1 D 2 is a mapping from ( Var(D 1 )) to ( Var(D 2 )) such that 1 h(c) = c, for every constant c 2 each variable can be mapped to a constant or a variable 3 for every fact r i ( t) of D 1, we have that r i (h( t)) is a fact in D 2 Definition D 1 is homomorphically equivalent to D 2 if there is a homomorphism h : D 1 D 2 and a homomorphism h : D 2 D 1 M. Lenzerini Data Integration DASI / 213
93 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions Part 2: Query answering without constraints Conjunctive query answering and homomorphisms Consider a conjunctive query q( x) = { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } For a tuple c of constants, let D c q be the set of facts (over constants and variables in y) obtained by replacing in r 1 ( x 1, y 1 ),..., r m ( x m, y m ) each x i with c i. Note that D c q can be viewed as a database with values in Var Theorem (Chandra & Merlin 77) c q D if and only if there exists a homomorphism h : D c q D Hence, homomorphisms preserve satisfaction of conjunctive queries: if there exists a homomorphism h : D D, then t q D implies t q D, for each conjunctive query q M. Lenzerini Data Integration DASI / 213
94 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions Part 2: Query answering without constraints Conjunctive query answering and homomorphisms Consider a conjunctive query q( x) = { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } For a tuple c of constants, let D c q be the set of facts (over constants and variables in y) obtained by replacing in r 1 ( x 1, y 1 ),..., r m ( x m, y m ) each x i with c i. Note that D c q can be viewed as a database with values in Var Theorem (Chandra & Merlin 77) c q D if and only if there exists a homomorphism h : D c q D Hence, homomorphisms preserve satisfaction of conjunctive queries: if there exists a homomorphism h : D D, then t q D implies t q D, for each conjunctive query q M. Lenzerini Data Integration DASI / 213
95 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions Part 2: Query answering without constraints Conjunctive query answering and homomorphisms Consider a conjunctive query q( x) = { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } For a tuple c of constants, let D c q be the set of facts (over constants and variables in y) obtained by replacing in r 1 ( x 1, y 1 ),..., r m ( x m, y m ) each x i with c i. Note that D c q can be viewed as a database with values in Var Theorem (Chandra & Merlin 77) c q D if and only if there exists a homomorphism h : D c q D Hence, homomorphisms preserve satisfaction of conjunctive queries: if there exists a homomorphism h : D D, then t q D implies t q D, for each conjunctive query q M. Lenzerini Data Integration DASI / 213
96 Query answering QA in GAV without constraints QA in (G)LAV without constraints Universal solutions GAV Universal solutions Part 2: Query answering without constraints Let I = G, S, M be a data integration system, and C a source db Definition A universal solution for I relative to C is a global db B that satisfies I relative to C such that, for every global db B that satisfies I relative to C, there exists a homomorphism h : B B (see [Fagin & al. TCS 05]) Theorem If I is GAV without constraints in the global schema, then M(C) is the minimal universal solution for I relative to C We derive again the following results (see Phokion s course): Theorem If q is a conjunctive query, then t cert(q, I, C) iff t q M(C) Complexity of query answering is polynomial M. Lenzerini Data Integration DASI / 213
97 Query answering QA in GAV without constraints QA in (G)LAV without constraints Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
98 Query answering QA in GAV without constraints QA in (G)LAV without constraints Part 2: Query answering without constraints (G)LAV data integration systems without constraints Constraints in G Type of mapping Incompleteness Inconsistency no GAV yes / no no no (G)LAV yes no yes GAV yes yes yes (G)LAV yes yes M. Lenzerini Data Integration DASI / 213
99 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
100 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness (G)LAV Example Part 2: Query answering without constraints Consider I = G, S, M, with Global schema G: student(code, Name, City) enrolled(scode, Ucode) Source schema S: relation s 1 (Scode, Sname, City, Age) Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } { (c, n, ci) student(c, n, ci), enrolled(c, u) } M. Lenzerini Data Integration DASI / 213
101 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness (G)LAV Example Part 2: Query answering without constraints { (c, n, ci) s 1 (c, n, ci, a) } { (c, n, ci) student(c, n, ci), enrolled(c, u) } student Code Name City 12 anne florence 15 bill oslo s C 1 12 anne florence bill oslo 24 enrolled Scode Ucode 12 x 15 y A source db C and a corresponding possible retrieved global db M(C) M. Lenzerini Data Integration DASI / 213
102 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness (G)LAV Example Part 2: Query answering without constraints { (c, n, ci) s 1 (c, n, ci, a) } { (c, n, ci) student(c, n, ci), enrolled(c, u) } student Code Name City 12 anne florence 15 bill oslo s C 1 12 anne florence bill oslo 24 enrolled Scode Ucode 12 x 15 x A source db C and another possible retrieved global db M(C) M. Lenzerini Data Integration DASI / 213
103 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness (G)LAV Incompleteness Part 2: Query answering without constraints (G)LAV mapping assertions φ S φ G have the logical form: x. φ S ( x) y. φ G ( x, y) where φ S and φ G are conjunctions of atoms Given a source database C, in general there are several solutions for a set of (G)LAV assertions (i.e., different databases that are legal wrt G that satisfy M wrt C) Incompleteness comes from the mapping This holds even for the case of very simple queries φ G : s 1 (x) { (x) y. g(x, y) } M. Lenzerini Data Integration DASI / 213
104 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV and incompleteness Part 2: Query answering without constraints (G)LAV Query answering is based on logical inference q I Logical inference C cert(q, I, C) M. Lenzerini Data Integration DASI / 213
105 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
106 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV (G)LAV Approaches to query answering Part 2: Query answering without constraints Exploit connection with query containment Direct methods (aka view-based query answering): Try to answer directly the query by means of an algorithm that takes as input the user query q, the specification of I, and the source database C By (view-based) query rewriting: 1 Taking into account I, reformulate the user query q as a new query (called a rewriting of q) over the source relations 2 Evaluate the rewriting over the source database C Note: In (G)LAV data integration the views are the sources M. Lenzerini Data Integration DASI / 213
107 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV (G)LAV Approaches to query answering Part 2: Query answering without constraints Exploit connection with query containment Direct methods (aka view-based query answering): Try to answer directly the query by means of an algorithm that takes as input the user query q, the specification of I, and the source database C By (view-based) query rewriting: 1 Taking into account I, reformulate the user query q as a new query (called a rewriting of q) over the source relations 2 Evaluate the rewriting over the source database C Note: In (G)LAV data integration the views are the sources M. Lenzerini Data Integration DASI / 213
108 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV (G)LAV Approaches to query answering Part 2: Query answering without constraints Exploit connection with query containment Direct methods (aka view-based query answering): Try to answer directly the query by means of an algorithm that takes as input the user query q, the specification of I, and the source database C By (view-based) query rewriting: 1 Taking into account I, reformulate the user query q as a new query (called a rewriting of q) over the source relations 2 Evaluate the rewriting over the source database C Note: In (G)LAV data integration the views are the sources M. Lenzerini Data Integration DASI / 213
109 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV (G)LAV Approaches to query answering Part 2: Query answering without constraints Exploit connection with query containment Direct methods (aka view-based query answering): Try to answer directly the query by means of an algorithm that takes as input the user query q, the specification of I, and the source database C By (view-based) query rewriting: 1 Taking into account I, reformulate the user query q as a new query (called a rewriting of q) over the source relations 2 Evaluate the rewriting over the source database C Note: In (G)LAV data integration the views are the sources M. Lenzerini Data Integration DASI / 213
110 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Part 2: Query answering without constraints Connection between query answering and containment Definition Query containment (under constraints) is the problem of checking whether q1 D is contained in qd 2 for every database D (satisfying the constraints), where q 1, q 2 are queries of the same arity Query answering can be rephrased in terms of query containment: A source database C can be represented as a conjunction q C of ground literals over A S (e.g., if c s C, there is a literal s( c)) If q is a query, and t is a tuple, then we denote by q t the query obtained by substituting the free variables of q with t The problem of checking whether t cert(q, I, C) under sound sources can be reduced to the problem of checking whether the conjunctive query q C is contained in q t under the constraints expressed by G M M. Lenzerini Data Integration DASI / 213
111 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Part 2: Query answering without constraints Connection between query answering and containment Definition Query containment (under constraints) is the problem of checking whether q1 D is contained in qd 2 for every database D (satisfying the constraints), where q 1, q 2 are queries of the same arity Query answering can be rephrased in terms of query containment: A source database C can be represented as a conjunction q C of ground literals over A S (e.g., if c s C, there is a literal s( c)) If q is a query, and t is a tuple, then we denote by q t the query obtained by substituting the free variables of q with t The problem of checking whether t cert(q, I, C) under sound sources can be reduced to the problem of checking whether the conjunctive query q C is contained in q t under the constraints expressed by G M M. Lenzerini Data Integration DASI / 213
112 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Query answering via query containment Part 2: Query answering without constraints Complexity of checking certain answers under sound sources: The combined complexity is identical to the complexity of query containment under constraints The data complexity is the complexity of query containment under constraints when the right-hand side query is considered fixed. Hence, it is at most the complexity of query containment under constraints Most results and techniques for query containment (under constraints) are relevant also for query answering (under constraints) Note: Also, query containment can be reduced to query answering. However, (in the presence of constraints) we need to allow for constants of the database to unify, i.e., to denote the same object. M. Lenzerini Data Integration DASI / 213
113 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Query answering via query containment Part 2: Query answering without constraints Complexity of checking certain answers under sound sources: The combined complexity is identical to the complexity of query containment under constraints The data complexity is the complexity of query containment under constraints when the right-hand side query is considered fixed. Hence, it is at most the complexity of query containment under constraints Most results and techniques for query containment (under constraints) are relevant also for query answering (under constraints) Note: Also, query containment can be reduced to query answering. However, (in the presence of constraints) we need to allow for constants of the database to unify, i.e., to denote the same object. M. Lenzerini Data Integration DASI / 213
114 Query answering QA in GAV without constraints QA in (G)LAV without constraints Approaches to query answering in (G)LAV Query answering via query containment Part 2: Query answering without constraints Complexity of checking certain answers under sound sources: The combined complexity is identical to the complexity of query containment under constraints The data complexity is the complexity of query containment under constraints when the right-hand side query is considered fixed. Hence, it is at most the complexity of query containment under constraints Most results and techniques for query containment (under constraints) are relevant also for query answering (under constraints) Note: Also, query containment can be reduced to query answering. However, (in the presence of constraints) we need to allow for constants of the database to unify, i.e., to denote the same object. M. Lenzerini Data Integration DASI / 213
115 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
116 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Basic technique From [Duschka & Genesereth PODS 97]: r 1 (t) { (t) movie(t, y, d) european(d) } r 2 (t, v) { (t, v) movie(t, y, d) review(t, v) } t. r 1 (t) y, d. movie(t, y, d) european(d) t, v. r 2 (t, v) y, d. movie(t, y, d) review(t, v) movie(t, f 1 (t), f 2 (t)) r 1 (t) european(f 2 (t)) r 1 (t) movie(t, f 4 (t, v), f 5 (t, v)) r 2 (t, v) review(t, v) r 2 (t, v) Part 2: Query answering without constraints Answering a query means evaluating a goal wrt to this nonrecursive logic program (which can be transformed into a union of CQs) Data complexity is polynomial (actually LogSpace) M. Lenzerini Data Integration DASI / 213
117 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Canonical retrieved global database What is a retrieved global database in this case? Part 2: Query answering without constraints Definition We build what we call the canonical retrieved global database for I relative to C, denoted M(C), as follows: Let all predicates initially be empty in M(C) For each mapping assertion φ S φ G in M for each tuple t φ C S such that t φ M(C) G, add t to φ M(C) G by inventing fresh variables (Skolem terms) in order to satisfy the existentially quantified variables in φ G Properties of M(C) (also called canonical model of I relative to C) It is unique up to variable renaming It can be computed in polynomial time wrt the size of C Since there are no constraints in G, it obviously satisfies G M. Lenzerini Data Integration DASI / 213
118 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Example of canonical model Part 2: Query answering without constraints { (c, n, ci) s 1 (c, n, ci, a) } { (c, n, ci) student(c, n, ci) enrolled(c, u) } student Code Name City 12 anne florence 15 bill oslo s C 1 12 anne florence bill oslo 24 enrolled Scode Ucode 12 x 15 y Example of source db C and corresponding canonical model M(C) M. Lenzerini Data Integration DASI / 213
119 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Canonical model Part 2: Query answering without constraints M. Lenzerini Data Integration DASI / 213
120 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Universal solution Part 2: Query answering without constraints Let I = G, S, M be a (G)LAV data integration system without constraints in the global schema, and C a source database Theorem Then M(C) is a universal solution for I relative to C (follows from [Fagin&al. ICDT 03]) It follows that: If q is a conjunctive query, then t cert(q, I, C) iff t q M(C) Complexity of query answering is polynomial, actually LogSpace M. Lenzerini Data Integration DASI / 213
121 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Universal solution Part 2: Query answering without constraints Let I = G, S, M be a (G)LAV data integration system without constraints in the global schema, and C a source database Theorem Then M(C) is a universal solution for I relative to C (follows from [Fagin&al. ICDT 03]) It follows that: If q is a conjunctive query, then t cert(q, I, C) iff t q M(C) Complexity of query answering is polynomial, actually LogSpace M. Lenzerini Data Integration DASI / 213
122 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV More expressive queries? Part 2: Query answering without constraints More expressive source queries in the mapping? Same results hold if we use any computable query as source query in the mapping assertions More expressive queries over the global schema in the mapping? Already positive queries lead to intractability More expressive user queries? Same results hold if we use Datalog queries as user queries Even the simplest form of negation (inequalities) leads to intractability M. Lenzerini Data Integration DASI / 213
123 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV More expressive queries? Part 2: Query answering without constraints More expressive source queries in the mapping? Same results hold if we use any computable query as source query in the mapping assertions More expressive queries over the global schema in the mapping? Already positive queries lead to intractability More expressive user queries? Same results hold if we use Datalog queries as user queries Even the simplest form of negation (inequalities) leads to intractability M. Lenzerini Data Integration DASI / 213
124 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV More expressive queries? Part 2: Query answering without constraints More expressive source queries in the mapping? Same results hold if we use any computable query as source query in the mapping assertions More expressive queries over the global schema in the mapping? Already positive queries lead to intractability More expressive user queries? Same results hold if we use Datalog queries as user queries Even the simplest form of negation (inequalities) leads to intractability M. Lenzerini Data Integration DASI / 213
125 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Intractability for positive views From [van der Meyden TCS 93], by reduction from 3-colorability Part 2: Query answering without constraints We define the following LAV data integration system I = G, S, M : G : edge(x, y), color(x, c) S : s E (x, y), s N (x) M : s E (x, y) edge(x, y) s N (x) color(x, RED) color(x, BLUE) color(x, GREEN) Given a graph G = (N, E), we define the following source database C: s E C = { (a, b), (b, a) (a, b) E } s N C = { (a) a N } Consider the boolean query q: x, y, c. edge(x, y) color(x, c) color(y, c) describing mismatched edge pairs: Theorem If G is 3-colorable, then B s.t. q B = false, hence cert(q, I, C) = false If G is not 3-colorable, then cert(q, I, C) = true Data complexity is conp-hard for positive views and conjunctive queries M. Lenzerini Data Integration DASI / 213
126 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Intractability for positive views From [van der Meyden TCS 93], by reduction from 3-colorability Part 2: Query answering without constraints We define the following LAV data integration system I = G, S, M : G : edge(x, y), color(x, c) S : s E (x, y), s N (x) M : s E (x, y) edge(x, y) s N (x) color(x, RED) color(x, BLUE) color(x, GREEN) Given a graph G = (N, E), we define the following source database C: s E C = { (a, b), (b, a) (a, b) E } s N C = { (a) a N } Consider the boolean query q: x, y, c. edge(x, y) color(x, c) color(y, c) describing mismatched edge pairs: Theorem If G is 3-colorable, then B s.t. q B = false, hence cert(q, I, C) = false If G is not 3-colorable, then cert(q, I, C) = true Data complexity is conp-hard for positive views and conjunctive queries M. Lenzerini Data Integration DASI / 213
127 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Intractability for positive views From [van der Meyden TCS 93], by reduction from 3-colorability Part 2: Query answering without constraints We define the following LAV data integration system I = G, S, M : G : edge(x, y), color(x, c) S : s E (x, y), s N (x) M : s E (x, y) edge(x, y) s N (x) color(x, RED) color(x, BLUE) color(x, GREEN) Given a graph G = (N, E), we define the following source database C: s E C = { (a, b), (b, a) (a, b) E } s N C = { (a) a N } Consider the boolean query q: x, y, c. edge(x, y) color(x, c) color(y, c) describing mismatched edge pairs: Theorem If G is 3-colorable, then B s.t. q B = false, hence cert(q, I, C) = false If G is not 3-colorable, then cert(q, I, C) = true Data complexity is conp-hard for positive views and conjunctive queries M. Lenzerini Data Integration DASI / 213
128 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Intractability for positive views From [van der Meyden TCS 93], by reduction from 3-colorability Part 2: Query answering without constraints We define the following LAV data integration system I = G, S, M : G : edge(x, y), color(x, c) S : s E (x, y), s N (x) M : s E (x, y) edge(x, y) s N (x) color(x, RED) color(x, BLUE) color(x, GREEN) Given a graph G = (N, E), we define the following source database C: s E C = { (a, b), (b, a) (a, b) E } s N C = { (a) a N } Consider the boolean query q: x, y, c. edge(x, y) color(x, c) color(y, c) describing mismatched edge pairs: Theorem If G is 3-colorable, then B s.t. q B = false, hence cert(q, I, C) = false If G is not 3-colorable, then cert(q, I, C) = true Data complexity is conp-hard for positive views and conjunctive queries M. Lenzerini Data Integration DASI / 213
129 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV Intractability for positive views From [van der Meyden TCS 93], by reduction from 3-colorability Part 2: Query answering without constraints We define the following LAV data integration system I = G, S, M : G : edge(x, y), color(x, c) S : s E (x, y), s N (x) M : s E (x, y) edge(x, y) s N (x) color(x, RED) color(x, BLUE) color(x, GREEN) Given a graph G = (N, E), we define the following source database C: s E C = { (a, b), (b, a) (a, b) E } s N C = { (a) a N } Consider the boolean query q: x, y, c. edge(x, y) color(x, c) color(y, c) describing mismatched edge pairs: Theorem If G is 3-colorable, then B s.t. q B = false, hence cert(q, I, C) = false If G is not 3-colorable, then cert(q, I, C) = true Data complexity is conp-hard for positive views and conjunctive queries M. Lenzerini Data Integration DASI / 213
130 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV In conp for positive views and queries Part 2: Query answering without constraints t cert(q, I, C) if and only if there is a database B for I that satisfies M wrt C, and such that t q B The mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Hence, each tuple in C forces the existence of k tuples in any database that satisfies M wrt C, where k is the maximal length of conjunctions α i ( x, y i ) in M If C has n tuples, then there is a db B B for I that satisfies M wrt C with at most n k tuples. Since q is monotone, t q B Checking whether B satisfies M wrt C, and checking whether t q B can be done in PTIME wrt the size of B Theorem For positive views and queries, query answ. is conp in data complexity M. Lenzerini Data Integration DASI / 213
131 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV In conp for positive views and queries Part 2: Query answering without constraints t cert(q, I, C) if and only if there is a database B for I that satisfies M wrt C, and such that t q B The mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Hence, each tuple in C forces the existence of k tuples in any database that satisfies M wrt C, where k is the maximal length of conjunctions α i ( x, y i ) in M If C has n tuples, then there is a db B B for I that satisfies M wrt C with at most n k tuples. Since q is monotone, t q B Checking whether B satisfies M wrt C, and checking whether t q B can be done in PTIME wrt the size of B Theorem For positive views and queries, query answ. is conp in data complexity M. Lenzerini Data Integration DASI / 213
132 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV In conp for positive views and queries Part 2: Query answering without constraints t cert(q, I, C) if and only if there is a database B for I that satisfies M wrt C, and such that t q B The mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Hence, each tuple in C forces the existence of k tuples in any database that satisfies M wrt C, where k is the maximal length of conjunctions α i ( x, y i ) in M If C has n tuples, then there is a db B B for I that satisfies M wrt C with at most n k tuples. Since q is monotone, t q B Checking whether B satisfies M wrt C, and checking whether t q B can be done in PTIME wrt the size of B Theorem For positive views and queries, query answ. is conp in data complexity M. Lenzerini Data Integration DASI / 213
133 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV In conp for positive views and queries Part 2: Query answering without constraints t cert(q, I, C) if and only if there is a database B for I that satisfies M wrt C, and such that t q B The mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Hence, each tuple in C forces the existence of k tuples in any database that satisfies M wrt C, where k is the maximal length of conjunctions α i ( x, y i ) in M If C has n tuples, then there is a db B B for I that satisfies M wrt C with at most n k tuples. Since q is monotone, t q B Checking whether B satisfies M wrt C, and checking whether t q B can be done in PTIME wrt the size of B Theorem For positive views and queries, query answ. is conp in data complexity M. Lenzerini Data Integration DASI / 213
134 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) (G)LAV In conp for positive views and queries Part 2: Query answering without constraints t cert(q, I, C) if and only if there is a database B for I that satisfies M wrt C, and such that t q B The mapping M has the form: x. φ S ( x) y 1. α 1 ( x, y 1 ) y h α h ( x, y h )) Hence, each tuple in C forces the existence of k tuples in any database that satisfies M wrt C, where k is the maximal length of conjunctions α i ( x, y i ) in M If C has n tuples, then there is a db B B for I that satisfies M wrt C with at most n k tuples. Since q is monotone, t q B Checking whether B satisfies M wrt C, and checking whether t q B can be done in PTIME wrt the size of B Theorem For positive views and queries, query answ. is conp in data complexity M. Lenzerini Data Integration DASI / 213
135 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) Part 2: Query answering without constraints (G)LAV Conjunctive user queries with inequalities Consider I = G, S, M, and source db C (see [Fagin & al. ICDT 03]): G : g(x, y) S : s(x, y) M : s(x, y) { (x, y) g(x, z) g(z, y) } C : { s(a, a) } B 1 = { g(a, a) } is a solution If B is a universal solution, then both g(a, x) and g(x, a) are in B, with x a (otherwise g(a, a) would be true in every solution) Let q = { () g(x, y) x y } q B 1 = false, hence cert(q, I, C) = false But q B = true for every universal solution B for I relative to C Hence, the notion of universal solution is not the right tool M. Lenzerini Data Integration DASI / 213
136 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) Part 2: Query answering without constraints (G)LAV Conjunctive user queries with inequalities Consider I = G, S, M, and source db C (see [Fagin & al. ICDT 03]): G : g(x, y) S : s(x, y) M : s(x, y) { (x, y) g(x, z) g(z, y) } C : { s(a, a) } B 1 = { g(a, a) } is a solution If B is a universal solution, then both g(a, x) and g(x, a) are in B, with x a (otherwise g(a, a) would be true in every solution) Let q = { () g(x, y) x y } q B 1 = false, hence cert(q, I, C) = false But q B = true for every universal solution B for I relative to C Hence, the notion of universal solution is not the right tool M. Lenzerini Data Integration DASI / 213
137 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) Part 2: Query answering without constraints (G)LAV Conjunctive user queries with inequalities conp algorithm: guess equalities on variables in the canonical retrieved global database conp-hard already for a conjunctive user query with two inequalities (and conjunctive view definitions) [Fagin & al. ICDT 03], see Phokion s course Theorem For conjunctive user queries with inequalities, (G)LAV query answering is conp-complete in data complexity Note: inequalities in the view definitions do not affect expressive power and complexity (in fact, they can be removed) M. Lenzerini Data Integration DASI / 213
138 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Direct methods (aka view-based query answering) Part 2: Query answering without constraints (G)LAV Conjunctive user queries with inequalities conp algorithm: guess equalities on variables in the canonical retrieved global database conp-hard already for a conjunctive user query with two inequalities (and conjunctive view definitions) [Fagin & al. ICDT 03], see Phokion s course Theorem For conjunctive user queries with inequalities, (G)LAV query answering is conp-complete in data complexity Note: inequalities in the view definitions do not affect expressive power and complexity (in fact, they can be removed) M. Lenzerini Data Integration DASI / 213
139 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Outline Part 2: Query answering without constraints 3 Query answering in GAV without constraints Retrieved global database Query answering via unfolding Universal solutions 4 Query answering in (G)LAV without constraints (G)LAV and incompleteness Approaches to query answering in (G)LAV (G)LAV: Direct methods (aka view-based query answering) (G)LAV: Query answering by (view-based) query rewriting M. Lenzerini Data Integration DASI / 213
140 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV View-based query rewriting Part 2: Query answering without constraints Query answering is divided in two steps: 1 Re-express the query in terms of a given query language over the alphabet of A S 2 Evaluate the rewriting over the source database C M. Lenzerini Data Integration DASI / 213
141 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Query answering Part 2: Query answering without constraints q I Logical inference C cert(q, I, C) M. Lenzerini Data Integration DASI / 213
142 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Query answering: reformulation + evaluation Part 2: Query answering without constraints q I C Perfect reformulation (under OWA) cert [q,i] Query evaluation (under CWA) cert(q, I, C) The query cert [q,i] could be expressed in an arbitrary query language M. Lenzerini Data Integration DASI / 213
143 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Query rewriting Part 2: Query answering without constraints q Reformulation rew(q, I) I C (under OWA) Query evaluation (under CWA) ans(q, I, C) The language of rew(q, I) is chosen a priori! M. Lenzerini Data Integration DASI / 213
144 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Connection to rewriting Part 2: Query answering without constraints Query answering by rewriting: 1 Given I = G, S, M and a query q over G, rewrite q into a query, called rew(q, I), over the alphabet A S of the sources 2 Evaluate the rewriting rew(q, I) over the source database C We are interested in rewritings that are: sound, i.e., compute only tuples in cert(q, I, C) for every C expressed in a given query language L maximal for the class of queries expressible in L We may be interested in exact rewritings, i.e., rewritings that are logically equivalent to the query, modulo M Exact rewritings may not exist M. Lenzerini Data Integration DASI / 213
145 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Connection to rewriting Part 2: Query answering without constraints Query answering by rewriting: 1 Given I = G, S, M and a query q over G, rewrite q into a query, called rew(q, I), over the alphabet A S of the sources 2 Evaluate the rewriting rew(q, I) over the source database C We are interested in rewritings that are: sound, i.e., compute only tuples in cert(q, I, C) for every C expressed in a given query language L maximal for the class of queries expressible in L We may be interested in exact rewritings, i.e., rewritings that are logically equivalent to the query, modulo M Exact rewritings may not exist M. Lenzerini Data Integration DASI / 213
146 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Connection to rewriting Part 2: Query answering without constraints Query answering by rewriting: 1 Given I = G, S, M and a query q over G, rewrite q into a query, called rew(q, I), over the alphabet A S of the sources 2 Evaluate the rewriting rew(q, I) over the source database C We are interested in rewritings that are: sound, i.e., compute only tuples in cert(q, I, C) for every C expressed in a given query language L maximal for the class of queries expressible in L We may be interested in exact rewritings, i.e., rewritings that are logically equivalent to the query, modulo M Exact rewritings may not exist M. Lenzerini Data Integration DASI / 213
147 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Example of maximal rewriting G: nonstop(airline, Num, From, To) S: flightsbyunited(from, To) flightsfromsfo(airline, Num, To) M: flightsbyunited(from, To) nonstop(ua, Num, From, To) flightsfromsfo(airline, Num, To) nonstop(airline, Num, SFO, To) q: { (airline, num) nonstop(airline, num, LAX, PHX) } A maximal (wrt positive queries) rewriting of q is: { (UA, num) flightsbyunited(num, LAX, PHX) } Part 2: Query answering without constraints M. Lenzerini Data Integration DASI / 213
148 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Example of maximal rewriting G: nonstop(airline, Num, From, To) S: flightsbyunited(from, To) flightsfromsfo(airline, Num, To) M: flightsbyunited(from, To) nonstop(ua, Num, From, To) flightsfromsfo(airline, Num, To) nonstop(airline, Num, SFO, To) q: { (airline, num) nonstop(airline, num, LAX, PHX) } A maximal (wrt positive queries) rewriting of q is: { (UA, num) flightsbyunited(num, LAX, PHX) } Part 2: Query answering without constraints M. Lenzerini Data Integration DASI / 213
149 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Perfect rewriting Part 2: Query answering without constraints What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: When does the (maximal) rewriting compute all certain answers? What do we gain or loose by focusing on a given class of queries? Let s try to consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C). cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I No sound rewriting exists that is better than cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini Data Integration DASI / 213
150 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Perfect rewriting Part 2: Query answering without constraints What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: When does the (maximal) rewriting compute all certain answers? What do we gain or loose by focusing on a given class of queries? Let s try to consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C). cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I No sound rewriting exists that is better than cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini Data Integration DASI / 213
151 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Perfect rewriting Part 2: Query answering without constraints What is the relationship between answering by rewriting and certain answers? [Calvanese & al. ICDT 05]: When does the (maximal) rewriting compute all certain answers? What do we gain or loose by focusing on a given class of queries? Let s try to consider the best possible rewriting Define cert [q,i] ( ) to be the function that, with q and I fixed, given source database C, computes the certain answers cert(q, I, C). cert [q,i] can be seen as a query on the alphabet A S cert [q,i] is a (sound) rewriting of q wrt I No sound rewriting exists that is better than cert [q,i] Hence, cert [q,i] is called the perfect rewriting of q wrt I M. Lenzerini Data Integration DASI / 213
152 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting Properties of the perfect rewriting Part 2: Query answering without constraints Can the perfect rewriting be expressed in a certain query language? For a given class of queries, what is the relationship between a maximal rewriting and the perfect rewriting? From a semantical point of view From a computational point of view Which is the computational complexity of finding the perfect rewriting, and how big is it? Which is the computational complexity of evaluating the perfect rewriting? M. Lenzerini Data Integration DASI / 213
153 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV The case of conjunctive queries Part 2: Query answering without constraints Let q and the queries in M be conjunctive queries (CQs) Let q be the union of all maximal rewritings of q for the class of CQs Theorem (Levy & al. PODS 95, Abiteboul & Duschka PODS 98) q is the maximal rewriting for the class of unions of conjunctive queries (UCQs) q is the perfect rewriting of q wrt I q is a PTIME query q is an exact rewriting (equivalent to q for each database B of I), if an exact rewriting exists Does this ideal situation carry on to cases where q and M allow for union? M. Lenzerini Data Integration DASI / 213
154 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV The case of conjunctive queries Part 2: Query answering without constraints Let q and the queries in M be conjunctive queries (CQs) Let q be the union of all maximal rewritings of q for the class of CQs Theorem (Levy & al. PODS 95, Abiteboul & Duschka PODS 98) q is the maximal rewriting for the class of unions of conjunctive queries (UCQs) q is the perfect rewriting of q wrt I q is a PTIME query q is an exact rewriting (equivalent to q for each database B of I), if an exact rewriting exists Does this ideal situation carry on to cases where q and M allow for union? M. Lenzerini Data Integration DASI / 213
155 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV View-based query processing for UPQs Part 2: Query answering without constraints When queries over the global schema in the mapping contain union: We have seen that view-based query answering is conp-complete in data complexity [van der Meyden TCS 93] hence, cert(q, I, C), with q, I fixed, is a conp-complete function hence, the perfect rewriting cert [q,i] is a conp-complete query We do not have the ideal situation we had for conjunctive queries Problem: Isolate those cases of view based query rewriting for UPQs q and I for which the perfect rewriting cert [q,i] is a PTIME function (assuming P NP) [Calvanese & al. LICS 00]. M. Lenzerini Data Integration DASI / 213
156 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV View-based query processing for UPQs Part 2: Query answering without constraints When queries over the global schema in the mapping contain union: We have seen that view-based query answering is conp-complete in data complexity [van der Meyden TCS 93] hence, cert(q, I, C), with q, I fixed, is a conp-complete function hence, the perfect rewriting cert [q,i] is a conp-complete query We do not have the ideal situation we had for conjunctive queries Problem: Isolate those cases of view based query rewriting for UPQs q and I for which the perfect rewriting cert [q,i] is a PTIME function (assuming P NP) [Calvanese & al. LICS 00]. M. Lenzerini Data Integration DASI / 213
157 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Data complexity of query answering Part 2: Query answering without constraints From [Abiteboul & Duschka PODS 98], for sound sources: Global schema User queries mapping query CQ CQ PQ Datalog FOL CQ PTIME conp PTIME PTIME undec. CQ PTIME conp PTIME PTIME undec. PQ conp conp conp conp undec. Datalog conp undec. conp undec. undec. FOL undec. undec. undec. undec. undec. M. Lenzerini Data Integration DASI / 213
158 Query answering QA in GAV without constraints QA in (G)LAV without constraints (G)LAV: Query answering by (view-based) query rewriting (G)LAV Further references Part 2: Query answering without constraints Inverse rules [Duschka & Genesereth PODS 97] Bucket algorithm for query rewriting [Levy & al. AAAI 96] MiniCon algorithm for query rewriting [Pottinger & Levy VLDB 00] Conjunctive queries using conjunctive views [Levy & al. PODS 95] Recursive queries (Datalog programs) using conjunctive views [Duschka & Genesereth PODS 97; Afrati & al. ICDT 99] CQs with arithmetic comparison [Afrati & al. PODS 01] Complexity analysis [Abiteboul & Duschka PODS 98; Grahne & Mendelzon ICDT 99] Variants of Regular Path Queries [Calvanese & al. ICDE 00, PODS 00, DBPL 01; Deutsch & Tannen DBPL 01], Relationship between view-based rewriting and answering [Calvanese & al. LICS 00, PODS 03, ICDT 05] M. Lenzerini Data Integration DASI / 213
159 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Part III Query answering in the presence of constraints M. Lenzerini Data Integration DASI / 213
160 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
161 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
162 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Global integrity constraints Part 3: Query answering with constraints Integrity constraints (ICs) are posed over the global schema Specify intensional knowledge about the domain of interest Add semantics to the information But data in the sources can conflict with global ICs The presence of global ICs raises semantic and computational problems Many open issues M. Lenzerini Data Integration DASI / 213
163 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Integrity constraints for relational schemas Most important types of ICs for the relational model: key dependencies (KDs) functional dependencies (FDs) foreign keys (FKs) inclusion dependencies (IDs) exclusion dependencies (EDs) M. Lenzerini Data Integration DASI / 213
164 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Integrity constraints for relational schemas Most important types of ICs for the relational model: key dependencies (KDs) functional dependencies (FDs) foreign keys (FKs) inclusion dependencies (IDs) exclusion dependencies (EDs) M. Lenzerini Data Integration DASI / 213
165 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Integrity constraints for relational schemas Most important types of ICs for the relational model: key dependencies (KDs) functional dependencies (FDs) foreign keys (FKs) inclusion dependencies (IDs) exclusion dependencies (EDs) M. Lenzerini Data Integration DASI / 213
166 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Inclusion dependencies (IDs) Part 3: Query answering with constraints An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini Data Integration DASI / 213
167 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Inclusion dependencies (IDs) Part 3: Query answering with constraints An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini Data Integration DASI / 213
168 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Inclusion dependencies (IDs) Part 3: Query answering with constraints An inclusion dependency (ID) states that the presence of a tuple t 1 in a relation implies the presence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of inclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ID r[1] s[2] corresponds to the FOL sentence x, y, w. r(x, y, w) z. s(z, x) Note: IDs are a special form of tuple-generating dependencies M. Lenzerini Data Integration DASI / 213
169 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Key dependencies (KDs) Part 3: Query answering with constraints A key dependency (KD) states that a set of attributes functionally determines all the attributes of a relation Syntax of key dependencies key(r) = {i 1,..., i k } with i 1,..., i k components of r Example For r of arity 3, the KD key(r) = {1} corresponds to the FOL sentence x, y, y, z, z. r(x, y, z) r(x, y, z ) y = y z = z Note: KDs are a special form of equality-generating dependencies M. Lenzerini Data Integration DASI / 213
170 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Key dependencies (KDs) Part 3: Query answering with constraints A key dependency (KD) states that a set of attributes functionally determines all the attributes of a relation Syntax of key dependencies key(r) = {i 1,..., i k } with i 1,..., i k components of r Example For r of arity 3, the KD key(r) = {1} corresponds to the FOL sentence x, y, y, z, z. r(x, y, z) r(x, y, z ) y = y z = z Note: KDs are a special form of equality-generating dependencies M. Lenzerini Data Integration DASI / 213
171 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Key dependencies (KDs) Part 3: Query answering with constraints A key dependency (KD) states that a set of attributes functionally determines all the attributes of a relation Syntax of key dependencies key(r) = {i 1,..., i k } with i 1,..., i k components of r Example For r of arity 3, the KD key(r) = {1} corresponds to the FOL sentence x, y, y, z, z. r(x, y, z) r(x, y, z ) y = y z = z Note: KDs are a special form of equality-generating dependencies M. Lenzerini Data Integration DASI / 213
172 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Exclusion dependencies (EDs) Part 3: Query answering with constraints An exclusion dependency (ED) states that the presence of a tuple t 1 in a relation implies the absence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of exclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] = with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ED r[1] s[2] = corresponds to the FOL sentence x, y, w, z. r(x, y, w) s(z, x) Note: EDs are a special form of denial constraints M. Lenzerini Data Integration DASI / 213
173 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Exclusion dependencies (EDs) Part 3: Query answering with constraints An exclusion dependency (ED) states that the presence of a tuple t 1 in a relation implies the absence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of exclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] = with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ED r[1] s[2] = corresponds to the FOL sentence x, y, w, z. r(x, y, w) s(z, x) Note: EDs are a special form of denial constraints M. Lenzerini Data Integration DASI / 213
174 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Exclusion dependencies (EDs) Part 3: Query answering with constraints An exclusion dependency (ED) states that the presence of a tuple t 1 in a relation implies the absence of a tuple t 2 in another relation, where t 2 contains a projection of the values contained in t 1 Syntax of exclusion dependencies r[i 1,..., i k ] s[j 1,..., j k ] = with i 1,..., i k components of r, and j 1,..., j k components of s Example For r of arity 3 and s of arity 2, the ED r[1] s[2] = corresponds to the FOL sentence x, y, w, z. r(x, y, w) s(z, x) Note: EDs are a special form of denial constraints M. Lenzerini Data Integration DASI / 213
175 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
176 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
177 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV system with integrity constraints Part 3: Query answering with constraints We consider a data integration system I = G, S, M where G is a global schema with constraints M is a set of GAV mappings, whose assertions have the form φ S g and are interpreted as x. φ S ( x) g( x) where φ S is a conjunctive query over S, and g is an element of G Basic observation: Since G does have constraints, the retrieved global database M(C) may not be legal for G M. Lenzerini Data Integration DASI / 213
178 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV system with integrity constraints Part 3: Query answering with constraints We consider a data integration system I = G, S, M where G is a global schema with constraints M is a set of GAV mappings, whose assertions have the form φ S g and are interpreted as x. φ S ( x) g( x) where φ S is a conjunctive query over S, and g is an element of G Basic observation: Since G does have constraints, the retrieved global database M(C) may not be legal for G M. Lenzerini Data Integration DASI / 213
179 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints Semantics of GAV systems with integrity constraints Given a source db C, a global db B (over ) satisfies I relative to C if 1 it is legal wrt the global schema, i.e., it satisfies the ICs 2 it satisfies the mapping, i.e., B is a superset of the retrieved global database M(C) (sound mappings) Recall: M(C) is obtained by evaluating, for each relation in A G, the corresponding mapping query over the source database C We are interested in certain answers to a query, i.e., those that hold for all global databases that satisfy I relative to C M. Lenzerini Data Integration DASI / 213
180 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints Semantics of GAV systems with integrity constraints Given a source db C, a global db B (over ) satisfies I relative to C if 1 it is legal wrt the global schema, i.e., it satisfies the ICs 2 it satisfies the mapping, i.e., B is a superset of the retrieved global database M(C) (sound mappings) Recall: M(C) is obtained by evaluating, for each relation in A G, the corresponding mapping query over the source database C We are interested in certain answers to a query, i.e., those that hold for all global databases that satisfy I relative to C M. Lenzerini Data Integration DASI / 213
181 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Example Part 3: Query answering with constraints Consider I = G, S, M, with G: student(code, Name, City) key(student) = {Code} university(code, Name) key(university) = {Code} enrolled(scode, Ucode) enrolled[scode] student[code] enrolled[ucode] university[code] Source schema S: s 1 (Scode, Sname, City, Age), s 2 (Ucode, Uname), s 3 (Scode, Ucode) Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) M. Lenzerini Data Integration DASI / 213
182 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Example Part 3: Query answering with constraints Consider I = G, S, M, with G: student(code, Name, City) key(student) = {Code} university(code, Name) key(university) = {Code} enrolled(scode, Ucode) enrolled[scode] student[code] enrolled[ucode] university[code] Source schema S: s 1 (Scode, Sname, City, Age), s 2 (Ucode, Uname), s 3 (Scode, Ucode) Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) M. Lenzerini Data Integration DASI / 213
183 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints GAV with constraints Example of retrieved global db university student Code Name Code Name City AF bocconi 12 anne florence BN ucla 15 bill oslo s C 1 s C 2 12 anne florence 21 AF bocconi 15 bill oslo 24 BN ucla enrolled Scode Ucode 12 AF 16 BN s C 3 12 AF 16 BN Example of source database C and corresponding retrieved global database M(C) M. Lenzerini Data Integration DASI / 213
184 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints GAV with constraints Example of incompleteness s C 3 12 AF 16 BN enrolled B Scode Ucode 12 AF 16 BN student B Code Name City 12 anne florence 15 bill oslo 16 x y s C 3 (16, BN) and the mapping imply enrolledb (16, BN) for all B sem C (I) Due to the inclusion dependency enrolled[scode] student[code] in G, 16 is the code of some student in all B sem C (I) Since C does not provide information about name and city of the student with code 16, a global database that is legal for I wrt C may contain arbitrary values for these M. Lenzerini Data Integration DASI / 213
185 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints GAV with constraints Example of incompleteness s C 3 12 AF 16 BN enrolled B Scode Ucode 12 AF 16 BN student B Code Name City 12 anne florence 15 bill oslo 16 x y s C 3 (16, BN) and the mapping imply enrolledb (16, BN) for all B sem C (I) Due to the inclusion dependency enrolled[scode] student[code] in G, 16 is the code of some student in all B sem C (I) Since C does not provide information about name and city of the student with code 16, a global database that is legal for I wrt C may contain arbitrary values for these M. Lenzerini Data Integration DASI / 213
186 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints GAV with constraints Example of incompleteness s C 3 12 AF 16 BN enrolled B Scode Ucode 12 AF 16 BN student B Code Name City 12 anne florence 15 bill oslo 16 x y s C 3 (16, BN) and the mapping imply enrolledb (16, BN) for all B sem C (I) Due to the inclusion dependency enrolled[scode] student[code] in G, 16 is the code of some student in all B sem C (I) Since C does not provide information about name and city of the student with code 16, a global database that is legal for I wrt C may contain arbitrary values for these M. Lenzerini Data Integration DASI / 213
187 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Unfolding is not sufficient Part 3: Query answering with constraints Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla s C 3 12 AF 16 BN Consider the query: q = { (c) student(c, n, ci) } Unfolding of q wrt M: unf M (q) = { (c) s 1 (c, n, ci, a) } The query unf M (q) retrieves from C only the answer {12, 15}, while the correct answer would be {12, 15, 16} The simple unfolding strategy is not sufficient for GAV with constraints M. Lenzerini Data Integration DASI / 213
188 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Unfolding is not sufficient Part 3: Query answering with constraints Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla s C 3 12 AF 16 BN Consider the query: q = { (c) student(c, n, ci) } Unfolding of q wrt M: unf M (q) = { (c) s 1 (c, n, ci, a) } The query unf M (q) retrieves from C only the answer {12, 15}, while the correct answer would be {12, 15, 16} The simple unfolding strategy is not sufficient for GAV with constraints M. Lenzerini Data Integration DASI / 213
189 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Unfolding is not sufficient Part 3: Query answering with constraints Mapping M: { (c, n, ci) s 1 (c, n, ci, a) } student(c, n, ci) { (c, n) s 2 (c, n) } university(c, n) { (s, u) s 3 (s, u) } enrolled(s, u) s C 1 12 anne florence bill oslo 24 s C 2 AF BN bocconi ucla s C 3 12 AF 16 BN Consider the query: q = { (c) student(c, n, ci) } Unfolding of q wrt M: unf M (q) = { (c) s 1 (c, n, ci, a) } The query unf M (q) retrieves from C only the answer {12, 15}, while the correct answer would be {12, 15, 16} The simple unfolding strategy is not sufficient for GAV with constraints M. Lenzerini Data Integration DASI / 213
190 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Example of inconsistency Part 3: Query answering with constraints s C 1 12 anne florence bill oslo 24 student B Code Name City 12 anne florence 12 bill oslo The tuples in s C 1 and the mapping imply studentb (12, anne, florence) and student B (12, bill, oslo), for all B that satisfy the mapping Due to the key dependency key(student) = {Code} in G, there is no global database that satisfies the mapping and is legal wrt the global schema, i.e., sem I (C) = M. Lenzerini Data Integration DASI / 213
191 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV with constraints Example of inconsistency Part 3: Query answering with constraints s C 1 12 anne florence bill oslo 24 student B Code Name City 12 anne florence 12 bill oslo The tuples in s C 1 and the mapping imply studentb (12, anne, florence) and student B (12, bill, oslo), for all B that satisfy the mapping Due to the key dependency key(student) = {Code} in G, there is no global database that satisfies the mapping and is legal wrt the global schema, i.e., sem I (C) = M. Lenzerini Data Integration DASI / 213
192 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems GAV data integration systems with constraints Part 3: Query answering with constraints Constraints in G Type of mapping Incompleteness Inconsistency no GAV yes / no no no (G)LAV yes no IDs GAV yes no KDs GAV yes / no yes IDs + KDs GAV yes yes yes (G)LAV yes yes M. Lenzerini Data Integration DASI / 213
193 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Incompleteness and inconsistency in GAV systems Part 3: Query answering with constraints GAV with constraints Incompleteness and inconsistency M. Lenzerini Data Integration DASI / 213
194 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
195 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Inclusion dependencies Example Part 3: Query answering with constraints Global schema G: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) Constraints: Sources S: team[tleader, Tname] player[pname, Pteam] s 1 and s 3 store players s 2 stores teams Mapping M: { (x, y, z) s 1 (x, y, z) s 3 (x, y, z) } player(x, y, z) { (x, y, z) s 2 (x, y, z) } team(x, y, z) M. Lenzerini Data Integration DASI / 213
196 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve Retrieved global database M(C): player: Totti 1971 Roma Buffon 1978 Juve team: Juve Torino Del Piero M. Lenzerini Data Integration DASI / 213
197 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve Retrieved global database M(C): player: Totti 1971 Roma Buffon 1978 Juve team: Juve Torino Del Piero M. Lenzerini Data Integration DASI / 213
198 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini Data Integration DASI / 213
199 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini Data Integration DASI / 213
200 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Warnings 1 There may be an infinite number of databases satisfying I 2 In case of cyclic IDs, databases satisfying I may be of infinite size M. Lenzerini Data Integration DASI / 213
201 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Consider the query q = { (x, z) player(x, y, z) } cert(q, I, C) = { (Totti, Roma), (Buffon, Juve), (Del Piero, Juve) } M. Lenzerini Data Integration DASI / 213
202 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Inclusion dependencies Example retrieved global db player: Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team: Juve Torino Del Piero The ID on the global schema tells us that Del Piero is a player of Juve All global databases satisfying I have at least the tuples shown above, where α is some value of the domain Consider the query q = { (x, z) player(x, y, z) } cert(q, I, C) = { (Totti, Roma), (Buffon, Juve), (Del Piero, Juve) } M. Lenzerini Data Integration DASI / 213
203 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini Data Integration DASI / 213
204 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini Data Integration DASI / 213
205 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Part 3: Query answering with constraints Chasing inclusion dependencies Infinite construction Intuitive strategy: Add new facts until IDs are satisfied Problem: Infinite construction in the presence of cyclic IDs Example Let r be binary with r[2] r[1] Suppose M(C) = { r(a, b) } 1 add r(b, c 1 ) 2 add r(c 1, c 2 ) 3 add r(c 2, c 3 ) 4... (ad infinitum) Example Let r, s be binary with r[1] s[1], s[2] r[1] Suppose M(C) = { r(a, b) } 1 add s(a, c 1 ) 2 add r(c 1, c 2 ) 3 add s(c 1, c 3 ) 4 add r(c 3, c 4 ) 5... (ad infinitum) M. Lenzerini Data Integration DASI / 213
206 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies The chase of a database Part 3: Query answering with constraints Definition The chase of a database is the exhaustive application of a set of rules that transform the database, in order to make it consistent with a set of integrity constraints Typically, there will be one or more chase rules for each different type of constraint M. Lenzerini Data Integration DASI / 213
207 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies The ID-chase rule Part 3: Query answering with constraints The chase for IDs has only one rule, the ID-chase rule Let D be a database: if the schema contains the ID r[i 1,..., i k ] s[j 1,..., j k ] and there is a fact in D of the form r(a 1,..., a n ) and there are no facts in D of the form s(b 1,..., b m ) such that a il = b jl for each l {1,..., k}, then add to D the fact s(c 1,..., c m ), where for each h {1,..., m}, if h = j l for some l then c h = a il otherwise c h is a new constant symbol (not in D yet) Notice: New existential symbols are introduced (skolem terms) M. Lenzerini Data Integration DASI / 213
208 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Properties of the chase Part 3: Query answering with constraints Bad news: the chase is in general infinite Good news: the chase identifies a canonical model A canonical model is a database that represents all the models of the system We can use the chase to prove soundness and completeness of a query processing method... but only for positive queries! M. Lenzerini Data Integration DASI / 213
209 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under inclusion dependencies Limiting the chase Part 3: Query answering with constraints Why don t we use a finite number of existential constants in the chase? Example Consider r[1] s[1], and s[2] r[1] and suppose M(C) = { r(a, b) } Compute chase(m(c)) with only one new constant c 1 : 0) r(a, b); 1) add s(a, c 1 ) 2) add r(c 1, c 1 ) 3) add s(c 1, c 1 ) This database is not a canonical model for I wrt C E.g., for query q = { (x) r(x, y), s(y, y) }, we have a q chase(m(c)) while a cert(q, I, C) Arbitrarily limiting the chase is unsound, for any finite number of new constants M. Lenzerini Data Integration DASI / 213
210 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
211 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Chasing the query Part 3: Query answering with constraints When chasing the data the termination condition would need to take into account the query We consider an alternative approach, based on the idea of a query chase Instead of chasing the data, we chase the query Is the dual notion of the database chase IDs are applied from right to left to the query atoms Advantage: much easier termination conditions, which imply: decidability properties efficiency This technique provides an algorithm for rewriting UCQs under IDs M. Lenzerini Data Integration DASI / 213
212 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Chasing the query Part 3: Query answering with constraints When chasing the data the termination condition would need to take into account the query We consider an alternative approach, based on the idea of a query chase Instead of chasing the data, we chase the query Is the dual notion of the database chase IDs are applied from right to left to the query atoms Advantage: much easier termination conditions, which imply: decidability properties efficiency This technique provides an algorithm for rewriting UCQs under IDs M. Lenzerini Data Integration DASI / 213
213 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Query rewriting under inclusion dependencies Part 3: Query answering with constraints Given a query q over the global schema G, we look for a rewriting rew of q expressed over S A rewriting rew is perfect if rew C = cert(q, I, C), for every source database C With a perfect rewriting, we can do query answering by rewriting We avoid actually constructing the retrieved global database M(C) M. Lenzerini Data Integration DASI / 213
214 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Rewriting rule for inclusion dependencies Part 3: Query answering with constraints Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini Data Integration DASI / 213
215 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Rewriting rule for inclusion dependencies Part 3: Query answering with constraints Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini Data Integration DASI / 213
216 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Rewriting rule for inclusion dependencies Part 3: Query answering with constraints Intuition: Use the IDs as basic rewriting rules Example Consider a query q = { (x, z) player(x, y, z) } and the constraint team[tleader, Tname] player[pname, Pteam] as a logic rule: player(w 3, w 4, w 1 ) team(w 1, w 2, w 3 ) We add to the rewriting the query q = { (x, z) team(x, y, z) } Definition Basic rewriting step: when an atom unifies with the head of the rule substitute the atom with the body of the rule M. Lenzerini Data Integration DASI / 213
217 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Query Rewriting for IDs Algorithm ID-rewrite Part 3: Query answering with constraints Iterative execution of: 1 Reduction: Atoms that unify with other atoms are eliminated and the unification is applied Variables that appear only once are marked 2 Basic rewriting step A rewriting step is applicable to an atom if it does not eliminate variables that appear somewhere else May introduce fresh variables Note: The algorithm works directly for unions of conjunctive queries (UCQs), and produces an UCQ as result M. Lenzerini Data Integration DASI / 213
218 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Query Rewriting for IDs Algorithm ID-rewrite Part 3: Query answering with constraints Iterative execution of: 1 Reduction: Atoms that unify with other atoms are eliminated and the unification is applied Variables that appear only once are marked 2 Basic rewriting step A rewriting step is applicable to an atom if it does not eliminate variables that appear somewhere else May introduce fresh variables Note: The algorithm works directly for unions of conjunctive queries (UCQs), and produces an UCQ as result M. Lenzerini Data Integration DASI / 213
219 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV The algorithm ID-rewrite Part 3: Query answering with constraints Input: relational schema G, set Ψ ID of IDs, UCQ Q Output: perfect rewriting of Q Q := Q; repeat Q aux := Q ; for each q Q aux do (a) for each g 1, g 2 body(q) do if g 1 and g 2 unify then Q := Q {τ(reduce(q, g 1, g 2 ))}; (b) for each g body(q) do for each ID Ψ ID do if ID is applicable to g then Q := Q { q[g/rewrite(g, ID)] } until Q aux = Q ; return Q M. Lenzerini Data Integration DASI / 213
220 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs Part 3: Query answering with constraints Properties of ID-rewrite ID-rewrite terminates ID-rewrite produces a perfect rewriting of the input query More precisely, let unf M (q) be the unfolding of the query q wrt the GAV mapping M Theorem unf M (ID-rewrite(q)) is a perfect rewriting of the query q Theorem Query answering in GAV systems under IDs is in PTime in data complexity (actually in LogSpace) M. Lenzerini Data Integration DASI / 213
221 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs Part 3: Query answering with constraints Properties of ID-rewrite ID-rewrite terminates ID-rewrite produces a perfect rewriting of the input query More precisely, let unf M (q) be the unfolding of the query q wrt the GAV mapping M Theorem unf M (ID-rewrite(q)) is a perfect rewriting of the query q Theorem Query answering in GAV systems under IDs is in PTime in data complexity (actually in LogSpace) M. Lenzerini Data Integration DASI / 213
222 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
223 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query answering under IDs and KDs Part 3: Query answering with constraints We have already seen that in GAV systems under sound mappings Key dependencies may give rise to inconsistencies When M(C) violates the KDs, no legal database exists and query answering becomes trivial How do KDs interact with IDs? Theorem Query answering under IDs and KDs is undecidable Proof: By reduction from implication of IDs and FDs We need to look for syntactic restrictions on the form of the dependencies that ensures decidability M. Lenzerini Data Integration DASI / 213
224 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query answering under IDs and KDs Part 3: Query answering with constraints We have already seen that in GAV systems under sound mappings Key dependencies may give rise to inconsistencies When M(C) violates the KDs, no legal database exists and query answering becomes trivial How do KDs interact with IDs? Theorem Query answering under IDs and KDs is undecidable Proof: By reduction from implication of IDs and FDs We need to look for syntactic restrictions on the form of the dependencies that ensures decidability M. Lenzerini Data Integration DASI / 213
225 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query answering under IDs and KDs Part 3: Query answering with constraints We have already seen that in GAV systems under sound mappings Key dependencies may give rise to inconsistencies When M(C) violates the KDs, no legal database exists and query answering becomes trivial How do KDs interact with IDs? Theorem Query answering under IDs and KDs is undecidable Proof: By reduction from implication of IDs and FDs We need to look for syntactic restrictions on the form of the dependencies that ensures decidability M. Lenzerini Data Integration DASI / 213
226 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query answering under IDs and KDs Part 3: Query answering with constraints We have already seen that in GAV systems under sound mappings Key dependencies may give rise to inconsistencies When M(C) violates the KDs, no legal database exists and query answering becomes trivial How do KDs interact with IDs? Theorem Query answering under IDs and KDs is undecidable Proof: By reduction from implication of IDs and FDs We need to look for syntactic restrictions on the form of the dependencies that ensures decidability M. Lenzerini Data Integration DASI / 213
227 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Non-key-conflicting IDs Part 3: Query answering with constraints Definition Non-key-conflicting IDs (NKCIDs) are of the form r 1 [ x 1 ] r 2 [ x 2 ] where x 2 is not a strict superset of key(r 2 ) Example Let r be of arity 3 and s of arity 4 with key(s) = {1, 2} The following are NKCIDs r[2] s[2], since {2} is a strict subset of key(s) r[2, 3] s[1, 2], since {1, 2} coincides with key(s) r[1, 2] s[2, 3], since 1 key(s) but 1 {2, 3} The following is not a NKCID: r[1, 2, 3] s[1, 2, 4] Note: Foreign keys (FKs) are a special case of NKCIDs M. Lenzerini Data Integration DASI / 213
228 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Non-key-conflicting IDs Part 3: Query answering with constraints Definition Non-key-conflicting IDs (NKCIDs) are of the form r 1 [ x 1 ] r 2 [ x 2 ] where x 2 is not a strict superset of key(r 2 ) Example Let r be of arity 3 and s of arity 4 with key(s) = {1, 2} The following are NKCIDs r[2] s[2], since {2} is a strict subset of key(s) r[2, 3] s[1, 2], since {1, 2} coincides with key(s) r[1, 2] s[2, 3], since 1 key(s) but 1 {2, 3} The following is not a NKCID: r[1, 2, 3] s[1, 2, 4] Note: Foreign keys (FKs) are a special case of NKCIDs M. Lenzerini Data Integration DASI / 213
229 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Non-key-conflicting IDs Part 3: Query answering with constraints Definition Non-key-conflicting IDs (NKCIDs) are of the form r 1 [ x 1 ] r 2 [ x 2 ] where x 2 is not a strict superset of key(r 2 ) Example Let r be of arity 3 and s of arity 4 with key(s) = {1, 2} The following are NKCIDs r[2] s[2], since {2} is a strict subset of key(s) r[2, 3] s[1, 2], since {1, 2} coincides with key(s) r[1, 2] s[2, 3], since 1 key(s) but 1 {2, 3} The following is not a NKCID: r[1, 2, 3] s[1, 2, 4] Note: Foreign keys (FKs) are a special case of NKCIDs M. Lenzerini Data Integration DASI / 213
230 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Separation for IDs and KDs Part 3: Query answering with constraints Theorem (IDs-KDs separation) Under KDs and NKCIDs, if M(C) satisfies the KDs, then the KDs can be ignored wrt certain answers of a user query q Intuition: For NKCIDs, when applying the ID-chase rule to a tuple t 1 r1 B, we can choose the tuple t 2 to introduce in r2 B so that it does not violate key(r 2 ): When key(r 2 ) x 2, fresh constants in t 2 are chosen for key attributes, and so there is no other tuple in r2 B coinciding with t 2 on all key attributes When key(r 2 ) = x 2, if there is already a tuple t in r2 B such that t 1 [ x 1 ] = t[ x 2 ], we choose t for t 2 Query answering becomes undecidable as soon as we extend the language of the IDs M. Lenzerini Data Integration DASI / 213
231 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Separation for IDs and KDs Part 3: Query answering with constraints Theorem (IDs-KDs separation) Under KDs and NKCIDs, if M(C) satisfies the KDs, then the KDs can be ignored wrt certain answers of a user query q Intuition: For NKCIDs, when applying the ID-chase rule to a tuple t 1 r1 B, we can choose the tuple t 2 to introduce in r2 B so that it does not violate key(r 2 ): When key(r 2 ) x 2, fresh constants in t 2 are chosen for key attributes, and so there is no other tuple in r2 B coinciding with t 2 on all key attributes When key(r 2 ) = x 2, if there is already a tuple t in r2 B such that t 1 [ x 1 ] = t[ x 2 ], we choose t for t 2 Query answering becomes undecidable as soon as we extend the language of the IDs M. Lenzerini Data Integration DASI / 213
232 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Separation for IDs and KDs Part 3: Query answering with constraints Theorem (IDs-KDs separation) Under KDs and NKCIDs, if M(C) satisfies the KDs, then the KDs can be ignored wrt certain answers of a user query q Intuition: For NKCIDs, when applying the ID-chase rule to a tuple t 1 r1 B, we can choose the tuple t 2 to introduce in r2 B so that it does not violate key(r 2 ): When key(r 2 ) x 2, fresh constants in t 2 are chosen for key attributes, and so there is no other tuple in r2 B coinciding with t 2 on all key attributes When key(r 2 ) = x 2, if there is already a tuple t in r2 B such that t 1 [ x 1 ] = t[ x 2 ], we choose t for t 2 Query answering becomes undecidable as soon as we extend the language of the IDs M. Lenzerini Data Integration DASI / 213
233 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query processing under separable KDs and IDs Part 3: Query answering with constraints Global algorithm: 1 Verify consistency of M(C) with respect to KDs 2 Compute ID-rewrite of the input query 3 Unfold wrt M the query computed at previous step 4 Evaluate the unfolded query over the sources Note: The KD consistency check can be done by suitable CQs with inequality The computation of M(C) can be avoided (by unfolding the queries for the KD consistency check) M. Lenzerini Data Integration DASI / 213
234 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Query processing under separable KDs and IDs Part 3: Query answering with constraints Global algorithm: 1 Verify consistency of M(C) with respect to KDs 2 Compute ID-rewrite of the input query 3 Unfold wrt M the query computed at previous step 4 Evaluate the unfolded query over the sources Note: The KD consistency check can be done by suitable CQs with inequality The computation of M(C) can be avoided (by unfolding the queries for the KD consistency check) M. Lenzerini Data Integration DASI / 213
235 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Checking KD consistency Example Part 3: Query answering with constraints Relation: Key dependency: player[pname, Pteam] key(player) = {Pname} Query to check (in)consistency of the KD: q = { () player(x, y), player(x, z), y z } is true iff the instance of player violates the KD Mapping M: { (x, y) s 1 (x, y) s 2 (x, y) } player(x, y) Unfolding of q wrt M: { () s 1 (x, y), s 1 (x, z), y z s 1 (x, y), s 2 (x, z), y z s 2 (x, y), s 1 (x, z), y z s 2 (x, y), s 2 (x, z), y z } M. Lenzerini Data Integration DASI / 213
236 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Checking KD consistency Example Part 3: Query answering with constraints Relation: Key dependency: player[pname, Pteam] key(player) = {Pname} Query to check (in)consistency of the KD: q = { () player(x, y), player(x, z), y z } is true iff the instance of player violates the KD Mapping M: { (x, y) s 1 (x, y) s 2 (x, y) } player(x, y) Unfolding of q wrt M: { () s 1 (x, y), s 1 (x, z), y z s 1 (x, y), s 2 (x, z), y z s 2 (x, y), s 1 (x, z), y z s 2 (x, y), s 2 (x, z), y z } M. Lenzerini Data Integration DASI / 213
237 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Part 3: Query answering with constraints Query answering in GAV under separable IDs+KDs Theorem (Calì, Lembo & Rosati, PODS 03) Answering conjunctive queries in GAV systems under KDs and NKCIDs is in PTime in data complexity (actually in LogSpace ) Can we extend these results to more expressive user queries? The rewriting technique extends immediately to unions of CQs ID-rewrite(q 1 q n ) = ID-rewrite(q 1 ) ID-rewrite(q n ) This is not the case for recursive queries Theorem (Calvanese & Rosati KRDB 03) Answering recursive queries under KDs and FKs is undecidable Answering recursive queries under IDs is undecidable M. Lenzerini Data Integration DASI / 213
238 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Part 3: Query answering with constraints Query answering in GAV under separable IDs+KDs Theorem (Calì, Lembo & Rosati, PODS 03) Answering conjunctive queries in GAV systems under KDs and NKCIDs is in PTime in data complexity (actually in LogSpace ) Can we extend these results to more expressive user queries? The rewriting technique extends immediately to unions of CQs ID-rewrite(q 1 q n ) = ID-rewrite(q 1 ) ID-rewrite(q n ) This is not the case for recursive queries Theorem (Calvanese & Rosati KRDB 03) Answering recursive queries under KDs and FKs is undecidable Answering recursive queries under IDs is undecidable M. Lenzerini Data Integration DASI / 213
239 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs and KDs Part 3: Query answering with constraints Query answering in GAV under separable IDs+KDs Theorem (Calì, Lembo & Rosati, PODS 03) Answering conjunctive queries in GAV systems under KDs and NKCIDs is in PTime in data complexity (actually in LogSpace ) Can we extend these results to more expressive user queries? The rewriting technique extends immediately to unions of CQs ID-rewrite(q 1 q n ) = ID-rewrite(q 1 ) ID-rewrite(q n ) This is not the case for recursive queries Theorem (Calvanese & Rosati KRDB 03) Answering recursive queries under KDs and FKs is undecidable Answering recursive queries under IDs is undecidable M. Lenzerini Data Integration DASI / 213
240 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
241 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering under IDs and EDs Part 3: Query answering with constraints Under EDs: Possibility of inconsistencies When M(C) violates the EDs, no legal database exists and query answering becomes trivial Under IDs and EDs: How do EDs and IDs interact? Is query answering separable? Is query answering decidable? M. Lenzerini Data Integration DASI / 213
242 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering under IDs and EDs Part 3: Query answering with constraints Under EDs: Possibility of inconsistencies When M(C) violates the EDs, no legal database exists and query answering becomes trivial Under IDs and EDs: How do EDs and IDs interact? Is query answering separable? Is query answering decidable? M. Lenzerini Data Integration DASI / 213
243 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Exclusion dependencies Example Part 3: Query answering with constraints Global schema G: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) coach(cname, Cteam) Constraints: Sources S: team[tleader, Tname] player[pname, Pteam] coach[cname] player[pname] = s 1 and s 3 store players s 2 stores teams s 4 stores coaches Mapping M: { (x, y, z) s 1 (x, y, z) s 3 (x, y, z) } player(x, y, z) { (x, y, z) s 2 (x, y, z) } team(x, y, z) { (x, y) s 4 (x, y, z) } coach(x, y) M. Lenzerini Data Integration DASI / 213
244 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Retrieved global db under EDs Example Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve s 4 : Del Piero Viterbese Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve coach : Del Piero team : Juve Torino Del Piero Viterbese M. Lenzerini Data Integration DASI / 213
245 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Retrieved global db under EDs Example Source database C: s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Buffon 1978 Juve s 4 : Del Piero Viterbese Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve coach : Del Piero team : Juve Torino Del Piero Viterbese M. Lenzerini Data Integration DASI / 213
246 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Repair of retrieved global db under EDs Example Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team : Juve Torino Del Piero coach : Del Piero Viterbese team[tleader, Tname] player[pname, Pteam] Violation of coach[cname] player[pname] = Can we detect such situations without actually constructing M(C)? M. Lenzerini Data Integration DASI / 213
247 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Repair of retrieved global db under EDs Example Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team : Juve Torino Del Piero coach : Del Piero Viterbese Violation of team[tleader, Tname] player[pname, Pteam] Violation of coach[cname] player[pname] = Can we detect such situations without actually constructing M(C)? M. Lenzerini Data Integration DASI / 213
248 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Repair of retrieved global db under EDs Example Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team : Juve Torino Del Piero coach : Del Piero Viterbese Repair of team[tleader, Tname] player[pname, Pteam] Violation of coach[cname] player[pname] = Can we detect such situations without actually constructing M(C)? M. Lenzerini Data Integration DASI / 213
249 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Repair of retrieved global db under EDs Example Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team : Juve Torino Del Piero coach : Del Piero Viterbese Repair of team[tleader, Tname] player[pname, Pteam] Violation of coach[cname] player[pname] = Can we detect such situations without actually constructing M(C)? M. Lenzerini Data Integration DASI / 213
250 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Repair of retrieved global db under EDs Example Retrieved global database M(C): player : Totti 1971 Roma Buffon 1978 Juve Del Piero α Juve team : Juve Torino Del Piero coach : Del Piero Viterbese Repair of team[tleader, Tname] player[pname, Pteam] Violation of coach[cname] player[pname] = Can we detect such situations without actually constructing M(C)? M. Lenzerini Data Integration DASI / 213
251 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Deductive closure of EDs under IDs Example Part 3: Query answering with constraints Can we saturate (close) the EDs by adding all the EDs that are logical consequences of the EDs and IDs? Example From it follows that team[tleader, Tname] player[pname, Pteam] coach[cname] player[pname] = coach[cname] team[tleader] = This constraint is violated by the retrieved global database M(C) M. Lenzerini Data Integration DASI / 213
252 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Deductive closure of EDs under IDs Example Part 3: Query answering with constraints Can we saturate (close) the EDs by adding all the EDs that are logical consequences of the EDs and IDs? Example From it follows that team[tleader, Tname] player[pname, Pteam] coach[cname] player[pname] = coach[cname] team[tleader] = This constraint is violated by the retrieved global database M(C) M. Lenzerini Data Integration DASI / 213
253 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Deductive closure of EDs under IDs Part 3: Query answering with constraints Definition Derivation rule of EDs under EDs and IDs: From the ED r[i 1,..., i k ] s[j 1,..., j k ] = and the ID t[l 1,..., l k ] s[j 1,..., j k ] derive the ED r[i 1,..., i k ] t[l 1,..., l k ] = Corresponds to a simple application of resolution on the FOL sentences corresponding to EDs and IDs Theorem If the set of EDs is closed with respect to the above rule, it contains all EDs that are logical consequences of the initial EDs and IDs M. Lenzerini Data Integration DASI / 213
254 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Deductive closure of EDs under IDs Part 3: Query answering with constraints Definition Derivation rule of EDs under EDs and IDs: From the ED r[i 1,..., i k ] s[j 1,..., j k ] = and the ID t[l 1,..., l k ] s[j 1,..., j k ] derive the ED r[i 1,..., i k ] t[l 1,..., l k ] = Corresponds to a simple application of resolution on the FOL sentences corresponding to EDs and IDs Theorem If the set of EDs is closed with respect to the above rule, it contains all EDs that are logical consequences of the initial EDs and IDs M. Lenzerini Data Integration DASI / 213
255 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering in GAV under IDs and EDs Part 3: Query answering with constraints Theorem (ID-ED Separation) Under IDs and EDs, if M(C) satisfies all EDs derived from the IDs and the original EDs then the EDs can be ignored wrt certain answers of a query We obtain a method for query answering in GAV under EDs and IDs: 1 Close the set of EDs with respect to the IDs 2 Verify consistency of M(C) with respect to EDs 3 Compute ID-rewrite of the input query 4 Unfold the query computed at the previous step 5 Evaluate the query over the sources The ED consistency check can be done by suitable CQs M. Lenzerini Data Integration DASI / 213
256 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering in GAV under IDs and EDs Part 3: Query answering with constraints Theorem (ID-ED Separation) Under IDs and EDs, if M(C) satisfies all EDs derived from the IDs and the original EDs then the EDs can be ignored wrt certain answers of a query We obtain a method for query answering in GAV under EDs and IDs: 1 Close the set of EDs with respect to the IDs 2 Verify consistency of M(C) with respect to EDs 3 Compute ID-rewrite of the input query 4 Unfold the query computed at the previous step 5 Evaluate the query over the sources The ED consistency check can be done by suitable CQs M. Lenzerini Data Integration DASI / 213
257 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering in GAV under IDs, KDs and EDs Part 3: Query answering with constraints Theorem (ID-KD-ED Separation) Under KDs, NKCIDs, and EDs, if M(C) satisfies all the KDs and satisfies all EDs derived from the IDs and the original EDs then the KDs and EDs can be ignored wrt certain answers of a query We obtain a method for query answering in GAV under KDs, NKCIDs, and EDs: 1 Close the set of EDs with respect to the IDs 2 Verify consistency of M(C) with respect to KDs and EDs 3 Compute ID-rewrite of the input query 4 Unfold the query computed at the previous step 5 Evaluate the query over the sources M. Lenzerini Data Integration DASI / 213
258 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Query answering in GAV under IDs, KDs and EDs Part 3: Query answering with constraints Theorem (ID-KD-ED Separation) Under KDs, NKCIDs, and EDs, if M(C) satisfies all the KDs and satisfies all EDs derived from the IDs and the original EDs then the KDs and EDs can be ignored wrt certain answers of a query We obtain a method for query answering in GAV under KDs, NKCIDs, and EDs: 1 Close the set of EDs with respect to the IDs 2 Verify consistency of M(C) with respect to KDs and EDs 3 Compute ID-rewrite of the input query 4 Unfold the query computed at the previous step 5 Evaluate the query over the sources M. Lenzerini Data Integration DASI / 213
259 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Query answ. in GAV under IDs, KDs and EDs Complexity Note: 1 Closing the set of EDs wrt the IDs is independent of the data 2 Consistency of M(C) wrt KDs and EDs can be verified through suitable queries over the source database C Theorem (Lembo & Rosati, 2004) Answering conjunctive queries in GAV systems under KDs, NKCIDs and EDs is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
260 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in GAV under IDs, KDs, and EDs Part 3: Query answering with constraints Query answ. in GAV under IDs, KDs and EDs Complexity Note: 1 Closing the set of EDs wrt the IDs is independent of the data 2 Consistency of M(C) wrt KDs and EDs can be verified through suitable queries over the source database C Theorem (Lembo & Rosati, 2004) Answering conjunctive queries in GAV systems under KDs, NKCIDs and EDs is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
261 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
262 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
263 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints (G)LAV system with integrity constraints Part 3: Query answering with constraints We consider a data integration system I = G, S, M where G is a global schema with constraints M is a set of LAV mappings, whose assertions have the form φ S φ G and are interpreted as x. φ S ( x) φ G ( x) where φ S is a conjunctive query over S, and φ G is a conjunctive query over G Basic observation: Since G does have constraints, the canonical retrieved global database M(C) may not be legal for G M. Lenzerini Data Integration DASI / 213
264 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints (G)LAV system with integrity constraints Part 3: Query answering with constraints We consider a data integration system I = G, S, M where G is a global schema with constraints M is a set of LAV mappings, whose assertions have the form φ S φ G and are interpreted as x. φ S ( x) φ G ( x) where φ S is a conjunctive query over S, and φ G is a conjunctive query over G Basic observation: Since G does have constraints, the canonical retrieved global database M(C) may not be legal for G M. Lenzerini Data Integration DASI / 213
265 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints Part 3: Query answering with constraints Semantics of (G)LAV systems with integrity constraints Given a source db C, a global db B (over ) satisfies I relative to C if 1 it is legal wrt the global schema, i.e., it satisfies the ICs 2 it satisfies the mapping, i.e., B is a superset of the canonical retrieved global database M(C) (sound mappings) Recall: M(C) is obtained by evaluating, for each mapping assertion φ S φ G, the query φ S over C, and using the obtained tuples to populate the global relations according to φ G, using fresh constants for existentially quantified elements We are interested in certain answers to a query, i.e., those that hold for all global databases that satisfy I relative to C M. Lenzerini Data Integration DASI / 213
266 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints Part 3: Query answering with constraints Semantics of (G)LAV systems with integrity constraints Given a source db C, a global db B (over ) satisfies I relative to C if 1 it is legal wrt the global schema, i.e., it satisfies the ICs 2 it satisfies the mapping, i.e., B is a superset of the canonical retrieved global database M(C) (sound mappings) Recall: M(C) is obtained by evaluating, for each mapping assertion φ S φ G, the query φ S over C, and using the obtained tuples to populate the global relations according to φ G, using fresh constants for existentially quantified elements We are interested in certain answers to a query, i.e., those that hold for all global databases that satisfy I relative to C M. Lenzerini Data Integration DASI / 213
267 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints (G)LAV data integration systems with constraints Part 3: Query answering with constraints Constraints in G Type of mapping Incompleteness Inconsistency no GAV yes / no no no (G)LAV yes no IDs GAV yes no KDs GAV yes / no yes IDs + KDs GAV yes yes IDs (G)LAV yes no KDs (G)LAV yes yes IDs + KDs GAV yes yes M. Lenzerini Data Integration DASI / 213
268 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and integrity constraints Part 3: Query answering with constraints (G)LAV with constr. Incompleteness and inconsistency M. Lenzerini Data Integration DASI / 213
269 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
270 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies (G)LAV systems under IDs Part 3: Query answering with constraints Under IDs only, we can exploit the previous results for GAV also for (G)LAV, by turning the (G)LAV mappings into GAV mappings: We transform a (G)LAV integration system I = G, S, M with IDs only into a GAV system I = G, S, M With respect to I, the transformed system I contains auxiliary IDs and auxiliary global relation symbols The transformation is query-preserving: For every conjunctive query q and for every source database C, the certain answers to q wrt I and C are equal to the certain answers to q wrt I and C M. Lenzerini Data Integration DASI / 213
271 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies Transforming LAV into GAV Example Part 3: Query answering with constraints Initial LAV mappings: s(x, y) { (x, y) r 1 (x, z), r 2 (y, w) } t(x, y) { (x, y) r 1 (x, z), r 3 (y, x) } We introduce two new global relations for each mapping assertion: s i /2, s e /4, and t i /2, t e /3 Transformed GAV mappings: { (x, y) s(x, y) } s i (x, y) { (x, y) t(x, y) } t i (x, y) Additional IDs generated by the transformation: s i [1, 2] s e [1, 2] s e [1, 3] r 1 [1, 2] s e [2, 4] r 2 [1, 2] t i [1, 2] t e [1, 2] t e [1, 3] r 1 [1, 2] t e [2, 1] r 3 [1, 2] M. Lenzerini Data Integration DASI / 213
272 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies Transforming LAV into GAV Example Part 3: Query answering with constraints Initial LAV mappings: s(x, y) { (x, y) r 1 (x, z), r 2 (y, w) } t(x, y) { (x, y) r 1 (x, z), r 3 (y, x) } We introduce two new global relations for each mapping assertion: s i /2, s e /4, and t i /2, t e /3 Transformed GAV mappings: { (x, y) s(x, y) } s i (x, y) { (x, y) t(x, y) } t i (x, y) Additional IDs generated by the transformation: s i [1, 2] s e [1, 2] s e [1, 3] r 1 [1, 2] s e [2, 4] r 2 [1, 2] t i [1, 2] t e [1, 2] t e [1, 3] r 1 [1, 2] t e [2, 1] r 3 [1, 2] M. Lenzerini Data Integration DASI / 213
273 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV systems under IDs Part 3: Query answering with constraints Method for query answering in a (G)LAV system I with IDs: 1 Transform I into a GAV system I 2 Apply the query answering method for GAV systems under IDs (The unfolding step must take into account the presence of auxiliary global symbols) Theorem Answering conjunctive queries in (G)LAV systems under IDs is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
274 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV systems under IDs Part 3: Query answering with constraints Method for query answering in a (G)LAV system I with IDs: 1 Transform I into a GAV system I 2 Apply the query answering method for GAV systems under IDs (The unfolding step must take into account the presence of auxiliary global symbols) Theorem Answering conjunctive queries in (G)LAV systems under IDs is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
275 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under IDs and EDs Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
276 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under IDs and EDs (G)LAV systems under IDs and EDs Part 3: Query answering with constraints What happens if we have also EDs in the global schema? The above transformation of (G)LAV into GAV is still correct in the presence of EDs It is thus possible to first turn the (G)LAV system into a GAV one and then compute query answering in the transformed system The addition of EDs is completely modular (we just need to add auxiliary steps in the query answering technique) M. Lenzerini Data Integration DASI / 213
277 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under IDs and EDs (G)LAV systems under IDs and EDs Part 3: Query answering with constraints What happens if we have also EDs in the global schema? The above transformation of (G)LAV into GAV is still correct in the presence of EDs It is thus possible to first turn the (G)LAV system into a GAV one and then compute query answering in the transformed system The addition of EDs is completely modular (we just need to add auxiliary steps in the query answering technique) M. Lenzerini Data Integration DASI / 213
278 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under IDs and EDs Part 3: Query answering with constraints Query answering in (G)LAV systems under IDs and EDs Method for query answering in a (G)LAV system I with IDs and EDs: 1 Transform I into a GAV system I 2 Apply the query answering method for GAV systems under IDs and EDs (The unfolding step must take into account the presence of auxiliary global symbols) Theorem Answering conjunctive queries in (G)LAV systems under IDs end Eds is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
279 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query answering in (G)LAV under IDs and EDs Part 3: Query answering with constraints Query answering in (G)LAV systems under IDs and EDs Method for query answering in a (G)LAV system I with IDs and EDs: 1 Transform I into a GAV system I 2 Apply the query answering method for GAV systems under IDs and EDs (The unfolding step must take into account the presence of auxiliary global symbols) Theorem Answering conjunctive queries in (G)LAV systems under IDs end Eds is in PTime in data complexity (actually in LogSpace ) M. Lenzerini Data Integration DASI / 213
280 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
281 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies (G)LAV systems under KDs Part 3: Query answering with constraints We consider a (G)LAV system with only KDs in the global schema: The transformation of (G)LAV into GAV is still correct in the presence of KDs More precisely, starting from a (G)LAV system I with KDs, we obtain a GAV system I with KDs and IDs But in general, I is such that the IDs added by the transformation are key-conflicting IDs (i.e., these IDs are not NKCIDs), and hence the KDs are in general not separable Therefore, it is not possible to apply the query answering method for (G)LAV systems under separable KDs and IDs Question: Can we find some analogous query answering method based on query rewriting? M. Lenzerini Data Integration DASI / 213
282 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies (G)LAV systems under KDs Part 3: Query answering with constraints We consider a (G)LAV system with only KDs in the global schema: The transformation of (G)LAV into GAV is still correct in the presence of KDs More precisely, starting from a (G)LAV system I with KDs, we obtain a GAV system I with KDs and IDs But in general, I is such that the IDs added by the transformation are key-conflicting IDs (i.e., these IDs are not NKCIDs), and hence the KDs are in general not separable Therefore, it is not possible to apply the query answering method for (G)LAV systems under separable KDs and IDs Question: Can we find some analogous query answering method based on query rewriting? M. Lenzerini Data Integration DASI / 213
283 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies (G)LAV systems under KDs A negative result Part 3: Query answering with constraints Problem: KDs and LAV mappings derive new equality-generating dependencies (not simple KDs) Theorem (Duschka & al., 1998) Given a LAV data integration system I with KDs in the global schema and a conjunctive query q, in general there does not exist a first-order query rew such that rew C = cert(q, I, C) for every source database C In other words, in LAV with KDs, conjunctive queries are not first-order rewritable, and one would need to resort to more powerful relational query languages (e.g., Datalog) M. Lenzerini Data Integration DASI / 213
284 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies Part 3: Query answering with constraints Data integration with constraints First-order rewritability Can query answering in integration systems be performed by first-order (UCQ) rewriting? GAV with IDs + EDs: yes GAV with IDs + KDs + EDs: only if KDs and IDs are separable LAV with IDs + EDs: yes LAV with KDs: no M. Lenzerini Data Integration DASI / 213
285 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance LAV systems and key dependencies Part 3: Query answering with constraints Data integration with constraints Complexity results EDs KDs IDs Data/Combined complexity no no general PTIME/PSPACE yes-no yes no PTIME/NP yes yes-no no PTIME/NP yes-no yes NKC PTIME/PSPACE yes no general PTIME/PSPACE yes-no yes 1KC undecidable yes-no yes general undecidable M. Lenzerini Data Integration DASI / 213
286 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Outline Part 3: Query answering with constraints 5 The role of global integrity constraints 6 Query answering in GAV with constraints Incompleteness and inconsistency in GAV systems Query answering in GAV under inclusion dependencies Rewriting CQs under inclusion dependencies in GAV Query answering in GAV under IDs and KDs Query answering in GAV under IDs, KDs, and EDs 7 Query answering in LAV with constraints LAV systems and integrity constraints Query answering in (G)LAV under inclusion dependencies Query answering in (G)LAV under IDs and EDs LAV systems and key dependencies 8 Inconsistency tolerance M. Lenzerini Data Integration DASI / 213
287 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Inconsistency tolerance Part 3: Query answering with constraints Under the classical (i.e., first-order) semantics considered so far: if data at the sources violate (through the mapping) a single KD or ED, the integration system has no legal databases (i.e., no models) consequently, the certain answers to any query of arity n are all the n-tuples of constants of Γ (ex falso quodlibet) non-interesting case for query answering M. Lenzerini Data Integration DASI / 213
288 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Inconsistency tolerance: example Part 3: Query answering with constraints G = {r/2, key(r) = {1}} S = {s/2, t/2} M = {s(x, Y ) r(x, Y ), t(x, Y ) r(x, Y )} C = {s(a, b), t(a, c)} Query: q = {(X) r(x, Y )} There is no database in sem C (I) (M(C) violates the KD on r) cert(q, I, C) = {c c } However: we would like the only certain answer to q to be a M. Lenzerini Data Integration DASI / 213
289 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Consistent query answering (CQA) Part 3: Query answering with constraints Study of methods and techniques for repairing a database instance that is inconsistent with the integrity constraints declared on its schema [Arenas et al., 2000] Peculiarity of CQA: repairing is virtual and based on a logical/declarative semantics Therefore, data are not changed, not a material repairing (as in data cleaning) The CQA principles and methods can be extended to data integration scenarios In the following, we only consider GAV mapping M. Lenzerini Data Integration DASI / 213
290 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance The loosely-sound semantics Part 3: Query answering with constraints We introduce one particular semantics for inconsistency tolerance in GAV integration systems, the loosely-sound semantics [Calì et al., 2005] the loosely-sound semantics principle: add as much as you like (as with sound semantics), and throw away only a minimal set of tuples Let B 1 and B 2 be two global databases that satisfy constraints on the global schema. Then, B 1 is better than B 2, denoted B 1 (I,D) B 2, iff B 1 M(C) B 2 M(C) The answers cert l (Q, I, D) to a query are those that are true on all best legal global databases w.r.t. (I,D) M. Lenzerini Data Integration DASI / 213
291 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example Part 3: Query answering with constraints Global schema: Constraints: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) team[tleader, Tname] player[pname, Pteam] key(player) = {Pname} { s1 (X, Y, Z) player(x, Y, Z) Mapping: player : s 3 (X, Y, Z) player(x, Y, Z) team : s 2 (X, Y, Z) team(x, Y, Z) M. Lenzerini Data Integration DASI / 213
292 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints Source database C s 1 : Totti 1971 Roma Vieri 1950 Inter s 2 : Juve Torino Del Piero s 3 : Vieri 1970 Inter M. Lenzerini Data Integration DASI / 213
293 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (IDs KDs) Part 3: Query answering with constraints Retrieved global database M(C) player: Totti 1971 Roma Vieri 1970 Inter Vieri 1950 Inter team: Juve Torino Del Piero In M(C) there is a violation of the KD and a violation of the ID There are two possible ways of repairing the violation of the KD with a minimum deletion of tuples. M. Lenzerini Data Integration DASI / 213
294 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints First form player: Totti 1971 Roma Vieri 1970 Inter Del Piero α Juve team: Juve Torino Del Piero Second form player: Totti 1971 Roma Vieri 1950 Inter Del Piero α Juve team: Juve Torino Del Piero For query q = {(X, Z) player(x, Y, Z)} we have cert l (q, I, C) = { Totti, Roma, Vieri, Inter, Del Piero, Juve } M. Lenzerini Data Integration DASI / 213
295 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Query rewriting under the loosely-sound semantics Query language: Datalog under stable model semantics Rewriting under KDs: set of rules Π KD that take KDs into account for each KD key(r) = {X 1,... X n } in G: r( x, y) r D ( x, y), not r( x, y) r( x, y) r D ( x, y), r( x, z), Y 1 Z 1 r( x, y) r D ( x, y), r( x, z), Y m Z m where x = X 1,..., X n, y = Y 1,..., Y m and z = Z 1,..., Z m M. Lenzerini Data Integration DASI / 213
296 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Query rewriting under the loosely-sound semantics Theorem Π ID Π KD Π MC is a perfect rewriting of q, where: Π MC = rules obtained from the mapping rules Π M by replacing each r with r C Π ID = rewriting of the query q obtained by the algorithm ID-rewrite Remark: Π ID Π KD Π MC is a Datalog program (and is interpreted under stable model semantics) Note: a stable model of a Datalog program Π is any set σ of ground atoms that coincides with the unique minimal Herbrand model of the Datalog progam Π σ, where Π σ is obtained from Π by deleting every rule that has a negative literal B with B σ, and all negative literals in the bodies of the remaining rules. M. Lenzerini Data Integration DASI / 213
297 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints What about Exclusion Dependencies (EDs)? We extend the previous example with an ED: Global schema: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) coach(cname, Cteam) Constraints: team[tleader, Tname] player[pname, Pteam] coach[cname] player[pname] = key(player) = {Pname, Pteam} key(team) = {Tname} key(coach) = {Cname} Mapping: player(x, Y, Z) s 1 (X, Y, Z) player(x, Y, Z) s 3 (X, Y, Z) team(x, Y, Z) s 2 (X, Y, Z) coach(x, Y ) s 4 (X, Y ) M. Lenzerini Data Integration DASI / 213
298 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints Source database C s 1 : Totti 1971 Roma s 2 : Juve Torino Del Piero s 3 : Vieri 1970 Inter s 4 : Del Piero Viterbese M. Lenzerini Data Integration DASI / 213
299 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example IDs EDs Part 3: Query answering with constraints Retrieved global database M(C) player: Totti 1971 Roma Vieri 1970 Inter team: Juve Torino Del Piero coach: Del Piero Viterbese There are two possible ways of repairing the violation with a minimum deletion of tuples. M. Lenzerini Data Integration DASI / 213
300 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints First form player: Totti 1971 Roma Vieri 1950 Inter Del Piero α Juve team: Juve Torino Del Piero Second form player: Totti 1971 Roma Vieri 1950 Inter coach: Del Piero Viterbese For the query q = {(X, Z) player(x, Y, Z)} we have cert l (q, I, D) = { Totti, Roma, Vieri, Inter } M. Lenzerini Data Integration DASI / 213
301 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Query rewriting for EDs Part 3: Query answering with constraints Set of rules Π ED that take EDs into account For each exclusion dependency r[a] s[b] = in the closure of EDs wrt logical implication by IDs and EDs: r( x, y) r D ( x, y), not r( x, y) s( x, y) s D ( x, y), not s( x, y) r( x, y) r D ( x, y), s( x, z) s( x, y) s D ( x, y), r( x, z) where in r( x, z) the variables in x correspond to the sequence of attributes A of r, and in s( x, z) the variables in x correspond to the sequence of attributes B of s. M. Lenzerini Data Integration DASI / 213
302 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Query rewriting under the loosely-sound semantics: IDs, KDs and EDs Theorem Π ID Π KD Π ED Π MC is a perfect rewriting of Q. M. Lenzerini Data Integration DASI / 213
303 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example Part 3: Query answering with constraints Global schema: player(pname, YOB, Pteam) team(tname, Tcity, Tleader) coach(cname, Cteam) Constraints: team[tleader, Tname] player[pname, Pteam] coach[cname] player[pname] = key(player) = {Pname, Pteam} key(team) = {Tname} key(coach) = {Cname} Mapping: player(x, Y, Z) s 1 (X, Y, Z) player(x, Y, Z) s 3 (X, Y, Z) team(x, Y, Z) s 2 (X, Y, Z) coach(x, Y ) s 4 (X, Y ) Query: q = {(X, Z) player(x, Y, Z)} M. Lenzerini Data Integration DASI / 213
304 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints Rewriting of the query q: q(x, Z) player(x, Y, Z) q(x, Z) team(z, Y, X) player(x, Y, Z) player D (X, Y, Z), not player(x, Y, Z) player(x, Y, Z) player D (X, Y, Z), player(x, W, Z), Y W team(x, Y, Z) team D (X, Y, Z), not team(x, Y, Z) team(x, Y, Z) team D (X, Y, Z), team(x, V, W ), Y V team(x, Y, Z) team D (X, Y, Z), team(x, V, W ), Z W coach(x, Y ) coach D (X, Y ), not coach(x, Y ) coach(x, Y ) coach D (X, Y ), coach(x, Z), Y Z M. Lenzerini Data Integration DASI / 213
305 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Example (continued) Part 3: Query answering with constraints Rewriting of the query q (continued): player(x, Y, Z) player D (X, Y, Z), coach(x, V ) coach(x, Y ) coach D (X, Y ), player(x, Z, V ) coach(x, Y ) coach D (X, Y ), team(z, V, X) team(x, Y, Z) team D (X, Y, Z), coach(z, V ) player D (X, Y, Z) s 1 (X, Y, Z) player D (X, Y, Z) s 3 (X, Y, Z) team D (X, Y, Z) s 2 (X, Y, Z) coach D (X, Y ) s 4 (X, Y ) M. Lenzerini Data Integration DASI / 213
306 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Part 3: Query answering with constraints Complexity of consistent query answering With respect to standard strictly-sound semantics, the loosely-sound semantics adds complexity to query answering Intuitive explanation: the number of repairs of a system with even a single KD may be exponential in the size of the source database Consequence: query answering is not tractable in data complexity (while it is tractable under strictly-sound semantics) Such increase of complexity is general (shared by all approaches to consistent query answering) M. Lenzerini Data Integration DASI / 213
307 Constraints QA in GAV with constraints QA in LAV with constraints Inconsistency tolerance Summary of complexity results Part 3: Query answering with constraints EDs KDs IDs strictly-sound loosely-sound no no GEN PTIME/PSPACE PTIME/PSPACE yes-no yes no PTIME/NP conp/π p 2 yes yes-no no PTIME/NP conp/π p 2 yes-no yes NKC PTIME/PSPACE conp/pspace yes no GEN PTIME/PSPACE conp/pspace yes-no yes 1KC undecidable undecidable yes-no yes GEN undecidable undecidable Data/Combined complexity M. Lenzerini Data Integration DASI / 213
308 Concluding remarks Part 4: Conclusions Part IV Concluding remarks M. Lenzerini Data Integration DASI / 213
309 Concluding remarks Outline Part 4: Conclusions 9 Concluding remarks M. Lenzerini Data Integration DASI / 213
310 Concluding remarks Outline Part 4: Conclusions 9 Concluding remarks M. Lenzerini Data Integration DASI / 213
311 Concluding remarks Further issues and open problems Part 4: Conclusions Further forms of constraints, e.g., KDs with restricted forms of key-conflicting IDs ontology languages, description logics, RDF [Calvanese & al. PODS 98, Calvanese & al., KR 06] Semistructured data and XML constraints (DTDs, XML Schema,... ) query languages (transitive closure) Finite models vs. unrestricted models [Rosati, PODS 06] Data exchange and materialization M. Lenzerini Data Integration DASI / 213
312 Concluding remarks Acknowledgements Part 4: Conclusions Andrea Calì Diego Calvanese Giuseppe De Giacomo Domenico Lembo Riccardo Rosati Moshe Y. Vardi M. Lenzerini Data Integration DASI / 213
313 Concluding remarks References I Part 4: Conclusions S. Abiteboul and O. Duschka. Complexity of answering queries using materialized views. In Proc. of PODS 98, pages , A. Calì, D. Calvanese, G. De Giacomo, and M. Lenzerini. On the expressive power of data integration systems. In Proc. of ER 2002, A. Calì, D. Calvanese, G. De Giacomo, and M. Lenzerini. On the role of integrity constraints in data integration. IEEE Bull. on Data Engineering, 25(3):39 45, A. Calì, D. Lembo, and R. Rosati. Query rewriting and answering under constraints in data integration systems. In Proc. of IJCAI 2003, pages 16 21, M. Lenzerini Data Integration DASI / 213
314 Concluding remarks References II Part 4: Conclusions D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Data complexity of query answering in description logics. In Proc. of KR 2006, D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query containment under constraints. In Proc. of PODS 98, pages , D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. View-based query processing: On the relationship between rewriting, answering and losslessness. In Proc. of ICDT 2005, volume 3363 of LNCS, pages Springer, D. Calvanese and R. Rosati. Answering recursive queries under keys and foreign keys is undecidable. In Proc. of KRDB CEUR Electronic Workshop Proceedings, M. Lenzerini Data Integration DASI / 213
315 Concluding remarks References III Part 4: Conclusions O. M. Duschka, M. R. Genesereth, and A. Y. Levy. Recursive query plans for data integration. J. of Logic Programming, 43(1):49 73, A. Fuxman and R. J. Miller. First-order query rewriting for inconsistent databases. In Proc. of ICDT 2005, volume 3363 of LNCS, pages Springer, G. Grahne and A. O. Mendelzon. Tableau techniques for querying information sources through global schemas. In Proc. of ICDT 99, volume 1540 of LNCS, pages Springer, A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4): , M. Lenzerini Data Integration DASI / 213
316 Concluding remarks References IV Part 4: Conclusions M. Lenzerini. Data integration: A theoretical perspective. In Proc. of PODS 2002, pages , N. Leone, T. Eiter, W. Faber, M. Fink, G. Gottlob, G. Greco, E. Kalka, G. Ianni, D. Lembo, V. Lio, B. Nowicki, R. Rosati, M. Ruzzi, W. Staniszkis, and G. Terracina. Boosting information integration: The INFOMIX system. In Proc. of SEBD 2005, pages 55 66, A. Y. Levy, A. Rajaraman, and J. J. Ordille. Query answering algorithms for information agents. In Proc. of AAAI 96, pages 40 47, M. Lenzerini Data Integration DASI / 213
317 Concluding remarks References V Part 4: Conclusions R. Rosati. On the decidability and finite controllability of query processing in databases with incomplete information. In Proc. of PODS 2006, M. Lenzerini Data Integration DASI / 213
Query Processing in Data Integration Systems
Query Processing in Data Integration Systems Diego Calvanese Free University of Bozen-Bolzano BIT PhD Summer School Bressanone July 3 7, 2006 D. Calvanese Data Integration BIT PhD Summer School 1 / 152
A Tutorial on Data Integration
A Tutorial on Data Integration Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Antonio Ruberti, Sapienza Università di Roma DEIS 10 - Data Exchange, Integration, and Streaming November 7-12,
Data Integration: A Theoretical Perspective
Data Integration: A Theoretical Perspective Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Via Salaria 113, I 00198 Roma, Italy [email protected] ABSTRACT
Chapter 17 Using OWL in Data Integration
Chapter 17 Using OWL in Data Integration Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, and Marco Ruzzi Abstract One of the outcomes of the research work carried
Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange
Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact
Data Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
Structured and Semi-Structured Data Integration
UNIVERSITÀ DEGLI STUDI DI ROMA LA SAPIENZA DOTTORATO DI RICERCA IN INGEGNERIA INFORMATICA XIX CICLO 2006 UNIVERSITÉ DE PARIS SUD DOCTORAT DE RECHERCHE EN INFORMATIQUE Structured and Semi-Structured Data
Enterprise Modeling and Data Warehousing in Telecom Italia
Enterprise Modeling and Data Warehousing in Telecom Italia Diego Calvanese Faculty of Computer Science Free University of Bolzano/Bozen Piazza Domenicani 3 I-39100 Bolzano-Bozen BZ, Italy Luigi Dragone,
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan
Virtual Data Integration
Virtual Data Integration Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada www.scs.carleton.ca/ bertossi [email protected] Chapter 1: Introduction and Issues 3 Data
OWL based XML Data Integration
OWL based XML Data Integration Manjula Shenoy K Manipal University CSE MIT Manipal, India K.C.Shet, PhD. N.I.T.K. CSE, Suratkal Karnataka, India U. Dinesh Acharya, PhD. ManipalUniversity CSE MIT, Manipal,
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
Query Answering in Peer-to-Peer Data Exchange Systems
Query Answering in Peer-to-Peer Data Exchange Systems Leopoldo Bertossi and Loreto Bravo Carleton University, School of Computer Science, Ottawa, Canada {bertossi,lbravo}@scs.carleton.ca Abstract. The
Peer Data Exchange. ACM Transactions on Database Systems, Vol. V, No. N, Month 20YY, Pages 1 0??.
Peer Data Exchange Ariel Fuxman 1 University of Toronto Phokion G. Kolaitis 2 IBM Almaden Research Center Renée J. Miller 1 University of Toronto and Wang-Chiew Tan 3 University of California, Santa Cruz
Peer Data Management Systems Concepts and Approaches
Peer Data Management Systems Concepts and Approaches Armin Roth HPI, Potsdam, Germany Nov. 10, 2010 Armin Roth (HPI, Potsdam, Germany) Peer Data Management Systems Nov. 10, 2010 1 / 28 Agenda 1 Large-scale
Integrating and Exchanging XML Data using Ontologies
Integrating and Exchanging XML Data using Ontologies Huiyong Xiao and Isabel F. Cruz Department of Computer Science University of Illinois at Chicago {hxiao ifc}@cs.uic.edu Abstract. While providing a
Repair Checking in Inconsistent Databases: Algorithms and Complexity
Repair Checking in Inconsistent Databases: Algorithms and Complexity Foto Afrati 1 Phokion G. Kolaitis 2 1 National Technical University of Athens 2 UC Santa Cruz and IBM Almaden Research Center Oxford,
A Framework and Architecture for Quality Assessment in Data Integration
A Framework and Architecture for Quality Assessment in Data Integration Jianing Wang March 2012 A Dissertation Submitted to Birkbeck College, University of London in Partial Fulfillment of the Requirements
Data Integration Using ID-Logic
Data Integration Using ID-Logic Bert Van Nuffelen 1, Alvaro Cortés-Calabuig 1, Marc Denecker 1, Ofer Arieli 2, and Maurice Bruynooghe 1 1 Department of Computer Science, Katholieke Universiteit Leuven,
Fixed-Point Logics and Computation
1 Fixed-Point Logics and Computation Symposium on the Unusual Effectiveness of Logic in Computer Science University of Cambridge 2 Mathematical Logic Mathematical logic seeks to formalise the process of
An introduction to data integration. Prof. Letizia Tanca Technologies for Information Systems
An introduction to data integration Prof. Letizia Tanca Technologies for Information Systems 1 Motivation In modern Information Systems there is a growing quest for achieving integration of SW applications,
Schema Mediation in Peer Data Management Systems
Schema Mediation in Peer Data Management Systems Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov University of Washington Seattle, WA, USA 98195-2350 {alon,zives,suciu,igor}@cs.washington.edu Abstract
Semantic Technologies for Data Integration using OWL2 QL
Semantic Technologies for Data Integration using OWL2 QL ESWC 2009 Tutorial Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Sapienza Università di Roma, Italy 6th European Semantic
Relational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute
Introduction to tuple calculus Tore Risch 2011-02-03
Introduction to tuple calculus Tore Risch 2011-02-03 The relational data model is based on considering normalized tables as mathematical relationships. Powerful query languages can be defined over such
On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper)
On Rewriting and Answering ueries in OBDA Systems for Big Data (Short Paper) Diego Calvanese 2, Ian Horrocks 3, Ernesto Jimenez-Ruiz 3, Evgeny Kharlamov 3, Michael Meier 1 Mariano Rodriguez-Muro 2, Dmitriy
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS by Loreto Bravo A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree
Integrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Robust Module-based Data Management
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. V, NO. N, MONTH YEAR 1 Robust Module-based Data Management François Goasdoué, LRI, Univ. Paris-Sud, and Marie-Christine Rousset, LIG, Univ. Grenoble
Schema Mappings and Data Exchange
Schema Mappings and Data Exchange Phokion G. Kolaitis University of California, Santa Cruz & IBM Research-Almaden EASSLC 2012 Southwest University August 2012 1 Logic and Databases Extensive interaction
Monitoring Metric First-order Temporal Properties
Monitoring Metric First-order Temporal Properties DAVID BASIN, FELIX KLAEDTKE, SAMUEL MÜLLER, and EUGEN ZĂLINESCU, ETH Zurich Runtime monitoring is a general approach to verifying system properties at
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Relational Calculus. Module 3, Lecture 2. Database Management Systems, R. Ramakrishnan 1
Relational Calculus Module 3, Lecture 2 Database Management Systems, R. Ramakrishnan 1 Relational Calculus Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus
Comparing Data Integration Algorithms
Comparing Data Integration Algorithms Initial Background Report Name: Sebastian Tsierkezos [email protected] ID :5859868 Supervisor: Dr Sandra Sampaio School of Computer Science 1 Abstract The problem
CHAPTER 7 GENERAL PROOF SYSTEMS
CHAPTER 7 GENERAL PROOF SYSTEMS 1 Introduction Proof systems are built to prove statements. They can be thought as an inference machine with special statements, called provable statements, or sometimes
CSE 233. Database System Overview
CSE 233 Database System Overview 1 Data Management An evolving, expanding field: Classical stand-alone databases (Oracle, DB2, SQL Server) Computer science is becoming data-centric: web knowledge harvesting,
Logic Programs for Consistently Querying Data Integration Systems
IJCAI 03 1 Acapulco, August 2003 Logic Programs for Consistently Querying Data Integration Systems Loreto Bravo Computer Science Department Pontificia Universidad Católica de Chile [email protected] Joint
Logical and categorical methods in data transformation (TransLoCaTe)
Logical and categorical methods in data transformation (TransLoCaTe) 1 Introduction to the abbreviated project description This is an abbreviated project description of the TransLoCaTe project, with an
Constraint-Based XML Query Rewriting for Data Integration
Constraint-Based XML Query Rewriting for Data Integration Cong Yu Department of EECS, University of Michigan [email protected] Lucian Popa IBM Almaden Research Center [email protected] ABSTRACT
Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration
DW Source Integration, Tools, and Architecture Overview DW Front End Tools Source Integration DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries
Aikaterini Marazopoulou
Imperial College London Department of Computing Tableau Compiled Labelled Deductive Systems with an application to Description Logics by Aikaterini Marazopoulou Submitted in partial fulfilment of the requirements
Chapter 5. SQL: Queries, Constraints, Triggers
Chapter 5 SQL: Queries, Constraints, Triggers 1 Overview: aspects of SQL DML: Data Management Language. Pose queries (Ch. 5) and insert, delete, modify rows (Ch. 3) DDL: Data Definition Language. Creation,
Designing and Refining Schema Mappings via Data Examples
Designing and Refining Schema Mappings via Data Examples Bogdan Alexe UCSC [email protected] Balder ten Cate UCSC [email protected] Phokion G. Kolaitis UCSC & IBM Research - Almaden [email protected]
Translating SQL into the Relational Algebra
Translating SQL into the Relational Algebra Jan Van den Bussche Stijn Vansummeren Required background Before reading these notes, please ensure that you are familiar with (1) the relational data model
Integrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs
,!0((,.+#$),%$(-&.& *,2(-$)%&2.'3&%!&, Logics for Data and Knowledge Representation Alessandro Agostini [email protected] University of Trento Fausto Giunchiglia [email protected] The Logic of Descriptions!$%&'()*$#)
Query Management in Data Integration Systems: the MOMIS approach
Dottorato di Ricerca in Computer Engineering and Science Scuola di Dottorato in Information and Communication Technologies XXI Ciclo Università degli Studi di Modena e Reggio Emilia Dipartimento di Ingegneria
3. Relational Model and Relational Algebra
ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra
SQL: Queries, Programming, Triggers
SQL: Queries, Programming, Triggers CSC343 Introduction to Databases - A. Vaisman 1 R1 Example Instances We will use these instances of the Sailors and Reserves relations in our examples. If the key for
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,
Database Systems. Lecture Handout 1. Dr Paolo Guagliardo. University of Edinburgh. 21 September 2015
Database Systems Lecture Handout 1 Dr Paolo Guagliardo University of Edinburgh 21 September 2015 What is a database? A collection of data items, related to a specific enterprise, which is structured and
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases Andrea Calì Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Università di Roma
Extending Data Processing Capabilities of Relational Database Management Systems.
Extending Data Processing Capabilities of Relational Database Management Systems. Igor Wojnicki University of Missouri St. Louis Department of Mathematics and Computer Science 8001 Natural Bridge Road
Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts
Distributed Databases Basic Concepts Distributed Databases Concepts. Advantages and disadvantages of distributed databases. Functions and architecture for a DDBMS. Distributed database design. Levels of
2. The Language of First-order Logic
2. The Language of First-order Logic KR & R Brachman & Levesque 2005 17 Declarative language Before building system before there can be learning, reasoning, planning, explanation... need to be able to
Relational Algebra and SQL
Relational Algebra and SQL Johannes Gehrke [email protected] http://www.cs.cornell.edu/johannes Slides from Database Management Systems, 3 rd Edition, Ramakrishnan and Gehrke. Database Management
Semantic Description of Distributed Business Processes
Semantic Description of Distributed Business Processes Authors: S. Agarwal, S. Rudolph, A. Abecker Presenter: Veli Bicer FZI Forschungszentrum Informatik, Karlsruhe Outline Motivation Formalism for Modeling
ECS 165A: Introduction to Database Systems
ECS 165A: Introduction to Database Systems Todd J. Green based on material and slides by Michael Gertz and Bertram Ludäscher Winter 2011 Dept. of Computer Science UC Davis ECS-165A WQ 11 1 1. Introduction
