Repair Checking in Inconsistent Databases: Algorithms and Complexity
|
|
|
- Eleanor Ball
- 9 years ago
- Views:
Transcription
1 Repair Checking in Inconsistent Databases: Algorithms and Complexity Foto Afrati 1 Phokion G. Kolaitis 2 1 National Technical University of Athens 2 UC Santa Cruz and IBM Almaden Research Center Oxford, January 2009
2 Coping with Inconsistent Databases Inconsistent databases arise in a variety of contexts and for different reasons: In data warehousing of heterogeneous data obeying different integrity constraints. In ETL applications, where data has to be cleansed" before it can be processed. For lack of support of particular integrity constraints....
3 Coping with Inconsistent Databases Inconsistent databases arise in a variety of contexts and for different reasons: In data warehousing of heterogeneous data obeying different integrity constraints. In ETL applications, where data has to be cleansed" before it can be processed. For lack of support of particular integrity constraints.... Database repairs provide a framework for coping with inconsistent databases in a principled way and without cleansing" dirty data first.
4 Database Repairs Definition (Arenas, Bertossi, Chomicki 1999) Σ a set of integrity constraints and r an inconsistent database. A database r is a repair of r w.r.t. Σ if r is a consistent database (i.e., r = Σ); r differs from r in a minimal way.
5 Database Repairs Definition (Arenas, Bertossi, Chomicki 1999) Σ a set of integrity constraints and r an inconsistent database. A database r is a repair of r w.r.t. Σ if r is a consistent database (i.e., r = Σ); r differs from r in a minimal way. Fact Several different types of repairs have been considered: Subset-repairs; -repairs (symmetric-difference-repairs); Cardinality-based repairs; Attribute-based repairs.
6 Types of Repairs Definition Σ a set of integrity constraints and r an inconsistent database. r is a subset-repair of r w.r.t. Σ if r r, r = Σ, and there is no r such that r r r and r = Σ. r is a -repair of r w.r.t. Σ if r = Σ and there is no r such that r r r r and r = Σ.
7 Types of Repairs Definition Σ a set of integrity constraints and r an inconsistent database. r is a subset-repair of r w.r.t. Σ if r r, r = Σ, and there is no r such that r r r and r = Σ. r is a -repair of r w.r.t. Σ if r = Σ and there is no r such that r r r r and r = Σ. Fact If r r, then r is a subset-repair of r if and only if r is a -repair of r. If Σ is a set of functional dependencies, then every -repair is also a subset-repair.
8 Example Relation schema R, instance r = {R(a, b), R(a, c), R(b, c)} Σ = {R(x, y) R(x, z) y = z} r has two -repairs (and subset repairs) w.r.t. Σ: r 1 = {R(a, b), R(b, c)} and r 2 = {R(a, c), R(b, c)}.
9 Example Relation schema R, instance r = {R(a, b), R(a, c), R(b, c)} Σ = {R(x, y) R(x, z) y = z} r has two -repairs (and subset repairs) w.r.t. Σ: r 1 = {R(a, b), R(b, c)} and r 2 = {R(a, c), R(b, c)}. Σ = {R(x, y) R(y, x)} r has eight -repairs w.r.t. Σ : r 1 = r 2 = {R(a, b), R(b, a)} r 3 = {R(a, b), R(b, a), R(a, c), R(c, a)}...
10 Possible Worlds and Certain Answers Definition Suppose that with every instance r over some schema S, we have associated a set W(r) of instances over some other (possibly different) schema T (the set of possible worlds of r). If q is a query over T, then the certain answers of q on r w.r.t. W(r) is certain(q, r, W(r)) = {q(r ) : r W(r)}. Note The certain answers is the standard semantics of queries in the context of incomplete information.
11 Repairs and Consistent Answers Definition (Arenas, Bertossi, Chomicki) Fix a particular type of repairs (say, subset repairs or -repairs) Let Σ be a set of integrity constraints, let q be a query, and let r be an instance. The consistent answers of q on r w.r.t. Σ, denoted by cons Σ (q, r), is the set certain(q, r, W(r)), where W(r) is the set of all repairs of r, i.e., cons Σ (q, r) = {q(r ) : r is a repair of r}.
12 Example (Revisited) Relation schema R, instance r = {R(a, b), R(a, c), R(b, c)} Σ = {R(x, y) R(x, z) y = z} Recall that r has two -repairs (and subset repairs) w.r.t. Σ: Then r 1 = {R(a, b), R(b, c)} and r 2 = {R(a, c), R(b, c)}. If q(x) is the query zr(x, z), then cons Σ (q, r) = {a, b}. If q (x) is the query zr(z, x), then cons Σ (q, r) = {c}.
13 Data Complexity of Consistent Query Answering Theorem (Chomicki and Marcinkowski ) There exist a set Σ of two functional dependencies (in fact, key constraints) and a Boolean conjunctive query q such that the following problem is conp-complete: Given an instance r, is cons Σ (q, r) true?
14 Data Complexity of Consistent Query Answering Theorem (Chomicki and Marcinkowski ) There exist a set Σ of two functional dependencies (in fact, key constraints) and a Boolean conjunctive query q such that the following problem is conp-complete: Given an instance r, is cons Σ (q, r) true? Theorem (Staworko ) There exist a set Σ consisting of one functional dependency and two universal constraints, and a Boolean conjunctive query q such that the following problem is Π p 2 -complete: Given an instance r, is cons Σ (q, r) true?
15 Data Complexity of Consistent Query Answering Extensive study over the past decade for various classes of integrity constraints and for different types of repairs. Note Intractability results (conp-hardness, Π p 2 -hardness) Tractability results for restricted classes of conjunctive queries: Polynomial-time algorithms. Rewriting to first-order queries. Prototype systems for consistent query answering: Hippo (Chomicki et al.) ConQuer (Fuxman) For overviews, see the invited paper by J. Chomicki in ICDT 2007 and the Ph.D. theses of A. Fuxman (2007) and S. Staworko (2007).
16 Algorithmic Problems about Inconsistent Databases The Consistent Query Answering Problem: Consistent query answering has been investigated in depth. The Repair Checking Problem: Given r and r, tell whether or not r is a repair of r. Repair checking is a data cleaning problem that underlies consistent query answering. So far, repair checking has received less attention than consistent query answering.
17 Repair Checking vs. Consistent Query Answering Proposition (Chomicki and Marcinkowski ) Let Σ be a set of integrity constraints containing all inclusion dependencies. There is a Boolean query q such that the -repair checking problem w.r.t. Σ has a logspace-reduction to the complement of the consistent query answering problem for q w.r.t. Σ Note Thus, in many cases, lower bounds for the complexity of the -repair checking problem yield lower bounds for the complexity of the consistent query answering problem.
18 Aim of this Work Embark on a systematic investigation of the algorithmic aspects of the repair checking problem Study classes of integrity constraints that have been considered in information integration and data exchange. Study subset-repairs and -repairs. Introduce and study CC-repairs (component-cardinality repairs), a new type of cardinality-based repairs that have a Pareto-optimality character.
19 Types of Constraints Definition Equality-generating dependency (egd): x(φ(x) x i = x j ), where φ(x) is a conjunction of atoms. Denial constraint: x (α(x) β(x)), where α(x) is a non-empty conjunction of atoms and β(x) is a conjunction of comparison atoms x i = x j, x i x j, x i < x j, x i x j. Example Every functional dependency is an egd, but not vice versa: x, y, z(mother(z, x) MOTHER(w, x) z = w). Every egd is (logically equivalent) to a denial constraint, but not vice versa: x, y (MOTHER(x, y) x = y))
20 Types of Constraints Definition Tuple-generating dependency (tgd): x(φ(x) yψ(x, y)), where φ(x) is a conjunction of atoms with vars. in x, and ψ(x, y) is a conjunction of atoms with vars. in x and y. Full tgd: a tgd with no existential quantifiers in rhs. x(φ(x) ψ(x)), where φ(x) and ψ(x) are conjunctions of atoms. LAV (local-as-view) tgd: a tgd in which lhs is a single atom. x(p(x) yψ(x, y)). Note: Every inclusion dependency is a LAV tgd, but not vice versa.
21 Types of Constraints Example (dropping universal quantifiers) The following is a tgd (MOTHER(z, x) MOTHER(z, y) u(father(u, x) FATHER(u, y))). The following are full tgds: (SIBLING(x, y) SIBLING(y, x)) (MOTHER(z, x) MOTHER(z, y) SIBLING(x, y)) The following is a LAV tgd: (SIBLING(x, y) z(mother(z, x) MOTHER(z, y)))
22 Types of Repairs Definition Σ a set of integrity constraints and r an inconsistent database. r is a subset-repair of r w.r.t. Σ if r r, r = Σ, and there is no r such that r r r and r = Σ. r is a -repair of r w.r.t. Σ if r = Σ and there is no r such that r r r r and r = Σ. Fact If r r, then r is a subset-repair of r if and only if r is a -repair of r. If Σ is a set of denial constraints, then every -repair is also a subset-repair.
23 Earlier Work - Tractability Results Theorem folklore If Σ is a set of denial constraints, then the subset-repair checking problem w.r.t. Σ is in LOGSPACE.
24 Earlier Work - Tractability Results Theorem folklore If Σ is a set of denial constraints, then the subset-repair checking problem w.r.t. Σ is in LOGSPACE. Chomicki and Marcinkowski 2005 If Σ is the union of an acyclic set of inclusion dependencies and a set of functional dependencies, then the subset-repair checking problem w.r.t. Σ is in PTIME; in fact, it is in LOGSPACE.
25 Earlier Work - Tractability Results Theorem folklore If Σ is a set of denial constraints, then the subset-repair checking problem w.r.t. Σ is in LOGSPACE. Chomicki and Marcinkowski 2005 If Σ is the union of an acyclic set of inclusion dependencies and a set of functional dependencies, then the subset-repair checking problem w.r.t. Σ is in PTIME; in fact, it is in LOGSPACE. Staworko 2007 If Σ is a set of full tgds and egds, then the subset-repair checking problem w.r.t. Σ is in PTIME.
26 Weakly Acyclic Sets of Tgds Fact Acyclic sets of inclusion dependencies and set of full tgds are special cases of weakly acyclic sets of tgds. Weakly acyclic sets of inclusion dependencies are known to have good algorithmic behavior in data exchange and data integration.
27 Definition The position graph of a set Σ of tgds: The nodes are the pairs (R, A), where R is a relation symbol and A is an attribute of R. Such a pair (R, A) is called a position. Let φ(x) yψ(x, y) be a tgd in Σ and let x in x be a variable that also occurs in ψ(x, y). For every occurrence of x in φ(x) in position (R, A i ), add the following edges: (i) For every occurrence of x in ψ(x, y) in position (S, B j ), add an edge (R, A i ) (S, B j ); (ii) In addition, for every existentially quantified variable y in y and for every occurrence of y in ψ(x, y) in position (T, C k ), add a special edge (R, A i ) (T, C k ). Σ is weakly acyclic if the position graph has no cycle going through a special edge. A tgd θ is weakly acyclic if {θ} is weakly acyclic.
28 Weakly Acyclic Sets of Tgds Fact Every acyclic set of inclusion dependencies is a weakly acyclic set (the position graph is acyclic) Every set of full tgds is weakly acyclic (the position graph has no special edges). Example Σ = {D(e, m) M(m), M(m) ed(e, m)} is a weakly acyclic, but cyclic, set of inclusion dependencies. Position graph: D.1 M.1 D.2
29 Weakly Acyclic Sets of Tgds Fact Weakly acyclic sets of tgds have good algorithmic behavior in data exchange and data integration. Specifically, there are PTIME algorithms for: Computing a canonical universal solution; Computing the core of the universal solutions; Computing the certain answers of conjunctive queries.
30 Weakly Acyclic Sets of Tgds Fact Weakly acyclic sets of tgds have good algorithmic behavior in data exchange and data integration. Specifically, there are PTIME algorithms for: Computing a canonical universal solution; Computing the core of the universal solutions; Computing the certain answers of conjunctive queries. Problem Does the good algorithmic behavior of weakly acyclic sets of tgds extend to the repair checking problem?
31 Weakly Acyclic Sets of Tgds: Intractability Theorem There is a weakly acyclic set Σ of tgds such that the subset-repair checking problem w.r.t. Σ is conp-complete.
32 Weakly Acyclic Sets of Tgds: Intractability Theorem There is a weakly acyclic set Σ of tgds such that the subset-repair checking problem w.r.t. Σ is conp-complete. Proof. conp-hardness via a reduction from POSITIVE 1-IN-3-SAT Σ consists of the (non-lav) tgd A(w) P(x, y, z) u 1, u 2, u 3 (T(x, u 1 ) T(y, u 2 ) T(z, u 3 ) S(u 1, u 2, u 3 )) and the two full tgds: T(x, u) T(x, u ) D(u, u ) S(u, u, u), T(x, u) A(u). Σ is weakly acyclic: all special edges are from pos. of P to pos. of T and S; no position of P has an incoming edge.
33 Weakly Acyclic Sets of Tgds: Intractability Theorem There is a weakly acyclic set Σ of tgds such that the subset-repair checking problem w.r.t. Σ is conp-complete.
34 Weakly Acyclic Sets of Tgds: Intractability Theorem There is a weakly acyclic set Σ of tgds such that the subset-repair checking problem w.r.t. Σ is conp-complete. Theorem (Chomicki and Marcinkowski ) There is a set Σ consisting of one inclusion dependency and one functional dependency such that the subset-repair checking problem w.r.t. Σ is conp-complete. Note: The inclusion dependency is R(x 1, x 2, x 3, x 4 ) y 1, y 2 y 3 R(y 1, y 2, x 4, y 3 ), which is not weakly acyclic (special self-loop on R.4).
35 Weakly Acyclic Sets of LAV Tgds: Tractability Theorem If Σ is the union of a weakly acyclic set of LAV tgds and a set of egds, then the subset-repair checking problem w.r.t. Σ is in PTIME; in fact, it is in LOGSPACE.
36 Weakly Acyclic Sets of LAV Tgds: Tractability Theorem If Σ is the union of a weakly acyclic set of LAV tgds and a set of egds, then the subset-repair checking problem w.r.t. Σ is in PTIME; in fact, it is in LOGSPACE. Proof Idea. Key property of LAV tgds: only single facts fire a tgd (and no combinations of facts). Hence, LAV tgds are preserved under unions of models. Key property of weakly acyclic sets of tgds: The solution aware chase terminates in polynomial time. Note: The solution aware chase was used in the study of peer data exchange (Fuxman, K..., Miller, Tan ).
37 Weakly Acyclic Sets of LAV Tgds: Tractability Lemma Let Σ be the union of a weakly acyclic set of LAV tgds and a set of egds. Then there is a constant c such that the following holds. Let r, r be two instances such that r = Σ, and let t be a fact in r \ r such that there is a non-empty set A r \ r such that t A and r A = Σ. There there is a set A t of facts such that t A t A t r \ r A t c r A t = Σ.
38 Weakly Acyclic Sets of LAV Tgds: Tractability Algorithm for subset-repair checking w.r.t. a set Σ that is the union of a weakly acyclic set of LAV tgds and a set of egds. Given r and r with r r, r = Σ, r = Σ: Test whether there is a set A such that 1 A is non-empty 2 A c 3 A r \ r 4 r A = Σ. If such a set A exists, then r is not a subset repair of r w.r.t. Σ; Otherwise, r is a subset repair of r w.r.t. Σ.
39 Subset Repairs vs. -Repairs Theorem If Σ is a weakly acyclic set of LAV tgds, then the -repair problem w.r.t. Σ is in PTIME; in fact, it is in LOGSPACE. Theorem There is a weakly acyclic set Σ 1 of LAV tgds and a set Σ 2 of egds such that the -repair checking problem w.r.t. Σ 1 Σ 2 is conp-complete.
40 Theorem There is a weakly acyclic set Σ 1 of LAV tgds and a set Σ 2 of egds such that the -repair checking problem w.r.t. Σ 1 Σ 2 is conp-complete. Proof. Reduction from POSITIVE 1-IN-3-SAT P 23 (w, w, x, y, z) u, v, w(t(x, u, x, y, z) T(y, v, x, y, z) T(z, w, x, y, z) S(u, v, w)) T(x, u, x, y, z ) T(x, u, x, y, z ) u = u P 23 (s, s, x, y, z) P 23 (w, w, x, y, z ) w = w P 1 (x, y, z) wp 23 (w, w, x, y, z) T(x, u, x, y, z) P 23 (w, w, x, y, z).
41 Full Tgds: PTIME-completeness Theorem (Staworko 2007) If Σ is a set of full tgds, then the -repair checking problem w.r.t. Σ is in PTIME.
42 Full Tgds: PTIME-completeness Theorem (Staworko 2007) If Σ is a set of full tgds, then the -repair checking problem w.r.t. Σ is in PTIME. Theorem There is a set Σ of full tgds such that the subset-repair problem w.r.t. Σ is PTIME-complete.
43 Full Tgds: PTIME-completeness Theorem (Staworko 2007) If Σ is a set of full tgds, then the -repair checking problem w.r.t. Σ is in PTIME. Theorem There is a set Σ of full tgds such that the subset-repair problem w.r.t. Σ is PTIME-complete. Proof (Hint). Logspace Reduction from HORN 3-SAT. Use full tgds to encode the unit propagation algorithm for HORN 3-SAT.
44 Complexity of Subset- and -Repair Checking Constraints \ Semantics Subset-repair -repair Denial LOGSPACE LOGSPACE Acyc. set of IND & egds LOGSPACE? Weak. acyc. LAV tgds LOGSPACE LOGSPACE Weak. acyc. LAV tgds & egds LOGSPACE conp-comp. Full tgds & egds PTIME-comp. PTIME-comp. IND & egds conp-comp. conp-comp. Weak. acyc. tgds & egds conp-comp. conp-comp.
45 Complexity of Subset- and -Repair Checking Constraints \ Semantics Subset-repair -repair Denial LOGSPACE LOGSPACE Acyc. set of IND & egds LOGSPACE? Weak. acyc. LAV tgds LOGSPACE LOGSPACE Weak. acyc. LAV tgds & egds LOGSPACE conp-comp. Full tgds & egds PTIME-comp. PTIME-comp. IND & egds conp-comp. conp-comp. Weak. acyc. tgds & egds conp-comp. conp-comp. Note New phenomenon: Good algorithmic behavior of acyclic sets of inclusion dependencies and sets of full tgds for subset-repair checking does not extend to arbitrary weakly acyclic sets of tgds.
46 C-Repairs: Cardinality Repairs Definition Σ a set of integrity constraints and r an inconsistent database. r is a C-repair (cardinality-repair) of r w.r.t. Σ if r = Σ and there is no r such that r = Σ and r r < r r.
47 C-Repairs: Cardinality Repairs Definition Σ a set of integrity constraints and r an inconsistent database. r is a C-repair (cardinality-repair) of r w.r.t. Σ if r = Σ and there is no r such that r = Σ and r r < r r. Theorem (Lopatenko and Bertossi 2007) There is a denial constraint ϕ such that the C-repair checking problem w.r.t. ϕ is conp-complete.
48 CC-Repairs: Component Cardinality Repairs Definition r cc r if for every relation symbol P in the schema, we have that P r P r. r < cc r if r cc r and there is at least one relation symbol P in the schema such that P r < P r. r is a CC-repair (component-cardinality repair) of r w.r.t. Σ if r = Σ and there is no r such that r = Σ and r r < cc r r.
49 CC-Repairs: Component Cardinality Repairs Definition r cc r if for every relation symbol P in the schema, we have that P r P r. r < cc r if r cc r and there is at least one relation symbol P in the schema such that P r < P r. r is a CC-repair (component-cardinality repair) of r w.r.t. Σ if r = Σ and there is no r such that r = Σ and r r < cc r r. Fact Every C-repair is a CC-repair. Every CC-repair is a -repair.
50 Example Let Σ be the set consisting of the following four tgds: P(x) R(x), R(x) R (x), P (x) R (x), P (x) Q (x). Inconsistent r = {P(1), P (1)}; consistent r 1, r 2, r 3 : r 1 = ; 2; (1, 1, 0, 0, 0) r 2 = {P (1), R (1), Q (1)}; 3; (1, 0, 0, 1, 1) r 3 = {P(1), R(1), R (1), }; 3; (0, 1, 1, 1, 0) characteristic sequence under the order (P, P, R, R, Q ) r 1, r 2, r 3 are CC-repairs. r 1 is the only C-repair among them.
51 CC-Repairs: Intractability Theorem There is denial constraint θ such that the CC-repair checking problem w.r.t. χ is conp-complete. There is a full tgd ϕ such that the CC-repair problem w.r.t. θ is conp-complete. There is a LAV acyclic tgd ψ such that the CC-repair checking problem w.r.t. ψ is conp-complete. There is an acyclic set Ψ of inclusion dependencies such that the CC-repair problem w.r.t. Ψ is conp-complete.
52 CC-Repairs: Intractability Theorem There is an acyclic set Ψ of inclusion dependencies such that the CC-repair problem w.r.t. Ψ is conp-complete. Proof. conp-hardness via a reduction from POSITIVE 1-IN-3-SAT Ψ is the following acyclic set of inclusion dependencies: P(x, y, z) u, v, wq(x, y, z, u, v, w) Q(x, y, z, u, v, w) S(u, v, w) Q(x, y, z, u, v, w) T(x, u) Q(x, y, z, u, v, w) T(y, v) Q(x, y, z, u, v, w) T(z, w).
53 Synopsis Subset repair checking is in PTIME for weakly acyclic sets of LAV tgds and egds. -repair checking is in PTIME for weakly acyclic sets of LAV tgds. -repair checking can be conp-complete for weakly acyclic sets of LAV tgds and egds. -repair checking can be conp-complete for weakly acyclic sets of tgds. CC-repair checking can be conp-complete for denial constraints, full tgds, and acyclic sets of inclusion dependencies.
54 Directions and Problems Open Problem: Prove or disprove that a dichotomy theorem holds for the complexity of the -repair checking problem w.r.t. sets of tgds and egds. Investigate the complexity of repair checking for other types of repairs (attribute-based repairs). Work in this direction has already been carried out by J. Wisden. Are there criteria for differentiating between repairs of the same type?
Peer Data Exchange. ACM Transactions on Database Systems, Vol. V, No. N, Month 20YY, Pages 1 0??.
Peer Data Exchange Ariel Fuxman 1 University of Toronto Phokion G. Kolaitis 2 IBM Almaden Research Center Renée J. Miller 1 University of Toronto and Wang-Chiew Tan 3 University of California, Santa Cruz
Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics
Consistent Query Answering in Databases Under Cardinality-Based Repair Semantics Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada Joint work with: Andrei Lopatenko (Google,
Schema Mappings and Data Exchange
Schema Mappings and Data Exchange Phokion G. Kolaitis University of California, Santa Cruz & IBM Research-Almaden EASSLC 2012 Southwest University August 2012 1 Logic and Databases Extensive interaction
Logic Programs for Consistently Querying Data Integration Systems
IJCAI 03 1 Acapulco, August 2003 Logic Programs for Consistently Querying Data Integration Systems Loreto Bravo Computer Science Department Pontificia Universidad Católica de Chile [email protected] Joint
A Tutorial on Data Integration
A Tutorial on Data Integration Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Antonio Ruberti, Sapienza Università di Roma DEIS 10 - Data Exchange, Integration, and Streaming November 7-12,
Designing and Refining Schema Mappings via Data Examples
Designing and Refining Schema Mappings via Data Examples Bogdan Alexe UCSC [email protected] Balder ten Cate UCSC [email protected] Phokion G. Kolaitis UCSC & IBM Research - Almaden [email protected]
Dependencies Revisited for Improving Data Quality
Dependencies Revisited for Improving Data Quality Wenfei Fan University of Edinburgh & Bell Laboratories Wenfei Fan Dependencies Revisited for Improving Data Quality 1 / 70 Real-world data is often dirty
Computational Methods for Database Repair by Signed Formulae
Computational Methods for Database Repair by Signed Formulae Ofer Arieli ([email protected]) Department of Computer Science, The Academic College of Tel-Aviv, 4 Antokolski street, Tel-Aviv 61161, Israel.
Data exchange. L. Libkin 1 Data Integration and Exchange
Data exchange Source schema, target schema; need to transfer data between them. A typical scenario: Two organizations have their legacy databases, schemas cannot be changed. Data from one organization
NP-complete? NP-hard? Some Foundations of Complexity. Prof. Sven Hartmann Clausthal University of Technology Department of Informatics
NP-complete? NP-hard? Some Foundations of Complexity Prof. Sven Hartmann Clausthal University of Technology Department of Informatics Tractability of Problems Some problems are undecidable: no computer
Data Quality: From Theory to Practice
Data Quality: From Theory to Practice Wenfei Fan School of Informatics, University of Edinburgh, and RCBD, Beihang University [email protected] ABSTRACT Data quantity and data quality, like two sides
XML Data Integration
XML Data Integration Lucja Kot Cornell University 11 November 2010 Lucja Kot (Cornell University) XML Data Integration 11 November 2010 1 / 42 Introduction Data Integration and Query Answering A data integration
Introduction to Logic in Computer Science: Autumn 2006
Introduction to Logic in Computer Science: Autumn 2006 Ulle Endriss Institute for Logic, Language and Computation University of Amsterdam Ulle Endriss 1 Plan for Today Now that we have a basic understanding
Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza
Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213 Structure
1. Nondeterministically guess a solution (called a certificate) 2. Check whether the solution solves the problem (called verification)
Some N P problems Computer scientists have studied many N P problems, that is, problems that can be solved nondeterministically in polynomial time. Traditionally complexity question are studied as languages:
Lecture 8: Resolution theorem-proving
Comp24412 Symbolic AI Lecture 8: Resolution theorem-proving Ian Pratt-Hartmann Room KB2.38: email: [email protected] 2014 15 In the previous Lecture, we met SATCHMO, a first-order theorem-prover implemented
Data Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
Generating models of a matched formula with a polynomial delay
Generating models of a matched formula with a polynomial delay Petr Savicky Institute of Computer Science, Academy of Sciences of Czech Republic, Pod Vodárenskou Věží 2, 182 07 Praha 8, Czech Republic
Query Processing in Data Integration Systems
Query Processing in Data Integration Systems Diego Calvanese Free University of Bozen-Bolzano BIT PhD Summer School Bressanone July 3 7, 2006 D. Calvanese Data Integration BIT PhD Summer School 1 / 152
Bounded Treewidth in Knowledge Representation and Reasoning 1
Bounded Treewidth in Knowledge Representation and Reasoning 1 Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien Luminy, October 2010 1 Joint work with G.
Schema Mediation in Peer Data Management Systems
Schema Mediation in Peer Data Management Systems Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov University of Washington Seattle, WA, USA 98195-2350 {alon,zives,suciu,igor}@cs.washington.edu Abstract
MATHEMATICS: CONCEPTS, AND FOUNDATIONS Vol. III - Logic and Computer Science - Phokion G. Kolaitis
LOGIC AND COMPUTER SCIENCE Phokion G. Kolaitis Computer Science Department, University of California, Santa Cruz, CA 95064, USA Keywords: algorithm, Armstrong s axioms, complete problem, complexity class,
CoNP and Function Problems
CoNP and Function Problems conp By definition, conp is the class of problems whose complement is in NP. NP is the class of problems that have succinct certificates. conp is therefore the class of problems
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases
On the Decidability and Complexity of Query Answering over Inconsistent and Incomplete Databases Andrea Calì Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Università di Roma
Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
Lecture 7: NP-Complete Problems
IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Basic Course on Computational Complexity Lecture 7: NP-Complete Problems David Mix Barrington and Alexis Maciel July 25, 2000 1. Circuit
On Scale Independence for Querying Big Data
On Scale Independence for Querying Big Data Wenfei Fan University of Edinburgh & Beihang University [email protected] Floris Geerts University of Antwerp [email protected] Leonid Libkin University
Data Integration: A Theoretical Perspective
Data Integration: A Theoretical Perspective Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Via Salaria 113, I 00198 Roma, Italy [email protected] ABSTRACT
Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Query Answering in Peer-to-Peer Data Exchange Systems
Query Answering in Peer-to-Peer Data Exchange Systems Leopoldo Bertossi and Loreto Bravo Carleton University, School of Computer Science, Ottawa, Canada {bertossi,lbravo}@scs.carleton.ca Abstract. The
Relational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute
OHJ-2306 Introduction to Theoretical Computer Science, Fall 2012 8.11.2012
276 The P vs. NP problem is a major unsolved problem in computer science It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a $ 1,000,000 prize for the
Fixed-Point Logics and Computation
1 Fixed-Point Logics and Computation Symposium on the Unusual Effectiveness of Logic in Computer Science University of Cambridge 2 Mathematical Logic Mathematical logic seeks to formalise the process of
Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
Updating Action Domain Descriptions
Updating Action Domain Descriptions Thomas Eiter, Esra Erdem, Michael Fink, and Ján Senko Institute of Information Systems, Vienna University of Technology, Vienna, Austria Email: (eiter esra michael jan)@kr.tuwien.ac.at
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,
Bounded-width QBF is PSPACE-complete
Bounded-width QBF is PSPACE-complete Albert Atserias 1 and Sergi Oliva 2 1 Universitat Politècnica de Catalunya Barcelona, Spain [email protected] 2 Universitat Politècnica de Catalunya Barcelona, Spain
Factorization in Polynomial Rings
Factorization in Polynomial Rings These notes are a summary of some of the important points on divisibility in polynomial rings from 17 and 18 of Gallian s Contemporary Abstract Algebra. Most of the important
CSC 373: Algorithm Design and Analysis Lecture 16
CSC 373: Algorithm Design and Analysis Lecture 16 Allan Borodin February 25, 2013 Some materials are from Stephen Cook s IIT talk and Keven Wayne s slides. 1 / 17 Announcements and Outline Announcements
Relational Algebra. Basic Operations Algebra of Bags
Relational Algebra Basic Operations Algebra of Bags 1 What is an Algebra Mathematical system consisting of: Operands --- variables or values from which new values can be constructed. Operators --- symbols
Scheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
µz An Efficient Engine for Fixed points with Constraints
µz An Efficient Engine for Fixed points with Constraints Kryštof Hoder, Nikolaj Bjørner, and Leonardo de Moura Manchester University and Microsoft Research Abstract. The µz tool is a scalable, efficient
Modern Algebra Lecture Notes: Rings and fields set 4 (Revision 2)
Modern Algebra Lecture Notes: Rings and fields set 4 (Revision 2) Kevin Broughan University of Waikato, Hamilton, New Zealand May 13, 2010 Remainder and Factor Theorem 15 Definition of factor If f (x)
XML Data Exchange: Consistency and Query Answering
XML Data Exchange: Consistency and Query Answering Marcelo Arenas University of Toronto [email protected] Leonid Libkin University of Toronto [email protected] ABSTRACT Data exchange is the problem
Predicate Logic. Example: All men are mortal. Socrates is a man. Socrates is mortal.
Predicate Logic Example: All men are mortal. Socrates is a man. Socrates is mortal. Note: We need logic laws that work for statements involving quantities like some and all. In English, the predicate is
F. ABTAHI and M. ZARRIN. (Communicated by J. Goldstein)
Journal of Algerian Mathematical Society Vol. 1, pp. 1 6 1 CONCERNING THE l p -CONJECTURE FOR DISCRETE SEMIGROUPS F. ABTAHI and M. ZARRIN (Communicated by J. Goldstein) Abstract. For 2 < p
This asserts two sets are equal iff they have the same elements, that is, a set is determined by its elements.
3. Axioms of Set theory Before presenting the axioms of set theory, we first make a few basic comments about the relevant first order logic. We will give a somewhat more detailed discussion later, but
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS by Loreto Bravo A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree
Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products
Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing
Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange
Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are
Big Data Cleaning. Nan Tang. Qatar Computing Research Institute, Doha, Qatar [email protected]
Big Data Cleaning Nan Tang Qatar Computing Research Institute, Doha, Qatar [email protected] Abstract. Data cleaning is, in fact, a lively subject that has played an important part in the history of data
P NP for the Reals with various Analytic Functions
P NP for the Reals with various Analytic Functions Mihai Prunescu Abstract We show that non-deterministic machines in the sense of [BSS] defined over wide classes of real analytic structures are more powerful
Monitoring Metric First-order Temporal Properties
Monitoring Metric First-order Temporal Properties DAVID BASIN, FELIX KLAEDTKE, SAMUEL MÜLLER, and EUGEN ZĂLINESCU, ETH Zurich Runtime monitoring is a general approach to verifying system properties at
! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs
,!0((,.+#$),%$(-&.& *,2(-$)%&2.'3&%!&, Logics for Data and Knowledge Representation Alessandro Agostini [email protected] University of Trento Fausto Giunchiglia [email protected] The Logic of Descriptions!$%&'()*$#)
Logic in general. Inference rules and theorem proving
Logical Agents Knowledge-based agents Logic in general Propositional logic Inference rules and theorem proving First order logic Knowledge-based agents Inference engine Knowledge base Domain-independent
Fundamentele Informatica II
Fundamentele Informatica II Answer to selected exercises 1 John C Martin: Introduction to Languages and the Theory of Computation M.M. Bonsangue (and J. Kleijn) Fall 2011 Let L be a language. It is clear
Optimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. [email protected] P Srinivasu
5 Directed acyclic graphs
5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate
Determination of the normalization level of database schemas through equivalence classes of attributes
Computer Science Journal of Moldova, vol.17, no.2(50), 2009 Determination of the normalization level of database schemas through equivalence classes of attributes Cotelea Vitalie Abstract In this paper,
2.1 Complexity Classes
15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
Discuss the size of the instance for the minimum spanning tree problem.
3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can
NP-Completeness and Cook s Theorem
NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:
The Classes P and NP
The Classes P and NP We now shift gears slightly and restrict our attention to the examination of two families of problems which are very important to computer scientists. These families constitute the
ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS
PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Mathematical Sciences 2012 1 p. 43 48 ON FUNCTIONAL SYMBOL-FREE LOGIC PROGRAMS I nf or m at i cs L. A. HAYKAZYAN * Chair of Programming and Information
Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU
Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU Birgit Vogtenhuber Institute for Software Technology email: [email protected] office hour: Tuesday 10:30 11:30 slides: http://www.ist.tugraz.at/pact.html
DEFINABLE TYPES IN PRESBURGER ARITHMETIC
DEFINABLE TYPES IN PRESBURGER ARITHMETIC GABRIEL CONANT Abstract. We consider the first order theory of (Z, +,
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES
FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES CHRISTOPHER HEIL 1. Cosets and the Quotient Space Any vector space is an abelian group under the operation of vector addition. So, if you are have studied
A first step towards modeling semistructured data in hybrid multimodal logic
A first step towards modeling semistructured data in hybrid multimodal logic Nicole Bidoit * Serenella Cerrito ** Virginie Thion * * LRI UMR CNRS 8623, Université Paris 11, Centre d Orsay. ** LaMI UMR
6.852: Distributed Algorithms Fall, 2009. Class 2
.8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election
