Query Processing in Data Integration Systems
|
|
|
- Gladys Blake
- 10 years ago
- Views:
Transcription
1 Query Processing in Data Integration Systems Diego Calvanese Free University of Bozen-Bolzano BIT PhD Summer School Bressanone July 3 7, 2006 D. Calvanese Data Integration BIT PhD Summer School 1 / 152
2 Structure of the course 1 Introduction to data integration Basic issues in data integration Logical formalization 2 Query answering in the absence of constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 3 Query answering in the presence of constraints The role of integrity constraints Global-as-view (GAV) setting Local-as-view (LAV) and GLAV setting 4 Concluding remarks D. Calvanese Data Integration BIT PhD Summer School 2 / 152
3 Part I Introduction to data integration D. Calvanese Data Integration BIT PhD Summer School 3 / 152
4 Outline 1 Concluding remarks D. Calvanese Data Integration BIT PhD Summer School 4 / 152
5 Outline 1 Concluding remarks D. Calvanese Data Integration BIT PhD Summer School 5 / 152
6 The problem of data integration What is data integration? Data integration is the problem of providing unified and transparent access to a collection of data stored in multiple, autonomous, and heterogeneous data sources Answer(Q) Query Global Schema Sources D. Calvanese Data Integration BIT PhD Summer School 7 / 152
7 The problem of data integration Conceptual architecture of a data integration system Query Global Schema A R B C S D T E Mapping U V W u 1 v 1 w 1 u 2 v 2 w 2 Source Schema 1 Source Schema 2 Source 1 Source 2 D. Calvanese Data Integration BIT PhD Summer School 8 / 152
8 The problem of data integration Relevance of data integration Growing market One of the major challenges for the future of IT At least two contexts Intra-organization data integration (e.g., EIS) Inter-organization data integration (e.g., integration on the Web) D. Calvanese Data Integration BIT PhD Summer School 9 / 152
9 The problem of data integration Data integration: Available industrial efforts Distributed database systems Information on demand Tools for source wrapping Tools based on database federation, e.g., DB2 Information Integrator Distributed query optimization D. Calvanese Data Integration BIT PhD Summer School 10 / 152
10 Variants of data integration Architectures for integrated access to distributed data Distributed databases data sources are homogeneous databases under the control of the distributed database management system Multidatabase or federated databases data sources are autonomous, heterogeneous databases; procedural specification (Mediator-based) data integration access through a global schema mapped to autonomous and heterogeneous data sources; declarative specification Peer-to-peer data integration network of autonomous systems mapped one to each other, without a global schema; declarative specification D. Calvanese Data Integration BIT PhD Summer School 12 / 152
11 Variants of data integration Database federation tools: Characteristics Physical transparency, i.e., masking from the user the physical characteristics of the sources Heterogeinity, i.e., federating highly diverse types of sources Extensibility Autonomy of data sources Performance, through distributed query optimization However, current tools do not (directly) support logical (or conceptual) transparency D. Calvanese Data Integration BIT PhD Summer School 13 / 152
12 Variants of data integration Logical transparency Basic ingredients for achieving logical transparency: The global schema (ontology) provides a conceptual view that is independent from the sources The global schema is described with a semantically rich formalism The mappings are the crucial tools for realizing the independence of the global schema from the sources Obviously, the formalism for specifying the mapping is also a crucial point All the above aspects are not appropriately dealt with by current tools. This means that data integration cannot be simply addressed on a tool basis D. Calvanese Data Integration BIT PhD Summer School 14 / 152
13 Variants of data integration Approaches to data integration (Mediator-based) data integration... is the topic of this course Data exchange [Fagin & al. TCS 05, Kolaitis PODS 05] materialization of the global view allows for query answering without accessing the sources P2P data integration [Halevy & al. ICDE 03, & al. PODS 04, & al. DBPL 05] several peers each peer with local and external sources queries over one peer D. Calvanese Data Integration BIT PhD Summer School 15 / 152
14 Variants of data integration Mediator based data integration Queries are expressed over a global schema (a.k.a. mediated schema, enterprise model,... ) Data are stored in a set of sources Wrappers access the sources (provide a view in a uniform data model of the data stored in the sources) Mediators combine answers coming from wrappers and/or other mediators Answer(Q) Query Global Schema Sources D. Calvanese Data Integration BIT PhD Summer School 16 / 152
15 Variants of data integration Data exchange Materialization of the global schema Materialize Global Schema Sources D. Calvanese Data Integration BIT PhD Summer School 17 / 152
16 Variants of data integration Peer-to-peer data integration P 1 P 2 P 3 P 4 P 5 Peer Local source External source Peer schema Local mapping P2P mapping Operations: Answer(Q,P i ) Materialize(P i ) D. Calvanese Data Integration BIT PhD Summer School 18 / 152
17 Problems in data integration Main problems in data integration 1 How to construct the global schema 2 (Automatic) source wrapping 3 How to discover mappings between sources and global schema 4 Limitations in mechanisms for accessing sources 5 Data extraction, cleaning, and reconciliation 6 How to process updates expressed on the global schema and/or the sources ( read/write vs. read-only data integration) 7 How to model the global schema, the sources, and the mappings between the two 8 How to answer queries expressed on the global schema 9 How to optimize query answering D. Calvanese Data Integration BIT PhD Summer School 20 / 152
18 Problems in data integration The modeling problem Basic questions: How to model the global schema data model constraints How to model the sources data model (conceptual and logical level) access limitations data values (common vs. different domains) How to model the mapping between global schemas and sources How to verify the quality of the modeling process A word of caution: Data modeling (in data integration) is an art. Theoretical frameworks can help humans, not replace them D. Calvanese Data Integration BIT PhD Summer School 21 / 152
19 Problems in data integration The querying problem A query expressed in terms of the global schema must be reformulated in terms of (a set of) queries over the sources and/or materialized views The computed sub-queries are shipped to the sources, and the results are collected and assembled into the final answer The computed query plan should guarantee completeness of the obtained answers wrt the semantics efficiency of the whole query answering process efficiency in accessing sources This process heavily depends on the approach adopted for modeling the data integration system This is the problem that we want to address in this course D. Calvanese Data Integration BIT PhD Summer School 22 / 152
20 Outline 1 Concluding remarks D. Calvanese Data Integration BIT PhD Summer School 23 / 152
21 Semantics of a data integration system Formal framework for data integration Definition A data integration system I is a triple G, S, M, where G is the global schema i.e., a logical theory over a relational alphabet A G S is the source schema i.e., simply a relational alphabet A S disjoint from A G M is the mapping between S and G We consider different approaches to the specification of mappings D. Calvanese Data Integration BIT PhD Summer School 25 / 152
22 Semantics of a data integration system Semantics of a data integration system Which are the dbs that satisfy I, i.e., the logical models of I? We refer only to dbs over a fixed infinite domain of elements We start from the data present in the sources: these are modeled through a source database C over (also called source model), fixing the extension of the predicates of A S The dbs for I are logical interpretations for A G, called global dbs Definition The set of databases for A G that satisfy I relative to C is: sem C (I) = { B B is a global database that is legal wrt G and that satisfies M wrt C } What it means to satisfy M wrt C depends on the nature of M D. Calvanese Data Integration BIT PhD Summer School 26 / 152
23 Relational calculus Relational calculus: the basics Basic idea: we use the language of first-order logic to express which tuples should be in the result to a query We assume to have a domain and a set Σ of constants, one for each element of Let A be a relational alphabet, i.e., a set of predicates, each with an associated arity (we assume a positional notation) A database D over A and is a set of relations, one for each predicate in A, over the constants in Σ (in turn interpreted as elements of ) Let L A be the first-order language over the constants in Σ the predicates of A plus the built-in predicates of relational algebra (e.g., <, >,... ) no function symbols D. Calvanese Data Integration BIT PhD Summer School 28 / 152
24 Relational calculus Relational calculus: Syntax Definition An (domain) relational calculus query over alphabet A has the form { (x 1,..., x n ) ϕ }, where n 0 is the arity of the query x 1,..., x n are (not necessarily distinct) variables ϕ is the body of the query, i.e., a formula of L A whose free variables are exactly x 1,..., x n (x 1,..., x n ) is called the target list of the query If r is a predicate of arity k, an atom with predicate r has the form r(y 1,..., y k ), where y 1,..., y k are variables or constants D. Calvanese Data Integration BIT PhD Summer School 29 / 152
25 Relational calculus Relational calculus: Semantics Relational calculus queries are evaluated on particular interpretations Definition A correct interpretation for relational calculus queries over A is a pair I =, D, where is a domain, and D is a database over A and Definition The value of a relational calculus query q = {(x 1,..., x n ) ϕ} in an interpretation I =, D is the set of tuples (c 1,..., c n ) of constants in Σ such that I, V = ϕ, where V is the variable assignment that assigns c i to x i When the domain is clear, we can omit it, and write directly D, V = ϕ, instead of, D, V = ϕ D. Calvanese Data Integration BIT PhD Summer School 30 / 152
26 Relational calculus Result of relational calculus queries Definition The result of the evaluation of a relational calculus query q = {(x 1,..., x n ) ϕ} on a database D over A and is the relation q D such that the arity of q D is n the extension of q D is the set of constants that constitute the value of the query q in the interpretation, D D. Calvanese Data Integration BIT PhD Summer School 31 / 152
27 Relational calculus Conjunctive queries are the most common kind of relational calculus queries also known as select-project-join SQL queries allow for easy optimization in relational DBMSs Definition A conjunctive query (CQ) is a relational calculus query of the form { ( x) y. r 1 ( x 1, y 1 ) r m ( x m, y m ) } where x is the union of the x i s, and y is the union of the y i s r 1,..., r m are relation symbols (not built-in predicates) We use the following abbreviation: { ( x) r 1 ( x 1, y 1 ),..., r m ( x m, y m ) } D. Calvanese Data Integration BIT PhD Summer School 32 / 152
28 Relational calculus Complexity of relational calculus We consider the complexity of the recognition problem, i.e., checking whether a tuple of constants is in the answer to a query: measured wrt the size of the database data complexity measured wrt the size of the query and the database combined complexity Complexity of relational calculus data complexity: polynomial, actually in LogSpace combined complexity: PSpace-complete Complexity of conjunctive queries data complexity: in LogSpace combined complexity: NP-complete D. Calvanese Data Integration BIT PhD Summer School 33 / 152
29 Queries to a data integration system Queries to a data integration system I The domain is fixed, and we do not distinguish an element of from the constant denoting it standard names Queries to I are relational calculus queries over the alphabet A G of the global schema When evaluating q over I, we have to consider that for a given source database C, there may be many global databases B in sem C (I) We consider those answers to q that hold for all global databases in sem C (I) certain answers D. Calvanese Data Integration BIT PhD Summer School 35 / 152
30 Queries to a data integration system Semantics of queries to I Definition Given q, I, and C, the set of certain answers to q wrt I and C is cert(q, I, C) = { (c 1,..., c n ) q B for all B sem C (I) } Query answering is logical implication Complexity is measured mainly wrt the size of the source db C, i.e., we consider data complexity We consider the problem of deciding whether c cert(q, I, C), for a given c D. Calvanese Data Integration BIT PhD Summer School 36 / 152
31 Queries to a data integration system Databases with incomplete information, or knowledge bases Traditional database: one model of a first-order theory Query answering means evaluating a formula in the model Database with incomplete information, or knowledge base: set of models (specified, for example, as a restricted first-order theory) Query answering means computing the tuples that satisfy the query in all the models in the set There is a strong connection between query answering in data integration and query answering in databases with incomplete information under constraints (or, query answering in knowledge bases) D. Calvanese Data Integration BIT PhD Summer School 37 / 152
32 Queries to a data integration system Query answering with incomplete information [Reiter 84]: relational setting, databases with incomplete information modeled as a first order theory [Vardi 86]: relational setting, complexity of reasoning in closed world databases with unknown values Several approaches both from the DB and the KR community [van der Meyden 98]: survey on logical approaches to incomplete information in databases D. Calvanese Data Integration BIT PhD Summer School 38 / 152
33 Formalizing the mapping The mapping How is the mapping M between S and G specified? Are the sources defined in terms of the global schema? Approach called source-centric, or local-as-view, or LAV Is the global schema defined in terms of the sources? Approach called global-schema-centric, or global-as-view, or GAV A mixed approach? Approach called GLAV D. Calvanese Data Integration BIT PhD Summer School 40 / 152
34 Formalizing the mapping GAV vs. LAV Example Global schema: movie(title, Year, Director) european(director) review(title, Critique) Source 1: r 1 (Title, Year, Director) Source 2: r 2 (Title, Critique) since 1990 since 1960, european directors Query: Title and critique of movies in 1998 { (t, r) d. movie(t, 1998, d) review(t, r) }, abbreviated { (t, r) movie(t, 1998, d), review(t, r) } D. Calvanese Data Integration BIT PhD Summer School 41 / 152
35 Formalizing GAV data integration systems Formalization of GAV In GAV (with sound sources), the mapping M is a set of assertions: φ S g one for each element g in A G, with φ S a query over S of the arity of g Given a source db C, a db B for G satisfies M wrt C if for each g G: φ C S In other words, the assertion means gb x. φ S ( x) g( x) Given a source database, M provides direct information about which data satisfy the elements of the global schema Relations in G are views, and queries are expressed over the views. Thus, it seems that we can simply evaluate the query over the data satisfying the global relations (as if we had a single database at hand) D. Calvanese Data Integration BIT PhD Summer School 43 / 152
36 Formalizing GAV data integration systems GAV Example Global schema: movie(title, Year, Director) european(director) review(title, Critique) GAV: to each relation in the global schema, M associates a view over the sources: { (t, y, d) r 1 (t, y, d) } movie(t, y, d) { (d) r 1 (t, y, d) } european(d) { (t, r) r 2 (t, r) } review(t, r) Logical formalization: t, y, d. r 1 (t, y, d) movie(t, y, d) d. ( t, y. r 1 (t, y, d)) european(d) t, r. r 2 (t, r) review(t, r) D. Calvanese Data Integration BIT PhD Summer School 44 / 152
37 Formalizing GAV data integration systems GAV Example of query processing The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by means of unfolding, i.e., by expanding each atom according to its associated definition in M, so as to come up with source relations In this case: { (t, r) movie(t, 1998, d), review(t, r) } unfolding { (t, r) r 1 (t, 1998, d), r 2 (t, r) } D. Calvanese Data Integration BIT PhD Summer School 45 / 152
38 Formalizing GAV data integration systems GAV Example of constraints Global schema containing constraints: movie(title, Year, Director) european(director) review(title, Critique) european movie 60s(Title, Year, Director) t, y, d. european movie 60s(t, y, d) movie(t, y, d) d. t, y. european movie 60s(t, y, d) european(d) GAV mappings: { (t, y, d) r 1 (t, y, d) } european movie 60s(t, y, d) { (d) r 1 (t, y, d) } european(d) { (t, r) r 2 (t, r) } review(t, r) D. Calvanese Data Integration BIT PhD Summer School 46 / 152
39 Formalizing LAV data integration systems Formalization of LAV In LAV (with sound sources), the mapping M is a set of assertions: s φ G one for each source element s in A S, with φ G a query over G Given source db C, a db B for G satisfies M wrt C if for each s S: s C φ B G In other words, the assertion means x. s( x) φ G ( x) The mapping M and the source database C do not provide direct information about which data satisfy the global schema Sources are views, and we have to answer queries on the basis of the available data in the views D. Calvanese Data Integration BIT PhD Summer School 48 / 152
40 Formalizing LAV data integration systems LAV Example Global schema: movie(title, Year, Director) european(director) review(title, Critique) LAV: to each source relation, M associates a view over the global schema: r 1 (t, y, d) { (t, y, d) movie(t, y, d), european(d), y 1960 } r 2 (t, r) { (t, r) movie(t, y, d), review(t, r), y 1990 } The query { (t, r) movie(t, 1998, d), review(t, r) } is processed by means of an inference mechanism that aims at re-expressing the atoms of the global schema in terms of atoms at the sources. In this case: { (t, r) r 2 (t, r), r 1 (t, 1998, d) } D. Calvanese Data Integration BIT PhD Summer School 49 / 152
41 Formalizing LAV data integration systems GAV and LAV Comparison GAV: (e.g., Carnot, SIMS, Tsimmis, IBIS, Momis, DisAtDis,... ) Quality depends on how well we have compiled the sources into the global schema through the mapping Whenever a source changes or a new one is added, the global schema needs to be reconsidered Query processing can be based on some sort of unfolding (query answering looks easier without constraints) LAV: (e.g., Information Manifold, DWQ, Picsel) Quality depends on how well we have characterized the sources High modularity and extensibility (if the global schema is well designed, when a source changes, only its definition is affected) Query processing needs reasoning (query answering complex) D. Calvanese Data Integration BIT PhD Summer School 50 / 152
42 Formalizing LAV data integration systems Beyond GAV and LAV: GLAV In GLAV (with sound sources), the mapping M is a set of assertions: φ S φ G with φ S a query over S, and φ G a query over G of the same arity as φ S Given source db C, a db B for G satisfies M wrt C if for each φ S φ G in M: φ C S In other words, the assertion means φb G x. φ S ( x) φ G ( x) As for LAV, the mapping M does not provide direct information about which data satisfy the global schema To answer a query q over G, we have to infer how to use M in order to access the source database C D. Calvanese Data Integration BIT PhD Summer School 51 / 152
43 Formalizing LAV data integration systems GLAV Example Global schema: work(person, Project), area(project, Field) Source 1: hasjob(person, Field) Source 2: teaches(professor, Course), in(course, Field) Source 3: get(researcher, Grant), for(grant, Project) GLAV mapping: {(r, f) hasjob(r, f)} {(r, f) work(r, p), area(p, f)} {(r, f) teaches(r, c), in(c, f)} {(r, f) work(r, p), area(p, f)} {(r, p) get(r, g), for(g, p)} {(r, f) work(r, p)} D. Calvanese Data Integration BIT PhD Summer School 52 / 152
44 Formalizing LAV data integration systems GLAV A technical observation In GLAV (with sound sources), the mapping M is constituted by a set of assertions: φ S φ G Each such assertion can be rewritten wlog by introducing a new predicate r (not to be used in the queries) of the same arity as the two queries and replace the assertion with the following two: φ S r r φ G In other words, we replace x. φ S ( x) φ G ( x) with x. φ S ( x) r( x) and x. r( x) φ G ( x) D. Calvanese Data Integration BIT PhD Summer School 53 / 152
Data Integration. Maurizio Lenzerini. Universitá di Roma La Sapienza
Data Integration Maurizio Lenzerini Universitá di Roma La Sapienza DASI 06: Phd School on Data and Service Integration Bertinoro, December 11 15, 2006 M. Lenzerini Data Integration DASI 06 1 / 213 Structure
A Tutorial on Data Integration
A Tutorial on Data Integration Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Antonio Ruberti, Sapienza Università di Roma DEIS 10 - Data Exchange, Integration, and Streaming November 7-12,
Data Integration: A Theoretical Perspective
Data Integration: A Theoretical Perspective Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Via Salaria 113, I 00198 Roma, Italy [email protected] ABSTRACT
Chapter 17 Using OWL in Data Integration
Chapter 17 Using OWL in Data Integration Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, and Marco Ruzzi Abstract One of the outcomes of the research work carried
Data Quality in Information Integration and Business Intelligence
Data Quality in Information Integration and Business Intelligence Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada : Faculty Fellow of the IBM Center for Advanced Studies
OWL based XML Data Integration
OWL based XML Data Integration Manjula Shenoy K Manipal University CSE MIT Manipal, India K.C.Shet, PhD. N.I.T.K. CSE, Suratkal Karnataka, India U. Dinesh Acharya, PhD. ManipalUniversity CSE MIT, Manipal,
Virtual Data Integration
Virtual Data Integration Leopoldo Bertossi Carleton University School of Computer Science Ottawa, Canada www.scs.carleton.ca/ bertossi [email protected] Chapter 1: Introduction and Issues 3 Data
Enterprise Modeling and Data Warehousing in Telecom Italia
Enterprise Modeling and Data Warehousing in Telecom Italia Diego Calvanese Faculty of Computer Science Free University of Bolzano/Bozen Piazza Domenicani 3 I-39100 Bolzano-Bozen BZ, Italy Luigi Dragone,
Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange
Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS
INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan
Peer Data Management Systems Concepts and Approaches
Peer Data Management Systems Concepts and Approaches Armin Roth HPI, Potsdam, Germany Nov. 10, 2010 Armin Roth (HPI, Potsdam, Germany) Peer Data Management Systems Nov. 10, 2010 1 / 28 Agenda 1 Large-scale
Structured and Semi-Structured Data Integration
UNIVERSITÀ DEGLI STUDI DI ROMA LA SAPIENZA DOTTORATO DI RICERCA IN INGEGNERIA INFORMATICA XIX CICLO 2006 UNIVERSITÉ DE PARIS SUD DOCTORAT DE RECHERCHE EN INFORMATIQUE Structured and Semi-Structured Data
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM) Extended Abstract Ioanna Koffina 1, Giorgos Serfiotis 1, Vassilis Christophides 1, Val Tannen
Peer Data Exchange. ACM Transactions on Database Systems, Vol. V, No. N, Month 20YY, Pages 1 0??.
Peer Data Exchange Ariel Fuxman 1 University of Toronto Phokion G. Kolaitis 2 IBM Almaden Research Center Renée J. Miller 1 University of Toronto and Wang-Chiew Tan 3 University of California, Santa Cruz
Query Answering in Peer-to-Peer Data Exchange Systems
Query Answering in Peer-to-Peer Data Exchange Systems Leopoldo Bertossi and Loreto Bravo Carleton University, School of Computer Science, Ottawa, Canada {bertossi,lbravo}@scs.carleton.ca Abstract. The
On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper)
On Rewriting and Answering ueries in OBDA Systems for Big Data (Short Paper) Diego Calvanese 2, Ian Horrocks 3, Ernesto Jimenez-Ruiz 3, Evgeny Kharlamov 3, Michael Meier 1 Mariano Rodriguez-Muro 2, Dmitriy
Fixed-Point Logics and Computation
1 Fixed-Point Logics and Computation Symposium on the Unusual Effectiveness of Logic in Computer Science University of Cambridge 2 Mathematical Logic Mathematical logic seeks to formalise the process of
Integrating and Exchanging XML Data using Ontologies
Integrating and Exchanging XML Data using Ontologies Huiyong Xiao and Isabel F. Cruz Department of Computer Science University of Illinois at Chicago {hxiao ifc}@cs.uic.edu Abstract. While providing a
An introduction to data integration. Prof. Letizia Tanca Technologies for Information Systems
An introduction to data integration Prof. Letizia Tanca Technologies for Information Systems 1 Motivation In modern Information Systems there is a growing quest for achieving integration of SW applications,
A Framework and Architecture for Quality Assessment in Data Integration
A Framework and Architecture for Quality Assessment in Data Integration Jianing Wang March 2012 A Dissertation Submitted to Birkbeck College, University of London in Partial Fulfillment of the Requirements
Data Integration Using ID-Logic
Data Integration Using ID-Logic Bert Van Nuffelen 1, Alvaro Cortés-Calabuig 1, Marc Denecker 1, Ofer Arieli 2, and Maurice Bruynooghe 1 1 Department of Computer Science, Katholieke Universiteit Leuven,
Schema Mediation in Peer Data Management Systems
Schema Mediation in Peer Data Management Systems Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov University of Washington Seattle, WA, USA 98195-2350 {alon,zives,suciu,igor}@cs.washington.edu Abstract
Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts
Distributed Databases Basic Concepts Distributed Databases Concepts. Advantages and disadvantages of distributed databases. Functions and architecture for a DDBMS. Distributed database design. Levels of
Integrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
Repair Checking in Inconsistent Databases: Algorithms and Complexity
Repair Checking in Inconsistent Databases: Algorithms and Complexity Foto Afrati 1 Phokion G. Kolaitis 2 1 National Technical University of Athens 2 UC Santa Cruz and IBM Almaden Research Center Oxford,
Semantic Technologies for Data Integration using OWL2 QL
Semantic Technologies for Data Integration using OWL2 QL ESWC 2009 Tutorial Domenico Lembo Riccardo Rosati Dipartimento di Informatica e Sistemistica Sapienza Università di Roma, Italy 6th European Semantic
Formalization of the CRM: Initial Thoughts
Formalization of the CRM: Initial Thoughts Carlo Meghini Istituto di Scienza e Tecnologie della Informazione Consiglio Nazionale delle Ricerche Pisa CRM SIG Meeting Iraklio, October 1st, 2014 Outline Overture:
Web-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, [email protected] Abstract. Despite the dramatic growth of online genomic
Comparing Data Integration Algorithms
Comparing Data Integration Algorithms Initial Background Report Name: Sebastian Tsierkezos [email protected] ID :5859868 Supervisor: Dr Sandra Sampaio School of Computer Science 1 Abstract The problem
Overview. DW Source Integration, Tools, and Architecture. End User Applications (EUA) EUA Concepts. DW Front End Tools. Source Integration
DW Source Integration, Tools, and Architecture Overview DW Front End Tools Source Integration DW architecture Original slides were written by Torben Bach Pedersen Aalborg University 2007 - DWML course
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS
HANDLING INCONSISTENCY IN DATABASES AND DATA INTEGRATION SYSTEMS by Loreto Bravo A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree
! " # The Logic of Descriptions. Logics for Data and Knowledge Representation. Terminology. Overview. Three Basic Features. Some History on DLs
,!0((,.+#$),%$(-&.& *,2(-$)%&2.'3&%!&, Logics for Data and Knowledge Representation Alessandro Agostini [email protected] University of Trento Fausto Giunchiglia [email protected] The Logic of Descriptions!$%&'()*$#)
Database Systems. Lecture Handout 1. Dr Paolo Guagliardo. University of Edinburgh. 21 September 2015
Database Systems Lecture Handout 1 Dr Paolo Guagliardo University of Edinburgh 21 September 2015 What is a database? A collection of data items, related to a specific enterprise, which is structured and
3. Relational Model and Relational Algebra
ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra
Course: CSC 222 Database Design and Management I (3 credits Compulsory)
Course: CSC 222 Database Design and Management I (3 credits Compulsory) Course Duration: Three hours per week for 15weeks with practical class (45 hours) As taught in 2010/2011 session Lecturer: Oladele,
Distributed Databases
Distributed Databases Chapter 1: Introduction Johann Gamper Syllabus Data Independence and Distributed Data Processing Definition of Distributed databases Promises of Distributed Databases Technical Problems
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Distributed Databases in a Nutshell
Distributed Databases in a Nutshell Marc Pouly [email protected] Department of Informatics University of Fribourg, Switzerland Priciples of Distributed Database Systems M. T. Özsu, P. Valduriez Prentice
Query Management in Data Integration Systems: the MOMIS approach
Dottorato di Ricerca in Computer Engineering and Science Scuola di Dottorato in Information and Communication Technologies XXI Ciclo Università degli Studi di Modena e Reggio Emilia Dipartimento di Ingegneria
Logic Programs for Consistently Querying Data Integration Systems
IJCAI 03 1 Acapulco, August 2003 Logic Programs for Consistently Querying Data Integration Systems Loreto Bravo Computer Science Department Pontificia Universidad Católica de Chile [email protected] Joint
Schema Mappings and Data Exchange
Schema Mappings and Data Exchange Phokion G. Kolaitis University of California, Santa Cruz & IBM Research-Almaden EASSLC 2012 Southwest University August 2012 1 Logic and Databases Extensive interaction
Logical and categorical methods in data transformation (TransLoCaTe)
Logical and categorical methods in data transformation (TransLoCaTe) 1 Introduction to the abbreviated project description This is an abbreviated project description of the TransLoCaTe project, with an
ECS 165A: Introduction to Database Systems
ECS 165A: Introduction to Database Systems Todd J. Green based on material and slides by Michael Gertz and Bertram Ludäscher Winter 2011 Dept. of Computer Science UC Davis ECS-165A WQ 11 1 1. Introduction
GEOG 482/582 : GIS Data Management. Lesson 10: Enterprise GIS Data Management Strategies GEOG 482/582 / My Course / University of Washington
GEOG 482/582 : GIS Data Management Lesson 10: Enterprise GIS Data Management Strategies Overview Learning Objective Questions: 1. What are challenges for multi-user database environments? 2. What is Enterprise
Robust Module-based Data Management
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. V, NO. N, MONTH YEAR 1 Robust Module-based Data Management François Goasdoué, LRI, Univ. Paris-Sud, and Marie-Christine Rousset, LIG, Univ. Grenoble
Introducing Formal Methods. Software Engineering and Formal Methods
Introducing Formal Methods Formal Methods for Software Specification and Analysis: An Overview 1 Software Engineering and Formal Methods Every Software engineering methodology is based on a recommended
CHAPTER 7 GENERAL PROOF SYSTEMS
CHAPTER 7 GENERAL PROOF SYSTEMS 1 Introduction Proof systems are built to prove statements. They can be thought as an inference machine with special statements, called provable statements, or sometimes
Principles of Distributed Database Systems
M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition
Introduction to Databases
Page 1 of 5 Introduction to Databases An introductory example What is a database? Why do we need Database Management Systems? The three levels of data abstraction What is a Database Management System?
Chapter 2: DDBMS Architecture
Chapter 2: DDBMS Architecture Definition of the DDBMS Architecture ANSI/SPARC Standard Global, Local, External, and Internal Schemas, Example DDBMS Architectures Components of the DDBMS Acknowledgements:
Designing and Refining Schema Mappings via Data Examples
Designing and Refining Schema Mappings via Data Examples Bogdan Alexe UCSC [email protected] Balder ten Cate UCSC [email protected] Phokion G. Kolaitis UCSC & IBM Research - Almaden [email protected]
Integrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
2. The Language of First-order Logic
2. The Language of First-order Logic KR & R Brachman & Levesque 2005 17 Declarative language Before building system before there can be learning, reasoning, planning, explanation... need to be able to
Data Grids. Lidan Wang April 5, 2007
Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural
Integrating Data from Possibly Inconsistent Databases
Integrating Data from Possibly Inconsistent Databases Phan Minh Dung Department of Computer Science Asian Institute of Technology PO Box 2754, Bangkok 10501, Thailand [email protected] Abstract We address
Constraint-Based XML Query Rewriting for Data Integration
Constraint-Based XML Query Rewriting for Data Integration Cong Yu Department of EECS, University of Michigan [email protected] Lucian Popa IBM Almaden Research Center [email protected] ABSTRACT
DATA INTEGRATION CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
DATA INTEGRATION CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 DATA INTEGRATION Motivation Many databases and sources of data that need to be integrated to work together Almost all applications have many sources
2. Basic Relational Data Model
2. Basic Relational Data Model 2.1 Introduction Basic concepts of information models, their realisation in databases comprising data objects and object relationships, and their management by DBMS s that
Extending Data Processing Capabilities of Relational Database Management Systems.
Extending Data Processing Capabilities of Relational Database Management Systems. Igor Wojnicki University of Missouri St. Louis Department of Mathematics and Computer Science 8001 Natural Bridge Road
