Schema Integration. Conceptual Database Design. Batini, Ceri, Navathe Ch. 5. A Comparative Analysis of Methodologies for Database Schema Integration

Similar documents
Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms

not necessarily strictly sequential feedback loops exist, i.e. may need to revisit earlier stages during a later stage

Chapter 8 The Enhanced Entity- Relationship (EER) Model

Database Design Overview. Conceptual Design ER Model. Entities and Entity Sets. Entity Set Representation. Keys

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams

Fundamentals of Database System

Data Integration and Data Provenance. Profa. Dra. Cristina Dutra de Aguiar Ciferri

A Simplified Framework for Data Cleaning and Information Retrieval in Multiple Data Source Problems

IT2305 Database Systems I (Compulsory)

CHAPTER-6 DATA WAREHOUSE

Database Design Methodology

SOLVING SEMANTIC CONFLICTS IN AUDIENCE DRIVEN WEB DESIGN

No More Keyword Search or FAQ: Innovative Ontology and Agent Based Dynamic User Interface

Databases What the Specification Says

IT2304: Database Systems 1 (DBS 1)

Database Design Methodology

2.5.3 Use basic database skills to enter information in a database

Introduction to Database Development

Data Modeling in the Age of Big Data

Contribute to resource plan development in contact centre operations

INTEGRAL COLLABORATIVE DECISION MODEL IN ORDER TO SUPPORT PROJECT DEFINITION PHASE MANAGEMENT

DNV GL Assessment Checklist ISO 9001:2015

A brief overview of developing a conceptual data model as the first step in creating a relational database.

Answers to Review Questions

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology

A Tool for Generating Relational Database Schema from EER Diagram

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy Page 1 of 8

Chapter 2: Entity-Relationship Model. Entity Sets. " Example: specific person, company, event, plant

Integrating Heterogeneous Data Sources Using XML

Toad Data Modeler - Features Matrix

Data Integration using Semantic Technology: A use case

City of Sydney Council. Petition Guidelines

CDC UNIFIED PROCESS PRACTICES GUIDE

Rose/Architect: a tool to visualize architecture

Basics of Dimensional Modeling

What the Accountant Can & Cannot Do (.QBA)... How do I know what I can do? Transaction restrictions in Accountant's Copy

Fragmentation and Data Allocation in the Distributed Environments

DATABASE DESIGN. - Developing database and information systems is performed using a development lifecycle, which consists of a series of steps.

isurf edocreator: e-business Document Design and Customization Environment

Structured System Analysis and Design Method - SSADM. An Introduction to. Relational Data Analysis. Third Normal Form

MIDTERM #2: INFORMATION SYSTEMS (INDE499B) Dr. Jennifer Turns Autumn a. Total Time: You will have a total of 50 minutes for this midterm.

Data Integration and ETL Process

SQLMutation: A tool to generate mutants of SQL database queries

Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1

Requirements Engineering Processes. Feasibility studies. Elicitation and analysis. Problems of requirements analysis

Introduction to normalization. Introduction to normalization

COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

SMBC ICT Functional Review Recommendations (Amended 1 st June 2016)


BOARD OF DIRECTORS ROLE, ORGANISATION AND METHODS OF OPERATION

Key Steps to a Management Skills Audit

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

XV. The Entity-Relationship Model

SEMI AUTOMATIC DATA CLEANING FROM MULTISOURCES BASED ON SEMANTIC HETEROGENOUS

Handling Database Schema Variability in Software Product Lines

IV. The (Extended) Entity-Relationship Model

The policy also aims to make clear the actions required when faced with evidence of work related stress.

Information Model Architecture. Version 2.0

CRM INTEGRATION. Swift Digital Suite CRM Integration with: Salesforce and MS Dynamics. 25 September 2015

Customer Support Services

The Relational Data Model: Structure

Advantages of DBMS.

Database Design Process. Databases - Entity-Relationship Modelling. Requirements Analysis. Database Design

A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS

Further information can be obtained by calling Glenn Shaw, Acting Manager Infrastructure and Waste Services on

2. Conceptual Modeling using the Entity-Relationship Model

Artificial Intelligence

Accessing Private Network via Firewall Based On Preset Threshold Value

Benefits of Normalisation in a Data Base - Part 1

Data Modeling. Database Systems: The Complete Book Ch ,

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

An Automatic Tool for Checking Consistency between Data Flow Diagrams (DFDs)

Procurement Routes. Curtin University of Technology

Doing database design with MySQL

An introduction to data integration. Prof. Letizia Tanca Technologies for Information Systems

Data Modelling and E-R Diagrams

Umbrello UML Modeller Handbook

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams

Executive Assistant to CEO

Introduction to Fractions, Equivalent and Simplifying (1-2 days)

A terminology model approach for defining and managing statistical metadata

Transcription:

Schema Integration Conceptual Database Design Batini, Ceri, Navathe Ch. 5 A Comparative Analysis of Methodologies for Database Schema Integration Batini, Lenzerini, Navathe ACM Computing Surveys, Vol 18, No 4, Dec 1986 Fundamentals of Database Systems Elmasri/Navathe sec. 16.2.2 Stephen Mc Kearney, 2002. 1

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 2 Stephen Mc Kearney, 2002. 2

Definition Database 1 Database 2 Integrated Database 3 A database schema is the description of a database, for example, the entity-relationship model. Batini et al define schema integration as the process of merging several conceptual schemas into a global conceptual schema that represents all the requirements of the application. Schema integration is used to merge two or more database schemas into a single schema that can store data from both the original databases. Schema integration is used when two or more existing databases must be combined, for example, when a new management information system is being developed. Schema integration may be used when the process of database design is too large to be carried out by one individual. Two or more designers will build models of different parts of the database and use schema integration to merge the resulting models. There are two major types of schema integration: View Integration View integration takes place during the design of a new database when user requirements may be different for each user group. View integration is used to merge different viewpoints into a single data model. Database Integration Database integration is used when two or more databases must be combined to produce a single schema, called a global schema. Ref: Batini, p119. Stephen Mc Kearney, 2002. 3

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 4 Stephen Mc Kearney, 2002. 4

Problems - Different Perspectives Employee Employee Relationship Department Project Project Perspective 1 Perspective 2 5 When two database schemas are designed by different designers using different user requirements, the resulting schemas will often present contrasting views of the same data. In the example above, the relationship between employee and project in one database is represented as a relationships between employee, department and project in another database. This situation might occur in an organisation that allows different departments to have different rules as to how employees are allocated to projects. For example, in one department employees may be assigned to projects while in another department employees may not be considered to be directly related to a project. Ref: Batini, sec. 5.1. Stephen Mc Kearney, 2002. 5

Problems - Equivalent Concepts Attribute Book Book Publisher Publisher Database 1 Database 2 6 Different databases may treat the same concepts in different ways. In the above example, the publisher concept is an entity in database 1 but an attribute in database 2. There are two situations that must be dealt with during schema integration: 1. When different concepts are modelled in the same way. For example, in a university database staff and students may be represented by the entity person even though they are different concepts. 2. When the same concepts are modelled in different way. For instance, the above example models the concept of a publisher as an entity and as an attribute. Ref: Batini, sec. 5.1. Stephen Mc Kearney, 2002. 6

Problems - Incompatible Designs Employee Employee one-to-many 1 N many-to-many N M Project Project Database 1 Database 2 7 Two database designs may be incompatible because mistakes were made in the initial design or there are different constraints placed on the data. For instance, in the above example, the relationship between employee and project is represented as a one-to-many relationship in database 1 and as a many-to-many relationship in database 2. This problem may be caused by mistakes made during the initial database analysis task or because users of the system have different working practices. For example, one department in an organisation, which works on small projects, may allocate one employee to a project but a different department, which works on large projects, may allocate many employees to a project. During schema integration these different viewpoints must be reconciled. Ref: Batini, sec. 5.1. Stephen Mc Kearney, 2002. 7

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 8 Stephen Mc Kearney, 2002. 8

Strategies - All-In-One Database Schema 1 Database Schema 2 Database Schema 3 Integrated Database Schema 9 The first strategy for integrating a set of schemas is to merge them all into a single large schema (called the global schema). This approach would be difficult when the schemas are large or when there are a large number of schemas. Ref: Batini, sec. 5.2. Stephen Mc Kearney, 2002. 9

Strategies - Stages Database Schema 1 Database Schema 2 Database Schema 3 Partially Integrated Database Schema Integrated Database Schema 10 The second strategy for schema integration is to integrate some of the schemas (e.g. two) and then to integrate the resulting schemas. This approach would be more appropriate when the schemas are complex or when there are a large number of schemas. Ref: Batini, sec. 5.2. Stephen Mc Kearney, 2002. 10

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 11 Stephen Mc Kearney, 2002. 11

Process Schema A Conflict Analysis Schema A Schema B List of Conflicts Conflict Resolution Schema A Schema B Interschema Properties Schema Merging Integrated Schema Schema B 12 The schema integration process starts with two or more schemas and involves three main stages: 1. Conflict Analysis During conflict analysis differences in the schemas are identified, for example, similar concepts that are represented in different ways. 2. Conflict Resolution During conflict resolution the conflicts identified during conflict analysis are resolved. For example, a common method of representing equivalent concepts will be decided upon. This process may involve discussing the problems with the users or correcting errors in the schemas. 3. Schema Merging During schema merging the schemas are merged into a single schema using the decisions made during the conflict resolution. Ref: Batini, sec. 5.2. Stephen Mc Kearney, 2002. 12

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 13 Stephen Mc Kearney, 2002. 13

Conflicts - s Synonyms Objects that are the same but have different names. For example, passenger and customer. Homonyms Objects that are different but have the same names. For example, publication(=book) and publication(=journal). 14 There are two types of name conflict that occur in a database schema: Synonyms When two similar concepts occur with different names. For example, two public transport databases may have entities called passenger and customer. These entities may be the same entity. Homonyms When two different concepts occur with the same name. For example, two publishing databases may have entities called publication but in one database a publication may be a book while in the other database a publication may be a journal. conflicts cause a problem because information may be duplicated in the integrated database. It is important to identify those data items in each schema that actually represents the same concept or that should be represented using different structures in the integrated schema. Synonyms may be removed from the database by renaming the concepts so that they have the same name. Homonyms may be removed from the database by renaming the concepts so that they have different names. It may be possible to use a superclass/subclass relationship to avoid synonyms or homonyms. Ref: Batini, sec. 5.3.1. Stephen Mc Kearney, 2002. 14

Conflicts - Structural Identical concepts Merged Compatible concepts Representations are adapted and merged Incompatible concepts Different cardinalities Different identifiers Reverse subset relationships 15 Structural conflicts occur when the actual method of representing the same concept in different databases is different or incompatible. There are three cases: Identical Concepts When the same concept in different databases is represented in the same way they may be merged. For example, when an entity publication has the same structure and means the same in two databases the entities may be merged. Compatible Concepts When the same concept in different databases is represented in compatible ways they may be merged. For example, when an entity publication is represented by an attribute in one database and an entity in another database they may be merged by converting the attribute into an entity. Incompatible Concepts When the same concept in different databases is represented using different structures then it may be difficult to merge them directly. For example: - Relationships may have different cardinalities (i.e. one-tomany and many-to-many). - Primary keys may be different. - Set relationships may be reversed (e.g. projects contain programmes and programmes contain projects). Incompatible designs must be resolved by re-analysing the data and adapting one or more of the schemas or by constructing a new, common representation. Ref: Batini, sec. 5.3.2. Stephen Mc Kearney, 2002. 15

Overview Definition What is schema integration? Merging two or more database schemas (models). Problems Different methods of representing the same concepts, for example, naming concepts. Strategies Schemas may be merged (i) in one go or (ii) in stages. Process Three stages: (i) identify problems, (ii) resolve problems, (iii) merge schemas. Resolving Conflicts Changing the names and the structure of entities. Example See paper. 16 Stephen Mc Kearney, 2002. 16

Schema Integration Example adapted from A Comparative Analysis of Methodologies for Database Schema Integration Batini, Lenzerini, Navathe ACM Computing Surveys, Vol 18, No 4, Dec 1986 Stephen Mc Kearney, 2002. 17

Original Schemas Publisher Book University State Publication Publisher Address Surname Topics Keyword Research Area 18 Stephen Mc Kearney, 2002. 18

Step 1 Publisher Book University State Publication Publisher Address Surname Topics Topics Research Area Rename Keywords to Topics 19 Stephen Mc Kearney, 2002. 19

Step 2 Publisher Book University State Publisher Publication Address Surname Topics Topics Research Area Make the publisher attribute an entity 20 Stephen Mc Kearney, 2002. 20

Step 3 Publication Surname Address Publisher Topics Research Two Publisher entities merged together. Book Two Topic entities merged together. University State 21 Stephen Mc Kearney, 2002. 21

Step 4 Publication Surname Address Publisher Topics Research Book University State Book is a Publication 22 Stephen Mc Kearney, 2002. 22

Step 5 Publication Surname Address Publisher Topics Research Book University State Remove relationships inherited from Publication 23 Stephen Mc Kearney, 2002. 23