Chapter 7: Distributed Database Management Systems




Distributed Database Management System

When an organization is geographically dispersed, it may choose to store its databases on a central database server, to distribute them to local servers, or to use a combination of both. A distributed database is a single logical database that is spread physically across computers in multiple locations connected by a data communications network. We emphasize that a distributed database is truly a database, not a loose collection of files. It is still centrally administered as a corporate resource while providing local flexibility and customization.

A DDBMS is a centralized application that manages a distributed database. It synchronizes data periodically and ensures that any change users make to the data is reflected throughout the database.

Distributed DBMS

To have a distributed database, there must be a database management system that coordinates access to data at the various nodes. We will call such a system a distributed DBMS. Although each site may have a DBMS managing the local database at that site, a distributed DBMS performs the following functions:

1. Keep track of where data are located in a distributed data dictionary. This means, in part, presenting one logical database and schema to developers and users.
2. Determine the location from which to retrieve requested data and the location at which to process each part of a distributed query, without any special actions by the developer or user.
3. If necessary, translate a request made at one node using a local DBMS into the proper request to another node that uses a different DBMS and data model, and return data to the requesting node in the format that node accepts.
4. Provide data management functions such as security, concurrency and deadlock control, global query optimization, and automatic failure recording and recovery.
5. Provide consistency among copies of data across the remote sites (e.g., by using multiphase commit protocols).

6. Be scalable. Scalability is the ability to grow, reduce in size, and become more heterogeneous as the needs of the business change. Thus, a distributed database must be dynamic and able to change within reasonable limits without having to be redesigned. Scalability also means that there are easy ways for new sites to be added (or to subscribe) and to be initialized (e.g., with replicated data).

Homogeneous distributed database

In a homogeneous distributed database system, all sites have identical database-management system software, are aware of one another, and agree to cooperate in processing users' requests.

Heterogeneous distributed database

In a heterogeneous distributed database, different sites may use different schemas and different database-management system software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.

Distributed Data Storage

Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database:

Replication. The system maintains several identical replicas (copies) of the relation and stores each replica at a different site. The alternative to replication is to store only one copy of relation r.
Fragmentation. The system partitions the relation into several fragments and stores each fragment at a different site.

Fragmentation and replication can be combined: a relation can be partitioned into several fragments, and there may be several replicas of each fragment. In the following subsections, we elaborate on each of these techniques.
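Combining fragmentation and replication amounts to an allocation schema that records, for each fragment, the sites holding a copy; this is the kind of location information a distributed data dictionary keeps. The following is a minimal sketch, assuming a relation r split into two hypothetical fragments r1 and r2 stored across three hypothetical sites. The names ALLOCATION, replicas_of, is_fully_replicated, and readable_fragments are illustrative, not part of any real DBMS.

```python
from __future__ import annotations

# Hypothetical allocation schema for relation r: each fragment maps to the set of
# sites holding a replica of it. Fragment and site names are illustrative only.
ALLOCATION: dict[str, set[str]] = {
    "r1": {"site1", "site2"},  # fragment r1 is replicated at two sites
    "r2": {"site3"},           # fragment r2 has a single copy (no replication)
}
ALL_SITES = {"site1", "site2", "site3"}


def replicas_of(fragment: str) -> set[str]:
    """Sites that store a copy of the given fragment."""
    return ALLOCATION[fragment]


def is_fully_replicated() -> bool:
    """Full replication: every fragment has a copy at every site."""
    return all(sites == ALL_SITES for sites in ALLOCATION.values())


def readable_fragments(failed_sites: set[str]) -> set[str]:
    """Fragments that can still be read when the given sites are down."""
    return {frag for frag, sites in ALLOCATION.items() if sites - failed_sites}
```

With this schema, readable_fragments({"site1"}) still returns both fragments because r1 survives at site2, whereas readable_fragments({"site3"}) returns only {"r1"}: r2 has no replica, so losing its one site makes it unavailable.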

Data Replication

If relation r is replicated, a copy of relation r is stored in two or more sites. In the most extreme case, we have full replication, in which a copy is stored in every site in the system. Replication has a number of advantages and disadvantages:

Availability. If one of the sites containing relation r fails, relation r can be found in another site. Thus, the system can continue to process queries involving r despite the failure of one site.
Increased parallelism. When most accesses to relation r only read the relation, several sites can process queries involving r in parallel. The more replicas of r there are, the greater the chance that the needed data will be found at the site where the transaction is executing. Hence, data replication minimizes movement of data between sites.
Increased overhead on update. The system must ensure that all replicas of relation r are consistent; otherwise, erroneous computations may result. Thus, whenever r is updated, the update must be propagated to all sites containing replicas, which increases overhead. For example, in a banking system where account information is replicated at various sites, it is necessary to ensure that the balance of a particular account agrees at all sites.

Data Fragmentation

If relation r is fragmented, r is divided into a number of fragments r1, r2, ..., rn. These fragments contain sufficient information to allow reconstruction of the original relation r. There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation. Horizontal fragmentation splits the relation by assigning each tuple of r to one or more fragments. Vertical fragmentation splits the relation by decomposing the schema R of relation r.
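The update overhead of replication described above, together with the multiphase commit protocols listed among the distributed DBMS functions, can be illustrated with a two-phase commit coordinator: an update reaches every replica or none of them. This is a minimal sketch using in-memory stand-ins for sites. ReplicaSite, two_phase_update, and the account data are hypothetical, and the sketch omits the logging, timeouts, and recovery that a real commit protocol requires.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaSite:
    """In-memory stand-in for one site holding a replica of relation r."""
    name: str
    data: dict[int, int] = field(default_factory=dict)  # account_id -> balance
    up: bool = True
    pending: dict[int, int] | None = None

    def prepare(self, updates: dict[int, int]) -> bool:
        """Phase 1: stage the update and vote yes, or vote no if the site is down."""
        if not self.up:
            return False
        self.pending = dict(updates)
        return True

    def commit(self) -> None:
        """Phase 2 (commit): make the staged update visible."""
        if self.pending is not None:
            self.data.update(self.pending)
        self.pending = None

    def abort(self) -> None:
        """Phase 2 (abort): discard any staged update."""
        self.pending = None


def two_phase_update(replicas: list[ReplicaSite], updates: dict[int, int]) -> bool:
    """Coordinator: apply `updates` at every replica or at none of them."""
    votes = [site.prepare(updates) for site in replicas]  # phase 1: collect votes
    if all(votes):
        for site in replicas:  # phase 2: unanimous yes, so commit everywhere
            site.commit()
        return True
    for site in replicas:  # phase 2: at least one no, so abort everywhere
        site.abort()
    return False
```

In the banking example, two sites holding {101: 500} both move to 450 after two_phase_update(sites, {101: 450}). If one site is then marked down, a further update returns False and neither balance changes, so the replicas never disagree.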
Horizontal Fragmentation

In horizontal fragmentation, a relation r is partitioned into a number of subsets r1, r2, ..., rn. Each tuple of relation r must belong to at least one of the fragments, so that the original relation can be reconstructed if needed.
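Horizontal fragmentation can be sketched as one selection predicate per fragment, with reconstruction as the union of the fragments. This minimal sketch uses the rows of the Customer example table used in this chapter. The function names are hypothetical, and so are the predicates in the usage note: the source shows the split but does not state the rule behind it.

```python
from __future__ import annotations

from typing import Callable, Tuple

# Rows of the Customer example: (Customer_Id, Name, Area, Payment Type, Sex).
Row = Tuple[int, str, str, str, str]

CUSTOMER: list[Row] = [
    (1, "BOB", "London", "Credit Card", "Male"),
    (2, "Mike", "Manchester", "Cash", "Male"),
    (3, "Ruby", "London", "Cash", "Female"),
]


def horizontal_fragments(r: list[Row], predicates: list[Callable[[Row], bool]]) -> list[list[Row]]:
    """Fragment i holds the tuples of r that satisfy predicate i."""
    return [[t for t in r if p(t)] for p in predicates]


def reconstruct(fragments: list[list[Row]]) -> list[Row]:
    """Rebuild r as the union of the fragments (a tuple in several fragments appears once)."""
    seen: set[Row] = set()
    result: list[Row] = []
    for fragment in fragments:
        for t in fragment:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


def is_complete(r: list[Row], fragments: list[list[Row]]) -> bool:
    """True if every tuple of r belongs to at least one fragment."""
    return set(r) <= set(reconstruct(fragments))
```

For example, the predicates Customer_Id <= 2 and Customer_Id > 2 reproduce the two horizontal fragments of the Customer example, and their union gives back the original table. A single predicate such as Area == "London" alone would not be complete, because Mike's row would belong to no fragment.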

Vertical Fragmentation

Vertical fragmentation divides a relation into subsets of attributes (columns). Each subset (fragment) is stored at a different node, and each fragment has its own columns, except for the key column, which is common to all fragments.

Example: consider the following Customer table.

Customer_Id  Name  Area        Payment Type  Sex
1            BOB   London      Credit Card   Male
2            Mike  Manchester  Cash          Male
3            Ruby  London      Cash          Female

Horizontal fragments are subsets of tuples (rows):

Fragment 1
Customer_Id  Name  Area        Payment Type  Sex
1            BOB   London      Credit Card   Male
2            Mike  Manchester  Cash          Male

Fragment 2
Customer_Id  Name  Area    Payment Type  Sex
3            Ruby  London  Cash          Female
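Vertical fragmentation, as defined at the start of this subsection, keeps the key column in every fragment so that the relation can be rebuilt by joining the fragments on that key. The following is a minimal sketch on the same Customer table, assuming rows are held as Python dictionaries. The names KEY, vertical_fragment, and reconstruct are hypothetical.

```python
from __future__ import annotations

KEY = "Customer_Id"  # key column repeated in every vertical fragment

CUSTOMER = [
    {"Customer_Id": 1, "Name": "BOB", "Area": "London", "Payment Type": "Credit Card", "Sex": "Male"},
    {"Customer_Id": 2, "Name": "Mike", "Area": "Manchester", "Payment Type": "Cash", "Sex": "Male"},
    {"Customer_Id": 3, "Name": "Ruby", "Area": "London", "Payment Type": "Cash", "Sex": "Female"},
]


def vertical_fragment(r: list[dict], attributes: list[str]) -> list[dict]:
    """Project r onto the key column plus the given attributes."""
    columns = [KEY] + [a for a in attributes if a != KEY]
    return [{c: row[c] for c in columns} for row in r]


def reconstruct(*fragments: list[dict]) -> list[dict]:
    """Join the fragments on the key column.

    Assumes every key value appears in every fragment, which holds when all
    fragments are projections of the same relation.
    """
    joined: dict[object, dict] = {}
    for fragment in fragments:
        for row in fragment:
            joined.setdefault(row[KEY], {}).update(row)
    return list(joined.values())
```

Projecting onto Name, Area, and Sex, and separately onto Payment Type, yields two vertical fragments that both carry Customer_Id; joining them on that key restores every original row.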

Vertical fragments are subsets of attributes (columns):

Fragment 1
Customer_Id  Name  Area        Sex
1            BOB   London      Male
2            Mike  Manchester  Male
3            Ruby  London      Female

Fragment 2
Customer_Id  Payment Type
1            Credit Card
2            Cash
3            Cash

Components of a Distributed Database System

A distributed database system has three main components: hardware, communications media, and software.

Hardware. Each processing site (or node) that forms the database system can consist of various types of hardware. Nodes can be mainframes, minicomputers, or microcomputers. Homogeneous nodes use the same type of hardware, whereas heterogeneous nodes combine a mixture of hardware.

Communications media. Network hardware and software allow each node to communicate and exchange data with the other nodes that make up the network. Local area networks typically use cables to transmit data from node to node, whereas telephone lines or satellites are used for more widely dispersed sites.

Software. A distributed database management system is a collection of data processors and transaction processors.

Data processors (DPs) are programs that store and retrieve data at local sites. A DP could be an independent database management system such as Access or Oracle, or it could be a subset of the distributed database management system.

Transaction processors (TPs) are programs that control and co-ordinate query and transaction data requests from local or remote sites. The TP analyses each data request to determine the update or retrieval locations it requires. TPs do this by accessing the Distributed Data Catalog (DDC), which contains a description of the entire database. Once the specific data locations are determined, the TP transfers the data requests to the appropriate data processors. A TP could already exist as part of the distributed database management system, or it could be specifically written. TP features can also be manually incorporated into queries and transactions, and you will see examples of this when we explore distributed database transparency features in the next section.

Distributed Database Design

Designing a distributed computing system involves making decisions on the placement of data and programs across the nodes of a computer network, as well as on the design of the network itself. In the case of distributed databases, assuming that the network has already been designed and that there is a copy of the DBMS software on each node where data are stored, it remains to focus our attention on the distribution of data. In general, there are several design alternatives:

Top-down approach: first the general concepts and the global framework are defined, and then the details.
Bottom-up (down-top) approach: first the detailed modules are defined, and then the global framework.

If the system is built from scratch, the top-down method is generally preferred. If the system must fit existing systems, or some modules are already available, the bottom-up method is usually used.
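The request routing performed by transaction processors, described above, can be sketched as a TP that consults a catalog standing in for the DDC and forwards each request to the relevant DPs. The following is a minimal sketch. The DataProcessor and TransactionProcessor classes, their method names, and the catalog layout are hypothetical. The write path is not atomic; a real DDBMS would wrap it in a commit protocol.

```python
from __future__ import annotations


class DataProcessor:
    """Hypothetical DP: stores and retrieves rows of the fragments held at one site."""

    def __init__(self, site: str, fragments: dict[str, list[tuple]]):
        self.site = site
        self.fragments = fragments  # fragment name -> rows stored locally

    def retrieve(self, fragment: str) -> list[tuple]:
        return list(self.fragments[fragment])

    def insert(self, fragment: str, row: tuple) -> None:
        self.fragments[fragment].append(row)


class TransactionProcessor:
    """Hypothetical TP: looks up data locations in the catalog and forwards requests to DPs."""

    def __init__(self, catalog: dict[str, dict[str, list[str]]], dps: dict[str, DataProcessor]):
        self.catalog = catalog  # relation -> fragment -> sites holding a copy (DDC stand-in)
        self.dps = dps          # site name -> DataProcessor at that site

    def read(self, relation: str, local_site: str) -> list[tuple]:
        """Retrieve a whole relation, preferring a local copy of each fragment."""
        rows: list[tuple] = []
        for fragment, sites in self.catalog[relation].items():
            site = local_site if local_site in sites else sites[0]
            rows.extend(self.dps[site].retrieve(fragment))
        return rows

    def write(self, relation: str, fragment: str, row: tuple) -> None:
        """Forward an insert to every site holding a copy of the fragment (not atomic)."""
        for site in self.catalog[relation][fragment]:
            self.dps[site].insert(fragment, row)
```

A read issued at a site that holds a replica of a fragment is served locally, which reflects the reduced data movement that replication offers. An insert is forwarded to every copy of the fragment, which is the update overhead that replication costs.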