IV Distributed Databases - Motivation & Introduction -




Preceding course parts: I OODBS, II XML DB, III Inf Retr DModel

Outline
  - Motivation
  - Expected benefits
  - Technical issues
  - Types of distributed DBS
  - 12 rules of C. Date
  - Parallel vs. distributed DBS

References
  - M.T. Özsu, P. Valduriez: Principles of Distributed Database Systems, 2nd edition, Prentice-Hall, 1999
  - E. Rahm: Mehrrechner-Datenbanksysteme, Addison-Wesley, 1994
  - G. Weikum, G. Vossen: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2001, ISBN 1558605088
  - J. Gray, A. Reuter: Transaction Processing - Concepts and Techniques, Morgan Kaufmann Publishers, San Mateo, 1993
  - P.A. Bernstein, V. Hadzilacos, N. Goodman: Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987 (pdf)
  - P.A. Bernstein, E. Newcomer: Principles of Transaction Processing, Morgan Kaufmann, San Mateo, 1997

Material used from B. Kemme (McGill), H. Garcia-Molina (Stanford), A. Zaslavsky et al. (Monash), G. Alonso (ETH)

hs / FUB dbsii-03-10ddbintro-2

Motivation
  Application: data "naturally" distributed
    - Companies with different branches
    - Airlines
    - Financial business
    - University / faculties
    - Any organization with a decentralized organizational structure
  Technology: network infrastructure, processors, RAM
  Economy: hardware cost
  Software supporting distributed processing, e.g. RPC
  Huge number of interconnected systems
  Recent challenge: Web-based computing, e-commerce

Goals: improvement of non-functional characteristics
  Performance: the more computing power, the better
    - Primary goal for parallel DBS, not necessarily for distributed DBS
  Reliability: substitute faulty components (hardware, software and network) seamlessly
    - Fault tolerance: the ability to hide failures from users
    - Related to higher availability
    - Is 95.8 % availability too low? Definitely: it means 1 hour of downtime per day!
  Scalability: upscale / downscale your system incrementally
    - Central components and algorithms are counterproductive
    - Distributed algorithms are needed instead
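The availability figure on the slide (95.8 % corresponds to one hour of downtime per day) can be checked with a few lines of arithmetic. This is only a minimal sketch; the function names are illustrative and not part of the lecture material.

```python
HOURS_PER_DAY = 24


def availability(downtime_hours_per_day: float) -> float:
    """Fraction of time the system is up, given its daily downtime in hours."""
    return 1 - downtime_hours_per_day / HOURS_PER_DAY


def downtime_hours_per_year(avail: float, days: int = 365) -> float:
    """Total yearly downtime in hours for a given availability fraction."""
    return (1 - avail) * HOURS_PER_DAY * days


one_hour_a_day = availability(1.0)
print(f"availability: {one_hour_a_day:.1%}")
print(f"downtime per year: {downtime_hours_per_year(one_hour_a_day):.0f} h")
```

Seen this way, an availability that sounds high still adds up to roughly 365 hours of outage per year.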

The dark side of distribution
  - Systems are often less reliable
      "You will never make a system of unreliable components more reliable by adding more unreliable components"
      However: hot standby
      But: data copies must be kept consistent, the software is complex, the network is unreliable
  - Scalability
  - Distributed systems (DS) are inherently complex
      High development cost -> middleware efforts
      High administration cost
      Lack of flexibility

The dark side: performance
  - Doubling the resources does not guarantee double the performance
  - Network performance? Transfer time depends not only on bandwidth

  Transfer of a 4 KB page:

    Distance     Latency    Bandwidth    Transfer time
    100 m        0.5 µs     10 Mbps      5 ms
    100 m        0.5 µs     100 Mbps     0.5 ms
    1 km         5 µs       100 Mbps     0.5 ms
    100 km       0.5 ms     100 Mbps     1 ms
    1000 km      5 ms       100 Mbps     5.5 ms
    10000 km     50 ms      1 Gbps       50 ms

  - At distances > 100 km, signal propagation time dominates
  - Compare: mean disk access time is ~ 5 ms
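The shape of the transfer table can be reproduced with a simple model: transfer time = propagation delay + page size / bandwidth. The sketch below assumes a signal speed of 2e8 m/s (which matches the slide's 0.5 µs for 100 m) and a page of exactly 4096 bytes. It ignores protocol overhead, so its results are somewhat lower than the rounded figures on the slide for the short-distance rows. All names are illustrative.

```python
SIGNAL_SPEED_M_PER_S = 2e8      # assumption: about 2/3 of the speed of light
PAGE_BITS = 4 * 1024 * 8        # assumption: 4 KB page = 4096 bytes


def propagation_delay_s(distance_m: float) -> float:
    """Time for a signal to travel the given distance."""
    return distance_m / SIGNAL_SPEED_M_PER_S


def transmission_time_s(bandwidth_bps: float, bits: int = PAGE_BITS) -> float:
    """Time to push all bits of the page onto the link."""
    return bits / bandwidth_bps


def transfer_time_s(distance_m: float, bandwidth_bps: float) -> float:
    """Simplified one-way transfer time, without protocol overhead."""
    return propagation_delay_s(distance_m) + transmission_time_s(bandwidth_bps)


# (distance in metres, bandwidth in bit/s), same rows as the table
rows = [(100, 10e6), (100, 100e6), (1_000, 100e6),
        (100_000, 100e6), (1_000_000, 100e6), (10_000_000, 1e9)]

for distance_m, bandwidth_bps in rows:
    total = transfer_time_s(distance_m, bandwidth_bps)
    share = propagation_delay_s(distance_m) / total
    print(f"{distance_m / 1000:>8.1f} km  {bandwidth_bps / 1e6:>6.0f} Mbps  "
          f"{total * 1e3:7.3f} ms  propagation share {share:.0%}")
```

In this model the propagation share is negligible at 100 m and close to 100 % at 10000 km, which is the point the slide makes: beyond roughly 100 km, more bandwidth no longer helps much.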

What is a Distributed Database?
  A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
  A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
  Distributed database system (DDBS) = DDB + D-DBMS
  (Definition by P. Valduriez, T. Özsu)

Example (1)
  Transparency of distribution: one logical DB

    UPDATE empl SET sal = sal*1.1
    WHERE id IN (SELECT ass.eid FROM ass, proj
                 WHERE proj.id = ass.pid AND proj.dur > 12)

  Sites, connected by the network:
    - Berlin: all projects, Berlin employees, all assignments
    - New York: NY employees
    - Munich: Muc projects, Muc employees, Muc assignments
  (Example by B. Kemme)
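With full distribution transparency, the user writes the update of Example (1) as if there were a single database. The slide's UPDATE is shorthand: it refers to proj and ass without naming them in a FROM clause. One way to express it in standard SQL is with a subquery. The sketch below runs that form against a single in-memory SQLite database. The sample rows and the name column are assumptions for illustration, and a real D-DBMS would have to route the statement to the sites holding the fragments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE empl (id INTEGER PRIMARY KEY, name TEXT, sal REAL);
    CREATE TABLE proj (id INTEGER PRIMARY KEY, dur INTEGER);
    CREATE TABLE ass  (eid INTEGER, pid INTEGER);   -- employee/project assignments
    INSERT INTO empl VALUES (1, 'Ann', 1000.0), (2, 'Bob', 2000.0);
    INSERT INTO proj VALUES (10, 24), (20, 6);
    INSERT INTO ass  VALUES (1, 10), (2, 20);
""")

# Raise the salary of everyone assigned to a project with dur > 12.
conn.execute("""
    UPDATE empl SET sal = sal * 1.1
    WHERE id IN (SELECT ass.eid
                 FROM ass JOIN proj ON proj.id = ass.pid
                 WHERE proj.dur > 12)
""")

salaries = dict(conn.execute("SELECT name, sal FROM empl"))
print(salaries)
```

Only the employee assigned to the long-running project gets the raise; the other row is unchanged.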

Example (2)
  Cooperation: autonomous DBs cooperating on particular tasks

    SELECT flights
    WHERE departure = Montreal AND arrival = Munich
      AND date = 12/9/2002 AND price < 800$

  Sites, connected by the network: lufthansa.com, Travel-overland.com, air-canada.com

Example (3)
  Autonomous, heterogeneous systems, logically identical data types

    UPDATE empl SET sal = sal*0.9
    WHERE jobtitle = "product manager"

  Sites, connected by the network:
    - Daimler / Stuttgart: only Stuttgart data, IBM DB2
    - Daimler / Bremen: only Bremen data, MySQL
    - Chrysler / Detroit: only Detroit data, Oracle 9i

Example (4)
  Sophisticated client / server computing
  [Figure: four clients, Application Server A, Application Server B; possible R/W conflict]

Classification criteria
  Distribution
    - Physically independent systems
    - Peer-to-peer: data distribution and sharing
    - Client / server: function distribution, e.g. parsing in the client
  Heterogeneity
    - DBMS software
    - Database schema (types) and languages (SQL variants)
  Autonomy
    - No global control
    - Local DBS operations may not be influenced by global operations (e.g. of a global transaction)
    - Note: subsumes completely independent or semi-autonomous systems, see scenarios

Classification cube (by P. Valduriez, T. Özsu)
  - Distributed DB: looks like one DB
  - Federated DB: more autonomy, but not independent (Example 3)
  - Multi DB: independent, cooperative (Example 2)

Scenarios and common problems
  - There is not just one kind of distributed database system, but indefinitely many
  - Understand the common problems, e.g. how to guarantee a single state for replicated data from the user's point of view
  - Solve them by developing distributed algorithms, e.g. transaction commit
  - Main issue: are there any unsolvable problems?
  Partial failure
    - Example: Internet marriage (bride, priest, groom)
    - All participants and the communication are unreliable
    - Distributed transaction: YES or NO, that is the question
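The "Internet marriage" can be read as a voting problem: the outcome is YES only if every participant says yes, and a participant that cannot be reached has to be treated as a NO. The sketch below illustrates this with a two-phase-commit-style vote. The slide only mentions "transaction commit", so this particular protocol is an assumption, and the class and function names are hypothetical. Coordinator failures, logging and timeouts are deliberately left out.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """A site taking part in the distributed decision (hypothetical model)."""

    def __init__(self, name, will_say_yes=True, reachable=True):
        self.name = name
        self.will_say_yes = will_say_yes
        self.reachable = reachable
        self.outcome = None          # set once the decision arrives

    def prepare(self) -> Vote:
        if not self.reachable:
            raise ConnectionError(f"{self.name} did not answer")
        return Vote.YES if self.will_say_yes else Vote.NO

    def finish(self, decision: str) -> None:
        self.outcome = decision


def decide(participants) -> str:
    """Phase 1: collect votes. Phase 2: announce commit or abort."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except ConnectionError:
            votes.append(Vote.NO)    # silence counts as NO
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
    for p in participants:
        if p.reachable:              # unreachable sites learn the outcome later
            p.finish(decision)
    return decision


wedding = [Participant("bride"), Participant("priest"), Participant("groom")]
print(decide(wedding))

groom_offline = [Participant("bride"), Participant("priest"),
                 Participant("groom", reachable=False)]
print(decide(groom_offline), groom_offline[2].outcome)
```

In the second run the groom never learns the outcome. Handling such partial failures correctly is what makes distributed commit hard.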

12 + 1 rules for DDBS (C. Date)
  Rule 0: A DDB looks like a central DB to users
  Rule 1: Sites should be as independent as possible (local autonomy)
  Rule 2: There should be no central master that all sites depend on (no reliance on a central site)
  Rule 3: There is never a need for a complete shutdown (continuous operation)
  Rule 4: Users should not need to know where data are stored (location transparency / independence)
  Rule 5: If data are split (e.g. columns of one relation) and distributed over several sites, users should not be aware of it (fragmentation transparency)
  Rule 6: Users should not be aware of replicated data (replication independence)
  Rule 7: Efficient distributed query processing
  Rule 8: Global concurrency control and recovery (distributed transaction management)
  Rule 9: Hardware independence
  Rule 10: OS independence
  Rule 11: Network independence
  Rule 12: DBMS independence

Parallel versus Distributed Databases
  More similarities than differences; similar to the parallel / distributed processing distinction
  Parallel DBS
    - Not geographically distributed
    - Goal: high performance
    - Homogeneous software
    - Fast interconnect
  Distributed DBS
    - Data geographically distributed
    - Goal: data sharing
    - Disconnected operation possible -> autonomy
    - Transparency

Parallel / distributed DBS
  Query processing in parallel DBS
    - Distribute operators (sort, filter, ...) and data over processors to make complex processing fast
    - E.g. join on a shared-disk multiprocessor system with processors P_1 ... P_n and memories M_1 ... M_n

    Join (R, S) {   // R >> S
      1. Split R into n-1 partitions R_i and assign them to M_i / P_i;
         assign S to processor / memory P_n / M_n;
      2. Sort R_i and S;   // n sorts in parallel
      3. Join (n-1) + 1 streams
    }
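The three steps of the join above can be sketched in Python as a parallel sort followed by a merge join. This is only an illustration under several assumptions: tuples are joined on their first field, R is split round-robin, and a thread pool stands in for the n processors (in CPython, threads show the structure but give no real speed-up for sorting). The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge
from operator import itemgetter

join_key = itemgetter(0)   # assumption: join on the first field of each tuple


def merge_join(r_sorted, s_sorted):
    """Join two lists that are already sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        kr, ks = join_key(r_sorted[i]), join_key(s_sorted[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of equal keys on both sides and pair them up.
            i_end, j_end = i, j
            while i_end < len(r_sorted) and join_key(r_sorted[i_end]) == kr:
                i_end += 1
            while j_end < len(s_sorted) and join_key(s_sorted[j_end]) == kr:
                j_end += 1
            out.extend(a + b[1:] for a in r_sorted[i:i_end]
                       for b in s_sorted[j:j_end])
            i, j = i_end, j_end
    return out


def parallel_sort_merge_join(r, s, n=4):
    """n workers: n-1 sort partitions of R, one sorts S (requires n >= 2)."""
    # Step 1: split R into n-1 partitions; S is the n-th unit of work.
    partitions = [r[i::n - 1] for i in range(n - 1)]
    # Step 2: sort all n units in parallel.
    with ThreadPoolExecutor(max_workers=n) as pool:
        units = list(pool.map(lambda rows: sorted(rows, key=join_key),
                              partitions + [s]))
    *sorted_partitions, sorted_s = units
    # Step 3: merge the n-1 sorted R streams and join them with the S stream.
    r_stream = list(merge(*sorted_partitions, key=join_key))
    return merge_join(r_stream, sorted_s)


R = [(3, "r3"), (1, "r1"), (2, "r2"), (2, "r2b")]
S = [(2, "s2"), (3, "s3"), (4, "s4")]
print(parallel_sort_merge_join(R, S, n=3))
```

Only the larger relation R is partitioned, which mirrors the slide's assumption R >> S: sorting R dominates the cost, so that is the work spread over n-1 processors.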

Parallel / distributed DBS
  Distributed query processing (QP)
    - Given a data distribution
    - Find a strategy to evaluate the query with minimal cost, in particular communication cost
  Example: compute R ⋈ S ⋈ T with minimal cost (time), where
    - R = 10000 records
    - S = 100000 records
    - T = 1000 records
    - the sites are connected over distances of 100 km and 10000 km

Important terms
  - Motivation: technology, application, economy
  - Expected benefits: scalability, reliability, performance
  - Data / function distribution
  - Fault tolerance in case of partial failures
  - Autonomy, multi database, federated DB
  - Distribution transparency
  - Parallel versus distributed DBS