DISTRIBUTED AND PARALLEL DATABASE




DISTRIBUTED AND PARALLEL DATABASE SYSTEMS
Tore Risch
Uppsala Database Laboratory
Department of Information Technology
Uppsala University, Sweden
http://user.it.uu.se/~torer

Background: What is a Distributed System?

A distributed system is a number of autonomous computers communicating over a network, with software for integrated tasks.

Example of a distributed system: Sun's Network File System (NFS), a distributed file system.

[Figure: WS, PC, FS and PS nodes connected by a Local Area Network]

Client-Server Architecture

[Figure: four clients connected to one server]

- Each client sends requests (RPCs) to the server
- The server waits for requests from clients
- Fat server: databases, file servers, heavy computations
- Thin client: graphics, user interactions
- Fat client: CAD systems, numerical computations
- RPC, CORBA, SOAP

Database Communication

Very fat server:
- Many clients
- Much data

Stream-based client-server interfaces:
- DBMS-specific interfaces
- Compiler-integrated interfaces (embedded SQL)
- ODBC: SQL-based standardized subroutine call library (Microsoft)
- JDBC: ODBC for Java (not Microsoft)

Distributed Databases

[Figure: four clients (CL) accessing four databases (DB)]

- The database is seen transparently from the application: ONE database, usual SQL interface.
- Manual partitioning or fragmentation and replication of data tables: distributed database design.
- The DDBMS automatically optimizes queries and updates to the distributed database.

What is a Distributed Database?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A DDBMS is the software system that permits the management of DDBs and makes the distribution transparent to the user.

A DDB is NOT:
- A collection of files (it needs structure and a DB manager)
- A client-server interface to a database
  - Data on one node, clients on other nodes in the network
  - (Almost) every centralized DBMS has a client-server interface

Example of DDB

Multi-national company with distributed departments. Distributed data management:
- Each location keeps local records of local employees.
- R & D keeps track of what is going on at its facility.
- Manufacturing plants keep data related to their engineering operations and have access to R & D.
- Manufacturing keeps track of local inventory. Access to warehouses is also possible.
- Warehouses keep local inventory. Manufacturing can access inventory levels.
- Headquarters keep marketing and sales records per region. They share with other headquarters and can access inventory data at plants and warehouses.

Advantages of DDBs

- Local autonomy for DDB nodes
  - Local control
  - Local policies
- Improved performance
  - Avoid data shipping
- Improved reliability
  - Crashes less severe (if the application does not depend on non-local data)
- Expandability
  - Easy to add new nodes (not always linear scale-up because of the central directory)
- Sharability
  - Uniform interface and sharing through the DDBMS

Replication and fragmentation

Data replication: the same data on several nodes
- For reliability and read performance
- Not necessary to replicate all tables

Fully replicated vs partially replicated: full replication is often not realistic!
- Updates must be propagated to each replica!
- Special procedures after failures to restore consistency
- More problematic transaction synchronization!
- Asynchronous propagation often OK

Data fragmentation (= data partitioning): tables transparently split over several nodes
- For access performance
- Good when nodes are far apart
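The difference between synchronous and asynchronous replica propagation can be sketched in a few lines of Python. This is a toy model, not how any particular DDBMS does it: the class `ReplicatedTable`, its methods, and the node names used below are all hypothetical, and each "node" is just a dictionary in one process.

```python
from collections import deque


class ReplicatedTable:
    """Toy model of a fully replicated table: one dict per node (assumption)."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}
        # Updates accepted at one node but not yet applied at the others.
        self.pending = deque()

    def update(self, origin, key, value, synchronous=True):
        """Apply an update at `origin` and propagate it to the other replicas."""
        self.replicas[origin][key] = value
        others = [name for name in self.replicas if name != origin]
        if synchronous:
            # Every replica is updated before the call returns.
            for name in others:
                self.replicas[name][key] = value
        else:
            # Replicas are updated later, when propagate() runs.
            self.pending.extend((name, key, value) for name in others)

    def propagate(self):
        """Deliver all queued asynchronous updates."""
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name][key] = value

    def read(self, node, key):
        """Read from the local replica only."""
        return self.replicas[node].get(key)
```

With `synchronous=False`, a read at another node may return stale data until `propagate()` has run. That is the price paid for cheaper updates, and the reason asynchronous propagation is only "often OK".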

Problems with DDBs

- Complexity: database administration may be complex (e.g. design, recovery)
- Distribution of administrative control
- Security: networking is a known problem
- Distributed schema management: the schema is accessed whenever an SQL query is issued!
  - Global directory => central database becomes a hot spot
  - Local directories => data replication
  => Since the schema is not updated often but needs to be accessed very often, it is normally fully replicated by the DDBMS.
- Synchronous distributed concurrency control decreases update performance
- Reliability of the DDBMS
  - Maintain consistency of replicas
  - Bring up the (fragmented) database at failed sites

Transparency

Replication transparency
- User unaware of data replicas
- Automatic replica propagation at update
- Asynchronous replica propagation may suffice
- Special problems when nodes are down

Fragmentation transparency
- E.g. a logical relation is horizontally fragmented into local physical tables
- Translation from global queries to fragmented queries

Network transparency
- Protect the user from operational details of the network
- Hide the existence of the network
- No machine names in database table references
- Location transparency
- Naming transparency
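Fragmentation transparency can be illustrated with a small sketch in which two in-memory SQLite databases stand in for two DDB nodes. The table name `employee_frag`, the site names, the sample rows, and the function `select_employees` are assumptions made for the illustration. The caller queries one logical EMPLOYEE relation and never names a node-local table.

```python
import sqlite3


def make_node(rows):
    """Create one 'node': an in-memory database holding one horizontal fragment."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employee_frag (id INTEGER, name TEXT, site TEXT)")
    db.executemany("INSERT INTO employee_frag VALUES (?, ?, ?)", rows)
    return db


# The logical EMPLOYEE relation is fragmented horizontally by site.
NODES = {
    "uppsala": make_node([(1, "Anna", "uppsala"), (2, "Bo", "uppsala")]),
    "lund": make_node([(3, "Cia", "lund")]),
}


def select_employees(min_id=0, site=None):
    """Translate a global query into one query per relevant fragment.

    If the query restricts the site, only that fragment is contacted;
    otherwise the fragment results are unioned.
    """
    targets = [site] if site is not None else sorted(NODES)
    rows = []
    for name in targets:
        cursor = NODES[name].execute(
            "SELECT id, name, site FROM employee_frag WHERE id >= ?", (min_id,)
        )
        rows.extend(cursor.fetchall())
    return sorted(rows)
```

For example, `select_employees(min_id=2)` collects rows from both fragments, while `select_employees(site="lund")` touches only one node. A real DDBMS derives this routing from the fragmentation predicates stored in its schema rather than from a function argument.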

Distributed database design

- Where to put data and applications?
- Partitioned data: split data into distributed partitions
- Replicated data: several sites have data copies
- Goal: minimize the combined cost of storing data, communication, and transactions.
- NP-complete optimization problem.

Distributed Query Processing

- Automatically done by the distributed query processor of the DDBMS
- Analyze query --> distributed execution plan
- Factors:
  - Data replication
  - Data fragmentation
  - Communication costs
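To make the design goal concrete, the following sketch enumerates every placement of fragments on sites and picks the cheapest one. The cost model is a deliberate simplification assumed for this example: a flat storage cost per site plus a fixed cost per remote access, with no replication and no transaction cost. The function name and parameters are hypothetical.

```python
from itertools import product


def best_allocation(fragments, sites, storage_cost, access_freq, comm_cost):
    """Exhaustively place each fragment on exactly one site.

    storage_cost[site]             cost of storing one fragment at that site
    access_freq[(site, fragment)]  how often the site accesses the fragment
    comm_cost                      cost of a single remote access
    Returns (total_cost, {fragment: site}) for the cheapest placement.
    """
    best = None
    for placement in product(sites, repeat=len(fragments)):
        alloc = dict(zip(fragments, placement))
        cost = sum(storage_cost[alloc[f]] for f in fragments)
        # Every access from a site that does not hold the fragment is remote.
        cost += sum(
            freq * comm_cost
            for (site, frag), freq in access_freq.items()
            if alloc[frag] != site
        )
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best
```

The loop visits `len(sites) ** len(fragments)` placements, so exhaustive search is only feasible for tiny inputs. That growth is one way to see why practical design tools rely on heuristics.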

Distributed Database Design (Fragmentation)

Correctness of fragmentation:

1. Completeness
   $R$ has fragments $R_1, R_2, \ldots, R_n$. It should be possible to find every tuple of the relation in some fragment(s). Lossless decomposition of the fragmented relation.
2. Reconstructability
   It should be possible to reconstruct the original relation with some relational operator $\nabla$: $R = \nabla R_i, \ \forall R_i \in F_R$, where $F_R = \{R_1, R_2, \ldots, R_n\}$ is the set of fragments.
3. Distinctness
   Each data item (tuple or column) $d$ of $R$ should occur in exactly one fragment: $d \in R_i \Rightarrow d \notin R_j, \ i \neq j$
   - Horizontal: $d$ are rows
   - Vertical: $d$ are columns
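For horizontal fragmentation the three correctness rules can be checked mechanically when the reconstruction operator is set union. The sketch below assumes that a relation and its fragments are Python sets of tuples; the function name is hypothetical. For vertical fragmentation the operator would be a join instead, which is not covered here.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check completeness, reconstructability and distinctness.

    relation:  set of tuples (the original relation R)
    fragments: list of sets of tuples (R_1 .. R_n)
    """
    union = set().union(*fragments)
    return {
        # Every tuple of R is found in some fragment.
        "complete": relation <= union,
        # The union of the fragments rebuilds exactly R.
        "reconstructable": union == relation,
        # No tuple occurs in more than one fragment.
        "distinct": sum(len(f) for f in fragments) == len(union),
    }
```

For example, splitting `{(1, "uppsala"), (2, "lund")}` into one fragment per city satisfies all three rules, while putting the tuple `(1, "uppsala")` into both fragments violates distinctness only.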

Parallel databases

[Figure: five query managers, a coordinator, and five data servers]

Parallel Databases: Parallel data servers

- Use parallel processing in a cluster of computer nodes for data servers
- Automatic fragmentation and replication of data over the different nodes by the Parallel DBMS (PDBMS)
- To achieve maximal throughput and availability
- Utilization of modern hardware
  - Multiple independent hardware components interconnected through a fast communication medium (e.g. a bus)
  - Modern multi-processor architectures naturally support interquery, intraquery, and intraoperation parallelism
  - One disk connected per processor unit
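Intraoperation parallelism can be sketched as follows: rows are hash-partitioned over a number of data servers, each server computes a partial aggregate on its own fragment, and a coordinator combines the partial results. The function names are hypothetical, and the "servers" are just threads in one process. In CPython, threads do not give real CPU parallelism for this kind of work, so the sketch shows the structure of the computation rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes, key):
    """Hash-partition rows (dicts) over n_nodes data servers."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes


def local_sum(fragment, column):
    """Work done by one data server on its own fragment."""
    return sum(row[column] for row in fragment)


def parallel_sum(nodes, column):
    """Coordinator: run the same operator on every fragment, then combine."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda fragment: local_sum(fragment, column), nodes)
        return sum(partials)
```

A sum decomposes cleanly because the partial sums can simply be added. Other operators, such as joins, need the data to be partitioned on the join attribute before the same pattern applies.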

Distributed Algorithms, SDDS

Distributed data storage algorithms should be designed so that no hot spot nodes are created and so that they scale well.

Problem:
- How to store large storage structures on distributed nodes without any central hot spot.
- How to design a storage structure that gracefully grows and shrinks over nodes in the network as the database evolves.

Solution: SDDS, Scalable Distributed Data Structures
- Most well-known SDDS: LH* by Witold Litwin (Chord is very similar)
- LH* is a scalable and distributed hash table.
- Can be extended with high availability: LH*g, LH*m

LH*

[Figure: clients sending requests to hash buckets; buckets forward misaddressed requests to each other and return IAMs to the clients]

- No directory (no hot spot)
- Each client has an approximate image of the hash buckets
- Addressing error => max 2 forwardings + Image Adjustment Messages (IAMs)
- Table growth => bucket splits => dynamic extension of the number of hash buckets
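The addressing scheme can be sketched as a small single-process simulation. It follows the client address computation, server-side forwarding rule, and image adjustment as usually presented for LH*, under these assumptions: keys are non-negative integers, the file starts with one bucket, the hash functions are h_j(C) = C mod 2^j, and the file state is given by a level i and a split pointer n. The class and function names are hypothetical, and no data is actually stored or split.

```python
def h(level, key):
    """Linear-hashing function h_level for a file that starts with one bucket."""
    return key % (2 ** level)


class LHStarFile:
    """Server side: file level i, split pointer n, buckets 0 .. n + 2**i - 1."""

    def __init__(self, i, n):
        self.i, self.n = i, n

    def bucket_level(self, a):
        # Buckets already split in this round, and the buckets created by
        # those splits, are one level ahead of the rest.
        return self.i + 1 if a < self.n or a >= 2 ** self.i else self.i

    def true_address(self, key):
        a = h(self.i, key)
        return h(self.i + 1, key) if a < self.n else a

    def route(self, a, key):
        """Deliver key starting at bucket a; return (final bucket, forwardings)."""
        hops = 0
        while True:
            j = self.bucket_level(a)
            target = h(j, key)
            if target == a:
                return a, hops
            lower = h(j - 1, key)
            if a < lower < target:
                target = lower  # avoid forwarding past the end of the file
            a, hops = target, hops + 1


class Client:
    """Client side: an image (i, n) that may lag behind the real file state."""

    def __init__(self):
        self.i, self.n = 0, 0  # initially assumes a single bucket

    def address(self, key):
        a = h(self.i, key)
        return h(self.i + 1, key) if a < self.n else a

    def adjust(self, a, j):
        """Apply an IAM carrying the level j of the first contacted bucket a."""
        if j > self.i:
            self.i, self.n = j - 1, a + 1
        if self.n >= 2 ** self.i:
            self.i, self.n = self.i + 1, 0


def lookup(client, lh_file, key):
    """Send key using the client image; adjust the image on an addressing error."""
    first = client.address(key)
    final, hops = lh_file.route(first, key)
    if hops > 0:
        client.adjust(first, lh_file.bucket_level(first))
    return final, hops
```

As a usage example, take a file with `LHStarFile(4, 6)` (22 buckets) and a brand-new `Client()`. Three successive calls to `lookup` for key 21 return `(21, 2)`, `(21, 1)` and `(21, 0)`. The client's image converges towards the real file state without ever consulting a central directory.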

DDBMS Reliability

Two-Phase Commit Protocol (2PC)
- The most well-known protocol to ensure atomic commitment of distributed transactions.
- Extends local site commit protocols with a contract (synchronization) that they all agree on committing (or aborting) a distributed transaction.

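Two-Phase Commit Protocol: actions of the coordinator and of each participant

Coordinator:
1. Start: write begin_commit in log, send "prepare" to each participant, and enter the Wait state.
2. Wait for the votes. Any No?
   - yes: write abort in log and send "global-abort!" to the participants
   - no: write commit in log and send "global-commit!" to the participants
3. When the participants have answered "ack", write end_of_transaction in log.

Each participant:
1. Start: on "prepare", ready to commit?
   - no: write abort in log and answer "abort!"
   - yes: write ready in log, answer "commit!", and enter the Ready state
2. Ready: wait for the coordinator's decision. Type of msg?
   - abort: write abort in log and send "ack"
   - commit: write commit in log and send "ack"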
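The message and log sequence of two-phase commit can be traced with a minimal single-process sketch. The classes `Coordinator` and `Participant` and their methods are hypothetical. Messages are modelled as direct method calls, logs as Python lists, and there are no timeouts, crashes, or recovery, so this shows only the failure-free path of the protocol.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "start"

    def on_prepare(self):
        """Phase 1: vote on the coordinator's "prepare" message."""
        if self.can_commit:
            self.log.append("ready")
            self.state = "ready"
            return "commit!"
        self.log.append("abort")
        self.state = "aborted"
        return "abort!"

    def on_decision(self, decision):
        """Phase 2: apply the global decision and acknowledge it."""
        # A participant that already aborted on its own only acknowledges.
        if self.state == "ready":
            if decision == "global-commit!":
                self.log.append("commit")
                self.state = "committed"
            else:
                self.log.append("abort")
                self.state = "aborted"
        return "ack"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []

    def run(self):
        self.log.append("begin_commit")
        # Phase 1: collect one vote from every participant.
        votes = [p.on_prepare() for p in self.participants]
        if all(vote == "commit!" for vote in votes):
            self.log.append("commit")
            decision = "global-commit!"
        else:
            self.log.append("abort")
            decision = "global-abort!"
        # Phase 2: distribute the decision and wait for acknowledgements.
        acks = [p.on_decision(decision) for p in self.participants]
        if all(ack == "ack" for ack in acks):
            self.log.append("end_of_transaction")
        return decision
```

A single participant created with `can_commit=False` is enough to turn the outcome into `"global-abort!"` for everyone. This is the atomicity guarantee the protocol exists to provide.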