Transaction Models of DDBMS

Similar documents
Course Content. Transactions and Concurrency Control. Objectives of Lecture 4 Transactions and Concurrency Control

Recovery Theory. Storage Types. Failure Types. Theory of Recovery. Volatile storage main memory, which does not survive crashes.

Transactions and Concurrency Control. Goals. Database Administration. (Manga Guide to DB, Chapter 5, pg , ) Database Administration

Goals. Managing Multi-User Databases. Database Administration. DBA Tasks. (Kroenke, Chapter 9) Database Administration. Concurrency Control

CHAPTER 6: DISTRIBUTED FILE SYSTEMS

Transactions. SET08104 Database Systems. Napier University

Textbook and References

Transactional properties of DBS

Lecture 7: Concurrency control. Rasmus Pagh

Transactions: Definition. Transactional properties of DBS. Transactions: Management in DBS. Transactions: Read/Write Model

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina

Concurrency Control. Module 6, Lectures 1 and 2

Database Tuning and Physical Design: Execution of Transactions

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps

Transaction Management Overview

Transactions and the Internet

Distributed Transactions

Chapter 15: Recovery System

Concurrency Control: Locking, Optimistic, Degrees of Consistency

Chapter 14: Recovery System

Chapter 16: Recovery System

Introduction to Database Management Systems

Roadmap DB Sys. Design & Impl. Detailed Roadmap. Paper. Transactions - dfn. Reminders: Locking and Consistency

Principles of Distributed Database Systems

! Volatile storage: ! Nonvolatile storage:

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

INTRODUCTION TO DATABASE SYSTEMS

Chapter 6 The database Language SQL as a tutorial

How To Recover From Failure In A Relational Database System

Distributed Databases: what is next?

Review: The ACID properties

Distributed Databases

Database Concurrency Control and Recovery. Simple database model

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

Transaction Processing Monitors

Distributed Database Management Systems

Concurrency control. Concurrency problems. Database Management System

COS 318: Operating Systems

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

Introduction to Database Systems. Module 1, Lecture 1. Instructor: Raghu Ramakrishnan UW-Madison

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Transaction Management in Distributed Database Systems: the Case of Oracle s Two-Phase Commit

Data Management in the Cloud

Entangled Transactions. Nitin Gupta, Milos Nikolic, Sudip Roy, Gabriel Bender, Lucja Kot, Johannes Gehrke, Christoph Koch

Chapter 10. Backup and Recovery

Definition of SOA. Capgemini University Technology Services School Capgemini - All rights reserved November 2006 SOA for Software Architects/ 2

Distributed Databases

Topics. Introduction to Database Management System. What Is a DBMS? DBMS Types

DISTRIBUTED AND PARALLELL DATABASE

Unit 12 Database Recovery

Failure Recovery Himanshu Gupta CSE 532-Recovery-1

CAP Theorem and Distributed Database Consistency. Syed Akbar Mehdi Lara Schmidt

CPS221 Lecture - ACID Transactions

Transactions and ACID in MongoDB

Lesson 12: Recovery System DBMS Architectures

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

UVA. Failure and Recovery. Failure and inconsistency. - transaction failures - system failures - media failures. Principle of recovery

How To Write A Transaction System

Recovery and the ACID properties CMPUT 391: Implementing Durability Recovery Manager Atomicity Durability

Ph.D. Thesis Proposal Database Replication in Wide Area Networks

TRANSACÇÕES. PARTE I (Extraído de SQL Server Books Online )

CS 245 Final Exam Winter 2013

Serializable Isolation for Snapshot Databases

Cloud DBMS: An Overview. Shan-Hung Wu, NetDB CS, NTHU Spring, 2015

Database Replication Techniques: a Three Parameter Classification

Distributed Database Systems

Postgres-R(SI): Combining Replica Control with Concurrency Control based on Snapshot Isolation

Introduction to Real-Time Databases

AN OVERVIEW OF DISTRIBUTED DATABASE MANAGEMENT

Geo-Replication in Large-Scale Cloud Computing Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

CMSC724: Concurrency

Oracle Database Links Part 2 - Distributed Transactions Written and presented by Joel Goodman October 15th 2009

The first time through running an Ad Hoc query or Stored Procedure, SQL Server will go through each of the following steps.

Special Relativity and the Problem of Database Scalability

Information Systems. Computer Science Department ETH Zurich Spring 2012

Database Replication with Oracle 11g and MS SQL Server 2008

Contents RELATIONAL DATABASES

Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo

Redo Recovery after System Crashes

Lecture 18: Reliable Storage

Simulations of the implementation of primary copy two-phase locking in distributed database systems

Udai Shankar 2 Deptt. of Computer Sc. & Engineering Madan Mohan Malaviya Engineering College, Gorakhpur, India

Database Integrity Mechanism between OLTP and Offline Data

5. CHANGING STRUCTURE AND DATA

THE UNIVERSITY OF TRINIDAD & TOBAGO

Real Time Database Systems

Introduction to Database Systems CS4320. Instructor: Christoph Koch CS

Concepts of Database Management Seventh Edition. Chapter 7 DBMS Functions

1.264 Lecture 15. SQL transactions, security, indexes

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

Logistics. Database Management Systems. Chapter 1. Project. Goals for This Course. Any Questions So Far? What This Course Cannot Do.

Crash Recovery. Chapter 18. Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke

Transcription:

Transaction Models of DDBMS Topics covered: Transactions Characterization of transactions Formalization of transactions Serializability theory Concurrency control models Locks

Transactions The concept of transaction is a unit of consistent and reliable computation Database in a consistent state Database may be temporarily in an inconsistent state Database in a consistent state Begin transaction Execution of transaction End transaction Transaction management: keeping the DB in consistent state even when concurrent accesses and failures occur

Definition of a transaction A transaction makes transformations of system states preserving consistency A transaction is a sequence of read and write operations together with computation steps, assuming that the transaction may be executed concurrently with others: concurrency transparency must be provided failures may occur during execution: failure transparency must be provided

Example of a transaction Example DB: FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP) CUST(CNAME, ADDR, BAL) FC(FNO, DATE, CNAME, SPECIAL) Transaction BEGIN_TRANSACTION RESERVATION BEGIN INPUT(flight_no, date, customer_name); EXEC SQL UPDATE FLIGHT SET STSOLD = STSOLD + 1 WHERE FNO = flight_no AND DATE = date; EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL) VALUES(flight_no, date, customer_name, null); END

Atomicity all or nothing Consistency Properties of transactions maps one consistent DB state to another the correctness of a transaction Isolation each transaction sees a consistent DB Durability the results of a transaction must survive system failures Remember ACIDity

Atomicity Treated as a unit of operation Either all the actions of a transaction are completed or none of them upon failure the DBMS can decide whether to terminate by completing the pending actions or terminate by undoing the actions that have been executed Maintainig atomicity requires recovery from failures transaction failures: data errors, deadlocks, etc. Transaction recovery system failures: media, processor failures, communication breakages, etc. Crash recovery

Classification of consistency (by Gray et al.) Dirty data: data values that have been written by a transaction prior to its commitment Degree 0 (Transaction T sees degree 0 consistency if) T does not overwrite dirty data of other transactions Degree 1: Degree 0 plus T does not commit any writes before end of transaction Degree 2: Degree 1 plus T does not read dirty data from other transactions Degree 3: Degree 2 plus Other transactions do not dirty any data read by T before T completes

Isolation (example) Possible execution schemes of T1 and T2 T1: Read(x) T1: Read(x) T1: x = x + 1 T1: x = x + 1 T1: Write(x) T2: Read(x) T1: Commit T1: Write(x) T2: Read(x) T1: Commit T2: x = x + 1 T2: x = x + 1 T2: Write(x) T2: Write(x) T2: Commit T2: Commit Lost update: incomplete results can be seen by other transactions Cascading aborts: if T1 decides to abort, all Reads 50 when x is 51 transactions that have seen T1 s incomplete results must be aborted

Isolation An executing transaction cannot reveal it results to other concurrent transactions before its commitment Isolation is related to serializability: if several transactions are executed concurrently, the results must be the same as if they were executed serially in some order There is a strong relationship between isolation and degrees of consistency: degree 0: low level of isolation, yet solves the problem of lost updates degree 2: solves both lost updates and cascading aborts degree 3: full isolation

Durability Once a transaction commits, its results are permanent and cannot be erased even if system failure occurs Database recovery

Termination of transactions A transaction always terminates if the task is successful: commits if the task is incomplete (for some reasons): aborts Commit either due to system failure or unsatisfied conditions rollback: undone the actions and return the DB to its state before execution the point of no return if a transaction is committed its results are permanently stored in the DB durability its results can be made visible to other transactions consistency, isolation

Example of termination BEGIN_TRANSACTION RESERVATION BEGIN INPUT(flight_no, date, customer_name) EXEC SQL SELECT STSOLD, CAP INTO temp1, temp2 FROM FLIGHT WHERE FNO = flight_no AND DATE = date; IF temp1 = temp2 THEN BEGIN OUTPUT( no free seats ); ABORT END ELSE BEGIN EXEC SQL UPDATE FLIGHT SET STSOLD = STSOLD + 1 WHERE FNO = flight_no AND DATE = date; EXEC SQL INSERT INTO FC(FNO, DATE, CNAME, SPECIAL) VALUES(flight_no, date, customer_name, null); COMMIT; OUTPUT( reservation completed ); END END

Formalization of the transaction concept Characterization Data items that a given transaction reads: Read Set (RS) writes: Write Set (WS) they are not necessarily mutually exclusive Base Set (BS): BS = RS WS Insertion and deletion are omitted, the discussion is restricted to static databases

Formalization of the transaction concept O ij (x): some atomic operation O j of transaction T i that operates on DB entity x O j {read, write} OS i = j O ij, i.e. all operations in T i N i {abort, commit}, the termination condition for T i Transaction T i is a partial ordering over its operations and the termination condition

Formalization of the transaction concept Partial order P = {Σ, p} where Σ is the domain p is an irreflexive and transitive relation Transition T i is a partial order {Σ i, p i } where Σ i = OS i N i For any two operations O ij, O ik OS i, if O ij =R(x) and O ik =W(x) for any data item x then either O ij p i O ik or O ik p i O ij, i.e. there must be an order between conflicting operations O ij OS i, O ij p i N i, i.e. áll operations must precede the termination The ordering relation p i is application dependent

Formalization of the transaction concept Example Read(x) Read(y) x = x + y Write(x) Commit R(x) R(y) W(x) Σ = {R(x), R(y), W(x), C} p = {(R(x), W(x)), (R(y), W(x)), (W(x), C), (R(x), C), (R(y), C)} where (O i, O j ) means O i p O j Partial order: the ordering is not specified for every pair of operations C

Characterization of transactions According to application type regular or distributed compensating heterogeneous According to duration on-line (short life) or batch (long life) According to structure flat, nested or workflow According to the order of read and write operations general two-step: all read ops before any write ops restricted: a data item must be read before written restricted two-step action: restricted where each read-write pair is atomic

Structural types of transactions Flat a sequence of primitive operations between begin and end markers Nested a transaction may include other transactions with their own commit points more concurrency introduced recovery is possible independently for each subtransaction a subtransaction can be a nested one too nesting open subtransactions begin after their parents and finish before them commitment is conditional upon the commitment of the parent closed subtransactions can execute and commit independently compensation may be necessary

Architecture revisited User Begin_transaction, Read, Write, Commit, Abort Results User Interface Handler Semantic Data Controller Global Query Optimizer Global Execution Monitor External Schema Global Conceptual Schema GD/D With other TMs Transaction Manager (TM) Scheduler (SC) With other SCs With other data processors User processor To data processors Global Execution Monitor (Distributed Execution Monitor)

Serializability theory Schedule (history) S: specifies an interleaved execution order over a set of transactions T={T 1, T 2,... T n } Complete schedule S Tc : is a partial order S T c ={Σ T, p T } over a set of transactions T={T 1, T 2,... T n } that defines the execution order of all operations in its domain Σ = U n Σ T i= 1 i n p U T p i= 1 i for any two conflicting operations O ij, O kl Σ T, either O ij p T O kl or O kl p T O ij

Serializability theory Schedule (example): a possible complete schedule T1: T2: Read(x) Read(x) x = x + 1 x = x + 1 Write(x) Write(x) Commit Commit Σ 1 = {R 1 (x), W 1 (x), C 1 }, Σ 2 = {R 2 (x), W 2 (x), C 2 } Σ T = Σ 1 Σ 2 = {R 1 (x), W 1 (x), C 1, R 2 (x), W 2 (x), C 2 } p T = {(R 1, R 2 ), (R 1, W 1 ), (R 1, C 1 ), (R 1, W 2 ), (R 1, C 2 ), (R 2, W 1 ), (R 2, C 1 ), (R 2, W 2 ), (R 2, C 2 ), (W 1, C 1 ), (W 1, W 2 ), (W 1, C 2 ), (C 1, W 2 ), (C 1, C 2 ), (W 2, C 2 )} R 1 R 2 W 1 W 2 C 1 C 2

Serializability theory Prefix: P = {Σ, p } is a prefix of partial order P = {Σ, p} if Σ Σ e i Σ, e 1 p e 2 iff e 1 p e 2 e i Σ, if e j Σ and e j p e i, then e j Σ Only the conflicting operations are relevant at scheduling - redefine schedule: Schedule (incomplete) S: is a prefix of complete schedule S T c

Serializability theory Incomplete schedule (example) T1: T2: T3: Read(x) Write(x) Read(x) Write(x) Write(y) Read(y) Commit Read(z) Read(z) Commit Commit Complete schedule Partial schedule the partial schedule is a prefix of complete schedule and equivalent to it R1(x) W2(x) R3(x) R1(x) W2(x) R3(x) W1(x) W2(y) R3(y) W2(y) R3(y) C1 R2(z) R3(z) R2(z) R3(z) C2 C3

Serializability theory Serial schedule (serial history): if in a schedule S the operations of various transactions are not interleaved, the schedule is serial S = {W 2 (x), W 2 (y), R 2 (z), C 2, W 1 (x), R 1 (x), C 1, R 3 (x), R 3 (y), R 3 (z), C 3 } T 2 p S T 1 p S T 3 Two schedules S 1 and S 2 are equivalent if for each pair of conflicting operations O ij, O kl (i k) whenever O ij p 1 O kl then O ij p 2 O kl. (conflict equivalence) Schedule S is serializable if it is conflict equivalent to a serial schedule (conflict-based serializability)

Serializability theory Transactions execute concurrently but the overall effect of the resulted history upon the database is equivalent to some serial scheduling Primary goal of concurrency control: generate a serializable schedule for the pending transactions Two histories must be taken into account: local schedule (at each site) global schedule

Serializability theory When the DB is partitioned, if each local schedule is serializable then the global schedule is serializable When the DB is replicated, the global schedule is serializable (one-copy serializable) if local schedules are serializable two conflicting operations are in the same relative order in each local schedule where they appear

Replica control protocol Consistency in presence of replication: one-copy serializability must be provided concurrency control plus replica control Assume data item x (logical data) is replicated as x 1, x 2,... x n (physical data items) each read(x) is mapped to one of the physical items each write(x) is mapped to a subset of the physical data copies If read(x) is mapped to one and write(x) is mapped to all physical copies, it is a read-once/write-all (ROWA) protocol

Pessimistic Concurrency control models 2-Phase Locking based (2PL) Centralized Primary copy Distributed Timestamp Ordering (TO) Basic Multiversion Conservative Hybrid Optimistic Locking Timestamp ordering

Locks Locks ensure that data shared by conflicting operations are accessed by one operation at a time - a simple way of serialization The lock is set by a transaction before the lock unit is accessed reset at the end of the operation if the lock is set already, the lock unit cannot be accessed Lock modes read lock (shared lock) write lock (exclusive lock) Read lock (x) Write lock (x) Read lock (x) compatible not compatible Write lock (x) not compatible not compatible Locks are controlled by the Lock Manager (LM) which is a part of the Scheduler (see architecture revisited)

Locks Two-phase locking (2PL): no transaction should request a lock after it releases one of its locks Transactions have growing phase lock point shrinking phase Obtain lock Lock point Release lock Begin End Theorem: any schedule that obeys 2PL rule is serializable (Eswaran et al.) Difficult to implement Transaction Manager (among others due to cascading aborts)

Locks Strict two-phase locking (S2PL): locks are released if the operation is a commit or an abort Obtain lock Data in use Release locks Begin End

Locks in distributed DBSs: Centralized 2PL There is only one 2PL scheduler (lock manager) in the distributed system All lock requests are addressed to it Data processors Coordinating TM Centralized LM 1 Lock request 3 Operation 2 Lock Granted 4 End of Operation 5 Release Lock Important: TM must implement the replica control protocol

Locks in distributed DBSs: Primary copy 2PL The centralized 2PL scheduler may form a bottleneck In PC2PL lock managers are implemented at a number of sites they are responsible for a given set of lock units TMs send lock and unlock requests to the scheduler that is responsible for the given lock unit one copy of the data item is treated as a primary copy the location of the primary copy must be determined prior to sending lock and unlock requests - a directory design issue

Locks in distributed DBSs: Distributed 2PL LMs are available at each site in D2PL if the DB is not replicated, it is the same as PC2PL if replicated, it implements the ROWA protocol operations are passed via LMs - there is no lock granted message Coordinating TM Participating LMs Participating DPs 1 Lock request 2 Operation 3 End of Operation 4 Release Lock