Concurrency control. Concurrency problems. Database Management System

Similar documents
Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

Concurrency Control. Module 6, Lectures 1 and 2

Lecture 7: Concurrency control. Rasmus Pagh

Database Tuning and Physical Design: Execution of Transactions

Course Content. Transactions and Concurrency Control. Objectives of Lecture 4 Transactions and Concurrency Control

Transaction Management Overview

CHAPTER 6: DISTRIBUTED FILE SYSTEMS

Recovery Theory. Storage Types. Failure Types. Theory of Recovery. Volatile storage main memory, which does not survive crashes.

Concurrency Control: Locking, Optimistic, Degrees of Consistency

Roadmap DB Sys. Design & Impl. Detailed Roadmap. Paper. Transactions - dfn. Reminders: Locking and Consistency

Introduction to Database Management Systems

Transactions. SET08104 Database Systems. Napier University

Transactional properties of DBS

Transactions: Definition. Transactional properties of DBS. Transactions: Management in DBS. Transactions: Read/Write Model

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps

In This Lecture. More Concurrency. Deadlocks. Precedence/Wait-For Graphs. Example. Example

PostgreSQL Concurrency Issues

CMSC724: Concurrency

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Chapter 6 The database Language SQL as a tutorial

Recovery and the ACID properties CMPUT 391: Implementing Durability Recovery Manager Atomicity Durability

Homework 8. Revision : 2015/04/14 08:13

Recovery: Write-Ahead Logging

Goals. Managing Multi-User Databases. Database Administration. DBA Tasks. (Kroenke, Chapter 9) Database Administration. Concurrency Control

Transaction Processing Monitors

The Classical Architecture. Storage 1 / 36

Distributed Database Management Systems

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

Distributed Transactions

! Volatile storage: ! Nonvolatile storage:

Generalized Isolation Level Definitions

2 nd Semester 2008/2009

Ph.D. Thesis Proposal Database Replication in Wide Area Networks

High-Performance Concurrency Control Mechanisms for Main-Memory Databases

Distributed Databases

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina

Scalable Transaction Management on Cloud Data Management Systems

Module 3 (14 hrs) Transactions : Transaction Processing Systems(TPS): Properties (or ACID properties) of Transactions Atomicity Consistency

DATABASE ENGINES ON MULTICORES SCALE: A PRACTICAL APPROACH. João Soares and Nuno Preguiça

Information Systems. Computer Science Department ETH Zurich Spring 2012

Transactions and Concurrency Control. Goals. Database Administration. (Manga Guide to DB, Chapter 5, pg , ) Database Administration

Database Concurrency Control and Recovery. Simple database model

CAP Theorem and Distributed Database Consistency. Syed Akbar Mehdi Lara Schmidt

How To Recover From Failure In A Relational Database System

DDB Functionalities by Major DMBS Products. Haibin Liu Shcherbak Maryna Nassrat Hatem

Introduction to Database Systems. Module 1, Lecture 1. Instructor: Raghu Ramakrishnan UW-Madison

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Active database systems. Triggers. Triggers. Active database systems.

Review: The ACID properties

Database Replication with Oracle 11g and MS SQL Server 2008

Outline. Failure Types

Optimizing Concurrent Processing of Write-then-Read Transactions

Chapter 15: Recovery System

Recovery: An Intro to ARIES Based on SKS 17. Instructor: Randal Burns Lecture for April 1, 2002 Computer Science Johns Hopkins University

CS 245 Final Exam Winter 2013

The Oracle Universal Server Buffer Manager

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Textbook and References

A Proposal for a Multi-Master Synchronous Replication System

Real Time Database Systems

Database replication for commodity database services

Chapter 14: Recovery System

Compuprint 4247 Serial Matrix Printers

Network management and QoS provisioning - QoS in the Internet

Recovery System C H A P T E R16. Practice Exercises

Serializable Isolation for Snapshot Databases

Database Replication with MySQL and PostgreSQL

Data Management in the Cloud

Redo Recovery after System Crashes

Chapter 16: Recovery System

Issues in Designing Concurrency Control Techniques for Mobile Ad-hoc Network Databases

Chapter 10. Backup and Recovery

MS SQL Performance (Tuning) Best Practices:

Chapter 18: Database System Architectures. Centralized Systems

Implementing New Approach for Enhancing Performance and Throughput in a Distributed Database

Physical DB design and tuning: outline

Replicated open source databases for web applications: architecture and performance analysis

Transactions, Views, Indexes. Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data

Administração e Optimização de BDs 2º semestre

Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication

Applying Attribute Level Locking to Decrease the Deadlock on Distributed Database

Postgres-R(SI): Combining Replica Control with Concurrency Control based on Snapshot Isolation

High Availability Essentials

Lecture 18: Reliable Storage

Scalable Transaction Management with Snapshot Isolation on Cloud Data Management Systems

Low Overhead Concurrency Control for Partitioned Main Memory Databases

Distributed Data Management

COS 318: Operating Systems

New method for data replication in distributed heterogeneous database systems

Week 1 Part 1: An Introduction to Database Systems. Databases and DBMSs. Why Use a DBMS? Why Study Databases??

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Centralized Systems. A Centralized Computer System. Chapter 18: Database System Architectures

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

How To Write A Transaction System

Chapter 10: Distributed DBMS Reliability

Special Relativity and the Problem of Database Scalability

ZooKeeper. Table of contents

MaaT: Effective and scalable coordination of distributed transactions in the cloud

Transcription:

Concurrency control Transactions per second (tps) is the measure of the workload of a operational DBMS; if two transactions access concurrently to the same data there is a problem: the module who resolve conflict avoiding interferences is concurrency control. He operates in order to increas DBMS efficiency in two ways:. maximizing the number of transaction per second;. minimizing the response time. Operations can cause concurrency problems are:. reading a data object x = r(x);. writing a data object x = w(x). It means that this operations may cause the reading on disk if the object is not present into the buffer or writing to disk an entire page of memory if, again, the object is going to be written does not fill in main memory. The scheduler is a block of the concurrency control module who decide if an operation requested (read or write) can be satisfied; if it is not present is possible to have problems due to correctness, also called anomalies. Concurrency problems Examples of problems due to concurrency are:. lost update;. dirty read;. inconsistent read: Lost update. ghost update (a);. ghost update (b). This effect can be shown when two transactions read the same object x and update concurrently its value: it happends that operation done by one transaction is lost. Dirty read This effect happends when a transaction reads the value of an object x who is not stable because the other transaction is still running so, for atomicity property, there will be a rollback cascade. 1

Inconsistent read When a transaction reads the object x twice and x has different values the problem is called inconsistent read. It happends because between the two reads another transaction has modified the value of x. There are two kinds of inconsistent read:. ghost update (a) if two transaction access concurrently to the same object and they view their modification each other; notice that all objects are already present into the database;. ghost update (b) if one of two transaction insert a new object into the database and another transaction access use that data. Theory of concurrency problem Definitions of transactions and schedule operations are different:. a transaction is a sequence of read/write operations characterized by the same transaction identifier (TID): r 1 (x)r 1 (y)w 1 (x)w 1 (y). the schedule is a sequence of read/write operations presented by concurrent transactions (they appear in the arrival order of requests): r 1 (z)r 2 (z)w 1 (y)w 2 (z) The scheduler compiles schedules which can be accepted or refused by concurrency control in order to avoid anomalies. The scheduler scan and processed the order of execution without knowing what will be the results and, if it refuse a schedule, the transaction will be aborted. Commit projection This approach runs assuming that the schedule only contains transactions performing commit and cover all anomalies a part from the dirty read. The most important concept introduced is serial schedule. A serial schedule is the schedule in which all actions of each transaction appear in sequence. For example: r 1 (x)w 1 (x)w 2 (x)r 2 (x) A serializable schedule is the schedule which produce the same results of an arbitrary serial schedule. There are some classes for equivalence between two schedules:. view equivalence; 2

. conflict equivalence;. 2 phase locking;. timestamp equivalence. A single class defines how characteristic of schedules must be and how much is complex define the equivalence principle. View equivalence In order to be equivalence, two schedule have to present the same set of read-from and final-write. A read-from means that the same object x was already written by another transaction, different from the one which is now reading x. A final-write means that the object x is written and there are no others transactions in the schedule that afterwards will write the same object. Ifascheduleisview serializable (VSR)it meansthat isview equivalent to an arbitrary serial schedule which have the same transactions. For example, in the sequence: w 0 (x)r 2 (x)r 1 (x)w 2 (x)w 2 (z) it is possible to find the following reletions: w 0 (x)r 2 (x) r 1 (x)w 2 (x)w 2 (z) where in blue are shown the read-from and in red the final-write. The sequence is serializable because it is equivalent to the following: Infact: w 0 (x)r 1 (x)r 2 (x)w 2 (x)w 2 (z) w 0 (x)r 1 (x) r 2 (x)w 2 (x)w 2 (z) the two sequence present the same read-from and final-write. View equivalence cover:. lost update anomaly;. inconsistent read anomaly;. ghost update (a) anomaly. To detect view equivalence has linear complexity if the schedule is given but detecting the equivalence to an arbitrary schedule is a NP problem (exponential complexity) so is not applied in real systems. 3

Conflict equivalence There is conflict if two actions operate on the same object ad at least one of them is a write operation. Conflicts can be:. read-write (RW or WR: RW is similar to final-write operation, instead WR is similar to read-from);. write-write (WW). It is not possible have read-read conflicts otherwise the definition will not be respected. Two schedules are conflict equivalent if they have the same conflict set and each conflict pair is in the same order in both schedules. Conflict equivalence is stronger than view equivalence because the schedule is strongly characterized in the first case. If it is equivalent to an arbitrary serial schedule with the same transactions, a schedule is conflict serializable (CSR). For example the sequence: Conflict are: w 0 (x)r 1 (x)w 0 (z)r 1 (z)r 2 (x)r 3 (z)w 3 (z)w 1 (x). WR kind: w 0 (x) r 1 (x), w 0 (x) r 2 (x), w 0 (z) r 1 (z), w 0 (z) r 3 (z);. RW kind: r 1 (z) w 3 (z), r 2 (x) w 1 (x);. WW kind: w 0 (x) w 1 (x), w 0 (z) w 3 (z). It is important notice that in the same transaction there are no conflicts. The conflicts discovered are the same of the serial sequence: w 0 (x)w 0 (z)r 2 (x)r 1 (x)r 1 (z)w 1 (x)r 3 (z)w 3 (z) In order to detect conflict it is possible to exploit the conflict graph avoiding the comparison with all possible serial sequecens; if the graph is acyclic the schedule is conflict serializable. This approach reduce complexity because checking graph cyclicity is linear in the size of the graph. For the sequence used in the example the graph will be the following: T 0 T 1 T 2 T 3 4

The graph generates also constraints in the order of the optimal serial sequence because T 0 always precedes T 1, T 2, T 3 and so on; the full list of constraints is: T 0 T 1, T 2, T 3 T 2 T 1 T 1 T 3 where the symbol means: precede. Looking the list of the terms on the left from the top to the bottom it is possible to see the proper sequence for the serial one: T 0, T 2, T 1, T 3 which is the one already reported before. Although this method is less complex than view equivalence but also this approach is not used in real DBMS because have several disadvantages:. possibility that the graph is large, so the evaluation is a long operation;. the graph should be updated;. the cycle absence operation have to be performed. 2 Phase Locking This approach is the one really used by DBMS and it is based on lock operations. A lock the operation who block a specify resource in order to prevent any access by other transactions; it can be:. a read-lock (R-lock);. a write-lock (W-lock). When the resource have to become free because it is not used more the system performs an unlock operation. Each read operation have to be preceded by a R-lock request and followed by an unlock request; the same procedure is done for write operations. The scheduler behaves like a lock manager: it receives transactions request and grants locks consequently. When the lock request is granted the corresponding resource is assigned to the requesting transaction and becomes unavaiable; only when the transaction require an unlock operation the resource can be accessed by others transactions. If the lock is not granted the requesting transaction is put in a waiting state which terminates only when the resource is unlocked: it allows locally reschedule policies. 5

The lock manager exploits the lock table, where is stored information about locks can be granted to a transaction and the conflict table which manage lock conflicts. The following table shows a conflict table: Requests Resource State Free R-Locked W-Locked R-Lock Ok/R-Locked Ok/R-Locked No/W-Locked W-Lock Ok/W-Locked No/R-Locked No/W-Locked Unlock Error Depends Ok/Free Read locks are shared: it means that other transaction may lock the same resources; each transaction holding the R-lock on the same resource is count with a counter: a resource is free when the counter is equal to 0. 2 Phase Locking is characterized by two phases:. growing phase (lock are acquired);. shrinking phase (lock are released). The serializability is guaranteed because a transaction can not acquire a new lock after having released any lock. This method is strogest than conflict equivalence or view equivalence: the following picture shows levels of selectivity. CSR 2PL VSR CSR serializable schedule, no 2PL Strict 2 Phase Locking This method allows to drop the hypotesis done as far that only committed transaction were considered. In this case a transaction locks it is released only at hte end of the transaction (commit or rollback is not important): in this way data is always stable (dirty read avoided). 6

Techniques to manage locking When a transaction requests a resource x the system check if that resource is avaiable; in positive case the lock manager modifiesthestat of xin its internal tables andreturnsthecontrol tothe transaction. In this approach the processing delay is very small. Otherwise if the resource is not avaiable the transaction is suspended and insered in the waiting queue; as soon as the resource becomes avaiable the first transaction present in the queue is resumed and processed. The probability of having conflicts is: P (Conflicts) κ M N where:. κ is the number of active transactions;. M is the average number of objects accessed by a transactions;. N is the number of objects present in the database. If a timeout expire whileatransaction is still waiting in the queue is possible that thelock manager extract orresumeit or returnanerrorcode; thetransaction, instead, can performs rollback and maybe restarts automatically or requests again the same lock after some time is passed. Hierarchical Locking In this approach locks can be acquired ad different levels:. table;. group of tuples:. physical criteria (physical data page of memory);. logical criteria (tuples satisfating a given property);. single tuple;. single field of a tuple. Hierarchical locking allows lock operations only if the transaction belong to the proper level and is characterized by a large set of locking primitives:. shared lock (SL);. exclusive lock (XL): allows both read/write operations;. intention of shared lock (ISL): shows the intention of shared locking on a low hierarchical level;. intention of exclusive lock (IXL): shows the intention of exclusive locking on a low hierarchical level; 7

. shared lock and intention of exclusive lock (SIXL): shared lock of the current object and intention of exclusive locking on a low hierarchical level of one or more objects. In the following picture is shown the hierarchy: XL SIXL SL IXL ISL Locks are always requested starting from the root of the tree and going down: this is also the way followed in order to perform localized reads (detailed granularity). Another option is reading from the leaves node to the root: massive reads are performed (rough granularity). Both approaches allow to reduce concurrency, but can create overhead if granularity is too fine. Until now, all anomalies are covered with methods studied, a part from ghost update type b (update performed by a transaction with insertion of data by another transaction). Infact, 2PL allows locks only for objects already present into the database. Predicate locking admits lock for all data satisfating a given predicate: the real implementation is made through locking indices who prevent the insertion of new tuples. Transactions are divided into two classes:. read-write (default);. read only: it not allows to perform updates. The way in which transactions interacts each others is specify by the isolation level. There are four levels:. serializable: is the highest level, implemented by strict 2PL and includes predicate locking, so cover all anomalies; 8

. repeateble read: implemented by strict 2PL without predicate locking cover all anomalies a part from ghost update type b;. read committed: does not implement 2PL and release the lock when read is ultimate; it means that dirty read is avoided because write operation does not release the lock (remember that usually locks are released only when transaction ends);. read uncommitted: data is read without locking; it is possible because read operations does not affected database, infact this level is allowed only for read-only transactions and, for this reason, dirty read is not avoided. SERIALIZABLE REPEATABLE READ READ COMMITTED READ UNCOMMITTED Since write operations are critical because can propagate errors through database, they only can be executed under strict 2PL with exclusive locking. Deadlock This is a typical situation for concurrent systems managed by locking and waiting times. It happends when one transaction (T 1 ) is waiting for a resource locked by another transaction (T 2 ) and (T 2 ) is waiting for a resource locked by (T 1 ). The easier possible solution is adopting timeout: after sometime, if the period specify by the timeout is expired, the transaction rollback (maybe can restart completly or re-asking another lock operation). If the timeout is too long it happends that the system waits for a long period; othewise is possible that too much transactions are killed. In distributed database deadlock are detected by the wait graph. In the previous example, (T 1 ) was waiting for a resource reserved by (T 2 ); in the following picture is shown this condition: 9

T 1 T 2 But also (T 2 ) is waiting for a resource locked by (T 1 ): T 1 T 2 So: T 1 T 2 In this case there is a cycle which represents a deadlock. This approach is too expensive to build and manage. 10