Introduction to Transaction Management
UVic CSC 370
Dr. Daniel M. German (dmgerman@uvic.ca)
Department of Computer Science
July 8, 2005
Version: 1.1.0

Overview
What is a transaction?
What properties do transactions have?
Why do we want to interleave transactions?
How does the DBMS deal with transactions?
How do we use transactions from SQL?

Transactions
Concurrent execution of user programs is essential for good DBMS performance.
Because disk accesses are frequent and relatively slow, it is important to keep the CPU busy by working on several user programs concurrently.
A user's program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned with what data is read from or written to the database.
A transaction is the DBMS's abstract view of a user program: a sequence of reads and writes.

ACID
The DBMS must ensure four important properties of transactions:
1. Atomicity: A transaction either happens completely or does not happen at all.
2. Consistency: Each transaction, run by itself, should preserve the consistency of the database. The DBMS assumes that this holds for each transaction.
3. Isolation: Transactions are isolated from the effects of other transactions that might be executing concurrently.
4. Durability: Once the user is notified that the transaction was successful, its effects should persist even if the system crashes.

Consistency
Users are responsible for the consistency of their transactions.
Each transaction must leave the database in a consistent state if the database is consistent when the transaction begins.
The DBMS will enforce integrity constraints (ICs) and other constraints.
Beyond this, the database does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
Database consistency is the property that every transaction sees a consistent database instance.

Isolation
Users submit transactions and can think of each transaction as executing by itself.
Concurrency is achieved by the DBMS, which interleaves the actions (reads and writes of DB objects) of various transactions.
The net effect of several transactions should be the same as if they were executed one after another.

Atomicity
If a transaction completes successfully, we say it commits; otherwise it aborts.
A transaction can be left incomplete for three reasons:
1. It is aborted by the DBMS
2. The system crashes
3. The transaction aborts itself
When a transaction does not commit, its partial effects should be undone.
Users can then forget about dealing with incomplete transactions.
But if a transaction is committed, its effects should be durable.
The DBMS uses a log to ensure that incomplete transactions can be undone, if necessary.
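
The idea of undoing partial effects from a log can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "database" is a plain dict, and the class name ToyTransaction and its methods are hypothetical, not part of any real DBMS.

```python
_MISSING = object()  # marks "key did not exist before the write"


class ToyTransaction:
    """Hypothetical in-memory transaction that keeps an undo log."""

    def __init__(self, db):
        self.db = db          # the shared "database": a plain dict
        self.undo_log = []    # (key, before_image) pairs, oldest first

    def write(self, key, value):
        # Record the before-image first, then apply the change.
        self.undo_log.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

    def commit(self):
        # The changes stay; the undo information is no longer needed.
        self.undo_log.clear()

    def abort(self):
        # Restore before-images, newest change first.
        while self.undo_log:
            key, before = self.undo_log.pop()
            if before is _MISSING:
                self.db.pop(key, None)
            else:
                self.db[key] = before


db = {"A": 100, "B": 50}
t = ToyTransaction(db)
t.write("A", 70)
t.write("C", 30)
t.abort()   # db is back to {"A": 100, "B": 50}
```

A real DBMS keeps this log on stable storage so that it also survives a crash; the in-memory list here only shows the undo logic.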

Schedules
A transaction is seen by the DBMS as a series (or list) of actions.
These actions are reads or writes of an object O by a transaction T: R_T(O), W_T(O)
In addition to reading and writing, a transaction should specify commit or abort at the end: Commit_T, Abort_T
Assumptions:
  Transactions only interact with each other through reads and writes
  A database is a fixed collection of independent objects
A schedule is a list of actions (read, write, abort, commit) from a set of transactions, in which the actions of each transaction appear in the same order as they do in that transaction.

Schedules...
A schedule is a potential execution sequence of a set of transactions.
It describes actions as seen by the DBMS.
Example over T1 and T2, actions listed in time order: R(A), W(A), R(C), W(C), Commit, R(B), W(B), Abort
If the actions are not interleaved, it is called a serial schedule.
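
One possible way to make schedules concrete is to store each action as a (transaction, action, object) tuple and test whether the actions of different transactions are interleaved. The representation, the function name is_serial, and the two sample schedules below are assumptions chosen for illustration; they are not the schedules shown on the slides.

```python
def is_serial(schedule):
    """Return True if each transaction's actions are contiguous."""
    finished = set()   # transactions whose block of actions has ended
    current = None     # transaction whose actions we are currently reading
    for txn, _action, _obj in schedule:
        if txn != current:
            if txn in finished:
                return False   # txn resumes after another one ran: interleaved
            if current is not None:
                finished.add(current)
            current = txn
    return True


# Hypothetical schedules over T1 and T2.
serial_example = [
    ("T1", "R", "A"), ("T1", "W", "A"), ("T1", "Commit", None),
    ("T2", "R", "B"), ("T2", "W", "B"), ("T2", "Commit", None),
]
interleaved_example = [
    ("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"),
    ("T2", "W", "B"), ("T1", "Commit", None), ("T2", "Commit", None),
]
```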

Serializable Schedules
A serializable schedule of a set S of transactions is a schedule whose effect is identical to that of some serial schedule of the same set of transactions.
Three example schedules over T1 and T2, actions listed in time order:
  Schedule 1: R(A), W(A), R(A), W(A), R(B), W(B), R(B), W(B), Commit, Commit
  Schedule 2: R(A), W(A), R(A), R(B), W(B), W(A), R(B), W(B), Commit, Commit
  Schedule 3: R(A), W(A), R(B), W(B), Commit, R(A), W(A), R(B), W(B), Commit
Note: SQL programmers can instruct the database to use non-serializable schedules.

Anomalies
Concurrency can lead to an inconsistent state.
Two actions on the same object, issued by different transactions, conflict if at least one of them is a write.
3 types of anomalies (assume transactions T1, T2):
  Write-Read (WR) conflict: T2 reads data previously written by T1
  Read-Write (RW) conflict: T2 writes data previously read by T1
  Write-Write (WW) conflict: T2 writes data previously written by T1
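
The conflict rule above can be turned into a small checker. The sketch below assumes a schedule is a list of (transaction, action, object) tuples, which is a representation chosen here for illustration; the function name find_conflicts is hypothetical. It lists every conflicting pair, whether or not the first transaction is still in progress, so it reports potential anomalies rather than confirmed ones.

```python
def find_conflicts(schedule):
    """List (kind, first_txn, second_txn, object) for each conflicting pair."""
    conflicts = []
    for i, (t1, a1, o1) in enumerate(schedule):
        if a1 not in ("R", "W"):
            continue   # commits and aborts do not conflict
        for t2, a2, o2 in schedule[i + 1:]:
            if a2 not in ("R", "W"):
                continue
            # Different transactions, same object, at least one write.
            if t1 != t2 and o1 == o2 and "W" in (a1, a2):
                conflicts.append((a1 + a2, t1, t2, o1))
    return conflicts


# Hypothetical schedule: T1 reads A, T2 overwrites it, T1 writes it again.
example = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
found = find_conflicts(example)
# found == [("RW", "T1", "T2", "A"), ("WW", "T2", "T1", "A")]
```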

WR Conflict
T2 reads data that has not been committed yet.
Example over T1 and T2, actions listed in time order: R(A), W(A), R(A), W(A), R(B), W(B), Commit, R(B), W(B), Commit
This situation is called a dirty read.

RW Conflicts: Unrepeatable Reads
T2 changes the value of an object already read by T1.
If T1 tries to read the object again, it will get a different value.
This is called an unrepeatable read.

WW Conflicts: Overwriting Uncommitted Data
T2 overwrites the value of an object A, already modified by T1, while T1 is still in progress.
Example over T1 and T2, actions listed in time order: W(A), W(A), W(B), W(B), Commit, Commit
Writes that don't read the object first are called blind writes.

What about aborted transactions?
A serializable schedule over a set S of transactions is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over the set of committed transactions.
This means we might have to undo aborted transactions.
But this is not always possible: unrecoverable schedule.
Example over T1 and T2, actions listed in time order: R(A), W(A), R(A), W(A), R(B), W(B), Commit, Abort

Recoverable Schedules
In a recoverable schedule, a transaction commits only after every transaction whose changes it read has committed.
The simplest way to guarantee this is to let transactions read only data that has already been committed.
There is still the situation of a blind write.
Example over T1 and T2, actions listed in time order: R(A), W(A), W(A), Commit, Abort
What should the value of A be after the abort?
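
A recoverability check can be sketched as follows, using the common textbook condition that a transaction may commit only after every transaction whose writes it read has committed. The tuple representation, the function name is_recoverable, and both sample schedules are assumptions made for this illustration.

```python
def is_recoverable(schedule):
    """Check that no transaction commits before the writers it read from."""
    writers = {}      # object -> transactions that wrote it, oldest first
    reads_from = {}   # transaction -> transactions whose writes it read
    committed = set()
    for txn, action, obj in schedule:
        if action == "W":
            writers.setdefault(obj, []).append(txn)
        elif action == "R":
            history = writers.get(obj, [])
            if history and history[-1] != txn:
                reads_from.setdefault(txn, set()).add(history[-1])
        elif action == "Commit":
            if not reads_from.get(txn, set()) <= committed:
                return False   # txn read from a writer that has not committed
            committed.add(txn)
        elif action == "Abort":
            # The aborted transaction's writes are undone.
            for history in writers.values():
                history[:] = [w for w in history if w != txn]
    return True


# Hypothetical schedules: T2 reads T1's uncommitted write of A.
dirty_then_abort = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "Commit", None),
    ("T1", "Abort", None),
]
commit_in_order = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "Commit", None), ("T2", "Commit", None),
]
```

Here is_recoverable(dirty_then_abort) is False, because T2 commits while T1 is still active, and is_recoverable(commit_in_order) is True.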

Lock-Based Concurrency Control
We use locks to guarantee recoverable schedules.
A locking protocol is a set of rules to be followed by each transaction (and enforced by the DBMS) to ensure that, even though actions of several transactions might be interleaved, the net effect is the same as executing those transactions in some serial order.
We will use shared and exclusive locks.

Strict 2PL: Strict Two-Phase Locking
A simple locking protocol with 2 rules:
1. If a transaction T wants to read (modify) an object, it first requests a shared (exclusive) lock on that object.
2. All locks held by a transaction are released when the transaction is completed.
Requests to acquire and release locks can be inserted into transactions automatically by the DBMS, so the user does not have to worry about them.
This allows safe interleaving of operations.
When two transactions want to use the same object, they are serialized by the database.

Example
Using locks to avoid WR conflicts:
T1        T2
X(A)
R(A)
W(A)
          X(A)   (T2 waits: T1 holds the lock on A)
X(B)
R(B)
W(B)
Commit
          R(A)
          W(A)
          X(B)
          R(B)
          W(B)
          Commit
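
The two Strict 2PL rules can be sketched with a small single-threaded lock table. The class LockTable and its methods are hypothetical names used only for this illustration; a real lock manager would also queue waiting requests, which is omitted here. The calls at the end replay the lock requests of the example: T2's request for A is refused until T1 completes.

```python
class LockTable:
    """Toy lock table: grants shared ("S") and exclusive ("X") locks."""

    def __init__(self):
        self.holders = {}   # object -> (mode, set of holding transactions)

    def acquire(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn must wait."""
        held = self.holders.get(obj)
        if held is None:
            self.holders[obj] = (mode, {txn})
            return True
        held_mode, txns = held
        if txns == {txn}:
            # Sole holder: keep the lock, upgrading to X if requested.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[obj] = (new_mode, txns)
            return True
        if mode == "S" and held_mode == "S":
            txns.add(txn)   # shared locks are compatible with each other
            return True
        return False        # conflicting lock held by another transaction

    def release_all(self, txn):
        """Rule 2: release every lock when the transaction completes."""
        for obj in list(self.holders):
            _mode, txns = self.holders[obj]
            txns.discard(txn)
            if not txns:
                del self.holders[obj]


locks = LockTable()
t1_gets_a = locks.acquire("T1", "A", "X")       # granted
t2_first_try = locks.acquire("T2", "A", "X")    # refused: T1 holds A
t1_gets_b = locks.acquire("T1", "B", "X")       # granted
locks.release_all("T1")                         # T1 commits
t2_second_try = locks.acquire("T2", "A", "X")   # granted now
t2_gets_b = locks.acquire("T2", "B", "X")       # granted
```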

Deadlocks
And, of course, when we have locking, we run the risk of deadlocks.
The DBMS must either prevent or detect deadlocks.
The common solution: detect and resolve.
A simple way to detect them is by using timeouts.
If a transaction times out, the DBMS aborts it.
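
Timeout-based resolution can be illustrated with Python's threading.Lock, whose acquire method accepts a timeout. The sketch below plays both transactions from a single thread so that the outcome is deterministic; the lock names and the 0.05 second timeout are arbitrary choices for the illustration. Note that a timeout only suggests a deadlock: a transaction that is merely slow to get its lock is aborted as well.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

lock_a.acquire()   # T1 holds A
lock_b.acquire()   # T2 holds B

# T1 requests B, which T2 holds. In a real deadlock T2 would be
# waiting for A at the same time, so neither could ever proceed.
t1_got_b = lock_b.acquire(timeout=0.05)

if not t1_got_b:
    # The request timed out: abort T1 and release everything it holds.
    lock_a.release()

# With T1 out of the way, T2's request for A succeeds.
t2_got_a = lock_a.acquire(timeout=0.05)
```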

Performance of Locking
The more locking, the lower the performance of a concurrent system.
Furthermore, there is thrashing.
How can we increase throughput?
1. Lock the smallest-sized objects possible
2. Reduce the time for which objects are locked
3. Reduce hot spots (objects that are frequently accessed and modified)

Transaction Support in SQL
A transaction is automatically started by the first statement that accesses the database or the catalogs.
Subsequent statements are considered part of the transaction until it is terminated with COMMIT or ROLLBACK.
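
As a rough illustration, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The table name account and its contents are made up for the example. One caveat: with its default settings, Python's sqlite3 opens a transaction implicitly only before data-modifying statements such as INSERT or UPDATE, which is narrower than the behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()   # COMMIT: the insert becomes permanent

# This UPDATE implicitly starts a new transaction.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
in_txn_after_update = conn.in_transaction

conn.rollback()   # ROLLBACK: the update is undone
balance_after_rollback = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]

# The same update again, this time terminated with COMMIT.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.commit()
balance_after_commit = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
```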

Transaction Characteristics
Transactions have three special characteristics:
1. Access mode: What type of read/write access the transaction has
2. Isolation level: How isolated should it run?
3. Diagnostics size (we will not discuss this)

Access Modes
If the transaction is READ ONLY, it cannot modify the database.
Otherwise it can.

Isolation Levels
The programmer can obtain greater concurrency at the cost of increased exposure to other transactions' uncommitted changes.
Level              Dirty Read   Unrepeatable Read   Phantom
READ UNCOMMITTED   Maybe        Maybe               Maybe
READ COMMITTED     No           Maybe               Maybe
REPEATABLE READ    No           No                  Maybe
SERIALIZABLE       No           No                  No

Crash Recovery
What happens when the database crashes?
Remember ACID.
We need to guarantee that committed transactions survive a system crash or a media failure.
The recovery manager (RM) is responsible for ensuring atomicity and durability (the A and D of ACID).
It is one of the most difficult parts of a DBMS to implement.
When the DBMS is restarted after a crash, control is given to the RM.
The RM is also responsible for undoing uncommitted transactions.
For our discussion we assume that writing a page to disk is an atomic operation.

Stealing Frames and Forcing Pages
2 questions:
1. Can the changes made to an object in the buffer pool by a transaction T be written to disk before T commits? If this is allowed, we say that a second transaction steals a page from T, and that a steal approach is used.
2. When a transaction commits, must we ensure that all the changes it has made to objects in the buffer pool are immediately forced to disk? If so, we say that a force approach is used.

Simplest to Implement: no-steal, force
Changes are written to disk only after a commit.
We don't have to worry about undoing writes to disk.
What are the drawbacks?

Commonly used: steal, no-force
Allows for the highest level of concurrency
And maximum flexibility of the buffer pool
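
The price of each choice can be summarized in a tiny helper. This is a sketch of the usual reasoning rather than a description of any particular system: with steal, uncommitted changes may already be on disk after a crash, so the recovery manager must be able to undo; with no-force, committed changes may not have reached disk, so it must be able to redo. The function name recovery_needs is hypothetical.

```python
def recovery_needs(steal, force):
    """Which recovery actions a buffer policy may require after a crash."""
    return {
        "undo": steal,       # uncommitted changes may have reached disk
        "redo": not force,   # committed changes may be missing from disk
    }


simplest = recovery_needs(steal=False, force=True)    # no-steal, force
common = recovery_needs(steal=True, force=False)      # steal, no-force
```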

The log
A log of all modifications to the database is kept in stable storage.
Stable storage is guaranteed to survive crashes and media failures.
The log is a write-ahead log.
This enables the RM to undo uncommitted transactions and redo committed ones.
Once the RM takes control, it has to scan the log to determine where to start the recovery.

The RM and the log
The amount of work the RM does is proportional to:
  the changes made by transactions that committed but were not written to disk (because of the no-force approach)
  the changes made by uncommitted transactions prior to a crash that might have been written to disk (because of the steal approach)
In order to minimize this work, the DBMS:
  has a background process that regularly writes dirty pages to disk
  creates checkpoints

ARIES
A recovery algorithm, conceptually simple (used by DB2)
Uses a no-force, steal approach
After a crash, it works in three stages:
1. Analysis: Identify the dirty pages in the buffer pool and the transactions that were active at the time of the crash.
2. Redo: Repeat all actions (starting from an appropriate point in the log) and restore the state to the point when the crash occurred.
3. Undo: Undo the actions of transactions that did not commit.

ARIES...
In order to succeed, it relies on three main principles:
1. Write-ahead logging: Any change to the database is first recorded in the log
2. Repeating history during redo: ARIES brings the system back to the point of failure
3. Logging changes during undo: Keep logging in case of another failure
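
The three stages can be sketched on a toy log. This is a heavily simplified illustration and not the actual ARIES algorithm: it ignores log sequence numbers, checkpoints, and the dirty page table, redoes every update from the start of the log, and treats every transaction without a commit record as unfinished. The record layout (dicts with txn, type, page, before, after), the "undo" record type, and the function name recover are assumptions made for the example.

```python
def recover(log, disk):
    """Toy analysis/redo/undo. Returns (pages, records logged during undo)."""
    pages = dict(disk)

    # 1. Analysis: find the transactions that never committed.
    seen = {rec["txn"] for rec in log}
    committed = {rec["txn"] for rec in log if rec["type"] == "commit"}
    losers = seen - committed

    # 2. Redo: repeat history for every update, committed or not.
    for rec in log:
        if rec["type"] == "update":
            pages[rec["page"]] = rec["after"]

    # 3. Undo: roll back the losers' updates, newest first,
    #    and log each undo in case of another failure.
    undo_records = []
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] in losers:
            pages[rec["page"]] = rec["before"]
            undo_records.append({"txn": rec["txn"], "type": "undo",
                                 "page": rec["page"], "after": rec["before"]})
    return pages, undo_records


# Hypothetical log at the time of the crash: T1 committed, T2 did not.
log = [
    {"txn": "T1", "type": "update", "page": "P1", "before": 10, "after": 20},
    {"txn": "T2", "type": "update", "page": "P2", "before": 5, "after": 7},
    {"txn": "T1", "type": "commit"},
    {"txn": "T2", "type": "update", "page": "P1", "before": 20, "after": 30},
]
# On disk: T1's change to P1 was never forced, T2's change to P2 was stolen.
disk = {"P1": 10, "P2": 7}
pages, undo_records = recover(log, disk)
# pages == {"P1": 20, "P2": 5}
```

After recovery, P1 holds the value written by the committed transaction T1 and P2 is back to its value from before T2 ran.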

The log, what is in it?
Mainly, 5 types of records:
  Update: a change was made to a page
  Commit: a transaction has committed
  Abort: the DBMS has started aborting a transaction
  End: the DBMS has finished aborting the transaction
  Undo update: an update was rolled back