2nd Semester 2008/2009



Chapter 17: Recovery System
Departamento de Engenharia Informática, Instituto Superior Técnico
2nd Semester 2008/2009
Slides based on the official slides of the book Database System Concepts, © Silberschatz, Korth and Sudarshan.

Failure Classification
- Transaction failure:
  - Logical errors: the transaction cannot complete due to some internal error condition
  - System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
- System crash: a power failure or other hardware or software failure causes the system to crash
  - Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted by a system crash
  - Database systems have numerous integrity checks to prevent corruption of disk data
- Disk failure: a head crash or similar disk failure destroys all or part of disk storage
  - Destruction is assumed to be detectable: disk drives use checksums to detect failures

Recovery Algorithms
- Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures
- Recovery algorithms have two parts:
  1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures
  2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability

Storage Structure
- Volatile storage: does not survive system crashes
  - examples: main memory, cache memory
- Nonvolatile storage: survives system crashes
  - examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
- Stable storage: a mythical form of storage that survives all failures
  - approximated by maintaining multiple copies on distinct nonvolatile media

Data Access
- Physical blocks: blocks residing on the disk
- Buffer blocks: blocks residing temporarily in main memory
- Block movements between disk and main memory are initiated through the following two operations:
  - input(B) transfers the physical block B to main memory
  - output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there
- Each transaction T_i has its private work area in which local copies of all data items accessed and updated by it are kept
  - T_i's local copy of a data item X is called x_i
- We assume, for simplicity, that each data item fits in, and is stored inside, a single block

Data Access (cont.)
- A transaction transfers data items between system buffer blocks and its private work area using the following operations:
  - read(X) assigns the value of data item X to the local variable x_i
  - write(X) assigns the value of local variable x_i to data item X in the buffer block
  - both these commands may necessitate the issue of an input(B_X) instruction before the assignment, if the block B_X in which X resides is not already in memory
- output(B_X) need not immediately follow write(X)
  - The system can perform the output operation when it deems fit

[Figure: Example of Data Access]
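Since the figure itself is not available, the following is a minimal sketch of the four operations described above. The dictionaries `disk`, `buffer` and `block_of`, the `Transaction` class and the function names are illustrative assumptions, not part of the original slides.

```python
# Toy model of the storage hierarchy; every name here is an illustrative assumption.
disk = {"B_X": {"X": 1000}}   # physical blocks residing on disk
buffer = {}                    # buffer blocks residing in main memory
block_of = {"X": "B_X"}       # which block holds each data item


def input_block(b):
    """input(B): copy physical block B from disk into the buffer."""
    buffer[b] = dict(disk[b])


def output_block(b):
    """output(B): write buffer block B to disk, replacing the physical block."""
    disk[b] = dict(buffer[b])


class Transaction:
    def __init__(self):
        self.local = {}        # private work area holding the local copies x_i

    def read(self, item):
        """read(X): bring B_X into memory if needed, then copy X into x_i."""
        b = block_of[item]
        if b not in buffer:
            input_block(b)
        self.local[item] = buffer[b][item]

    def write(self, item):
        """write(X): copy x_i into X in the buffer block; the disk is not touched."""
        b = block_of[item]
        if b not in buffer:
            input_block(b)
        buffer[b][item] = self.local[item]


t = Transaction()
t.read("X")
t.local["X"] -= 50
t.write("X")
value_on_disk_before_output = disk["B_X"]["X"]   # disk still holds the old value
output_block("B_X")                              # the system outputs when it sees fit
value_on_disk_after_output = disk["B_X"]["X"]
```

The gap between `write` and `output_block` is the window in which a crash loses the update, which is what the recovery schemes in the following slides address.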

Recovery and Atomicity
- Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state
- Consider a transaction T_i that transfers $50 from account A to account B
  - Several output operations may be required for T_i
  - A failure may occur after one of these modifications has been made but before all of them are made
- To ensure atomicity despite failures, we first output information describing the modifications to stable storage, without modifying the database itself
- Two common approaches:
  - log-based recovery
  - shadow paging
- We assume (initially) that transactions run serially, that is, one after the other

Outline: Deferred Database Modification, Immediate Database Modification, Checkpoints

Log-Based Recovery
- A log is kept on stable storage
  - The log is a sequence of log records, and maintains a record of update activities on the database
- We assume for now that log records are written directly to stable storage (that is, they are not buffered)
- Two approaches using logs:
  - Deferred database modification
  - Immediate database modification

Log Records
- When transaction T_i starts, it registers itself by writing a <T_i, start> log record
- Before T_i executes write(X), a log record <T_i, X, V_1, V_2> is written, where V_1 is the value of X before the write, and V_2 is the value to be written to X
- When T_i finishes its last statement, the log record <T_i, commit> is written

Deferred Database Modification
- The deferred database modification scheme records all modifications to the log, but defers all the writes until after partial commit
- Assume that transactions execute serially
- A transaction starts by writing a <T_i, start> record to the log
- A write(X) operation results in a log record <T_i, X, V> being written, where V is the new value for X
  - Note: the old value is not needed for this scheme
- The write is not performed on X at this time, but is deferred
- When T_i partially commits, <T_i, commit> is written to the log
- Finally, the log records are read and used to actually execute the previously deferred writes

Transaction Recovery
- During recovery, a transaction needs to be redone if and only if both <T_i, start> and <T_i, commit> are in the log
- Redoing a transaction T_i sets the value of all data items updated by the transaction to the new values
- Crashes can occur while:
  - the transaction is executing the original updates
  - recovery action is being taken

An Example
Transactions (executed serially, T_0 then T_1):
  T_0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
  T_1: read(C); C := C - 100; write(C)
Log under deferred modification, with the recovery action required if the system fails at the marked points:
  <T_0, start>
  <T_0, A, 950>
  <T_0, B, 2050>
      system failure here: no redo actions need to be taken
  <T_0, commit>
  <T_1, start>
  <T_1, C, 600>
      system failure here: redo(T_0) must be performed, since <T_0, commit> is present
  <T_1, commit>
      system failure here: redo(T_0) must be performed, followed by redo(T_1), since <T_0, commit> and <T_1, commit> are present
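The three crash points of this example can be replayed with a short sketch of redo-only recovery. The tuple encoding of log records, the function names and the assumption that the disk still holds the initial values (A = 1000, B = 2000, C = 700) at the time of the crash are illustrative choices, not taken from the slides.

```python
def recover_deferred(log, db):
    """Redo-only recovery for deferred modification (illustrative sketch).

    Log records are tuples: (txn, "start"), (txn, item, new_value), (txn, "commit").
    Returns the set of transactions that were redone.
    """
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    to_redo = started & committed
    for rec in log:                        # forward scan, in log order
        if len(rec) == 3 and rec[0] in to_redo:
            _, item, new_value = rec
            db[item] = new_value           # plain overwrite, hence idempotent
    return to_redo


INITIAL = {"A": 1000, "B": 2000, "C": 700}   # assumed on-disk values at the crash
FULL_LOG = [
    ("T0", "start"), ("T0", "A", 950), ("T0", "B", 2050), ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 600), ("T1", "commit"),
]


def crash_after(n):
    """Simulate a crash after the first n log records reached stable storage."""
    db = dict(INITIAL)
    recover_deferred(FULL_LOG[:n], db)
    return db


after_t0_updates = crash_after(3)   # no commit record yet: nothing is redone
after_t1_update = crash_after(6)    # only T0 is redone
after_t1_commit = crash_after(7)    # T0 and then T1 are redone
```

Because redo only overwrites items with logged new values, running `recover_deferred` again after a crash during recovery produces the same state.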

Immediate Database Modification
- The immediate database modification scheme allows database updates of an uncommitted transaction to be made as the writes are issued
- The update log record must be written before the database item is written
  - We assume that the log record is output directly to stable storage
  - This can be extended to postpone log record output, so long as, prior to execution of an output(B) operation for a data block B, all log records corresponding to items in B are flushed to stable storage
- Output of updated blocks can take place at any time before or after transaction commit
- The order in which blocks are output can be different from the order in which they are written

Transaction Recovery
- The recovery procedure has two operations instead of one:
  - undo(T_i) restores the value of all data items updated by T_i to their old values, going backwards from the last log record for T_i
  - redo(T_i) sets the value of all data items updated by T_i to the new values, going forward from the first log record for T_i
- Both operations must be idempotent (executing the operation multiple times has the same effect as executing it once)
- When recovering after failure:
  - Transaction T_i needs to be undone if the log contains the record <T_i, start>, but does not contain the record <T_i, commit>
  - Transaction T_i needs to be redone if the log contains both the record <T_i, start> and the record <T_i, commit>
- Undo operations are performed first, then redo operations

An Example
Transactions (executed serially, T_0 then T_1):
  T_0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
  T_1: read(C); C := C - 100; write(C)
Log under immediate modification, with the recovery action required if the system fails at the marked points:
  <T_0, start>
  <T_0, A, 1000, 950>
  <T_0, B, 2000, 2050>
      system failure here: undo(T_0): B is restored to 2000 and A to 1000
  <T_0, commit>
  <T_1, start>
  <T_1, C, 700, 600>
      system failure here: undo(T_1) and redo(T_0): C is restored to 700, and then A and B are set to 950 and 2050, respectively
  <T_1, commit>
      system failure here: redo(T_0) and redo(T_1): A and B are set to 950 and 2050, respectively, then C is set to 600
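The same three crash points can be checked with a sketch of undo/redo recovery. The tuple encoding, the function names and the two simulated disk states (no logged write reached the disk, or every logged write reached the disk) are assumptions made for illustration; under immediate modification any mix of the two is possible at crash time.

```python
def recover_immediate(log, db):
    """Undo/redo recovery for immediate modification (illustrative sketch).

    Log records: (txn, "start"), (txn, item, old_value, new_value), (txn, "commit").
    """
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    incomplete = started - committed
    for rec in reversed(log):              # undo pass: backwards, restore old values
        if len(rec) == 4 and rec[0] in incomplete:
            db[rec[1]] = rec[2]
    for rec in log:                        # redo pass: forwards, install new values
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]
    return db


OLD = {"A": 1000, "B": 2000, "C": 700}
LOG = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600), ("T1", "commit"),
]


def disk_with_all_logged_writes(prefix):
    """Disk state if every update logged in `prefix` had already been output."""
    db = dict(OLD)
    for rec in prefix:
        if len(rec) == 4:
            db[rec[1]] = rec[3]
    return db


results = {}
for n in (3, 6, 7):                        # the three crash points of the example
    prefix = LOG[:n]
    results[n] = [
        recover_immediate(prefix, dict(OLD)),
        recover_immediate(prefix, disk_with_all_logged_writes(prefix)),
    ]
```

Both starting states recover to the same values at each crash point, and because undo and redo only overwrite items with logged values, repeating the procedure changes nothing, which is the idempotence requirement stated above.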

Checkpoints
- Problems in the recovery procedure:
  - searching the entire log is time-consuming
  - we might unnecessarily redo transactions which have already output their updates to the database
- Streamline the recovery procedure by periodically performing checkpointing:
  1. Output all log records currently residing in main memory onto stable storage
  2. Output all modified buffer blocks to the disk
  3. Write a log record <checkpoint> onto stable storage

Recovery using Checkpoints
- During recovery we need to consider only the most recent transaction T_i that started before the checkpoint, and transactions that started after T_i
  1. Scan backwards from the end of the log to find the most recent <checkpoint> record
  2. Continue scanning backwards until a record <T_i, start> is found
     - Only the part of the log following this record needs to be considered; the earlier part of the log can be ignored, and can be erased whenever desired
  3. For all transactions starting from T_i or later with no <T_i, commit>, execute undo(T_i)
     - Done only in the case of immediate modification
  4. Scanning forward in the log, for all transactions starting from T_i or later with a <T_i, commit>, execute redo(T_i)

Checkpointing Example
[Figure: timeline showing transactions T_1 to T_4 relative to the checkpoint and the system failure]
- T_1 can be ignored (updates already output to disk)
- T_2 and T_3 redone
- T_4 undone
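A minimal sketch of how the checkpoint narrows the part of the log that recovery has to examine, for serial transactions. The log below is a made-up one that matches the outcome listed for T_1 to T_4; the record encoding and the function name `recovery_plan` are assumptions for illustration, and the sketch assumes the log contains at least one checkpoint record.

```python
def recovery_plan(log):
    """Return (undo, redo) transaction lists for serial execution with checkpoints.

    Records: (txn, "start"), (txn, item, old, new), (txn, "commit"), ("checkpoint",).
    Assumes at least one checkpoint record is present in the log.
    """
    # Step 1: position of the most recent checkpoint record.
    cp = max(i for i, r in enumerate(log) if r == ("checkpoint",))
    # Step 2: continue backwards to the nearest <T_i, start> record.
    begin = max((i for i in range(cp) if log[i][1:] == ("start",)), default=cp)
    tail = log[begin:]                     # earlier records can be ignored
    started = [r[0] for r in tail if r[1:] == ("start",)]
    committed = {r[0] for r in tail if r[1:] == ("commit",)}
    undo = [t for t in started if t not in committed]   # step 3
    redo = [t for t in started if t in committed]       # step 4
    return undo, redo


example_log = [
    ("T1", "start"), ("T1", "A", 1, 2), ("T1", "commit"),
    ("T2", "start"), ("T2", "B", 1, 2),
    ("checkpoint",),
    ("T2", "B", 2, 3), ("T2", "commit"),
    ("T3", "start"), ("T3", "C", 1, 2), ("T3", "commit"),
    ("T4", "start"), ("T4", "D", 1, 2),
]
undo, redo = recovery_plan(example_log)
```

T1 never appears in either list because its records all precede the start record found in step 2.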

Recovery With Concurrent Transactions
- We modify the log-based recovery schemes to allow multiple transactions to execute concurrently
  - All transactions share a single disk buffer and a single log
  - A buffer block can have data items updated by one or more transactions
- We assume concurrency control using strict two-phase locking, i.e., the updates of uncommitted transactions should not be visible to other transactions
  - Otherwise, how would undo be performed if T_1 updates A, then T_2 updates A, and finally T_1 has to abort?
- Logging is done as described earlier
  - Log records of different transactions may be interspersed in the log

Checkpointing
- The checkpointing technique and the actions taken on recovery have to be changed, since several transactions may be active when a checkpoint is performed
- Checkpoints are performed as before, except that the checkpoint log record is now of the form <checkpoint L>, where L is the list of transactions active at the time of the checkpoint
- For now we assume no updates are in progress while the checkpoint is carried out

Restart Recovery
When the system recovers from a crash, it first does the following:
  1. Initialize undo-list and redo-list to empty
  2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found
  3. For each record found during the backward scan:
     - if the record is <T_i, commit>, add T_i to redo-list
     - if the record is <T_i, start>, then if T_i is not in redo-list, add T_i to undo-list
  4. For every T_i in L, if T_i is not in redo-list, add T_i to undo-list
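The four steps above can be sketched as follows. The record encoding, the function name `build_lists` and the small log (transactions T5 to T7) are made up for illustration and do not come from the slides.

```python
def build_lists(log):
    """Build (undo_list, redo_list) by scanning back to the last checkpoint.

    Records: (txn, "start"), (txn, item, old, new), (txn, "commit"),
    and ("checkpoint", [active transactions]).
    """
    undo_list, redo_list = [], []          # step 1: both lists start empty
    active_at_checkpoint = []
    for rec in reversed(log):              # step 2: backward scan
        if rec[0] == "checkpoint":
            active_at_checkpoint = rec[1]
            break
        txn, kind = rec[0], rec[1]         # step 3: classify each record
        if kind == "commit":
            redo_list.append(txn)
        elif kind == "start" and txn not in redo_list:
            undo_list.append(txn)
    for txn in active_at_checkpoint:       # step 4: transactions listed in L
        if txn not in redo_list:
            undo_list.append(txn)
    return undo_list, redo_list


sample_log = [
    ("T5", "start"), ("T5", "X", 0, 1),
    ("checkpoint", ["T5"]),
    ("T6", "start"), ("T6", "Y", 0, 1), ("T6", "commit"),
    ("T7", "start"), ("T7", "Z", 0, 1),
]
undo_list, redo_list = build_lists(sample_log)
```

T5 reaches the undo-list only through step 4: its start record lies before the checkpoint, so the backward scan never sees it.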

Redoing and Undoing
At this point undo-list consists of incomplete transactions which must be undone, and redo-list consists of finished transactions that must be redone.
Recovery now continues as follows:
  1. Scan the log backwards from the most recent record, stopping when <T_i, start> records have been encountered for every T_i in undo-list
     - During the scan, perform undo for each log record that belongs to a transaction in undo-list
  2. Locate the most recent <checkpoint L> record
  3. Scan the log forwards from the <checkpoint L> record until the end of the log
     - During the scan, perform redo for each log record that belongs to a transaction in redo-list

Example of Recovery
Log at the time of the crash:
  <T_0, start>
  <T_0, A, 0, 10>
  <T_0, commit>
  <T_1, start>
  <T_1, B, 0, 10>
  <T_2, start>
  <T_2, C, 0, 10>
  <T_2, C, 10, 20>
  <checkpoint {T_1, T_2}>
  <T_3, start>
  <T_3, A, 10, 20>
  <T_4, start>
  <T_3, D, 0, 10>
  <T_4, E, 0, 10>
  <T_3, commit>
Steps:
  1. Scan log backwards: redo-list = {T_3}; undo-list = {T_4, T_1, T_2}
  2. Perform undo
  3. Perform redo
Values of the data items as recovery proceeds:
                                      A    B    C    D    E
  initial values                      0    0    0    0    0
  current value (at the crash)       20   10   20   10   10
  after undoing <T_4, E, 0, 10>      20   10   20   10    0
  after undoing <T_2, C, 10, 20>     20   10   10   10    0
  after undoing <T_2, C, 0, 10>      20   10    0   10    0
  after undoing <T_1, B, 0, 10>      20    0    0   10    0
  after redoing T_3 (final)          20    0    0   10    0
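The undo and redo passes of this example can be reproduced with the sketch below, which takes the two lists from step 1 as given. The tuple encoding of log records and the function names are assumptions; the log contents, the lists and the values at the time of the crash are those of the example.

```python
log = [
    ("T0", "start"), ("T0", "A", 0, 10), ("T0", "commit"),
    ("T1", "start"), ("T1", "B", 0, 10),
    ("T2", "start"), ("T2", "C", 0, 10), ("T2", "C", 10, 20),
    ("checkpoint", ["T1", "T2"]),
    ("T3", "start"), ("T3", "A", 10, 20),
    ("T4", "start"), ("T3", "D", 0, 10), ("T4", "E", 0, 10),
    ("T3", "commit"),
]
db = {"A": 20, "B": 10, "C": 20, "D": 10, "E": 10}   # values at the crash
undo_list = {"T4", "T1", "T2"}
redo_list = {"T3"}


def undo_pass(log, db, undo_list):
    """Scan backwards, restoring old values, until every start record is seen."""
    pending = set(undo_list)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] not in pending:
            continue                       # other transactions and checkpoints
        if rec[1] == "start":
            pending.discard(rec[0])
        elif len(rec) == 4:
            db[rec[1]] = rec[2]            # restore the old value


def redo_pass(log, db, redo_list):
    """Scan forwards from the most recent checkpoint, installing new values."""
    cp = max(i for i, r in enumerate(log) if r[0] == "checkpoint")
    for rec in log[cp + 1:]:
        if len(rec) == 4 and rec[0] in redo_list:
            db[rec[1]] = rec[3]


undo_pass(log, db, undo_list)
after_undo = dict(db)
redo_pass(log, db, redo_list)
```

Note that the undo pass runs past the checkpoint record, because T_1 and T_2 started before it, while the redo pass only needs the records after the checkpoint.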


Log Record Buffering

- With log record buffering, log records are buffered in main memory instead of being output directly to stable storage
- Log records are output to stable storage when a block of log records in the buffer is full, or when a log force operation is executed
- A log force is performed to commit a transaction by forcing all its log records (including the commit record) to stable storage
- Several log records can thus be output using a single output operation, reducing the I/O cost
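A minimal sketch of log record buffering, assuming a block holds a fixed number of records. The class `LogBuffer`, its `block_size` parameter and the list standing in for stable storage are hypothetical names used only for illustration.

```python
class LogBuffer:
    """In-memory log buffer; block_size is the number of records per block."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.buffer = []       # records not yet on stable storage
        self.stable = []       # stand-in for the log on stable storage
        self.output_ops = 0    # number of output operations performed

    def _output(self):
        if self.buffer:
            self.stable.extend(self.buffer)   # creation order is preserved
            self.buffer.clear()
            self.output_ops += 1

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) == self.block_size:   # block is full
            self._output()

    def force(self):
        """Log force: output every buffered record."""
        self._output()

    def commit(self, txn):
        self.append((txn, "commit"))
        self.force()   # the commit record must reach stable storage


log = LogBuffer(block_size=4)
log.append(("T1", "start"))
log.append(("T1", "A", 0, 10))
log.commit("T1")
print(len(log.stable), log.output_ops)
```

Here three log records reach stable storage with a single output operation, which is the I/O saving the slide describes.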

Log Record Buffering (cont.)

The rules below must be followed if log records are buffered:
  1. Log records are output to stable storage in the order in which they are created
  2. Transaction Ti enters the commit state only when the log record <Ti, commit> has been output to stable storage
  3. Before a block of data in main memory is output to the database, all log records pertaining to data in that block must have been output to stable storage
     - This rule is called the write-ahead logging (WAL) rule

Database Buffering

- The database maintains an in-memory buffer of data blocks
  - When a new block is needed and the buffer is full, an existing block must be removed from the buffer
  - If the block chosen for removal has been updated, it must be output to disk
- As a result of the WAL rule, if a block with uncommitted updates is output to disk, the log records for those updates are output to the log on stable storage first
- No updates should be in progress on a block when it is output to disk:
  - Before writing a data item, a transaction acquires an exclusive lock on the block containing the data item
  - The lock can be released once the write is completed
  - Such locks, held for a short duration, are called latches
  - Before a block is output to disk, the system acquires an exclusive latch on the block
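The interaction between the WAL rule and block eviction can be sketched as follows. `BufferManager` and its methods are hypothetical, latching is left out, and dictionaries and lists stand in for the disk and the stable log.

```python
class BufferManager:
    """Toy buffer manager that applies the WAL rule when evicting a block."""

    def __init__(self):
        self.log_buffer = []   # (lsn, txn, block_id, value), not yet stable
        self.stable_log = []   # stand-in for the log on stable storage
        self.blocks = {}       # block_id -> in-memory contents
        self.dirty = set()
        self.disk = {}         # stand-in for the database on disk
        self.next_lsn = 0

    def write(self, block_id, value, txn):
        # Log the update first, then change the buffered block.
        self.log_buffer.append((self.next_lsn, txn, block_id, value))
        self.next_lsn += 1
        self.blocks[block_id] = value
        self.dirty.add(block_id)

    def force_log(self, up_to_lsn):
        # Rule 1: records leave the buffer in creation order.
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_buffer.pop(0))

    def evict(self, block_id):
        if block_id in self.dirty:
            # WAL rule: log records for this block go to stable storage first.
            lsns = [r[0] for r in self.log_buffer if r[2] == block_id]
            if lsns:
                self.force_log(max(lsns))
            self.disk[block_id] = self.blocks[block_id]
            self.dirty.discard(block_id)
        del self.blocks[block_id]


bm = BufferManager()
bm.write("B1", "a", "T1")   # LSN 0
bm.write("B2", "b", "T2")   # LSN 1
bm.write("B1", "c", "T1")   # LSN 2
bm.write("B3", "d", "T2")   # LSN 3
bm.evict("B1")
print([r[0] for r in bm.stable_log], [r[0] for r in bm.log_buffer])
```

Evicting B1 forces LSNs 0 to 2: the record for B2 goes out as well, because records must be output in creation order, while LSN 3 can stay buffered.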

Outline

- ARIES Optimizations
- ARIES Data Structures
- More ARIES Features

ARIES: Algorithms for Recovery and Isolation Exploiting Semantics

- ARIES is a state-of-the-art recovery method
  - It incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery
- ARIES uses:
  - log sequence numbers (LSNs) to identify log records, and stores LSNs in pages to identify which updates have already been applied to a database page
  - physiological redo: pages are identified physically, but operations within the page are logged logically
  - a dirty page table to avoid unnecessary redos during recovery
  - fuzzy checkpointing, which only records information about dirty pages and does not require dirty pages to be written out at checkpoint time

ARIES Optimizations

- Physiological redo: the affected page is physically identified; the action within the page can be logical
  - Used to reduce logging overheads
    - E.g. when a record is deleted and all other records have to be moved to fill the gap
    - Physiological redo can log just the record deletion
    - Physical redo would require logging of old and new values for much of the page
  - Requires the page to be output to disk atomically
    - Easy to achieve with hardware RAID, also supported by some disk systems
    - Incomplete page output can be detected by checksum techniques, but extra actions are required for recovery
    - Treated as a media failure
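The record-deletion case can be illustrated with a toy page modelled as a Python list of slots. The functions and the log-entry layouts below are assumptions made for this sketch, not an actual page format.

```python
def delete_record(page, slot):
    """Delete page[slot] and shift the later records left to fill the gap."""
    return page[:slot] + page[slot + 1:] + [None]

def physical_log(page_id, before, after):
    """One (page, slot, old, new) entry for every slot whose value changed."""
    return [(page_id, i, old, new)
            for i, (old, new) in enumerate(zip(before, after)) if old != new]

def physiological_log(page_id, slot):
    """A single logical operation on a physically identified page."""
    return [(page_id, "delete", slot)]

def redo_physiological(page, record):
    """Redo by re-executing the logical operation inside the page."""
    _, op, slot = record
    if op != "delete":
        raise ValueError("only 'delete' is modelled in this sketch")
    return delete_record(page, slot)


before = ["r0", "r1", "r2", "r3", "r4"]
after = delete_record(before, 1)
phys = physical_log("P7", before, after)
physio = physiological_log("P7", 1)
print(len(phys), len(physio))
```

Deleting slot 1 changes four slots, so physical logging needs four entries with old and new values, while the physiological log holds one entry whose redo reproduces the same page.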

Log Sequence Numbers

- A log sequence number (LSN) identifies each log record
  - Must be sequentially increasing
  - Typically an offset from the beginning of the log file, to allow fast access
    - Easily extended to handle multiple log files
- Each page contains a PageLSN, which is the LSN of the last log record whose effects are reflected on the page
- To update a page:
  - X-latch the page, and write the log record
  - Update the page
  - Record the LSN of the log record in PageLSN
  - Unlock the page
- A page flush to disk S-latches the page
  - Thus the page state on disk is operation consistent
- PageLSN is used during recovery to prevent repeated redo
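A small sketch of how PageLSN prevents repeated redo. The `Page` class, the list-based log with implicit LSNs (the list index) and the helper names are assumptions. Latching is omitted.

```python
import copy

class Page:
    def __init__(self):
        self.data = {}
        self.page_lsn = -1   # no log record is reflected on the page yet

log = []   # the LSN of a record is its index in this list

def update(page, key, value):
    """Write the log record, update the page, then record the LSN in PageLSN."""
    lsn = len(log)
    log.append((key, value))
    page.data[key] = value
    page.page_lsn = lsn
    return lsn

def redo(page, lsn):
    """Reapply log record lsn unless the page already reflects it."""
    if page.page_lsn >= lsn:
        return False
    key, value = log[lsn]
    page.data[key] = value
    page.page_lsn = lsn
    return True


buffer_page = Page()
update(buffer_page, "x", 1)              # LSN 0
disk_page = copy.deepcopy(buffer_page)   # page flushed with PageLSN 0
update(buffer_page, "x", 2)              # LSN 1, never flushed

# After a crash only disk_page and the log survive.
applied = [redo(disk_page, lsn) for lsn in range(len(log))]
applied_again = [redo(disk_page, lsn) for lsn in range(len(log))]
print(applied, applied_again, disk_page.data)
```

The first recovery run reapplies only LSN 1. Running recovery a second time applies nothing, because PageLSN already covers both records.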

Log Records

- Each log record contains the LSN of the previous log record of the same transaction:
    <LSN, Ti, PrevLSN, RedoInfo, UndoInfo>
  - The LSN in the log record may be implicit
- A special redo-only log record, called a compensation log record (CLR), is used to log actions taken during recovery that never need to be undone:
    <LSN, Ti, UndoNextLSN, RedoInfo>
  - A CLR has a field UndoNextLSN to note the next (earlier) record to be undone
    - Records in between would have already been undone
    - Required to avoid repeated undo of already undone actions
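The two record layouts and the role of UndoNextLSN can be sketched with Python dataclasses. The class names, the field names and the helper `records_still_to_undo` are hypothetical. The sketch also assumes that a transaction's first record has no predecessor (`None`).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UpdateRecord:
    lsn: int
    txn: str
    prev_lsn: Optional[int]        # previous record of the same transaction
    redo_info: str
    undo_info: str

@dataclass
class CLR:
    lsn: int
    txn: str
    undo_next_lsn: Optional[int]   # next (earlier) record still to be undone
    redo_info: str                 # redo-only: a CLR is never undone

def records_still_to_undo(log, last_lsn):
    """Follow PrevLSN / UndoNextLSN backwards from a transaction's last record."""
    pending = []
    lsn = last_lsn
    while lsn is not None:
        rec = log[lsn]
        if isinstance(rec, CLR):
            lsn = rec.undo_next_lsn    # skip records that were already undone
        else:
            pending.append(rec.lsn)
            lsn = rec.prev_lsn
    return pending


log = {
    1: UpdateRecord(1, "T1", None, "set A=1", "set A=0"),
    2: UpdateRecord(2, "T1", 1, "set B=1", "set B=0"),
    3: UpdateRecord(3, "T1", 2, "set C=1", "set C=0"),
    4: CLR(4, "T1", 2, "set C=0"),     # compensates record 3
}
print(records_still_to_undo(log, 4))
```

Starting from the CLR at LSN 4, the walk jumps straight to LSN 2, so record 3 is not undone a second time.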

Dirty Page Table

- Dirty page table: list of pages in the buffer that have been updated
  - Contains, for each such page:
    - PageLSN of the page
    - RecLSN, an LSN such that log records before this LSN have already been applied to the page version on disk
      - Set to the current end of log when a page is inserted into the dirty page table (just before being updated)
      - Recorded in checkpoints; helps to minimize redo work
- Checkpoint log record
  - Contains the DirtyPageTable and the list of active transactions
  - For each active transaction, LastLSN: the LSN of the last log record written by the transaction
- A fixed position on disk notes the LSN of the last completed checkpoint log record
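A minimal sketch of dirty page table maintenance and of the checkpoint record contents. The class and function names are hypothetical. The LSN passed to `note_update` is assumed to be the LSN the update record receives, which equals the end of the log just before the update.

```python
class DirtyPageTable:
    def __init__(self):
        self.entries = {}   # page_id -> [PageLSN, RecLSN]

    def note_update(self, page_id, lsn):
        if page_id not in self.entries:
            # First update since the page was last clean: RecLSN is fixed here.
            self.entries[page_id] = [lsn, lsn]
        else:
            self.entries[page_id][0] = lsn   # only PageLSN advances

    def note_flush(self, page_id):
        # The disk version is up to date again, so the page is no longer dirty.
        self.entries.pop(page_id, None)

def checkpoint_record(dpt, last_lsn_by_txn):
    """Contents of a checkpoint log record: the DPT and the active transactions."""
    return {"dpt": {p: tuple(e) for p, e in dpt.entries.items()},
            "active": dict(last_lsn_by_txn)}


# Updates and the flush that precede the checkpoint in the ARIES example log.
dpt = DirtyPageTable()
dpt.note_update("P5", 5)    # T2 updates P5
dpt.note_update("P3", 10)   # T1 updates P3
dpt.note_flush("P3")        # <flush P3>
dpt.note_update("P3", 20)   # T2 updates P3
ckpt = checkpoint_record(dpt, {"T1": 10, "T2": 20})
print(ckpt)
```

No dirty pages are written out when the checkpoint record is built, which is what makes the checkpoint fuzzy.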

ARIES Recovery Algorithm

  1. Analysis pass. Determines:
     - which transactions to undo
     - which pages were dirty (disk version not up to date) at the time of the crash
     - RedoLSN: the LSN from which redo should start
  2. Redo pass:
     - Repeats history, redoing all actions from RedoLSN
     - RecLSN and PageLSNs are used to avoid redoing actions already reflected on the page
  3. Undo pass:
     - Rolls back all incomplete transactions
     - Transactions whose abort was completed earlier are not undone

The Analysis Pass I

- Starts from the last complete checkpoint log record
  - Reads in the DirtyPageTable from the log record
  - Sets undo-list = list of transactions in the checkpoint log record
  - Reads the LSN of the last log record for each transaction in undo-list from the checkpoint log record
  - Sets RedoLSN = min of the RecLSNs of all pages in the DirtyPageTable
    - If no pages are dirty, RedoLSN = the checkpoint record's LSN
- At the end of the analysis pass:
  - RedoLSN determines where to start the redo pass
  - The RecLSN of each page in the DirtyPageTable is used to minimize redo work
  - All transactions in undo-list need to be rolled back

The Analysis Pass II

- Scans forward from the checkpoint
  - If any log record is found for a transaction not in undo-list, adds the transaction to undo-list
  - Keeps track of the last log record for each transaction in undo-list
    - Updates the transaction table
  - Whenever an update log record is found:
    - If the page is not in the DirtyPageTable, it is added with RecLSN set to the LSN of the update log record
    - Updates the PageLSN
  - If a transaction end log record is found, deletes the transaction from undo-list and from the transaction table
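The two analysis slides can be condensed into a short sketch, run here on the log used in the "Example: Analysis Pass" slides. The tuple encoding of records and the function name `analysis` are assumptions. The sketch also treats a commit record as the transaction end record, and it uses one dictionary for both the undo-list and the transaction table.

```python
# LSN -> record (hypothetical encoding). The checkpoint carries the DPT as
# page -> [PageLSN, RecLSN] and the transaction table as txn -> LastLSN.
LOG = {
    0: ("start", "T1"), 1: ("start", "T2"),
    5: ("update", "T2", "P5"), 10: ("update", "T1", "P3"),
    20: ("update", "T2", "P3"),
    30: ("checkpoint", {"P3": [20, 20], "P5": [5, 5]}, {"T1": 10, "T2": 20}),
    35: ("start", "T3"), 40: ("update", "T1", "P3"), 50: ("commit", "T1"),
    60: ("update", "T3", "P1"), 70: ("update", "T2", "P5"),
}

def analysis(log, checkpoint_lsn):
    _, ckpt_dpt, active = log[checkpoint_lsn]
    dpt = {page: list(entry) for page, entry in ckpt_dpt.items()}
    undo_list = dict(active)   # txn -> LSN of its last log record
    # RedoLSN: smallest RecLSN, or the checkpoint LSN if no page is dirty.
    redo_lsn = min((entry[1] for entry in dpt.values()), default=checkpoint_lsn)
    for lsn in sorted(l for l in log if l > checkpoint_lsn):
        rec = log[lsn]
        kind, txn = rec[0], rec[1]
        if kind == "commit":             # assumed to end the transaction
            undo_list.pop(txn, None)
            continue
        undo_list[txn] = lsn             # add the transaction or advance LastLSN
        if kind == "update":
            page = rec[2]
            if page not in dpt:
                dpt[page] = [lsn, lsn]   # RecLSN = LSN of this update record
            else:
                dpt[page][0] = lsn       # PageLSN advances, RecLSN stays
    return redo_lsn, dpt, undo_list


redo_lsn, dpt, undo_list = analysis(LOG, 30)
print(redo_lsn, dpt, undo_list)
```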

The Redo Pass

- Repeats history by replaying every action not already reflected in the page on disk, as follows:
  - Scans forward from RedoLSN
  - Whenever an update log record is found:
    - If the page is not in the DirtyPageTable, or the LSN of the log record is less than the RecLSN of the page in the DirtyPageTable, skips the log record
    - Otherwise fetches the page from disk; if the PageLSN of the page fetched from disk is less than the LSN of the log record, redoes the log record
- NOTE: if a log record is not redone because of either test, its effects have already appeared on the page. The first test avoids even fetching the page from disk!
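The two tests of the redo pass can be sketched as follows, using the update records and the dirty page table from the "Example: Redo Pass" slides. The record encoding and the dictionary `disk_page_lsn`, which stands in for the PageLSN of pages fetched from disk, are assumptions. P3 is assumed to carry PageLSN 10 on disk because it was flushed right after LSN 10, and P5 and P1 are assumed never to have been flushed.

```python
# Update records only: LSN -> (kind, txn, page).
LOG = {
    5: ("update", "T2", "P5"), 10: ("update", "T1", "P3"),
    20: ("update", "T2", "P3"), 40: ("update", "T1", "P3"),
    60: ("update", "T3", "P1"), 70: ("update", "T2", "P5"),
}
DPT = {"P3": [40, 20], "P5": [70, 5], "P1": [60, 60]}   # page -> [PageLSN, RecLSN]

def redo_pass(log, redo_lsn, dpt, disk_page_lsn):
    decisions = {}
    for lsn in sorted(l for l in log if l >= redo_lsn):
        kind, _, page = log[lsn]
        if kind != "update":
            continue
        # First test: decided from the dirty page table, no page fetch needed.
        if page not in dpt or lsn < dpt[page][1]:
            decisions[lsn] = "skip"
            continue
        # Second test: compare with the PageLSN of the page fetched from disk.
        if disk_page_lsn.get(page, -1) < lsn:
            disk_page_lsn[page] = lsn   # reapply the update, advance PageLSN
            decisions[lsn] = "redo"
        else:
            decisions[lsn] = "skip"
    return decisions


disk_page_lsn = {"P3": 10}   # assumed on-disk PageLSNs at the crash
decisions = redo_pass(LOG, 5, DPT, disk_page_lsn)
print(decisions)
```

The record at LSN 10 is skipped by the first test alone (10 is less than the RecLSN 20 of P3), so P3 is not fetched for it.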

The Undo Pass

- Performs a backward scan of the log, undoing all transactions in undo-list
  - The next LSN to be undone for each transaction is set to the LSN of the last log record for the transaction found by the analysis pass
  - At each step, picks the largest of these LSNs to undo, skips back to it and undoes it
  - After undoing a log record:
    - For ordinary log records, sets the next LSN to be undone for the transaction to the PrevLSN noted in the log record
    - For compensation log records (CLRs), sets the next LSN to be undone to the UndoNextLSN noted in the log record
      - All intervening records are skipped since they would have been undone already

Undo Actions

- When an undo is performed for an update log record:
  - Generate a CLR containing the undo action performed
  - Set UndoNextLSN of the CLR to the PrevLSN value of the update log record
- ARIES supports partial rollback
  - Used, for instance, to handle deadlocks by rolling back just enough to release the locks
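The undo pass and the generation of CLRs can be sketched on the log from the "Example: Undo Pass" slides. The record encoding and the function name `undo_pass` are assumptions, and the sketch only records which CLRs would be written; it does not modify any pages.

```python
# LSN -> ("start", txn) | ("update", txn, PrevLSN, page) | ("clr", txn, UndoNextLSN, page)
LOG = {
    0: ("start", "T1"), 1: ("start", "T2"),
    5: ("update", "T2", 1, "P5"), 10: ("update", "T1", 0, "P3"),
    20: ("update", "T2", 5, "P3"), 35: ("start", "T3"),
    40: ("update", "T1", 10, "P3"), 50: ("commit", "T1"),
    60: ("update", "T3", 35, "P1"), 70: ("update", "T2", 20, "P5"),
}

def undo_pass(log, undo_list):
    log = dict(log)                # work on a copy; CLRs are appended to it
    next_undo = dict(undo_list)    # txn -> next LSN to be undone
    next_lsn = max(log) + 1
    clrs = []
    while next_undo:
        txn = max(next_undo, key=next_undo.get)   # largest LSN first
        rec = log[next_undo[txn]]
        if rec[0] == "start":
            del next_undo[txn]                    # fully rolled back
        elif rec[0] == "clr":
            next_undo[txn] = rec[2]               # jump to UndoNextLSN
        else:
            # Undo the update: write a CLR whose UndoNextLSN is the PrevLSN.
            log[next_lsn] = ("clr", txn, rec[2], rec[3])
            clrs.append((next_lsn, txn, rec[2], rec[3]))
            next_lsn += 1
            next_undo[txn] = rec[2]
    return clrs


clrs = undo_pass(LOG, {"T2": 70, "T3": 60})
print(clrs)
```

If the system crashed again after CLR 71 had been written, the next undo pass would start T2 at LSN 71 and follow its UndoNextLSN to 20, so the update at LSN 70 is not undone twice.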

Example: Analysis Pass

Log at the time of the crash (update records have the form <Ti, PrevLSN, {page, operation}>):

  00: <T1, start>
  01: <T2, start>
  05: <T2, 01, {P5, updateD}>
  10: <T1, 00, {P3, updateA}>
      <flush P3>
  20: <T2, 05, {P3, updateB}>
  30: <checkpoint {DPT}, {TT}>
  35: <T3, start>
  40: <T1, 10, {P3, updateX}>
  50: <T1, commit>
  60: <T3, 35, {P1, updateC}>
  70: <T2, 20, {P5, updateD}>

Tables read from the checkpoint record at LSN 30:

  Dirty Page Table              Transaction Table
  PID  PageLSN  RecLSN          TID  LastLSN
  P3   20       20              T1   10
  P5   05       05              T2   20

  RedoLSN:   05 (min of the RecLSNs)
  Undo List: T1:10, T2:20

Forward scan from the checkpoint:

  LSN  Record          Effect
  35   <T3, start>     T3 added to the Transaction Table (LastLSN 35); Undo List T1:10, T2:20, T3:35
  40   T1 updates P3   PageLSN of P3 = 40; LastLSN of T1 = 40; Undo List T1:40, T2:20, T3:35
  50   <T1, commit>    T1 removed from the Transaction Table; Undo List T2:20, T3:35
  60   T3 updates P1   P1 added to the Dirty Page Table (PageLSN 60, RecLSN 60); Undo List T2:20, T3:60
  70   T2 updates P5   PageLSN of P5 = 70; LastLSN of T2 = 70; Undo List T2:70, T3:60

Result of the analysis pass:

  Dirty Page Table              Transaction Table
  PID  PageLSN  RecLSN          TID  LastLSN
  P3   40       20              T2   70
  P5   70       05              T3   60
  P1   60       60

  RedoLSN:   05
  Undo List: T2:70, T3:60

Example: Redo Pass

The redo pass scans forward from RedoLSN = 05:

  00: <T1, start>
  01: <T2, start>
  05: <T2, 01, {P5, updateD}>      redo
  10: <T1, 00, {P3, updateA}>      skip (LSN 10 is less than the RecLSN 20 of P3)
      <flush P3>
  20: <T2, 05, {P3, updateB}>      redo
  30: <checkpoint {DPT}, {TT}>
  35: <T3, start>
  40: <T1, 10, {P3, updateX}>      redo
  50: <T1, commit>
  60: <T3, 35, {P1, updateC}>      redo
  70: <T2, 20, {P5, updateD}>      redo

The tables are unchanged from the end of the analysis pass:

  Dirty Page Table              Transaction Table
  PID  PageLSN  RecLSN          TID  LastLSN
  P3   40       20              T2   70
  P5   70       05              T3   60
  P1   60       60

  RedoLSN:   05
  Undo List: T2:70, T3:60

Example: Undo Pass

The undo pass starts with Undo List T2:70, T3:60 and appends CLRs 71 to 74 to the log (the second field of a CLR is its UndoNextLSN):

  00: <T1, start>
  01: <T2, start>
  05: <T2, 01, {P5, updateD}>
  10: <T1, 00, {P3, updateA}>
      <flush P3>
  20: <T2, 05, {P3, updateB}>
  30: <checkpoint {DPT}, {TT}>
  35: <T3, start>
  40: <T1, 10, {P3, updateX}>
  50: <T1, commit>
  60: <T3, 35, {P1, updateC}>
  70: <T2, 20, {P5, updateD}>
  71: clr: <T2, 20, {P5, undoupD}>
  72: clr: <T3, 35, {P1, undoupC}>
  73: clr: <T2, 05, {P3, undoupB}>
  74: clr: <T2, 01, {P5, undoupD}>

Steps:

  Step  Action                                      PageLSN change   Undo List afterwards
  1     Undo LSN 70 (T2), write CLR 71              P5 = 71          T2:20, T3:60
  2     Undo LSN 60 (T3), write CLR 72              P1 = 72          T2:20, T3:35
  3     LSN 35 is <T3, start>: T3 is rolled back                     T2:20
  4     Undo LSN 20 (T2), write CLR 73              P3 = 73          T2:05
  5     Undo LSN 05 (T2), write CLR 74              P5 = 74          T2:01
  6     LSN 01 is <T2, start>: T2 is rolled back                     (empty)

Tables at the end of the undo pass:

  Dirty Page Table              Transaction Table
  PID  PageLSN  RecLSN          TID  LastLSN
  P3   73       20              T2   70
  P5   74       05              T3   60
  P1   72       60

  Undo List: (empty)

Other ARIES Features I

- Recovery independence
  - Pages can be recovered independently of others
    - E.g. if some disk pages fail they can be recovered from a backup while other pages are being used
- Savepoints:
  - Transactions can record savepoints and roll back to a savepoint
    - Useful for complex transactions
    - Also used to roll back just enough to release locks on deadlock

Other ARIES Features II

- Fine-grained locking:
  - Index concurrency algorithms that permit tuple-level locking on indices can be used
    - These require logical undo, rather than physical undo
- Recovery optimizations:
  - The dirty page table can be used to prefetch pages during redo
  - Out-of-order redo is possible: redo can be postponed on a page being fetched from disk, and performed when the page is fetched
    - Meanwhile other log records can continue to be processed

End of Chapter 17