Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Similar documents
Chapter 13: Query Processing. Basic Steps in Query Processing

SQL Query Evaluation. Winter Lecture 23

Query Processing C H A P T E R12. Practice Exercises

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

Database Tuning and Physical Design: Execution of Transactions

Textbook and References

Lecture 7: Concurrency control. Rasmus Pagh

Data Warehousing und Data Mining

Databases and Information Systems 1 Part 3: Storage Structures and Indices

CIS 631 Database Management Systems Sample Final Exam

Data Management in the Cloud

Transaction Management Overview

Contents RELATIONAL DATABASES

Multi-dimensional index structures Part I: motivation

Course Content. Transactions and Concurrency Control. Objectives of Lecture 4 Transactions and Concurrency Control

Expert Oracle. Database Architecture. Techniques and Solutions. 10gr, and 11g Programming. Oracle Database 9/, Second Edition.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

SQL Server Query Tuning

Concurrency Control. Module 6, Lectures 1 and 2

CS 525 Advanced Database Organization - Spring 2013 Mon + Wed 3:15-4:30 PM, Room: Wishnick Hall 113

Oracle Database 11g: SQL Tuning Workshop Release 2

Concurrency Control: Locking, Optimistic, Degrees of Consistency

Recovery System C H A P T E R16. Practice Exercises

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur

Transactional properties of DBS

Introduction to Database Management Systems

Transactions: Definition. Transactional properties of DBS. Transactions: Management in DBS. Transactions: Read/Write Model

Chapter 6 The database Language SQL as a tutorial

Oracle Database 11g: SQL Tuning Workshop

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps

Transactions. SET08104 Database Systems. Napier University

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

Overview of Storage and Indexing

Datenbanksysteme II: Implementation of Database Systems Implementing Joins

Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server

Physical DB design and tuning: outline

Review: The ACID properties

COS 318: Operating Systems

TYPICAL QUESTIONS & ANSWERS

2 Associating Facts with Time

DATABASE DESIGN - 1DL400

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina

How To Write A Transaction System

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

Performance Tuning for the Teradata Database

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

Concurrency control. Concurrency problems. Database Management System

Module 3 (14 hrs) Transactions : Transaction Processing Systems(TPS): Properties (or ACID properties) of Transactions Atomicity Consistency

Inside the PostgreSQL Query Optimizer

Physical Data Organization

Answer Key. UNIVERSITY OF CALIFORNIA College of Engineering Department of EECS, Computer Science Division

Chapter 14: Recovery System

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

Lecture 1: Data Storage & Index

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Recovery and the ACID properties CMPUT 391: Implementing Durability Recovery Manager Atomicity Durability

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

File Management. Chapter 12

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

A Shared-nothing cluster system: Postgres-XC

CS 245 Final Exam Winter 2013

Distributed Databases. Concepts. Why distributed databases? Distributed Databases Basic Concepts

Optimizing Performance. Training Division New Delhi

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

Storage and File Structure

Transaction Processing Monitors

! Volatile storage: ! Nonvolatile storage:

B2.2-R3: INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Principles of Distributed Database Systems

D B M G Data Base and Data Mining Group of Politecnico di Torino

Transactions and the Internet

Data storage Tree indexes

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

Database Internals (Overview)

MapReduce and the New Software Stack

Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies

LearnFromGuru Polish your knowledge

Roadmap DB Sys. Design & Impl. Detailed Roadmap. Paper. Transactions - dfn. Reminders: Locking and Consistency

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

2) What is the structure of an organization? Explain how IT support at different organizational levels.

DATA WAREHOUSING FOR SOCIAL NETWORKING AND HUMAN MOBILITY RESEARCH AND DETECTING REAL-WORLD EVENTS IN SPACE AND TIME USING FEATURE CLUSTERING

Hekaton: SQL Server s Memory-Optimized OLTP Engine

High-Performance Concurrency Control Mechanisms for Main-Memory Databases

Chapter 15: Recovery System

Question 1. Relational Data Model [17 marks] Question 2. SQL and Relational Algebra [31 marks]

Redo Recovery after System Crashes

Chapter 16: Recovery System

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

Part I: Entity Relationship Diagrams and SQL (40/100 Pt.)

Distributed Databases

Efficient Processing of Joins on Set-valued Attributes

IBM DB2: LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs

Transcription:

Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1

Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2

Indexing Used for faster query processing Tree indexes (B+-trees) Good for equality and range queries Hash indexes Good only for equality queries Multi-dimensional indexes (R-trees, Grid-Files) Good for queries involving two or more attributes Specialized indexes (Bitmaps) 3

Clustering (primary) B+-tree on candidate key 41 11 21 30 45 51 1 3 11 13 15 21 23 30 33 41 43 45 47 51 53 record with search key 1 record with search key 3 FILE WITH RECORDS This example corresponds to dense B+-tree index: Every search key value appears in a leaf node You may also have sparse B+-tree, e.g., entries in leaf nodes correspond to pages 4

Non-clustering (secondary) B+-tree on candidate key 41 11 21 30 45 51 1 3 11 13 15 21 23 30 33 41 43 45 47 51 53 FILE WITH RECORDS record with search key 11 record with search key 3 record with search key 1 Should be always dense 5

Join Algorithms - Block Nested Loops r JOIN s, where r is the outer relation and s the inner relation for each block B r of r do begin for each block B s of s do begin for each tuple t r in B r do begin for each tuple t s in B s do begin if (t r,t s ) satisfies the join condition add (t r,t s ) to the result Use M 2 disk pages as blocking unit for outer relation, where M = memory size; use remaining 2 pages to buffer inner relation and output 6

Join Algorithms - Index Nested Loops Index lookups can replace file scans if join is an equi-join or natural join and an index is available on the inner relation s join attribute Can construct an index just to compute a join. For each tuple t r in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple t r. 7

Join Algorithms - Merge Join Sort both files on the join attribute using external sorting Scan the two sorted files and produce results for records that match on the join attribute Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory) Can be used only for equi-joins and natural joins 8

Join Algorithms - Hash Join (Partition step) Partition both files on the same number of buckets using the same hash function on the join attribute. Essentially we create a hash file organization for both files (Join step) For each bucket of the small file (build input): Read the entire bucket in memory Read the corresponding bucket of the other file (probe input) page by page and produce results 9

Query Processing Algorithms for other operations Duplicate Elimination (Projection) based on Sorting or Hashing Set operations (e.g., Union, Intersection) based on Merge Join or Hash Join Combination of Operations Materialization: store results of an intermediate expression on disk and read them for the next operation Pipelining: pass on tuples to subsequent operations even as an operation is being executed (cheaper but not applicable with all algorithms) 10

Query Optimization Goal: to find a good evaluation plan to be executed An evaluation plan is an algebra expression together with the specific algorithm to be used for each operation. Query optimizer uses statistics Number of records per table Number of different values per attribute etc. Given the statistics, the optimizer estimates the cost of alternative plans and chooses the one with the minimum estimated cost. Standard optimization rules: Perform selections before joins Perform small joins before larger ones (for join ordering) Remove useless attributes 11

Transactions A transaction is a unit of program execution that accesses and possibly updates various data items. Atomicity. Either all operations of the transaction are properly reflected in the database or none are. Consistency. Execution of a transaction in isolation preserves the consistency of the database. Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures. 12

Concurrency control schemes Control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database In addition they need to achieve recoverability if a transaction T j reads data items previously written by a transaction T i, the commit operation of T i must appear before the commit operation of T j (durability property) It is also desirable to avoid cascading rollbacks when a single transaction failure leads to a series of transaction rollbacks. Solution: only permit reading items written by comitted transactions 13

Two-Phase Locking Protocol This is a protocol which ensures conflict-serializable schedules. Phase 1: Growing Phase transaction may obtain locks transaction may not release locks Phase 2: Shrinking Phase transaction may release locks transaction may not obtain locks The transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock). May have deadlocks. We can break the deadlocks when they happen, or prevent them by using special variants of 2PL 14

Timestamp-Based Protocol Each transaction T i has a timestamp TS(T i ) These timestamps determine the serializability order Each data item Q has two timestamps W-timestamp(Q) is the largest time-stamp of any transaction that executed write(q) successfully. R-timestamp(Q) is the largest time-stamp of any transaction that executed read(q) successfully. Transaction T i issues a read(q) If TS(T i ) < W-timestamp(Q), then T i needs to read a value of Q that has been already overwritten. Suppose a transaction T i issues a write(q) If TS(T i ) < R-timestamp(Q), then the value of Q that T i is producing was needed previously If TS(T ) i < W-timestamp(Q), then T 15 i is attempting to write an obsolete value of Q.

Multiversion Timestamp Protocol Each data item Q has a sequence of versions <Q 1, Q 2,..., Q m >. Let Q k be the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(T i ). 1. If transaction T i issues a read(q), then the value returned is the content of version Q k. Reads always succeed. 2. If transaction T i issues a write(q), if TS(T i ) < R-timestamp(Q k ), then transaction T i is rolled back. Some other transaction T j that (in the serialization order defined by the timestamp values) should read T i 's write, has already read a version created by a transaction older than T i. If TS(T i ) = W-timestamp(Q k ), the contents of Q k are overwritten; Q k was written before also by T i. If TS(T i ) > W-timestamp(Q k ) a new version of Q is created. 16