Outline. Indexing. Dense and sparse indexes. Indexing. Primary and secondary indexes. Dense versus sparse indexes. The basics ISAM.

Similar documents
DATABASE DESIGN - 1DL400

Physical Data Organization

Lecture 1: Data Storage & Index

CSE 326: Data Structures B-Trees and B+ Trees

Concurrency Control. Chapter 17. Comp 521 Files and Databases Fall

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

Unit Storage Structures 1. Storage Structures. Unit 4.3

Analysis of Algorithms I: Binary Search Trees

Overview of Storage and Indexing

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

Chapter 13: Query Processing. Basic Steps in Query Processing

Big Data and Scripting. Part 4: Memory Hierarchies

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

CIS 631 Database Management Systems Sample Final Exam

Data storage Tree indexes

Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)

Data Warehousing und Data Mining

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Oracle Rdb Performance Management Guide

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Record Storage and Primary File Organization

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Databases and Information Systems 1 Part 3: Storage Structures and Indices

MS SQL Performance (Tuning) Best Practices:

6 March Array Implementation of Binary Trees

File Management. Chapter 12

In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

Chapter 6: Physical Database Design and Performance. Database Development Process. Physical Design Process. Physical Database Design

DATABASDESIGN FÖR INGENJÖRER - 1DL124

Symbol Tables. Introduction

Chapter 13 File and Database Systems

Chapter 13 File and Database Systems

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

Overview of Storage and Indexing. Data on External Storage. Alternative File Organizations. Chapter 8

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

Practical Database Design and Tuning

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

File Management. Chapter 12

Performance Tuning for the Teradata Database

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Proactive database performance management

Partitioning under the hood in MySQL 5.5

Operations: search;; min;; max;; predecessor;; successor. Time O(h) with h height of the tree (more on later).

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Data Structures and Data Manipulation

Binary Search Trees 3/20/14

Project Group High- performance Flexible File System 2010 / 2011

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Physical Database Design and Tuning

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

Database Tuning and Physical Design: Execution of Transactions

SQL Tables, Keys, Views, Indexes

1. Physical Database Design in Relational Databases (1)

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

Introduction to Database Management Systems

Name: 1. CS372H: Spring 2009 Final Exam

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 12 File Management

MyOra 3.5. User Guide. SQL Tool for Oracle. Kris Murthy

Tune That SQL for Supercharged DB2 Performance! Craig S. Mullins, Corporate Technologist, NEON Enterprise Software, Inc.


Cassandra A Decentralized, Structured Storage System

David Dye. Extract, Transform, Load

In-Memory Databases MemSQL

CHAPTER 17: File Management

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Benchmarking Cassandra on Violin

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

HBase Schema Design. NoSQL Ma4ers, Cologne, April Lars George Director EMEA Services

Algorithms Chapter 12 Binary Search Trees

Lecture 2 February 12, 2003

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

In Memory Accelerator for MongoDB

PES Institute of Technology-BSC QUESTION BANK

File-System Implementation

Multi-dimensional index structures Part I: motivation

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Design Patterns for Distributed Non-Relational Databases

External Memory Geometric Data Structures

Inside the PostgreSQL Query Optimizer

Original-page small file oriented EXT3 file storage system

Guide to Performance and Tuning: Query Performance and Sampled Selectivity

SQL - QUICK GUIDE. Allows users to access data in relational database management systems.

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Chapter 12 File Management

Chapter 12 File Management. Roadmap

The Classical Architecture. Storage 1 / 36

BIRCH: An Efficient Data Clustering Method For Very Large Databases

CS 245 Final Exam Winter 2013

INTRODUCTION TO DATABASE SYSTEMS

Transcription:

Outline Indexing CPS 216 Advanced Database Systems The basics ISAM B + -tree Next time R-tree Inverted lists Hash indexes 2 Indexing Given a value, locate the record(s) with this value SELECT * FROM R WHERE A = value; SELECT * FROM R, S WHERE R.A = S.B; Other search criteria, e.g. Range search SELECT * FROM R WHERE A > value; Keyword search /,9,-,80 /0 $0,7. Dense and sparse indexes Dense: one index entry for each search key value Sparse: one index entry for each block Records must be clustered according to search key 12 46 87 Sparse index on SID 12 Milhouse 10.1 142 Bart 10 2. 279 Jessica 10 4 4 Martin 8 2. 46 Ralph 8 2. 12 Nelson 10 2.1 679 Sherri 10. 697 Terri 10. 87 Lisa 8 4. 912 Windel 8.1 Bart Jessica Lisa Martin Milhouse Nelson Ralph Sherri Terri Windel Dense index 4 on name Dense versus sparse indexes Primary and secondary indexes Index size Sparse index is smaller Requirement on records Records must be clustered for sparse index Lookup Sparse index is smaller and may fit in memory Dense index can directly tell if a record exists Update Easier for sparse index Primary index Created for the primary key of a table Records are usually clustered according to the primary key Can be sparse Secondary index Usually dense SQL PRIMARY KEY declaration automatically creates a primary index, UNIQUE key automatically creates a secondary index Index can be created on non-key attribute(s) CREATE INDEX StudentGPAIndex ON Student(GPA); 6 1

ISAM What happens if you put a sparse index on top of another sparse index? Indexed Sequential Access Method (more or less) Example: insert 107 Index blocks Updates with ISAM,, 901, 12,, 192 901,, 996 Index blocks,, 901, 12,, 192 901,, 996 Data blocks, 108, 12, 129, 192, 197, 901, 907, 996, 997, 9, 121 7 Data blocks, 108, 12, 129 901, 907, 996, 997, 9, 121 192, 197, Overflow blocks 107 Example: delete 129 Overflow and empty data blocks degrade performance 8 B + -tree Sample B + -tree nodes Balanced: good performance guarantee Disk-based: one node per block; large fan-out Non-leaf k < k < k < k 0 Leaf to next leaf node in sequence 0 to records with these k values; or, store records directly in leaves 9 10 B + -tree rules All leaves at the same lowest level All nodes at least half full (except root) Query examples Lookup: SELECT * FROM R WHERE k = ; Max # Max # Min # Min # pointers keys active pointers keys Non-leaf f f 1 ceil(f / 2) ceil(f / 2) 1 Root f f 1 2 1 Leaf f f 1 floor(f / 2) floor(f / 2) 0 0 Range query: WHERE k > AND k < ; Find and follow next-leaf pointers 12 2

An insertion example (slide 1) Insert a record with search key value 12 An insertion example (slide 2) Insert a record with search key value 12 12 Split when a node becomes too full Split again! 12 1 14 An insertion example (slide ) Insert a record with search key value 12 Tree grows up when root is split (not shown in this example) A deletion example (slide 1) Delete the record with search key value Redistribute when a node becomes too empty 12 1 Right sibling has more than enough keys; steal one! 16 A deletion example (slide 2) Delete the record with search key value Remember to fix the key in an ancestor node Another deletion example (slide 1) Delete the record with search key value Coalesce! 17 Cannot steal from siblings 18

Another deletion example (slide 2) Delete the record with search key value Delete may propagate up; tree shrinks when root becomes empty (not shown in this example) Remember to fix the parent 19 Performance analysis How many I/Os are required for each operation? h (more or less), where h is the height of the tree Plus one or two to manipulate actual records Plus O(h) for reorganization (very rare if f is large) Minus one if we cache the root in memory How big is h? Roughly log fan-out n, where n is the number of records Fan-out is large (in hundreds) many keys and pointers can fit into one block A 4-level B + -tree is enough for typical tables 20 B + -tree in practice The index of choice in most commercial DBMS Complex reorganization for deletion often is not implemented (e.g., Oracle, Informix) Building a B + -tree from scratch Naïve approach Start with an empty B + -tree Process each record as a B + -tree insertion Next Bulk-loading Concurrency control Problem Every record require O(h) random I/Os 21 22 Bulk-loading a B + -tree Sort all records (or record pointers) by search key Just a few passes (assuming a big enough memory) Can have more sequential I/Os Now we already have all the leaf nodes! Insert each leaf node in order No need to look for the proper place to insert Only the rightmost path is affected; keep it in memory Rightmost path Concurrency control for B + -trees Naïve approach Treat nodes as data objects; use 2PL Problem: low concurrency Every read/write starts from the root root becomes bottleneck for locking That s the same as locking the entire table! Sorted leaves 2 24 4

A simple B + -tree locking protocol A lookup transaction can release its lock on the parent once it gets a lock on the child An insert/delete transaction can do the same, provided that its modification cannot propagate up to the parent Never lock a node twice (even if its parent is locked all the time) More reading in Red Book: Efficient Locking for Concurrent Operations on B-Trees 2 Remember the phantom? T1: T2: SELECT * FROM Student INSERT INTO Student WHERE age = 10; VALUES(12, Nelson, 10, 2.1); COMMIT; SELECT * FROM Student WHERE age = 10; COMMIT; T2 first locks all existing rows with age 10 T1 inserts a new row with age 10 T2 then sees the new row phantom! 26 Predicate locking with B + -tree If there is a B + -tree on Student(age) T2 will lock the B + -tree node containing age value 10 T1 has to wait for this lock to update the B + -tree No more phantom! Predicate locking can be generalized to range predicates, e.g., age > 18 AND age < 20 Lock the B + -tree node (possibly non-leaf) containing this range 27 B + -tree versus ISAM ISAM is more static; B + -tree is more dynamic Performance ISAM is more compact (at least initially) Fewer levels and I/Os than B + -tree Overtime, ISAM may not be balanced Cannot provide guaranteed performance as B + -tree does Concurrency control Much easier with ISAM Because index blocks are never updated! 28 B + -tree versus B-tree B-tree: why not store records (or record pointers) in non-leaf nodes? These records can be accessed with fewer I/Os Problems Storing more data in a node decreases fan-out and increases h Records in leaves require more I/Os to access Vast majority of the records live in leaves! 29