Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles



Similar documents
B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

CSE 326: Data Structures B-Trees and B+ Trees

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Binary Heap Algorithms

Analysis of Algorithms I: Binary Search Trees

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

From Last Time: Remove (Delete) Operation

Heaps & Priority Queues in the C++ STL 2-3 Trees

Learning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees

Physical Data Organization

How To Create A Tree From A Tree In Runtime (For A Tree)

Big Data and Scripting. Part 4: Memory Hierarchies

Binary Search Trees 3/20/14

DATABASE DESIGN - 1DL400

Binary Trees and Huffman Encoding Binary Search Trees

Lecture 6: Binary Search Trees CSCI Algorithms I. Andrew Rosenberg

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

Databases and Information Systems 1 Part 3: Storage Structures and Indices

External Memory Geometric Data Structures

Data storage Tree indexes

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Data Structure with C

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Ordered Lists and Binary Trees

Converting a Number from Decimal to Binary

Algorithms Chapter 12 Binary Search Trees

Binary Heaps. CSE 373 Data Structures

6 March Array Implementation of Binary Trees

Lecture 1: Data Storage & Index

Symbol Tables. Introduction

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

Data Structure [Question Bank]

PES Institute of Technology-BSC QUESTION BANK

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

TREE BASIC TERMINOLOGIES

Data Structures and Algorithms

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Chapter 14 The Binary Search Tree

In-Memory Database: Query Optimisation. S S Kausik ( ) Aamod Kore ( ) Mehul Goyal ( ) Nisheeth Lahoti ( )

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Indexing Techniques in Data Warehousing Environment The UB-Tree Algorithm

M-way Trees and B-Trees

A Comparison of Dictionary Implementations

Database Systems. Session 8 Main Theme. Physical Database Design, Query Execution Concepts and Database Programming Techniques

Why Use Binary Trees?

Operations: search;; min;; max;; predecessor;; successor. Time O(h) with h height of the tree (more on later).

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Binary Search Trees CMPSC 122

Binary Search Trees (BST)

Unit Storage Structures 1. Storage Structures. Unit 4.3

In-Memory Databases MemSQL

Algorithms and Data Structures

Data Structures Fibonacci Heaps, Amortized Analysis

Data Warehousing und Data Mining

Data Structures. Level 6 C Module Descriptor

Persistent Binary Search Trees

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Introduction Advantages and Disadvantages Algorithm TIME COMPLEXITY. Splay Tree. Cheruku Ravi Teja. November 14, 2011

Exercises Software Development I. 11 Recursion, Binary (Search) Trees. Towers of Hanoi // Tree Traversal. January 16, 2013

Introduction to Data Structures and Algorithms

Chapter 13: Query Processing. Basic Steps in Query Processing

DATA STRUCTURES USING C

Full and Complete Binary Trees

EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.

Overview of Storage and Indexing

Rotation Operation for Binary Search Trees Idea:

10CS35: Data Structures Using C

Raima Database Manager Version 14.0 In-memory Database Engine

Parallelization: Binary Tree Traversal

Binary Search Trees. Ric Glassey

Load Balancing. Load Balancing 1 / 24

Database Design Patterns. Winter Lecture 24

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

International Journal of Software and Web Sciences (IJSWS)

Vector storage and access; algorithms in GIS. This is lecture 6

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Analysis of Algorithms I: Optimal Binary Search Trees

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

CIS 631 Database Management Systems Sample Final Exam

Persistent Data Structures and Planar Point Location

1/1 7/4 2/2 12/7 10/30 12/25

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Data Structures and Data Manipulation

APP INVENTOR. Test Review

Algorithms and Data Structures

Analysis of Binary Search algorithm and Selection Sort algorithm

An Evaluation of Self-adjusting Binary Search Tree Techniques

Chapter 8: Structures for Files. Truong Quynh Chi Spring- 2013

S. Muthusundari. Research Scholar, Dept of CSE, Sathyabama University Chennai, India Dr. R. M.

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Transcription:

B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees: (2,4) trees. Advantages: search faster than in an average binary search tree because the tree is guaranteed to have the smallest possible depth. Today: another kind of multi-way search trees (Btrees). Optimises search in external memory. Two types of memory Main memory (RAM) External (peripheral) storage: hard disk, CD-ROM, tape, etc. Different considerations are important in designing algorithms and data structures for primary (main) versus secondary (peripheral) memory. External storage So far, we analysed data structures assuming that all data is kept in main memory. If the data is kept in external memory, should take into account disk access time. It takes much longer to find the right place on disk compared to main memory. Main principles Data is stored on disk in chunks (pages, blocks, allocation units) and the disk drive reads or writes a minimum of one page at a time. To optimise an algorithm for accessing external memory, should minimise disk accesses read and write in multiples of page sizes. B-trees A good example of a data structure for external memory is a B-tree. Better than binary search trees if data is stored in external memory (they are NOT better with in-memory data!). Proposed by R. Bayer and E. M. McCreigh in 1972. 1

B-trees Example of a B-Tree Each node in a tree should correspond to a block of data. Each node stores many data items and has many successors (stores addresses of successor blocks). The tree has fewer levels but search for an item involves more comparisons at each level. For database search, it is more important to minimise page swaps than comparisons. Definition of a B-Tree of order d There is a single root node, which may have only one record and two children (or none if the tree is empty). All other non-leaf nodes must have between d/2 and d-1 ordered records and d/2 +1 and d children. All keys in the ith subtree of a node are less than the ith key (if exists) and greater than the i-1st key (if exists). All leaves on the tree must be at the same level. Practical considerations Since each node correspond to a block/page, their size is usually a power of 2. The number of records in a node is therefore usually even. The number of children (the order of the tree) is therefore usually odd. B-Tree of order 5: Search in a B-Tree The search for a record with key x is done as in all multiway search trees: Search(node,x): if node is null, return null. Iterate through records (k 1, r 1 ), (k m,r m ) of the node. If x = k i return r i. If x< k i Search(child i, x) If k m < x, Search(child m+1, x). 2

Search for record with key 40: Search for record with key 40: Inserting in a B-Tree Find the node where the item is to be inserted by following the search procedure. If the node is not full, insert the item into the node in order. If the node is full, it has to be split. Inserting 12: Inserting 12: Inserting 53: 1 2 4 5 10 12 13 16 17 28 33 39 48 52 55 67 89 1 2 4 5 10 12 13 16 17 28 33 39 48 52 55 67 89 3

Inserting 53: the node is too full! 37 50 28 33 39 48 52 53 55 67 89 Find middle value (of old keys in the node and the new key). Keep records with keys smaller than middle in the old node. Put records with keys greater than middle in the new node. Push middle record up into parent node. 37 50 37 50 55 28 33 39 48 52 53 55 67 89 28 33 39 48 52 53 67 89 Insertion 37 28 33 39 48 52 53 67 89 If this makes the parent full, split it as well and push its middle item upwards. Continue doing this until either some space is found in an ancestor node, or a new root node is created. 4

Deletion in B-trees As in (2,4) trees, replace the item to be removed by its in-order successor (which is bound to be in a leaf node). Remove successor from its leaf node. This may cause underflow (fewer than d/2 records in that leaf node u). Depending on how many records the sibling of u has, this is fixed either by fusion or by transfer. Delete 37: 37 Delete 37: find replacement (39) Delete 37: replace 37 39 Delete 37: remove 39 from leaf Delete 37: fix underflow (fusion) 39 39 1 2 4 5 10 12 13 16 17 28 33 48 52 53 67 89 1 2 4 5 10 12 13 16 17 28 33 48 52 53 67 89 5

Delete 37: fix underflow (fusion) Delete 37: fix underflow (fusion) 39 1 2 4 5 10 12 13 16 17 28 33 48 52 53 67 89 Delete 16 Delete 16 (no need to replace) Delete 16: underflow (transfer) Delete 16: underflow (transfer) 3 9 13 15 13 1 2 4 5 10 12 17 28 33 39 48 52 53 67 89 1 2 4 5 10 12 17 28 33 39 48 52 53 67 89 6

Delete 16: underflow (transfer) Efficiency of B-trees 3 9 13 1 2 4 5 10 12 15 17 28 33 39 48 52 53 67 89 If a B-tree has order d, then each node (apart from the root) has at least d/2 children. So the depth of the tree is at most log d/2 (size)+1. In the worst case have to make d-1 comparisons in each node (linear search, but d-1 is a constant factor). Fewer disk accesses than for a binary tree. Each node could correspond to one block of data (plus addresses of successor nodes). Variant of B-trees Only leaves contain data Non-leaves contain only keys and block numbers Access speed is higher because each block can hold more block numbers. Programming is more complicated because there are two kinds of nodes: leaves and non-leaves. Indexing Index: list of key/block pairs. If small enough, could be kept in the main memory. If the index is too large to be loaded in the main memory, it is often kept in a B-tree. Multiple index files: if different kinds of search have to be performed, can keep one index for every set of keys. Summary and reading Very important: the issues involved in designing algorithms and data structure for use with external memory are different! Compulsory reading: Goodrich, Tamassia, Chapter 14.3 Or, Shaffer 9.1, 9.2 (external memory), 11.5 (Btrees) or Lafore pp.400-410 (Chapter 10). External sorting (optional) 7