CSE 326: Data Structures B-Trees and B+ Trees




CSE 326: Data Structures
B-Trees and B+ Trees
Brian Curless, Spring 2008

Announcements (4//08)
Midterm on Friday.
Special office hour: 4:-5: Thursday in Jaech Gallery (6th floor of CSE building). This is instead of my usual 11am office hour.
Reading for this lecture: Weiss Sec. 4.7

Traversing very large datasets
Suppose we had very many pieces of data (as in a database), e.g., n = 2^30 ≈ 10^9.
How many (worst case) hops through the tree to find a node?
  BST
  AVL
  Splay

Memory considerations
What is in a tree node? In an object?
  Node: Object obj; Node left; Node right; Node parent;
  Object: Key key; data
Suppose the data is 1KB.
How much space does the tree take?
How much of the data can live in 1GB of RAM?
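The questions on these two slides come down to a little arithmetic. The sketch below works it through under simplifying assumptions that are not in the slides: pointer and key overhead is ignored, the AVL height is approximated as 1.44 * log2(n), and the variable names are illustrative only.

```python
import math

n = 2**30                      # number of items, as on the slide (about 10^9)

# Worst-case hops for find.
bst_worst_hops = n - 1         # an unbalanced BST can degenerate into a chain
avl_worst_hops = round(1.44 * math.log2(n))   # AVL height is roughly 1.44 * log2(n)

# Space, counting only the 1KB data objects (assumption: pointers ignored).
record_bytes = 1024
total_bytes = n * record_bytes         # 2^40 bytes, i.e. 1 TB
ram_bytes = 2**30                      # 1GB of RAM
fraction_in_ram = ram_bytes / total_bytes

print("BST worst case hops:", bst_worst_hops)
print("AVL worst case hops (approx.):", avl_worst_hops)
print("Total data:", total_bytes // 2**40, "TB")
print("Fraction that fits in RAM:", fraction_in_ram)
```

Under these assumptions only about one item in a thousand fits in RAM, so almost every hop lands on disk.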

CPU cycles to access:
  Registers      1
  L1 Cache       2
  L2 Cache
  Main memory    250
  Disk           Random: ,000,000   Streamed: 5000

Minimizing random disk access
In our example, almost all of our data structure is on disk.
Thus, hopping through a tree amounts to random accesses to disk. Ouch!
How can we address this problem?

M-ary Search Tree
Suppose, somehow, we devised a search tree with maximum branching factor M:
  Complete tree has height:
  # hops for find:
  Runtime of find:

B-Trees
How do we make an M-ary search tree work?
Each node has (up to) M-1 keys.
Order property: the subtree between two keys x and y contains leaves with values v such that x < v < y.
[Figure: a node of sorted keys (7 and 21 among them), with one subtree for each range between consecutive keys]
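The blanks on the M-ary Search Tree slide can be explored numerically. This is a minimal sketch, assuming a complete M-ary tree with one item per node and binary search among the keys inside each node; `mary_height` is a hypothetical helper name, not something from the lecture.

```python
import math

def mary_height(n, m):
    """Smallest height h such that a complete m-ary tree of height h has >= n nodes."""
    height, capacity, level_size = 0, 1, 1
    while capacity < n:
        level_size *= m          # number of nodes on the next level down
        capacity += level_size
        height += 1
    return height

n = 2**30
for m in (2, 256):
    hops = mary_height(n, m)                 # roughly log_M(n) hops for find
    per_node = math.ceil(math.log2(m))       # binary search among up to M-1 keys
    print(f"M={m}: about {hops} hops, about {hops * per_node} key comparisons")
```

The total number of comparisons stays near log2(n) either way, but the number of hops (and so the number of random disk accesses) shrinks as M grows.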

B-Tree Structure Properties
Root (special case)
  has between 2 and M children (or root could be a leaf)
Internal nodes
  store up to M-1 keys
  have between ⌈M/2⌉ and M children
Leaf nodes
  store between ⌈(M-1)/2⌉ and M-1 sorted keys
  all at the same depth

B-Tree: Example
[Figure: example B-Tree with M = 4]

B+ Trees
In a B+ tree, the internal nodes have no data; only the leaves do!
Each internal node still has (up to) M-1 keys.
Order property: the subtree between two keys x and y contains leaves with values v such that x ≤ v < y. Note the ≤.
Leaf nodes have up to L sorted keys.
[Figure: the same node of sorted keys as before, with each subtree range now closed on the left]

B+ Tree Structure Properties
Root (special case)
  has between 2 and M children (or root could be a leaf)
Internal nodes
  store up to M-1 keys
  have between ⌈M/2⌉ and M children
Leaf nodes
  where data is stored
  all at the same depth
  contain between ⌈L/2⌉ and L data items
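To make the B+ tree order property concrete, here is a minimal sketch of find. The class names `Internal` and `Leaf`, the (key, data) pair layout, and the tiny example tree are assumptions for illustration, not the lecture's code.

```python
from bisect import bisect_right

class Internal:
    def __init__(self, keys, children):
        self.keys = keys              # sorted; len(keys) == len(children) - 1
        self.children = children

class Leaf:
    def __init__(self, items):
        self.items = items            # sorted list of (key, data) pairs

def find(node, key):
    """Return the data stored under key, or None if the key is absent."""
    while isinstance(node, Internal):
        # bisect_right sends key == keys[i] to children[i + 1],
        # which matches the order property x <= v < y.
        node = node.children[bisect_right(node.keys, key)]
    for k, data in node.items:
        if k == key:
            return data
    return None

# A tiny hand-built tree (hypothetical data).
root = Internal(
    keys=[7, 21],
    children=[
        Leaf([(1, "a"), (4, "b")]),
        Leaf([(7, "c"), (10, "d")]),
        Leaf([(21, "e"), (30, "f")]),
    ],
)
print(find(root, 7), find(root, 21), find(root, 5))
```

Because internal nodes hold only keys, a search that fails still walks all the way down to a leaf before reporting that the key is missing.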

B+ Tree: Example
B+ Tree with M = 4 (# pointers in internal node) and L = 5 (# data items in leaf)
[Figure: example B+ tree. Each leaf entry pairs a key with its data object (e.g., "1, AB.."), which I'll ignore in slides. All leaves are at the same depth.]
Definition for later: a "neighbor" is the next sibling to the left or right.

Disk Friendliness
What makes B+ trees disk-friendly?
1. Many keys stored in a node
   All brought to memory/cache in one disk access.
2. Internal nodes contain only keys; only leaf nodes contain keys and actual data
   Much of the tree structure can be loaded into memory irrespective of data object size.
   Data actually resides on disk.

B+ trees vs. AVL trees
Suppose again we have n = 2^30 ≈ 10^9 items:
  Depth of AVL Tree
  Depth of B+ Tree with M = 256, L = 256
Great, but how do we actually make a B+ tree and keep it balanced?

Building a B+ Tree with Insertions
[Figure: a sequence of Insert() steps, starting from the empty B-Tree]
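The depth comparison can be worked out directly. This sketch assumes every node has the same fanout (either completely full or exactly half full) and uses the rough AVL bound of 1.44 * log2(n); `bplus_depth` is a hypothetical helper, not part of the slides.

```python
import math

def bplus_depth(n, fanout, per_leaf):
    """Levels of internal nodes above the leaves, assuming every internal node
    has `fanout` children and every leaf holds `per_leaf` items."""
    nodes = -(-n // per_leaf)          # number of leaves (integer ceiling)
    depth = 0
    while nodes > 1:
        nodes = -(-nodes // fanout)    # number of parents one level up
        depth += 1
    return depth

n = 2**30
full_depth = bplus_depth(n, 256, 256)    # every node completely full
half_depth = bplus_depth(n, 128, 128)    # every node half full (M/2, L/2)
avl_depth = round(1.44 * math.log2(n))   # rough AVL worst case

print("B+ tree depth, full nodes:", full_depth)
print("B+ tree depth, half-full nodes:", half_depth)
print("AVL depth (approx.):", avl_depth)
```

Even with half-empty nodes the B+ tree needs only a handful of disk accesses per find, compared with dozens of hops in the AVL tree.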

[Figure: Building a B+ Tree with Insertions, continued: Insert(2), Insert(6), and further Insert() steps]

Insertion Algorithm
1. Insert the key in its leaf in sorted order
2. If the leaf ends up with L+1 items, overflow!
   Split the leaf into two nodes:
     original with ⌈(L+1)/2⌉ items
     new one with ⌊(L+1)/2⌋ items
   Add the new child to the parent
   If the parent ends up with M+1 children, overflow!
3. If an internal node ends up with M+1 children, overflow!
   Split the node into two nodes:
     original with ⌈(M+1)/2⌉ children
     new one with ⌊(M+1)/2⌋ children
   Add the new child to the parent
   If the parent ends up with M+1 children, overflow!
4. Split an overflowed root in two and hang the new nodes under a new root. This makes the tree deeper!
5. Propagate keys up tree.

And Now for Deletion
[Figure: Delete(2) and further Delete() steps on the example tree]
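Steps 1 and 2 of the insertion algorithm can be sketched on a single leaf. The version below assumes a leaf is just a sorted Python list of keys and leaves the parent update to the caller; `insert_into_leaf` is a hypothetical name, and internal-node splits (step 3) are not shown.

```python
from bisect import insort

def insert_into_leaf(leaf, key, L):
    """Insert key into a sorted leaf holding at most L items.

    Returns (leaf, None) if no split was needed, otherwise (original, new),
    where original keeps ceil((L+1)/2) items and new gets floor((L+1)/2).
    """
    leaf = list(leaf)                 # work on a copy
    insort(leaf, key)                 # step 1: insert in sorted order
    if len(leaf) <= L:
        return leaf, None
    keep = (len(leaf) + 1) // 2       # len(leaf) == L + 1, so this is ceil((L+1)/2)
    return leaf[:keep], leaf[keep:]   # step 2: overflow, split the leaf

left, right = insert_into_leaf([2, 6, 8], 4, L=3)
print(left, right)
# After a split, the caller would add `right` to the parent, using right[0]
# as the separating key (consistent with the x <= v < y order property).
```

The split leaves both halves at least half full, so the structure properties still hold for the leaves; only the parent may now overflow.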

[Figure: And Now for Deletion, continued: further Delete() steps on the example tree]

Deletion Algorithm
1. Remove the key from its leaf
2. If the leaf ends up with fewer than ⌈L/2⌉ items, underflow!
   Adopt data from a neighbor; update the parent
   If adopting won't work, delete node and merge with neighbor
   If the parent ends up with fewer than ⌈M/2⌉ children, underflow!
3. If an internal node ends up with fewer than ⌈M/2⌉ children, underflow!
   Adopt from a neighbor; update the parent
   If adoption won't work, merge with neighbor
   If the parent ends up with fewer than ⌈M/2⌉ children, underflow!
4. If the root ends up with only one child, make the child the new root of the tree. This reduces the height of the tree!
5. Propagate keys up through tree.

Thinking about B+ Trees
B+ Tree insertion can cause (expensive) splitting and propagation
B+ Tree deletion can cause (cheap) adoption or (expensive) deletion, merging and propagation
Propagation is rare if M and L are large (Why?)
Pick branching factor M and data items/leaf L such that each node takes one full page/block of memory/disk.

Tree Names You Might Encounter
FYI:
B-Trees with M = 3, L = x are called 2-3 trees
  Nodes can have 2 or 3 children
B-Trees with M = 4, L = x are called 2-3-4 trees
  Nodes can have 2, 3, or 4 children
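The leaf case of the deletion algorithm (step 2) can be sketched in the same style. Assumptions: leaves are sorted Python lists, the neighbor is the right sibling, and updating the parent's keys is left to the caller; `fix_leaf_underflow` is a hypothetical name.

```python
def fix_leaf_underflow(leaf, neighbor, L):
    """Repair a leaf after a deletion, given its right neighbor.

    Returns (leaf, neighbor) unchanged if there is no underflow,
    (leaf, neighbor) after adopting one item if the neighbor can spare it,
    or (merged, None) if the two leaves had to be merged.
    """
    min_items = (L + 1) // 2                  # ceil(L/2)
    if len(leaf) >= min_items:
        return leaf, neighbor                 # no underflow
    if len(neighbor) > min_items:
        # Cheap case: adopt the neighbor's smallest item.
        return leaf + neighbor[:1], neighbor[1:]
    # Expensive case: merge; the parent loses a child and may underflow too.
    return leaf + neighbor, None

print(fix_leaf_underflow([6], [8, 10, 12], L=3))   # adoption
print(fix_leaf_underflow([6], [8, 10], L=3))       # merge
```

A merged leaf never exceeds L items, because an underflowing leaf has at most ⌈L/2⌉ - 1 items and a neighbor that cannot lend has exactly ⌈L/2⌉.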