Binary Search Trees. Outline: BST operations, worst case, average case, balancing, AVL trees, red-black trees, B-trees. Lecturer: Georgy Gimelfarb



Binary Search Trees
Lecturer: Georgy Gimelfarb
COMPSCI 220 Algorithms and Data Structures
1 / 27

1 Properties of Binary Search Trees
2 Basic BST operations
3 The worst-case time complexity of BST operations
4 The average-case time complexity of BST operations
5 Self-balancing binary and multiway search trees
6 Self-balancing BSTs: AVL trees
7 Self-balancing BSTs: Red-black trees
8 Balanced B-trees for external search
2 / 27

Binary Search Tree: Left-Right Ordering of Keys
Left-to-right numerical ordering in a BST: for every node i, the values of all the keys k_left:i in the left subtree are smaller than the key k_i in i, and the values of all the keys k_right:i in the right subtree are larger than the key k_i in i:
max{k_left:i} < k_i < min{k_right:i}
Compare to the bottom-up ordering in a heap, where the key k_i of every parent node i is greater than or equal to the keys k_l and k_r in the left and right child nodes l and r, respectively: k_i ≥ k_l and k_i ≥ k_r.
[Figure: a BST node i : k_i between its left-subtree keys {k_left:i} and right-subtree keys {k_right:i}, next to a heap node i : k_i above its children l : k_l and r : k_r.]
3 / 27
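
A minimal Python sketch (not part of the lecture) of checking this left-right ordering; it assumes nodes are plain dicts with "key", "left" and "right" fields, an illustrative layout rather than the lecture's notation.

def is_bst(node, lo=float("-inf"), hi=float("inf")):
    """Return True iff every key in the subtree lies strictly between lo and hi."""
    if node is None:
        return True
    k = node["key"]
    if not (lo < k < hi):
        return False
    # Keys in the left subtree must stay below k, keys in the right subtree above it.
    return is_bst(node["left"], lo, k) and is_bst(node["right"], k, hi)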

Binary Search Tree: Left-Right Ordering of Keys
[Figure: three trees built on the same keys. The first is a BST. The second is not a BST: key 2 cannot be in the right subtree of key 3. The third is not a BST: keys 11 and 12 cannot be in the left subtree of key 10.]
4 / 27

Basic BST Operations
A BST is an explicit data structure implementing the table ADT. BSTs are more complex than heaps: any node may be removed, not only a root or leaves. The only practical constraint: no duplicate keys (attach them all to a single node).
Basic operations:
find a given search key or detect that it is absent from the BST.
insert a node with a given key into the BST if it is not found.
findmin: find the minimum key.
findmax: find the maximum key.
remove a node with a given key and restore the BST if necessary.
5 / 27

BST Operations: Find / Insert a Node
[Figure: find(8) follows a successful binary search down from the root 10 and stops at the node holding 8; insert(7) follows an unsuccessful search down the same tree and attaches 7 as a new leaf at the point where the search stops.]
find: a successful binary search.
insert: creating a new node at the point where an unsuccessful search stops.
6 / 27
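
The following Python sketch (an illustration with assumed names, not the lecture's code) shows find as an iterative binary search and insert as the creation of a new leaf where an unsuccessful search stops; the Node class is hypothetical.

class Node:
    def __init__(self, key):
        # Key plus left/right child links; satellite data is omitted.
        self.key, self.left, self.right = key, None, None

def find(root, key):
    """Successful search returns the node; unsuccessful search returns None."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root

def insert(root, key):
    """Attach a new leaf at the point where the unsuccessful search stops."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # Equal keys are ignored, matching the no-duplicates constraint above.
    return root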

BST Operations: FindMin / FindMax
Extremely simple: starting at the root, branch repeatedly left (findmin) or right (findmax) as long as a corresponding child exists.
The root of the tree plays the role of the pivot in quicksort and quickselect. As in quicksort, a recursive traversal of the tree can sort the items:
1 First visit the left subtree;
2 Then visit the root; and
3 Then visit the right subtree.
O(log n) average-case and O(n) worst-case running time for find, insert, findmin, and findmax operations, as well as for selecting a single item (just as in quickselect).
7 / 27
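
A short sketch of findmin, findmax and the sorting traversal, reusing the hypothetical Node layout from the previous sketch (again an illustration, not the lecture's code).

def find_min(node):
    while node.left is not None:       # branch repeatedly left
        node = node.left
    return node

def find_max(node):
    while node.right is not None:      # branch repeatedly right
        node = node.right
    return node

def in_order(node, out):
    """Left subtree, then root, then right subtree: appends the keys in ascending order."""
    if node is not None:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)
    return out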

BST Operation: Remove a Node
The most complex operation, because removing a node may disconnect the tree. Reattachment must retain the ordering condition and should not needlessly increase the tree height.
Standard method of removing a node i with c children:
c = 0: simply remove the leaf i.
c = 1: remove the node i after linking its child to its parent node.
c = 2: swap the node i with the node j having the smallest key k_j in the right subtree of the node i. After swapping, remove the node i (as it now has at most one right child).
In spite of its asymmetry, this method cannot really be improved.
8 / 27
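
A possible Python sketch of this standard removal method (same hypothetical Node layout as above); the two-children case copies the smallest key of the right subtree into the node and then removes that key from the right subtree.

def remove(node, key):
    if node is None:
        return None                        # key not found: nothing to do
    if key < node.key:
        node.left = remove(node.left, key)
    elif key > node.key:
        node.right = remove(node.right, key)
    elif node.left is None:                # 0 or 1 child: link the child to the parent
        return node.right
    elif node.right is None:
        return node.left
    else:                                  # 2 children: smallest key in the right subtree
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = remove(node.right, succ.key)
    return node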

BST Operation: Remove a Node
[Figure: removing key 10 from an example tree: "Remove 10", "Replace 10 (swap with 12 and delete)"; 12 is the minimum key in the right subtree of 10.]
9 / 27

Analysing BST: The Worst-case Time Complexity
Lemma 3.11: The search, retrieval, update, insert, and remove operations in a BST all take time in O(h) in the worst case, where h is the height of the tree.
Proof: The running time T(n) of these operations is proportional to the number of nodes ν visited. Find / insert: ν = 1 + the depth of the node. Remove: the depth plus at most the height of the node. In each case T(n) = O(h).
For a well-balanced BST, T(n) ∈ O(log n) (logarithmic time). In the worst case, T(n) ∈ Θ(n) (linear time), because insertions and deletions may heavily destroy the balance.
10 / 27

Analysing BST: The Worst-case Time Complexity
[Figure: several well-balanced BSTs of height h ≈ log n next to degenerate, chain-like BSTs of height h ≈ n built from the same keys.]
11 / 27

Analysing BST: The Average-case Time Complexity
More balanced trees are more frequent than unbalanced ones.
Definition 3.12: The total internal path length, S_τ(n), of a binary tree τ is the sum of the depths of all its nodes.
[Figure: an example 8-node tree with node depths between 0 and 3 that sum to S_τ(8) = 15.]
Average complexity of a successful search in τ: the average node depth, (1/n)·S_τ(n), e.g. (1/8)·S_τ(8) = 15/8 = 1.875 in this example.
Average-case complexity of searching: averaging S_τ(n) over all the trees of size n, i.e. over all n! possible insertion orders, each occurring with equal probability 1/n!.
12 / 27
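
A small Python helper (not from the slides, same hypothetical Node layout as above) that computes the total internal path length; dividing it by n gives the average node depth, e.g. 15 / 8 = 1.875 for the slide's tree.

def total_internal_path_length(node, depth=0):
    """Sum of the depths of all nodes; the root contributes depth 0."""
    if node is None:
        return 0
    return (depth
            + total_internal_path_length(node.left, depth + 1)
            + total_internal_path_length(node.right, depth + 1))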

The Θ(log n) Average-case BST Operations
Let S(n) be the average of the total internal path length, S_τ(n), over all BSTs τ created from an empty tree by sequences of n random insertions, each sequence considered equiprobable.
Lemma 3.13: The expected time for successful and unsuccessful search (update, retrieval, insertion, and deletion) in such a BST is Θ(log n).
Proof: It should be proven that S(n) ∈ Θ(n log n). Obviously, S(1) = 0.
Any n-node tree, n > 1, contains a left subtree with i nodes, a root at depth 0, and a right subtree with n − i − 1 nodes, for some 0 ≤ i ≤ n − 1.
For a fixed i, S(n) = (n − 1) + S(i) + S(n − i − 1), as the root adds 1 to the path length of each other node.
13 / 27

The Θ(log n) Average-case BST Operations
Proof of Lemma 3.13 (continued): After summing these recurrences for 0 ≤ i ≤ n − 1 and averaging, just the same recurrence as in the average-case quicksort analysis is obtained:
S(n) = (n − 1) + (2/n) · Σ_{i=0}^{n−1} S(i)
Therefore, S(n) ∈ Θ(n log n), and the expected depth of a node is (1/n)·S(n) ∈ Θ(log n). Thus, the average-case search, update, retrieval, and insertion time is in Θ(log n).
It is possible to prove (but in a more complicated way) that the average-case deletion time is also in Θ(log n).
BSTs allow for a special balancing, which prevents the tree height from growing too much, i.e. avoids the worst cases with linear time complexity Θ(n).
14 / 27
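
A quick numerical illustration (an addition of this transcript, not part of the proof): iterating the averaged recurrence shows S(n) growing like n log n, with S(n) / (n lg n) slowly approaching 2 ln 2 ≈ 1.386.

import math

S = [0.0, 0.0]                               # S(0) = S(1) = 0
for n in range(2, 2001):
    S.append((n - 1) + 2.0 / n * sum(S))     # S(n) = (n - 1) + (2/n) * sum of S(i), i < n

for n in (100, 1000, 2000):
    print(n, S[n] / (n * math.log2(n)))      # ratios creep up towards 2 ln 2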

Self-balanced Search Trees
Balancing ensures that the total internal path lengths of the trees are close to the optimal value of n log n.
The average-case and the worst-case complexity of operations is O(log n) due to the resulting balanced structure.
But the insertion and removal operations take longer on average than for the standard binary search trees.
Balanced BSTs:
AVL trees (1962: G. M. Adelson-Velskii and E. M. Landis).
Red-black trees (1972: R. Bayer, as "symmetric binary B-trees"; the present name and definition: 1978, L. Guibas and R. Sedgewick).
AA-trees (1993: A. Andersson).
Balanced multiway search trees: B-trees (1972: R. Bayer and E. McCreight).
15 / 27

Self-balancing BSTs: AVL Trees
Complete binary trees have too rigid a balance condition to be maintained when new nodes are inserted.
Definition 3.14: An AVL tree is a BST with the following additional balance property: for any node in the tree, the heights of the left and right subtrees can differ by at most 1. The height of an empty subtree is −1.
Advantages of the AVL balance property:
Guaranteed height Θ(log n) for an AVL tree.
Less restrictive than requiring the tree to be complete.
Efficient ways of restoring the balance if necessary.
16 / 27

Self-balancing BSTs: AVL Trees
Lemma 3.15: The height of an AVL tree with n nodes is Θ(log n).
Proof: Due to the possibly different heights of subtrees, an AVL tree of height h may contain fewer than the 2^{h+1} − 1 nodes of the complete tree.
Let S_h be the size of the smallest AVL tree of height h; S_0 = 1 (the root only) and S_1 = 2 (the root and one child). The smallest AVL tree of height h has the smallest subtrees of heights h − 1 and h − 2 by the balance property, so that
S_h = S_{h−1} + S_{h−2} + 1 = F_{h+3} − 1
i:   1  2  3  4  5  6  7  ...    F_i: 1  1  2  3  5  8  13 ...
h:   0  1  2  3  4  ...          S_h: 1  2  4  7  12 ...
where F_i is the i-th Fibonacci number (recall Lecture 6).
17 / 27
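
A tiny Python check (not in the lecture) that the recurrence for the minimal AVL sizes indeed reproduces F_{h+3} − 1:

def fib(i):
    a, b = 0, 1
    for _ in range(i):                  # fib(1) = 1, fib(2) = 1, fib(3) = 2, ...
        a, b = b, a + b
    return a

s_prev, s_cur = 1, 2                    # S_0 = 1, S_1 = 2
for h in range(2, 12):
    s_prev, s_cur = s_cur, s_cur + s_prev + 1   # S_h = S_{h-1} + S_{h-2} + 1
    assert s_cur == fib(h + 3) - 1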

Self-balancing BSTs: AVL Trees
(Proof of Lemma 3.15 cont.) That S_h = F_{h+3} − 1 is easily proven by induction:
Base case: S_0 = F_3 − 1 = 1 and S_1 = F_4 − 1 = 2.
Hypothesis: let S_i = F_{i+3} − 1 and S_{i−1} = F_{i+2} − 1.
Inductive step: then S_{i+1} = S_i + S_{i−1} + 1 = (F_{i+3} − 1) + (F_{i+2} − 1) + 1 = F_{i+4} − 1.
Therefore, for each AVL tree of height h with n nodes, n ≥ S_h ≈ φ^{h+3}/√5 − 1, where φ ≈ 1.618, so that its height h ≤ 1.44 lg(n + 1) − 1.33.
The worst-case height is at most 44% more than the minimum height for binary trees.
The average-case height of an AVL tree is provably close to lg n.
18 / 27

Self-balancing BSTs: AVL Trees
Rotation to restore the balance after BST insertions and deletions:
[Figure: the right rotation replaces the subtree rooted at q (left child p with subtrees a and b, right subtree c) by the subtree rooted at p; the left rotation is its mirror image and undoes it.]
If there is a subtree of large height below the node a, the right rotation will decrease the overall tree height.
All self-balancing binary search trees use the idea of rotation. Rotations are mutually inverse and change the tree only locally.
Balancing of AVL trees requires extra memory and heavy computations. More relaxed, efficient BSTs, e.g. red-black trees, are used more often in practice.
19 / 27
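
The two rotations in this picture amount to a few pointer updates; the sketch below (hypothetical, same Node layout as earlier) returns the new subtree root, which the caller re-links in place of the old one.

def rotate_right(q):
    """The left child p of q becomes the new root of the subtree; subtree b moves across."""
    p = q.left
    q.left = p.right
    p.right = q
    return p

def rotate_left(p):
    """Mirror image of rotate_right."""
    q = p.right
    p.right = q.left
    q.left = p
    return q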

Self-balancing BSTs: Red-black Trees
Definition 3.17: A red-black tree is a BST such that:
Every node is coloured either red or black.
Every non-leaf node has two children.
The root is black.
All children of a red node must be black.
Every path from the root to a leaf must contain the same number of black nodes.
Theorem 3.18: If every path from the root to a leaf contains b black nodes, then the tree contains at least 2^b − 1 black nodes.
20 / 27
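
A simplified Python checker of these colouring conditions (an illustration added in this transcript, not the lecture's code); nodes are assumed to be dicts with "key", "red", "left", "right", and None plays the role of an external leaf.

def black_height(node):
    """Number of black nodes on every path below node, or -1 if some condition fails."""
    if node is None:
        return 0
    if node["red"] and any(child is not None and child["red"]
                           for child in (node["left"], node["right"])):
        return -1                                   # a red node with a red child
    lh, rh = black_height(node["left"]), black_height(node["right"])
    if lh < 0 or rh < 0 or lh != rh:
        return -1                                   # unequal black counts on some paths
    return lh + (0 if node["red"] else 1)

def is_red_black(root):
    return root is not None and not root["red"] and black_height(root) >= 0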

Self-balancing BSTs: Red-black Trees
Proof of Theorem 3.18:
Base case: the statement holds for b = 1 (either the black root only, or the black root and one or two red children).
Hypothesis: let it hold for all red-black trees with b black nodes in every path.
Inductive step: a tree with b + 1 black nodes in every path and two black children of the root contains two subtrees with b black nodes in every path just under the root, and therefore has in total at least 1 + 2·(2^b − 1) = 2^{b+1} − 1 black nodes. If the root has a red child, the latter has only black children, so the total number of black nodes can only become larger.
21 / 27

Self-balancing BSTs: Red-black and AA Trees
Searching in a red-black tree is logarithmic, O(log n): no path can contain two consecutive red nodes, so counting in the red nodes can at most double the length of any root-to-leaf path. Therefore, the height of a red-black tree is at most 2 lg n.
No precise average-case analysis is known (only empirical findings and properties of red-black trees with n random keys):
Average case: about lg n comparisons per search.
Worst case: fewer than 2 lg n + 2 comparisons per search.
O(1) rotations and O(log n) colour changes suffice to restore the tree after inserting or deleting a single node.
AA-trees (red-black trees in which the left child may not be red) are even more efficient if node deletions are frequent.
22 / 27

Balanced B-trees
The Big-Oh analysis is invalid if the assumed equal time complexity of elementary operations does not hold.
External ordered databases on magnetic or optical disks: one disk access costs as much as hundreds of thousands of computer instructions, so the number of accesses dominates the running time.
Even the logarithmic worst-case complexity of red-black or AA-trees is unacceptable: each search should involve a very small number of disk accesses.
Binary tree search (with an optimal height lg n) cannot solve the problem.
Height of an optimal m-ary search tree (m-way branching): log_m n = lg n / lg m.
23 / 27

Balanced B-trees
Height of the optimal m-ary search tree with n nodes:
n:           10^5  10^6  10^7  10^8  10^9  10^10  10^11  10^12
log_2 n:     17    20    24    27    30    33     36     39
log_10 n:    5     6     7     8     9     10     11     12
log_100 n:   3     3     4     4     5     5      6      6
log_1000 n:  2     2     3     3     3     4      4      4
[Figure: a multiway search tree of order m = 4; the root keys 4 and 10 direct the search to the subtrees for k < 4, 4 ≤ k < 10, and 10 ≤ k; the leaves hold the data records.]
Data records are associated only with leaves (in most definitions).
24 / 27
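
The same height calculation in a couple of Python lines (not from the slides), using log_m n = lg n / lg m; rounding may differ by one from the table above.

import math

for p in (6, 9, 12):                                          # n = 10^6, 10^9, 10^12
    n = 10 ** p
    heights = {m: round(math.log(n, m), 1) for m in (2, 10, 100, 1000)}
    print(f"n = 10^{p}:", heights)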

Balanced B-trees
A B-tree of order m is an m-ary search tree such that:
1 The root either is a leaf or has µ ∈ {2, ..., m} children.
2 Each non-leaf node, except possibly the root, has µ ∈ {⌈m/2⌉, ..., m} children.
3 Each non-leaf node with µ children holds µ − 1 keys (θ_i : i = 1, ..., µ − 1) that guide the search, θ_i being the smallest key in subtree i + 1.
4 All leaves are at the same depth.
5 Data items are stored in leaves, each leaf storing λ ∈ {⌈l/2⌉, ..., l} items, for some l.
Conditions 1-3 define the memory space for each node; conditions 4-5 form a well-balanced tree.
25 / 27
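
A sketch of descending such a tree in Python (an illustrative layout assumed here, not the lecture's): internal nodes are dicts whose keys[i] is the smallest key of subtree i + 1, and leaves are plain sorted lists of data items.

from bisect import bisect_right

def btree_find(node, key):
    """Follow the guide keys from the root down to the leaf that may hold the key."""
    while isinstance(node, dict):                   # internal node
        i = bisect_right(node["keys"], key)         # index of the subtree to enter
        node = node["children"][i]
    return key in node                              # scan inside the leaf

# A toy tree loosely following the m = 4 example above (the exact keys are illustrative):
root = {"keys": [4, 10], "children": [[0, 1, 3], [4, 6, 8], [10, 14, 17, 20]]}
print(btree_find(root, 17), btree_find(root, 5))    # True False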

Balanced B-trees
B-trees are usually named by their branching limits ⌈m/2⌉..m: e.g., 2-3 trees with m = 3 or 2-4 trees with m = 4.
m = 4, l = 7: a 2-4 B-tree with the leaf storage size 7 (2..4 children per node and 4..7 data items per leaf).
[Figure: the root keys 55 and 75 split the search into k < 55, 55 ≤ k < 75, and 75 ≤ k; second-level keys direct it further down to leaves holding ranges of data items such as 60..70 and 84..90.]
26 / 27

Balanced B-trees
Because the nodes are at least half full, a B-tree with m ≥ 8 cannot degenerate into a simple binary or ternary tree.
Data insertion is simple if the corresponding leaf is not full. Otherwise, the full leaf is split into two leaves, both having the minimum number of data items, and the parent node is updated. If necessary, the splitting propagates up until it finds a parent that need not be split, or reaches the root.
Only in the extremely rare case of splitting the root does the tree height increase: a new root with two children (the halves of the previous root) is created.
Data insertion, deletion, and retrieval take in the worst case about log_{m/2} n disk accesses. This number is practically constant if m is sufficiently big.
27 / 27
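
A back-of-the-envelope Python estimate (not from the slides) of this worst-case number of disk accesses, about log_{m/2} n, for some plausible branching factors:

import math

def disk_accesses(n, m):
    return math.ceil(math.log(n, m // 2))

print(disk_accesses(10 ** 9, 128))     # 5 accesses for a billion items
print(disk_accesses(10 ** 9, 1024))    # 4 accesses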