Binary Search Trees. Adnan Aziz.




1 BST basics

Based on CLRS, Chapter 12.

Motivation:
- Heaps can perform extract-max and insert efficiently: O(log n) worst case.
- Hash tables can perform insert, delete and lookup efficiently: O(1) on average.
- What about searching in a heap? Min in a heap? Max in a hash table?
- Binary Search Trees support search, insert, delete, max, min, successor and predecessor. The time complexity is proportional to the height of the tree; recall that a complete binary tree on n nodes has height O(log n).

Basics: a BST is organized as a binary tree, with the added caveat that keys are stored at nodes in a way that satisfies the BST property: for any node x in a BST, if y is a node in x's left subtree, then key[y] ≤ key[x], and if y is a node in x's right subtree, then key[y] ≥ key[x].

Implementation: represent a node by an object with 4 data members: key, pointer to left child, pointer to right child, pointer to parent (use a NIL pointer to denote empty).
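A minimal Python sketch of the node representation above, assuming None stands in for NIL. The helper names `make` and `satisfies_bst_property` are hypothetical, introduced only for illustration. The checker tests the non-strict BST property by passing down the range of keys that each subtree is allowed to contain.

```python
class Node:
    """BST node with the four data members from the notes; None plays the role of NIL."""
    __slots__ = ("key", "left", "right", "parent")

    def __init__(self, key, left=None, right=None, parent=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = parent


def make(key, left=None, right=None):
    """Hypothetical helper: build a node and point its children's parent fields at it."""
    x = Node(key, left, right)
    for child in (left, right):
        if child is not None:
            child.parent = x
    return x


def satisfies_bst_property(x, lo=float("-inf"), hi=float("inf")):
    """True if every key in x's subtree lies in [lo, hi] and the BST property holds.

    Keys in a left subtree may be <= key[x]; keys in a right subtree may be >= key[x].
    """
    if x is None:
        return True
    if not (lo <= x.key <= hi):
        return False
    return (satisfies_bst_property(x.left, lo, x.key)
            and satisfies_bst_property(x.right, x.key, hi))


good = make(5, make(3, make(2), make(5)), make(7, None, make(8)))
bad = make(5, make(3, make(2), make(6)), make(7))  # 6 sits in 5's left subtree
print(satisfies_bst_property(good), satisfies_bst_property(bad))  # True False
```

Checking against ancestor bounds, rather than only comparing each node with its children, matters: in `bad` every parent/child pair looks fine locally, yet 6 violates the property with respect to the root.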

Figure 1: BST examples (two BSTs on the same keys 2, 3, 5, 5, 7, 8).

Figure 2: BST working example for queries and updates (keys 15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9).

1.1 Query operations

1.1.1 Print keys in sorted order

The BST property allows us to print out all the keys in a BST in sorted order:

    INORDER-TREE-WALK(x)
        if x != NIL then
            INORDER-TREE-WALK(left[x])
            print key[x]
            INORDER-TREE-WALK(right[x])

1.1.2 Search for a key

    TREE-SEARCH(x, k)
        if x = NIL or k = key[x] then
            return x
        if k < key[x] then
            return TREE-SEARCH(left[x], k)
        else
            return TREE-SEARCH(right[x], k)

Try searching for 13 and for 11 in the example.

1.1.3 Min, Max

    TREE-MINIMUM(x)
        while left[x] != NIL do
            x <- left[x]
        return x

The procedure to find the max is symmetric. Try min = 2, max = 20.
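The query procedures above translate almost line for line into Python. This is a sketch under a few assumptions: None plays the role of NIL, the walk collects keys into a list instead of printing them, and the example tree is built by inserting the keys of Figure 2 in the order they are listed there (we assume that listing is level order). The `insert` helper is a plain BST insert of the kind described in 1.3.1 below.

```python
class Node:
    """BST node; None plays the role of NIL."""
    __slots__ = ("key", "left", "right", "parent")

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, key):
    """Plain BST insert (see 1.3.1); returns the possibly new root."""
    z = Node(key)
    y, x = None, root
    while x is not None:
        y = x
        x = x.left if key < x.key else x.right
    z.parent = y
    if y is None:
        return z
    if key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def inorder_tree_walk(x, out):
    """Append the keys of x's subtree to out in sorted order (instead of printing)."""
    if x is not None:
        inorder_tree_walk(x.left, out)
        out.append(x.key)
        inorder_tree_walk(x.right, out)
    return out


def tree_search(x, k):
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    """The symmetric procedure: follow right pointers."""
    while x.right is not None:
        x = x.right
    return x


root = None
for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]:  # keys of Figure 2
    root = insert(root, k)

print(inorder_tree_walk(root, []))  # [2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20]
print(tree_search(root, 13).key, tree_search(root, 11))  # 13 None
print(tree_minimum(root).key, tree_maximum(root).key)    # 2 20
```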

1.2 Successor and Predecessor

Given a node x in a BST, we sometimes want to find its successor: the node whose key appears immediately after x's key in an in-order walk.
- Conceptually: if the right child of x is not NIL, return the min of x's right subtree; otherwise examine the parent.
- Tricky when x is the right child of its parent: we need to keep going up the tree until we move up from a left child; that parent is the successor.

    TREE-SUCCESSOR(x)
        if right[x] != NIL then
            return TREE-MINIMUM(right[x])
        y <- p[x]
        while y != NIL and x = right[y] do
            x <- y
            y <- p[y]
        return y

The procedure to find the predecessor is symmetric. Try succ of 15, 6, 13, 20; pred of 15, 6, 7, 2.

Theorem: all operations above take time O(h), where h is the height of the tree.

1.3 Updates

1.3.1 Inserts

Insert: given a node z with key v and left[z], right[z], p[z] all NIL, and a BST T, update T and z so that the updated tree includes the node z, with the BST property still true.

Idea behind the algorithm: begin at the root and trace a path downward, comparing v with the key at the current node. When we get to the bottom, insert node z by setting its parent to the last node on the path, and update that parent's left or right child pointer appropriately.

Refer to TREE-INSERT(T, z) in CLRS for detailed pseudo-code. Try inserting the keys 12, 1, 7, 16, 25 in the example.

1.3.2 Deletions

Given a node z (assumed to be a node in a BST T), delete it from T. Tricky; we need to consider 3 cases:
1. z has no children: modify p[z] to replace the z-child of p[z] by NIL.
2. z has one child: splice z out of the tree.
3. z has two children: splice z's successor (call it y) out of the tree, and replace z's contents with those of y. Crucial fact: y cannot have a left child, since y is the minimum of z's right subtree.

Refer to TREE-DELETE(T, z) in CLRS for detailed pseudo-code. Try deleting 9, 7, 6 from the example.

Theorem: insertion and deletion can be performed in O(h) time, where h is the height of the tree. Note how we crucially made use of the BST property.

2 Balanced binary search trees

Fact: if we build a BST using n inserts (assuming insert is implemented as above), and the keys appear in a random sequence, then the height is very likely to be O(log n) (proved in CLRS 12.4).
- A random sequence of keys is an extremely unrealistic assumption.
- The result holds only when there are no deletes; if there are deletes, the height will tend to $O(\sqrt{n})$. This is because of the asymmetry in deletion: one fixed neighbor (in the procedure above, the successor) is always used to replace the node. The asymmetry can be removed by alternating between the successor and the predecessor.

Question 1. Given a BST on n nodes, can you always find a BST on the same n keys having height O(log n)?
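A Python sketch of the three deletion cases, following the notes' variant in which case 3 copies the successor's contents into z. It assumes None for NIL and uses hypothetical helper names (`find`, `inorder`). It runs the exercise from the notes: deleting 9 (a leaf), then 7 (one child), then 6 (two children) from the Figure 2 keys.

```python
class Node:
    __slots__ = ("key", "left", "right", "parent")

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, key):
    z, y, x = Node(key), None, T.root
    while x is not None:
        y, x = x, (x.left if key < x.key else x.right)
    z.parent = y
    if y is None:
        T.root = z
    elif key < y.key:
        y.left = z
    else:
        y.right = z


def find(T, k):
    x = T.root
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_delete(T, z):
    """Remove z's key from T.

    y is the node physically spliced out: z itself when z has at most one child
    (cases 1 and 2), otherwise z's successor, which has no left child (case 3).
    """
    y = z if z.left is None or z.right is None else tree_minimum(z.right)
    x = y.left if y.left is not None else y.right   # y's only child, or None
    if x is not None:
        x.parent = y.parent
    if y.parent is None:
        T.root = x
    elif y is y.parent.left:
        y.parent.left = x
    else:
        y.parent.right = x
    if y is not z:
        z.key = y.key          # case 3: copy y's contents into z
    return y


def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


T = Tree()
for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]:
    tree_insert(T, k)
for k in [9, 7, 6]:            # leaf, one child, two children
    tree_delete(T, find(T, k))
print(inorder(T.root, []))     # [2, 3, 4, 13, 15, 17, 18, 20]
print(T.root.left.key)         # 13: the node that held 6 now holds its successor's key
```

Copying keys keeps the code short, but any outside pointer to the spliced-out node y becomes stale; implementations that hand out node references usually relink y into z's position instead.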

Yes: just think of a (near) complete binary tree.

Question 2. Can you implement insertion and deletion so that the height of the tree is always O(log n)?

Yes, but this is not so obvious; performing these operations in O(log n) time is a lot trickier. Broadly speaking, there are two options for keeping a BST balanced:
- Make the insert sequence look random (treaps).
- Store some extra information at the nodes, related to the relative balance of the children, and modify insert and delete to check for and correct skewedness. There are several classes of such balanced BSTs, e.g., AVL, Red-Black, 2-3, etc.

2.1 Treaps

Randomly built BSTs tend to be balanced: given a set S, randomly permute the elements, then insert them in that order (CLRS 12.4).
- In general, not all elements of S are available at the start.
- Solution: assign a random number, called the priority, to each key (assume keys are distinct). In addition to satisfying the BST property (v is the left/right child of u ⇒ key(v) < / > key(u)), require that if v is a child of u, then priority(v) > priority(u).
- Treaps have the following property: if we insert $x_1, \ldots, x_n$ into a treap, then the resulting tree is the one that would have resulted if the keys had been inserted in the order of their priorities.
- Treap insertion is very fast: the expected number of rotations is constant, and each rotation is only a constant number of pointer updates.
- Treaps are used in LEDA, a widely used advanced data structure library for C++.

2.2 Deterministic balanced trees

Many alternatives: AVL, Red-Black, 2-3, B-trees, weight-balanced trees, etc. The most commonly used are AVL and Red-Black trees.

Figure 3: Treap insertion. Nodes are labeled key:priority; starting from a treap on G:4, B:7, H:5, A:10, E:23, K:65, I:73, the nodes C:25, D:9 and F:2 are inserted in turn.
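A recursive Python sketch of treap insertion, assuming distinct keys and the convention in the notes that a child's priority is larger than its parent's (so the root holds the smallest priority). The node is first inserted as an ordinary BST leaf and then rotated up while its priority is smaller than its parent's. The names and the use of `random.random()` for priorities are our own choices; the example replays the Figure 3 data with the priorities fixed as read off the figure.

```python
import random


class TreapNode:
    __slots__ = ("key", "priority", "left", "right")

    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = self.right = None


def rotate_right(y):
    """y's left child x becomes the subtree root; returns x."""
    x = y.left
    y.left = x.right
    x.right = y
    return x


def rotate_left(x):
    """x's right child y becomes the subtree root; returns y."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def treap_insert(root, key, priority=None):
    """Insert key (assumed distinct) and return the new subtree root."""
    if priority is None:
        priority = random.random()
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority < root.priority:    # heap order broken: rotate child up
            root = rotate_right(root)
    else:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def keys(x):
    return [] if x is None else keys(x.left) + [x.key] + keys(x.right)


def heap_ordered(x):
    """Check that every child has a larger priority than its parent."""
    if x is None:
        return True
    for c in (x.left, x.right):
        if c is not None and c.priority <= x.priority:
            return False
    return heap_ordered(x.left) and heap_ordered(x.right)


root = None
for key, prio in [("G", 4), ("B", 7), ("H", 5), ("A", 10), ("E", 23),
                  ("K", 65), ("I", 73), ("C", 25), ("D", 9), ("F", 2)]:
    root = treap_insert(root, key, prio)
print(keys(root))                                 # keys in alphabetical order
print(root.key, root.left.key, root.right.key)    # F B G
```

Because the treap on a given set of (key, priority) pairs is unique, the final shape does not depend on the order of the calls: F, with the smallest priority, ends up at the root, exactly as in the last panel of Figure 3.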

Figure 4: Rotations. A left rotation at x makes x's right child y the root of the subtree, with x as y's left child; a right rotation undoes it. The subtrees a, b, c keep their left-to-right order.

2.3 AVL Trees

Keep the tree height-balanced: for each node, the heights of the left and right subtrees differ by at most 1.
- Implement by keeping an extra field h[x] (height) for each node.
- The height of an AVL tree on n nodes is O(lg n). This follows from a recurrence: if $M_h$ is the minimum number of nodes in an AVL tree of height h, then $M_h = 1 + M_{h-1} + M_{h-2}$. The minimum number of nodes in an AVL tree of height 0 is $M_0 = 1$, and of height 1 is $M_1 = 2$.
- The recurrence $F_h = F_{h-1} + F_{h-2}$ with $F_0 = F_1 = 1$ yields what are known as the Fibonacci numbers. These numbers have been studied in great detail; in particular, $F_h = \frac{\varphi^{h+1} - (1-\varphi)^{h+1}}{\sqrt{5}}$, where $\varphi = (1+\sqrt{5})/2$. Clearly, the Fibonacci numbers grow exponentially. Since $M_h \ge F_h$, this guarantees that $M_h$ grows exponentially with h, so an AVL tree on n nodes has height O(lg n).

Insertion:
- Ordinary insertion can throw off the balance by at most 1, and only along the search path.
- Moving up from the added leaf, find the first place along the search path where the tree is not balanced, and re-balance the tree using rotations (Figure 4).
- Upshot: insertion takes O(lg n) time, and the number of rotations is O(1). Proof: based on Figures 5 to 7.

Deletions, as usual, are more complicated: rebalancing cannot in general be achieved with a single rotation, and there are cases where Θ(lg n) rotations are required.

AVL trees are very efficient in practice: the worst cases require 45% more compares than optimal trees. Empirical studies show that they require lg n + 0.25 compares on average.
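The height argument can be checked numerically. This Python sketch assumes a single node has height 0 (so $M_0 = 1$, $M_1 = 2$) and the $F_0 = F_1 = 1$ indexing used above. It computes both sequences, confirms $M_h = F_{h+2} - 1$ and the closed form for $F_h$, and checks $M_h \ge \varphi^h$, which gives $h \le \log_\varphi n \approx 1.44 \lg n$. The function names are our own.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def min_avl_nodes(h_max):
    """M[h]: fewest nodes in an AVL tree of height h (a single node has height 0)."""
    M = [1, 2]
    for h in range(2, h_max + 1):
        M.append(1 + M[h - 1] + M[h - 2])   # root + sparsest subtrees of heights h-1, h-2
    return M[: h_max + 1]


def fib(h_max):
    """F[h] with F[0] = F[1] = 1, as in the notes."""
    F = [1, 1]
    for h in range(2, h_max + 1):
        F.append(F[h - 1] + F[h - 2])
    return F[: h_max + 1]


def binet(h):
    """Closed form for F[h] under the F[0] = F[1] = 1 indexing."""
    return (PHI ** (h + 1) - (1 - PHI) ** (h + 1)) / math.sqrt(5)


M, F = min_avl_nodes(30), fib(32)
print(M[:6])                                          # [1, 2, 4, 7, 12, 20]
print(all(M[h] == F[h + 2] - 1 for h in range(31)))   # True
```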

Figure 5: AVL: two possible insertions that invalidate balances. There are two symmetric cases on the right side. A is the first node on the search path from the new node to the root where the height property fails.

Figure 6: AVL: single rotation to rectify invalidation.
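A recursive Python sketch of AVL insertion, assuming an empty subtree has height -1 and a leaf height 0. After the ordinary BST insert, each node on the way back up recomputes its height and, if unbalanced, is fixed by a single rotation (outer case) or a double rotation (inner case), as in the figures. Parent pointers are omitted for brevity, and all names are our own.

```python
import math


class AVLNode:
    __slots__ = ("key", "left", "right", "height")

    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.height = 0                       # a single node has height 0


def h(x):
    """Height of a possibly empty subtree; the empty tree has height -1."""
    return -1 if x is None else x.height


def update(x):
    x.height = 1 + max(h(x.left), h(x.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)                                 # y is now below x, so fix it first
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(a):
    """Restore |h(left) - h(right)| <= 1 at a with a single or double rotation."""
    update(a)
    if h(a.left) - h(a.right) > 1:                # left side too tall
        if h(a.left.left) < h(a.left.right):      # inner (left-right) case
            a.left = rotate_left(a.left)          # first half of a double rotation
        return rotate_right(a)
    if h(a.right) - h(a.left) > 1:                # right side too tall
        if h(a.right.right) < h(a.right.left):    # inner (right-left) case
            a.right = rotate_right(a.right)
        return rotate_left(a)
    return a


def avl_insert(root, key):
    """Ordinary BST insert, then rebalance on the way back up the search path."""
    if root is None:
        return AVLNode(key)
    if key < root.key:
        root.left = avl_insert(root.left, key)
    else:
        root.right = avl_insert(root.right, key)
    return rebalance(root)


def check(x):
    """Return the height of x's subtree, asserting the AVL condition and stored heights."""
    if x is None:
        return -1
    hl, hr = check(x.left), check(x.right)
    assert abs(hl - hr) <= 1 and x.height == 1 + max(hl, hr)
    return x.height


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)


small = None
for k in range(1, 8):
    small = avl_insert(small, k)
print(small.key, check(small))        # 4 2: sorted input still yields a perfect tree

big = None
for k in range(1, 1001):              # sorted input: the worst case for a plain BST
    big = avl_insert(big, k)
print(check(big), 1.44 * math.log2(1000))
```

Only the first unbalanced node on the way up actually rotates during an insertion; the rebalance calls at higher ancestors then find their subtrees already balanced, which matches the O(1) rotation bound in the notes.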

Figure 7: AVL: double rotation to rectify invalidation.

2.4 Red-black trees

CLRS Chapter 13.

An RB-tree is a BST with an additional bit of storage per node: its color, which can be red or black. It needs to satisfy the following properties:
P1 Every node is colored red or black.
P2 The root is black.
P3 Every leaf (NIL) is black.
P4 If a node is red, both its children are black.
P5 For every node, all paths to descendant leaves have the same number of black nodes.

Define the black height of node x, bh(x), to be the number of black nodes, not including x, on any path from x to a leaf (note that P5 guarantees that the black height is well defined).

Lemma 1. A red-black tree with n internal nodes has height at most $2\lg(n+1)$.
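The properties P1 to P5 can be checked mechanically. This Python sketch assumes None stands for a (black) NIL leaf; `black_count` returns the number of black nodes on every path from a node down to a NIL, counting both ends, and raises if P1, P4 or P5 fails. The tree used is a small hypothetical example whose colors we chose ourselves; the last check is the bound of Lemma 1.

```python
import math

RED, BLACK = "red", "black"


class RBNode:
    __slots__ = ("key", "color", "left", "right")

    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def red(key, left=None, right=None):
    return RBNode(key, RED, left, right)


def black(key, left=None, right=None):
    return RBNode(key, BLACK, left, right)


def black_count(x):
    """Black nodes on every path from x down to a NIL leaf, counting x and the NIL."""
    if x is None:
        return 1                                  # P3: NIL leaves are black
    if x.color not in (RED, BLACK):
        raise ValueError(f"P1 violated at {x.key}")
    if x.color == RED and any(c is not None and c.color == RED for c in (x.left, x.right)):
        raise ValueError(f"P4 violated at {x.key}")
    left, right = black_count(x.left), black_count(x.right)
    if left != right:
        raise ValueError(f"P5 violated at {x.key}")
    return left + (1 if x.color == BLACK else 0)


def is_red_black(root):
    if root is not None and root.color != BLACK:  # P2
        return False
    try:
        black_count(root)
    except ValueError:
        return False
    return True


def black_height(x):
    """bh(x): black nodes below x (x excluded, NIL leaf included) on any path to a leaf."""
    return 0 if x is None else black_count(x.left)


def height(x):
    """Height in edges, ignoring NIL leaves (a single node has height 0)."""
    return -1 if x is None else 1 + max(height(x.left), height(x.right))


def size(x):
    """Number of internal (non-NIL) nodes."""
    return 0 if x is None else 1 + size(x.left) + size(x.right)


good = black(7, red(2, black(1), black(5, red(4))),
             red(11, black(8), black(14, None, red(15))))
bad = black(7, red(2, black(1), red(5, red(4))),      # red 2 has a red child: P4 fails
            red(11, black(8), black(14, None, red(15))))

print(is_red_black(good), is_red_black(bad))        # True False
print(height(good), 2 * math.log2(size(good) + 1))  # 3 vs. about 6.92
```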

Figure 8: RB tree. Thick circles represent black nodes; we're ignoring the NIL leaves.

Proof: the subtree rooted at any node x has at least $2^{bh(x)} - 1$ internal nodes; this follows directly by induction. Then note that P4 guarantees that at least half the nodes on any path from the root to a leaf, not including the root, are black. So the black height of the root is at least h/2, hence $n \ge 2^{h/2} - 1$; the result $h \le 2\lg(n+1)$ follows from moving the 1 to the left side and taking logs.

From the above it follows that the insert and delete operations take O(lg n) time. However, they may result in violations of some of the properties P1 to P5. The properties are restored by changing colors and updating pointers through rotations. Conceptually rotation is simple, but the code looks a mess: see page 278 of CLRS. An example of an update sequence for insertion is given in Figure 9.

2.5 Splay trees

Based on Tarjan, Data Structures and Network Algorithms, 1983, Chapter 4, Section 3.
- Key idea: adjust the BST after each search, insert, and delete operation. (Tarjan actually allows a couple more operations, specifically joining two trees and splitting a tree at a node, but we'll stick to the search/insert/delete ops.)
- Key result: although a single operation may be expensive (Θ(n), where n is the number of keys in the tree), a sequence of m total operations, of which n are inserts, will complete in O(m lg n) time.
- Basic operation: splaying.

Figure 9: Insert in RB-tree. The panels show, in order: z and p[z] red, z's uncle red → recolor; z and p[z] red, z's uncle black, z the right child of p[z] → left rotate; z and p[z] red, z the left child of p[z] → right rotate, recolor.

As part of search/insert/update, splay the accessed node x: repeatedly apply a splaying step at x, which moves x up its path toward the root, until x is the root.

Definition of a splaying step at a node x:
- if x has a parent but no grandparent, rotate at p(x);
- if x has a grandparent (which implies it has a parent), and both x and p(x) are left children or both are right children, then rotate at p(p(x)) and then at p(x);
- if x has a grandparent, and x is a left child while p(x) is a right child, or vice versa, rotate at p(x) and then at the new p(x) (which will be the old grandparent of x).

Search/update node x: splay at x. Delete x: splay its parent just prior to deletion.

3 Augmenting data structures

CLRS Chapter 14.

Typical engineering situation: it is rarely the case that cut-and-paste from a textbook suffices.
- Sometimes we need a whole new data structure.
- Most often, we augment an existing data structure with some auxiliary information.

Two examples: dynamic order statistics and interval trees.

3.1 Dynamic order statistics

In addition to the usual BST operations (insert, delete, lookup, succ, pred, min, max), we would like to have a rank operation: the rank of an element is its position in the linear order of the set.

One approach to returning rank information fast: a balanced BST with an additional size field.
- size[x] = number of nodes in the subtree rooted at x, including x itself.
- size[x] = size[left[x]] + size[right[x]] + 1, where size[nil] = 0.

Two kinds of queries:

- Retrieve the element with a given rank.
- Determine the rank of an element.

Let's see how to compute rank information using the size field; later we will show how size can be updated through inserts and deletes.

    // Return element with rank i in tree rooted at node x
    OS_SELECT(x, i)
        r <- size[left[x]] + 1
        if ( i = r ) then
            return x
        elseif ( i < r ) then
            return OS_SELECT(left[x], i)
        else
            return OS_SELECT(right[x], i - r)

Idea: case analysis on whether the element of rank i is x itself, lies in x's left subtree, or lies in x's right subtree.

    // Return rank of element at node x in tree T
    OS_RANK(T, x)
        r <- size[left[x]] + 1
        y <- x
        while ( y != root[T] ) do
            if ( y = right[p[y]] ) then
                r <- r + size[left[p[y]]] + 1
            y <- p[y]
        return r

Idea: at the start of each iteration of the while loop, r is the rank of key[x] in the subtree rooted at node y.

Both rank algorithms are O(h) (= O(lg n) for a balanced BST). How do we preserve the size field through inserts and deletes?
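A Python sketch of OS_SELECT and OS_RANK on top of a size-augmented BST. To keep it short it assumes an unbalanced BST (so the bounds are O(h) rather than O(lg n)), None for nil with size 0, and 1 ≤ i ≤ size[root] for select; `os_insert` and the other names are our own. The insert already applies the rule given just below: increment size on every node of the search path.

```python
class Node:
    __slots__ = ("key", "left", "right", "parent", "size")

    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.size = 1                       # a new node's subtree is just itself


class Tree:
    def __init__(self):
        self.root = None


def size(x):
    return 0 if x is None else x.size      # size[nil] = 0


def os_insert(T, key):
    """Unbalanced BST insert that increments size on every node of the search path."""
    z, y, x = Node(key), None, T.root
    while x is not None:
        x.size += 1
        y = x
        x = x.left if key < x.key else x.right
    z.parent = y
    if y is None:
        T.root = z
    elif key < y.key:
        y.left = z
    else:
        y.right = z
    return z


def os_select(x, i):
    """Node holding the i-th smallest key (1-based) in the subtree rooted at x."""
    r = size(x.left) + 1
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


def os_rank(T, x):
    """Position of x's key in an in-order walk of T."""
    r = size(x.left) + 1
    y = x
    while y is not T.root:
        if y is y.parent.right:
            r += size(y.parent.left) + 1    # count the parent and its left subtree
        y = y.parent
    return r


T = Tree()
nodes = {k: os_insert(T, k) for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]}
print(os_select(T.root, 5).key)   # 7, the 5th smallest key
print(os_rank(T, nodes[13]))      # 7
```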

Insert:
- Before rotations: simply increment the size field for each node on the search path.
- Rotations: only two nodes have field changes.
- Upshot: insert remains O(h).

Deletion: similar argument.

3.2 Interval trees

Augment BSTs to support operations on intervals.
- A closed interval $[t_1, t_2]$ is an ordered pair of real numbers $t_1 \le t_2$; it represents the set $\{t \in \mathbb{R} \mid t_1 \le t \le t_2\}$. (We can also define open and half-open intervals, but there is no real difference.)
- Intervals are useful for representing events which occupy a continuous period of time. Natural query: what events happened in a given time?

Nice solution via BSTs.
- Represent the interval $[t_1, t_2]$ as an object i, with fields low[i] = $t_1$ and high[i] = $t_2$.
- Fact: intervals satisfy trichotomy: if i and i' are intervals, then either they overlap, or one is to the left of the other.

Interval tree: a BST with each node containing an interval.
- Given node x with interval int[x], the BST key is low[int[x]].
- Store additional information: max[x], the maximum value of any endpoint stored in the subtree rooted at x.
- This information must be updated through inserts and deletes, which can be done efficiently: max[x] = max(high[int[x]], max[left[x]], max[right[x]]).

New operation: interval search.

    // Find a node in T whose interval overlaps with interval i
    INTERVAL_SEARCH(T, i)
        x <- root[T]
        while ( x != nil[T] ) && ( i does not overlap int[x] ) do
            if ( left[x] != nil[T] ) && ( max[left[x]] >= low[i] ) then
                x <- left[x]
            else
                x <- right[x]
        return x

Why does this work? If tree T contains an interval that overlaps i, then there is such an interval in the subtree rooted at x.
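A Python sketch of an interval tree and INTERVAL_SEARCH, assuming closed intervals with numeric endpoints, None for nil[T], and an unbalanced BST keyed on the low endpoint (a real implementation would build on a red-black tree). The insert raises max on every node of the search path, which maintains max[x] = max(high[int[x]], max[left[x]], max[right[x]]). The interval data are a made-up example, and all names are our own.

```python
class INode:
    __slots__ = ("low", "high", "max", "left", "right")

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.max = high                     # max endpoint in this node's subtree
        self.left = self.right = None


def overlaps(a_low, a_high, b_low, b_high):
    """Closed intervals overlap unless one lies entirely to the left of the other."""
    return a_low <= b_high and b_low <= a_high


def interval_insert(root, low, high):
    """Unbalanced BST insert keyed on low, raising max along the search path."""
    z = INode(low, high)
    if root is None:
        return z
    x = root
    while True:
        x.max = max(x.max, high)
        if low < x.low:
            if x.left is None:
                x.left = z
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = z
                return root
            x = x.right


def interval_search(root, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not overlaps(low, high, x.low, x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x


DATA = [(16, 21), (8, 9), (25, 30), (5, 8), (15, 23),
        (17, 19), (26, 26), (0, 3), (6, 10), (19, 20)]
root = None
for lo, hi in DATA:
    root = interval_insert(root, lo, hi)

hit = interval_search(root, 22, 25)
print((hit.low, hit.high))               # (15, 23)
print(interval_search(root, 11, 14))     # None
```

Going left whenever max[left[x]] ≥ low[i] is safe: if the left subtree then holds no overlapping interval, trichotomy implies every interval in the right subtree starts too late to overlap i, so nothing is missed.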