Binary Search Trees
Adnan Aziz

1 BST basics

Based on CLRS, Ch. 12.

Motivation:
- Heaps can perform extract-max and insert efficiently: O(log n) worst case.
- Hash tables can perform insert, delete, and lookup efficiently: O(1) on average.
- What about searching in a heap? Min in a heap? Max in a hash table?
- Binary search trees support search, insert, delete, max, min, successor, and predecessor.
  - Time complexity is proportional to the height of the tree.
  - Recall that a complete binary tree on n nodes has height O(log n).

Basics:
- A BST is organized as a binary tree.
- Added caveat: keys are stored at nodes so as to satisfy the BST property: for any node x in a BST, if y is a node in x's left subtree, then key[y] ≤ key[x], and if y is a node in x's right subtree, then key[y] ≥ key[x].

Implementation: represent a node by an object with 4 data members: key, pointer to left child, pointer to right child, and pointer to parent (use a NIL pointer to denote empty).
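Figure 1: BST examples (two different BSTs on the keys 2, 3, 5, 5, 7, 8)

Figure 2: BST working example for queries and updates. Keys level by level: 15; 6, 18; 3, 7, 17, 20; 2, 4, 13; 9. By the BST property, 2 and 4 are the children of 3, 13 is the right child of 7, and 9 is the left child of 13.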
1.1 Query operations

1.1.1 Print keys in sorted order

The BST property allows us to print out all the keys in a BST in sorted order:

    INORDER-TREE-WALK(x)
      if x != NIL
        then INORDER-TREE-WALK(left[x])
             print key[x]
             INORDER-TREE-WALK(right[x])

1.1.2 Search for a key

    TREE-SEARCH(x, k)
      if x = NIL or k = key[x]
        then return x
      if k < key[x]
        then return TREE-SEARCH(left[x], k)
        else return TREE-SEARCH(right[x], k)

Try searching for 13 and 11 in the example (Figure 2).

1.1.3 Min, Max

    TREE-MINIMUM(x)
      while left[x] != NIL
        do x <- left[x]
      return x

The procedure to find the max is symmetric. Try both on the example: min = 2, max = 20.
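The three query procedures above translate almost line for line into Python. The following is a minimal sketch, not the notes' own code: the `Node` class, the function names, and the use of `None` for NIL are assumptions made for illustration, and the tree shape is reconstructed from the key layout of Figure 2.

```python
class Node:
    """BST node with the four fields from the notes; None stands in for NIL."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def inorder_tree_walk(x, out):
    """Append the keys of the subtree rooted at x to out, in sorted order."""
    if x is not None:
        inorder_tree_walk(x.left, out)
        out.append(x.key)
        inorder_tree_walk(x.right, out)
    return out


def tree_search(x, k):
    """Return the node with key k in the subtree rooted at x, or None."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)


def tree_minimum(x):
    """Follow left pointers until there is no left child."""
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    """Symmetric to tree_minimum: follow right pointers."""
    while x.right is not None:
        x = x.right
    return x


# The working example of Figure 2, built by hand (shape is an assumption
# reconstructed from the figure's key layout).
root = Node(15,
            Node(6,
                 Node(3, Node(2), Node(4)),
                 Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

print(inorder_tree_walk(root, []))
print(tree_search(root, 13) is not None, tree_search(root, 11) is None)
print(tree_minimum(root).key, tree_maximum(root).key)
```

Each function follows a single root-to-leaf path (or, for the walk, visits every node once), which is where the O(h) and O(n) costs come from.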
1.2 Successor and Predecessor

Given a node x in a BST, we sometimes want to find its successor: the node whose key appears immediately after x's key in an in-order walk.

Conceptually: if the right child of x is not NIL, return the minimum of the right subtree; otherwise examine the parent. The tricky case is when x is the right child of its parent: we then need to keep going up the search tree until we arrive at a node from its left child.

    TREE-SUCCESSOR(x)
      if right[x] != NIL
        then return TREE-MINIMUM(right[x])
      y <- p[x]
      while y != NIL and x = right[y]
        do x <- y
           y <- p[y]
      return y

The procedure to find the predecessor is symmetric. Try succ of 15, 6, 13, 20 and pred of 15, 6, 7, 2 in the example.

Theorem: all the operations above take time O(h), where h is the height of the tree.

1.3 Updates

1.3.1 Inserts

Insert: given a node z with key v and with left[z], right[z], p[z] all NIL, and a BST T, update T and z so that the updated tree includes the node z and the BST property still holds.

Idea behind the algorithm: begin at the root and trace a path downward, comparing v with the key at the current node. On reaching the bottom, insert node z by setting its parent to the last node on the path, and update that parent's left or right child pointer appropriately.

Refer to TREE-INSERT(T, z) in CLRS for detailed pseudo-code. Try inserting keys 12, 1, 7, 16, 25 in the example.
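The successor walk and the insertion idea described above can be sketched in Python as follows. This is an illustrative sketch rather than the CLRS pseudo-code itself: the `Node` and `Tree` classes and the `nodes` dictionary are hypothetical helpers, and `None` plays the role of NIL. The tree is built by inserting the keys of the working example level by level, which is assumed to reproduce its shape.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    """Walk down from the root, then hang z below the last node on the path."""
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        T.root = z          # the tree was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    """Return the node that follows x in an in-order walk, or None."""
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    # Climb while x is a right child; stop when we come up from a left child.
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y


T = Tree()
nodes = {}
for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]:
    nodes[k] = Node(k)
    tree_insert(T, nodes[k])

for k in [15, 6, 13, 20]:
    s = tree_successor(nodes[k])
    print(k, "->", None if s is None else s.key)
```

Both procedures touch only nodes on one root-to-leaf path, consistent with the O(h) bound stated in the theorem.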
1.3.2 Deletions

Given a node z (assumed to be a node in a BST T), delete it from T. This is tricky; we need to consider 3 cases:

1. z has no children: modify p[z] to replace the z-child of p[z] by NIL.
2. z has one child: splice z out of the tree.
3. z has two children: splice z's successor (call it y) out of the tree, and replace z's contents with those of y. Crucial fact: y cannot have a left child, because y is the minimum of z's right subtree.

Refer to TREE-DELETE(T, z) in CLRS for detailed pseudo-code. Try deleting 9, 7, 6 from the example.

Theorem: insertion and deletion can be performed in O(h) time, where h is the height of the tree. Note how we crucially made use of the BST property.

2 Balanced binary search trees

Fact: if we build a BST using n inserts (with insert implemented as above), and the keys appear in a random sequence, then the height is very likely to be O(log n) (proved in CLRS 12.4).
- A random sequence of keys is an extremely unrealistic assumption.
- The result holds only when there are no deletes. If there are deletes, the height will tend to O(√n). This is because of the asymmetry in deletion: the same neighbor (the successor in the procedure above, the predecessor in the symmetric variant) is always used to replace the node. The asymmetry can be removed by alternating between the successor and the predecessor.

Question 1. Given a BST on n nodes, can you always find a BST on the same n keys having height O(log n)?
Yes, just think of a (near) complete binary tree.

Question 2. Can you implement insertion and deletion so that the height of the tree is always O(log n)?

Yes, but this is not so obvious: performing these operations in O(log n) time is a lot trickier.

Broadly speaking, there are two options for keeping a BST balanced:
- Make the insert sequence look random (treaps).
- Store some extra information at the nodes, related to the relative balance of the children.
  - Modify insert and delete to check for and correct skewedness.
  - There are several classes of balanced BSTs, e.g., AVL, Red-Black, 2-3, etc.

2.1 Treaps

Randomly built BSTs tend to be balanced: given a set S, randomly permute its elements, then insert them in that order (CLRS 12.4).

In general, all the elements of S are not available at the start.

Solution: assign a random number, called the priority, to each key (assume keys are distinct). In addition to satisfying the BST property (if v is the left/right child of u, then key(v) < / > key(u)), require that if v is a child of u, then priority(v) > priority(u).

Treaps have the following property: if we insert x_1, ..., x_n into a treap, then the resulting tree is the one that would have resulted if the keys had been inserted into a plain BST in the order of their priorities (smallest first).

Treap insertion is very fast: only a constant number of pointer updates on average. It is used in LEDA, a widely used advanced data structure library for C++.

2.2 Deterministic balanced trees

Many alternatives: AVL, Red-Black, 2-3, B-trees, weight-balanced trees, etc. The most commonly used are AVL and Red-Black trees.
Figure 3: Treap insertion. Nodes are labeled key:priority (G:4, B:7, H:5, A:10, E:23, K:65, I:73); the panels show the treap as C:25, D:9, and F:2 are inserted.
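Treap insertion (Section 2.1, Figure 3) can be sketched as an ordinary BST insert followed by rotations that move the new node up while its priority is smaller than its parent's. The sketch below is one common recursive formulation, not necessarily the one used in LEDA; `TreapNode`, `treap_insert`, `shape`, and `build` are hypothetical names, and the key:priority labels are those read off Figure 3.

```python
import random


class TreapNode:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None


def rotate_right(y):
    """Lift y's left child above y; returns the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    return x


def rotate_left(x):
    """Lift x's right child above x; returns the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def treap_insert(root, key, priority=None):
    """Insert key as a BST leaf, then rotate it up while the heap order
    (child priority > parent priority) is violated."""
    if priority is None:
        priority = random.random()   # random priority, as in the notes
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def shape(x):
    """Nested-tuple view (key, left, right) of a tree, for comparing shapes."""
    return None if x is None else (x.key, shape(x.left), shape(x.right))


# key:priority labels taken from Figure 3
labels = {"G": 4, "B": 7, "H": 5, "A": 10, "E": 23,
          "K": 65, "I": 73, "C": 25, "D": 9, "F": 2}


def build(order):
    root = None
    for k in order:
        root = treap_insert(root, k, labels[k])
    return root


by_key = build(sorted(labels))                        # alphabetical order
by_priority = build(sorted(labels, key=labels.get))   # increasing priority
print(shape(by_key) == shape(by_priority))
```

When keys arrive in increasing priority order, each new node has the largest priority so far, so no rotation fires and the insert is a plain BST insert. Comparing the two shapes is therefore a direct check of the property stated in Section 2.1.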
Figure 4: Rotations. A left rotation at x takes a tree with root x, left subtree a, and right child y (with subtrees b and c) to a tree with root y, left child x (with subtrees a and b), and right subtree c. A right rotation is the inverse.

2.3 AVL Trees

Keep the tree height balanced: for each node, the heights of the left and right subtrees differ by at most 1.

Implement by keeping an extra field h[x] (height) for each node.

The height of an AVL tree on n nodes is O(lg n). This follows from the following recurrence: if $M_h$ is the minimum number of nodes in an AVL tree of height h, then $M_h = 1 + M_{h-1} + M_{h-2}$. Counting height in edges, the minimum number of nodes in an AVL tree of height 0 is 1, and of height 1 is 2.

The recurrence $F_h = F_{h-1} + F_{h-2}$ with $F_0 = F_1 = 1$ yields what are known as the Fibonacci numbers. These numbers have been studied in great detail; in particular, $F_h = \frac{\varphi^{h+1} - (1-\varphi)^{h+1}}{\sqrt{5}}$, where $\varphi = (1 + \sqrt{5})/2$. Clearly, the Fibonacci numbers grow exponentially. Since $M_h \ge F_h$, this guarantees that $M_h$ grows exponentially with h, and hence that h = O(lg n).

Ordinary insertion can throw off the balance by at most 1, and only along the search path.
- Moving up from the added leaf, find the first place along the search path where the tree is not balanced.
- Re-balance the tree using rotations (Figure 4).

Upshot: insertion takes O(lg n) time, and the number of rotations is O(1). Proof: based on the figures (Figures 5, 6, and 7).

Deletions, as usual, are more complicated. Rebalancing cannot in general be achieved with a single rotation: there are cases where Θ(lg n) rotations are required.

AVL trees are very efficient in practice: the worst cases require 45% more compares than optimal trees. Empirical studies show that they require lg n + 0.25 compares on average.
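The height argument above can be checked numerically. The short sketch below tabulates the minimum AVL tree size $M_h$ next to the Fibonacci numbers $F_h$ and their closed form. It assumes height is counted in edges, so that $M_0 = 1$ and $M_1 = 2$; the function names are illustrative only.

```python
import math


def min_avl_nodes(h):
    """M_h: fewest nodes in an AVL tree of height h (height counted in edges)."""
    m_prev, m_cur = 1, 2            # M_0, M_1 (assumed base cases)
    if h == 0:
        return m_prev
    for _ in range(h - 1):
        m_prev, m_cur = m_cur, 1 + m_cur + m_prev
    return m_cur


def fib(h):
    """F_h with F_0 = F_1 = 1, computed from the recurrence."""
    f_prev, f_cur = 1, 1            # F_0, F_1
    for _ in range(h):
        f_prev, f_cur = f_cur, f_prev + f_cur
    return f_prev


def fib_closed_form(h):
    """Closed form (phi^(h+1) - (1 - phi)^(h+1)) / sqrt(5)."""
    phi = (1 + math.sqrt(5)) / 2
    return (phi ** (h + 1) - (1 - phi) ** (h + 1)) / math.sqrt(5)


for h in range(8):
    print(h, min_avl_nodes(h), fib(h), round(fib_closed_form(h)))
```

In every row $M_h \ge F_h$, so a tree of height h needs exponentially many nodes in h, which is the same as saying that n nodes force height O(lg n).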
Figure 5: AVL: Two possible insertions that invalidate balances. There are two symmetric cases on the right side. A is the first node on the search path from the new node to the root where the height property fails.

Figure 6: AVL: single rotation to rectify invalidation
Figure 7: AVL: double rotation to rectify invalidation

2.4 Red-black trees

CLRS Chapter 13.

An RB-tree is a BST with an additional bit of storage per node: its color, which can be red or black. It needs to satisfy the following properties:

P1 Every node is colored red or black.
P2 The root is black.
P3 Every leaf (NIL) is black.
P4 If a node is red, both its children are black.
P5 For every node, all paths to descendant leaves have the same number of black nodes.

Define the black height of node x, bh(x), to be the number of black nodes, not including x, on any path from x to a leaf (note that P5 guarantees that the black height is well defined).

Lemma 1 A red-black tree with n internal nodes has height at most 2 lg(n + 1).
Figure 8: RB tree: Thick circles represent black nodes; we're ignoring the NIL leaves. Keys level by level: 26; 17, 41; 14, 21, 30, 47; 10, 16, 19, 23, 28, 38; 7, 12, 15, 20, 35, 39; 3.

Proof: The subtree rooted at any node x has at least $2^{bh(x)} - 1$ internal nodes; this follows directly by induction. Then note that P4 guarantees that at least half the nodes on any path from the root to a leaf are black (including the root). So the black height of the root is at least h/2, where h is the height of the tree; hence $n \ge 2^{h/2} - 1$. The result follows from moving the 1 to the left side and taking logs.

From the above it follows that the insert and delete operations take O(lg n) time. However, they may result in violations of some of the properties P1 to P5. The properties are restored by changing colors and updating pointers through rotations. Conceptually rotation is simple, but the code looks a mess; see page 278, CLRS. An example of an update sequence for insertion is in Figure 9.

2.5 Splay trees

Based on Tarjan, Data Structures and Network Algorithms, 1983, Chapter 4, Section 3.

Key idea: adjust the BST after each search, insert, and delete operation. (Tarjan actually allows a couple more operations, specifically, joining two trees and splitting a tree at a node, but we'll stick to the search/insert/delete ops.)

Key result: although a single operation may be expensive (Θ(n), where n is the number of keys in the tree), a sequence of m total operations, of which n are inserts, will complete in O(m lg n) time.

Basic operation: splaying.
Figure 9: Insert in RB-tree. The panels show successive steps of the fix-up after inserting node z:
- z and p[z] red, z's uncle red: recolor.
- z and p[z] red, z's uncle black, z right child of p[z]: left rotate.
- z and p[z] red, z left child of p[z]: right rotate, recolor.
As part of search/insert/update, perform splaying on each node on the path from x to the root.

Definition of splaying at a node x:
- if x has a parent but no grandparent, rotate at p(x);
- if x has a grandparent (which implies it has a parent), and both x and p(x) are left children or both are right children, then rotate at p(p(x)) and then at p(x);
- if x has a grandparent, and x is a left child and p(x) is a right child, or vice versa, rotate at p(x) and then at the new p(x) (which will be the old grandparent of x).

Search/update node x: splay at x.
Delete x: splay its parent just prior to deletion.

3 Augmenting data structures

CLRS Chapter 14.

Typical engineering situation: it is rarely the case that cut-and-paste from a textbook suffices.
- Sometimes we need a whole new data structure.
- Most often, we augment an existing data structure with some auxiliary information.

Two examples: dynamic order statistics, and interval trees.

3.1 Dynamic order statistics

In addition to the usual BST operations (insert, delete, lookup, succ, pred, min, max), we would like to have a rank operation: the rank of an element is its position in the linear order of the set.

One approach to returning rank information fast: a balanced BST with an additional size field.
- size[x] = number of nodes in the subtree rooted at x, including x itself.
- size[x] = size[left[x]] + size[right[x]] + 1, where size[NIL] = 0.

Two kinds of queries:
- Retrieve the element with a given rank.
- Determine the rank of an element.

Let's see how to compute rank information using the size field; later we will show how size can be updated through inserts and deletes.

    // Return element with rank i in tree rooted at node x
    OS_SELECT(x, i)
      r <- size[left[x]] + 1
      if i = r
        then return x
      elseif i < r
        then return OS_SELECT(left[x], i)
        else return OS_SELECT(right[x], i - r)

Idea: case analysis.

    // Return rank of element at node x in tree T
    OS_RANK(T, x)
      r <- size[left[x]] + 1
      y <- x
      while y != root[T]
        do if y = right[p[y]]
             then r <- r + size[left[p[y]]] + 1
           y <- p[y]
      return r

Idea: at the start of each iteration of the while loop, r is the rank of key[x] in the subtree rooted at node y.

Both rank algorithms are O(h) (= O(lg n) for a balanced BST).

How to preserve the size field through inserts and deletes?
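The two rank queries can be sketched in Python as below. To keep the example short it uses a plain, unbalanced BST insert that increments size along the search path (an assumption; the notes intend a balanced tree, where rotations would also have to fix up size). The `Node` class, the `size` helper, and the `nodes` dictionary are illustrative names, and the keys are those of the working example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.size = 1               # a new leaf is a subtree of one node


def size(x):
    """size[x], with size[NIL] = 0."""
    return x.size if x is not None else 0


def insert(root, z):
    """Unbalanced BST insert; every node on the search path gains one descendant."""
    y, x = None, root
    while x is not None:
        x.size += 1
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y
    if y is None:
        return z
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def os_select(x, i):
    """Return the node of rank i (1-based) in the subtree rooted at x."""
    r = size(x.left) + 1
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


def os_rank(root, x):
    """Return the rank of node x within the tree rooted at root."""
    r = size(x.left) + 1
    y = x
    while y is not root:
        if y is y.parent.right:
            # Everything in the parent's left subtree, and the parent, precede x.
            r += size(y.parent.left) + 1
        y = y.parent
    return r


root, nodes = None, {}
for k in [15, 6, 18, 3, 7, 17, 20, 2, 4, 13, 9]:
    nodes[k] = Node(k)
    root = insert(root, nodes[k])

print(os_select(root, 1).key, os_select(root, 6).key, os_rank(root, nodes[13]))
```

The two functions are inverses of each other: `os_rank(root, os_select(root, i))` gives back `i` for every valid rank.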
Insert:
- Before rotations: simply increment the size field for each node on the search path.
- Rotations: only two nodes have their size fields change.
- Upshot: insert remains O(h).

Deletion: similar argument.

3.2 Interval trees

Augment BSTs to support operations on intervals.

A closed interval $[t_1, t_2]$ is an ordered pair of real numbers with $t_1 \le t_2$; it represents the set $\{t \in \mathbb{R} : t_1 \le t \le t_2\}$. (One can also define open and half-open intervals, but there is no real difference.)

Intervals are useful for representing events which occupy a continuous period of time. Natural query: what events happened in a given time? There is a nice solution via BSTs.

Represent the interval $[t_1, t_2]$ as an object i, with fields low[i] = $t_1$ and high[i] = $t_2$.

Fact: intervals satisfy trichotomy: if i and i' are intervals, then either they overlap, or one is to the left of the other.

Interval tree: a BST with each node containing an interval.
- Given node x with interval int[x], the BST key is low[int[x]].
- Store additional information: max[x], the maximum value of any endpoint stored in the subtree rooted at x.
- This information must be updated through inserts and deletes; this can be done efficiently: max[x] = max(high[int[x]], max[left[x]], max[right[x]]).

New operation: interval search
    // Find a node in T whose interval overlaps with interval i
    INTERVAL_SEARCH(T, i)
      x <- root[T]
      while x != nil[T] and i does not overlap int[x]
        do if left[x] != nil[T] and max[left[x]] >= low[i]
             then x <- left[x]
             else x <- right[x]
      return x

Why does this work? The loop maintains the invariant: if tree T contains an interval that overlaps i, then there is such an interval in the subtree rooted at x.
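A minimal Python sketch of the interval search follows. It assumes a plain, unbalanced BST keyed on the low endpoint, with max maintained on the way down during insertion; `IntervalNode`, `interval_insert`, and `overlaps` are hypothetical names, and the intervals are illustrative data, not taken from the notes.

```python
class IntervalNode:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.max = high             # largest endpoint in this subtree
        self.left = self.right = None


def interval_insert(root, node):
    """Unbalanced BST insert keyed on low; updates max along the search path."""
    if root is None:
        return node
    root.max = max(root.max, node.high)
    if node.low < root.low:
        root.left = interval_insert(root.left, node)
    else:
        root.right = interval_insert(root.right, node)
    return root


def overlaps(a_low, a_high, b_low, b_high):
    """Closed intervals overlap unless one lies entirely to the left of the other."""
    return a_low <= b_high and b_low <= a_high


def interval_search(root, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not overlaps(low, high, x.low, x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left              # an overlap, if any exists, can be found on the left
        else:
            x = x.right             # nothing on the left reaches as far as low
    return x


# Illustrative intervals (an assumption, chosen only for the example).
intervals = [(16, 21), (8, 9), (25, 30), (5, 8), (15, 23),
             (17, 19), (26, 26), (0, 3), (6, 10), (19, 20)]
root = None
for lo, hi in intervals:
    root = interval_insert(root, IntervalNode(lo, hi))

hit = interval_search(root, 22, 25)
miss = interval_search(root, 11, 14)
print((hit.low, hit.high), miss)
```

The search follows a single root-to-leaf path, so it runs in O(h) time. It returns some overlapping interval, not necessarily every one: [25, 30] also overlaps [22, 25] in this example.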