
Lecture 10 Union-Find

The union-find data structure is motivated by Kruskal's minimum spanning tree algorithm (Algorithm 2.6), in which we needed two operations on disjoint sets of vertices:

determine whether vertices u and v are in the same set;

form the union of disjoint sets A and B.

The data structure provides two operations from which the above two operations can be implemented:

find(v), which returns a canonical element of the set containing v. We ask whether u and v are in the same set by asking whether find(u) = find(v);

union(u, v), which merges the sets containing the canonical elements u and v.

To implement these operations efficiently, we represent each set as a tree with data elements at the vertices. Each element u has a pointer parent(u) to its parent in the tree. The root serves as the canonical element of the set. To effect a union(u, v), we combine the two trees with roots u and v by making u a child of v or vice versa. To do a find(u), we start at u and follow parent pointers, traversing the path up to the root of the tree containing u, which gives the canonical element. To improve performance, we will use two heuristics:

When merging two trees in a union, always make the root of the smaller tree a child of the root of the larger. We maintain with each vertex u the size of the subtree rooted at u, and update it whenever we do a union.

After finding the root v of the tree containing u in a find(u), we traverse the path from u to v one more time and change the parent pointers of all vertices along the path to point directly to v. This process is called path compression. It will pay off in subsequent find operations, since we will be traversing shorter paths.

Let us start with some basic observations about these heuristics. Let σ be a sequence of m union and find operations starting with n singleton sets. Consider the execution of σ both with and without path compression. In either case we combine two smaller sets to form a larger one in each union operation. Observe that the collection of sets at time t is the same with or without path compression, and the trees have the same roots, although the trees will in general be shorter and bushier with path compression. Observe also that u becomes a descendant of v at time t with path compression if and only if u becomes a descendant of v at time t without path compression. However, without path compression, once u becomes a descendant of v, it remains a descendant of v forever, whereas with path compression it might later become a non-descendant of v.

10.1 Ackermann's Function

The two heuristics will allow a sequence of m union and find operations to be performed in O((m + n) α(n)) time, where α(n) is the inverse of Ackermann's function. Ackermann's function is a famous function that is known for its extremely rapid growth; its inverse α(n) grows extremely slowly. The texts [3, 100] give inequivalent definitions of Ackermann's function, and in fact there does not seem to be any general agreement on the definition of "the" Ackermann's function, but all these functions grow at roughly the same rate.
Here is yet another definition that grows at roughly the same rate:

    A_0(x) = x + 1
    A_{k+1}(x) = A_k^x(x)

where A_k^i is the i-fold composition of A_k with itself:

    A_k^0 = the identity function
    A_k^{i+1} = A_k ∘ A_k^i.

In other words, to compute A_{k+1}(x), start with x and apply A_k x times. It is not hard to show by induction that A_k is monotone in the sense that x <= y implies A_k(x) <= A_k(y), and that x <= A_k(x) for all x.

As k grows, these functions get extremely huge extremely fast. For x = 0 or 1, the numbers A_k(x) are small. For x >= 2,

    A_0(x) = x + 1
    A_1(x) = A_0^x(x) = 2x
    A_2(x) = A_1^x(x) = x·2^x >= 2^x
    A_3(x) = A_2^x(x) >= 2^2^...^2 (a tower of x 2's) = 2 ↑ x
    A_4(x) = A_3^x(x) >= 2 ↑ (2 ↑ (... (2 ↑ 2)...)) (x-fold) = 2 ↑↑ x.

For x = 2, the growth of A_k(2) as a function of k is beyond comprehension. Already for k = 4, the value of A_4(2) is larger than the number of atomic particles in the known universe or the number of nanoseconds since the Big Bang:

    A_0(2) = 3
    A_1(2) = 4
    A_2(2) = 8
    A_3(2) = 2^11 = 2048
    A_4(2) >= 2 ↑ 2048 (a tower of 2048 2's).

We define a unary function that majorizes all the A_k (i.e., grows asymptotically faster than all of them):

    A(k) = A_k(2)

and call it Ackermann's function. This function grows asymptotically faster than any primitive recursive function, since it can be shown that all primitive recursive functions are bounded almost everywhere by one of the functions A_k. The primitive recursive functions are those computed by a simple Pascal-like programming language over the natural numbers with for loops but no while

loops. The level k corresponds roughly to the depth of nesting of the for loops [79].

The inverse of Ackermann's function is

    α(n) = the least k such that A(k) >= n,

which for all practical purposes is at most 4. We will show next time that with our heuristics, any sequence of m union and find operations takes at most O((m + n) α(n)) time, which is not quite linear but might as well be for all practical purposes. This result is due to Tarjan (see [100]). A corresponding lower bound for pointer machines with no random access has also been established [99, 87].
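The data structure of this lecture, with both heuristics, can be sketched as follows. This is a minimal illustration, not the lecture's own code; the names make_sets, find, and union are mine, chosen to match the text.

```python
# Union-find over elements 0..n-1: each set is a tree of parent
# pointers whose root is the canonical element, with union by size
# and path compression.

def make_sets(n):
    parent = list(range(n))   # parent[u] == u marks u as a root
    size = [1] * n            # size[r] is maintained at roots
    return parent, size

def find(parent, u):
    root = u
    while parent[root] != root:   # first pass: locate the root
        root = parent[root]
    while parent[u] != root:      # second pass: path compression
        parent[u], u = root, parent[u]
    return root

def union(parent, size, u, v):
    ru, rv = find(parent, u), find(parent, v)
    if ru == rv:
        return
    if size[ru] > size[rv]:       # always hang the smaller tree
        ru, rv = rv, ru           # under the root of the larger
    parent[ru] = rv
    size[rv] += size[ru]

parent, size = make_sets(5)
union(parent, size, 0, 1)
union(parent, size, 3, 4)
print(find(parent, 0) == find(parent, 1))   # True
print(find(parent, 0) == find(parent, 3))   # False
```

Note that each find traverses the path to the root exactly twice, once to locate the root and once to repoint the parent pointers; this is the "one time unit per vertex on the path" cost model used in the analysis next lecture.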

Lecture 11 Analysis of Union-Find

Recall from last time the heuristics:

In a union, always merge the smaller tree into the larger.

In a find, use path compression.

We made several elementary observations about these heuristics:

the contents of the trees are the same with or without path compression;

the roots of the trees are the same with or without path compression;

a vertex u becomes a descendant of v at time t with path compression if and only if it does so without path compression. With path compression, however, u may at some later point become a non-descendant of v.

Recall also the definitions of the functions A_k and α:

    A_0(x) = x + 1
    A_{k+1}(x) = A_k^x(x)
    α(n) = the least k such that A_k(2) >= n     (15)

and that α(n) <= 4 for all practical values of n.
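The functions just recalled can be computed directly from the definitions for tiny arguments (a small illustration of my own; the values explode so fast that only the smallest inputs terminate):

```python
# A_0(x) = x + 1, and A_{k+1}(x) applies A_k to x, x times.
def A(k, x):
    if k == 0:
        return x + 1
    y = x
    for _ in range(x):          # x-fold composition of A_{k-1}
        y = A(k - 1, y)
    return y

def alpha(n):
    # least k such that A_k(2) >= n; never more than 4 in practice,
    # since A_4(2) is a tower of 2048 2's. We only probe k <= 3.
    for k in range(4):
        if A(k, 2) >= n:
            return k
    return 4

print([A(k, 2) for k in range(4)])   # [3, 4, 8, 2048]
print(alpha(10), alpha(10**100))     # 3 4
```

This confirms the table from last lecture: A_0(2) = 3, A_1(2) = 4, A_2(2) = 8, A_3(2) = 2048, so α(n) = 4 already covers every n up to a tower of 2048 2's.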

11.1 Rank of a Node

As in the last lecture, let σ be a sequence of m union and find instructions starting with n singleton sets. Let T_t(u) denote the subtree rooted at u at time t in the execution of σ without path compression, and define the rank of u to be

    rank(u) = 2 + height(T_m(u))     (16)

where height(T) is the height of T, i.e., the length of the longest path in T. In other words, we execute σ without path compression, then find the longest path in the resulting tree below u. The rank of u is defined to be two more than the length of this path. (Beware that our rank is two more than the rank as defined in [3, 100]. This is for technical reasons; the 2's in (15) and (16) are related.)

As long as u has no parent, the height of T_t(u) can still increase, since other trees can be merged into it; but once u becomes a child of another vertex, the tree rooted at u becomes fixed, since no trees will ever again be merged into it. Also, without path compression, the height of a tree can never decrease. It follows that if u ever becomes a descendant of v (with or without path compression), say at time t, then for all s > t the height of T_s(u) is less than the height of T_s(v), therefore

    rank(u) < rank(v).     (17)

The following lemma captures the intuition that if we always merge smaller trees into larger, the trees will be relatively balanced.

Lemma 11.1 For all t and u,

    |T_t(u)| >= 2^{height(T_t(u))}.     (18)

Proof. The proof is by induction on t, using the fact that we always merge smaller trees into larger. For the basis, we have T_0(u) = {u}, thus height(T_0(u)) = 0 and |T_0(u)| = 1, so (18) holds at time 0. If (18) holds at time t and the height of the tree does not increase in the next step, i.e. if height(T_{t+1}(u)) = height(T_t(u)), then (18) still holds at time t + 1, since |T_{t+1}(u)| >= |T_t(u)|.
Finally, if height(T_{t+1}(u)) > height(T_t(u)), then the instruction executed at time t must be a union instruction that merges a tree T_t(v) into T_t(u), making v a child of u in T_{t+1}(u). Then

    height(T_t(v)) = height(T_{t+1}(v)) = height(T_{t+1}(u)) - 1.

By the induction hypothesis,

    |T_t(v)| >= 2^{height(T_t(v))}.

Since we always merge smaller trees into larger,

    |T_t(u)| >= |T_t(v)|.

Therefore

    |T_{t+1}(u)| = |T_t(u)| + |T_t(v)|
                >= 2^{height(T_t(v))} + 2^{height(T_t(v))}
                = 2^{height(T_t(v)) + 1}
                = 2^{height(T_{t+1}(u))}.

Lemma 11.2 The maximum rank after executing σ is at most ⌊log n⌋ + 2.

Proof. By Lemma 11.1,

    n >= |T_m(u)| >= 2^{height(T_m(u))} = 2^{rank(u) - 2},

so

    ⌊log n⌋ >= rank(u) - 2.

Lemma 11.3

    |{u | rank(u) = r}| <= n / 2^{r-2}.

Proof. If rank(u) = rank(v) and u ≠ v, then by (17) neither of u, v can be a descendant of the other, so T_m(u) and T_m(v) are disjoint. Thus

    n >= |∪_{rank(u)=r} T_m(u)|
      = Σ_{rank(u)=r} |T_m(u)|
      >= Σ_{rank(u)=r} 2^{r-2}     (by Lemma 11.1)
      = |{u | rank(u) = r}| · 2^{r-2}.

Now consider the execution of σ with path compression. We will focus on the distance between u and parent(u) as measured by the difference in their ranks, and how this distance increases due to path compression. Recall that rank(u) is fixed and independent of time; however, rank(parent(u)) can

change with time, because the parent of u can change due to path compression. By (17), this value can only increase. Specifically, we consider the following conditions, one for each k:

    rank(parent(u)) >= A_k(rank(u)).     (19)

Define

    δ(u) = the greatest k for which (19) holds.

The value of δ(u) is time-dependent and can increase with time due to path compression. Note that δ(u) exists whenever u has a parent, since by (17),

    rank(parent(u)) >= rank(u) + 1 = A_0(rank(u))

at the very least. For n >= 5, the maximum value δ(u) can take on is α(n) - 1, since if δ(u) = k, then

    n > ⌊log n⌋ + 2 >= rank(parent(u))     (by Lemma 11.2)
      >= A_k(rank(u)) >= A_k(2),

therefore α(n) > k.

11.2 Analysis

Each union operation requires constant time, thus the time for all union instructions is O(m). Each instruction find(u) takes time proportional to the length of the path from u to v, where v is the root of the tree containing u. The path is traversed twice, once to find v and then once again to change all the parent pointers along the path to point to v. This amounts to constant time (say one time unit) per vertex along the path. We charge the time unit associated with such a vertex x as follows:

If x has an ancestor y on the path such that δ(y) = δ(x), then charge x's time unit to x itself.

If x has no such ancestor, then charge x's time unit to the find instruction.

Let us now tally separately the total number of time units apportioned to the vertices and to the find instructions, and show that in each case the total is O((m + n) α(n)).

There are at most α(n) time units charged to each find instruction, at most one for each of the α(n) possible values of δ, since for each such value k only the last vertex x on the path with δ(x) = k gets its time unit charged to the find instruction. Since there are at most m find instructions in all, the total time charged to find instructions is O(m α(n)).

Let us now count all the charges to a particular vertex x over the course of the entire computation. For such a charge occurring at time t, x must have an ancestor y on the path such that δ(y) = δ(x) = k for some k. Then at time t,

    rank(parent(x)) >= A_k(rank(x))
    rank(parent(y)) >= A_k(rank(y)).

Suppose that in fact

    rank(parent(x)) >= A_k^i(rank(x)),   i >= 1.

Let v be the last vertex on the path. Then at time t,

    rank(v) >= rank(parent(y))
            >= A_k(rank(y))
            >= A_k(rank(parent(x)))     (y is parent(x) or an ancestor of it, so rank(y) >= rank(parent(x)) by (17))
            >= A_k(A_k^i(rank(x)))
            = A_k^{i+1}(rank(x)),

and since v is the new parent of x at time t + 1, we have at time t + 1 that

    rank(parent(x)) >= A_k^{i+1}(rank(x)).

Thus at most rank(x) such charges can be made against x before

    rank(parent(x)) >= A_k^{rank(x)}(rank(x)) = A_{k+1}(rank(x)),

and at that point δ(x) >= k + 1. Thus after at most rank(x) such charges against x, δ(x) increases by at least one. Since δ(x) can increase only α(n) - 1 times, there can be at most rank(x) · α(n) such charges against x in all. By Lemma 11.3, there are at most

    (n / 2^{r-2}) · r · α(n) = n α(n) r / 2^{r-2}

charges against vertices of rank r. Summing over all values of r, and using Σ_{r>=0} r/2^r = 2, we obtain the following bound on all charges to all vertices:

    Σ_{r=0}^∞ n α(n) r / 2^{r-2} = n α(n) Σ_{r=0}^∞ r / 2^{r-2} = 8 n α(n).

We have shown:

Theorem 11.4 A sequence of m union and find operations starting with n singleton sets takes time at most O((m + n) α(n)).
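As a closing sanity check (my own illustration, not part of the lecture), a quick simulation of union by size without path compression confirms the consequence of Lemmas 11.1 and 11.2: a tree of at most n vertices has height at most log n, so 2^height <= n always holds.

```python
# Run random unions with union by size and no path compression,
# then verify 2^height <= n for the deepest vertex, as forced by
# Lemma 11.1 (every subtree satisfies |T(u)| >= 2^height(T(u))).

import random

def check(n, trials=100):
    random.seed(1)
    for _ in range(trials):
        parent = list(range(n))
        size = [1] * n
        def find(u):                  # plain find, no compression
            while parent[u] != u:
                u = parent[u]
            return u
        for _ in range(2 * n):        # random union attempts
            ru = find(random.randrange(n))
            rv = find(random.randrange(n))
            if ru == rv:
                continue
            if size[ru] > size[rv]:   # smaller tree under larger
                ru, rv = rv, ru
            parent[ru] = rv
            size[rv] += size[ru]
        def depth(u):                 # path length from u to its root
            d = 0
            while parent[u] != u:
                u, d = parent[u], d + 1
            return d
        height = max(depth(u) for u in range(n))
        assert 2 ** height <= n       # i.e. height <= log2(n)
    return True

print(check(64))  # True
```

The assertion is not probabilistic: whatever unions the random sequence performs, union by size guarantees the size bound of Lemma 11.1, so it can never fail.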