# Union-find with deletions


Haim Kaplan\*, Nira Shafrir†, Robert E. Tarjan‡

## Abstract

In the classical union-find problem we maintain a partition of a universe of $n$ elements into disjoint sets subject to the operations union and find. The operation union(A, B, C) replaces sets A and B in the partition by their union, given the name C. The operation find(x) returns the name of the set containing the element x. In this paper we revisit the union-find problem in a context where the underlying partitioned universe is not fixed. Specifically, we allow a delete(x) operation which removes the element x from the set containing it. We consider both worst-case performance and amortized performance. In both settings the challenge is to dynamically keep the size of the structure representing each set proportional to the number of elements in the set, which may now decrease as a result of deletions.

For any fixed $k$, we describe a data structure that supports find and delete in $O(\log_k n)$ worst-case time and union in $O(k)$ worst-case time. This matches the best possible worst-case bounds for find and union in the classical setting. Furthermore, using an incremental global rebuilding technique we obtain a reduction converting any union-find data structure to a union-find with deletions data structure. Our reduction is such that the time bounds for find and union change only by a constant factor. The time it takes to delete an element x is the same as the time it takes to find the set containing x plus the time it takes to unite a singleton set with this set.

In an amortized setting a classical data structure of Tarjan supports a sequence of $m$ finds and at most $n$ unions on a universe of $n$ elements in $O(n + m\,\alpha(m + n, n, \log n))$ time, where $\alpha(m, n, l) = \min\{k \mid A_k(\lfloor m/n \rfloor) > l\}$ and $A_k(j)$ is Ackermann's function as described in [6]. We refine the analysis of this data structure and show that in fact the cost of each find is proportional to the size of the corresponding set.
Specifically, we show that one can pay for a sequence of union and find operations by charging a constant to each participating element and $O(\alpha(m, n, \log(l)))$ for a find of an element in a set of size $l$. We also show how to keep these amortized costs for each find and each participating element while allowing deletions. The amortized cost of deleting an element from a set of $l$ elements is the same as the amortized cost of finding the element; namely, $O(\alpha(m, n, \log(l)))$.

\* School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. haimk@math.tau.ac.il

† School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. nira@math.tau.ac.il

‡ Department of Computer Science, Princeton University, Princeton, NJ 08544, and InterTrust Technologies Corporation, 4750 Patrick Henry Drive, Santa Clara. ret@cs.princeton.edu

## 1 Introduction

A union-find data structure allows the following operations on a collection of disjoint sets.

- make-set(x): Creates a set containing the single element x.
- union(A, B, C): Combines the sets A and B into a new set C, destroying sets A and B.
- find(x): Finds and returns (the name of) the set that contains x.

A data structure supporting these operations can be extended in a straightforward way to also support an insert(x, A) operation that inserts an item x, not yet in any set, into set A. We perform insert(x, A) by first performing B ← make-set(x) followed by union(A, B). The time it takes to perform insert is the time it takes to perform make-set plus the time it takes to unite a singleton set with another set.

In this paper we study the union-find with deletions problem, where in addition to the three operations above we allow a delete operation, defined as follows.

- delete(x): Deletes x from the set that contains it.

Note that the delete operation does not get the set containing x as a parameter.
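The interface above, including the reduction of insert to make-set followed by union, can be illustrated with a minimal sketch. It assumes a naive parent-pointer forest with no balancing at all; the class name `NaiveUnionFind` and its dictionary fields are hypothetical and are not taken from the paper.

```python
class NaiveUnionFind:
    """Named disjoint sets as parent-pointer trees (illustrative, unbalanced)."""

    def __init__(self):
        self.parent = {}        # element -> parent element; a root maps to itself
        self.name_at_root = {}  # root element -> name of its set
        self.root_of = {}       # set name -> root element

    def make_set(self, x, name=None):
        # Assumption: a set is named after its element unless a name is given.
        name = x if name is None else name
        self.parent[x] = x
        self.name_at_root[x] = name
        self.root_of[name] = x
        return name

    def find(self, x):
        # Follow parent pointers to the root and return the set name stored there.
        while self.parent[x] != x:
            x = self.parent[x]
        return self.name_at_root[x]

    def union(self, a, b, c):
        # Destroy sets a and b and create their union under the name c.
        ra, rb = self.root_of.pop(a), self.root_of.pop(b)
        del self.name_at_root[rb]
        self.parent[rb] = ra
        self.name_at_root[ra] = c
        self.root_of[c] = ra

    def insert(self, x, a):
        # insert(x, A) = make-set(x) followed by a union with the singleton.
        b = self.make_set(x, name=("singleton", x))
        self.union(a, b, a)


uf = NaiveUnionFind()
uf.make_set(1, "A")
uf.make_set(2, "B")
uf.union("A", "B", "C")
uf.insert(3, "C")
print(uf.find(3))
```

The sketch deliberately omits delete: removing a node from the middle of such a tree is exactly the difficulty the rest of the paper addresses.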
Classical analysis of union-find data structures assumes a fixed universe of $n$ items dynamically partitioned into a collection of disjoint sets. The starting point is a collection of $n$ sets each containing a single item. Some of these sets are subsequently combined by performing unions, but the underlying universe of items in all sets remains the original universe. In the union-find with deletions problem the underlying universe is a moving target. We can remove an item x from the universe by performing delete(x) and we can add an item x to a set S by doing an insert(x, S) operation.

We believe that the union-find with deletions problem is of general interest. A data structure that allows an efficient delete operation would be useful in any context where the partition that we maintain is not over a fixed set of items. Consider for example an application where a set is too large to fit into memory if deleted elements are counted, while without them it is much smaller and always fits into main memory.

Our starting point for studying this problem was the classical meldable heap data type implemented for example

by Fibonacci heaps [4]. A data structure implementing this data type maintains an item-disjoint¹ set of heaps subject to the following operations.

- make-heap: Return a new, empty heap.
- insert(i, h): Insert a new item i with predefined key into heap h.
- find-min(h): Return an item of minimum key in heap h. This operation does not change h.
- delete-min(h): Delete an item of minimum key from heap h and return it.
- meld(h1, h2): Return the heap formed by taking the union of the item-disjoint heaps h1 and h2. This operation destroys h1 and h2.
- decrease-key(Δ, i, h): Decrease the key of item i in heap h by subtracting the nonnegative real number Δ. This operation assumes that the position of i in h is known.
- delete(i, h): Delete item i from heap h. This operation assumes that the position of i in h is known.

Notice that the operations decrease-key and delete get both the target item and the heap containing it as parameters. Therefore in order to use this data structure we have to keep track of which heap contains an item while heaps undergo melds. One could do this using an external union-find data structure containing a set for each heap. When melding two heaps we unite the corresponding sets, and when we need to find out which heap contains an item x we perform find(x) and discover the corresponding set. Furthermore, when we delete an item from a heap we may want to be able to delete the item also from the corresponding set. This prevents the union-find operations from becoming too expensive because they act on large sets.

One simple way to add a delete operation to any union-find data structure is to do nothing during delete. Deleted items remain in the data structure mixed with live ones. The drawback of this simple scheme is that the size of the data structure does not remain proportional to the number of live items in it.
As a result the space requirements of the data structure may become prohibitive and the performance of find operations is degraded. For some applications, such as the incremental graph biconnectivity algorithm implemented in [5], this straightforward implementation of delete suffices. The incremental graph biconnectivity algorithm keeps track of nodes containing deleted items and reuses them to store new items that are added to the set. As a result the total number of elements does not become too large.

In this paper we suggest some general techniques and data structures, not specific to a particular application, to overcome the difficulties that arise when deletions are allowed. We consider both worst-case and amortized performance.

In a worst-case setting, a classical result of Smid [7] (building upon a previous result of Blum [2]) gives for any fixed $k$ a data structure that supports union in $O(k)$ time and find in $O(\log_k n)$ time, where $n$ is the size of the corresponding set. Our first result is a simple data structure with the same performance for find and union as the data structure of Smid, that also supports delete in $O(\log_k n)$ time, where $n$ is the size of the set containing the element that we delete. Like Smid and Blum, we use k-ary balanced trees as our data structures. The essence of our result is a technique to keep such trees balanced when we perform deletions, while paying only $O(\log_k n)$ time per deletion. Note that with respect to find and union these time bounds are known to be optimal in the cell probe model [3]. (See also [1].) We believe that our technique for incrementally shrinking a k-ary tree that undergoes deletions, while keeping it balanced, may be useful in other contexts.

Still with respect to worst-case performance, we develop a general technique to add a delete operation to a union-find data structure that does not support delete to begin with.
We use an incremental rebuilding technique, where each set is gradually being rebuilt without the deleted items in it. Let D be a union-find data structure that supports find, union, and insert in $O(t_f(n))$, $O(t_u(n))$, and $O(t_i(n))$ time, respectively, where $n$ is the maximum size of a set involved in the operation. Then by applying our transformation to D we obtain a data structure that supports delete from a set of size $n$ in $O(t_f(n) + t_i(n))$ time, without hurting the time bounds of the other operations. By applying this reduction to the structure of Smid, we obtain an alternative to the data structure we obtained by directly modifying Smid's structure. This alternative also supports union in $O(k)$ time and find and delete in $O(\log_k n)$ time². Our direct approach, however, consumes less space than the data structure obtained via this reduction. We also point out that the $O(t_f(n))$ component in the time bound of delete stems from the need to find the corresponding set in order to delete from it. We can obtain a faster implementation of delete by this reduction if we assume that delete gets not only a pointer to the deleted element but also a pointer to the set containing it.

¹ Items may have the same key.

² Smid's data structure supports insert in constant time.

In an amortized setting Tarjan [8] and Tarjan and Van Leeuwen [9] showed that a classical data structure supports a sequence of $m$ finds and at most $n$ unions on a universe of $n$ elements in $O(n + m\,\alpha(m + n, n, \log n))$ time, where $\alpha(m, n, l) = \min\{k \mid A_k(\lfloor m/n \rfloor) > l\}$ and

$A_k(j)$ is Ackermann's function as defined in [6]. This data structure uses a tree to represent each set and its efficiency is due to two simple heuristics. The first heuristic, called union by rank, makes the root of the set of smaller rank a child of the root of the set of larger rank to carry out a union³. The second heuristic, called path compression, makes all nodes on a find-path children of the root.

We refine the analysis of this data structure and show that in fact the cost of each find is proportional to the size of the corresponding set. Specifically, we show that one can pay for a sequence of union and find operations by charging a constant to each participating element and $O(\alpha(m, n, \log(l)))$ for a find of an element in a set of size $l$. Here $m$ is the total number of finds and $n$ is the total number of elements participating in the sequence of operations.

This refined analysis raises the question of whether we can keep this time bound while adding a delete operation. In such a context we would like the charge per find to be $O(\alpha(m, n, \log(l)))$ where $l$ is the actual size of the corresponding set when we perform the find. We show that indeed this is possible. By marking items as deleted and rebuilding each set when the number of non-deleted items in it drops by a factor of 2 we can also support delete in $O(\alpha(m, n, \log(l)))$ amortized time, where $l$ is the size of the set containing the deleted item. This time bound for delete stems from the need to discover the set from which we delete in order to check whether we need to rebuild it.

A possible drawback of set rebuilding is the bad worst-case time bound for delete (proportional to the size of the corresponding set). By applying the incremental set rebuilding technique of Section 3 to the union by rank with path compression data structure, we can preserve the amortized time bounds mentioned above while keeping the worst-case time bound of delete logarithmic.
The space requirements, however, will increase by a constant factor.

The structure of this paper is as follows. Section 2 describes a union-find with deletions data structure that builds upon the structure of Smid. In this section we develop a technique to keep a k-ary tree balanced while it undergoes deletions. Section 3 shows how to add a delete operation to any union-find data structure using incremental rebuilding. Section 4 refines the analysis of Kozen [6] for the compressed tree data structure, to show that the amortized time per find is proportional to the size of the set in which the find is performed. In Section 5 we show how to maintain these refined time bounds while allowing deletions. We conclude in Section 6 where we introduce some open questions. A simple presentation of Smid's data structure is provided in the Appendix.

³ An alternative heuristic in which we make the union by size has similar performance.

## 2 Union-find with deletions via k-ary trees

In this section we extend the data structure of Smid [7] to support delete in $O(\log n / \log k)$ time. (A simple presentation of this data structure is provided in the Appendix.) We store each set in a tree such that the elements of the set reside at the leaves of the tree. We maintain the trees such that all leaves are at the same distance from the root. Each internal node v is classified as either short, gaining, or losing. Node v is short if $h(p(v)) > h(v) + 1$, where $p(v)$ is the parent of v and $h(v)$ is the distance from v to a leaf. If v is not short then it is either gaining or losing. Each node v such that $h(v) > 1$ has exactly one gaining child. For every internal node v we denote by $g(v)$ its unique gaining sibling.

We represent the trees such that each node v points to its parent (except when v is the root), to its gaining child, to a list containing its short children and to a list containing its losing children.
In addition each node is marked as short, losing, or gaining, and has a counter field that stores the number of its non-short children. If v is a root then it also stores the height of the tree and the name of the corresponding set. Our forest satisfies the following invariant.

INVARIANT 2.1. A root of a tree does not have short children.

In addition, if we cut subtrees rooted at short nodes from the trees of our union-find forest then the resulting set of trees will always satisfy the following invariants.

INVARIANT 2.2.

1) A root of height $h > 1$ has at least two children. A root of height $h = 1$ has at least one leaf child.
2) A gaining node of height $h$ has at least $k$ children.
3) A losing node of height $h > 0$ has at least two children.

Each node v such that $h(v) > 1$ has a gaining child and therefore by Invariant 2.2(2) v has at least $k$ grandchildren of height $h(v) - 2$. This implies the following lemma.

LEMMA 2.1. The height of a tree representing a set with $n$ elements is $O(\log n / \log k)$.

### 2.1 Operations

We will show how to implement find, union, and delete without violating any of the invariants.

Find(x): We follow parent pointers until we get to the root; we return the name of the set stored at the root.
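The node representation and the find operation described above can be sketched as follows. This is only an illustration of the stated fields, assuming Python dataclasses; the field names (`kind`, `non_short_count`, `set_name`, and so on) are hypothetical, and the sketch does not attempt to maintain Invariants 2.1 and 2.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    kind: Optional[str] = None                 # "short", "gaining" or "losing"
    parent: Optional["Node"] = None            # None exactly when the node is a root
    gaining_child: Optional["Node"] = None     # the unique gaining child, if any
    short_children: List["Node"] = field(default_factory=list)
    losing_children: List["Node"] = field(default_factory=list)
    non_short_count: int = 0                   # number of non-short children
    height: Optional[int] = None               # stored at roots only
    set_name: Optional[str] = None             # stored at roots only


def find(x: Node) -> Optional[str]:
    """Follow parent pointers to the root and return the set name stored there."""
    v = x
    while v.parent is not None:
        v = v.parent
    return v.set_name


# A tiny hand-built tree of height 1: a root with two leaf children.
root = Node(height=1, set_name="S")
leaf_a = Node(kind="losing", parent=root)
leaf_b = Node(kind="losing", parent=root)
root.losing_children = [leaf_a, leaf_b]
root.non_short_count = 2
print(find(leaf_a))
```

Since find only walks parent pointers, its running time is bounded by the height of the tree, which is where Lemma 2.1 enters.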

deleted by delete′ and their gaining siblings have short children we perform delete′ again, this time carrying out the updates. So assume that we discover a node v that has short children. We follow a path from v to a leaf x′. We replace x and x′. Then we delete x as described above in the case where x has a short ancestor. The following lemma proves that our implementation is correct.

LEMMA 2.2. A sequence of union, find, and delete operations on a forest that initially satisfies Invariants 2.1 and 2.2 results in a forest that satisfies Invariants 2.1 and 2.2.

Proof. It is easy to see that union produces a tree that satisfies Invariants 2.1 and 2.2. Next we consider a delete operation. We assume that the nodes we refer to in the remainder of the proof don't have short ancestors. When the root loses its next to last child, say w, and w is not a leaf, then $g(w)$ becomes the new root. Since the root may lose a child only if $g(w)$ does not have short children, we obtain that Invariant 2.1 holds throughout the sequence. Since $g(w)$ has at least $k$ children when it becomes a root, it follows that Invariant 2.2(1) also holds throughout the sequence.

Let v be a gaining node. Node v becomes gaining when it becomes a child of another node when performing union. At that point v has at least $k$ children of height $h(v) - 1$. As long as v is gaining, delete may not delete a child of v of height $h(v) - 1$; delete may only replace a losing child of v with another losing child of a losing sibling. Therefore the number of children of height $h(v) - 1$ of a gaining node remains at least $k$ as long as the node is gaining, so Invariant 2.2(2) holds. Since delete′ recursively deletes every losing node that has less than 2 children, we obtain that Invariant 2.2(3) holds throughout the sequence.

When we create a new node v and $h(v) > 1$ we assign a gaining child to it. The algorithm does not move or delete this child unless v is deleted too.
Therefore every node v such that $h(v) > 1$ always has a gaining child.

It is easy to see that each application of delete′ takes $O(1)$ time. Therefore the running time of delete is no greater than the height of the tree representing the corresponding set. For a set with $n$ elements Lemma 2.1 shows that this height is $O(\log n / \log k)$.

## 3 Union-find with deletions using incremental copying

All the proposed algorithms for union-find represent a set by a tree and handle finds by following the path of ancestors to the root. The time bound for find and union can be stated in terms of the sizes of the sets involved, not in terms of the total universe size. In this section we show how to add a delete operation to any such union-find algorithm by using incremental copying. By applying this technique to the union-find data structure of Smid (described in Appendix A) we obtain a data structure with performance similar to the data structure of Section 2. The advantage of the data structure of Section 2 over the one we obtain here is its smaller (by a constant factor) space requirements.

Given a union-find data structure which supports find(x) in $O(t_f(n))$ worst-case time, where $n$ is the size of the set containing x, and insert (make-set + union in which one of the sets is a singleton set) in $O(t_i(n))$ worst-case time, where $n$ is the size of the set to which we insert the new item, we will augment this data structure to support delete(x) in $O(t_f(n) + t_i(n))$ worst-case time, where $n$ is the size of the set containing x, while keeping the worst-case time bounds for union and find the same as in the original data structure. In particular, if we apply this technique to Smid's data structure we get a union-find data structure that supports delete in $O(\log n / \log k)$ worst-case time, since Smid's structure supports insert in $O(1)$ time.

We represent each set S by one or two sets, each in a different union-find data structure without deletions.
We denote the first such set by $S_n$ and the second set, if it exists, by $S_o$. If x is an item in S then x is represented by a node either in $S_n$ or in $S_o$. Item x points to the node representing it. In case x is represented by a node in $S_n$, there may also be a node associated with x in $S_o$ that is not being used any more.

At the beginning we represent S only by $S_n$, and $S_o$ is empty. When we perform a delete operation on S we mark the item as deleted and increment the number of deleted items in $S_n$ by one. We perform union of S and S′ by uniting $S_n$ with $S'_n$ and $S_o$ with $S'_o$. When at least 1/4 of the items in $S_n$ are marked deleted we rename $S_n$ to be $S_o$ and start a new set $S_n$. Each time we delete an element from S and both $S_n$ and $S_o$ exist, we mark the item as deleted in the set that contains it and insert four items that are not marked deleted from $S_o$ into $S_n$. We maintain $S_o$ to contain at least four undeleted items that are not contained in $S_n$. If after renaming $S_n$ into $S_o$, or after we insert four undeleted items from $S_o$ to $S_n$, $S_o$ contains less than four items that are not in $S_n$, we insert these remaining items into $S_n$ and discard $S_o$. When an item x from $S_o$ is inserted into $S_n$ we consider the node corresponding to x in $S_n$ as representing x, and make x point to it. When there are no more undeleted items in $S_o$ we discard it and represent S by $S_n$ only. To establish the correctness of this algorithm we will show that $S_o$ is empty when at least 1/4 of the items in $S_n$ are marked deleted.
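The counting behind this scheme can be tried out with a small simulation. The sketch below tracks only the bookkeeping for a single set (which live items sit in $S_n$, which live items of $S_o$ are still waiting to be copied, and the two counters of $S_n$) and does not contain an actual union-find structure. The class `DeletableSet` and its field names are hypothetical. It also makes one assumption about ordering: when $S_o$ exists, four items are copied and no renaming is attempted, so the 1/4 threshold is tested only when $S_o$ is empty.

```python
import random


class DeletableSet:
    """Counters for one set S represented by S_n (new) and S_o (old)."""

    def __init__(self, items):
        self.in_new = set(items)   # live items represented by a node in S_n
        self.pending = set()       # live items of S_o not yet copied into S_n
        self.new_total = len(self.in_new)  # nodes in S_n, deleted ones included
        self.new_deleted = 0       # nodes in S_n marked deleted

    def _move(self, x):
        # Stands for one make-set plus one union with S_n in the real structure.
        self.in_new.add(x)
        self.new_total += 1

    def _copy_four(self):
        for _ in range(min(4, len(self.pending))):
            self._move(self.pending.pop())
        if len(self.pending) < 4:  # fewer than four left: copy them and drop S_o
            while self.pending:
                self._move(self.pending.pop())

    def delete(self, x):
        if x in self.pending:      # x is still represented by a node in S_o
            self.pending.remove(x)
            self._copy_four()
            return
        self.in_new.remove(x)      # x is represented by a node in S_n
        self.new_deleted += 1
        if self.pending:
            self._copy_four()
        elif 4 * self.new_deleted >= self.new_total:
            # Rename S_n to S_o and start a fresh, empty S_n.
            self.pending, self.in_new = self.in_new, set()
            self.new_total = self.new_deleted = 0
            self._copy_four()


random.seed(0)
s = DeletableSet(range(200))
for x in random.sample(range(200), 150):
    s.delete(x)
    assert 4 * s.new_deleted <= s.new_total
print(len(s.in_new) + len(s.pending))
```

Each delete performs at most a constant number of copy steps beyond the initial lookup, except for the final clean-up of fewer than four leftover items, which is also constant.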

In order to implement this algorithm, with each set $S_n$ and $S_o$ we maintain a list of nodes, denoted by $L(S_n)$ and $L(S_o)$, respectively. The list $L(S_n)$ contains a node for each undeleted item in $S_n$. The node which corresponds to x in $S_n$ has a pointer to the node associated with x in $L(S_n)$. The list $L(S_o)$ contains a node for each undeleted item in $S_o$ that has not yet been inserted into $S_n$. The node which corresponds to x in $S_o$ points to the node corresponding to x in $L(S_o)$. The node corresponding to x in $L(S_n)$ or $L(S_o)$ has a pointer to x. We also maintain the total number of items in $S_n$, and the number of items marked deleted in $S_n$. We assume that from the node identifying S we can get to the nodes identifying $S_n$ and $S_o$ and vice versa. We also assume that a find in $S_n$ or $S_o$ returns the node identifying $S_n$ or $S_o$ respectively, from which we can easily get to the node identifying S. Next we describe the implementations of the operations using this representation.

Union(A, B, C): We perform union($A_n$, $B_n$, $C_n$) and union($A_o$, $B_o$, $C_o$). We also concatenate $L(A_n)$ with $L(B_n)$ to form $L(C_n)$, and $L(A_o)$ with $L(B_o)$ to form $L(C_o)$. We set the number of items in $C_n$ to be the sum of the number of items in $A_n$ and the number of items in $B_n$. We similarly set the counter of the number of deleted items in $C_n$. We make the node identifying C point to the nodes identifying $C_n$ and $C_o$ and vice versa.

Find(x): We perform find using the node identifying x in a union-find data structure without deletions. We get to a node identifying either $S_n$ or $S_o$ for some set S. From that node we get to the node identifying S and return it.

Delete(x): We first perform find(x) on the node representing x in a union-find data structure without deletions as described above. We get the node identifying the set S containing x, and a node identifying the set among $S_n$ and $S_o$ containing x. We denote this set by $S_x$. We delete the node which corresponds to x in $L(S_x)$.
If $S_x = S_n$ we also increment the number of deleted items in $S_n$. If the number of deleted items in $S_n$ after the increment reaches 1/4 of the total number of items in $S_n$ then we rename $S_n$ to $S_o$ and set $S_n$ to be empty. Next, if $L(S_o)$ is not empty, we remove four nodes (or fewer if there are fewer than four such items in $L(S_o)$) from $L(S_o)$ and insert them into $S_n$ (performing 4 make-set operations and 4 union operations of each of these new sets and $S_n$). We also insert four nodes corresponding to these 4 items into $L(S_n)$. If after removing these items from $L(S_o)$, $L(S_o)$ contains fewer than four items, we insert those items into $S_n$, insert corresponding nodes into $L(S_n)$, remove them from $L(S_o)$, and discard $S_o$.

To establish the correctness of this algorithm we will show that the fraction of deleted items in $S_n$ is never greater than 1/4. Furthermore, when the fraction of deleted items in $S_n$ reaches 1/4, $S_o$ must be empty.

LEMMA 3.1. For every set S, at most 1/4 of the items in $S_n$ are deleted. When exactly 1/4 of the items in $S_n$ are deleted, $S_o$ is empty, and we rename $S_n$ to $S_o$.

Proof. The proof is by induction on the sequence of operations. Assume the claim holds before the $i$th operation. If the $i$th operation is a find then the claim clearly holds after the $i$th operation. If the $i$th operation is a union(A, B, C) then since less than 1/4 of $A_n$ is deleted and less than 1/4 of $B_n$ is deleted, also less than 1/4 of $C_n$ is deleted. Therefore the claim also holds after the $i$th operation. Assume that the $i$th operation is delete(x). Let S be the set containing x. If $x \in S_o$ and $x \notin S_n$ then the claim clearly holds after the deletion. If $x \in S_n$ and $S_o \neq \emptyset$ then the number of deleted items in $S_n$ before the delete was less than $(1/4)|S_n|$. After the delete the number of deleted items increases by one but we also insert at least four new items from $S_o$. Therefore the fraction of deleted items in $S_n$ is less than $((1/4)|S_n| + 1)/(|S_n| + 4) = 1/4$.
It follows that the fraction of deleted items in $S_n$ can reach 1/4 only when $S_o$ is empty. If indeed $S_o$ is empty and the fraction of deleted items in $S_n$ reaches 1/4 we rename $S_n$ to $S_o$ and then insert items into a newly created $S_n$, so the fraction of deleted items in $S_n$ is 0 after the delete.

The running time of the operations is dominated by the time it takes to manipulate the underlying union-find data structure without deletions. To analyze the running time of the operations on the underlying union-find data structure we bound the fraction of deleted items in any one of these sets. Lemma 3.1 proved that for every S the fraction of deleted items in $S_n$ is at most 1/4. We now show that a similar claim is also true for $S_o$. Specifically we show that at most 1/2 of the items in $S_o$ are deleted. Here, when we count the number of items in $S_o$ we do consider elements that were moved to $S_n$ but whose corresponding nodes still belong to $S_o$. Furthermore, an element in $S_o$ that was moved and subsequently deleted is counted as one of the deleted elements in $S_o$.

When for some S we rename $S_n$ to $S_o$, the fraction of deleted items in $S_o$ is 1/4. Subsequently we may delete more items in $S_o$, but the following lemma shows that by the time the fraction of deleted items in $S_o$ reaches 1/2 we have inserted all undeleted items in $S_o$ into $S_n$ and discarded $S_o$.

LEMMA 3.2. Let $c$ be the fraction of deleted items in $S_o$. (Recall that $c$ is the ratio between the number of

deleted items in $S_o$ and the total number of items in $S_o$ including ones that have been inserted into $S_n$.) The number of undeleted items from $S_o$ that have already been inserted into $S_n$ is at least $(4c - 1)|S_o|$.

Proof. The proof is by induction on the number of operations. We assume that the claim holds before the $i$th operation. The claim clearly holds after the $i$th operation if the $i$th operation is a find. Assume the $i$th operation is delete(x). If $S_o$ is empty after the delete, then the fraction of deleted items in it is defined as 0 and the claim holds. Otherwise let $c$ be the fraction of deleted items in $S_o$ before the $i$th operation and let $c'$ be the fraction of deleted items in $S_o$ after the $i$th operation. Clearly $c' \le c + 1/|S_o|$. Since we copied 4 undeleted items from $S_o$ to $S_n$, the number of items already copied from $S_o$ to $S_n$ after the delete is at least

$$(4c - 1)|S_o| + 4 = \bigl(4(c + 1/|S_o|) - 1\bigr)|S_o| \ge (4c' - 1)|S_o|,$$

so the claim holds after the delete.

Assume the $i$th operation is union(A, B, C). Let $a$ be the fraction of deleted items in $A_o$, and $b$ be the fraction of deleted items in $B_o$. Clearly, $|C_o| = |A_o| + |B_o|$. The fraction of deleted items in $C_o$ is

$$c = \frac{a|A_o| + b|B_o|}{|C_o|}.$$

The number of items already copied from $C_o$ to $C_n$ is at least

$$(4a - 1)|A_o| + (4b - 1)|B_o| = 4(a|A_o| + b|B_o|) - (|A_o| + |B_o|) = 4(a|A_o| + b|B_o|) - |C_o| = \left(\frac{4(a|A_o| + b|B_o|)}{|C_o|} - 1\right)|C_o| = (4c - 1)|C_o|,$$

so the claim holds after the union as well.

As a corollary we get that the fraction of deleted items in $S_o$ is between 1/4 and 1/2.

LEMMA 3.3. Let $c$ be the fraction of items marked as deleted in $S_o$. Then $1/4 \le c < 1/2$.

Proof. When $S_o$ is created (by renaming $S_n$), 1/4 of its items are deleted. Let $c$ be the fraction of deleted items in $S_o$. By Lemma 3.2 the number of items already copied from $S_o$ to $S_n$ is at least $(4c - 1)|S_o|$. If $c = 1/2$ then we have already copied all undeleted items from $S_o$ to $S_n$ and discarded $S_o$. Therefore $c < 1/2$.
Lemma 3.3 and Lemma 3.1 show that the number of items in $S_n$ and $S_o$ is within a constant factor of the number of undeleted items in S. Therefore the time it takes to do union and find in our union-find data structure with deletions is proportional to the time it takes to do union and find, respectively, in the underlying union-find data structure without deletions. We perform a delete operation by a find followed by a constant number of insert operations. Therefore the time for delete is proportional to the time it takes to do a find plus the time it takes to do insert on the underlying union-find data structure without deletions.

Assume that we start with a union-find data structure without deletions in which union of sets of size at most $n$ takes $O(t_u(n))$ time, find of an item in a set of size $n$ takes $O(t_f(n))$ time, and insert into a set of size $n$ takes $O(t_i(n))$ time. We obtain a union-find with deletions data structure in which union takes $O(t_u(n))$ time, find of an item in a set of size $n$ takes $O(t_f(n))$ time, and delete of an item from a set of size $n$ takes $O(t_f(n) + t_i(n))$ time.

## 4 Union-find via path compression and linking by rank or size - revisited

In this section and the following one $n$ will denote the total number of elements involved in the sequence of operations, and $m$ will denote the total number of finds.

We represent each set by a rooted tree where each node points to its parent. We denote the parent of a node x by $p(x)$. Each node represents an element, and the root of a tree also represents the corresponding set. Each node has a rank associated with it. We maintain in the data structure the ranks of the roots. Ranks of other nodes are static and need not be maintained in the data structure. The rank of a set is defined to be the rank of the root of the tree representing the set.

We perform find(x) by following parent pointers starting from x until we get to the root. We return the root.
We also use a heuristic called path compression, where we make all nodes on the find path from x to the root children of the root. We perform make-set(x) by creating a new tree with x as its root. We set the rank of x to be 0. We perform union(A, B, C) by making the root with smaller rank a child of the root with higher rank. In case rank(A) = rank(B) we arbitrarily choose one of the roots and make it a child of the other. If rank(A) = rank(B) we also increment the rank of the node that becomes the root of the new set C.

It is clear from the definitions of the operations that only the rank of a root node can increase by performing a union. Therefore, the rank of a node does not change once it stops being a root of a tree. It is easy to prove by induction that the number of nodes in the subtree rooted at a node of rank $r$ is at least $2^r$ (see [8]).

To analyze the algorithm we use the following definition of Ackermann's function:

$$A_0(x) = x + 1 \quad \text{for } x \ge 1, \qquad A_{k+1}(x) = A_k^{(x+1)}(x) \quad \text{for } x \ge 1,$$

where $A_k^{(0)}(x) = x$ and $A_k^{(i+1)}(x) = A_k(A_k^{(i)}(x))$.
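The operations of this section (make-set, find with path compression, and union by rank) can be sketched as follows. This is the standard textbook structure rather than code from the paper; the class name `RankedUnionFind` is hypothetical, and for brevity union takes the two roots directly instead of set names.

```python
class RankedUnionFind:
    """Union by rank with path compression on parent-pointer trees."""

    def __init__(self):
        self.parent = {}  # node -> parent; a root is its own parent
        self.rank = {}    # node -> rank (only meaningful while the node is a root)

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): make every node on the path
        # a child of the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        # a and b are assumed to be distinct roots.
        if self.rank[a] < self.rank[b]:
            a, b = b, a
        self.parent[b] = a              # smaller rank goes below larger rank
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1           # ties increase the rank of the new root
        return a


uf = RankedUnionFind()
for i in range(8):
    uf.make_set(i)
r1 = uf.union(uf.union(0, 1), uf.union(2, 3))
r2 = uf.union(uf.union(4, 5), uf.union(6, 7))
root = uf.union(r1, r2)
print(root, uf.rank[root], uf.find(7) == root)
```

In this run eight elements produce a root of rank 3, in line with the bound that a root of rank $r$ has at least $2^r$ nodes below it.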

We also define the following three-parameter inverse to Ackermann's function:

$$\alpha(m, n, l) = \min\{k \mid A_k(\lfloor m/n \rfloor) > l\}.$$

We define the cost of each union or make-set operation to be one, and the cost of find(x) to be equal to the number of vertices on the path from x to the root of the set containing it at the time of the find (inclusive). Thus, the actual cost of a sequence of operations is proportional to the number of union and make-set operations, which is $O(n)$, plus the sum of the lengths of all find paths. We show that if we charge a constant to each participating element and charge $O(\alpha(m + n, n, r))$ to a find on a set of rank $r$ then the sum of the charges suffices to pay for the sequence.

For a non-root node x with rank $r(x)$, we define the level of x, $k(x) = \max\{k \mid A_k(r(x)) \le r(p(x))\}$. We also define the index of x, $i(x)$, to be the largest $i$ for which $r(p(x)) \ge A_{k(x)}^{(i)}(r(x))$. By the definition of Ackermann's function and the level function we have that $0 < i(x) \le r(x)$.

The threshold of a find operation $f$ is $t(f) = \alpha(m + n, n, r)$ where $r$ is the rank of the corresponding set. We define $S_i^f$, for $0 \le i < t(f)$, to be the set of all nodes on the find path whose level is $i$. We also define $S_{t(f)}^f$ to be the set of all nodes on the find path except the last whose level is at least $t(f)$.⁴ We denote by $L_f$ the set containing the last node on the find path in $S_i^f$, for every $0 \le i \le t(f)$, and all nodes on the find path whose rank is at most $t(f)$. We denote by $N_i^f$ the set of nodes $v \in S_i^f - L_f$, for every $0 \le i \le t(f)$. Clearly the set $L_f$ and the sets $N_i^f$, $0 \le i \le t(f)$, partition the set of all the nodes on the find path except the root.

The find path contains at most one last node from each $S_i^f$, and at most one node of each rank smaller than $t(f)$. Therefore the number of nodes in $L_f$ is at most $2t(f) = 2\alpha(m + n, n, r)$, where $r$ is the rank of the corresponding set. Next we count the total number of nodes in sets $N_i^f$, for $0 \le i \le \alpha(m + n, n, n)$, and for all find operations.
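For small arguments the three-parameter inverse can be evaluated directly from its definition. The sketch below assumes the reading $A_0(x) = x + 1$, $A_{k+1}(x) = A_k^{(x+1)}(x)$ and $\alpha(m, n, l) = \min\{k \mid A_k(\lfloor m/n \rfloor) > l\}$; the function names and the `cap` parameter are hypothetical. Because $A_k$ grows extremely fast, values are truncated as soon as they exceed `cap`.

```python
def ackermann(k, x, cap):
    """Return A_k(x) if it is at most cap, and cap + 1 otherwise."""
    if k == 0:
        return min(x + 1, cap + 1)
    y = x
    for _ in range(x + 1):          # apply A_{k-1} exactly x + 1 times
        y = ackermann(k - 1, y, cap)
        if y > cap:                 # A_{k-1}(z) > z, so the result stays above cap
            return cap + 1
    return y


def alpha(m, n, l):
    """Smallest k with A_k(floor(m / n)) > l; assumes m >= n >= 1."""
    x = m // n
    if x < 1:
        raise ValueError("alpha is evaluated here only for m >= n")
    k = 0
    while ackermann(k, x, l) <= l:
        k += 1
    return k


print([ackermann(k, 1, 10**6) for k in range(4)])
print(alpha(10, 10, 7), alpha(10, 10, 2047))
```

The guard `m >= n` reflects how the function is used in this section, where the first argument is always $m + n$ and so $\lfloor (m+n)/n \rfloor \ge 1$.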
We will count separately the nodes in the sets $N_{t(f)}^f$ and the nodes in the sets $N_i^f$ for $i < t(f)$. To count the total number of nodes in the sets $N_i^f$ for $i < t(f)$ we repartition them into multisets $M_t$, $1 \le t \le \alpha(m + n, n, n)$, defined as follows. The multiset $M_t$ contains all nodes that occur in sets $N_i^f$ where $t(f) = t$ and $i < t$.

We observe that, for a fixed level $i$, a node $x$ cannot occur more than $r(x)$ times in sets $N_i^f$ over finds $f$ with $i < t(f)$. This is because each time $x$ occurs in a set $N_i^f$ its level must be $i$ and its index is incremented. After $r(x)$ such increments the level of the node increases to $i + 1$, and therefore it can no longer belong to a set $N_i^f$ with $i < t(f)$.

We have at most $n/2^r$ nodes of rank $r$, each occurring in $M_t$ at most $r$ times at a fixed level by the observation above. Therefore for every level $i < t$ there are at most $\sum_{r=t+1}^{\infty} \frac{n r}{2^r}$ nodes in $M_t$ of level $i$. Summing over the $t$ levels $0 \le i < t$ we find that $|M_t| \le t \sum_{r=t+1}^{\infty} \frac{n r}{2^r}$. By summing up the sizes of $M_t$ for all values of $t$, $1 \le t \le \alpha(m + n, n, n)$, we find that

$\sum_{t=1}^{\alpha(m+n,n,n)} |M_t| \le \sum_{t=1}^{\alpha(m+n,n,n)} \frac{t\,n}{2^t} \sum_{r=1}^{\infty} \frac{r + t}{2^r} = O(n).$

Last we count the number of nodes in the sets $N_{t(f)}^f$ for all find operations. Suppose $x \in N_{t(f)}^f$ for some find operation $f$. We will show that $r(p(x)) < \lfloor (m+n)/n \rfloor$. From the definition of $N_{t(f)}^f$, it follows that $k(x) \ge t(f)$, and $x$ is followed by another node $y$ with $k(y) \ge t(f)$. Clearly, $r(y) \ge r(p(x))$. Let $r$ be the rank of the set containing this find path. By the definition of $t(f)$ and $k(y)$ we have that $A_{t(f)}(\lfloor (m+n)/n \rfloor) > r \ge r(p(y)) \ge A_{k(y)}(r(y))$. Since $A_k(w)$ is increasing both in $k$ and in $w$, it must be the case that $r(y) < \lfloor (m+n)/n \rfloor$, and therefore $r(p(x)) < \lfloor (m+n)/n \rfloor$. Since $r(p(x))$ increases by at least one following each find where $x \in N_{t(f)}^f$, $x$ cannot be in a set $N_{t(f)}^f$ more than $\lfloor (m+n)/n \rfloor$ times. Summing over all nodes we obtain that the number of nodes in the sets $N_{t(f)}^f$ for all find operations is at most $m + n$.

Footnote 4: Note that the root node has no level associated with it and therefore does not belong to any of the sets $S_i^f$.
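The $O(n)$ bound on the total size of the multisets $M_t$ used in the argument above can be checked explicitly with standard series identities. The constant 10 obtained below is only an illustration of why the sum is linear; it is not a figure stated in the paper.

```latex
\[
\sum_{r=1}^{\infty} \frac{r+t}{2^{r}}
  = \sum_{r=1}^{\infty} \frac{r}{2^{r}} + t \sum_{r=1}^{\infty} \frac{1}{2^{r}}
  = 2 + t ,
\]
\[
\sum_{t=1}^{\alpha(m+n,n,n)} \frac{t\,n}{2^{t}}\,(t+2)
  \;\le\; n \sum_{t=1}^{\infty} \frac{t^{2} + 2t}{2^{t}}
  = n\,(6 + 4) = 10\,n .
\]
```

Here we used $\sum_{t \ge 1} t/2^t = 2$ and $\sum_{t \ge 1} t^2/2^t = 6$.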
Thus we have proved the following theorem.

THEOREM 4.1. A sequence of $m$ finds mixed with at most $n$ unions on sets containing $n$ elements takes $O\left(n + \sum_{i=1}^{m} \alpha(m + n, n, \log n_i)\right)$ time, where $n_i$ is the number of nodes in the set returned by the $i$-th find.

5 Handling deletions by set rebuilding

To add deletions to the data structure described in the previous section while keeping the same time bounds for union and find, we associate two counters with each set. A pointer to these counters is stored at the root of the tree corresponding to each set. The first counter counts the total number of elements in the set, and the second counts the number of elements that have been deleted but are still represented by a node in the corresponding tree. We also keep a mark bit with each node in each set. This mark bit is set in every node that corresponds to a deleted item.

We perform find as before. We also perform union as before, and in addition we set each of the two counters of the resulting set to the sum of the values of the corresponding counters in the original sets.

We perform delete(x) as follows. First we mark the node corresponding to $x$ as deleted. Then we perform find(x) to discover the corresponding set $S$. We increment the counter that counts the number of deleted items in $S$. Then, if the number of deleted items in $S$ is at least $\lfloor |S|/2 \rfloor$, we rebuild $S$. To rebuild $S$, we pick one of its live nodes to be the root and set its rank to 1. We make all other live elements children of the root. We update the counters to show that the number of deleted items is zero and the total number of elements is $|S| - \lfloor |S|/2 \rfloor$.

The analysis of this data structure is similar to the analysis of the data structure in Section 4. Here, however, we can no longer assume that the rank of a node never decreases, or that the rank of the parent of a node never decreases, since both can happen when we rebuild sets. To overcome this difficulty we think of the elements in the set after rebuilding as new elements. Thinking of the elements this way at most doubles the number of elements involved in the sequence, because we can associate the elements in the set after rebuilding with the deleted elements in the set before the rebuilding. This way each real element is associated with at most one artificial element resulting from rebuilding. Clearly the total cost of rebuilding is $O(n)$, since we can charge the cost of each rebuilding to the deleted items, so that each item gets only a constant charge. The dominating factor in the amortized cost of delete is the need to find the corresponding set by doing a find. In summary, we have obtained the following generalization of Theorem 4.1.

THEOREM 5.1. A sequence of $m$ finds mixed with $d \le n$ delete operations and at most $n$ unions and $n$ make-set operations takes $O\left(n + \sum_{i=1}^{m} \alpha(m + n, n, \log n_i) + \sum_{j=1}^{d} \alpha(m + n, n, \log d_j)\right)$ time, where $n_i$ is the number of live nodes in the set returned by the $i$-th find and $d_j$ is the number of live nodes in the set where we do the $j$-th delete operation. The size of the data structure at any time during the sequence is proportional to the number of live items in it.

6 Summary

We have presented several union-find data structures that support deletions. Some are designed for applications where worst-case performance is important, and another has good amortized performance.

In an amortized setting, if one is only interested in keeping the overall size of the data structure proportional to the number of live elements in it, it suffices to use a single global counter. This counter counts the number of live elements in the data structure. Each time an item is deleted we just mark it as such and decrement the counter of live items. When the number of live elements changes by a constant factor we can rebuild the whole collection of sets. With this global approach we can have individual sets in which the fraction of items marked deleted is large even though the overall number of such items is only a constant fraction of the total number of items. In applications where the space requirement of each set separately matters, this approach may not be good enough. Furthermore, the performance of finds on sets full of deleted items degrades. We have shown that by maintaining a counter per set we can keep the size of each set proportional to the actual size of the set and avoid any degradation in the performance of finds.

The issue is even more subtle in a worst-case setting, where we need to remove deleted elements from sets incrementally while the sets are subject to regular operations. We have described an incremental rebuilding technique to achieve this.
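The per-set counter scheme of Section 5, recalled in the summary above, can be sketched in Python as follows. This is an illustrative sketch under several assumptions that are not in the paper: the class and field names are invented, each set keeps an explicit `members` list so that a rebuild can enumerate its nodes (merged smaller-into-larger, which is simpler but not constant time per union), and the new total after a rebuild is computed as the number of surviving live nodes.

```python
class SetInfo:
    """Per-set record reached from the root: the two counters plus members."""
    def __init__(self):
        self.total = 0      # nodes currently in the tree (live and marked)
        self.deleted = 0    # nodes marked deleted but still in the tree
        self.members = []   # assumption: explicit node list, used to rebuild


class Node:
    def __init__(self, item):
        self.item = item
        self.parent = self
        self.rank = 0
        self.deleted = False   # the mark bit
        self.info = None       # meaningful at roots only


class UnionFindWithDeletions:
    def __init__(self):
        self.node_of = {}      # live item -> its current node

    def _attach(self, info, node):
        info.members.append(node)
        info.total += 1
        self.node_of[node.item] = node

    def make_set(self, item):
        node = Node(item)
        node.info = SetInfo()
        self._attach(node.info, node)

    def _root(self, node):
        root = node
        while root.parent is not root:
            root = root.parent
        while node is not root:            # path compression
            nxt = node.parent
            node.parent = root
            node = nxt
        return root

    def find(self, item):
        # The SetInfo object plays the role of the set name.
        return self._root(self.node_of[item]).info

    def union(self, x, y):
        a, b = self._root(self.node_of[x]), self._root(self.node_of[y])
        if a is b:
            return
        if a.rank < b.rank:
            a, b = b, a
        if a.rank == b.rank:
            a.rank += 1
        b.parent = a
        big, small = a.info, b.info
        if len(big.members) < len(small.members):
            big, small = small, big
        big.members.extend(small.members)
        big.total += small.total           # counters of the union are the sums
        big.deleted += small.deleted
        a.info = big

    def delete(self, item):
        node = self.node_of.pop(item)
        node.deleted = True                # mark, then locate the set
        info = self._root(node).info
        info.deleted += 1
        if info.deleted >= info.total // 2:
            self._rebuild(info)

    def _rebuild(self, info):
        live = [n.item for n in info.members if not n.deleted]
        info.total, info.deleted, info.members = 0, 0, []
        root = None
        for item in live:                  # fresh nodes: "new elements"
            node = Node(item)
            if root is None:
                root = node
                root.rank = 1
                root.info = info
            node.parent = root
            self._attach(info, node)
```

Creating fresh nodes during a rebuild mirrors the analysis, which treats the elements of a rebuilt set as new elements.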
Furthermore, we have shown directly how to modify the data structure of Smid, which has the best possible worst-case performance, to support deletions. The direct approach is more space-efficient.

In all our structures, the time bound for delete is the same as the time bound for find. Intuitively, this is a result of the need to discover the set we are deleting from in order to do rebuilding or rebalancing operations. Whether a faster implementation of delete is possible is an open question.

References

[1] A. M. Ben-Amram and Z. Galil. A generalization of a lower bound technique due to Fredman and Saks. Algorithmica, 30:34-66.
[2] N. Blum. On the single-operation worst-case time complexity of the disjoint set union problem. SIAM J. Computing, 15(4).
[3] M. L. Fredman and M. E. Saks. The cell probe complexity of dynamic data structures. In Proc. 21st ACM Symposium on Theory of Computing.
[4] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34(3).
[5] M. Korupolu, R. Mettu, V. Ramachandran, and Y. Zhao. Experimental evaluation of algorithms for incremental graph connectivity and biconnectivity.
[6] D. Kozen. The design and analysis of algorithms.
[7] M. Smid. A data structure for the union-find problem having good single-operation complexity. ALCOM: algorithms review, Newsletter of the ESPRIT II Basic Research action program project no. 3075, 1.
[8] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22.
[9] R. E. Tarjan and J. van Leeuwen. Worst case analysis of set union algorithms. Journal of the ACM, 31.

Appendix A Simple Union-Find structure with optimal worst-case performance

In this section we give a simple description of Smid's data structure for union-find [7]. For a fixed parameter $k$, the data structure that we describe supports union in $O(k)$ time and find in $O(\log n / \log k)$ time.

We represent each set by a tree such that the elements of the set reside at the leaves of the tree. Each node of a tree contains a pointer to its parent if it is not the root, and a linked list of its children if it is not a leaf. The root of the tree also contains the name of the set, the number of its children, and the height of the tree. We call every node which is neither the root nor a leaf an internal node. The height of a node $v$, denoted by $h(v)$, is the length of the longest path from $v$ to a leaf. Each tree also satisfies the following two invariants.

1. The height of the root is at least 1. A root of height $h = 1$ has at least one child. A root of height $h > 1$ has at least two children. All children of the root are of height $h - 1$.
2. Each internal node of height $h$ has at least $k$ children of height $h - 1$. Internal nodes of height $h$ may have any number of children of height $< h - 1$.

It is easy to see that these properties ensure that the height of a tree $T$ representing a set with $n$ elements is $O(\log n / \log k)$. Next we describe how to implement union and find.

Find(x): We follow parent pointers from the leaf containing $x$ until we get to the root. We return the name of the set stored at the root.

Union(A, B, C): Let $a$ be the root of $A$ and let $b$ be the root of $B$. Assume without loss of generality that the height of $A$ is no greater than the height of $B$, and that if $A$ and $B$ are of the same height then $b$ has at least as many children as $a$. There are three cases.

1. The height of $A$ is strictly smaller than the height of $B$. Let $v$ be an arbitrary child of $b$. If $a$ has fewer than $k$ children, we make the children of $a$ point to $v$, concatenate the list of children of $v$ with the list of children of $a$, and discard $a$. If $a$ has at least $k$ children and $h(a) = h(b) - 1$, we make $a$ a child of $b$. Otherwise, $h(a) < h(b) - 1$, and we make $a$ a child of $v$. We change the name of the set stored at $b$ to be the name of the new set. We also increment the field that stores the number of children of $b$ in case the root $a$ becomes a child of $b$.
2. The trees $A$ and $B$ have equal heights, and the number of children of $a$ is smaller than $k$. We make the children of $a$ point to $b$ instead of $a$, concatenate the lists of children of $a$ and $b$, and store the resulting list with $b$. We store in $b$ the new name of the set. We increase the number of children of $b$ by the number of children of $a$.
3. The trees $A$ and $B$ have equal heights, and the number of children of $a$ is at least $k$. We create a new root $c$. We make $a$ and $b$ children of $c$ by concatenating them into a list, making this list the list of children of $c$, and setting the parent pointers of $a$ and $b$ to point to $c$. We store with $c$ the name of the new set, set its children counter to two, and set its height to be one greater than the height of $A$ and $B$.

It is easy to see that the properties of the trees stated above are maintained through a sequence of unions and finds; therefore union takes $O(k)$ time and find takes $O(\log n / \log k)$ time.
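The find procedure and the three union cases of this appendix can be sketched in Python as follows. The sketch is hedged in several ways: the names `TNode` and `SmidUnionFind` are invented, Python lists stand in for the linked lists of children (so `len` replaces the explicit child counter), and `make_set`, which the text above does not describe, is assumed to create a root of height 1 with a single leaf.

```python
class TNode:
    def __init__(self, height, item=None):
        self.parent = None
        self.children = []
        self.height = height
        self.item = item      # set for leaves only
        self.name = None      # meaningful at roots only


class SmidUnionFind:
    def __init__(self, k):
        self.k = k
        self.leaf = {}        # element -> leaf node
        self.root_of = {}     # set name -> root node

    def _adopt(self, parent, kids):
        # Move the given nodes under parent; costs O(len(kids)).
        for child in kids:
            child.parent = parent
        parent.children.extend(kids)

    def make_set(self, x, name):
        # Assumption: a singleton is a root of height 1 with one leaf child.
        root, leaf = TNode(1), TNode(0, x)
        self._adopt(root, [leaf])
        root.name = name
        self.leaf[x] = leaf
        self.root_of[name] = root

    def find(self, x):
        v = self.leaf[x]
        while v.parent is not None:
            v = v.parent
        return v.name

    def union(self, a_name, b_name, c_name):
        a = self.root_of.pop(a_name)
        b = self.root_of.pop(b_name)
        # WLOG A is not taller than B; on equal heights b has at least as
        # many children as a.
        if (a.height, len(a.children)) > (b.height, len(b.children)):
            a, b = b, a
        root = b
        if a.height < b.height:                  # case 1
            v = b.children[0]                    # an arbitrary child of b
            if len(a.children) < self.k:
                self._adopt(v, a.children)       # a itself is discarded
            elif a.height == b.height - 1:
                self._adopt(b, [a])
            else:
                self._adopt(v, [a])
        elif len(a.children) < self.k:           # case 2
            self._adopt(b, a.children)
        else:                                    # case 3
            root = TNode(a.height + 1)
            self._adopt(root, [a, b])
        root.name = c_name
        self.root_of[c_name] = root
```

Each branch moves fewer than $k$ child pointers or a single root, which is where the $O(k)$ bound on union comes from.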

### The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, cheapest first, we had to determine whether its two endpoints

Lecture 10 Union-Find The union-nd data structure is motivated by Kruskal's minimum spanning tree algorithm (Algorithm 2.6), in which we needed two operations on disjoint sets of vertices: determine whether

### Complexity of Union-Split-Find Problems. Katherine Jane Lai

Complexity of Union-Split-Find Problems by Katherine Jane Lai S.B., Electrical Engineering and Computer Science, MIT, 2007 S.B., Mathematics, MIT, 2007 Submitted to the Department of Electrical Engineering

### CS711008Z Algorithm Design and Analysis

CS711008Z Algorithm Design and Analysis Lecture 7 Binary heap, binomial heap, and Fibonacci heap 1 Dongbo Bu Institute of Computing Technology Chinese Academy of Sciences, Beijing, China 1 The slides were

### CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

### 2 Exercises and Solutions

2 Exercises and Solutions Most of the exercises below have solutions but you should try first to solve them. Each subsection with solutions is after the corresponding subsection with exercises. 2.1 Sorting

### COT5405 Analysis of Algorithms Homework 3 Solutions

COT0 Analysis of Algorithms Homework 3 Solutions. Prove or give a counter example: (a) In the textbook, we have two routines for graph traversal - DFS(G) and BFS(G,s) - where G is a graph and s is any

### Data Structures Fibonacci Heaps, Amortized Analysis

Chapter 4 Data Structures Fibonacci Heaps, Amortized Analysis Algorithm Theory WS 2012/13 Fabian Kuhn Fibonacci Heaps Lacy merge variant of binomial heaps: Do not merge trees as long as possible Structure:

### Converting a Number from Decimal to Binary

Converting a Number from Decimal to Binary Convert nonnegative integer in decimal format (base 10) into equivalent binary number (base 2) Rightmost bit of x Remainder of x after division by two Recursive

### Analysis of Algorithms I: Binary Search Trees

Analysis of Algorithms I: Binary Search Trees Xi Chen Columbia University Hash table: A data structure that maintains a subset of keys from a universe set U = {0, 1,..., p 1} and supports all three dictionary

### Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

### Union-Find Problem. Using Arrays And Chains

Union-Find Problem Given a set {,,, n} of n elements. Initially each element is in a different set. ƒ {}, {},, {n} An intermixed sequence of union and find operations is performed. A union operation combines

### Cpt S 223. School of EECS, WSU

Priority Queues (Heaps) 1 Motivation Queues are a standard mechanism for ordering tasks on a first-come, first-served basis However, some tasks may be more important or timely than others (higher priority)

### The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

### Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining

### Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

### Lecture 2 February 12, 2003

6.897: Advanced Data Structures Spring 003 Prof. Erik Demaine Lecture February, 003 Scribe: Jeff Lindy Overview In the last lecture we considered the successor problem for a bounded universe of size u.

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### root node level: internal node edge leaf node CS@VT Data Structures & Algorithms 2000-2009 McQuain

inary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each

### The Relative Worst Order Ratio for On-Line Algorithms

The Relative Worst Order Ratio for On-Line Algorithms Joan Boyar 1 and Lene M. Favrholdt 2 1 Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark, joan@imada.sdu.dk

### 8.1 Min Degree Spanning Tree

CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

### Lecture Notes on Spanning Trees

Lecture Notes on Spanning Trees 15-122: Principles of Imperative Computation Frank Pfenning Lecture 26 April 26, 2011 1 Introduction In this lecture we introduce graphs. Graphs provide a uniform model

### Balanced search trees

Lecture 8 Balanced search trees 8.1 Overview In this lecture we discuss search trees as a method for storing data in a way that supports fast insert, lookup, and delete operations. (Data structures handling

### A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

CmSc 250 Intro to Algorithms Chapter 6. Transform and Conquer Binary Heaps 1. Definition A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called

### CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992

CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992 Original Lecture #6: 28 January 1991 Topics: Triangulating Simple Polygons

### On the k-path cover problem for cacti

On the k-path cover problem for cacti Zemin Jin and Xueliang Li Center for Combinatorics and LPMC Nankai University Tianjin 300071, P.R. China zeminjin@eyou.com, x.li@eyou.com Abstract In this paper we

### Answers to Homework 1

Answers to Homework 1 p. 13 1.2 3 What is the smallest value of n such that an algorithm whose running time is 100n 2 runs faster than an algorithm whose running time is 2 n on the same machine? Find the

### Multimedia Communications. Huffman Coding

Multimedia Communications Huffman Coding Optimal codes Suppose that a i -> w i C + is an encoding scheme for a source alphabet A={a 1,, a N }. Suppose that the source letter a 1,, a N occur with relative

### The LCA Problem Revisited

The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus

### Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Binary Search Trees Data in each node Larger than the data in its left child Smaller than the data in its right child FIGURE 11-6 Arbitrary binary tree FIGURE 11-7 Binary search tree Data Structures Using

### Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

### A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,

### arxiv:cs/0605141v1 [cs.dc] 30 May 2006

General Compact Labeling Schemes for Dynamic Trees Amos Korman arxiv:cs/0605141v1 [cs.dc] 30 May 2006 February 1, 2008 Abstract Let F be a function on pairs of vertices. An F- labeling scheme is composed

### Analysis of Algorithms, I

Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadth-first search (BFS) 4 Applications

### MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

### Partitioning edge-coloured complete graphs into monochromatic cycles and paths

arxiv:1205.5492v1 [math.co] 24 May 2012 Partitioning edge-coloured complete graphs into monochromatic cycles and paths Alexey Pokrovskiy Departement of Mathematics, London School of Economics and Political

### Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Binary Search Trees Lecturer: Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 27 1 Properties of Binary Search Trees 2 Basic BST operations The worst-case time complexity of BST operations

### A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

Chapter 12: Binary Search Trees A binary search tree is a binary tree with a special property called the BST-property, which is given as follows: For all nodes x and y, if y belongs to the left subtree

### Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Data Structures Jaehyun Park CS 97SI Stanford University June 29, 2015 Typical Quarter at Stanford void quarter() { while(true) { // no break :( task x = GetNextTask(tasks); process(x); // new tasks may

### Load Balancing. Load Balancing 1 / 24

Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

### From Last Time: Remove (Delete) Operation

CSE 32 Lecture : More on Search Trees Today s Topics: Lazy Operations Run Time Analysis of Binary Search Tree Operations Balanced Search Trees AVL Trees and Rotations Covered in Chapter of the text From

### 4.4 The recursion-tree method

4.4 The recursion-tree method Let us see how a recursion tree would provide a good guess for the recurrence = 3 4 ) Start by nding an upper bound Floors and ceilings usually do not matter when solving

### Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a

CSE 220: Handout 29 Disjoint Sets 1 Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information about relations

### Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

### CS 840 Fall 2016 Self Organizing Linear Search

CS 840 Fall 2016 Self Organizing Linear Search We will consider searching under the comparison model Binary tree lg n upper and lower bounds This also holds in average case Updates also in O(lg n) Linear

### Comparing Leaf and Root Insertion

30 Research Article SACJ, No. 44., December 2009 Comparing Leaf Root Insertion Jaco Geldenhuys, Brink van der Merwe Computer Science Division, Department of Mathematical Sciences, Stellenbosch University,

### Lecture 1: Course overview, circuits, and formulas

Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik

### Quicksort is a divide-and-conquer sorting algorithm in which division is dynamically carried out (as opposed to static division in Mergesort).

Chapter 7: Quicksort Quicksort is a divide-and-conquer sorting algorithm in which division is dynamically carried out (as opposed to static division in Mergesort). The three steps of Quicksort are as follows:

### Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm

### Solving Recurrences. Lecture 13 CS2110 Summer 2009

Solving Recurrences Lecture 13 CS2110 Summer 2009 In the past Complexity: Big-O definition Non-recursive programs 2 Today Complexity of recursive programs Introduce recurrences Methods to solve recurrences

### Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Binary Search Trees A Generic Tree Nodes in a binary search tree ( B-S-T) are of the form P parent Key A Satellite data L R B C D E F G H I J The B-S-T has a root node which is the only node whose parent

### CMPS 102 Solutions to Homework 1

CMPS 0 Solutions to Homework Lindsay Brown, lbrown@soe.ucsc.edu September 9, 005 Problem..- p. 3 For inputs of size n insertion sort runs in 8n steps, while merge sort runs in 64n lg n steps. For which

### Balanced Trees Part One

Balanced Trees Part One Balanced Trees Balanced trees are surprisingly versatile data structures. Many programming languages ship with a balanced tree library. C++: std::map / std::set Java: TreeMap /

### Balanced Binary Trees

Reminders about Trees Balanced Binary Trees With pictures by John Morris (ciips.ee.uwa.edu.au/~morris) A binary tree is a tree with exactly two sub-trees for each node, called the left and right sub-trees.

### Optimal Tracking of Distributed Heavy Hitters and Quantiles

Optimal Tracking of Distributed Heavy Hitters and Quantiles Ke Yi HKUST yike@cse.ust.hk Qin Zhang MADALGO Aarhus University qinzhang@cs.au.dk Abstract We consider the the problem of tracking heavy hitters

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### 10. Graph Matrices Incidence Matrix

10 Graph Matrices Since a graph is completely determined by specifying either its adjacency structure or its incidence structure, these specifications provide far more efficient ways of representing a

### Hash-based Digital Signature Schemes

Hash-based Digital Signature Schemes Johannes Buchmann Erik Dahmen Michael Szydlo October 29, 2008 Contents 1 Introduction 2 2 Hash based one-time signature schemes 3 2.1 Lamport Diffie one-time signature

### A. V. Gerbessiotis CS Spring 2014 PS 3 Mar 24, 2014 No points

A. V. Gerbessiotis CS 610-102 Spring 2014 PS 3 Mar 24, 2014 No points Problem 1. Suppose that we insert n keys into a hash table of size m using open addressing and uniform hashing. Let p(n, m) be the

### Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

### CSE 5311 Homework 2 Solution

CSE 5311 Homework 2 Solution Problem 6.2-6 Show that the worst-case running time of MAX-HEAPIFY on a heap of size n is Ω(lg n). (Hint: For a heap with n nodes, give node values that cause MAX- HEAPIFY

### Binary Heap Algorithms

CS Data Structures and Algorithms Lecture Slides Wednesday, April 5, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005 2009 Glenn G. Chappell

### Chapter 4. Trees. 4.1 Basics

Chapter 4 Trees 4.1 Basics A tree is a connected graph with no cycles. A forest is a collection of trees. A vertex of degree one, particularly in a tree, is called a leaf. Trees arise in a variety of applications.

### Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time

Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time Gerth Stølting Brodal 1, Alexis C. Kaporis 2, Spyros Sioutas 3, Konstantinos Tsakalidis 1, Kostas Tsichlas 4 1 MADALGO, Department

### Gambling and Data Compression

Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction

### Algorithms Chapter 12 Binary Search Trees

Algorithms Chapter 1 Binary Search Trees Outline Assistant Professor: Ching Chi Lin 林 清 池 助 理 教 授 chingchi.lin@gmail.com Department of Computer Science and Engineering National Taiwan Ocean University

### THE SCHEDULING OF MAINTENANCE SERVICE

THE SCHEDULING OF MAINTENANCE SERVICE Shoshana Anily Celia A. Glass Refael Hassin Abstract We study a discrete problem of scheduling activities of several types under the constraint that at most a single

### I/O-Efficient Spatial Data Structures for Range Queries

I/O-Efficient Spatial Data Structures for Range Queries Lars Arge Kasper Green Larsen MADALGO, Department of Computer Science, Aarhus University, Denmark E-mail: large@madalgo.au.dk,larsen@madalgo.au.dk

### Data storage Tree indexes

Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating

### Problem Set #1 Solutions

CS161: Design and Analysis of Algorithms Summer 2004 Problem Set #1 Solutions General Notes Regrade Policy: If you believe an error has been made in the grading of your problem set, you may resubmit it

### The Online Set Cover Problem

The Online Set Cover Problem Noga Alon Baruch Awerbuch Yossi Azar Niv Buchbinder Joseph Seffi Naor ABSTRACT Let X = {, 2,..., n} be a ground set of n elements, and let S be a family of subsets of X, S

### Strategic Deployment in Graphs. 1 Introduction. v 1 = v s. v 2. v 4. e 1. e 5 25. e 3. e 2. e 4

Informatica 39 (25) 237 247 237 Strategic Deployment in Graphs Elmar Langetepe and Andreas Lenerz University of Bonn, Department of Computer Science I, Germany Bernd Brüggemann FKIE, Fraunhofer-Institute,

### Persistent Data Structures and Planar Point Location

Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4

### Balanced Binary Search Tree

AVL Trees / Slide 1 Balanced Binary Search Tree Worst case height of binary search tree: N-1 Insertion, deletion can be O(N) in the worst case We want a tree with small height Height of a binary tree with

### B-slack trees: Space Efficient B-trees

B-slack trees: Space Efficient B-trees Trevor Brown Department of Computer Science, University of Toronto, Canada tabrown@cs.toronto.edu Abstract. B-slack trees, a subclass of B-trees that have substantially

### Sample Problems in Discrete Mathematics

Sample Problems in Discrete Mathematics This handout lists some sample problems that you should be able to solve as a pre-requisite to Computer Algorithms Try to solve all of them You should also read

### 6.852: Distributed Algorithms Fall, 2009. Class 2

6.852: Distributed Algorithms Fall, 2009 Class 2 Today's plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election

### The Cost of Offline Binary Search Tree Algorithms and the Complexity of the Request Sequence

The Cost of Offline Binary Search Tree Algorithms and the Complexity of the Request Sequence Jussi Kujala, Tapio Elomaa Institute of Software Systems Tampere University of Technology P. O. Box 553, FI-33101

### Binary Trees and Huffman Encoding Binary Search Trees

Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary

### Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

### Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

### A o(n)-competitive Deterministic Algorithm for Online Matching on a Line

A o(n)-competitive Deterministic Algorithm for Online Matching on a Line Antonios Antoniadis 2, Neal Barcelo 1, Michael Nugent 1, Kirk Pruhs 1, and Michele Scquizzato 3 1 Department of Computer Science,

### How Asymmetry Helps Load Balancing

How Asymmetry Helps Load Balancing Berthold Vöcking International Computer Science Institute Berkeley, CA 94704-1198 voecking@icsi.berkeley.edu Abstract This paper deals with balls and bins processes related

### Lecture 2: Universality

CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

### Persistent Data Structures

6.854 Advanced Algorithms Lecture 2: September 9, 2005 Scribes: Sommer Gentry, Eddie Kohler Lecturer: David Karger Persistent Data Structures 2.1 Introduction and motivation So far, we've seen only ephemeral

### A simple existence criterion for normal spanning trees in infinite graphs

A simple existence criterion for normal spanning trees in infinite graphs Reinhard Diestel Halin proved in 1978 that there exists a normal spanning tree in every connected graph G that satisfies the

### Lecture 17 : Equivalence and Order Relations DRAFT

CS/Math 240: Introduction to Discrete Mathematics 3/31/2011 Lecture 17 : Equivalence and Order Relations Instructor: Dieter van Melkebeek Scribe: Dalibor Zelený DRAFT Last lecture we introduced the notion

### Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths

Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths Satyajeet S. Ahuja, Srinivasan Ramasubramanian, and Marwan Krunz Department of ECE, University of Arizona, Tucson,

### princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of

### Determination of the normalization level of database schemas through equivalence classes of attributes

Computer Science Journal of Moldova, vol.17, no.2(50), 2009 Determination of the normalization level of database schemas through equivalence classes of attributes Cotelea Vitalie Abstract In this paper,

### Graphs without proper subgraphs of minimum degree 3 and short cycles

Graphs without proper subgraphs of minimum degree 3 and short cycles Lothar Narins, Alexey Pokrovskiy, Tibor Szabó Department of Mathematics, Freie Universität, Berlin, Germany. August 22, 2014 Abstract

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### The Open University's repository of research publications and other research outputs

Open Research Online The Open University's repository of research publications and other research outputs The degree-diameter problem for circulant graphs of degree 8 and 9 Journal Article How to cite:

### A tree can be defined in a variety of ways as is shown in the following theorem: 2. There exists a unique path between every two vertices of G.

TREES 7 Basic Properties Definition 7.1: A connected graph G is called a tree if the removal of any of its edges makes G disconnected. A tree can be defined in a variety of ways as

### 4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful applications, such as in

### Chapters 1 and 5.2 of GT

Subject 2 Spring 2014 Growth of Functions and Recurrence Relations Handouts 3 and 4 contain more examples; the Formulae Collection handout might also help. Chapters 1 and 5.2 of GT Disclaimer: These abbreviated

### Randomized algorithms

Randomized algorithms March 10, 2005 1 What are randomized algorithms? Algorithms which use random numbers to make decisions during the executions of the algorithm. Why would we want to do this?? Deterministic