# I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis

Save this PDF as:

Size: px
Start display at page:

Download "I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis"

## Transcription

1 I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal Lars Arge, Ke Yi Department of Computer Science, Duke University, Durham, NC 0, USA. Department of Computer Science, University of Aarhus, Aarhus, Denmark. ABSTRACT Despite extensive study over the last four decades and numerous applications, no I/O-efficient algorithm is known for the union-find problem. In this paper we present an I/O-efficient algorithm for the batched (off-line) version of the union-find problem. Given any sequence of N union and find operations, where each union operation joins two distinct sets, our algorithm uses O(SORT(N)) = N B O( N log B /B ) I/Os, where is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(SORT(N) + ST(N)) I/Os, where ST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. We also describe a simple and practical O(SORT(N) log( N ))-I/O algorithm for this problem, which we have implemented. We are interested in the union-find problem because of its applications in terrain analysis. A terrain can be abstracted as a height function defined over R, and many problems that deal with such functions require a union-find data structure. With the emergence of modern mapping technologies, huge amount of elevation data is being generated that is too large to fit in memory, thus I/O-efficient algorithms are needed to process this data efficiently. In this paper, we study two terrain analysis problems that benefit from a unionfind data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(SORT(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices. Finally, we report some preliminary experimental results, showing that our algorithms give order-of-magnitude improvement over previous methods on large data sets that do not fit in memory. Work on this paper is supported by ARO grant W9NF Pankaj K. Agarwal and Ke Yi are also supported by NSF under grants CCR-00-0, EIA-0-90, CCR-0-0, and DEB-0-, by ARO grant DAAD9-0--0, and by a grant from the U.S. Israel Binational Science Foundation. Lars Arge is also supported by an Ole Rømer Scholarship from the Danish National Science Research Council. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SCG 0, June, 00, Sedona, Arizona, USA. Copyright 00 AC /0/000...\$.00. Categories and Subject Descriptors: F.. [Theory of Computation]: Nonnumerical Algorithms and Problems. General Terms: Algorithms, Experimentation. Keywords: Union-find, terrain analysis, contour trees, I/O-efficient algorithms.. INTRODUCTION The union-find problem asks for maintaining a partition of a set U = {x, x,... } (the universe) and a representative element of each set in this partition under a sequence Σ of UNION(x i, x j) and FIND(x i) operations: UNION(x i, x j) joins the set containing x i and the set containing x j, and provides a new representative element for the new set; FIND(x i) returns the representative element of the set containing x i. In the on-line version of the problem, Σ is given one operation at a time, whereas in the batched (or off-line) version, the entire sequence Σ is known in advance. The union-find problem is a fundamental algorithmic problem because of its applications in numerous problems across different domains, from programming languages to graph and geometric algorithms, and from computational topology to computational biology. In many of these algorithms only the batched version of the problem is required; see [,,, 0] for a sample of applications. The main motivation for our study of the union-find problem arises from terrain modeling and analysis. A terrain can be abstracted as a height function defined over R, and there is a rich literature on the study of such functions. We are interested in two broad problems in terrain analysis, namely flow and contour line analysis. A key step in the flow analysis of a terrain is to modify the height function so that small depressions on the terrain (sinks) disappear. We use the notion of topological persistence, introduced in [], to address this problem. In the contour-line analysis, the notion of contour tree is critical [, 9, ]. ost existing topological persistence and contour tree algorithms rely on efficient data structures for the batched union-find problem. With the emergence of high-resolution terrain-mapping technologies, huge amount of data is being generated that is too large to fit in memory and has to reside on disks. Existing algorithms cannot handle such massive data sets, mainly because they optimize CPU running time while optimizing disk access is much more important. otivated by these factors we propose efficient algorithms for the batched union-find problem in the I/O-model [] (also known as the external memory model), and use them to develop I/O-efficient algorithms for computing topological persistence and contour trees. Related results. In the I/O model, the machine consists of an infinitesize external memory (disk) and a main memory of size. A

2 block of B consecutive elements can be transferred between main memory and disk in one I/O operation (or simply I/O). Computation can only occur on elements in main memory, and the complexity of an algorithm is measured in terms of the number of I/Os it uses to solve a problem. any fundamental problems have been solved in the I/O model. For example, sorting N elements takes SORT(N) = Θ( N log B /B N ) I/Os, and permuting N elements B takes PER(N) = Θ(min{N, SORT(N)}) I/Os. Please refer to the surveys by Vitter [] and Arge [] for other results, including I/O-efficient algorithms for various geometric and graph problems. The study of the union-find problem (in internal memory computation models) started early in the sixties [] and continues even today. Tarjan [0] proved that an on-line sequence of N union and find operations can be performed in time O(Nα(N)), where α(n) is the inverse Ackermann function. This bound is tight in the worst case [9, ]. A simpler proof of the upper bound and various generalizations have been proposed in [, ]. The batched version of the problem has a lower bound of Ω(Nα(N)) in the pointermachine model [], but it can be solved in linear time in the RA model [0]. However, despite of its importance, no I/O-efficient algorithms have been developed for the union-find problem. Note that the naïve use of the RA algorithms in the I/O model results in O(Nα(N)) and O(N) I/O-algorithms, in the on-line and batched case, respectively. For realistic values of N, B, and, SORT(N) N, and the difference in running time between an algorithm performing N I/Os and one performing SORT(N) I/Os can be very significant. Topological persistence (see Section and [, ] for its definition) of a height function is a measure of its topological attributes. Over the last few years, it has been successfully applied to a variety of problems, including topological simplification [, ], identifying features on a surface [], and removing topological noises [0]; see also []. Efficient algorithms for computing persistence in internal memory are developed in [, ]. The efficient algorithms for computing the topological persistence of a two-dimensional height function rely on a union-find data structure, so no I/O-efficient algorithm is known for this problem. Contour trees are widely used to represent the topological changes in the contours of a height function (see Section and [, 9, ] for a formal definition). Van Kreveld et al. [] gave an O(N log N)- time algorithm for constructing the contour tree of a piecewiselinear height function on R, represented by a mesh with N vertices. The algorithm was later extended to D by Tarasov and Vyalyi [9], and to arbitrary dimensions by Carr et al. []. The last two algorithms rely on a union-find data structure. No I/O-efficient algorithm is known for computing contour trees even in two dimensions. Our results. Our main result is the first I/O-efficient algorithms for the batched union-find problem. In Section, we present an algorithm that uses O(SORT(N)) I/Os on a sequence of N union and find operations, provided that none of the union operations is redundant, that is, for each UNION(x i, x j), x i and x j are in different sets. This is optimal in the worst case since there is a simple O(N/B)-I/O reduction from permutation. An interesting feature of our algorithm is that it reduces the problem to two geometric problems. If redundant union operations are allowed, we describe an algorithm that uses O(SORT(N) + ST(N)) I/Os, where ST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. Currently the best bound If N < SORT(N), which is true only for extremely large inputs, there is a gap between O(SORT(N)) and Ω(PER(N)). However, in this case we can obtain an O(PER(N)) = O(N)-I/O algorithm by simply using the O(N) RA algorithm of [0]. for ST(N) is O(SORT(N) log log B) I/Os [] by a deterministic algorithm or expected O(SORT(N)) I/Os by a randomized algorithm []. We develop the first I/O-efficient algorithms for topological persistence and contour trees using our union-find algorithm. In Section, we describe an O(SORT(N))-I/O algorithm for computing topological persistence of a terrain represented as a triangular mesh with N vertices. The algorithm is obtained by simply plugging our union-find algorithm to the previous internal memory algorithm of [, ]. In Section, we describe an O(SORT(N))-I/O algorithm for computing the contour tree of a terrain represented as a triangular mesh with N vertices. For this problem it is not enough to just apply our union-find algorithm to the previous internal memory algorithms. We prove a property of contour trees (cf. Lemma ), which is interesting in its own right. Using this property, we reduce the problem to a geometric problem and design an I/O-efficient algorithm for this problem. As in [], our algorithm extends to higher dimensions. While theoretically I/O-efficient, our O(SORT(N) + ST(N))- I/O algorithms for the batched union-find problem are probably too complicated to be of practical interests, therefore in Section we present a simple divide-and-conquer algorithm that uses O(SORT(N) log( N )) I/Os. It does not have to invoke an ST algorithm to handle redundant union operations. We have implemented this algorithm, and present a few preliminary empirical results in Section. These experiments show that our algorithm gives order-ofmagnitude improvement over previous methods on large data sets. We also apply our techniques to the so-called flooding problem [], a key step in flow analysis on terrains.. I/O-OPTIAL BATCHED UNION-FIND Let U = {x, x,... } be the universe of elements, and let Σ = σ, σ,..., σ N be a sequence of union and find operations. Let the index of the operation σ t denote its time stamp. Assume without loss of generality that each element in U has been used in at least one operation of Σ. In this section we describe an I/Oefficient algorithm for the batched union-find problem, assuming that all union operations in Σ are non-redundant. At the end of the section we discuss how to handle redundant union operations. Our algorithm consists of two stages. In the first stage, we compute a total ordering on the elements of U so that as Σ is performed, each set consists of a contiguous sequence of elements in this ordering and each union operation joins two adjacent sets. ore precisely, if sets A and B are joined together by any union operation in Σ, then there exist some x i A and x j B such that x i and x j are adjacent in this ordering. We call this special case the interval union-find problem. In the second stage, we solve an instance of the batched interval union-find problem. We formulate each of these two subproblems as a geometric problem and present an O(SORT(N))-I/O algorithm for each of them. We also note that the assumption on non-redundant union operations is needed only in the first stage. From union-find to interval union-find. Given a sequence Σ, let the union graph G(Σ) be a weighted undirected graph whose vertices are the elements of U. There is an edge e between x i and x j with weight ω(e) = t for each UNION(x i, x j) operation with time stamp t. If all union operations are non-redundant, G(Σ) is a forest. For simplicity we assume it to be a tree; otherwise, we connect them by edges of weight. We pick an arbitrary node of G(Σ) as its root and call this rooted tree the union tree, denoted by T. As a convention, when we refer to an edge (u, v) in a rooted tree, u is always the parent of v.

3 r (a) (b) (c) Figure : (a) The union tree T, with weights (time stamps) associated with each edge. The dashed lines show the Euler tour. (b) A horizontal segment is built for each edge e of T, with ω(e) as its y-coordinate and the positions of e in the Euler tour as its left and right x-coordinates. (c) The equivalent union tree T after transformation. A union tree T on the same set of nodes as T is equivalent to T if for any t, the connected components of the forest formed by the edges of T and T with weight at most t are the same. In order to reduce the union-find problem to the interval union-find problem, we first transform T into an equivalent union tree T, with the property that the weights along any leaf-to-root path are increasing. Then we show how a certain in-order traversal of T defines an ordering of the elements of U that results in Σ being an interval union-find instance. Transforming T to T. We transform T into T by repeatedly applying the following short-cut operation, which simulates the path compression technique: For any w T, let v be w s parent and u be v s parent. If ω(u, v) < ω(v, w), we promote w to be a child of u, by removing the edge (v, w) and adding the edge (u, w) with ω(u, w) = ω(v, w). It is easy to see that the new tree resulted after this operation is equivalent to the old tree, since at the time UNION(v, w) is issued, u and v are already in the same set. When this procedure stops, we obtain a tree T in which the weights along any leaf-to-root path are increasing. In order to perform all these short-cut operations efficiently, for each node v, we need to find its ancestor in T that becomes the parent of v in T. We cast this problem in a geometric setting. We construct an Euler tour on T, which starts and ends at r and traverses every edge of T exactly twice, once in each direction (see Figure (a)). Such a tour can be constructed in O(SORT(N)) I/Os []. For each edge e T, we map it to a horizontal line segment whose y-coordinate is ω(e), and whose left (resp. right) x-coordinate is the position (index) where e appears for the first (resp. second) time in the Euler tour. Note that the x-spans of these segments are nested. Refer to Figure (a) and (b). These segments can be easily constructed in O(SORT(N)) I/Os given the Euler tour. Abusing the notation, we also use e to denote the segment that corresponds to the edge e T. Let E be this set of segments. Let γ(e) be the shortest segment in E that lies above e and whose x-projection contains that of e. We claim that if e = (u, v) and γ(e) = (w, z), then (z, v) is an edge of T, i.e., z is the parent of v in T. Indeed, the second condition (γ(e) above e) ensures that v cannot be shortcut to w, and the first condition (γ(e) being the shortest) and the third condition (the x-projection of γ(e) containing that of e) ensure that z is the lowest ancestor of v that satisfies the second condition. If γ(e) does not exist for some e = (u, v), we know that v must be a child of the root r. Refer to Figure (c). Thus it suffices to compute γ(e) for each e. We next observe that since the segments in E are nested, the x-span of a segment e contains that of another segment e if and only if the x-span of e contains the x-coordinate of the left endpoint of e. Let P be the set of left endpoints of the segments in E. For a point p P and a subset X E, let γ(p, X) denote the shortest segment of X that is above p and whose x-span contains the x-coordinate of p. The problem now reduces to computing γ(p, E) for each p P. We use the distribution sweeping technique [] to solve this problem. We first sort E and P by y-coordinates. We then divide the plane into m = /B vertical slabs L,..., L m, each of which contains roughly the same number of segment endpoints of E. Let P i = P L i, let E i be the set of segments with at least one endpoint inside L i, and Ei the set of segments that completely span L i. Note that for any point p P i, γ(p, E) is the shorter segment of γ(p, E i) and γ(p, Ei ). We compute the sets E i, P i, and the segment γ(p, Ei ) for each p P i, by a sweep-line algorithm, and compute γ(p, E i) for each p P i and for i m recursively. We sweep the plane in the ( y)-direction starting from y =. For each slab L i, we maintain in memory the shortest segment λ i among the segments swept so far that completely span L i. When the sweep line reaches a point p P, we determine the slab L i that contains p, set γ(p, Ei ) to λ i, and add p to P i. When the sweep line reaches a segment e of E, we add e to E i if L i contains at least one of the endpoints of e, and update λ j if e spans L j and e is shorter than λ j. We recursively solve the problem for each (E i, P i). The sweep can be implemented by a scan of P and E. N Since the depth of recursion is O(log /B ), the overall cost is B O(SORT(N)) I/Os. Traversing T. To obtain an ordering of the elements of U that results in Σ being an interval union-find instance, we perform a weight-guided, in-order traversal on T starting from the root. When we reach a node u of T with children u,..., u k where w(u, u ) < < w(u, u k ), first we recursively visit the subtree rooted at u, then visit u, and then recursively visit the subtrees rooted at u,..., u k. To perform this traversal I/O-efficiently we first order each node s children in weight order, then compute an Euler tour. In the tour a leaf of T appears once, and an internal node with k children appears k + times. It is easy to verify that, for any internal node u, if we delete all of u s appearances but the second from the tour, then the desired order is obtained. To do so we first associate with each appearance of u with a node id and its position in the Euler tour, then sort them by the node id, and finally do a scan to remove all but the second appearance of each internal node.

4 LEA. We can convert a sequence of N union and find operations without redundant union operations into an instance of the batched interval union-find problem in O(SORT(N)) I/Os. PROOF. It only remains to show that this ordering indeed produces an instance of the interval union-find problem. Let u be any node of T, and let u,..., u k be u s children where w(u, u ) < < w(u, u k ). For any i k, when the union operation corresponding to (u, u i) is to be performed, in the sequence Σ, all union operations corresponding to the whole subtree of u i must have already been performed, since they have smaller weights than ω(u, u i). If i =, then u must have not been joined with any other node at the time. So the union operation corresponding to the edge (u, u ) joins u s subtree and the singleton set {u}, which is immediately after u s subtree in the ordering. If i, then all the subtrees of u,..., u i must have been joined together with u, and u i s subtree is immediately after these subtrees in the ordering, so the union operation corresponding to the edge (u, u i) also joins adjacent sets. Solving interval union-find. We solve the interval union-find problem by formulating it as another geometric problem. Let x < x < be the sequence of ordered elements of U. For each union operation σ t = UNION(x i, x j) in Σ, we create a horizontal line segment with y-coordinate t, x-coordinate x i, and right x-coordinate x j. For each find operation σ t = FIND(x i) in Σ, we create a query point (x i, t). For σ t = FIND(x i) we return the smallest element of the set as its representative at time t that contains x i. For each query point q, consider the union of the x-projections of all the segments lower than q. Each interval in the union corresponds to a set at time t (when FIND(x i) was performed), so we need to return the left endpoint of the interval that contains x i. We solve this geometric problem by solving a series of batched orthogonal ray-shooting problems, in which we are given a set of horizontal segments and a set of query points, the goal is to find the first segment hit by a vertical ray shooting upwards from each query point. This is a special case of the endpoint dominance problem [] and can be solved in O(SORT(N)) I/Os. First, from the left endpoint of each segment e we shoot a ray downwards, and collect all the segments whose rays do not hit any other segments. Second, from the left endpoints of these segments we shoot upwards. These rays stop at the first horizontal segments they hit, or go to infinity. Let V be the resulting set of vertical segments. See Figure where these rays are drawn as dashed lines. Finally, from each query point q = (x i, t), we shoot a ray downwards and hit a horizontal segment. If the ray does not hit any segment we simply return x i as the answer to q; otherwise, we shoot a ray leftward from q and find the first segment of V hit by the ray. Then the x-coordinate of this vertical segment is returned as the answer to q. It is not difficult to verify that we indeed return the correct representative element for each query point. The whole query answering process is four instances of the batched orthogonal ray-shooting problem, thus can be completed in O(SORT(N)) I/Os. LEA. Given a sequence of N union and find operations possibly with redundant union operations, the batched interval unionfind problem can be solved in O(SORT(N)) I/Os. Combining Lemmas and, we obtain the main result. THEORE. Given a sequence of N union and find operations with no redundant union operations, the batched union-find problem can be solved in O(SORT(N)) I/Os. (x i, t ) (x i, t ) (x i, t ) (x i, t ) Figure : Answering find queries σ t = FIND(x i). If no segment lies below a left endpoint, it is connected to the segment lying immediately above it by a dashed vertical line. REARK. In some applications, certain elements are required to be the representative elements of the sets returned by the find operations. For example, each element may be weighted, and the minimum-weight element of a set is required to be the representative of the set (see Section and ). Our algorithm as described above may pick arbitrary elements as the representatives, but such a requirement can be satisfied by running our algorithm twice, followed by a post-processing step in O(SORT(N)) I/Os: We first answer all the find queries by returning the minimum element in the ordering produced by our reduction from union-find to interval union-find; then by solving the internal union-find problem symmetrically we can also return the maximum elements in the same ordering. Now we have both the minimum and maximum elements of the set for each find query, returning the minimum-weight element in the set becomes a range-minimum query. Again, this batched range-minimum problem can be solved in O(SORT(N)) I/Os using distribution sweeping []. REARK. We can prove the Ω(PER(N))-I/O lower bound by a simple reduction from permutation. The idea is to build a sequence of union and find operations such that the results to the find queries exactly form the desired permutation. We omit the details from this abstract. Handling non-redundant union operations. So far we have assumed that there are no redundant union operations in Σ. If this assumption does not hold, G(Σ) contains cycles. Then we first compute the ST (minimum spanning forest in general) on the union graph G(Σ), and delete all edges that are not in the ST. From Kruskal s algorithm [] we know that only the union operations corresponding to the ST edges of G(Σ) are non-redundant. For a general graph with N edges, computing the ST takes O(SORT(N) log log B) I/Os deterministically [] or expected O(SORT(N)) I/Os by a randomized algorithm []; if the graph is planar, computing the ST takes deterministic O(SORT(N)) I/Os []. So we have the following. THEORE. Given a sequence Σ of N union and find operations possibly with redundant union operations, the batched unionfind problem can be solved in by a deterministic algorithm with O(SORT(N) log log B) I/Os or a randomized algorithm with expected O(SORT(N)) I/Os. If the union graph of Σ is planar, the bound is deterministic O(SORT(N)) I/Os. Semi-on-line union-find. The semi-on-line union-find problem is a variant of the union-find problem in which all the union operations are given in advance, but the find queries appear in an on-line

7 9 J 9 S 9 J 9 S 9 C (a) (b) (c) (d) (e) Figure : Construction of a contour tree from the join tree and the split tree using Carr et al. []: (a) The join tree J of a mesh, where the number beside each node is its height. (b) The split tree S. All qualified nodes are circled. (c)-(d) One step of Carr et al.: node is removed and the edge between node and node is added to C. (e) The final contour tree C. nodes depends on the choices of which qualified nodes to delete in previous steps. We therefore present a different characterization of contour tree edges, formulate the problem as a geometric problem, and solve it using O(SORT(N)) I/Os. As a convention, we say a node is both an ancestor and a descendant of itself. For any edge e = (u, v) J, let σ(e) be the highest node that is a descendant of v in J and at the same time an ancestor of u in S. LEA. For any edge e J, the node σ(e) is defined. PROOF. it can be checked that the descendants of a node w in J (resp. S ) are exactly those lying in the connected component of the closure of <f(w) (resp. >f(w) ) that contains w. Now consider any edge e = (u, v) J. During the construction of J, u is connected to v because there is a vertex w Lk (u) such that v is the highest node in the connected component of <f(u) that contains w. So w is an descendant of v in J. During the construction of S, when the edge (u, w) of is processed at w (recall that the vertices are swept in decreasing order of their heights), w is connected to the lowest node w in the connected component of >f(w) that contains u. Hence w, and thus w, is an ancestor of u in S. This shows that there exists a vertex w that is a descendant of u in J and an ancestor of u in S, thereby implying that σ(e) is defined. The following lemma characterizes the edges of C. LEA. For each edge e = (u, v) J, (u, σ(e)) is an edge of C. PROOF. Please refer to Figure (a). Let e = (u, v) be any edge of J. We know that σ(e) is the highest node that is both a descendant of v in J and an ancestor of u in S. Let P be the set of nodes on the path of J from u to σ(e), and P the set of nodes on the path of S from σ(e) to u. u and σ(e) are not included in P and P. Note that P and P are disjoint, otherwise we would have chosen a different σ(e). We follow the procedure in [], i.e., repeatedly delete qualified nodes one by one except u until σ(e) becomes qualified. This strategy always works since at any time, the remaining part of the contour tree yet to be constructed has at least two leaves. So there are at least two qualified nodes, and we can always find one other than u to remove. Since u is not deleted and u was originally a descendant of σ(e) in S, u remains a descendant of σ(e) when the latter becomes qualified. Since σ(e) is not a leaf in S when it becomes qualified, by definition, it must be a leaf in J and has one child in S. P w v e (a) u σ(e) w P Figure : (a) Solid lines represent the edges of J ; dashed lines represent the edges of S ; σ(e) is the highest node that is both a descendant of v in J and an ancestor of u in S ; (u, σ(e)) is a contour tree edge. (b) Part of the contour tree C. Next we prove that when σ(e) becomes qualified, P must be empty, i.e., u is the parent of σ(e) in J, and thus (u, σ(e)) is added to C. Suppose on the contrary P when σ(e) becomes qualified, and w P is the parent of σ(e) in J. Since σ(e) is qualified and is a leaf of J, by Lemma, (w, σ(e)) is a contour tree edge. Consider the construction of S. By Lemma, S is the same as the split tree constructed on C. During the construction of S using C, when the sweep plane reached f(σ(e)), we connect σ(e) to the root of the already constructed subtree of S that contains w, since w is in the upper link of σ(e) in C. So σ(e) must be an ancestor of w in S, while it is also an ancestor of u in S. Let w be the nearest common ancestor of u and w in S. Since w has at least two children in S when σ(e) is qualified, it cannot be a qualified node at that time, and thus w σ(e). So w is a some node in P. As the sweep plane reaches f(w ) while constructing S, w is the node that merges the components containing u and w together, so w must be on the path between u and w in C; see Figure (b). Finally consider the construction of J. When the sweep plane reaches f(u), u merges into the component containing w in J, which implies that the height of all the nodes on the path in C connecting u and w is less than f(u). However on the other hand, u and w are still not in the same component at this point, because otherwise w would be a descendant of u in J and we would have chosen w as σ(e). This means that the path in C connecting u and w must have some node whose height is greater than f(u), but we just concluded that the height of all the nodes on the path from u to w (b) u w

8 w is less than f(u), a contradiction. Hence P is empty, implying that (u, σ(e)) is an edge in J when σ(e) is qualified, which is added to C. Since J has N edges e = (u, v), and (u, σ(e)) are all different by definition. So if we can compute σ(e) for each edge of J, we have all the N edges of C. In the following, we show how this maps to a geometric problem, which can be solved with O(SORT(N)) I/Os. First we perform an Euler tour on J, and record for each node its first and second appearances in the tour. This gives us an interval [x (u), x (u)] for each node u, such that u is an ancestor of v if and only if x (u) < x (v) and x (u) > x (v). Note that these intervals are nested. We do the same with S, and compute another set of nested intervals [y (u), y (u)]. We map each node u to a vertical segment s(u) with x-coordinate x (u) and y-span [y (u), y (u)], and map each edge e = (u, v) J to a horizontal segment s(e) with x-span [x (v), x (v)] and y-coordinate y (u). It is easy to see that for any node w and any e = (u, v) J, w is a descendant of v in J and an ancestor of u in S if and only if s(w) intersects s(e). Thus the problem of finding σ(e) can be formulated as follows: for the horizontal segment s(e), find the shortest vertical segment that intersects s(e). s(e) L L L L Figure : Solving the segment-intersecting problem using distribution sweeping. The horizontal segment s(e) is broken into three pieces: the middle piece spans slabs L and L, the left leftover piece falls inside L, and the right leftover piece falls inside L. We solve this batched problem using the distribution sweeping technique. During the execution, with each horizontal segment we store its shortest intersecting vertical segment found so far. Initially we sort the endpoints of the all the segments by the y-coordinate. Then we divide the plane into Θ(/B) vertical slabs, each of which contains roughly equal number of vertical segments. Next we sweep a horizontal line top-down. During the sweep, for each slab L we maintain an external memory stack that stores all vertical segments that fall inside L. Whenever we reach the top endpoint of a vertical segment, we push it into the corresponding stack; whenever we reach the bottom endpoint of a vertical segment, we pop one from the corresponding stack. Since there are O(/B) stacks, we can allocate one memory block for each stack, so that all stack operations take linear I/Os. When we reach a horizontal segment, we look at all the slabs that are completely spanned by it, and check the top element of each corresponding stack as a candidate answer. During the sweep, we also form the instances of the subproblems for each slab. The vertical segments are simply distributed to their corresponding slabs. For each horizontal segment, we break it into at most three pieces: a middle piece that completely spans a number of slabs, a left leftover piece and a right leftover piece that fall inside a slab (see Figure ). We throw away its middle piece, and distribute the two leftover pieces into the corresponding slab. Note that at any level in the recursion, a horizontal segment has at most two pieces. Thus employing standard distribution sweeping analysis yields an O(SORT(N))-I/O bound for the algorithm. Putting all the pieces together, we obtain the following. THEORE. The contour tree of a triangular mesh with N vertices in R can be computed using O(SORT(N)) I/Os. REARK. Our algorithm extends to higher dimensions, with N replaced by the number of simplices in the mesh [], except that in d dimensions the union graph when building the join and split trees may not be planar, and the O(SORT(N) log log B)-I/O ST algorithm or the randomized ST algorithm with expected O(SORT(N)) I/Os needs to be used.. EXPERIENTS We have implemented our practical batched union-find algorithm (Section ) using the TPIE framework []. In this section, we report some preliminary experimental results of our algorithm when applied to some of its applications. We compared with the internal memory algorithm that uses link-by-rank and path compression []. We could have implemented the linear-time off-line union-find algorithm [0], but it is quite complicated and is unlikely to outperform the simple O(Nα(N)) on-line algorithm in practice. In the experiments we limited the physical memory of the test machine to (unless otherwise specified) in order to obtain a large data size to memory size ratio. In the experiments with the internal memory algorithm, we relied on virtual memory to handle I/Os (swapping). We performed three sets of experiments. We first used minimum spanning tree as a test case for the batched union-find algorithm. Next, we applied our union-find algorithm to computing the topological persistence of a terrain. Finally, we used the persistence values to the so-called flooding problem on terrains. Computing minimum spanning trees. We investigated the performance of our union-find algorithm when applied to Kruskal s algorithm for computing STs. We generated random graphs with N vertices and N edges. For each edge, we randomly picked its two endpoints, and also randomly picked its weight uniformly from (0, ). This random graph model is similar to the earlier experimental studies of ST algorithms [, ]. We tested the internal memory algorithm (I) and the external memory algorithm (E) on eight graphs, with N between million to million. The results are shown in Figure (a), where the running time reported is just the time spent by the union-find algorithm. It does not include the initial sorting, which is performed by both algorithms and is only a small fraction of the total running time. The main memory can accommodate the data structure for a graph with roughly up to million vertices. As we can see, when the memory limit is not reached, the two algorithms perform roughly the same. In fact, in this case E is the same as I except for an additional step to check whether the input can be handled in memory. That is why E spends a little more time for the first two data sets. For larger graphs, the running time of I immediately deteriorates due to the random access pattern in its data structure. On the other hand, the running time of E scales roughly linearly in the size of the input graph. On the graph with million vertices, I spends roughly hours, that is, 00 times more than that of E. We cannot run I on the other data sets in a reasonable amount of time. We should point out that there exist other practical external memory ST algorithms [] that do not use the union-find algorithm.

9 running time (seconds) running time (seconds) to (, 0) memory limit reached I E 0 # vertices (million) (a) memory limit reached I E 0 0 # vertices (million) (b) Figure : Running time of the internal memory (I) and the external memory (E) union-find algorithms for (a) computing minimum spanning trees, and (b) computing topological persistence (running time is shown in log scale). However, our focus is on investigating the performance of unionfind algorithms. A comprehensive experimental evaluation of ST algorithms is beyond the scope of this paper. Computing topological persistence on triangulated terrains. We implemented the topological persistence algorithm using both the internal memory (I) and our (E) union-find algorithm. We tested the performance on the elevation data of the Neuse River Basin of North Carolina, obtained from []. The data set, consisting of over points, is too large for the I algorithm, so we created eight smaller data sets by sampling million to million points. We converted this data into a triangular mesh using an I/O-efficient Delaunay triangulation algorithm []. The experimental results of these eight data sets are shown in Figure (b) (note the log scale on the running time). Again the running time shown is only that of the union-find algorithms. In this set of experiments we see similar trends of both I and E as the ST case, except that I deteriorates slower in this case. This is because the union and find operations generated by the topological persistence algorithm have better locality than in the ST case due to the underlying geometry. Still, the difference in running time between I and E gets more significant as the data set gets larger. On the last data set of million points, the difference is more than two orders of magnitude. Finally, we set the physical memory to its full size ( GB) and tried the entire Neuse River Basin data set (with 0. billion points). E finishes in about. hours, while I crashed after running for about hours, because the -bit address space is not enough to accommodate such a large data set. Automated terrain flooding. Flow analysis of a terrain is a central problem, which models how water flows and how river networks form when water is uniformly poured on a terrain. Almost all existing flow models proposed in the literature [, ] assume that once water flows into a minima, it never flows out. This is, of course, true only if the minima corresponds to the ocean or some big lake. However, on a mesh constructed from the elevation data of a terrain, there are numerous minima (often called sinks), due to either measurement noises or real pits on the terrain. As a result, various sink-removal techniques have been proposed to remove all the spurious sinks before computing the actual flows. Currently, the most popular sink-removal method is flooding [], which has been used in many widely used GIS softwares such as ArcInfo [] and GRASS [, ]. The idea of flooding is to simulate uniformly pouring water onto the terrain until all sinks are filled and a steady-state is reached. See Figure 9(a) and (b). (a) (b) (c) Figure 9: (a) The terrain. (b) Flood the terrain until a steady-state is reached. (c) Partially flood the terrain. One serious problem with this flooding method is that it floods all sinks, regardless of its importance. As such, many important geographical features, such as big lakes or river valleys that do not end up in the ocean will vanish after this flooding procedure. Instead of filling all the sinks, we can use persistence to measure the importance of sinks on a terrain. For a user-specified threshold τ, we declare all sinks with persistence greater than τ to be the real sinks, while the rest should be removed. This gives us an automated way to flood the terrain while preserving important geographical features. Refer to Figure 9(c). We computed the distribution of the persistence values of all sinks for the Neuse River Basin. The distribution is highly skewed: A few sinks have large persistence (possible real sinks), while the majority have very small persistence (spurious sinks). Based on this information, we can choose an appropriate persistence threshold to remove all the spurious sinks while preserving the main features. We applied the flooding algorithm to the terrain with a persistence threshold of τ = 0. We also ran the algorithm with τ =, i.e., flooding all sinks, which gives the same results as the previous flooding algorithm. A portion of the original terrain, and the flooded terrains are shown in Figure 0. With a threshold of τ = 0, around 99.% of the sinks have been removed, while the major features have been preserved. On the other hand, the previous flooding procedure has undesirably eradicated some major features.

10 (a) (b) (c) Figure 0: (a) Original terrain. (b) Terrain flooded with persistence threshold τ = 0. (c) Terrain flooded with τ =.. REFERENCES [] P. K. Agarwal, L. Arge, and K. Yi. I/O-efficient construction of constrained Delaunay triangulations. In Proc. rd European Sympos. Algorithms, pages, 00. [] P. K. Agarwal, H. Edelsbrunner, J. Harer, and Y. Wang. Extreme elevation on a -manifold. In Proc. 0th Annu. Sympos. Comput. Geom., pages, 00. [] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. AC, (9):, 9. [] L. Arge. External memory data structures. In J. Abello, P.. Pardalos, and. G. C. Resende, editors, Handbook of assive Data Sets, pages. Kluwer Academic Publishers, 00. [] L. Arge, G. S. Brodal, and L. Toma. On external memory ST, SSSP and multi-way planar graph separation. J. Algorithms, (): 0, 00. [] L. Arge, J. Chase, P. Halpin, L. Toma, D. Urban, J. S. Vitter, and R. Wickremesinghe. Flow computation on massive grid terrains. GeoInformatica, ():, 00. [] L. Arge, O. Procopiuc, and J. S. Vitter. Implementing I/O-efficient data structures using TPIE. In Proc. 0th European Sympos. Algorithms, pages 00, 00. [] L. Arge, D. E. Vengroff, and J. S. Vitter. External-memory algorithms for processing line segments in geographic information systems. Proc. rd European Sympos. Algorithms, pages 9 0, 99. [9] B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer. An asymptotically optimal multiversion B-tree. VLDB Journal, ():, 99. [0] P. T. Bremer, V. Pascucci, H. Edelsbrunner, and B. Hamann. A topological hierarchy for functions on triangulated surfaces. IEEE Trans. Vis. Comput. Graphics, 0: 9, 00. [] H. Carr, J. Snoeyink, and U. Axen. Computing contour trees in all dimensions. Comput. Geom. Theory Appl., : 9, 00. [] Y.-J. Chiang,. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. External-memory graph algorithms. In Proc. th AC-SIA Sympos. Discrete Algorithms, pages 9 9, 99. [] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, nd Edition. The IT Press, Cambridge, ass., 00. [] R. Dementiev, P. Sanders, D. Schultes, and J. Sibeyn. Engineering an external memory minimum spanning tree algorithm. In Proc. rd IFIP Intl. Conf. on Theoretical Computer Science, pages 9 0, 00. [] H. Edelsbrunner, J. Harer, and A. Zomorodian. Hierarchical orse complexes for piecewise linear -manifolds. In Proc. th Annu. Sympos. Comput. Geom., pages 0 9, 00. [] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. In Proc. st IEEE Sympos. Found. Comput. Sci., pages, 000. [] Environmental Systems Research Inc. ARC/INFO Professional GIS, 99. Version... [] F. W. Fredman and D. E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. In Proc. st IEEE Sympos. Found. Comput. Sci. pages, 990. [9]. L. Fredman and. E. Saks. The cell probe complexity of dynamic data structures. In Proc. st AC Sympos. Theory Comput., pages, 99. [0] H. Gabow and R. Tarjan. A linear time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci., 0:09, 9. [] B. A. Galler and. J. Fisher. An improved equivalence algorithm. Commun. AC, ():0 0, 9. []. T. Goodrich, J.-J. Tsay, D. E. Vengroff, and J. S. Vitter. External-memory computational geometry. In Proc. th IEEE Sympos. Found. Comput. Sci., pages, 99. [] GRASS Development Team. GRASS GIS homepage. [] S. Jenson and J. Domingue. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing, ():9 00, 9. [] B.. E. oret and H. D. Shapiro. An empirical analysis of algorithms for constructing a minimum spanning tree. In Proc. Workshop Algorithms Data Struct., pages 00, 99. [] North Carolina Flood apping Program. [] J. F. O Callaghan and D.. ark. The extraction of drainage networks from digital elevation data. Computer Vision, Graphics and Image Processing,, 9. [] R. Seidel and. Sharir. Top-down analysis of path compression. SIA J. Comput., ():, 00. [9] S. P. Tarasov and. N. Vyalyi. Construction of contour trees in D in O(n log n) steps. In Proc. th Annu. Sympos. Comput. Geom., pages, 99. [0] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. AC, :, 9. [] R. E. Tarjan. A class of algorithms that require nonlinear time to maintain disjoint sets. J. Comput. and Sys. Sci., :0, 99. [] R. E. Tarjan and J. V. Leeuwen. Worst-case analysis of set union algorithms. J. AC, :, 9. []. van den Bercken, B. Seeger, and P. Widmayer. A generic approach to bulk loading multidimensional index structures. In Proc. Intl. Conf. Very Large Databases, pages 0, 99. []. van Kreveld, R. van Oostrum, C. Bajaj, V. Pascucci, and D. Schikore. Contour trees and small seed sets for isosurface traversal. In Proc. th Annu. Sympos. Comput. Geom., pages 9, 99. [] J. S. Vitter. External memory algorithms and data structures: Dealing with ASSIVE data. AC Comput. Surveys, ():09, 00. [] A. Zomorodian. Topology for Computing. Cambridge University Press, 00.

### I/O-Efficient Spatial Data Structures for Range Queries

I/O-Efficient Spatial Data Structures for Range Queries Lars Arge Kasper Green Larsen MADALGO, Department of Computer Science, Aarhus University, Denmark E-mail: large@madalgo.au.dk,larsen@madalgo.au.dk

### The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, cheapest first, we had to determine whether its two endpoints

### Cache-Oblivious Red-Blue Line Segment Intersection

Cache-Oblivious Red-Blue Line Segment Intersection Lars Arge 1,, Thomas Mølhave 1,, and Norbert Zeh 2, 1 MADALGO, Department of Computer Science, University of Aarhus, Denmark. E-mail: {large,thomasm}@madalgo.au.dk

### A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,

### Homework Exam 1, Geometric Algorithms, 2016

Homework Exam 1, Geometric Algorithms, 2016 1. (3 points) Let P be a convex polyhedron in 3-dimensional space. The boundary of P is represented as a DCEL, storing the incidence relationships between the

### CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992

CS268: Geometric Algorithms Handout #5 Design and Analysis Original Handout #15 Stanford University Tuesday, 25 February 1992 Original Lecture #6: 28 January 1991 Topics: Triangulating Simple Polygons

### External Memory Geometric Data Structures

External Memory Geometric Data Structures Lars Arge Department of Computer Science University of Aarhus and Duke University Augues 24, 2005 1 Introduction Many modern applications store and process datasets

### Lecture Notes on Spanning Trees

Lecture Notes on Spanning Trees 15-122: Principles of Imperative Computation Frank Pfenning Lecture 26 April 26, 2011 1 Introduction In this lecture we introduce graphs. Graphs provide a uniform model

### Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

### Monotone Partitioning. Polygon Partitioning. Monotone polygons. Monotone polygons. Monotone Partitioning. ! Define monotonicity

Monotone Partitioning! Define monotonicity Polygon Partitioning Monotone Partitioning! Triangulate monotone polygons in linear time! Partition a polygon into monotone pieces Monotone polygons! Definition

### From Point Cloud to Grid DEM: A Scalable Approach

From Point Cloud to Grid DEM: A Scalable Approach Pankaj K. Agarwal 1, Lars Arge 12, and Andrew Danner 1 1 Department of Computer Science, Duke University, Durham, NC 27708, USA {pankaj, large, adanner}@cs.duke.edu

### Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time

Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time Gerth Stølting Brodal 1, Alexis C. Kaporis 2, Spyros Sioutas 3, Konstantinos Tsakalidis 1, Kostas Tsichlas 4 1 MADALGO, Department

### COT5405 Analysis of Algorithms Homework 3 Solutions

COT0 Analysis of Algorithms Homework 3 Solutions. Prove or give a counter example: (a) In the textbook, we have two routines for graph traversal - DFS(G) and BFS(G,s) - where G is a graph and s is any

### The LCA Problem Revisited

The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus

### Planar Tree Transformation: Results and Counterexample

Planar Tree Transformation: Results and Counterexample Selim G Akl, Kamrul Islam, and Henk Meijer School of Computing, Queen s University Kingston, Ontario, Canada K7L 3N6 Abstract We consider the problem

### 2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

### 1 Point Inclusion in a Polygon

Comp 163: Computational Geometry Tufts University, Spring 2005 Professor Diane Souvaine Scribe: Ryan Coleman Point Inclusion and Processing Simple Polygons into Monotone Regions 1 Point Inclusion in a

### Complexity of Union-Split-Find Problems. Katherine Jane Lai

Complexity of Union-Split-Find Problems by Katherine Jane Lai S.B., Electrical Engineering and Computer Science, MIT, 2007 S.B., Mathematics, MIT, 2007 Submitted to the Department of Electrical Engineering

### Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

Lecture 10 Union-Find The union-nd data structure is motivated by Kruskal's minimum spanning tree algorithm (Algorithm 2.6), in which we needed two operations on disjoint sets of vertices: determine whether

### Offline sorting buffers on Line

Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

### On the k-path cover problem for cacti

On the k-path cover problem for cacti Zemin Jin and Xueliang Li Center for Combinatorics and LPMC Nankai University Tianjin 300071, P.R. China zeminjin@eyou.com, x.li@eyou.com Abstract In this paper we

### CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

### Computational Geometry. Lecture 1: Introduction and Convex Hulls

Lecture 1: Introduction and convex hulls 1 Geometry: points, lines,... Plane (two-dimensional), R 2 Space (three-dimensional), R 3 Space (higher-dimensional), R d A point in the plane, 3-dimensional space,

### Tufts University, Spring 2005 Michael Horn, Julie Weber (2004) Voronoi Diagrams

Comp 163: Computational Geometry Professor Diane Souvaine Tufts University, Spring 2005 Michael Horn, Julie Weber (2004) Voronoi Diagrams 1 Voronoi Diagrams 1 Consider the following problem. To determine

### Divide-and-Conquer. Three main steps : 1. divide; 2. conquer; 3. merge.

Divide-and-Conquer Three main steps : 1. divide; 2. conquer; 3. merge. 1 Let I denote the (sub)problem instance and S be its solution. The divide-and-conquer strategy can be described as follows. Procedure

### Fast Sequential Summation Algorithms Using Augmented Data Structures

Fast Sequential Summation Algorithms Using Augmented Data Structures Vadim Stadnik vadim.stadnik@gmail.com Abstract This paper provides an introduction to the design of augmented data structures that offer

### Shortest Inspection-Path. Queries in Simple Polygons

Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote B 05-05 April 2005 Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote Institut für Informatik,

### Topological Data Analysis Applications to Computer Vision

Topological Data Analysis Applications to Computer Vision Vitaliy Kurlin, http://kurlin.org Microsoft Research Cambridge and Durham University, UK Topological Data Analysis quantifies topological structures

### Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

### A fast and robust algorithm to count topologically persistent holes in noisy clouds

A fast and robust algorithm to count topologically persistent holes in noisy clouds Vitaliy Kurlin Durham University Department of Mathematical Sciences, Durham, DH1 3LE, United Kingdom vitaliy.kurlin@gmail.com,

### MATHEMATICAL ENGINEERING TECHNICAL REPORTS. The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation

MATHEMATICAL ENGINEERING TECHNICAL REPORTS The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation Shinji IMAHORI, Mutsunori YAGIURA METR 2007 53 September 2007 DEPARTMENT

### Graphs without proper subgraphs of minimum degree 3 and short cycles

Graphs without proper subgraphs of minimum degree 3 and short cycles Lothar Narins, Alexey Pokrovskiy, Tibor Szabó Department of Mathematics, Freie Universität, Berlin, Germany. August 22, 2014 Abstract

### Balanced Trees Part One

Balanced Trees Part One Balanced Trees Balanced trees are surprisingly versatile data structures. Many programming languages ship with a balanced tree library. C++: std::map / std::set Java: TreeMap /

### A. V. Gerbessiotis CS Spring 2014 PS 3 Mar 24, 2014 No points

A. V. Gerbessiotis CS 610-102 Spring 2014 PS 3 Mar 24, 2014 No points Problem 1. Suppose that we insert n keys into a hash table of size m using open addressing and uniform hashing. Let p(n, m) be the

### Persistent Data Structures and Planar Point Location

Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4

### Analysis of Algorithms, I

Analysis of Algorithms, I CSOR W4231.002 Eleni Drinea Computer Science Department Columbia University Thursday, February 26, 2015 Outline 1 Recap 2 Representing graphs 3 Breadth-first search (BFS) 4 Applications

### 4.4 The recursion-tree method

4.4 The recursion-tree method Let us see how a recursion tree would provide a good guess for the recurrence = 3 4 ) Start by nding an upper bound Floors and ceilings usually do not matter when solving

### Arrangements And Duality

Arrangements And Duality 3.1 Introduction 3 Point configurations are tbe most basic structure we study in computational geometry. But what about configurations of more complicated shapes? For example,

### Triangulation by Ear Clipping

Triangulation by Ear Clipping David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: November 18, 2002 Last Modified: August 16, 2015 Contents

### MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

### Internet Packet Filter Management and Rectangle Geometry

Internet Packet Filter Management and Rectangle Geometry David Eppstein S. Muthukrishnan Abstract We consider rule sets for internet packet routing and filtering, where each rule consists of a range of

### Masters Thesis Computing the Visibility Graph of Points Within a Polygon. Jesper Buch Hansen

Masters Thesis Computing the Visibility Graph of Points Within a Polygon Jesper Buch Hansen December 22, 2005 Department of Computer Science, University of Aarhus Student Jesper Buch Hansen Supervisor

### Simplified External memory Algorithms for Planar DAGs. July 2004

Simplified External Memory Algorithms for Planar DAGs Lars Arge Duke University Laura Toma Bowdoin College July 2004 Graph Problems Graph G = (V, E) with V vertices and E edges DAG: directed acyclic graph

### Existence of Simple Tours of Imprecise Points

Existence of Simple Tours of Imprecise Points Maarten Löffler Department of Information and Computing Sciences, Utrecht University Technical Report UU-CS-007-00 www.cs.uu.nl ISSN: 09-7 Existence of Simple

### Load Balancing. Load Balancing 1 / 24

Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

### Clustering & Visualization

Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

### root node level: internal node edge leaf node CS@VT Data Structures & Algorithms 2000-2009 McQuain

inary Trees 1 A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each

### The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

### Labeling outerplanar graphs with maximum degree three

Labeling outerplanar graphs with maximum degree three Xiangwen Li 1 and Sanming Zhou 2 1 Department of Mathematics Huazhong Normal University, Wuhan 430079, China 2 Department of Mathematics and Statistics

### Closest Pair Problem

Closest Pair Problem Given n points in d-dimensions, find two whose mutual distance is smallest. Fundamental problem in many applications as well as a key step in many algorithms. p q A naive algorithm

### Persistent Binary Search Trees

Persistent Binary Search Trees Datastructures, UvA. May 30, 2008 0440949, Andreas van Cranenburgh Abstract A persistent binary tree allows access to all previous versions of the tree. This paper presents

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions

### SOLVING DATABASE QUERIES USING ORTHOGONAL RANGE SEARCHING

SOLVING DATABASE QUERIES USING ORTHOGONAL RANGE SEARCHING CPS234: Computational Geometry Prof. Pankaj Agarwal Prepared by: Khalid Al Issa (khalid@cs.duke.edu) Fall 2005 BACKGROUND : RANGE SEARCHING, a

### Partitioning edge-coloured complete graphs into monochromatic cycles and paths

arxiv:1205.5492v1 [math.co] 24 May 2012 Partitioning edge-coloured complete graphs into monochromatic cycles and paths Alexey Pokrovskiy Departement of Mathematics, London School of Economics and Political

### Chapter 6 Planarity. Section 6.1 Euler s Formula

Chapter 6 Planarity Section 6.1 Euler s Formula In Chapter 1 we introduced the puzzle of the three houses and the three utilities. The problem was to determine if we could connect each of the three utilities

### Optimal Binary Space Partitions in the Plane

Optimal Binary Space Partitions in the Plane Mark de Berg and Amirali Khosravi TU Eindhoven, P.O. Box 513, 5600 MB Eindhoven, the Netherlands Abstract. An optimal bsp for a set S of disjoint line segments

### B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

### Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

### Connected Identifying Codes for Sensor Network Monitoring

Connected Identifying Codes for Sensor Network Monitoring Niloofar Fazlollahi, David Starobinski and Ari Trachtenberg Dept. of Electrical and Computer Engineering Boston University, Boston, MA 02215 Email:

### GRAPH THEORY LECTURE 4: TREES

GRAPH THEORY LECTURE 4: TREES Abstract. 3.1 presents some standard characterizations and properties of trees. 3.2 presents several different types of trees. 3.7 develops a counting method based on a bijection

### Persistent Data Structures and Planar Point Location. Inge Li Gørtz

Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4

### External-Memory Algorithms for Processing Line Segments in Geographic Information Systems

BRICS RS-96-12 Arge et al.: External-Memory Algorithms for Processing Line Segments in GIS BRICS Basic Research in Computer Science External-Memory Algorithms for Processing Line Segments in Geographic

### An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

### INTERSECTION OF LINE-SEGMENTS

INTERSECTION OF LINE-SEGMENTS Vera Sacristán Discrete and Algorithmic Geometry Facultat de Matemàtiques i Estadística Universitat Politècnica de Catalunya Problem Input: n line-segments in the plane,

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths

Single-Link Failure Detection in All-Optical Networks Using Monitoring Cycles and Paths Satyajeet S. Ahuja, Srinivasan Ramasubramanian, and Marwan Krunz Department of ECE, University of Arizona, Tucson,

### Improved Billboard Clouds for Extreme Model Simplification

Improved Billboard Clouds for Extreme Model Simplification I.-T. Huang, K. L. Novins and B. C. Wünsche Graphics Group, Department of Computer Science, University of Auckland, Private Bag 92019, Auckland,

### Data Warehousing und Data Mining

Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

### Tree-representation of set families and applications to combinatorial decompositions

Tree-representation of set families and applications to combinatorial decompositions Binh-Minh Bui-Xuan a, Michel Habib b Michaël Rao c a Department of Informatics, University of Bergen, Norway. buixuan@ii.uib.no

### CMSC 451: Graph Properties, DFS, BFS, etc.

CMSC 451: Graph Properties, DFS, BFS, etc. Slides By: Carl Kingsford Department of Computer Science University of Maryland, College Park Based on Chapter 3 of Algorithm Design by Kleinberg & Tardos. Graphs

### Persistent Data Structures

6.854 Advanced Algorithms Lecture 2: September 9, 2005 Scribes: Sommer Gentry, Eddie Kohler Lecturer: David Karger Persistent Data Structures 2.1 Introduction and motivation So far, we ve seen only ephemeral

### Fairness in Routing and Load Balancing

Fairness in Routing and Load Balancing Jon Kleinberg Yuval Rabani Éva Tardos Abstract We consider the issue of network routing subject to explicit fairness conditions. The optimization of fairness criteria

### Segmentation of building models from dense 3D point-clouds

Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute

### TOWARDS AN AUTOMATED HEALING OF 3D URBAN MODELS

TOWARDS AN AUTOMATED HEALING OF 3D URBAN MODELS J. Bogdahn a, V. Coors b a University of Strathclyde, Dept. of Electronic and Electrical Engineering, 16 Richmond Street, Glasgow G1 1XQ UK - jurgen.bogdahn@strath.ac.uk

### Solutions to Exercises 8

Discrete Mathematics Lent 2009 MA210 Solutions to Exercises 8 (1) Suppose that G is a graph in which every vertex has degree at least k, where k 1, and in which every cycle contains at least 4 vertices.

### INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

### High degree graphs contain large-star factors

High degree graphs contain large-star factors Dedicated to László Lovász, for his 60th birthday Noga Alon Nicholas Wormald Abstract We show that any finite simple graph with minimum degree d contains a

### Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

### Author(s)Kiyomi, Masashi; Saitoh, Toshiki; Ue.

JAIST Reposi https://dspace.j Title Voronoi Game on a Path Author(s)Kiyomi, Masashi; Saitoh, Toshiki; Ue Citation IEICE TRANSACTIONS on Information an E94-D(6): 1185-1189 Issue Date 2011-06-01 Type Journal

### Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

### IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti

### EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27

EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27 The Problem Given a set of N objects, do any two intersect? Objects could be lines, rectangles, circles, polygons, or other geometric objects Simple to

### Competitive Analysis of On line Randomized Call Control in Cellular Networks

Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising

### COLORED GRAPHS AND THEIR PROPERTIES

COLORED GRAPHS AND THEIR PROPERTIES BEN STEVENS 1. Introduction This paper is concerned with the upper bound on the chromatic number for graphs of maximum vertex degree under three different sets of coloring

### Using Data-Oblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science

Using Data-Oblivious Algorithms for Private Cloud Storage Access Michael T. Goodrich Dept. of Computer Science Privacy in the Cloud Alice owns a large data set, which she outsources to an honest-but-curious

### 4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in

### Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Binary Search Trees Lecturer: Georgy Gimel farb COMPSCI 220 Algorithms and Data Structures 1 / 27 1 Properties of Binary Search Trees 2 Basic BST operations The worst-case time complexity of BST operations

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Data Structure with C

Subject: Data Structure with C Topic : Tree Tree A tree is a set of nodes that either:is empty or has a designated node, called the root, from which hierarchically descend zero or more subtrees, which

### Binary Trees and Huffman Encoding Binary Search Trees

Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary

### Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

### INTERSECTION OF LINE-SEGMENTS

INTERSECTION OF LINE-SEGMENTS Vera Sacristán Computational Geometry Facultat d Informàtica de Barcelona Universitat Politècnica de Catalunya Problem Input: n line-segments in the plane, s i = (p i, q

### Data Structures for Moving Objects

Data Structures for Moving Objects Pankaj K. Agarwal Department of Computer Science Duke University Geometric Data Structures S: Set of geometric objects Points, segments, polygons Ask several queries

### Graph Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Graph Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 3. Topic Overview Definitions and Representation Minimum

### ABSTRACT. For example, circle orders are the containment orders of circles (actually disks) in the plane (see [8,9]).

Degrees of Freedom Versus Dimension for Containment Orders Noga Alon 1 Department of Mathematics Tel Aviv University Ramat Aviv 69978, Israel Edward R. Scheinerman 2 Department of Mathematical Sciences

### Graph. Consider a graph, G in Fig Then the vertex V and edge E can be represented as:

Graph A graph G consist of 1. Set of vertices V (called nodes), (V = {v1, v2, v3, v4...}) and 2. Set of edges E (i.e., E {e1, e2, e3...cm} A graph can be represents as G = (V, E), where V is a finite and