Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014


1 INTRODUCTION

In this report we consider the implementation of an efficient algorithm for computing Euclidean minimum spanning trees, based in part on the well-separated pair decomposition, introduced by Callahan and Kosaraju. To compute the Euclidean minimum spanning tree of n points in the plane, one can naively connect every pair of points to build the complete Euclidean graph G = (P, E), where P is the vertex set and E is the edge set, consisting of all pairs (p, q) with p, q ∈ P. For such a graph we have |E| = n(n-1)/2, which is Θ(n²); combining this with the O(E log V) = O(n² log n) running time of Kruskal's algorithm results in an overall running time of O(n² log n). In this paper we present a more efficient solution.

Our algorithm for computing the Euclidean minimum spanning tree is based on a number of components. More specifically, the program constructs a point region (PR) quadtree, which is used to store the input set of data points. We then compute the well-separated pair decomposition on the constructed quadtree, storing each well-separated pair of cells as a cross link in the tree. The next step is a projection procedure applied to each well-separated pair of cells in order to find (approximately) the closest pair of points between them and build the graph G. Finally, we run Kruskal's algorithm to obtain the minimum spanning tree of G.

This paper is organized as follows. Section 2 defines the basic concepts of the well-separated pair decomposition, as well as how it is applied in this program.

Section 3 presents a projection method for approximating the closest pair of points between the two sets of each well-separated pair, and how this is used in Kruskal's algorithm. Section 4 gives a pseudocode description of the complete method and an evaluation of the merits of this application of the well-separated pair decomposition, along with a figure showing the final Euclidean minimum spanning tree built from the combined procedures. Section 5 concludes with remarks on this approximation algorithm.

2 CONCEPTS OF THE WELL-SEPARATED PAIR DECOMPOSITION

From [1, p. 88]: let P be a given set of n points in Euclidean space. For a separation factor s ≥ 1, two non-overlapping subsets A and B of P are said to be s-well-separated if A and B can each be enclosed within a Euclidean ball of radius r such that the minimum distance between the two balls is at least sr.

Figure 1: Cells A and B are well-separated by the factor s.

From [1, p. 90]: for a given point set P and a separation factor s, a well-separated pair decomposition is a collection of pairs of subsets of P, denoted {{A_1, B_1}, {A_2, B_2}, ..., {A_m, B_m}}, such that

(1) A_i, B_i ⊆ P, for 1 ≤ i ≤ m
(2) A_i ∩ B_i = ∅, for 1 ≤ i ≤ m
(3) the union over i = 1, ..., m of (A_i ⊗ B_i) equals P ⊗ P, where A ⊗ B denotes the set of unordered pairs {a, b} with a ∈ A and b ∈ B

(4) A_i and B_i are s-well-separated, for 1 ≤ i ≤ m.

To construct the well-separated pair decomposition, we build a point region quadtree in which to store the input set of data points. The quadtree is based on a recursive subdivision of two-dimensional space obtained by repeatedly splitting a square into four quadrants. The result is a decomposition of space into squares, called cells, each of which contains a subset of the points. For the construction of the WSPD, we make some modifications to the standard quadtree structure. In the standard version, each internal node serves as a placeholder but does not store any data points, and each leaf node stores either zero or one data point in its associated cell.

Figure 2: The point region quadtree partitions data points into cells.

In our augmented version of the quadtree, we add to each node u an associated representative, denoted rep(u). If a leaf node u contains a point p, then rep(u) = {p}; if it contains no point, then rep(u) = ∅. For an internal node the situation is different: since each internal node contains more than one data point within its associated cell, it must have two or more nonempty leaf nodes descended from it. In this case, we may select any child v with rep(v) ≠ ∅ and set rep(u) = rep(v) (see [1] for more detail).

Also, we associate each node with its depth in the tree, which we call its level. In the program, to compare the levels of different nodes, we first compute the side length of the cell associated with each node, and then use the key fact that the side length of the cell associated with u is no smaller than that of the cell associated with v if and only if level(u) ≤ level(v). The side length is generated by the following formula: each time we partition a cell, we halve its side length, so a cell that has been partitioned i times from the original cell is smaller by a factor of 1/2^i. Thus, letting x denote the side length of the root cell, the side length of the cell associated with a node at level i is x/2^i.

Let us next consider the implementation of the well-separated pair decomposition itself. The general idea of this algorithm is as follows: at each step we check whether a pair of nonempty nodes is s-well-separated; if so, that pair is stored. Otherwise, we compare the levels of the two nodes to decide which has the larger cell, subdivide the node with the larger cell into its children, and recursively call the procedure on each child paired with the other node (see [1] for more detail).
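To make the separation test and the level/side-length relation concrete, the check performed at each step of the recursion above can be sketched in Python. This is an illustrative sketch, not the report's actual code: it assumes each cell is described by its center and level, and encloses each square cell in the ball centered at the cell's center with radius equal to half the cell's diagonal.

```python
import math

def side_length(x, level):
    """Side length of a cell at the given level, for root side length x."""
    return x / 2 ** level

def cells_well_separated(c1, level1, c2, level2, s, x=1.0):
    """s-well-separated test for two quadtree cells: each cell is enclosed
    in a Euclidean ball of common radius r, and the minimum distance (gap)
    between the two balls must be at least s*r."""
    # Enclosing ball of a square cell: centered at the cell center,
    # radius = half the diagonal = (side / 2) * sqrt(2).
    r1 = side_length(x, level1) / 2 * math.sqrt(2)
    r2 = side_length(x, level2) / 2 * math.sqrt(2)
    r = max(r1, r2)                      # common radius from the definition
    gap = math.dist(c1, c2) - 2 * r      # minimum distance between the balls
    return gap >= s * r
```

For example, two level-3 cells in opposite corners of a unit root cell pass the test for s = 2, while two adjacent level-3 cells fail it and would be subdivided further.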

Figure 3: Some of the s-well-separated cells generated from the WSPD.

3 MINIMUM PROJECTION FOR EACH WELL-SEPARATED PAIR AND KRUSKAL'S ALGORITHM

After constructing the well-separated pairs, the next step is to find, for each well-separated pair of point sets P_1 and P_2, the closest pair of points between the two sets. We do this through an approximation. Consider the line segment joining the centers of the Euclidean balls containing P_1 and P_2, and let p_1 ∈ P_1 and p_2 ∈ P_2 be the points whose projections along this line are closest. The essential formula is standard inner-product projection (see [2]). Let u_1 and u_2 be the two nodes defining a well-separated pair, each containing a set of data points, and let (x_1, y_1) and (x_2, y_2) be the center points of the cells associated with u_1 and u_2, respectively. First, define the vector between the two center points, v = (x_2 - x_1, y_2 - y_1). Then, for each data point (x_p, y_p), form the vector w_p = (x_p - x_1, y_p - y_1). The projection distance of (x_p, y_p) along the direction from (x_1, y_1), up to scaling, is the inner product d_p = v · w_p = v_x w_{p,x} + v_y w_{p,y}.

After performing this calculation for every data point p ∈ u_2 (all of which lie within the cell of u_2), we find the minimum projection value among them and store the corresponding point p_2 ∈ u_2. We repeat the above steps with u_1 and u_2 swapped to obtain a point p_1 ∈ u_1. The final step for each well-separated pair is to add the edge (p_1, p_2) to the graph G with weight w = distance(p_1, p_2).

Following this method of finding minimum projections, we add one edge to the graph G for each well-separated pair. Altogether, the number of edges |E| in G equals the number of well-separated pairs. Now only one more step remains to obtain the minimum spanning tree of G. Among the best-known algorithms for computing the minimum spanning tree of a graph are Kruskal's algorithm and Prim's algorithm; in this program I chose the former. Kruskal's algorithm is a greedy algorithm whose goal is to build a spanning tree that includes every vertex of the original graph while minimizing the total weight of the edges used (see [3]). In my implementation of Kruskal's algorithm, I built a priority queue to sort the edges of G by weight. In that way, at each stage of the main loop, the program makes the local decision of selecting the lowest-weight edge among all remaining edges, with the aim of finding a globally optimal solution in reasonable time (see [4]). There are two standard methods for maintaining the connected components as edges are added: the union-find data structure and the labeling method. In my program, I chose the latter.
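The projection step described above can be sketched as follows. This is a minimal illustrative Python sketch, not the report's code; it assumes the two point sets and the two cell centers are given, and it selects the extreme projections along the line through the centers.

```python
import math

def approx_closest_pair(cell1, cell2, c1, c2):
    """Approximate the closest pair between two well-separated point sets
    by projecting each point onto the direction from center c1 to center
    c2 and taking the extreme projections on each side."""
    vx, vy = c2[0] - c1[0], c2[1] - c1[1]          # v = c2 - c1

    def proj(p):
        # d_p = v . w_p  with  w_p = p - c1
        return vx * (p[0] - c1[0]) + vy * (p[1] - c1[1])

    # In cell1, take the point projecting farthest toward c2;
    # in cell2, take the point with the minimum projection value,
    # i.e. the one lying nearest back toward c1.
    p1 = max(cell1, key=proj)
    p2 = min(cell2, key=proj)
    return p1, p2, math.dist(p1, p2)
```

Because the pair is well-separated, these two extreme points are a good stand-in for the true closest pair, and the returned distance is used as the edge weight in G.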
The structure I used for this method is a treemap whose keys have type Integer; each key is a label index that maps to the set of vertices of one connected component. During the execution of the loop, when we try to add a new edge to the minimum spanning tree, we first check whether neither endpoint has yet been added to any component; if so, we create a new key and map this Integer value to a new component containing just this one edge's endpoints. In the case that one endpoint already belongs to a component and the other does not, we simply add the new vertex to the component containing the other endpoint. Another situation is that both endpoints already belong to the same component, in which case this edge is redundant

for the new minimum spanning tree; hence, we do nothing. The last case is a bit more involved: both endpoints are already in the treemap, but within different components. In this case, we compare the labels associated with the two endpoints, merge the component with the larger label into the component with the smaller label, and remove the larger key from the treemap. Finally, we obtain the output, the sequence of edges of the approximate minimum spanning tree.

Figure 4: A sample minimum spanning tree built by Kruskal's algorithm.

4 ALGORITHM IN THE FORM OF PSEUDOCODE

Two algorithms, the well-separated pair decomposition and Kruskal's algorithm, are presented in the following code blocks, each describing how the data structures are used in the associated implementation.

Algorithm 1: Well-Separated Pair Decomposition (from [1, p. 93])

1. WS-pairs(u, v, s) {
2.   if (rep(u) or rep(v) is empty) return ∅;
3.   else if (u and v are s-well-separated)
4.     return {{u, v}};
5.   else {
6.     if (level(u) > level(v)) swap u and v;  // so u has the larger cell
7.     let u_1, ..., u_m denote the children of u;
8.     return the union over i = 1, ..., m of WS-pairs(u_i, v, s);
9.   }
10. }

Algorithm 2: Kruskal's Algorithm (adapted from [3])

My implementation of this algorithm is based on the treemap data structure and the labeling method for maintaining connected components: each key in the map is a label index, and it maps to the set of vertices forming one component. At the beginning of execution, the label of every vertex is initialized to -1.

1. A = ∅
2. For each vertex v ∈ G.V:
3.   Set Label(v) = -1
4. For each edge (u, v) ∈ E, in increasing order of weight(u, v), maintained in the priority queue:
5.   If Label(u) = -1 and Label(v) = -1: create a new key mapping to the new component {u, v} and put it into the treemap; A = A ∪ {(u, v)}
6.   Else if Label(u) = -1 and Label(v) ≠ -1: add u to the set of vertices that Label(v) maps to; A = A ∪ {(u, v)}
7.   Else if Label(u) ≠ -1 and Label(v) = -1: add v to the set of vertices that Label(u) maps to; A = A ∪ {(u, v)}
8.   Else if Label(u) = Label(v): continue to the next iteration, since this edge would close a cycle
9.   Else: merge the set of vertices that the larger of Label(u) and Label(v) maps to into the set that the smaller maps to, and remove the larger key from the treemap; A = A ∪ {(u, v)}

10. Return A

In this program, the algorithms described above are all linked with one another. The well-separated pair decomposition provides the list of well-separated pairs that serves as the input to the minimum projection method. Compared with the naive approach of building Θ(n²) edges from the input set of n data points, this method makes only a local comparison for each pair and constructs a graph whose number of edges equals the number of well-separated pairs. Such a sparse graph structure greatly reduces the space required. The two algorithms combined take O(n log n) time, which also improves efficiency. The last step is identical whether the graph comes from the traditional method or from this application of the well-separated pair decomposition: Kruskal's algorithm is simply executed on graphs with different numbers of edges. During its main loop, the algorithm checks whether each existing edge should be added to the growing minimum spanning tree, so the graph G with fewer edges will very likely reduce the total running time of the minimum spanning tree computation, which must process Θ(n²) edges when the graph is built naively.
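The labeling scheme of Algorithm 2 can be sketched as follows. This is my illustrative Python translation of the description above, not the report's code; a plain dict stands in for the treemap, and all names are mine.

```python
def kruskal_labeling(num_vertices, edges):
    """Kruskal's algorithm using the labeling method: each label maps to
    the set of vertices in one connected component.
    edges: iterable of (weight, u, v). Returns the list of MST edges."""
    label = {v: -1 for v in range(num_vertices)}   # -1 = in no component yet
    components = {}                                # label -> set of vertices
    next_label = 0
    mst = []
    for w, u, v in sorted(edges):                  # increasing weight order
        lu, lv = label[u], label[v]
        if lu == -1 and lv == -1:                  # new component {u, v}
            components[next_label] = {u, v}
            label[u] = label[v] = next_label
            next_label += 1
        elif lu == -1:                             # attach u to v's component
            components[lv].add(u)
            label[u] = lv
        elif lv == -1:                             # attach v to u's component
            components[lu].add(v)
            label[v] = lu
        elif lu == lv:                             # same component: skip,
            continue                               # edge would close a cycle
        else:                                      # merge larger label into smaller
            small, large = min(lu, lv), max(lu, lv)
            for x in components.pop(large):
                label[x] = small
                components[small].add(x)
        mst.append((u, v, w))
    return mst
```

On a connected graph this returns n - 1 edges, exactly as the union-find variant would; the labeling method simply trades the union-find forest for an explicit label-to-component map.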

Figure 5: A minimum spanning tree generated from my program: the PR quadtree partitions the data points, the well-separated pair decomposition produces the pairs, the minimum projection method builds the edges, and Kruskal's algorithm finds the final minimum spanning tree.

5 CONCLUSION

The well-separated pair decomposition is widely applied in fields such as astronomy, molecular dynamics, fluid dynamics, plasma physics, and even surface reconstruction. We have presented an algorithm that uses this concept to efficiently construct an approximation to the minimum spanning tree. This shows how geometry can be exploited to obtain algorithms that greatly improve storage space and running-time efficiency over traditional approaches.

References

1. David Mount. CMSC 754 Lecture Notes. March 2013. url: http://www.cs.umd.edu/~mount/754/lects/754lects.pdf
2. Carl D. Meyer. Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics, 2000.
3. Wikipedia. "Kruskal's algorithm". Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/Kruskal's_algorithm
4. Wikipedia. "Greedy algorithm". Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/wiki/Greedy_algorithm