Closest Pair Problem



Similar documents
5.4 Closest Pair of Points

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Closest Pair of Points. Kleinberg and Tardos Section 5.4

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

28 Closest-Point Problems

Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally

Arrangements And Duality

EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Near Optimal Solutions

BALTIC OLYMPIAD IN INFORMATICS Stockholm, April 18-22, 2009 Page 1 of?? ENG rectangle. Rectangle

Algorithms for Computing Convex Hulls Using Linear Programming

Practice with Proofs

CSE 421 Algorithms: Divide and Conquer

Solving Geometric Problems with the Rotating Calipers *

Lecture 8 : Coordinate Geometry. The coordinate plane The points on a line can be referenced if we choose an origin and a unit of 20

CS473 - Algorithms I

Dynamic Programming. Lecture Overview Introduction

Planar Curve Intersection

1. Prove that the empty set is a subset of every set.

Shortest Path Algorithms

2.3 Convex Constrained Optimization Problems

1.3. DOT PRODUCT If θ is the angle (between 0 and π) between two non-zero vectors u and v,

The Tower of Hanoi. Recursion Solution. Recursive Function. Time Complexity. Recursive Thinking. Why Recursion? n! = n* (n-1)!

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

Scan-Line Fill. Scan-Line Algorithm. Sort by scan line Fill each span vertex order generated by vertex list

Chapter 6. Cuboids. and. vol(conv(p ))

Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Selected practice exam solutions (part 5, item 2) (MAT 360)

Geometry: Unit 1 Vocabulary TERM DEFINITION GEOMETRIC FIGURE. Cannot be defined by using other figures.

Triangle deletion. Ernie Croot. February 3, 2010

6.2 Permutations continued

Definition Given a graph G on n vertices, we define the following quantities:

3. How many winning lines are there in 5x5 Tic-Tac-Toe? 4. How many winning lines are there in n x n Tic-Tac-Toe?

Algorithm Design and Analysis Homework #1 Due: 5pm, Friday, October 4, 2013 TA === Homework submission instructions ===

Section 1.1. Introduction to R n

ABSTRACT. For example, circle orders are the containment orders of circles (actually disks) in the plane (see [8,9]).

1 if 1 x 0 1 if 0 x 1

Lecture 4 Online and streaming algorithms for clustering

Diffuse Reflection Radius in a Simple Polygon

Stochastic Inventory Control

FUNCTIONAL ANALYSIS LECTURE NOTES: QUOTIENT SPACES

1.7 Graphs of Functions

Equations Involving Lines and Planes Standard equations for lines in space

Converting a Number from Decimal to Binary

MODERN APPLICATIONS OF PYTHAGORAS S THEOREM

Optimal Binary Search Trees Meet Object Oriented Programming

THE BANACH CONTRACTION PRINCIPLE. Contents

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation

Copyright 2011 Casa Software Ltd. Centre of Mass

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11}

Session 7 Bivariate Data and Analysis

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Homework until Test #2

Algorithms. Margaret M. Fleck. 18 October 2010

Math 4310 Handout - Quotient Vector Spaces

CS154. Turing Machines. Turing Machine. Turing Machines versus DFAs FINITE STATE CONTROL AI N P U T INFINITE TAPE. read write move.

No: Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

α = u v. In other words, Orthogonal Projection

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

The Container Selection Problem

MA107 Precalculus Algebra Exam 2 Review Solutions

Applied Algorithm Design Lecture 5

CSC2420 Spring 2015: Lecture 3

Mathematical Methods of Engineering Analysis

Contents. 2 Lines and Circles Cartesian Coordinates Distance and Midpoint Formulas Lines Circles...

csci 210: Data Structures Recursion

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

1 Approximating Set Cover

PROBLEM SET. Practice Problems for Exam #1. Math 1352, Fall Oct. 1, 2004 ANSWERS

Physics 2048 Test 1 Solution (solutions to problems 2-5 are from student papers) Problem 1 (Short Answer: 20 points)

Solutions to Homework 6

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

INTERSECTION OF LINE-SEGMENTS

Matrix-Chain Multiplication

Lecture 2: Universality

LCs for Binary Classification

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014

Approximation Algorithms

Metric Spaces. Chapter Metrics

11 th Annual Harvard-MIT Mathematics Tournament

Systems of Linear Equations

SURFACE AREA AND VOLUME

The Pairwise Sorting Network

The Dot and Cross Products

1 Solution of Homework

160 CHAPTER 4. VECTOR SPACES

Notes on Complexity Theory Last updated: August, Lecture 1

The LCA Problem Revisited

1 Review of Least Squares Solutions to Overdetermined Systems

Solutions to Homework 10

17 Greatest Common Factors and Least Common Multiples

Lesson 2: Circles, Chords, Diameters, and Their Relationships

Reflection and Refraction

Transcription:

Closest Pair Problem Given n points in d-dimensions, find two whose mutual distance is smallest. Fundamental problem in many applications as well as a key step in many algorithms. p q A naive algorithm takes O(dn 2 ) time. Element uniqueness reduces to Closest Pair, so Ω(n log n) lower bound. We will develop a divide-and-conquer based O(n log n) algorithm; dimension d assumed constant.

1-Dimension Problem 1D problem can be solved in O(n log n) via sorting. Sorting, however, does not generalize to higher dimensions. So, let s develop a divide-and-conquer for 1D. Divide the points S into two sets S 1, S 2 by some x-coordinate so that p < q for all p S 1 and q S 2. Recursively compute closest pair (p 1, p 2 ) in S 1 and (q 1, q 2 ) in S 2. S1 S2 p1 p2 p3 q3 q1 q2 median m Let be the smallest separation found so far: = min( p 2 p 1, q 2 q 1 )

1D Divide & Conquer S1 S2 p1 p2 p3 q3 q1 q2 median m The closest pair is {p 1, p 2 }, or {q 1, q 2 }, or some {p 3, q 3 } where p 3 S 1 and q 3 S 2. Key Observation: If m is the dividing coordinate, then p 3, q 3 must be within of m. In 1D, p 3 must be the rightmost point of S 1 and q 3 the leftmost point of S 2, but these notions do not generalize to higher dimensions. How many points of S 1 can lie in the interval (m, m]? By definition of, at most one. Same holds for S 2.

1D Divide & Conquer S1 S2 p1 p2 p3 q3 q1 q2 Closest-Pair (S). median m If S = 1, output =. If S = 2, output = p 2 p 1. Otherwise, do the following steps: 1. Let m = median(s). 2. Divide S into S 1, S 2 at m. 3. 1 = Closest-Pair(S 1 ). 4. 2 = Closest-Pair(S 2 ). 5. 12 is minimum distance across the cut. 6. Return = min( 1, 2, 12 ). Recurrence is T (n) = 2T (n/2) + O(n), which solves to T (n) = O(n log n).

2-D Closest Pair We partition S into S 1, S 2 by vertical line l defined by median x-coordinate in S. Recursively compute closest pair distances 1 and 2. Set = min( 1, 2 ). Now compute the closest pair with one point each in S 1 and S 2. S1 S2 p1 1 q1 2 q2 p2 P1 P2 In each candidate pair (p, q), where p S 1 and q S 2, the points p, q must both lie within of l.

2-D Closest Pair At this point, complications arise, which weren t present in 1D. It s entirely possible that all n/2 points of S 1 (and S 2 ) lie within of l. S1 S2 p1 1 q1 2 q2 p2 P1 P2 Naively, this would require n 2 /4 calculations. We show that points in P 1, P 2 ( strip around l) have a special structure, and solve the conquer step faster.

Conquer Step Consider a point p S 1. All points of S 2 within distance of p must lie in a 2 rectangle R. p R R P1 P2 How many points can be inside R if each pair is at least apart? In 2D, this number is at most 6! So, we only need to perform 6 n/2 distance comparisons! We don t have an O(n log n) time algorithm yet. Why?

Conquer Step Pairs In order to determine at most 6 potential mates of p, project p and all points of P 2 onto line l. p R R P1 P2 Pick out points whose projection is within of p; at most six. We can do this for all p, by walking sorted lists of P 1 and P 2, in total O(n) time. The sorted lists for P 1, P 2 can be obtained from pre-sorting of S 1, S 2. Final recurrence is T (n) = 2T (n/2) + O(n), which solves to T (n) = O(n log n).

d-dimensional Closest Pair Two key features of the divide and conquer strategy are these: 1. The step where subproblems are combined takes place in one lower dimension. 2. The subproblems in the combine step satisfy a sparsity condition. 3. Sparsity Condition: Any cube with side length 2 contains O(1) points of S. 4. Note that the original problem does not necessarily have this condition. Dividing Plane H

The Sparse Problem Given n points with -sparsity condition, find all pairs within distance. Divide the set into S 1, S 2 by a median place H. Recursively solve the problem in two halves. Project all points lying within thick slab around H onto H. Call this set S. S inherits the -sparsity condition. Why?. Recursively solve the problem for S in d 1 space. The algorithms satisfies the recurrence U(n, d) = 2U(n/2, d) + U(n, d 1) + O(n). which solves to U(n, d) = O(n(log n) d 1 ).

Getting Sparsity Recall that divide and conquer algorithm solves the left and right half problems recursively. The sparsity holds for the merge problem, which concerns points within thick slab around H. Dividing Plane H Set P2 If S is a set where inter-point distance is at least, then the -cube centered at p contains at most a constant number of points of S, depending on d.

Proof of Sparsity Let C be the -cube centered at p. Let L be the set of points in C. Imagine placing a ball of radius /2 around each point of L. No two balls can intersect. Why? The volume of cube C is (2) d. The volume of each ball is 1 c d (/2) d, for a constant c d. Thus, the maximum number of balls, or points, is at most c d 4 d, which is O(1). ball a b /2ball

Closest Pair Algorithm Divide the input S into S 1, S 2 by the median hyperplane normal to some axis. Recursively compute 1, 2 for S 1, S 2. Set = min( 1, 2 ). Let S be the set of points that are within of H, projected onto H. Use the -sparsity condition to recursively examine all pairs in S there are only O(n) pairs. The recurrence for the final algorithm is: T (n, d) = 2T (n/2, d) + U(n, d 1) + O(n) = 2T (n/2, d) + O(n(log n) d 2 ) + O(n) = O(n(log n) d 1 ).

Improving the Algorithm If we could show that the problem size in the conquer step is m n/(log n) d 2, then U(m, d 1) = O(m(log m) d 2 ) = O(n). This would give final recurrence T (n, d) = 2T (n/2, d) + O(n) + O(n), which solves to O(n log n). Theorem: Given a set S with -sparsity, there exists a hyperplane H normal to some axis such that 1. S 1, S 2 n/4d. 2. Number of points within of H is O( n (log n) d 2 ). 3. H can be found in O(n) time.

Sparse Hyperplane We prove the theorem for 2D. Show there is a line with α n points within of it, for some constant α. For contradiction, assume no such line exists. Partition the plane by placing vertical lines at distance 2 from each other, where to the left of leftmost line, and right of rightmost line. > n/2 points 2 l lines 2 k lines

Sparse Hyperplane If there are k slabs, we have kα n 3n/4, which gives k 3 4α n. > n/2 points 2 l lines 2 k lines Similarly, if there is no horizontal line with desired properties, we get l 3 4α n. By sparsity, number of points in any 2 cell is some constant c.

Sparse Hyperplane > n/2 points 2 l lines 2 k lines This gives that the num. of points inside all the slabs is at most ckl, which is at most ( 3 4α) 2 cn. Since there are n/2 points inside the slabs, this is a contradiction if we choose α 18c 4. So, one of these k vertical of l horizontal lines must satisfy the desired properties. Since we know, we can check these k + l lines and choose the correct one in O(n) time.

Optimal Algorithm Actually we can start the algorithm with such a hyperplane. The divide and conquer algorithm now satisfies the recurrence T (n, d) = 2T (n/2, d) + U(m, d 1) + O(n). By new sparsity claim, m n/(log n) d 2, and so U(m, d 1) = O(m(log m) d 2 ) = O(n). Thus, T (n, d) = 2T (n/2, d) + O(n) + O(n), which solves to O(n log n). Solves the Closest Pair problem in fixed d in optimal O(n log n) time.