Computational Geometry

Similar documents
Closest Pair Problem

Full and Complete Binary Trees

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Analysis of Binary Search algorithm and Selection Sort algorithm

Analysis of Algorithms I: Binary Search Trees

Persistent Data Structures

Lecture 6: Binary Search Trees CSCI Algorithms I. Andrew Rosenberg

6 March Array Implementation of Binary Trees

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Binary Heaps. CSE 373 Data Structures

EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27

From Last Time: Remove (Delete) Operation

Data Warehousing und Data Mining

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Internet Packet Filter Management and Rectangle Geometry

Lecture 1: Course overview, circuits, and formulas

R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

Big Data and Scripting. Part 4: Memory Hierarchies

Rotation Operation for Binary Search Trees Idea:

Converting a Number from Decimal to Binary

Zabin Visram Room CS115 CS126 Searching. Binary Search

CS711008Z Algorithm Design and Analysis

Binary Search Trees 3/20/14

Operations: search;; min;; max;; predecessor;; successor. Time O(h) with h height of the tree (more on later).

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

Why Use Binary Trees?

3. How many winning lines are there in 5x5 Tic-Tac-Toe? 4. How many winning lines are there in n x n Tic-Tac-Toe?

Approximation Algorithms

Closest Pair of Points. Kleinberg and Tardos Section 5.4

Binary Search Trees CMPSC 122

Learning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Algorithms. Margaret M. Fleck. 18 October 2010

Symbol Tables. Introduction

Data Structures for Moving Objects

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Classification/Decision Trees (II)

MIDAS: Multi-Attribute Indexing for Distributed Architecture Systems

Algorithms Chapter 12 Binary Search Trees

Line Segment Intersection

Data storage Tree indexes

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Algorithms and Data Structures

Persistent Data Structures and Planar Point Location

TREE BASIC TERMINOLOGIES

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

Lecture Notes on Binary Search Trees

Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time

Introduction to Algorithms. Part 3: P, NP Hard Problems

Data Structures and Algorithm Analysis (CSC317) Intro/Review of Data Structures Focus on dynamic sets

Data Structure and Algorithm I Midterm Examination 120 points Time: 9:10am-12:10pm (180 minutes), Friday, November 12, 2010

Shortest Path Algorithms

Binary search algorithm

The Goldberg Rao Algorithm for the Maximum Flow Problem

Cpt S 223. School of EECS, WSU

DNS LOOKUP SYSTEM DATA STRUCTURES AND ALGORITHMS PROJECT REPORT

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Computational Geometry: Line segment intersection

In-Memory Database: Query Optimisation. S S Kausik ( ) Aamod Kore ( ) Mehul Goyal ( ) Nisheeth Lahoti ( )

How To Create A Tree From A Tree In Runtime (For A Tree)

Physical Data Organization

Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a

External Memory Geometric Data Structures

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

5.4 Closest Pair of Points

New Approach of Computing Data Cubes in Data Warehousing

Ordered Lists and Binary Trees

Lecture 5 - CPA security, Pseudorandom functions

Colored Range Searching on Internal Memory

Review of Hashing: Integer Keys

28 Closest-Point Problems

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

How To Find The Partition Complexity Of A Binary Space Partition

ABSTRACT. For example, circle orders are the containment orders of circles (actually disks) in the plane (see [8,9]).

Data Structures and Algorithms Written Examination

OPTIMAL BINARY SEARCH TREES

CS473 - Algorithms I

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Quality and Efficiency in Kernel Density Estimates for Large Data

Lecture Notes on Binary Search Trees

Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Scan-Line Fill. Scan-Line Algorithm. Sort by scan line Fill each span vertex order generated by vertex list

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Data Structures and Algorithms

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

Lecture 2: Universality

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Faster deterministic integer factorisation

Helvetic Coding Contest 2016

Binary Heap Algorithms

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Output: struct treenode{ int data; struct treenode *left, *right; } struct treenode *tree_ptr;

Transcription:

Range trees 1

Range queries Database queries salary G. Ometer born: Aug 16, 1954 salary: $3,500 A database query may ask for all employees with age between a 1 and a 2, and salary between s 1 and s 2 2 19,500,000 19,559,999 date of birth

Result Range queries Theorem: A set of n points on the real line can be preprocessed in O(nlogn) time into a data structure of O(n) size so that any 1D range [counting] query can be answered in O(logn [+k]) time 3

Result Range queries Theorem: A set of n points in the plane can be preprocessed in O(nlogn) time into a data structure of O(n) size so that any 2D range query can be answered in O( n + k) time, where k is the number of answers reported For range counting queries, we need O( n) time 4

Faster queries Range queries Can we achieve O(logn [+k]) query time? 5

Faster queries Range queries Can we achieve O(logn [+k]) query time? 6

Faster queries Range queries If the corners of the query rectangle fall in specific cells of the grid, the answer is fixed (even for lower left and upper right corner) 7

Faster queries Range queries Build a tree so that the leaves correspond to the different possible query rectangle types (corners in same cells of grid), and with each leaf, store all answers (points) [or: the count] Build a tree on the different x-coordinates (to search with left side of R), in each of the leaves, build a tree on the different x-coordinates (to search with the right side of R), in each of the leaves,... 8

Faster queries Range queries n n n n 9

Faster queries Range queries Question: What are the storage requirements of this structure, and what is the query time? 10

Range queries Faster queries Recall the 1D range tree and range query: Two search paths (grey nodes) Subtrees in between have answers exclusively (black) 11

Example 1D range query Range queries A 1-dimensional range query with [25, 90] 49 23 80 10 37 62 89 3 19 30 49 59 70 89 93 3 10 19 23 30 37 59 62 70 80 93 97 12

Example 1D range query Range queries A 1-dimensional range query with [61, 90] 23 49 split node 80 10 37 62 89 3 19 30 49 59 70 89 93 3 10 19 23 30 37 59 62 70 80 93 97 13

Examining 1D range queries Range queries Observation: Ignoring the search path leaves, all answers are jointly represented by the highest nodes strictly between the two search paths Question: How many highest nodes between the search paths can there be? 14

Examining 1D range queries Range queries For any 1D range query, we can identify O(logn) nodes that together represent all answers to a 1D range query 15

Toward 2D range queries Range queries For any 2d range query, we can identify O(logn) nodes that together represent all points that have a correct first coordinate 16

Toward 2D range queries Range queries (1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 17

Toward 2D range queries Range queries (1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 18

Toward 2D range queries Range queries data structure for searching on y-coordinate (1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 19

Toward 2D range queries Range queries (5, 9) (3, 8) (6, 7) (1, 5) (9, 4) (7, 3) (4, 2) (8, 1) (1, 5) (3, 8) (4, 2) (5, 9) (6, 7) (7, 3) (8, 1) (9, 4) 20

2D range trees Every internal node stores a whole tree in an associated structure, on y-coordinate Question: How much storage does this take? 21

Storage of 2D range trees To analyze storage, two arguments can be used: By level: On each level, any point is stored exactly once. So all associated trees on one level together have O(n) size By point: For any point, it is stored in the associated structures of its search path. So it is stored in O(logn) of them 22

algorithm 23 Algorithm Build2DRangeTree(P) 1. Construct the associated structure: Build a binary search tree T assoc on the set P y of y-coordinates in P 2. if P contains only one point 3. then Create a leaf ν storing this point, and make T assoc the associated structure of ν. 4. else Split P into P left and P right, the subsets and > the median x-coordinate x mid 5. ν left Build2DRangeTree(P left ) 6. ν right Build2DRangeTree(P right ) 7. Create a node ν storing x mid, make ν left the left child of ν, make ν right the right child of ν, and make T assoc the associated structure of ν 8. return ν

Efficiency of construction The construction algorithm takes O(nlog 2 n) time T(1) = O(1) T(n) = 2 T(n/2) + O(nlogn) which solves to O(nlog 2 n) time 24

Efficiency of construction Suppose we pre-sort P on y-coordinate, and whenever we split P into P left and P right, we keep the y-order in both subsets For a sorted set, the associated structure can be built in linear time 25

Efficiency of construction The adapted construction algorithm takes O(n log n) time T(1) = O(1) T(n) = 2 T(n/2) + O(n) which solves to O(nlogn) time 26

2D range queries How are queries performed and why are they correct? Are we sure that each answer is found? Are we sure that the same point is found only once? 27

2D range queries ν p p µ µ p p 28

Query algorithm Algorithm 2DRangeQuery(T,[x : x ] [y : y ]) 1. ν split FindSplitNode(T,x,x ) 2. if ν split is a leaf 3. then report the point stored at ν split, if an answer 4. else ν lc(ν split ) 5. while ν is not a leaf 6. do if x x ν 7. then 1DRangeQ(T assoc (rc(ν)),[y : y ]) 8. ν lc(ν) 9. else ν rc(ν) 10. Check if the point stored at ν must be reported. 11. Similarly, follow the path from rc(ν split ) to x... 29

2D range query time Question: How much time does a 2D range query take? Subquestions: In how many associated structures do we search? How much time does each such search take? 30

2D range queries ν µ µ 31

2D range query efficiency We search in O(logn) associated structures to perform a 1D range query; at most two per level of the main tree The query time is O(logn) O(logm + k ), or O(logn ν + k ν ) ν where k ν = k the number of points reported 32

2D range query efficiency Use the concept of grey and black nodes again: 33

2D range query efficiency The number of grey nodes is O(log 2 n) The number of black nodes is O(k) if k points are reported The query time is O(log 2 n + k), where k is the size of the output 34

Result Theorem: A set of n points in the plane can be preprocessed in O(nlogn) time into a data structure of O(nlogn) size so that any 2D range query can be answered in O(log 2 n + k) time, where k is the number of answers reported Recall that a kd-tree has O(n) size and answers queries in O( n + k) time 35

Efficiency n logn log 2 n n 16 4 16 4 64 6 36 8 256 8 64 16 1024 10 100 32 4096 12 144 64 16384 14 196 128 65536 16 256 256 1M 20 400 1K 16M 24 576 4K 36

2D range query efficiency Question: How about range counting queries? 37

Higher dimensional range trees A d-dimensional range tree has a main tree which is a one-dimensional balanced binary search tree on the first coordinate, where every node has a pointer to an associated structure that is a (d 1)-dimensional range tree on the other coordinates 38

Storage The size S d (n) of a d-dimensional range tree satisfies: S 1 (n) = O(n) for all n S d (1) = O(1) for all d S d (n) 2 S d (n/2) + S d 1 (n) for d 2 This solves to S d (n) = O(nlog d 1 n) 39

Query time The number of grey nodes G d (n) satisfies: G 1 (n) = O(logn) for all n G d (1) = O(1) for all d G d (n) 2 logn + 2 logn G d 1 (n) for d 2 This solves to G d (n) = O(log d n) 40

Result Theorem: A set of n points in d-dimensional space can be preprocessed in O(nlog d 1 n) time into a data structure of O(nlog d 1 n) size so that any d-dimensional range query can be answered in O(log d n + k) time, where k is the number of answers reported Recall that a kd-tree has O(n) size and answers queries in O(n 1 1/d + k) time 41

Comparison for d = 4 n logn log 4 n n 3/4 1024 10 10,000 181 65,536 16 65,536 4096 1M 20 160,000 32,768 1G 30 810,000 5,931,641 1T 40 2,560,000 1G 42

Improving the query time We can improve the query time of a 2D range tree from O(log 2 n) to O(logn) by a technique called fractional cascading This automatically lowers the query time in d dimensions to O(log d 1 n) time 43

Improving the query time The idea illustrated best by a different query problem: Suppose that we have a collection of sets S 1,...,S m, where S 1 = n and where S i+1 S i We want a data structure that can report for a query number x, the smallest value x in all sets S 1,...,S m 44

Improving the query time 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 3 34 55 S 3 S 4 45

Improving the query time 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 3 34 55 S 3 S 4 46

Improving the query time 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 3 34 55 S 3 S 4 47

Improving the query time Suppose that we have a collection of sets S 1,...,S m, where S 1 = n and where S i+1 S i We want a data structure that can report for a query number x, the smallest value x in all sets S 1,...,S m This query problem can be solved in O(logn + m) time instead of O(m logn) time 48

Improving the query time Can we do something similar for m 1-dimensional range queries on m sets S 1,...,S m? We hope to get a query time of O(logn + m + k) with k the total number of points reported 49

Improving the query time 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 3 34 55 S 3 S 4 50

Improving the query time 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 3 34 55 S 3 S 4 51

Improving the query time [6,35] 1 2 3 5 8 13 21 34 55 S 1 1 3 5 8 13 21 34 55 S 2 1 3 13 21 34 55 S 3 3 34 55 S 4 52

Now we do the same on the associated structures of a 2-dimensional range tree Note that in every associated structure, we search with the same values y and y Replace all associated structures except for the one of the root by a linked list For every list element (and leaf of the associated structure of the root), store two pointers to the appropriate list elements in the lists of the left child and of the right child 53

54

55

17 8 52 5 15 33 58 2 7 12 17 21 41 58 67 2 5 7 8 12 15 21 33 41 52 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) 67 93 56

3 10 19 23 30 37 49 59 62 70 80 89 95 99 3 10 19 37 62 80 99 23 30 49 59 70 89 95 10 19 37 80 3 62 99 23 30 49 95 59 70 89 19 80 10 37 3 99 62 30 49 23 95 59 70 89 19 80 10 37 3 99 49 30 95 23 89 70 57

17 [4, 58] [19, 65] 8 52 5 15 33 58 2 7 12 17 21 41 58 67 2 5 7 8 12 15 21 33 41 52 (2, 19) (7, 10) (12, 3) (17, 62) (21, 49) (41, 95) (58, 59) (93, 70) (5, 80) (8, 37) (15, 99) (33, 30) (52, 23) (67, 89) 67 93 58

3 10 19 23 30 37 49 59 62 70 80 89 95 99 3 10 19 37 62 80 99 23 30 49 59 70 89 95 10 19 37 80 3 62 99 23 30 49 95 59 70 89 19 80 10 37 3 99 62 30 49 23 95 59 70 89 19 80 10 37 3 99 49 30 95 23 89 70 59

3 10 19 23 30 37 49 59 62 70 80 89 95 99 3 10 19 37 62 80 99 23 30 49 59 70 89 95 10 19 37 80 3 62 99 23 30 49 95 59 70 89 19 80 10 37 3 99 62 30 49 23 95 59 70 89 19 80 10 37 3 99 49 30 95 23 89 70 60

Instead of doing a 1D range query on the associated structure of some node ν, we find the leaf where the search to y would end in O(1) time via the direct pointer in the associated structure in the parent of ν The number of grey nodes reduces to O(logn) 61

Result Theorem: A set of n points in d-dimensional space can be preprocessed in O(nlog d 1 n) time into a data structure of O(nlog d 1 n) size so that any d-dimensional range query can be answered in O(log d 1 n + k) time, where k is the number of answers reported Recall that a kd-tree has O(n) size and answers queries in O(n 1 1/d + k) time 62

Both for kd-trees and for range trees we have to take care of multiple points with the same x- or y-coordinate 63

Both for kd-trees and for range trees we have to take care of multiple points with the same x- or y-coordinate 64

Treat a point p = (p x,p y ) with two reals as coordinates as a point with two composite numbers as coordinates A composite number is a pair of reals, denoted (a b) We let (a b) < (c d) iff a < c or ( a = c and b < d ); this defines a total order on composite numbers 65

The point p = (p x,p y ) becomes ((p x p y ), (p y p x )). Then no two points have the same first or second coordinate The median x-coordinate or y-coordinate is a composite number The query range [x : x ] [y : y ] becomes [(x ) : (x + )] [(y ) : (y + )] We have (p x,p y ) [x : x ] [y : y ] iff ((p x p y ), (p y p x )) [(x ) : (x + )] [(y ) : (y + )] 66