Linear Time Selection


Chapter 2 Linear Time Selection

Definition 1 Given a sequence x_1, x_2, ..., x_n of numbers, the ith order statistic is the ith smallest element.

Maximum or Minimum

When i = 1 it is the minimum element, and when i = n it is the maximum element. We can find either in n − 1 comparisons by a linear scan of the list, from left to right or from right to left. So n − 1 comparisons are sufficient. This many comparisons are also necessary. Think of the process of determining the maximum (minimum), for example, as a tournament in which each comparison is a match that determines a winner (loser). The maximum (minimum) element is the champion-winner (champion-loser). Since the champion does not lose (win) a match and everyone else must lose (win) at least one match, n − 1 comparisons are also necessary.

Cormen, Ex 9.1-1, page 185

Solution: To determine the second smallest element, or the 2nd order statistic, we first determine the champion-loser. We do this by constructing a (binary) tournament tree of height ⌈log n⌉ for n players. This is easily done by constructing a left tournament tree on ⌈n/2⌉ elements and a right tournament tree on the remaining ⌊n/2⌋ elements, continuing the construction recursively for both subtrees. Since this tree has n leaves, it has n − 1 internal nodes, and hence that many matches have been played. We then determine the second smallest as the champion-loser among the at most ⌈log n⌉ players who lost their matches directly to the champion-loser. Hence the second smallest can be found in at most n + ⌈log n⌉ − 2 comparisons. An interesting question is whether this many comparisons are also necessary. We will discuss this later in the course.

Here is an interesting problem related to the linear scan for the minimum (or maximum) element discussed above.

Problem 1 Show that during this linear scan the expected number of times the minimum (or maximum) value is reset is Θ(log n).
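Before proving the claim in Problem 1, it can be checked by brute force. The sketch below (function names are my own, not from the notes) averages the reset count over all permutations of a small n and recovers exactly the harmonic number H_n = Σ_{i=1}^{n} 1/i, which is Θ(log n).

```python
# Brute-force check of Problem 1: averaged over all permutations of
# n distinct values, the number of times a left-to-right scan resets
# its running maximum equals H_n = 1 + 1/2 + ... + 1/n.
from fractions import Fraction
from itertools import permutations

def reset_count(seq):
    """Times the running maximum is reset, counting the initial setting."""
    count, best = 0, None
    for x in seq:
        if best is None or x > best:
            best, count = x, count + 1
    return count

def average_resets(n):
    """Exact average of reset_count over all n! permutations of 0..n-1."""
    perms = list(permutations(range(n)))
    return Fraction(sum(reset_count(p) for p in perms), len(perms))

# For n = 5 the average is H_5 = 1 + 1/2 + 1/3 + 1/4 + 1/5 = 137/60.
```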

Solution Let the elements be x_1, x_2, ..., x_n. Let the indicator random variable X_i = 1 if the maximum is reset on examining the element x_i, and 0 otherwise. Thus X = Σ_{i=1}^{n} X_i is the number of times the maximum is reset. Now, by linearity of the expectation operator, E[X] = Σ_{i=1}^{n} E[X_i] is the expected value of the random variable X, where E[X_i] = p_i, the probability that the maximum is reset on examining x_i. Suppose we have examined the elements x_1 to x_{i−1}. The next element x_i resets the maximum exactly when it is the largest of the elements x_1 to x_i. Since all permutations of the first i values are equally likely, any one of the first i values can occur in the ith spot, and in particular the maximum, with probability 1/i. Thus E[X] = Σ_{i=1}^{n} 1/i, and hence E[X] = Θ(log n).

Maximum and Minimum

Using the above algorithm we can find both the maximum and the minimum in 2n − 2 comparisons. The following cleverer algorithm makes at most 3⌈n/2⌉ comparisons. If n is odd, we set the first element to be both the maximum and the minimum. The remaining elements we compare in pairs, using the smaller (larger) one of each pair to reset the minimum (maximum), if necessary. Thus 3 comparisons for each of the (n − 1)/2 pairs, for a total of 3(n − 1)/2 comparisons. If n is even, we set the smaller of the first pair to be the minimum and the larger to be the maximum. We then proceed as in the odd case, for a total of 1 + 3(n/2 − 1) = 3n/2 − 2 comparisons. Hence at most 3⌈n/2⌉ comparisons are sufficient in all cases.

We can improve on the number of comparisons by maintaining two disjoint sets S_1 and S_2 of potential minima and potential maxima, respectively. Initially, S_1 and S_2 are empty. We compare the elements in pairs, adding the smaller to S_1 and the larger to S_2. When we have exhausted all the elements, or have just one left, we determine the minimum of S_1 and the maximum of S_2.
If there is no leftover element, these are respectively the minimum and maximum of S; otherwise we compare each with the leftover element to determine whether it needs to be reset. Let n = 2k. Then the number of comparisons made is 3k − 2 = 3n/2 − 2. If n = 2k + 1, then the number of comparisons made is 3k = ⌈3n/2⌉ − 2. This solves Ex 9.1-2 in Cormen partly. The other part is the following problem.

Problem 2 Show that ⌈3n/2⌉ − 2 comparisons are necessary.

Problem 3 It is possible to solve this problem by a divide-and-conquer approach. We divide the list into two nearly equal halves and apply the algorithm recursively to the two halves. Let max_l (max_r) and min_l (min_r) be the maximum and minimum elements of the left (right) half. Then max(max_l, max_r) is the maximum of the entire list and min(min_l, min_r) is the minimum. If T(n) is the number of comparisons made on a list of size n, then

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + 2,  n ≥ 3.

Determine a closed-form expression for T(n), assuming that T(2) = 1 and T(1) = 0.

© Dr. Asish Mukhopadhyay
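The pairwise scheme described above can be sketched as follows (a minimal illustration, not from the notes; the comparison counter is included only to confirm the comparison counts).

```python
def min_and_max(xs):
    """Simultaneous minimum and maximum of a non-empty list with
    about 3n/2 comparisons.

    Returns (minimum, maximum, number of element comparisons made).
    """
    n = len(xs)
    comparisons = 0
    if n % 2:                         # odd: the first element seeds both
        lo = hi = xs[0]
        start = 1
    else:                             # even: the first pair seeds min and max
        comparisons += 1
        lo, hi = (xs[0], xs[1]) if xs[0] < xs[1] else (xs[1], xs[0])
        start = 2
    for i in range(start, n, 2):      # 3 comparisons per remaining pair
        a, b = xs[i], xs[i + 1]
        comparisons += 3
        small, big = (a, b) if a < b else (b, a)
        if small < lo:                # only the smaller can reset the min
            lo = small
        if big > hi:                  # only the larger can reset the max
            hi = big
    return lo, hi, comparisons
```

For even n this makes 1 + 3(n/2 − 1) = 3n/2 − 2 comparisons, and for odd n it makes 3(n − 1)/2.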

Randomized Selection

It is similar to randomized quicksort in the sense that we choose a random element with respect to which we partition the input array into a left subarray and a right subarray, locate the subarray that contains the ith order statistic, and continue recursively with this subarray.

RANDOMIZED-SELECT(A, p, r, i)
1. if p = r
2.    then return A[p]
3. q ← RANDOMIZED-PARTITION(A, p, r)
4. k ← q − p + 1
5. if i = k
6.    then return A[q]
7. elseif i < k
8.    then return RANDOMIZED-SELECT(A, p, q − 1, i)
9. else return RANDOMIZED-SELECT(A, q + 1, r, i − k)

Cormen, Ex 9.2-1

This is obvious since k ≥ 1 always. When k = 1 (respectively, k = r − p + 1), we either return the pivot as the answer or continue with the right (left) subarray, which has non-zero length. Also note that the recursion bottoms out when the array is of size 1.

The analysis is interesting. Let T(n) be the time to find the ith order statistic of the input array A[1..n]. The time T(n) is a random variable, since any element of A can be chosen as the pivot element. The probability of this is 1/n (which is also the expected value, E[X_k], of the random variable defined below). We are interested in the expected (or mean) value of T(n). Let X_k be the random variable (an indicator random variable) defined such that X_k = 1 if A[1..q] has k elements, and X_k = 0 otherwise. Then

T(n) ≤ Σ_{k=1}^{n} X_k · max(T(k − 1), T(n − k)) + Θ(n),  (2.1)

where we have assumed that the ith order statistic is in the larger subarray created by the partition. Applying the expectation operator E[·] to both sides of the above equation, we get, from the linearity of this operator and the independence of the random variables X_k and T(n), that:

E[T(n)] ≤ Σ_{k=1}^{n} E[X_k] · E[max(T(k − 1), T(n − k))] + Θ(n)
        = Σ_{k=1}^{n} Pr(pivot is the kth smallest of A[1..n]) · E[max(T(k − 1), T(n − k))] + Θ(n)
        = Σ_{k=1}^{n} (1/n) · E[max(T(k − 1), T(n − k))] + Θ(n),

where we have made the simplifying assumption that the larger subarray contains the ith order statistic.

If we assume that T(n) is monotone increasing, then T(k − 1) > T(n − k) for k − 1 > n − k, i.e. for k > (n + 1)/2, while T(k − 1) < T(n − k) for k − 1 < n − k, and T(k − 1) = T(n − k) for k = (n + 1)/2, a middle value that exists when n is odd. The terms of the sum are therefore equal in pairs around this middle value.
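For concreteness, RANDOMIZED-SELECT can be sketched in Python with RANDOMIZED-PARTITION inlined (a Lomuto-style partition around a uniformly random pivot; the tail recursion of the pseudocode is written as a loop):

```python
import random

def randomized_select(A, i):
    """Return the i-th order statistic (1-indexed) of the sequence A."""
    A = list(A)                       # work on a copy
    p, r = 0, len(A) - 1
    while True:
        if p == r:
            return A[p]
        # RANDOMIZED-PARTITION: swap a random element into position r,
        # then partition A[p..r] around it (Lomuto scheme).
        j = random.randint(p, r)
        A[j], A[r] = A[r], A[j]
        pivot, q = A[r], p
        for t in range(p, r):
            if A[t] <= pivot:
                A[t], A[q] = A[q], A[t]
                q += 1
        A[q], A[r] = A[r], A[q]       # pivot now sits at index q
        k = q - p + 1                 # rank of the pivot within A[p..r]
        if i == k:
            return A[q]
        elif i < k:
            r = q - 1                 # the answer is in the left subarray
        else:
            p, i = q + 1, i - k       # ... or the right one, with rank i - k
```

Note, as in Ex 9.2-1, that neither branch continues on an empty subarray: i < k forces a left part of at least i ≥ 1 elements, and i > k a right part of at least i − k ≥ 1 elements.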

Hence

E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + Θ(n)
        = (2/n) Σ_{k=1}^{n−1} E[T(k)] − (2/n) Σ_{k=1}^{⌊n/2⌋−1} E[T(k)] + Θ(n).

Assume that E[T(n)] ≤ cn for a suitable constant c > 0. Then from the above we have:

E[T(n)] ≤ (2/n) · c · n(n − 1)/2 − (2/n) · c · (⌊n/2⌋ − 1)⌊n/2⌋/2 + Θ(n)
        ≤ (2/n) · c · n(n − 1)/2 − (2/n) · c · (n/2 − 1)(n/2 − 2)/2 + Θ(n)
        ≤ c(n − 1) − c(n/2 − 3)/2 + Θ(n)
        = cn − c(n/2 − 1)/2 + Θ(n)
        ≤ cn,

provided c(n/2 − 1)/2 ≥ an, where we have replaced the Θ(n) term by an, for some constant a. A small rearrangement reduces this inequality to n(c/4 − a) − c/2 ≥ 0, so that E[T(n)] ≤ cn for n ≥ (c/2)/(c/4 − a). By choosing c > 4a, we can set n_0 = 2c/(c − 4a). Thus E[T(n)] = O(n), and therefore E[T(n)] = Θ(n), since T(n) has a trivial lower bound of Ω(n).

Deterministic Selection

A clever deterministic method is used to choose the pivot so that it lies in the shaded region shown in Fig. 2.1 below. The implication of this is that no matter whether the ith order statistic lies to the left of the pivot or to its right, we are sure to prune approximately a quarter of the input set in each partitioning step. This is a beautiful example of the prune-and-search paradigm.

[Figure 2.1: Where the pivot lies — approximately n/4 elements are smaller than the pivot and approximately n/4 elements are greater; the pivot lies somewhere in the middle region.]

This is how it is done. We make groups of 5 out of the n input elements. If n is not a multiple of 5, there is one residual group with up to 4 elements. Choosing the median element of each group gives us ⌈n/5⌉ elements. We find the median m of these ⌈n/5⌉ medians, using this same algorithm; setting m to be the pivot, we continue recursively with the subarray that contains the ith order statistic.

For the complexity analysis, let us establish a lower bound on the number of elements that are greater than the pivot m. Barring the group that contains m and the residual group, at least ⌈(1/2)⌈n/5⌉⌉ − 2 groups contribute 3 elements greater than m each. Thus at least 3n/10 − 6 elements are greater than the pivot. Similarly, we can establish that at least 3n/10 − 6 elements are smaller than the pivot.

We note that 3n/10 − 6 ≥ n/4 for n ≥ 120 and, a fortiori, for n ≥ 140. This solves Ex 9.3-2, page 192. If T(n) is the worst-case complexity of this deterministic selection algorithm, then

T(n) ≤ T(7n/10 + 6) + T(⌈n/5⌉) + Θ(n),

since we continue recursively with a subarray of size at most 7n/10 + 6, and also call the algorithm recursively to determine the median of the ⌈n/5⌉ group medians.
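The whole scheme can be sketched as follows. This is an illustrative version, not the notes' own code: list comprehensions stand in for in-place partitioning, and a three-way split handles duplicates, so it shows the structure of the algorithm rather than its exact comparison counts.

```python
def select(A, i):
    """Return the i-th order statistic (1-indexed) of A in worst-case O(n)."""
    A = list(A)
    if len(A) <= 5:                   # base case: sort a constant-size list
        return sorted(A)[i - 1]
    # Medians of the groups of 5 (the residual group may be smaller).
    groups = [A[j:j + 5] for j in range(0, len(A), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Median of the medians, found by a recursive call to this same algorithm.
    m = select(medians, (len(medians) + 1) // 2)
    # Partition around the pivot m and recurse on the side holding rank i.
    left = [x for x in A if x < m]
    equal = [x for x in A if x == m]
    if i <= len(left):
        return select(left, i)
    elif i <= len(left) + len(equal):
        return m
    else:
        right = [x for x in A if x > m]
        return select(right, i - len(left) - len(equal))
```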

We use the substitution method again to show that T(n) ≤ cn for n ≥ n_0, for suitable choices of c and n_0; for n < n_0, T(n) = O(1). We have

T(n) ≤ c(7n/10 + 6) + c⌈n/5⌉ + an
     ≤ c(7n/10 + 6) + c(n/5 + 1) + an
     = cn − (cn/10 − 7c − an)
     ≤ cn,

provided cn/10 − 7c − an ≥ 0, i.e. c ≥ an/(n/10 − 7) = 10an/(n − 70). We can satisfy this last inequality by letting c ≥ 20a and 20a ≥ 10an/(n − 70); from the latter inequality it follows that n ≥ 140. Thus choosing n_0 = 140 and c ≥ 20a, we satisfy T(n) ≤ cn.

Note that for the above analysis to go through we must have 7/10 + 1/5 = 9/10 < 1. If we choose to make groups of 3, then these fractions are 2/3 and 1/3, adding up to 1, in which case the analysis does not go through. Groups of 7 also work, since in this case the fractions are 5/7 and 1/7. This solves exercise 9.3-1, page 192.

Knowing how to find an order statistic in O(n) worst-case time helps us do Quicksort in O(n log n) worst-case time, since this is the solution to the recurrence

T(n) = 2T(n/2) + O(n).

This solves Cormen, Ex 9.3-3, page 192.

The solution to Cormen, Ex 9.3-4, page 192 is also easily obtained using this algorithm. At each step of the deterministic selection algorithm, the status of at least a quarter of the remaining elements with respect to the ith order statistic is resolved, in the sense that these are known to be either greater than or smaller than the ith order statistic. Thus we successively resolve the status of at least (1/4)n, (1/4)(3n/4), (1/4)(3/4)(3n/4), ... elements. This gives us an upper bound of (1/4) · n / (1 − 3/4) = n elements. Thus, repeating the step enough times until a subarray of constant size is left, we resolve the status of all elements except those in this subarray of constant size. Since we then determine the ith order statistic by brute force, the status of the remaining elements is simultaneously resolved.

Cormen, Ex 9.3-5, page 192 is also easily solved.
After each median-finding step, the ith order statistic is located in one of the two half-sized subarrays. Thus the complexity is that of the sum

cn + cn/2 + cn/4 + ... = 2cn.

Cormen, Order Statistics, Ex 9.3-7, page 192

We first find the median of the n numbers. To determine the k numbers that are closest to the median, we find the absolute distances of the remaining numbers from the median, and find the kth smallest of these distances. The distances no larger than this kth smallest distance are then used to pick out the k numbers closest to the median.

Cormen, Order Statistics, Ex 9.3-8, page 192

Given two sorted arrays X[] and Y[], each of size n, find their median in O(log n) time.

This problem is interesting. We first compare the ⌈n/2⌉th element x_{n/2} (the median) of X[] with the ⌈n/2⌉th element y_{n/2} (the median) of Y[]. Suppose y_{n/2} > x_{n/2}. Then, counting across both arrays, about n elements are less than y_{n/2}, so we can prune from consideration the n/2 elements of Y[] that are greater than y_{n/2}. Next, we compare the x_{3n/4}th element of X[] with the y_{n/4}th element of Y[]. Let x_{3n/4} > y_{n/4}. In this case, we prune the elements of X[] that are greater than x_{3n/4}, since between the two arrays we have about n elements that are less than x_{3n/4}. Next, we compare y_{3n/8} with x_{5n/8}, and so on. Thus in O(log n) steps we prune n elements to obtain the median. We terminate when the intervals of indetermination are of size 1, at which point we determine the median by brute force.

COMBINED-MEDIAN(X[p..q], Y[r..s])  // q − p + 1 = s − r + 1
1. if (p = q) and (r = s)
2.    then return max(X[p], Y[r])
3. elseif X[(p+q)/2] > Y[(r+s)/2]
4.    then return COMBINED-MEDIAN(X[p..(p+q)/2], Y[(r+s)/2..s])  // search space reduced by half
5. else return COMBINED-MEDIAN(X[(p+q)/2..q], Y[r..(r+s)/2])  // search space reduced by half

Cormen, Ex 9.3-9, page 192

Suppose there is just one oil well. The optimal solution is one in which the east-west line goes through it. When there are two oil wells, the east-west line can have any y-value that lies between the two wells; the sum of the distances is always the distance between the wells. Thus when there are n oil wells and n is odd, the best solution is obtained when the east-west line goes through the well with the median y-value. For n even, an optimal solution is obtained when the east-west line goes anywhere between the wells with the ⌊n/2⌋th and (⌊n/2⌋ + 1)th smallest y-coordinate values.
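Returning to the two-array problem: the same pruning idea can be written as a standard k-th-smallest routine on two sorted arrays. This is a swapped-in general formulation, not a literal transcription of COMBINED-MEDIAN; its base cases correspond to the pseudocode's max(X[p], Y[r]), on the reading that the combined median sought is the (n + 1)-th smallest of the 2n values. Each step discards about k/2 elements, giving O(log n) steps for the median.

```python
def kth_of_two(X, Y, k):
    """k-th smallest (1-indexed) of the merged sorted sequences X and Y."""
    i, j = 0, 0                       # X[i:] and Y[j:] are still alive
    while True:
        if i == len(X):               # X exhausted: answer is in Y
            return Y[j + k - 1]
        if j == len(Y):               # Y exhausted: answer is in X
            return X[i + k - 1]
        if k == 1:
            return min(X[i], Y[j])
        # Probe about k/2 elements into each array; everything up to the
        # smaller probe is outranked by >= k elements and can be pruned.
        step = k // 2
        ni, nj = min(i + step, len(X)), min(j + step, len(Y))
        if X[ni - 1] <= Y[nj - 1]:
            k -= ni - i
            i = ni
        else:
            k -= nj - j
            j = nj

def combined_median(X, Y):
    """(n + 1)-th smallest of the 2n values in two sorted arrays of size n."""
    return kth_of_two(X, Y, len(X) + 1)
```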