Lecture 6: Expected Running Time of Quick-Sort (CLRS , (C.2))

Similar documents
Biostatistics 615/815

Randomized algorithms

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

Converting a Number from Decimal to Binary

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

The Tower of Hanoi. Recursion Solution. Recursive Function. Time Complexity. Recursive Thinking. Why Recursion? n! = n* (n-1)!

CS473 - Algorithms I

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

Algorithms. Margaret M. Fleck. 18 October 2010

6. Standard Algorithms

Sorting Algorithms. Nelson Padua-Perez Bill Pugh. Department of Computer Science University of Maryland, College Park

CS473 - Algorithms I

Distributed Computing over Communication Networks: Maximal Independent Set

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Dynamic Programming. Lecture Overview Introduction

Analysis of Binary Search algorithm and Selection Sort algorithm

Algorithm Analysis [2]: if-else statements, recursive algorithms. COSC 2011, Winter 2004, Section N Instructor: N. Vlajic

Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally

Zabin Visram Room CS115 CS126 Searching. Binary Search

CMSC 451 Design and Analysis of Computer Algorithms 1

Closest Pair Problem

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Section IV.1: Recursive Algorithms and Recursion Trees

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Near Optimal Solutions

7 Gaussian Elimination and LU Factorization

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

28 Closest-Point Problems

Quick Sort. Implementation

From Last Time: Remove (Delete) Operation

Algorithm Design and Analysis Homework #1 Due: 5pm, Friday, October 4, 2013 TA === Homework submission instructions ===

Dalhousie University CSCI 2132 Software Development Winter 2015 Lab 7, March 11

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

DESIGN AND ANALYSIS OF ALGORITHMS

Approximation Algorithms

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

11 Multivariate Polynomials

Sequential Data Structures

Part 2: Community Detection

Algorithms for Computing Convex Hulls Using Linear Programming

1 Portfolio Selection

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Dynamic Programming Problem Set Partial Solution CMPSC 465

Lecture 1: Course overview, circuits, and formulas

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

14.1 Rent-or-buy problem

APP INVENTOR. Test Review

Single machine models: Maximum Lateness -12- Approximation ratio for EDD for problem 1 r j,d j < 0 L max. structure of a schedule Q...

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

Offline sorting buffers on Line

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

4.2 Sorting and Searching

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Data Structures. Algorithm Performance and Big O Analysis

The Running Time of Programs

Minimax Strategies. Minimax Strategies. Zero Sum Games. Why Zero Sum Games? An Example. An Example

Using Data-Oblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science

Lecture 4 Online and streaming algorithms for clustering

Data Structures and Algorithms Written Examination

Introduction to Programming (in C++) Sorting. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

from Recursion to Iteration

TREE BASIC TERMINOLOGIES

The Relative Worst Order Ratio for On-Line Algorithms

Financial Mathematics and Simulation MATH Spring 2011 Homework 2

Exam study sheet for CS2711. List of topics

2WB05 Simulation Lecture 8: Generating random variables

Linear Programming. April 12, 2005

HW4: Merge Sort. 1 Assignment Goal. 2 Description of the Merging Problem. Course: ENEE759K/CMSC751 Title:

CSE 421 Algorithms: Divide and Conquer

Computer Science 210: Data Structures. Searching

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

Analysis of Algorithms I: Binary Search Trees

I. Introduction. MPRI Cours Lecture IV: Integer factorization. What is the factorization of a random number? II. Smoothness testing. F.

Introduction to Parallel Programming and MapReduce

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a

Binary Heap Algorithms

Lecture 13: The Knapsack Problem

Big Data Processing with Google s MapReduce. Alexandru Costan

Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Roots of Equations (Chapters 5 and 6)

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

THE SCHEDULING OF MAINTENANCE SERVICE

Notes on Complexity Theory Last updated: August, Lecture 1

Data Structures and Data Manipulation

Symbol Tables. Introduction

You are to simulate the process by making a record of the balls chosen, in the sequence in which they are chosen. Typical output for a run would be:

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

5.1 Bipartite Matching

Transcription:

Lecture 6: Expected Running Time of Quick-Sort (CLRS 7.3-7.4, (C.2)) May 23rd, 2002 1 Quick-sort review Last time we discussed quick-sort. Quick-Sort is opposite of merge-sort Obtained using divide-and-conquer Abstract algorithm Divide A[1...n] into subarrays A = A[1..q 1] and A =A[q+1...n] such that all elements in A are larger than A[q] andallelementsina are smaller than A[q]. Recursively sort A and A. (nothing to combine/merge. A already sorted after sorting A and A ) Pseudo code: Partition(A, p, r) x = A[r] i = p 1 FOR j = p TO r 1DO IF A[j] x THEN i = i +1 Exchange A[i] anda[j] OD Exchange A[i + 1]andA[r] RETURN i +1 Quicksort(A, p, r) IF p<rthen q=partition(a, p, r) Quicksort(A, p, q 1) Quicksort(A, q +1,r) Sort using Quicksort(A, 1,n) 1

Analysis : Partition runs in Θ(r p) time. If array is always partitioned nicely in two halves (partition returns q = r p 2 ), we have the recurrence T (n) =2T(n/2) + Θ(n) T (n) =Θ(nlg n). Butintheworstcase,partition always returns q = p (when input is sorted) and in this case we get the recurrence T (n) =T(n 1) + T (1) + Θ(n) T (n) =Θ(n 2 ) What s maybe even worse is that the worst-case happens when the data is already sorted. Quick-sort often perform well in practice and last time we started trying to justify this theoretically. We saw that even if all the splits are relatively bad (we looked at the case 9 still get worst-case running time O(n log n). To justify it further we define average and expected running time. 10 n, 1 10 n)we 2 Average and Expected Running Time (Randomized Algorithms) We are normally interested in worst-case running time of an algorithm, that is, the maximal running time over all input of size n T (n) =max X =n T(X) We are sometimes interested in analyzing the average-case running time of an algorithm, that is, the expected value for the running time, over all input of size n T a (n) =E X =n [T(n)] = X =n T (X) Pr[X] The problem is that we often don t know the probability Pr[X] of getting a particular input X. Sometime we assume that all possible inputs are equally likely, but thats often not very realistic in practice. Instead of using average case running time we therefore consider what we call randomized algorithms, that is, algorithms that make some random choices during their execution Running time of normal deterministic algorithm only depend on he input configuration. Running time of randomized algorithm depend not only on input configuration but also on the random choices made by the algorithm. Running time of a randomized algorithm is not fixed for a given input! We are often interested in analyzing the worst-case expected running time of a randomized algorithm, that is, the maximal of the average running times for all inputs of size n T e (n) =max X =n E[T(X)] 2

3 Randomized Quick-Sort We could analyze quick-sort assuming that we are sorting numbers 1 through n and that all n! different input configurations are equally likely. Average running time would be T a (n) =O(nlog n). The assumption that all inputs are equally likely are not very realistic (data tend to be somewhat sorted). We can enforce that all n! permutations are equally likely by randomly permuting the input before the algorithm Most computers have pseudo-random number generator random(1,n) returning random number between 1 and n Using pseudo-random number generator we can generate random permutation (all n! permutations equally likely) in O(n) time: Choose element in A[1] randomly among elements in A[1..n], choose element in A[2] randomly among elements in A[2..n], choose element in A[3] randomly among elements in A[3..n], and so on. (Note: Just choosing A[i] randomly among elements A[1..n] for all i will not give random permutation!) Alternatively we can modify Partition sightly and exchange last element in A with random element in A before partitioning RandPartition(A, p, r) i=random(p, r) Exchange A[r] and A[i] RETURN Partition(A, p, r) RandQuicksort(A, p, r) IF p<rthen q=randpartition(a, p, r) RandQuicksort(A, p, q 1) RandQuicksort(A, q +1,r) 3

4 Expected Running Time of Randomized Quick-Sort Running time of RandQuicksort is dominated by the time spent in Partition procedure. Partition is called n times The pivot element x is not included in any recursive calls. One call of Partition takes O(1) time plus time proportional to the number of iterations of FOR-loop. In each iteration of FOR-loop we compare an element with the pivot element. If X is the number of comparisons A[j] x performed in Partition over the entire execution of RandQuicksort then the running time is O(n + X). To analyze the expected running time we need to compute E[X] To compute X we use z 1,z 2,...,z n to denote the elements in A where z i is the ith smallest element. We also use Z ij to denote {z i,z i+1,...,z j }. Each pair of elements z i and z j are compared at most ones (when either of them is the pivot) X = n 1 nj=i+1 X ij where { 1 If zi compared to z X ij = i 0 If z i not compared to z i [ n 1 ] E[X] = E nj=i+1 X ij = n 1 nj=i+1 E[X ij ] = n 1 nj=i+1 Pr[z i compared to z j ] To compute Pr[z i compared to z j ] it is useful to consider when two elements are not compared. Example: Consider an input consisting of numbers 1 through n. Assume first pivot it 7 first partition separates the numbers into sets {1, 2, 3, 4, 5, 6} and {8, 9, 10}. In partitioning, 7 is compared to all numbers. No number from the first set will ever be compared to a number from the second set. In general, once a pivot x, z i <x<z j, is chosen, we know that z i and z j cannot later be compared. On the other hand, if z i is chosen as pivot before any other element in Z ij then it is compared to each element in Z ij. Similar for z j. In example: 7 and 9 are compared because 7 is first item from Z 7,9 to be chosen as pivot, and 2 and 9 are not compared because the first pivot in Z 2,9 is 7. Prior to an element in Z ij being chosen as pivot, the set Z ij is together in the same partition any element in Z ij is equally likely to be first element chosen as pivot 1 the probability that z i or z j is chosen first in Z ij is j i+1 Pr[z i compared to z j ]= 2 j i+1 4

We now have: E[X] = n 1 nj=i+1 Pr[z i compared to z j ] = n 1 nj=i+1 2 = n 1 < n 1 = n 1 j i+1 n i 2 k=1 k+1 n i k=1 2 k O(log n) = O(n log n) Next time we will see how to make quick-sort run in worst-case O(n log n) time. 5