Mergesort and Quicksort

Similar documents
Sorting Algorithms. rules of the game shellsort mergesort quicksort animations. Reference: Algorithms in Java, Chapters 6-8

4.2 Sorting and Searching

Biostatistics 615/815

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

Sorting Algorithms. Nelson Padua-Perez Bill Pugh. Department of Computer Science University of Maryland, College Park

Analysis of Binary Search algorithm and Selection Sort algorithm

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

The Tower of Hanoi. Recursion Solution. Recursive Function. Time Complexity. Recursive Thinking. Why Recursion? n! = n* (n-1)!

CS473 - Algorithms I

CS473 - Algorithms I

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

Converting a Number from Decimal to Binary

6. Standard Algorithms

Data Structures. Algorithm Performance and Big O Analysis

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Algorithms. Margaret M. Fleck. 18 October 2010

Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally

Introduction to Programming (in C++) Sorting. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

5.4 Closest Pair of Points

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

Randomized algorithms

Class Overview. CSE 326: Data Structures. Goals. Goals. Data Structures. Goals. Introduction

Zabin Visram Room CS115 CS126 Searching. Binary Search

CS/COE

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Now is the time. For all good men PERMUTATION GENERATION. Princeton University. Robert Sedgewick METHODS

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Functions Recursion. C++ functions. Declare/prototype. Define. Call. int myfunction (int ); int myfunction (int x){ int y = x*x; return y; }

1.4 Arrays Introduction to Programming in Java: An Interdisciplinary Approach Robert Sedgewick and Kevin Wayne Copyright /6/11 12:33 PM!

Class : MAC 286. Data Structure. Research Paper on Sorting Algorithms

1/1 7/4 2/2 12/7 10/30 12/25

Exam study sheet for CS2711. List of topics

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Algorithm Analysis [2]: if-else statements, recursive algorithms. COSC 2011, Winter 2004, Section N Instructor: N. Vlajic

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

THIS CHAPTER studies several important methods for sorting lists, both contiguous

A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS

Sample Questions Csci 1112 A. Bellaachia

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Chapter 7: Sequential Data Structures

Closest Pair Problem

Figure 1: Graphical example of a mergesort 1.

Basic Programming and PC Skills: Basic Programming and PC Skills:

Sequential Data Structures

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

Recursive Algorithms. Recursion. Motivating Example Factorial Recall the factorial function. { 1 if n = 1 n! = n (n 1)! if n > 1

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

The Running Time of Programs

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

Binary Heap Algorithms

APP INVENTOR. Test Review

Binary Search Trees. basic implementations randomized BSTs deletion in BSTs

HW4: Merge Sort. 1 Assignment Goal. 2 Description of the Merging Problem. Course: ENEE759K/CMSC751 Title:

Full and Complete Binary Trees

csci 210: Data Structures Recursion

Binary search tree with SIMD bandwidth optimization using SSE

Section IV.1: Recursive Algorithms and Recursion Trees

Sample Induction Proofs

Lecture Notes on Linear Search

Binary Multiplication

AP Computer Science AB Syllabus 1

cs2010: algorithms and data structures

6 March Array Implementation of Binary Trees

Pseudo code Tutorial and Exercises Teacher s Version

Year 9 set 1 Mathematics notes, to accompany the 9H book.

Introduction to Programming System Design. CSCI 455x (4 Units)

AP Computer Science Java Mr. Clausen Program 9A, 9B

Data Structures and Algorithms Written Examination

To My Parents -Laxmi and Modaiah. To My Family Members. To My Friends. To IIT Bombay. To All Hard Workers

Chapter 8: Bags and Sets

Quick Sort. Implementation

2) Write in detail the issues in the design of code generator.

Regression Verification: Status Report

Chapter 13: Query Processing. Basic Steps in Query Processing

What Is Recursion? Recursion. Binary search example postponed to end of lecture

In mathematics, it is often important to get a handle on the error term of an approximation. For instance, people will write

Persistent Binary Search Trees

Data Structures and Data Manipulation

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

Dynamic Programming. Lecture Overview Introduction

For example, we have seen that a list may be searched more efficiently if it is sorted.

Quiz 4 Solutions EECS 211: FUNDAMENTALS OF COMPUTER PROGRAMMING II. 1 Q u i z 4 S o l u t i o n s

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

Algorithm Design and Recursion

Generating Elementary Combinatorial Objects

An Innocent Investigation

Answer: (a) Since we cannot repeat men on the committee, and the order we select them in does not matter, ( )

dictionary find definition word definition book index find relevant pages term list of page numbers

Cpt S 223. School of EECS, WSU

CSE 421 Algorithms: Divide and Conquer

3.3 Addition and Subtraction of Rational Numbers

MAT Program Verication: Exercises

Appendix: Solving Recurrences [Fa 10] Wil Wheaton: Embrace the dark side! Sheldon: That s not even from your franchise!

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Transcription:

Mergesort Mergesort and Quicksort Basic plan: Divide array into two halves. Recursively sort each half. Merge two halves to make sorted whole. mergesort mergesort analysis quicksort quicksort analysis animations Reference: Algorithms in Java, Chapters 7 and 8 Copyright 2007 by Robert Sedgewick and Kevin Wayne. 1 3 Mergesort and Quicksort Mergesort: Example Two great sorting algorithms. Full scientific understanding of their properties has enabled us to hammer them into practical system sorts. Occupy a prominent place in world's computational infrastructure. Quicksort honored as one of top 10 algorithms of 20 th century in science and engineering. Mergesort. Java sort for objects. Perl, Python stable. Quicksort. Java sort for primitive types. C qsort, Unix, g++, Visual C++, Python. 2 4

Merging Merging. Combine two pre-sorted lists into a sorted whole. How to merge efficiently? Use an auxiliary array. copy merge aux[] a[] l A G L O R H I M S T A G H I L M i m private static void merge(comparable[] a, Comparable[] aux, int l, int m, int r) for (int k = l; k < r; k++) aux[k] = a[k]; int i = l, j = m; for (int k = l; k < r; k++) if (i >= m) a[k] = aux[j++]; else if (j >= r) a[k] = aux[i++]; else if (less(aux[j], aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; k j r mergesort mergesort analysis quicksort quicksort analysis animations 5 7 Mergesort: Java implementation of recursive sort Mergesort analysis: Memory public class Merge private static void sort(comparable[] a, Comparable[] aux, int l, int r) if (r <= l + 1) return; int m = l + (r - l) / 2; sort(a, aux, l, m); sort(a, aux, m, r); merge(a, aux, l, m, r); Q. How much memory does mergesort require? A. Too much Original input array = N. Auxiliary array for merging = N. Local variables: constant. Function call stack: log 2 N [stay tuned]. Total = 2N + O(log N). cannot fill the memory and sort public static void sort(comparable[] a) Comparable[] aux = new Comparable[a.length]; sort(a, aux, 0, a.length); Q. How much memory do other sorting algorithms require? N + O(1) for insertion sort and selection sort. In-place = N + O(log N). l m r Challenge for the bored. In-place merge. [Kronrud, 1969] 10 11 12 13 14 15 16 17 18 19 6 8

Mergesort analysis: Comparison count Mergesort recurrence: Proof 2 (by telescoping) Def. T(N) number of comparisons to mergesort an input of size N = T(N/2) + T(N/2) + N for N > 1, with T(1) = 0 (assume that N is a power of 2) left half right half merge Mergesort recurrence for N > 1, with T(1) = 0 Pf. T(N)/N = 2 T(N/2)/N + 1 given divide both sides by N not quite right for odd N = T(N/2)/(N/2) + 1 algebra same recurrence holds for many algorithms = T(N/4)/(N/4) + 1 + 1 telescope (apply to first term) same number of comparisons for any input of size N. Solution of Mergesort recurrence T(N) ~ N lg N lg N log 2 N = T(N/8)/(N/8) + 1 + 1 + 1... = T(N/N)/(N/N) + 1 + 1 +...+ 1 telescope again stop telescoping, T(1) = 0 true for all N, as long as integer approx of N/2 is within a constant = lg N easy to prove when N is a power of 2. can then use induction for general N (see COS 341) T(N) = N lg N 9 11 Mergesort recurrence: Proof 1 (by recursion tree) Mergesort recurrence: Proof 3 (by induction) for N > 1, with T(1) = 0 (assume that N is a power of 2) for N > 1, with T(1) = 0 (assume that N is a power of 2) T(N) + Claim. If T(N) satisfies this recurrence, then T(N) = N lg N. Pf. [by induction on N] Base case: N = 1. T(N/2) T(N/2) N = N Inductive hypothesis: T(N) = N lg N Goal: show that T(2N) + 2N lg (2N). lg N T(N/4) T(N/4) T(N / 2 k ) T(N/4) T(N/4) 2(N/2)... 2 k (N/2 k )... = N = N T(2N) = 2 T(N) + 2N given = 2 N lg N + 2 N inductive hypothesis = 2 N (lg (2N) - 1) + 2N algebra = 2 N lg (2N) QED T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) N/2 (2) = N T(N) = N lg N N lg N Ex. (for COS 341). Extend to show that T(N) ~ N lg N for general N 10 12

Bottom-up mergesort Mergesort: Practical Improvements Basic plan: Pass through file, merging to double size of sorted subarrays. Do so for subarray sizes 1, 2, 4, 8,..., N/2, N. proof 4 that Mergesort uses N lgn compares No recursion needed 13 Use sentinel. Two statements in inner loop are array-bounds checking. Reverse one subarray so that largest element is sentinel (Program 8.2) Use insertion sort on small subarrays. Mergesort has too much overhead for tiny subarrays. Cutoff to insertion sort for 7 elements. Stop if already sorted. Is biggest element in first half " smallest element in second half? Helps for nearly ordered lists. Eliminate the copy to the auxiliary array. Save time (but not space) by switching the role of the input and auxiliary array in each recursive call. See Program 8.4 (or Java system sort) 15 Bottom-up Mergesort: Java implementation Sorting Analysis Summary uses sentinel (see Program 8.2) public class Merge private static void merge(comparable[] a, Comparable[] aux, int l, int m, int r) for (int i = l; i < m; i++) aux[i] = a[i]; for (int j = m; j < r; j++) aux[j] = a[m + r - j - 1]; int i = l, j = r - 1; for (int k = l; k < r; k++) if (less(aux[j], aux[i])) a[k] = aux[j--]; else a[k] = aux[i++]; Running time estimates: Home pc executes 10 8 comparisons/second. Supercomputer executes 10 12 comparisons/second. Insertion Sort (N 2 ) Mergesort (N log N) computer thousand million billion thousand million billion home 2.8 hours 317 years 1 sec 18 min super 1 second 1.6 weeks public static void sort(comparable[] a) int N = a.length; Comparable[] aux = new Comparable[N]; for (int m = 1; m < N; m = m+m) for (int i = 0; i < N-m; i += m+m) merge(a, aux, i, i+m, Math.min(i+m+m, N)); Lesson 1. Good algorithms are better than supercomputers. Concise industrial-strength code if you have the space 14 16

Quicksort Partitioning Q. How do we partition in-place efficiently? A. Scan from right, scan from left, exchange mergesort mergesort analysis quicksort quicksort analysis animations 17 19 Quicksort Quicksort example Basic plan. Shuffle the array. Partition array so that: element a[i] is in its final place for some i no larger element to the left of i no smaller element to the right of i Sort each piece recursively. Sir Charles Antony Richard Hoare 1980 Turing Award 18 20

Quicksort: Java implementation of recursive sort Quicksort Implementation details Partitioning in-place. Using a spare array makes partitioning easier, but is not worth the cost. public class Quick public static void sort(comparable[] a) StdRandom.shuffle(a); sort(a, 0, a.length - 1); private static void sort(comparable[] a, int l, int r) if (r <= l) return; int m = partition(a, l, r); sort(a, l, m-1); sort(a, m+1, r); Terminating the loop. Testing whether the pointers cross is a bit trickier than it might seem. Staying in bounds. The (i == r) test is redundant, but the (j == l) test is not. Preserving randomness. Shuffling is key for performance guarantee. Equal keys. When duplicates are present, it is (counter-intuitively) best to stop on elements equal to partitioning element. 21 23 Quicksort: Java implementation of partitioning procedure private static int partition(comparable[] a, int l, int r) int i = l - 1; int j = r; while(true) while (less(a[++i], a[r])) if (i == r) break; while (less(a[r], a[--j])) if (j == l) break; if (i >= j) break; find item on left to swap find item on right to swap check if pointers cross mergesort mergesort analysis quicksort quicksort analysis animations exch(a, i, j); swap exch(a, i, r); return i; swap with partitioning element return index where crossing occurs 22 24

Quicksort: Average-case analysis Quicksort: Summary of performance characteristics Theorem. The average number of comparisons C N to quicksort a random file of N elements is about 2N ln N. The precise recurrence satisfies C 0 = C 1 = 0 and for N # 2: C N = N + 1 + ((C 0 + C N-1 ) +... + (C k-1 + C N-k ) +... + (C N-1 + C 1 )) / N partition left right partitioning probability = N + 1 + 2 (C 0... + C k-1 +... + C N-1 ) / N Multiply both sides by N NC N = N(N + 1) + 2 (C 0... + C k-1 +... + C N-1 ) Worst case. Number of comparisons is quadratic. N + (N-1) + (N-2) + + 1 N 2 / 2. More likely that your computer is struck by lightning. Average case. Number of comparisons is ~ 1.39 N lg N. 39% more comparisons than mergesort. but faster than mergesort in practice because of lower cost of other high-frequency operations. Random shuffle probabilistic guarantee against worst case basis for math model that can be validated with experiments Subtract the same formula for N-1: NC N - (N - 1)C N-1 = N(N + 1) - (N - 1)N + 2 C N-1 Caveat emptor. Many textbook implementations go quadratic if input: Is sorted. Simplify: NC N = (N + 1)C N-1 + 2N Is reverse sorted. Has many duplicates (even if randomized) [stay tuned] 25 27 Quicksort: Average Case Sorting analysis summary NC N = (N + 1)C N-1 + 2N Divide both sides by N(N+1) to get a telescoping sum: Running time estimates: Home pc executes 10 8 comparisons/second. C N / (N + 1) = C N-1 / N + 2 / (N + 1) Supercomputer executes 10 12 comparisons/second. = C N-2 / (N - 1) + 2/N + 2/(N + 1) Insertion Sort (N 2 ) Mergesort (N log N) = C N-3 / (N - 2) + 2/(N - 1) + 2/N + 2/(N + 1) computer home thousand million 2.8 hours billion 317 years thousand million 1 sec billion 18 min = 2 ( 1 + 1/2 + 1/3 +... + 1/N + 1/(N + 1) ) super 1 second 1.6 weeks Approximate the exact answer by an integral: thousand Quicksort (N log N) million billion C N 2(N + 1)( 1 + 1/2 + 1/3 +... + 1/N ) N 0.3 sec 6 min Finally, the desired result: = 2(N + 1) H N 2(N + 1)" dx/x 1 Lesson 1. Good algorithms are better than supercomputers. Lesson 2. Great algorithms are better than good ones. C N 2(N + 1) ln N 1.39 N lg N 26 28

Quicksort: Practical improvements Insertion sort animation Median of sample. Best choice of pivot element = median. But how to compute the median? Estimate true median by taking median of sample. left of pointer is in sorted order right of pointer is untouched Insertion sort small files. Even quicksort has too much overhead for tiny files. Can delay insertion sort until end. Optimize parameters. Median-of-3 random elements. 12/7 N log N comparisons a[i] Cutoff to insertion sort for 10 elements. Non-recursive version. Use explicit stack. Always sort smaller half first. guarantees O(log N) stack size All validated with refined math models and experiments 29 i 31 Mergesort animation done untouched merge in progress input mergesort mergesort analysis quicksort quicksort analysis animations 30 merge in progress output auxiliary array 32

Bottom-up mergesort animation this pass last pass merge in progress input merge in progress output auxiliary array 33 Quicksort animation i first partition v second partition done j 34