
Intro. to the Divide-and-Conquer Strategy via Merge Sort
CMPSC 465
CLRS Section 2.3 and the introduction to and various parts of Chapter 4

I. Algorithm Design and Divide-and-Conquer

There are various strategies we can use to design algorithms. One is an incremental approach, where we start with a solution to a single-element problem and figure out, one increment at a time, how to build the solution to the next-largest problem from the prior solution. Insertion sort fits this strategy.

Question: Why? Because at each step, we have already sorted the subarray A[1..j-1]; we insert A[j] into its place, and we get out the sorted subarray A[1..j]. (A short Python sketch of insertion sort appears below, just before the merge sort pseudocode.)

Another common strategy is called divide-and-conquer. It applies to situations that are naturally recursive and has three components:

Divide
Conquer
Combine

This is one of three major algorithm design strategies we'll study in this course. The other two are greedy algorithms (you got a taste of that in 360 with minimum spanning trees) and dynamic programming.

II. Merge Sort

We've been discussing the merge sort algorithm informally all along. In this lesson, we'll focus on merge sort and use it as an introduction to the important ideas of analyzing divide-and-conquer algorithms.

We looked at merge sort on the first day. The idea is to sort a subarray A[p..r], where we start with p = 1 and r = n. However, the algorithm is recursive, and at each level the values of p and r change. Here's the high-level overview of merge sort:

1. Divide the array into two subarrays A[p..q] and A[q+1..r], where q is the midpoint of p and r.
2. Conquer by recursively sorting the two subarrays.
3. Combine by merging the two sorted subarrays A[p..q] and A[q+1..r] to produce a single sorted subarray.

Question: All recursive definitions and algorithms need a base case. What's the base case here?
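As promised above, here is a minimal Python sketch of insertion sort to illustrate the incremental strategy (0-indexed; the function and variable names are just for illustration, following the j/key convention of CLRS):

    def insertion_sort(A):
        """Incremental strategy: A[0..j-1] is already sorted; insert A[j]
        into its place to obtain the sorted A[0..j]."""
        for j in range(1, len(A)):
            key = A[j]
            i = j - 1
            while i >= 0 and A[i] > key:   # shift larger elements one slot right
                A[i + 1] = A[i]
                i -= 1
            A[i + 1] = key                 # drop the key into its place

    A = [5, 2, 4, 6, 1, 3]
    insertion_sort(A)
    print(A)   # [1, 2, 3, 4, 5, 6]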

Here's the pseudocode for merge sort from CLRS:

    MERGE-SORT(A, p, r)
        if p < r                        // check for base case
            q = ⌊(p + r) / 2⌋           // divide
            MERGE-SORT(A, p, q)         // conquer
            MERGE-SORT(A, q + 1, r)     // conquer
            MERGE(A, p, q, r)           // combine

Let's trace the algorithm a few times.

Example: Trace MERGE-SORT for the following initial array:

    index:  1  2  3  4  5  6  7  8
    value:  5  2  4  7  1  3  2  6

Example: Trace merge sort for the following initial array:

    index:  1  2  3  4  5  6  7  8  9 10 11
    value:  4  7  2  6  1  4  7  3  5  2  6
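For reference, here is a minimal Python rendering of the same recursion, assuming 0-indexed lists with inclusive indices p and r; the plain two-pointer merge here is a stand-in, and a sentinel-based version closer to CLRS is sketched in Section III:

    def merge_sort(A, p, r):
        """Sort A[p..r] in place, mirroring the CLRS pseudocode."""
        if p < r:                        # base case: a subarray of size <= 1 is already sorted
            q = (p + r) // 2             # divide: floor of the midpoint
            merge_sort(A, p, q)          # conquer the left half
            merge_sort(A, q + 1, r)      # conquer the right half
            merge(A, p, q, r)            # combine the two sorted halves

    def merge(A, p, q, r):
        """Two-pointer merge of the sorted runs A[p..q] and A[q+1..r]."""
        left, right = A[p:q + 1], A[q + 1:r + 1]
        i = j = 0
        for k in range(p, r + 1):
            if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                A[k] = left[i]; i += 1
            else:
                A[k] = right[j]; j += 1

    A = [5, 2, 4, 7, 1, 3, 2, 6]         # the first trace example, 0-indexed here
    merge_sort(A, 0, len(A) - 1)
    print(A)                             # [1, 2, 2, 3, 4, 5, 6, 7]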

III. Merging in Linear Time

The concept of merging is relatively intuitive, but implementing it or expressing it in pseudocode isn't quite as straightforward. The problem's input and output are the following:

Input with preconditions: Array A and indices p, q, r s.t. p ≤ q < r, subarray A[p..q] is sorted, and subarray A[q+1..r] is sorted
Output/postcondition: Subarrays A[p..q] and A[q+1..r] have been merged into a single sorted subarray in A[p..r]

Last week, we quickly discussed why this can be done in Θ(n) time, but let's look a little more carefully here.

Let's visualize this as two piles of cards, where each pile is sorted and we can see the smallest card from each pile on top. We'll have the input piles face up. We want to merge those cards into a single sorted pile, face down. Let's act this out, but you need to count as we go. I'll leave you space to take notes and keep count.

So, each basic step was to compare the two cards showing on top of the input piles, take the smaller one, and move it to the output pile, exposing a new top card. We repeatedly perform basic steps until one input pile is empty. Then, we finish by placing the remaining input pile face down onto the output pile. Each basic step takes constant time. What's the maximum number of basic steps? At most n, one per card moved to the output. What's the running time? Θ(n).

The CLRS version does empty-pile detection via a sentinel card, whose value is guaranteed to lose in a comparison to any value. So, ∞ works for this, and you'll see it in the pseudocode. In this mindset, given that the input array is indexed from p to r, there are exactly r − p + 1 non-sentinel cards. So, we can use this to stop the algorithm after r − p + 1 basic steps. Thus we can just fill up the output array from index p through index r.

Here's the pseudocode for merging from CLRS, MERGE(A, p, q, r). (A Python rendering of the same sentinel-based procedure follows the example below.)

Example: Suppose array A is as below and trace MERGE(A, 9, 12, 16):

    index:  9 10 11 12 13 14 15 16
    value:  2  4  5  7  1  2  3  6
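As promised, here is a Python sketch of the sentinel-based merge, assuming 0-indexed lists with inclusive indices; the leading zeros below are only padding so the example's indices 9..16 line up in a 0-indexed list:

    def merge(A, p, q, r):
        """Merge the sorted runs A[p..q] and A[q+1..r] in place, using
        infinity sentinels to avoid explicit empty-pile checks."""
        left = A[p:q + 1] + [float('inf')]        # left pile plus sentinel
        right = A[q + 1:r + 1] + [float('inf')]   # right pile plus sentinel
        i = j = 0
        for k in range(p, r + 1):                 # exactly r - p + 1 basic steps
            if left[i] <= right[j]:               # <= keeps the sort stable
                A[k] = left[i]
                i += 1
            else:
                A[k] = right[j]
                j += 1

    # The trace example: A[9..12] and A[13..16] are each sorted before the call.
    A = [0] * 9 + [2, 4, 5, 7, 1, 2, 3, 6]
    merge(A, 9, 12, 16)
    print(A[9:17])   # [1, 2, 2, 3, 4, 5, 6, 7]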

IV. The Correctness of the Merge Algorithm

Let's study the correctness of the MERGE algorithm by proving the correctness of its last loop, which is the heart of the algorithm as far as proving correctness goes:

Loop Invariant:

Correctness Proof:

Initialization:

Maintenance:

Termination:
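For reference when you write your own proof, here is the invariant CLRS (Section 2.3.1) uses for the final for loop of MERGE, paraphrased, along with one-line sketches of the three proof steps; L and R denote the copied-out left and right subarrays (with sentinels), and k indexes the output positions.

Loop Invariant: At the start of each iteration of the k-loop, the subarray A[p..k−1] contains the k − p smallest elements of L and R, in sorted order; moreover, L[i] and R[j] are the smallest elements of their arrays that have not yet been copied back into A.

Initialization: k = p, so A[p..k−1] is empty; i = j point at the first elements of L and R, which are the smallest not-yet-copied elements.
Maintenance: the smaller of L[i] and R[j] is copied into A[k], which preserves sorted order, and the corresponding index advances along with k.
Termination: k = r + 1, so A[p..r] contains the r − p + 1 smallest elements of L and R (all of them except the two sentinels), in sorted order.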

V. Back to Recurrences

The key tool for analyzing divide-and-conquer algorithms is the recurrence. We can use a recurrence to describe the running time of a divide-and-conquer algorithm very naturally, because of the recursion in the algorithm. In this world, we use the following conventions:

We let T(n) be a function for the running time on a problem of size n.
A base case corresponds to a small enough problem size, where we use a simple or brute-force strategy that we say takes Θ(1) time.
The recursive case is where we divide a problem into a subproblems, each 1/b the size of the original.
We express the time to divide a size-n problem as D(n).
We express the time to combine the solutions to the subproblems as C(n).

So, using these conventions, a general form of a recurrence for the running time of a divide-and-conquer algorithm is

    T(n) = Θ(1)                       if n ≤ some constant c
    T(n) = aT(n/b) + D(n) + C(n)      otherwise

As always, our goal with a recurrence is to solve it for a closed form, i.e., a formula that determines the same sequence. Expanding upon the list we have from discrete math, there are several methods for solving recurrences:

1. Iteration: Start with the recurrence and keep applying the recurrence equation until we get a pattern. (The result is a guess at the closed form. A short worked example appears at the end of this section.)
2. Substitution: Guess the solution; prove it using induction. (The result here is a proven closed form. It's often difficult to come up with the guess blindly, though, so in practice we need another strategy to guess the closed form.)
3. Recursion Tree: Draw a tree that illustrates the decomposition of a problem of size n into subproblems and tally the costs of each level of the tree. Use this to find a closed form. (Like iteration, this is really a guess at the closed form.)
4. Master Theorem: Plug into a formula that gives an asymptotic bound on the solution. (The result here is only a bound on the closed form. It is not an exact solution. For many of our purposes, that's good enough.)

We'll look at the new techniques in depth in this chapter. Here are some other issues that are specific to the use of recurrences in analyzing algorithms:

Floors and ceilings:

Expressing boundary conditions:

Asymptotic notation:
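As a quick illustration of the iteration method, applied to the recurrence we'll derive for merge sort in the next section (assuming n is a power of 2 and writing the per-level constant as c):

    T(n) = 2T(n/2) + cn
         = 2(2T(n/4) + c(n/2)) + cn  =  4T(n/4) + 2cn
         = 4(2T(n/8) + c(n/4)) + 2cn =  8T(n/8) + 3cn
         ...
         = 2^k T(n/2^k) + k·cn

The expansion bottoms out when n/2^k = 1, i.e., k = lg n, which suggests the closed form T(n) = nT(1) + cn lg n = cn + cn lg n, i.e., Θ(n lg n). (This is only a guess; the substitution method is how we would prove it.)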

VI. The Merge Sort Recurrence

In Epp 11.5, we derived a merge sort recurrence. We'll simplify it a bit here and state it in the same symbols as above. Some notes:

The base case occurs when n = 1.
For n ≥ 2,
  o The divide step is a computation of the average of p and r (the midpoint q) and takes Θ(1) time.
  o The conquer step is to solve two recursive subproblems, each of size n/2.
  o The combine step is to merge an n-element array, which takes Θ(n) time.

So the recurrence is:

    T(n) = Θ(1)                 for n = 1
    T(n) = 2T(n/2) + Θ(n)       for n > 1

as Θ(n) + Θ(1) = Θ(n).

VII. Analysis via Recursion Tree

Let's simplify the recurrence as follows, using c as a constant (which, in general, is not the same constant in both cases, but this works here and keeps the analysis clean):

    T(n) = c                    for n = 1
    T(n) = 2T(n/2) + cn         for n > 1

While we could solve this recurrence with our old iteration strategy, we'll use a new strategy called a recursion tree, which visually represents the recursion and is less prone to errors. At each step:

The root of the recursion tree (or subtree) is the cost of dividing and combining at that step.
We branch to children, with one child for each subproblem.
  o Initially, we label the nodes with the recurrence function value.
  o Then, we apply another round of recursion, repeating the whole process and replacing the roots with their costs as described.
We continue this expansion until problem sizes get down to the base case, 1.

Let's draw the first step of the recursion tree for merge sort:

Let's draw the recursion tree after the first two steps:
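If you'd like to check your drawing numerically, here is a small Python sketch (assuming n is a power of 2 and taking c = 1) that expands T(n) = 2T(n/2) + cn and tallies the cost contributed at each level of the recursion tree:

    from collections import defaultdict

    def tally_levels(n, c=1, depth=0, levels=None):
        """Expand T(n) = 2T(n/2) + c*n (with T(1) = c) and record the
        divide/combine cost contributed at each depth of the recursion tree."""
        if levels is None:
            levels = defaultdict(int)
        if n == 1:
            levels[depth] += c            # leaf: base-case cost
            return levels
        levels[depth] += c * n            # dividing + merging cost at this node
        tally_levels(n // 2, c, depth + 1, levels)
        tally_levels(n // 2, c, depth + 1, levels)
        return levels

    levels = tally_levels(8)
    for depth, cost in sorted(levels.items()):
        print(f"level {depth}: total cost {cost}")      # each level sums to c*n = 8
    print("levels:", len(levels), "total:", sum(levels.values()))  # 4 levels, 32 = 8*(lg 8 + 1)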

Now let's draw the complete recursion tree:

Now, we must analyze the tree's cost at each level:

What is the cost of the top level?
What is the cost per subproblem on the second level? How many subproblems? Second level cost?
Third level cost?
Cost per level?

Next,

What is the height of the tree?
How many levels does the tree have? (We'll prove this momentarily.)
What is the total cost of the tree?
What is the merge sort running time?
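For checking your answers, here is how the tally typically comes out when n is a power of 2 (the standard CLRS accounting):

    Top level: cn.
    Second level: 2 subproblems of size n/2, each costing c(n/2), so the level cost is 2 · c(n/2) = cn.
    In general, level i (counting the root as level 0) has 2^i subproblems of size n/2^i, so each level costs 2^i · c(n/2^i) = cn; at the leaf level, n subproblems each cost c, again cn.
    Height of the tree: lg n, so it has lg n + 1 levels.
    Total cost: cn(lg n + 1) = cn lg n + cn.
    Merge sort running time: Θ(n lg n).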

Finally, let's be clean in our analysis and prove, by induction, the claim that the tree has lg n + 1 levels for problem sizes that are powers of 2:

Homework: CLRS Exercises 2.3-1 and 2.3-2

Prepared by D. Hogan, referencing CLRS, Introduction to Algorithms (3rd ed.), for PSU CMPSC 465