A strategy for designing greedy algorithms and proving optimality

Hung Q. Ngo

This document outlines a strategy for designing greedy algorithms and proving their optimality. I illustrated the strategy with two examples in the lectures on Monday and Wednesday. We will begin with the INTERVAL SCHEDULING problem, then give a generic description of the strategy, and finally work through the HUFFMAN CODING example. (These notes are based on Hung Ngo's guest lectures in the Fall '09 version of CSE 331.)

1 The INTERVAL SCHEDULING problem

In the INTERVAL SCHEDULING problem, we are given a set $R$ of $n$ intervals $[s_i, f_i)$, $1 \le i \le n$. We are supposed to output a maximum-size subset of $R$ whose intervals are mutually non-overlapping. Our algorithm $A$ takes $R$ as input and outputs $A(R)$, a subset of non-overlapping intervals. The algorithm is a recursive one:

(1) let $I \in R$ be the interval with the earliest finish time (i.e., smallest $f_i$);
(2) let $R'$ be the set of intervals which do not overlap with $I$;
(3) return $\{I\} \cup A(R')$.
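Here is a minimal Python sketch of this recursive algorithm. The rendering, including the representation of $[s, f)$ as an (s, f) tuple, is mine and not from the original notes.

    # A sketch of the recursive earliest-finish-time greedy described above.
    def schedule(R):
        """Return a maximum-size set of mutually non-overlapping intervals."""
        if not R:
            return []
        I = min(R, key=lambda iv: iv[1])  # step (1): earliest finish time
        # step (2): keep only the intervals that do not overlap I
        R_prime = [iv for iv in R if iv[0] >= I[1] or iv[1] <= I[0]]
        return [I] + schedule(R_prime)    # step (3): glue I to A(R')

    print(schedule([(0, 2), (1, 3), (2, 5), (4, 7), (6, 8)]))
    # -> [(0, 2), (2, 5), (6, 8)]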

The hard part is to prove that the algorithm returns the best solution possible. Let $\mathrm{OPT}(R)$ denote the maximum number of non-overlapping intervals in $R$. We want to show that $|A(R)| = \mathrm{OPT}(R)$ for all $R$. We will prove this by induction on $|R|$. The base case $|R| \le 1$ obviously holds. Suppose $|A(R)| = \mathrm{OPT}(R)$ for every input interval set $R$ with at most $n-1$ intervals in it, and consider an input $R$ with $n$ intervals. From the recursive structure of the algorithm, it is easy to see that $|A(R)| = 1 + |A(R')|$. By the induction hypothesis, we have
$$|A(R')| = \mathrm{OPT}(R').$$
Thus,
$$|A(R)| = 1 + \mathrm{OPT}(R').$$
Consequently, we would be in business if we could prove that
$$\mathrm{OPT}(R) = 1 + \mathrm{OPT}(R'). \tag{1}$$
We will prove (1) by proving two claims.

Claim 1.1. There exists an optimal solution $O$ for the instance $R$ (i.e., a set of non-overlapping intervals with $|O| = \mathrm{OPT}(R)$) such that $I \in O$.

Claim 1.2. Let $O$ be the optimal solution in the previous claim, and let $O' = O \setminus \{I\}$. Then $O'$ is optimal for the instance $R'$; namely, $|O'| = \mathrm{OPT}(R')$.

Before proving the two claims, let's assume they are true and prove (1) first. By Claim 1.1, we have $|O| = \mathrm{OPT}(R)$. By Claim 1.2, we have $|O'| = \mathrm{OPT}(R')$. But $O = \{I\} \cup O'$. Thus, $|O| = 1 + |O'|$, which implies (1) as desired. Now we prove the two claims.

Proof of Claim 1.1. Let $\bar O$ be any optimal solution to the instance $R$. If $\bar O$ contains $I$, then we are done. If not, let $\bar I$ be the interval in $\bar O$ with the earliest finish time. Since $I$ was the interval with the earliest finish time overall, $I$'s finish time is at least as early as $\bar I$'s. Hence, $I$ does not overlap with any interval in $\bar O \setminus \{\bar I\}$. Consequently, the set $O = \bar O \cup \{I\} \setminus \{\bar I\}$ is a set of non-overlapping intervals. Since $|O| = |\bar O|$ and $\bar O$ is optimal, $O$ is optimal as well. And $O$ contains $I$, as desired.

Proof of Claim 1.2. Suppose $O'$ is not optimal for the instance $R'$, and let $\bar O'$ be an optimal solution to $R'$; in particular, $|\bar O'| > |O'|$. Recall that $R'$ consists of all intervals in $R$ that do not overlap with $I$. This means that none of the intervals in $\bar O'$ overlaps with $I$ either. Thus, $\bar O' \cup \{I\}$ is a set of non-overlapping intervals. However,
$$|\bar O' \cup \{I\}| = 1 + |\bar O'| > 1 + |O'| = |O| = \mathrm{OPT}(R).$$
This is a contradiction, as there can be no set of non-overlapping intervals with size more than $\mathrm{OPT}(R)$.

2 A general greedy algorithm design strategy and its analysis

The above analysis of the recursive algorithm might seem convoluted at first read. However, it follows a very clear algorithm design and proof strategy, which is the subject of this section.

2.1 The design strategy

Let us consider an instance $R$ of some optimization problem, in which we want to find a solution that minimizes or maximizes something. In many cases, a solution can be built piece by piece, using a recursive structure similar to that of the algorithm described in the previous section.

Generic Greedy Algorithm $A$ on instance $R$:
(1) Construct a piece $I$ in a greedy manner.
(2) Define a sub-problem $R'$.
(3) Glue $I$ to $A(R')$ and output the result.

You should try to compare the above generic description with the algorithm from Section 1. Here is another example with precisely the same structure. Given an input array $a[1..n]$, sort the array in ascending order. The classic selection sort algorithm has the same structure:

(1) let $e$ be the smallest element of $a$;
(2) let $a'$ be the array $a$ with element $e$ removed;
(3) return $e \circ \text{selection-sort}(a')$, where $\circ$ denotes the concatenation operation.
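A direct Python rendering of this recursive sort (a sketch for illustration only; it runs in quadratic time and is not how one would sort in practice):

    # The "pick the minimum, then recurse on the rest" sort described above.
    def selection_sort(a):
        if not a:
            return []
        e = min(a)               # (1) greedy choice: the smallest element
        a_prime = list(a)
        a_prime.remove(e)        # (2) the sub-instance a'
        return [e] + selection_sort(a_prime)  # (3) glue by concatenation

    print(selection_sort([3, 1, 4, 1, 5]))  # -> [1, 1, 3, 4, 5]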

An important point to notice here is that different problems typically require different ways to glue $I$ to $A(R')$. Sorting in ascending order can be formulated as the problem of minimizing the number of inversions in the array, a formulation we will encounter later.

2.2 The analysis

The cost of the final solution returned by $A$, denoted by $A(R)$, can often be computed from the cost of the piece $I$ and the cost of the sub-solution $A(R')$:
$$A(R) = \mathrm{cost}(I) + A(R'). \tag{2}$$
In some problems the cost is not additive as above: it could be the case that $A(R) = \mathrm{cost}(I) \cdot A(R')$, or that $A(R)$ is computed from $\mathrm{cost}(I)$ and $A(R')$ in some other way. However, we will use the additive case as an illustration because it occurs most often in practice.

Our objective is to prove that $A(R) = \mathrm{OPT}(R)$, and we can often do this by induction on $|R|$, the size of the problem instance. By the induction hypothesis, we have $A(R') = \mathrm{OPT}(R')$. Thus, all we actually have to do is prove that
$$\mathrm{OPT}(R) = \mathrm{cost}(I) + \mathrm{OPT}(R'). \tag{3}$$
Note the parallel between (1) and (3). Now, we will prove (3) by proving two claims.

Claim 2.1 (Generic Claim 1). There exists an optimal solution $O$ to the instance $R$ which contains the greedy choice $I$.

Claim 2.2 (Generic Claim 2). Given the $O$ in Generic Claim 1, $O' = O \setminus I$ ("cut off" $I$ from $O$) is an optimal solution to the instance $R'$.

It is extremely important to note that the statements above are meaningless if they are not explicitly stated in the context of a particular problem. See the sections on INTERVAL SCHEDULING and HUFFMAN CODING for concrete examples. Our objective here is only to outline the general line of reasoning.

Now, after proving the two claims, we can prove (3) by noting that
$$\mathrm{OPT}(R) = \mathrm{cost}(O) = \mathrm{cost}(I) + \mathrm{cost}(O') = \mathrm{cost}(I) + \mathrm{OPT}(R').$$
The first equality holds because $O$ is optimal for $R$. The second equality holds because $O'$ is $O$ with $I$ cut off. The third equality holds by Generic Claim 2.
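As a programming pattern, the whole strategy fits in a short skeleton. The sketch below is my own illustration (the notes describe the strategy only in prose); the problem-specific pieces are supplied as functions, and interval scheduling is shown as one instantiation.

    # A skeleton of the generic greedy algorithm of Section 2.1.
    def generic_greedy(R, is_base, base_solution, greedy_choice, sub_instance, glue):
        if is_base(R):
            return base_solution(R)
        I = greedy_choice(R)           # step (1): construct a piece greedily
        R_prime = sub_instance(R, I)   # step (2): define the sub-problem
        rest = generic_greedy(R_prime, is_base, base_solution,
                              greedy_choice, sub_instance, glue)
        return glue(I, rest)           # step (3): glue and output

    # Interval scheduling as an instantiation of the skeleton:
    best = generic_greedy(
        [(0, 2), (1, 3), (2, 5), (4, 7), (6, 8)],
        is_base=lambda R: not R,
        base_solution=lambda R: [],
        greedy_choice=lambda R: min(R, key=lambda iv: iv[1]),
        sub_instance=lambda R, I: [iv for iv in R if iv[0] >= I[1] or iv[1] <= I[0]],
        glue=lambda I, rest: [I] + rest,
    )
    print(best)  # -> [(0, 2), (2, 5), (6, 8)]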

3 Huffman coding

Given a text file over some alphabet $C$ in which letter $c \in C$ occurs $f(c)$ times, what is the best way to encode this text file so that the total number of bits needed is minimized? For example, suppose $C$ has only 4 letters: a, b, c, d. We can use 2 bits to encode each letter: 00 → a, 01 → b, 10 → c, 11 → d. The text string abbcccccccd will thus need $11 \times 2 = 22$ bits.

In 1951, David Huffman, then a graduate student at MIT, worked on a term paper whose topic was precisely this problem: how to design the most efficient coding scheme for a given text file/string. He came up with the Huffman code. The idea is very simple: characters which appear more frequently in the text file should be encoded with fewer bits.

Huffman's idea is as follows. An encoding scheme can be represented by a full binary tree, where each leaf represents a character. For every internal node in the tree, mark the left branch with a 0 and the right branch with a 1. The sequence of 0s and 1s encountered as we travel from the root to a leaf is the encoding of the corresponding character. For instance, an encoding tree for the text string abbcccccccd might be

          *
        0/ \1
        *   c
      0/ \1
      *   b
    0/ \1
    a   d

The encoding is thus 000 → a, 001 → d, 01 → b, 1 → c. It is not hard to see that we can uniquely decode any string encoded using this method, which is called a prefix code. The total number of bits needed to encode abbcccccccd is now only 17 instead of 22.

Given an encoding tree $T$, let $d_T(c)$ denote the depth of character $c \in C$ in the tree $T$; then we need $d_T(c)$ bits to encode $c$. Thus, a text file with character frequencies $f(c)$ will need
$$\mathrm{cost}(T) = \sum_{c \in C} f(c)\, d_T(c)$$
bits to be encoded. The problem is to seek a full encoding tree $T$ minimizing this cost. Huffman proposed the following algorithm, following precisely the strategy described in the previous section. Initially, each character in $C$ is an isolated node; no tree is formed yet, and the nodes will be merged gradually to form a full binary tree.

(1) Let $c_1$ and $c_2$ be the two least frequently appearing characters in $C$. Create a new fake character $c_{12}$.
(2) Define the frequency of $c_{12}$ by $f(c_{12}) = f(c_1) + f(c_2)$. Let $C' = C \cup \{c_{12}\} \setminus \{c_1, c_2\}$.
(3) Let $T'$ be the optimal encoding tree for the new character set $C'$ (with the frequency of $c_{12}$ defined above); $T'$ is constructed recursively. To obtain $T$, join the two nodes $c_1$ and $c_2$ as children of the node $c_{12}$ in $T'$. (This is how we glue the greedy choice to the sub-solution $T'$.)
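A compact Python sketch of this algorithm, using a binary heap keyed on frequency. The node representation (tuples for internal nodes) and the tie-breaking counter are my implementation choices, not part of the notes.

    import heapq
    from itertools import count

    def huffman_codes(freq):
        """freq: dict character -> frequency; returns dict character -> bit string."""
        tick = count()  # unique tie-breaker so the heap never compares trees
        heap = [(f, next(tick), c) for c, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)  # (1) the two least frequent
            f2, _, t2 = heapq.heappop(heap)  #     "characters" c1 and c2
            heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))  # (2) fake c12
        codes = {}
        def walk(node, prefix):  # (3) read the codes off the finished tree
            if isinstance(node, tuple):      # internal node: (left, right)
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"  # corner case: one-letter alphabet
        walk(heap[0][2], "")
        return codes

    text = "abbcccccccd"
    freq = {ch: text.count(ch) for ch in set(text)}
    codes = huffman_codes(freq)
    print(codes)
    print(sum(freq[ch] * len(codes[ch]) for ch in freq))  # 17 bits, as above

Depending on tie-breaking, the 0/1 labels may differ from the tree drawn above, but the code lengths, and hence the 17-bit total, come out the same.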

Now, it s easy to see that cost(t ) f(c)d T (c) c C f(c)d T (c) + f(c 1 )d T (c 1 ) + f(c 2 )d T (c 2 ) f(c)d T (c) + f(c 1 )(d T (c 12 ) + 1) + f(c 2 )(d T (c 12 ) + 1) f(c)d T (c) + f(c 1 )d T (c 12 ) + f(c 2 )d T (c 12 ) + f(c 1 ) + f(c 2 ) f(c)d T (c) + (f(c 1 ) + f(c 2 ))d T (c 12 ) + f(c 1 ) + f(c 2 ) f(c)d T (c) + f(c 12 )d T (c 12 ) + f(c 1 ) + f(c 2 ) c C f(c)d T (c) + f(c 1 ) + f(c 2 ) cost(t ) + f(c 1 ) + f(c 2 ) This is where I stopped on Wednesday! The equality cost(t ) f(c 1 ) + f(c 2 ) + cost(t ) (4) plays the role of (2). Now, we prove two claims along the general line of reasoning. Claim 3.1. There exists an optimal encoding tree T for the instance C in which c 1 and c 2 are siblings. Claim 3.2. Let T be the optimal encoding tree from the previous claim. Let c 12 be the parent of c 1 and c 2. Let T be the tree T with c 1 and c 2 removed. Then T is an optimal encoding tree for the sub problem C C {c 12 } {c 1, c 2 }, where f(c 12 ) f(c 1 ) + f(c 2 ). Let s assume the two claims are true. We want to prove that cost(t ) cost( T ) which is the optimal cost for C. We shall prove by induction on C that our algorithm always constructs an optimal encoding tree, which certainly holds when C 2. Let s assume that our algorithm is optimal for C n 1. Consider the case when C n. By an identical reasoning as above, we can show that cost( T ) f(c 1 ) + f(c 2 ) + cost( T ). By the induction hypothesis, we have cost(t ) cost( T ), because C n 1. This along with (4) give as desired. Now, we prove the two claims. cost( T ) f(c 1 ) + f(c 2 ) + cost(t ) cost(t ) 5

Now, we prove the two claims.

Proof of Claim 3.1. Let $T^*$ be any optimal solution for $C$. If $c_1$ and $c_2$ are siblings in $T^*$, then we are done. Otherwise, without loss of generality, assume that $d_{T^*}(c_1) \ge d_{T^*}(c_2)$, and let $c$ be the sibling of $c_1$. Let $\bar T$ be the tree obtained from $T^*$ by switching the nodes $c_2$ and $c$. Then, since everything else is the same except for $c_2$ and $c$, we have
$$
\begin{aligned}
\mathrm{cost}(\bar T) - \mathrm{cost}(T^*) &= \bigl(f(c_2)\, d_{\bar T}(c_2) + f(c)\, d_{\bar T}(c)\bigr) - \bigl(f(c_2)\, d_{T^*}(c_2) + f(c)\, d_{T^*}(c)\bigr) \\
&= \bigl(f(c_2)\, d_{T^*}(c) + f(c)\, d_{T^*}(c_2)\bigr) - \bigl(f(c_2)\, d_{T^*}(c_2) + f(c)\, d_{T^*}(c)\bigr) \\
&= \bigl(f(c_2) - f(c)\bigr) d_{T^*}(c) + \bigl(f(c) - f(c_2)\bigr) d_{T^*}(c_2) \\
&= \bigl(f(c_2) - f(c)\bigr)\bigl(d_{T^*}(c) - d_{T^*}(c_2)\bigr) \\
&= \bigl(f(c_2) - f(c)\bigr)\bigl(d_{T^*}(c_1) - d_{T^*}(c_2)\bigr) \\
&\le 0.
\end{aligned}
$$
The second-to-last equality uses $d_{T^*}(c) = d_{T^*}(c_1)$, which holds because $c$ is $c_1$'s sibling. The last inequality follows because $f(c_2) \le f(c)$ ($c_2$ was one of the two least frequent characters) and $d_{T^*}(c_1) \ge d_{T^*}(c_2)$ by our assumption above. Since $T^*$ is already optimal, $\mathrm{cost}(\bar T) \le \mathrm{cost}(T^*)$ implies that $\bar T$ must be optimal as well; moreover, $c_1$ and $c_2$ are siblings in $\bar T$, as we wanted to show.

Proof of Claim 3.2. Suppose $\bar T'$ is not optimal for the sub-problem $C'$. Then there is an even better encoding tree $\hat T'$ for $C'$: $\mathrm{cost}(\hat T') < \mathrm{cost}(\bar T')$. Now, let $\hat T$ be the tree obtained by gluing $c_1$ and $c_2$ to the parent node $c_{12}$ in $\hat T'$. Then,
$$\mathrm{cost}(\hat T) = f(c_1) + f(c_2) + \mathrm{cost}(\hat T') < f(c_1) + f(c_2) + \mathrm{cost}(\bar T') = \mathrm{cost}(\bar T).$$
This means that $\bar T$ is not the best solution for the instance $C$, a contradiction.