Greedy Algorithms. Examples: Minimum Spanning Tree


This course - 91.503, Analysis of Algorithms - follows 91.404 (Undergraduate Analysis of Algorithms) and assumes that students are familiar with the detailed contents of that course. We will not re-cover any of the material of the previous course, but we will use its details whenever appropriate. The Algorithms Qualifying Exam covers the contents of BOTH courses. We start off with some coverage of Greedy Methods (Activity Selection, Knapsack Problem, Huffman Codes, etc.). Please look at the appropriate sections in the text to refresh your memories.

What is a Greedy Algorithm? It solves an optimization problem: the solution is best in some sense. Strategy: at each decision point, do what looks best locally. The choice does not depend on evaluating potential future choices or on solving subproblems. Top-down algorithmic structure: with each step, reduce the problem to a smaller problem. Optimal Substructure: an optimal solution contains within it optimal solutions to subproblems. Greedy Choice Property: locally best = globally best.

Examples: Minimum Spanning Tree (Kruskal), Minimum Spanning Tree (Prim), Dijkstra Shortest Path, Huffman Codes, Fractional Knapsack, Activity Selection.

Kruskal's MST algorithm: time O(E lg E) given fast FIND-SET and UNION. Invariant: a minimum-weight spanning forest, which becomes a single tree at the end. Prim's MST algorithm: time O(E lg V) = O(E lg E), slightly faster with a fast priority queue. Invariant: a minimum-weight tree, which spans all vertices at the end. Each produces a minimum-weight tree of edges that includes every vertex, for an undirected, connected, weighted graph G = (V, E).

[Figure: an example weighted, undirected, connected graph on vertices A through G, used to trace both MST algorithms. Source: 91.503 textbook, Cormen et al.]
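As a concrete companion to the Kruskal bound above, here is a minimal sketch in Python (mine, not from the slides; the four-edge instance at the bottom is hypothetical). The sort dominates at O(E lg E); union-find with path compression and union by rank makes each FIND-SET/UNION nearly constant time.

def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v) tuples; returns the MST edge list."""
    parent = list(range(num_vertices))
    rank = [0] * num_vertices

    def find(x):                        # FIND-SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                    # UNION by rank
        rx, ry = find(x), find(y)
        if rx == ry:
            return False                # edge would close a cycle
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        return True

    mst = []
    for w, u, v in sorted(edges):       # O(E lg E): the dominant cost
        if union(u, v):                 # grow the minimum-weight forest
            mst.append((u, v, w))
    return mst

print(kruskal_mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3)]))
# [(0, 1, 1), (2, 3, 2), (1, 2, 3)]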

Single Source Shortest Paths: Dijkstra's Algorithm, for a (nonnegative) weighted, directed graph G = (V, E). [Figure: the example graph on vertices A through G again, traced for Dijkstra's algorithm. Source: 91.503 textbook, Cormen et al.]

Huffman Codes.

Fractional Knapsack. [Figure: item 1 worth $60 weighing 10 lb, item 2 worth $100 weighing 20 lb, item 3 worth $120 weighing 30 lb; knapsack capacity 50 lb.]

Activity Selection. Problem Instance: a set S = {1, 2, ..., n} of n activities. Each activity i has a start time s_i and a finish time f_i, with s_i ≤ f_i. Activities i and j are compatible iff they are nonoverlapping, i.e., the intervals [s_i, f_i) and [s_j, f_j) are disjoint. Objective: select a maximum-sized set of mutually compatible activities.
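The fractional knapsack instance above yields to a one-pass greedy rule: take items in nonincreasing order of value per pound, splitting only the last item taken. A minimal Python sketch (mine, not from the slides), run on the $60/$100/$120 instance from the figure:

def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs; greedy by value/weight ratio."""
    total = 0.0
    # Locally best choice: the highest value per unit of weight first.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(weight, capacity)   # all of the item, or the fraction that fits
        total += value * take / weight
        capacity -= take
        if capacity == 0:
            break
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0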

Greedy Algorithm (source: 91.503 textbook, Cormen et al.). Why does this all work?

Algorithm: let S' = the activities in S, presorted by nondecreasing finish time and renumbered.

GREEDY-ACTIVITY-SELECTOR(S')
  n := length[S']
  A := {1}
  j := 1
  for i := 2 to n
      do if s_i ≥ f_j
             then A := A ∪ {i}
                  j := i
  return A

Running time? Theta(n) for the selection itself, after the O(n lg n) presort.

Why does it provide an optimal (maximal) solution? It is clear that it will provide a solution (a set of nonoverlapping activities), but there is no obvious reason to believe that it will provide a maximal set of activities. We start with a restatement of the problem in such a way that a dynamic programming solution can be constructed. The dynamic programming solution will be shown to be maximal. It will then be modified into a greedy solution in such a way that maximality will be preserved.

Consider the set S of activities, sorted in monotonically increasing order of finishing time:

  i   :  1  2  3  4  5  6   7   8   9  10  11
  s_i :  1  3  0  5  3  5   6   8   8   2  12
  f_i :  4  5  6  7  8  9  10  11  12  13  14

where the activity a_i corresponds to the time interval [s_i, f_i). The subset {a_3, a_9, a_11} consists of mutually compatible activities, but is not maximal. The subsets {a_1, a_4, a_8, a_11} and {a_2, a_4, a_9, a_11} are also compatible and are maximal.

First Step: find an optimal substructure. You need to define an appropriate space of subproblems. S_ij = {a_k in S : f_i ≤ s_k < f_k ≤ s_j} is the set of all those activities compatible with a_i and a_j, and also compatible with those activities that finish no later than a_i and those that start no earlier than a_j. Add two fictitious activities: a_0, with f_0 = 0, and a_{n+1}, with s_{n+1} = ∞, which come before and after, respectively, all activities in S = S_{0,n+1}. Now assume the activities are sorted in nondecreasing order of finish time: f_0 ≤ f_1 ≤ ... ≤ f_n < f_{n+1}. Then S_ij is empty whenever i ≥ j.
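For concreteness, here is a short Python rendering of GREEDY-ACTIVITY-SELECTOR (a sketch of mine, not code from the slides), run on the eleven-activity instance from the table above:

def greedy_activity_selector(s, f):
    """s, f: start/finish times, presorted by nondecreasing finish time.
    Returns the 1-based indices of a maximum set of compatible activities."""
    n = len(s)
    A = [1]                      # always select the first activity
    j = 0                        # 0-based index of the last selected activity
    for i in range(1, n):
        if s[i] >= f[j]:         # compatible with the last selection
            A.append(i + 1)
            j = i
    return A

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(greedy_activity_selector(s, f))  # [1, 4, 8, 11]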

From the original problem we now introduce a space of subproblems: find a maximum-size subset of mutually compatible activities from S_ij, for 0 ≤ i < j ≤ n+1. We now need to see the substructure of the problem, and show that it possesses the optimal substructure property.

Substructure: suppose a solution to S_ij contains some activity a_k, with f_i ≤ s_k < f_k ≤ s_j. Then a_k generates two subproblems, S_ik and S_kj. The solution to S_ij is the union of a solution to S_ik, the singleton {a_k}, and a solution to S_kj. So any solution to a larger problem can be obtained by patching together solutions for smaller problems. Cardinality is also additive.

Optimal Substructure: assume A_ij is an optimal solution to S_ij, containing an activity a_k. Then A_ij contains a solution to S_ik (the activities that end before the beginning of a_k) and a solution to S_kj (the activities that begin after the end of a_k). If these solutions were not already maximal, then one could obtain a maximal solution for, say, S_ik, and splice it with {a_k} and the solution to S_kj to obtain a solution for S_ij of greater cardinality, thus violating the optimality condition assumed for A_ij.

Use Optimal Substructure to construct an optimal solution: any solution to a nonempty problem S_ij includes some activity a_k, and any optimal solution to S_ij must contain optimal solutions to S_ik and S_kj. For each a_k in S_{0,n+1}, find (recursively) optimal solutions of S_0k and S_{k,n+1}, say A_0k and A_{k,n+1}. Splice them together, along with a_k, to form solutions A_k. Take a maximal A_k as an optimal solution A_{0,n+1}. We need to look at the recursion in more detail.

Second Step: the recursive formulation. Let c[i,j] denote the maximum number of compatible activities in S_ij. It is easy to see that c[i,j] = 0 whenever i ≥ j, since then S_ij is empty. If, in a nonempty set of activities S_ij, the activity a_k occurs as part of a maximal compatible subset, this generates two subproblems, S_ik and S_kj, and the equation c[i,j] = c[i,k] + 1 + c[k,j], which tells us how the cardinalities are related. The problem is that k is not known a priori. Solution: c[i,j] = max_{i<k<j} { c[i,k] + 1 + c[k,j] }, if S_ij is not empty. One can write an easy bottom-up (dynamic programming) algorithm to compute a maximal solution.
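Such a bottom-up algorithm might look like the following Python sketch (my rendering of the recurrence, not code from the slides). It fills c[i][j] over increasing interval widths and runs in O(n^3) time, which is exactly the cost the greedy refinement below avoids:

import math

def dp_activity_count(s, f):
    """Bottom-up table c[i][j] = max number of mutually compatible
    activities in S_ij, per the recurrence above."""
    s = [-math.inf] + list(s) + [math.inf]   # fictitious a_0 and a_{n+1}
    f = [0] + list(f) + [math.inf]
    n = len(s)
    c = [[0] * n for _ in range(n)]
    for width in range(2, n):                # interval width j - i
        for i in range(n - width):
            j = i + width
            for k in range(i + 1, j):
                # a_k must lie in S_ij: f_i <= s_k < f_k <= s_j
                if f[i] <= s[k] and f[k] <= s[j]:
                    c[i][j] = max(c[i][j], c[i][k] + 1 + c[k][j])
    return c[0][n - 1]

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(dp_activity_count(s, f))  # 4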

A better mousetrap. Theorem 16.1. Consider any nonempty subproblem S_ij, and let a_m be the activity in S_ij with the earliest finish time: f_m = min{f_k : a_k in S_ij}. Then:

1. a_m is used in some maximum-size subset of mutually compatible activities of S_ij.
2. The subproblem S_im is empty, so that choosing a_m leaves the subproblem S_mj as the only one that may be nonempty.

Proof: (2) If S_im were nonempty, then S_ij would contain an activity finishing prior to a_m, contradicting the choice of a_m. (1) Suppose A_ij is a maximal subset. Either a_m is in A_ij, and we are done, or it is not. Let a_k be the earliest-finishing activity in A_ij. It can be replaced by a_m (why? a_m finishes no later than a_k, so it remains compatible with the rest of A_ij), thus giving a maximal subset containing a_m.

Why is this a better mousetrap? The dynamic programming solution requires solving j - i - 1 subproblems to solve S_ij. Total running time??? Theorem 16.1 gives us conditions under which solving S_ij requires solving ONE subproblem only, since the other subproblems implied by the dynamic programming recurrence relation are empty, or irrelevant (they might give us other maximal solutions, but we need just one). Lots less computation. Another benefit comes from the observation that the problem can be solved in a top-down fashion: take the earliest-finishing activity a_m, and you are left with solving the problem S_{m,n+1}. It is easy to see that each activity needs to be looked at only once: linear time.

The Algorithm:

RECURSIVE-ACTIVITY-SELECTOR(s, f, i, j)
  m := i + 1
  while m < j and s_m < f_i     // find the first activity in S_ij
      do m := m + 1
  if m < j
      then return {a_m} ∪ RECURSIVE-ACTIVITY-SELECTOR(s, f, m, j)
      else return the empty set

The recursive Activity Selector Algorithm: Cormen, Leiserson, Rivest, Stein, Fig. 16.1. The time is - fairly obviously - Theta(n).
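A direct Python transcription of the recursive selector (a sketch of mine; the fictitious activity a_0 with f_0 = 0 is supplied explicitly at index 0, and the initial call uses j = n + 1 = 12):

def recursive_activity_selector(s, f, i, j):
    """s, f sorted by nondecreasing finish time, with sentinel a_0 at
    index 0 (f[0] = 0). Returns the indices of the selected activities."""
    m = i + 1
    while m < j and s[m] < f[i]:    # find the first activity in S_ij
        m += 1
    if m < j:
        return [m] + recursive_activity_selector(s, f, m, j)
    return []

s = [None, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]   # index 0: sentinel a_0
f = [0,    4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(recursive_activity_selector(s, f, 0, 12))  # [1, 4, 8, 11]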

A Slightly Different Perspective. Rather than starting from a dynamic programming approach and moving to a greedy strategy, start by identifying the characteristics of the greedy approach:

1. Make a choice and leave a subproblem to solve.
2. Prove that the greedy choice is always safe - there is an optimal solution starting from a greedy choice.
3. Prove that, having made a greedy choice, the solution of the remaining subproblem can be added to the greedy choice, providing a solution to the original problem.

Greedy Choice Property. The choice that looks best in the current problem will work: no need to look ahead to its implications for the solution. This is important because it reduces the amount of computational complexity we have to deal with. We can't expect this to hold all the time: consider maximization of a function with multiple relative maxima on an interval - the gradient method would lead to a greedy choice, but may well lead to a relative maximum that is far from the actual maximum.

Optimal Substructure. While the greedy choice property tells us that we should be able to solve the problem with little computation, we need to know that the solution can be properly reconstructed: an optimal solution contains optimal sub-solutions to the subproblems. An induction can then be used to turn the construction around. In the case of a problem possessing only the Optimal Substructure property, there is little chance that we will be able to find methods more efficient than the dynamic programming (with memoization) methods.

Some Theoretical Foundations: unifying the individual methods. Matroids. A matroid is an ordered pair M = (S, I) such that:

1. S is a finite nonempty set.
2. I is a nonempty family of subsets of S, called the independent subsets of S, such that if B ∈ I and A ⊆ B, then A ∈ I. We say that I is hereditary if it satisfies this property. Note that ∅ is a member of I.
3. If A ∈ I and B ∈ I, and |A| < |B|, then there is an element x ∈ B - A such that A ∪ {x} ∈ I. We say that M satisfies the exchange property.

Ex.: the set of rows of a matrix, with linear independence as the notion of independence.
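To make the three axioms concrete, here is a tiny brute-force checker (my own sketch, exponential, for toy examples only). It verifies the hereditary and exchange properties for an explicitly listed family; the instance below is the uniform matroid whose independent sets are all subsets of {1, 2, 3} of size at most 2.

from itertools import combinations

def is_matroid(S, I):
    """Check the matroid axioms for a family I of subsets of S."""
    I = set(map(frozenset, I))
    if frozenset() not in I:       # a nonempty hereditary family contains the empty set
        return False
    for B in I:                    # hereditary: every subset of B is in I
        for r in range(len(B)):
            if any(frozenset(A) not in I for A in combinations(B, r)):
                return False
    for A in I:                    # exchange: some x in B - A extends A
        for B in I:
            if len(A) < len(B) and not any(A | {x} in I for x in B - A):
                return False
    return True

S = {1, 2, 3}
I = [c for r in range(3) for c in combinations(S, r)]
print(is_matroid(S, I))  # True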

Graphic Matroids: M_G = (S_G, I_G), where we start from an undirected graph G = (V, E). The set S_G is defined to be E, the set of edges of G. If A is a subset of E, then A ∈ I_G if and only if A is acyclic - i.e., a set of edges is independent if and only if the subgraph G_A = (V, A) forms a forest. We have to prove that the object so created (M_G) is, in fact, a matroid. The relevant theorem is:

Theorem. If G is an undirected graph, then M_G = (S_G, I_G) is a matroid.

Proof. S_G = E is finite. I_G is hereditary, since a subset of a forest is still a forest (removing edges cannot introduce cycles). We are left with showing that M_G satisfies the exchange property. Let G_A = (V, A) and G_B = (V, B) be forests of G, with |B| > |A| (A and B are acyclic sets of edges, with B containing more edges than A). Claim: a forest with k edges has exactly |V| - k trees. Proof of claim: start with a forest with no edges and add one edge at a time; each edge added reduces the number of trees by one. So G_A contains |V| - |A| trees, while G_B contains |V| - |B| trees: G_A contains more trees than G_B. G_B must therefore contain some tree T whose vertices lie in two different trees of G_A (everything is acyclic). Since T is connected, it must contain an edge (u, v) such that the vertices u and v are in different trees of G_A, and so (u, v) can be added to G_A without creating a cycle. But this is exactly the exchange property. All three properties are satisfied, and M_G is a matroid.

Def.: given a matroid M = (S, I), we call an element x ∉ A an extension of A ∈ I if A ∪ {x} ∈ I. Ex.: if A is an independent set of edges in M_G, the edge e is an extension of A if adding e to A does not introduce a cycle.

Def.: if A is an independent set in a matroid M, A is maximal if it has no extensions.

Theorem 16.6. All maximal independent subsets in a matroid have the same size. Proof: if not, and |A| < |B| with both maximal, then A can be extended by the exchange property. Contradiction.
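In a graphic matroid, the independence test is just an acyclicity check, which union-find performs efficiently; this is the oracle a greedy algorithm over M_G would call. A minimal sketch (mine, not from the slides), using the triangle as the example:

def is_independent(num_vertices, edge_set):
    """Graphic-matroid oracle: True iff edge_set is acyclic (a forest)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_set:
        ru, rv = find(u), find(v)
        if ru == rv:       # u, v already connected: (u, v) closes a cycle
            return False
        parent[ru] = rv
    return True

# Triangle on vertices 0, 1, 2: any two edges are independent, all three
# are not; e.g. edge (0, 2) is an extension of {(0, 1)} but not of
# {(0, 1), (1, 2)}.
print(is_independent(3, [(0, 1), (1, 2)]))          # True
print(is_independent(3, [(0, 1), (1, 2), (0, 2)]))  # False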

Ex.: Let M_G be a graphic matroid for a connected, undirected graph G. Every maximal independent subset of M_G must be a free tree with exactly |V| - 1 edges: a spanning tree of G.

Def.: a matroid is weighted if there is a strictly positive weight function on the elements of the matroid. It can be extended to sets of elements by summation: w(A) = Σ_{x ∈ A} w(x).

Ex.: the minimum spanning tree problem. We must find a subset of the edges that connects all the vertices and has minimum total length. How is this a matroid problem? Let M_G be a weighted matroid with weight function w'(e) = w_0 - w(e), where w(e) is the positive weight function on the edges and w_0 is a positive number larger than the weight of any edge. Each maximal independent subset A corresponds to a spanning tree, and since w'(A) = (|V| - 1)w_0 - w(A) for any such set, an independent set that maximizes w'(A) is one that minimizes w(A). The algorithm follows; S[M] denotes the elements, I[M] the independent sets.

GREEDY(M, w)
1. A := ∅
2. sort S[M] into nonincreasing order by weight w
3. for each x ∈ S[M], taken in nonincreasing order by weight w(x)
4.     do if A ∪ {x} ∈ I[M]
5.         then A := A ∪ {x}
6. return A

Running Time: let n = |S|. The sorting takes O(n lg n). Line 4 is executed n times, once for each element of S. Each execution requires a check that A ∪ {x} is independent, taking time, say, O(f(n)). Thus the total time is O(n lg n + n f(n)). Furthermore, the returned A is independent.

Lemma. (Matroids exhibit the greedy-choice property.) Suppose that M = (S, I) is a weighted matroid with weight function w and that S is sorted into nonincreasing order by weight. Let x be the first element of S such that {x} is independent, if any such x exists. If x exists, then there exists an optimal subset A of S that contains x.

Proof. If no such x exists, the only independent subset is the empty set, and we are done. Otherwise, let B be any nonempty optimal subset. If x ∈ B, let A = B, and we are done. If x ∉ B, note that no element of B has weight greater than w(x) (x is a heaviest independent element, and every element of B is independent by the hereditary property of I).
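Here is the generic GREEDY in Python (a sketch of mine), taking the matroid as a ground set plus an independence oracle. Run with the weight w'(e) = w_0 - w(e) over a graphic matroid, exactly as described above, it computes a minimum spanning tree; the four-edge instance below is hypothetical.

def matroid_greedy(S, w, independent):
    """GREEDY(M, w): scan elements in nonincreasing order of weight,
    keeping each one that preserves independence."""
    A = set()
    for x in sorted(S, key=w, reverse=True):
        if independent(A | {x}):     # the O(f(n)) independence check
            A.add(x)
    return A

def forest_oracle(num_vertices):
    """Independence oracle of the graphic matroid: acyclicity via union-find."""
    def independent(edges):
        parent = list(range(num_vertices))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v, _ in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False
            parent[ru] = rv
        return True
    return independent

edges = {(0, 1, 1), (0, 2, 4), (1, 2, 3), (2, 3, 2)}   # (u, v, weight)
w0 = 5                                                  # exceeds every edge weight
mst = matroid_greedy(edges, lambda e: w0 - e[2], forest_oracle(4))
print(sorted(mst))  # [(0, 1, 1), (1, 2, 3), (2, 3, 2)]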

Start with A = {x}. A is independent because {x} is. Using the exchange property, repeatedly find a new element of B that can be added to A while preserving the independence of A, until |A| = |B|. The construction gives that A = (B - {y}) ∪ {x} for some y ∈ B, and so w(A) = w(B) - w(y) + w(x) ≥ w(B). Since B is optimal, A must also be optimal, and since x ∈ A, the result follows.

Lemma. Let M = (S, I) be a matroid. If x is an element of S such that x is not an extension of ∅, then x is not an extension of any independent subset A of S.

Proof. The contrapositive: assume x is an extension of an independent subset A of S. Then A ∪ {x} is independent. By the hereditary property, {x} is independent, which automatically implies that x is an extension of ∅. Another way of stating this result is that any item that cannot be used right now can never be used in the future.

Lemma. (Matroids exhibit the optimal substructure property.) Let x be the first element of S chosen by GREEDY for the weighted matroid M = (S, I). The remaining problem of finding a maximum-weight independent subset containing x reduces to finding a maximum-weight independent subset of the weighted matroid M' = (S', I'), where

S' = {y ∈ S : {x, y} ∈ I},
I' = {B ⊆ S - {x} : B ∪ {x} ∈ I},

and the weight function for M' is the weight function for M, restricted to S'. (We call M' the contraction of M by the element x.)

Theorem. (Correctness of the greedy algorithm on matroids.) If M = (S, I) is a weighted matroid with weight function w, then the call GREEDY(M, w) returns an optimal subset.

Proof (sketch). By the greedy-choice lemma, the first element x selected by GREEDY belongs to some optimal subset, and by the preceding lemma every element rejected before x can never be used, so it is safely discarded. By the optimal substructure lemma, the rest of the run behaves exactly as GREEDY applied to the contraction M'; induction on |S| then gives an optimal subset of M', which together with {x} is an optimal subset of M.
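The contraction M' is easy to realize with an independence oracle. This sketch (mine, with a deliberately tiny hand-coded oracle) contracts the graphic matroid of a triangle by one edge; afterward, each remaining edge is independent in M' but the two together are not, since with the contracted edge they would form the cycle.

def contract(S, independent, x):
    """Contraction M' = (S', I'): S' = {y in S : {x, y} independent},
    and B is independent in M' iff B ∪ {x} is independent in M."""
    S_prime = {y for y in S if y != x and independent({x, y})}
    return S_prime, (lambda B: independent(set(B) | {x}))

def independent(edges):
    # Triangle on vertices 0, 1, 2: a subset of its edges is acyclic
    # exactly when it does not contain all three edges.
    return len(set(edges)) < 3

S = {(0, 1), (1, 2), (0, 2)}
S_prime, ind_prime = contract(S, independent, (0, 1))
print(sorted(S_prime))                                   # [(0, 2), (1, 2)]
print(ind_prime({(1, 2)}), ind_prime({(1, 2), (0, 2)}))  # True False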