Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06



Similar documents
Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Algorithm Design and Analysis

Approximation Algorithms

1 Approximating Set Cover

Load balancing of temporary tasks in the l p norm

11. APPROXIMATION ALGORITHMS

Applied Algorithm Design Lecture 5

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Distributed Load Balancing for Machines Fully Heterogeneous

Online Scheduling with Bounded Migration

Analysis of Approximation Algorithms for k-set Cover using Factor-Revealing Linear Programs

CSC2420 Spring 2015: Lecture 3

Duplicating and its Applications in Batch Scheduling

A Approximation Algorithm for a Generalization of the Weighted Edge-Dominating Set Problem

The Online Set Cover Problem

8.1 Min Degree Spanning Tree

Definition Given a graph G on n vertices, we define the following quantities:

Classification - Examples

Exponential time algorithms for graph coloring

Online and Offline Selling in Limit Order Markets

Approximability of Two-Machine No-Wait Flowshop Scheduling with Availability Constraints

Completion Time Scheduling and the WSRPT Algorithm

9th Max-Planck Advanced Course on the Foundations of Computer Science (ADFOCS) Primal-Dual Algorithms for Online Optimization: Lecture 1

Improved Algorithms for Data Migration

Connected Identifying Codes for Sensor Network Monitoring

Ronald Graham: Laying the Foundations of Online Optimization

Single machine parallel batch scheduling with unbounded capacity

Class constrained bin covering

2.1 Complexity Classes

Combinatorial Algorithms for Data Migration to Minimize Average Completion Time

Scheduling Shop Scheduling. Tim Nieberg

Graphical degree sequences and realizations

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

4.1 Introduction - Online Learning Model

JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Tutorial 8. NP-Complete Problems

Load Balancing. Load Balancing 1 / 24

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

The Relative Worst Order Ratio for On-Line Algorithms

Follow the Perturbed Leader

Determination of the normalization level of database schemas through equivalence classes of attributes

20 Selfish Load Balancing

Load Balancing and Rebalancing on Web Based Environment. Yu Zhang

A Constant Factor Approximation Algorithm for the Storage Allocation Problem

Weighted Sum Coloring in Batch Scheduling of Conflicting Jobs

R u t c o r Research R e p o r t. A Method to Schedule Both Transportation and Production at the Same Time in a Special FMS.

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar

Stiffie's On Line Scheduling Algorithm

Bicolored Shortest Paths in Graphs with Applications to Network Overlay Design

Diagonalization. Ahto Buldas. Lecture 3 of Complexity Theory October 8, Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach.

Frequency Capping in Online Advertising

Establishing a Mobile Conference Call Under Delay and Bandwidth Constraints

Introduction to Auction Design

Competitive Analysis of On line Randomized Call Control in Cellular Networks

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, Notes on Algebra

Matthias F.M. Stallmann Department of Computer Science North Carolina State University Raleigh, NC

Small Maximal Independent Sets and Faster Exact Graph Coloring

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances

Nan Kong, Andrew J. Schaefer. Department of Industrial Engineering, Univeristy of Pittsburgh, PA 15261, USA

Real-Time Scheduling (Part 1) (Working Draft) Real-Time System Example

Lecture 7: NP-Complete Problems

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

Near Optimal Solutions

The Load-Distance Balancing Problem

All trees contain a large induced subgraph having all degrees 1 (mod k)

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Lecture 6 Online and streaming algorithms for clustering

Introduction to Scheduling Theory

Lecture 4 Online and streaming algorithms for clustering

Private Approximation of Clustering and Vertex Cover

The Approximability of the Binary Paintshop Problem

Key words. multi-objective optimization, approximate Pareto set, bi-objective shortest path

ON THE COMPLEXITY OF THE GAME OF SET.

Single machine models: Maximum Lateness -12- Approximation ratio for EDD for problem 1 r j,d j < 0 L max. structure of a schedule Q...

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

Scheduling Real-time Tasks: Algorithms and Complexity

Transcription:

CS880: Approximations Algorithms Scribe: Matt Elder Lecturer: Shuchi Chawla Topic: Greedy Approximations: Set Cover and Min Makespan Date: 1/30/06 3.1 Set Cover The Set Cover problem is: Given a set of elements E = {e 1,e 2,...,e n } and a set of m subsets of E, S = {S 1,S 2,...,S n }, find a least cost collection C of sets from S such that C covers all elements in E. That is, Si CS i = E. Set Cover comes in two flavors, unweighted and weighted. In unweighted Set Cover, the cost of a collection C is number of sets contained in it. In weighted Set Cover, there is a nonnegative weight function w : S R, and the cost of C is defined to be its total weight, i.e., S i C w S i). First, we will deal with the unweighted Set Cover problem. The following algorithm is an extension of the greedy vertex cover algorithm that we discussed in Lecture 1. Algorithm 3.1.1 Set CoverE, S): 1. C. 2. While E contains elements not covered by C: a) Pick an element e E not covered by C. b) Add all sets S i containing e to C. To analyze Algorithm 3.1.1, we will need the following definition: Definition 3.1.2 A set E of elements in E is independent if, for all e 1,e 2 E, there is no S i C such that e 1,e 2 S i. Now, we shall determine how strong an approximation Algorithm 3.1.1 is. Say that the frequency of an element is the number of sets that contain that element. Let F denote the maximum frequency across all elements. Thus, F is the largest number of sets from S that we might add to our cover C at any step in the algorithm. It is clear that the elements selected by the algorithm form an independent set, so the algorithm selects no more than F E elements, where E is the set of elements picked in Step 2a. That is, ALG F E. Because every element is covered by some subset in an optimal set cover, we know that E OPT for any independent set E. Thus, ALG F OPT, and Algorithm 3.1.1 is therefore an F approximation. Theorem 3.1.3 Algorithm 3.1.1 is an F approximation to Set Cover. Algorithm 3.1.1 is a good approximation if F is guaranteed to be small. In general, however, there could be some element contained in every set of S, and Algorithm 3.1.1 would be a very poor approximation. So, we consider a different unweighted Set Cover approximation algorithm which uses the greedy strategy to yield a ln n approximation. 1

Algorithm 3.1.4 Set CoverE, S): 1. C. 2. While E contains elements not covered by C: a) Find the set S i containing the greatest number of uncovered elements. b) Add S i to C. Theorem 3.1.5 Algorithm 3.1.4 is a ln n OPT approximation. Proof: Let k = OPT, and let E t be the set of elements not yet covered after step i, with E 0 = E. OPT covers every E t with no more than k sets. ALG always picks the largest set over E t in step t + 1. The size of this largest set must cover at least E t /k in E t ; if it covered fewer elements, no way of picking sets would be able to cover E t in k sets, which contradicts the existence of OPT. So, E t+1 E t E t /k, and, inductively, E t n 1 1/k) t. When E t < 1, we know we are done, so we solve for this t: 1 1 ) t < 1 k n ) k t n < k 1 ln n t ln 1 + 1 ) k 1 t k ln n = OPTln n. Algorithm 3.1.4 finishes within OPTlnn steps, so it uses no more than that many sets. We can get a better analysis for this approximation by considering when E t < k, as follows: n 1 k) 1 t = k t k n 1 e t/k = k because 1 x)1/x 1 e e t/k = n k t = k ln n k. for all x). Thus, after k ln n k steps there remain only k elements. Each subsequent step removes at least one element, so ALG OPT ln n OPT + 1). Theorem 3.1.6 If all sets are of size B, then there exists a ln B + 1) approximation to unweighted Set Cover. Proof: If all sets have size no greater than B, then k n/b. So, B n/k, and Algorithm 3.1.4 gives a ln B + 1) approximation. 2

Now we extend Algorithm 3.1.4 to the weighted case. Here, instead of selecting sets by their number of uncovered elements, we select sets by the efficiency of their uncovered elements, or the number of uncovered elements per unit weight. Algorithm 3.1.7 Weighted Set CoverS, C, E, w): 1. C, and E E. 2. While E contains uncovered elements: a) s argmax X S X E /wx). b) C C s, S S \ {s}, and E E \ S. Algorithm 3.1.7 was first analyzed in [5]. Theorem 3.1.8 Algorithm 3.1.7 achieves a ln n approximation to Weighted Set Cover. Proof: For every picked set S j, define θ j as S j E /ws j ) at the time that S j was picked. For each element e, let S j be the first picked set that covers it, and define coste) = 1/θ j. Notice that coste) = ALG. e E Let us order the elments in the order that they were picked, breaking ties arbitrarily. At the time that the i th element call it e i ) was picked, E contained at least n i + 1 elements. At that point, the per-element cost of OPT is at most OPT/n i + 1). Thus, for at least one of the sets in OPT, we know that S E ws) n i + 1 OPT. Therefore, for the set S j picked by the algorithm, we have θ j n i + 1)/OPT. So, coste i ) OPT n i + 1. Over the execution of Algorithm 3.1.7, the value of i goes from n to 1. Thus, the total cost of each element that the algorithm removes is at most n i=1 OPT OPTln n. n i + 1 Thus, Algorithm 3.1.7 is a ln n approximation to Weighted Set Cover. The above analysis is tight, which we can see by the following example: 3

The dots are elements, and the loops represent the sets of S. Each set has weight 1. The optimal solution is to take the two long sets, with a total cost of 2. If Algorithm 3.1.7 instead selects the leftmost thick set at first, then it will take at least 5 sets. This example generalizes to a family of examples each with 2 k elements, and shows that no analysis of Algorithm 3.1.7 will make it better than a Oln n) approximation. A ln n approximation to Set Cover can also be obtained by other techniques, including LP-rounding. However, Feige showed that no improvement, even by a constant factor, is likely: Theorem 3.1.9 There is no 1 ɛ) ln n approximation to Weighted Set Cover unless NP DTIMEn log log n ). [1] 3.2 Min Makespan Scheduling The Min Makespan Problem is: given n jobs to schedule on m machines, where job i has size s i, schedule the jobs to minimize their makespan. Definition 3.2.1 The makespan of a schedule is the earliest time when all machines have stopped doing work. This problem is NP-hard, as can be seen by a reduction from Partition. The following algorithm due to Ron Graham yields a 2 approximation. Algorithm 3.2.2 Graham s List Scheduling) [2] Given a set of n jobs and a set of m empty machine queues, 1. Order the jobs arbitrarily. 2. Until the job list is empty, move the next job in the list to the end of the shortest machine queue. Theorem 3.2.3 Graham s List Scheduling is a 2 approximation. Proof: Let S j denote the size of job j. Suppose job i is the last job to finish in a Graham s List schedule, and let t i be the time it starts. When job i was placed, its queue was no longer than any other queue, so every queue is full until t i. Thus, ALG = S i + t i S i + P n j=1 S j) S i 1 m = m n j=1 S j + 1 1/m)S i. It s easy to see that S i OPT and that 1 m n j=1 S j OPT. So, we conclude that ALG 2 1/m)OPT, which yields a 2 approximation. This analysis is tight. Suppose that after the jobs are arbitrarily ordered, the job list contains mm 1) unit-length jobs, followed by one m-length job. The algorithm yields a schedule completing in 2m 1 units while the optimal schedule has length m. This algorithm can be improved. For example, by ordering the job list by increasing duration instead of arbitrarily, we get a 4/3) approximation, a result proved in [3]. Also, this problem has a polytime approximation scheme PTAS), given in [4]. However, a notable property of Algorithm 3.2.2 is that it is an online algorithm, i.e., even if the jobs arrive one after another, and we have no information about what jobs may arrive in the furture, we can still use this algorithm to obtain a 2 approximation. 4

References [1] Uriel Feige. A Threshold of lnn for Approximating Set Cover. In J. ACM 454), pp 634-652. 1998) [2] Graham, R. Bounds for Certain Multiprocessing Anomalies. In Bell System Tech. J., 45, pp 1563-1581. 1966) [3] Ronald L. Graham. Bounds on Multiprocessing Timing Anomalies. In SIAM Journal of Applied Mathematics, 172), pp 416-429. 1969) [4] Dorit S. Hochbaum, David B. Shmoys. A Polynomial Approximation Scheme for Scheduling on Uniform Processors: Using the Dual Approximation Approach. In SIAM J. Comput. 173), pp 539-551. 1988) [5] D. S. Johnson. Approximation Algorithms for Combinatorial Problems. In Journal of Computer and System Sciences, 9, pp 256-278. 1974) Preliminary version in Proc. of the 5th Ann. ACM Symp. on Theory of Computing, pp 36-49. 1973) 5