Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three desired features Solve problem to optimality Solve problem in poly-time Solve arbitrary instances of the problem -approximation algorithm Guaranteed to run in poly-time Guaranteed to solve arbitrary instance of the problem Guaranteed to find solution within ratio of true optimum Slides by Kevin Wayne Copyright @ 005 Pearson-Addison Wesley All rights reserved Challenge Need to prove a solution's value is close to optimum, without even knowing what optimum value is Load Balancing Load Balancing Input m identical machines; n jobs, job j has processing time t j Job j must run contiguously on one machine A machine can process at most one job at a time Def Let J(i) be the subset of jobs assigned to machine i The load of machine i is L i = " j # J(i) t j Def The makespan is the maximum load on any machine L = max i L i Load balancing Assign each job to a machine to minimize makespan
Load Balancing: List Scheduling Load Balancing: List Scheduling Analysis List-scheduling algorithm Consider n jobs in some fixed order Assign job j to machine whose load is smallest so far Theorem [Graham, 966] Greedy algorithm is a -approximation First worst-case analysis of an approximation algorithm Need to compare resulting solution with optimal makespan L* List-Scheduling(m, n, t,t,,t n ) { for i = to m { L i % 0 load on machine i J(i) % & jobs assigned to machine i for j = to n { i = argmin k L k J(i) % J(i) ' {j L i % L i + t j machine i has smallest load assign job j to machine i update load of machine i Lemma The optimal makespan L* $ max j t j Pf Some machine must process the most time-consuming job Lemma The optimal makespan Pf The total processing time is " j t j L * " m # j t j One of m machines must do at least a /m fraction of total work Implementation O(n log n) using a priority queue 5 6 Load Balancing: List Scheduling Analysis Load Balancing: List Scheduling Analysis Theorem Greedy algorithm is a -approximation Pf Consider load L i of bottleneck machine i Let j be last job scheduled on machine i When job j assigned to machine i, i had smallest load Its load before assignment is L i - t j ( L i - t j ) L k for all ) k ) m Theorem Greedy algorithm is a -approximation Pf Consider load L i of bottleneck machine i Let j be last job scheduled on machine i When job j assigned to machine i, i had smallest load Its load before assignment is L i - t j ( L i - t j ) L k for all ) k ) m Sum inequalities over all k and divide by m: blue jobs scheduled before j Lemma L i " t j # m $ k L k = m $ k t k # L * machine i j Now L i = (L i " t j ) 3 + t j # L * { # L* # L* Lemma 0 L i - t j L = L i 7 8
Load Balancing: List Scheduling Analysis Load Balancing: List Scheduling Analysis Q Is our analysis tight? A Essentially yes Ex: m machines, m(m-) jobs length jobs, one job of length m Q Is our analysis tight? A Essentially yes Ex: m machines, m(m-) jobs length jobs, one job of length m m = 0 machine idle machine 3 idle machine idle machine 5 idle machine 6 idle machine 7 idle machine 8 idle machine 9 idle machine 0 idle m = 0 list scheduling makespan = 9 optimal makespan = 0 9 0 Load Balancing: LPT Rule Load Balancing: LPT Rule Longest processing time (LPT) Sort n jobs in descending order of processing time, and then run list scheduling algorithm Observation If at most m jobs, then list-scheduling is optimal Pf Each job put on its own machine LPT-List-Scheduling(m, n, t,t,,t n ) { Sort jobs so that t t t n for i = to m { L i % 0 J(i) % & for j = to n { i = argmin k L k J(i) % J(i) ' {j L i % L i + t j load on machine i jobs assigned to machine i machine i has smallest load assign job j to machine i update load of machine i Lemma 3 If there are more than m jobs, L* $ t m+ Pf Consider first m+ jobs t,, t m+ Since the t i 's are in descending order, each takes at least t m+ time There are m+ jobs and m machines, so by pigeonhole principle, at least one machine gets two jobs Theorem LPT rule is a 3/ approximation algorithm Pf Same basic approach as for list scheduling L i = (L i " t j ) 3 + t j { # 3 L * # L* # L* Lemma 3 ( by observation, can assume number of jobs > m )
Load Balancing: LPT Rule Q Is our 3/ analysis tight? A No Center Selection Theorem [Graham, 969] LPT rule is a /3-approximation Pf More sophisticated analysis of same algorithm Q Is Graham's /3 analysis tight? A Essentially yes Ex: m machines, n = m+ jobs, jobs of length m+, m+,, m- and one job of length m 3 Center Selection Problem Center Selection Problem Input Set of n sites s,, s n Center selection problem Select k centers C so that maximum distance from a site to nearest center is minimized Input Set of n sites s,, s n Center selection problem Select k centers C so that maximum distance from a site to nearest center is minimized r(c) k = Notation dist(x, y) = distance between x and y dist(s i, C) = min c # C dist(s i, c) = distance from s i to closest center r(c) = max i dist(s i, C) = smallest covering radius Goal Find set of centers C that minimizes r(c), subject to C = k center site Distance function properties dist(x, x) = 0 (identity) dist(x, y) = dist(y, x) (symmetry) dist(x, y) ) dist(x, z) + dist(z, y) (triangle inequality) 5 6
Center Selection Example Greedy Algorithm: A False Start Ex: each site is a point in the plane, a center can be any point in the plane, dist(x, y) = Euclidean distance Remark: search can be infinite Greedy algorithm Put the first center at the best possible location for a single center, and then keep adding centers so as to reduce the covering radius each time by as much as possible Remark: arbitrarily bad r(c) greedy center k = centers center site center site 7 8 Center Selection: Greedy Algorithm Center Selection: Analysis of Greedy Algorithm Greedy algorithm Repeatedly choose the next center to be the site farthest from any existing center Greedy-Center-Selection(k, n, s,s,,s n ) { C = & repeat k times { Select a site s i with maximum dist(s i, C) Add s i to C site farthest from any center return C Theorem Let C* be an optimal set of centers Then r(c) ) r(c*) Pf (by contradiction) Assume r(c*) < r(c) For each site c i in C, consider ball of radius r(c) around it Exactly one c i * in each ball; let c i be the site paired with c i * Consider any site s and its closest center c i * in C* dist(s, C) ) dist(s, c i ) ) dist(s, c i *) + dist(c i *, c i ) ) r(c*) Thus r(c) ) r(c*) r(c) *-inequality ) r(c*) since c i * is closest center r(c) Observation Upon termination all centers in C are pairwise at least r(c) apart Pf By construction of algorithm C* sites r(c) s c i c i * 9 0
Center Selection Theorem Let C* be an optimal set of centers Then r(c) ) r(c*) Theorem Greedy algorithm is a -approximation for center selection problem The Pricing Method: Vertex Cover Remark Greedy algorithm always places centers at sites, but is still within a factor of of best solution that is allowed to place centers anywhere eg, points in the plane Question Is there hope of a 3/-approximation? /3? Theorem Unless P = NP, there no -approximation for center-selection problem for any < Weighted Vertex Cover Weighted Vertex Cover Weighted vertex cover Given a graph G with vertex weights, find a vertex cover of minimum weight Pricing method Each edge must be covered by some vertex i Edge e pays price p e $ 0 to use vertex i Fairness Edges incident to vertex i should pay ) w i in total for each vertex i : " p e w e= ( i, j) i 9 9 weight = + + weight = 9 9 Claim For any vertex cover S and any fair prices p e : + e p e ) w(s) Proof " pe " " pe " wi = w( S) e# E i# S e= ( i, j) i# S each edge e covered by at least one node in S sum fairness inequalities for each node in S 3
Pricing Method Pricing Method Pricing method Set prices and find vertex cover simultaneously Weighted-Vertex-Cover-Approx(G, w) { foreach e in E p e = 0 p = w e e = ( i, j) i while (- edge i-j such that neither i nor j are tight) select such an edge e increase p e without violating fairness S % set of all tight nodes return S price of edge a-b vertex weight Figure 8 5 6 Pricing Method: Analysis Theorem Pricing method is a -approximation Pf Algorithm terminates since at least one new node becomes tight after each iteration of while loop 6 LP Rounding: Vertex Cover Let S = set of all tight nodes upon termination of algorithm S is a vertex cover: if some edge i-j is uncovered, then neither i nor j is tight But then while loop would not terminate Let S* be optimal vertex cover We show w(s) ) w(s*) w(s) = # w i = # # p e $ i" S i" S e=(i, j) # # p e = # p e $ w(s*) i"v e=(i, j) e" E all nodes in S are tight S, V, prices $ 0 each edge counted twice fairness lemma 7
Weighted Vertex Cover Weighted Vertex Cover: IP Formulation Weighted vertex cover Given an undirected graph G = (V, E) with vertex weights w i $ 0, find a minimum weight subset of nodes S such that every edge is incident to at least one vertex in S Weighted vertex cover Given an undirected graph G = (V, E) with vertex weights w i $ 0, find a minimum weight subset of nodes S such that every edge is incident to at least one vertex in S 0 A 6F 9 Integer programming formulation Model inclusion of each vertex i using a 0/ variable x i 6 B 7G 0 x i = " # $ 0 if vertex i is not in vertex cover if vertex i is in vertex cover 6 3 3C D H I 9 33 Vertex covers in - correspondence with 0/ assignments: S = {i # V : x i = Objective function: maximize " i w i x i 7 E total weight = 55 0 J 3 Must take either i or j: x i + x j $ 9 30 Weighted Vertex Cover: IP Formulation Integer Programming Weighted vertex cover Integer programming formulation INTEGER-PROGRAMMING Given integers a ij and b i, find integers x j that satisfy: (ILP) min # w i x i i " V s t x i + x j $ (i, j) " E x i " {0, i " V max c t x s t Ax " b x integral n " a ij x j # b i $ i $ m j= x j # 0 $ j $ n x j integral $ j $ n Observation If x* is optimal solution to (ILP), then S = {i # V : x* i = is a min weight vertex cover Observation Vertex cover formulation proves that integer programming is NP-hard search problem even if all coefficients are 0/ and at most two variables per inequality 3 3
Linear Programming LP Feasible Region Linear programming Max/min linear objective function subject to linear inequalities Input: integers c j, b i, a ij Output: real numbers x j (P) max c t x s t Ax " b x " 0 (P) max " c j x j n j= n s t " a ij x j # b i $ i $ m j= x j # 0 $ j $ n LP geometry in D x = 0 Linear No x, xy, arccos(x), x(-x), etc Simplex algorithm [Dantzig 97] Can solve LP in practice Ellipsoid algorithm [Khachian 979] Can solve LP in poly-time x = 0 x + x = 6 x + x = 6 33 3 Weighted Vertex Cover: LP Relaxation Weighted Vertex Cover Weighted vertex cover Linear programming formulation (LP) min # w i x i i " V s t x i + x j $ (i, j) " E x i $ 0 i " V Theorem If x* is optimal solution to (LP), then S = {i # V : x* i $ is a vertex cover whose weight is at most twice the min possible weight Pf [S is a vertex cover] Consider an edge (i, j) # E Since x* i + x* j $, either x* i $ or x* j $ ( (i, j) covered Observation Optimal value of (LP) is ) optimal value of (ILP) Pf LP has fewer constraints Note LP is not equivalent to vertex cover Pf [S has desired cost] Let S* be optimal vertex cover Then w i i " S* * # $ # w i x i $ w # i i " S LP is a relaxation x* i $ i " S Q How can solving LP help us find a small vertex cover? A Solve LP and round fractional values 35 36
Weighted Vertex Cover Theorem -approximation algorithm for weighted vertex cover * 7 Load Balancing Reloaded Theorem [Dinur-Safra 00] If P NP, then no -approximation for < 3607, even with unit weights 0 /5 - Open research problem Close the gap 37 Generalized Load Balancing Generalized Load Balancing: Integer Linear Program and Relaxation Input Set of m machines M; set of n jobs J Job j must run contiguously on an authorized machine in M j, M Job j has processing time t j Each machine can process at most one job at a time Def Let J(i) be the subset of jobs assigned to machine i The load of machine i is L i = " j # J(i) t j Def The makespan is the maximum load on any machine = max i L i Generalized load balancing Assign each job to an authorized machine to minimize makespan ILP formulation x ij = time machine i spends processing job j LP relaxation (IP) min (LP) min L s t " x i j = t j for all j # J i " $ L for all i # M x i j j x i j # {0, t j for all j # J and i # M j x i j = 0 for all j # J and i % M j L s t " x i j = t j for all j # J i " $ L for all i # M x i j j x i j % 0 for all j # J and i # M j x i j = 0 for all j # J and i & M j 39 0
Generalized Load Balancing: Lower Bounds Generalized Load Balancing: Structure of LP Solution Lemma Let L be the optimal value to the LP Then, the optimal makespan L* $ L Pf LP has fewer constraints than IP formulation Lemma The optimal makespan L* $ max j t j Pf Some machine must process the most time-consuming job Lemma 3 Let x be solution to LP Let G(x) be the graph with an edge from machine i to job j if x ij > 0 Then G(x) is acyclic Pf (deferred) can transform x into another LP solution where G(x) is acyclic if LP solver doesn't return such an x x ij > 0 G(x) acyclic job machine G(x) cyclic Generalized Load Balancing: Rounding Generalized Load Balancing: Analysis Rounded solution Find LP solution x where G(x) is a forest Root forest G(x) at some arbitrary machine node r If job j is a leaf node, assign j to its parent machine i If job j is not a leaf node, assign j to one of its children Lemma Rounded solution only assigns jobs to authorized machines Pf If job j is assigned to machine i, then x ij > 0 LP solution can only assign positive value to authorized machines Lemma 5 If job j is a leaf node and machine i = parent(j), then x ij = t j Pf Since i is a leaf, x ij = 0 for all j parent(i) LP constraint guarantees " i x ij = t j Lemma 6 At most one non-leaf job is assigned to a machine Pf The only possible non-leaf job assigned to machine i is parent(i) job machine job machine 3
Generalized Load Balancing: Analysis Generalized Load Balancing: Flow Formulation Theorem Rounded solution is a -approximation Pf Let J(i) be the jobs assigned to machine i By Lemma 6, the load L i on machine i has two components: leaf nodes t j j " J(i) j is a leaf Lemma 5 # = # x ij $ # x ij $ L $ L * j " J(i) j is a leaf Lemma j " J LP Lemma (LP is a relaxation) optimal value of LP Flow formulation of LP x i j i x i j j " = t j for all j # J " $ L for all i # M x i j % 0 for all j # J and i # M j x i j = 0 for all j # J and i & M j 0 parent(i) t parent(i) " L * Thus, the overall load L i ) L* Observation Solution to feasible flow problem with value L are in oneto-one correspondence with LP solutions of value L 5 6 Generalized Load Balancing: Structure of Solution Conclusions Lemma 3 Let (x, L) be solution to LP Let G(x) be the graph with an edge from machine i to job j if x ij > 0 We can find another solution (x', L) such that G(x') is acyclic Pf Let C be a cycle in G(x) Augment flow along the cycle C At least one edge from C is removed (and none are added) Repeat until G(x') is acyclic 3 3 flow conservation maintained 3 3 Running time The bottleneck operation in our -approximation is solving one LP with mn + variables Remark Can solve LP using flow techniques on a graph with m+n+ nodes: given L, find feasible flow if it exists Binary search to find L* Extensions: unrelated parallel machines [Lenstra-Shmoys-Tardos 990] Job j takes t ij time if processed on machine i -approximation algorithm via LP rounding No 3/-approximation algorithm unless P = NP 6 6 3 5 5 3 G(x) augment along C G(x') 7 8
Polynomial Time Approximation Scheme 8 Knapsack Problem PTAS ( + )-approximation algorithm for any constant > 0 Load balancing [Hochbaum-Shmoys 987] Euclidean TSP [Arora 996] Consequence PTAS produces arbitrarily high quality solution, but trades off accuracy for time This section PTAS for knapsack problem via rounding and scaling 50 Knapsack Problem Knapsack is NP-Complete Knapsack problem Given n objects and a "knapsack" Item i has value v i > 0 and weighs w i > 0 Knapsack can carry weight up to W we'll assume w i ) W Goal: fill knapsack so as to maximize total value Ex: { 3, has value 0 Item Value 6 Weight W = 3 5 8 8 5 6 7 KNAPSACK: Given a finite set X, nonnegative weights w i, nonnegative values v i, a weight limit W, and a target value V, is there a subset S, X such that: # $ W w i i"s v i i"s # % V SUBSET-SUM: Given a finite set X, nonnegative values u i, and an integer U, is there a subset S, X whose elements sum to exactly U? Claim SUBSET-SUM ) P KNAPSACK Pf Given instance (u,, u n, U) of SUBSET-SUM, create KNAPSACK instance: v i = w i = u i # u i $ U i"s V = W = U # u i % U i"s 5 5
Knapsack Problem: Dynamic Programming Knapsack Problem: Dynamic Programming II Def OPT(i, w) = max value subset of items,, i with weight limit w Case : OPT does not select item i OPT selects best of,, i using up to weight limit w Case : OPT selects item i new weight limit = w w i OPT selects best of,, i using up to weight limit w w i # 0 if i = 0 % OPT(i, w) = $ OPT(i ", w) if w i > w % & max{ OPT(i ", w), v i + OPT(i ", w " w i ) otherwise Running time O(n W) W = weight limit Not polynomial in input size Def OPT(i, v) = min weight subset of items,, i that yields value exactly v Case : OPT does not select item i OPT selects best of,, i- that achieves exactly value v Case : OPT selects item i consumes weight w i, new value needed = v v i OPT selects best of,, i- that achieves exactly value v $ 0 if v = 0 & " if i = 0, v > 0 OPT(i, v) = % & OPT(i #, v) if v i > v '& min{ OPT(i #, v), w i + OPT(i #, v # v i ) otherwise V* ) n v max Running time O(n V*) = O(n v max ) V* = optimal value = maximum v such that OPT(n, v) ) W Not polynomial in input size 53 5 Knapsack: FPTAS Intuition for approximation algorithm Round all values up to lie in smaller range Run dynamic programming algorithm on rounded instance Return optimal items in rounded instance Knapsack: FPTAS Knapsack FPTAS Round up all values: v max = largest value in original instance = precision parameter = scaling factor = v max / n # v i = v i % $ " & ", v ˆ # i = v i % $ " & Item Value Weight 3, 656,3 3,80,03 5,7,800 6 5 8,33,99 7 W = original instance Item Value Weight 7 3 9 5 3 6 5 9 7 W = rounded instance Observation Optimal solution to problems with or are equivalent Intuition close to v so optimal solution using is nearly optimal; small and integral so dynamic programming algorithm is fast v ˆ v Running time O(n 3 / ) Dynamic program II running time is O(n v ˆ max ), where v ˆ max = # v max $ " % & = # n % $ ' & v v v ˆ 55 56
Knapsack: FPTAS Knapsack FPTAS Round up all values: # v i = v i % $ " & " Extra Slides Theorem If S is solution found by our algorithm and S* is any other feasible solution then (+") % v i # % v i i $ S i $ S* Pf Let S* be any feasible solution satisfying weight constraint v i i " S* # $ # v i i " S* $ # v i i " S $ # (v i + %) i " S always round up solve rounded instance optimally never round up by more than $ # v i + n% i" S S ) n DP alg can take v max $ (+&) # v i i" S n = v max, v max ) " i#s v i 57 Load Balancing on Machines Center Selection: Hardness of Approximation Claim Load balancing is hard even if only machines Pf NUMBER-PARTITIONING ) P LOAD-BALANCE Theorem Unless P = NP, there is no -approximation algorithm for metric k-center problem for any < machine machine e a NP-complete by Exercise 86 b a d Machine f b c Machine e g f length of job f c g d yes Pf We show how we could use a ( - ) approximation algorithm for k- center to solve DOMINATING-SET in poly-time Let G = (V, E), k be an instance of DOMINATING-SET Construct instance G' of k-center with sites V and distances d(u, v) = if (u, v) # E d(u, v) = if (u, v) 3 E Note that G' satisfies the triangle inequality Claim: G has dominating set of size k iff there exists k centers C* with r(c*) = see Exercise 89 Thus, if G has a dominating set of size k, a ( - )-approximation algorithm on G' must find a solution C* with r(c*) = since it cannot use any edge of distance 0 Time L 59 60