Dynamic programming
Doctoral course "Optimization on graphs", Lecture 1
Giovanni Righini
Implicit enumeration
Combinatorial optimization problems are in general NP-hard, and we usually resort to implicit enumeration to solve them to optimality (this is also useful for approximation purposes). The two main methods for implicit enumeration are branch-and-bound and dynamic programming. Dynamic programming is also an algorithmic framework for solving polynomial-time problems efficiently, and it allows one to devise pseudo-polynomial-time algorithms as well as approximation algorithms.
An introductory example
Consider the problem of finding a shortest path from node 0 to node 9 on this graph, which is directed, acyclic and layered.
[Figure: a layered directed acyclic graph on nodes 0 to 9 with weighted arcs.]
A greedy algorithm starting from node 0 would produce a suboptimal path; a greedy algorithm working backward from node 9 would also produce a suboptimal path. The optimal path costs less than either greedy path.
Bellman's optimality principle (1950s)
An optimal policy is made of a set of optimal sub-policies. By policy we mean a sequence of decisions, i.e., of value assignments to the variables. A sub-policy is a sub-sequence of decisions, i.e., of value assignments to a subset of the variables.
Richard E. Bellman (New York, 1920-1984)
The example revisited
[Figure: the same layered graph.]
A decision must be taken every time the path is extended from one layer to the next. A policy is a path from 0 to 9. A sub-policy is any other path.
The example revisited
[Figure: the same layered graph.]
The optimal policy is made of pairs of optimal sub-policies of the form (0, i) and (i, 9). We only need to store optimal sub-policies; all the other sub-policies can be disregarded.
Dominance
Given two sub-policies S' and S'', S' dominates S'' only if:
- all sub-policies that can be appended to S'' can also be appended to S', with no greater cost;
- the cost of S' is less than the cost of S''.
When this occurs, S'' can be removed from further consideration: it cannot be part of an optimal policy.
The example revisited
[Figure: the same layered graph.]
In our example, all sub-policies leading to the same node (paths of the form (0, i)) can be completed by appending to them the same sub-policies (paths of the form (i, 9)). Hence we only need to store an optimal one among them: all the others are dominated.
The example finally solved
[Figure: the layered graph with the optimal label computed at each node.]
Let $\mathcal{L} = \{0, \dots, L\}$ be the set of layers, let $N_l$ be the subset of nodes in layer $l \in \mathcal{L}$, and let $w$ be the weight function on the arcs of the graph.
$c(0) = 0$
$c(j) = \min_{i \in N_{l-1}} \{c(i) + w_{ij}\} \quad \forall l \in \mathcal{L}, l \geq 1, \forall j \in N_l$
Terminology and correspondence
Nodes of the graph are states. Arcs of the graph are state transitions. Paths in the graph are (sub-)policies. The values c(i) (costs of shortest paths from 0 to i) are labels. The solution process applies these two rules:
Initialization (recursion base): $c(0) = 0$.
Label extension (recursive step): $c(j) = \min_{i \in N_{l-1}} \{c(i) + w_{ij}\} \quad \forall l \in \mathcal{L}, l \geq 1, \forall j \in N_l$.
It resembles recursive programming, but it proceeds bottom-up instead of top-down. The execution of a dynamic programming algorithm resembles the evolution of a discrete dynamic system (automaton). The recursive step requires solving a very easy optimization problem.
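To make the two rules concrete, here is a minimal Python sketch of the bottom-up recursion on a layered graph. The data structures (a list of layers and a dictionary of arc weights) are my own choice, not taken from the lecture.

```python
# A minimal sketch of the initialization and label-extension rules above.

INF = float("inf")

def shortest_path_layered(layers, w):
    """layers: list of lists of nodes, with layers[0] == [source].
    w: dict mapping (i, j) to the weight of arc (i, j).
    Returns the labels c, where c[j] is the cost of a shortest 0-j path."""
    c = {layers[0][0]: 0}                          # initialization: c(0) = 0
    for l in range(1, len(layers)):                # bottom-up, layer by layer
        for j in layers[l]:                        # label extension
            c[j] = min((c[i] + w[i, j] for i in layers[l - 1] if (i, j) in w),
                       default=INF)
    return c

# Tiny 3-layer instance: 0 -> {1, 2} -> 3.
layers = [[0], [1, 2], [3]]
w = {(0, 1): 5, (0, 2): 2, (1, 3): 1, (2, 3): 7}
print(shortest_path_layered(layers, w))            # {0: 0, 1: 5, 2: 2, 3: 6}
```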
Another example
Consider the problem of finding a shortest Hamiltonian path from node 0 to node 5 on this graph, which is directed but cyclic.
[Figure: a directed cyclic graph on nodes 0 to 5.]
A policy (i.e., a path from 0 to 5) is feasible if and only if it visits all nodes exactly once.
Dominance?
[Figure: the same directed cyclic graph.]
Reaching the same node is no longer sufficient for a path (sub-policy) to dominate another: the two paths do not reach the same state.
Dominance
[Figure: the same directed cyclic graph.]
Only if they have also visited the same subset of nodes do they reach the same state.
Initialization: $c(0, \{0\}) = 0$
Extension: $c(j, S) = \min_{i \in S \setminus \{j\}} \{c(i, S \setminus \{j\}) + w_{ij}\} \quad \forall j \neq 0$
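The recursion over states (S, j) can be implemented by encoding S as a bitmask, in the style of the classical Held-Karp scheme. The sketch below follows that encoding; the names and data layout are assumptions of mine.

```python
# A sketch of the recursion above with the visited subset S as a bitmask.

INF = float("inf")

def shortest_hamiltonian_path(n, w, source=0, target=None):
    """n: nodes are 0..n-1. w[i][j]: arc weight, INF if the arc is missing.
    Returns the cost of a shortest Hamiltonian source-target path."""
    if target is None:
        target = n - 1
    full = (1 << n) - 1
    # c[(S, j)]: cost of a cheapest path from source visiting exactly the
    # nodes in bitmask S and ending at j.
    c = {(1 << source, source): 0}                 # c({0}, 0) = 0
    for S in range(1 << n):                        # subsets in increasing order
        for j in range(n):
            if (S, j) not in c:
                continue
            for k in range(n):                     # extend to an unvisited k
                if S & (1 << k):
                    continue
                cand = c[(S, j)] + w[j][k]
                T = S | (1 << k)
                if cand < c.get((T, k), INF):
                    c[(T, k)] = cand
    return c.get((full, target), INF)

# 4-node instance: the Hamiltonian 0-3 paths are 0-1-2-3 (cost 8)
# and 0-2-1-3 (cost 13).
w = [[INF, 1, 4, INF],
     [INF, INF, 2, 6],
     [INF, 3, INF, 5],
     [INF, INF, INF, INF]]
print(shortest_hamiltonian_path(4, w))             # 8
```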
Complexity
The two D.P. algorithms have different complexity.
Ex. 1: number of states = number of distinct values of j = number of nodes in the graph.
State: (i) [last reached node]. Initialization: $c(0) = 0$. Extension: $c(j) = \min_{i \in N_{l-1}} \{c(i) + w_{ij}\} \quad \forall l \geq 1, \forall j \in N_l$.
Ex. 2: number of states = number of distinct values of (j, S) = exponential in the number of nodes of the graph.
State: (S, i) [visited subset, last reached node]. Initialization: $c(0, \{0\}) = 0$. Extension: $c(j, S) = \min_{i \in S \setminus \{j\}} \{c(i, S \setminus \{j\}) + w_{ij}\} \quad \forall j \neq 0$.
This is equivalent to solving the same problem as in Example 1, but on a larger state graph (one node for each distinct value of (j, S)). The complexity of the D.P. algorithm depends on the number of arcs in the state graph.
Comparison with branch-and-bound
[Figure: a B&B enumeration tree of partial assignments next to a D.P. state graph over the same assignments.]
B&B: sub-policies only diverge. D.P.: sub-policies sometimes converge. In both cases the graphs are directed, acyclic and layered, but the former is an arborescence, while the latter may not be.
Dynamic Programming in three steps
In order to design a D.P. algorithm we need to:
1. put the decisions (i.e., the variables) in a sequence (policy);
2. define the state, i.e., the amount of information needed to transform a sub-policy into a complete policy;
3. find the recursive extension function (REF) to compute the labels of the states, following the sequence.
The state of a dynamic system at time t summarizes the past history of the system up to time t, so that the evolution of the system after t depends on the state at time t but not on how it has been reached. Solving a problem with D.P. amounts to defining a suitable state-transition graph, on which we search for an optimal path. This graph is directed, acyclic and layered. The number of its nodes and arcs determines the complexity of the D.P. algorithm.
Shortest path on a digraph
Consider the problem of finding a shortest path from node 0 to node 5 on the directed and cyclic graph of the previous example, assuming there are no cycles with negative cost.
[Figure: the same directed cyclic graph.]
Shortest path on a digraph
[Figure: the same directed cyclic graph.]
We cannot set the node labels once and for all: we need to correct them every time we discover a better path to the same node. However, no optimal path can take more than N - 1 arcs (in this case, 5 arcs). The labels we compute at each extension are optimal for each number of arcs of the path.
Shortest path on a digraph
We reformulate the problem on a directed, acyclic and layered graph, with N layers.
[Figure: the layered reformulation, with a copy of every node in each layer.]
State: (l, i) [number of extensions, last reached node].
Initialization: $c(0, 0) = 0$.
Extension: $c(l, j) = \min_{i \in N} \{c(l-1, i) + w_{ij}\} \quad \forall l \geq 1, \forall j \in N$.
Bellman-Ford algorithm (1958)
The label-correcting algorithm obtained in this way is known as the Bellman-Ford algorithm.
Time complexity: $O(N^3)$, as the number of arcs in the layered graph. Space complexity: $O(N^2)$, to represent the graph. For each node in N we need to store a cost and a predecessor.
No label can be considered permanent until either all layers have been labeled, or no label update has occurred from one layer to the next.
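A minimal sketch of this label-correcting scheme, assuming the graph is given as a list of (i, j, w_ij) triples (my encoding). Iteration l of the outer loop computes the labels of layer l, i.e., the best costs using at most l arcs.

```python
# A sketch of the Bellman-Ford label-correcting scheme above.

INF = float("inf")

def bellman_ford(n, arcs, source=0):
    """Assumes no negative cycle. Returns shortest-path costs from source."""
    c = [INF] * n
    c[source] = 0
    for _ in range(n - 1):                  # at most N-1 arcs in any path
        updated = False
        for i, j, wij in arcs:              # extend every label on every arc
            if c[i] + wij < c[j]:
                c[j] = c[i] + wij
                updated = True
        if not updated:                     # stop criterion: no update from
            break                           # one layer to the next
    return c

arcs = [(0, 1, 4), (0, 2, 1), (2, 1, -2), (1, 3, 3), (2, 3, 5)]
print(bellman_ford(4, arcs))                # [0, -1, 1, 2]
```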
Dijkstra algorithm (1959)
Under the hypothesis that arc weights are non-negative, we can permanently set the label of minimum cost in each layer. The label-setting algorithm obtained in this way is known as the Dijkstra algorithm.
[Figure: the layered graph, with the minimum-cost label made permanent in each layer.]
Dijkstra algorithm (1959)
State: (l, i) [layer, node].
Initialization: $c(0, 0) = 0$; $last(0) = 0$; $Permanent(0) = \{0\}$.
Extension: $c(l, j) = \min\{c(l-1, j),\ c(l-1, last(l-1)) + w_{last(l-1), j}\} \quad \forall l \geq 1, \forall j \notin Permanent(l-1)$;
$Permanent(l) = Permanent(l-1) \cup \{\arg\min_{j \notin Permanent(l-1)} c(l, j)\}$.
Every node has only two predecessors: the time complexity is $O(N^2)$.
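A sketch of the label-setting scheme in its simple O(N^2) array form, under the slide's hypothesis of non-negative weights; the dense weight-matrix encoding is my choice.

```python
# A sketch of the Dijkstra label-setting scheme above.

INF = float("inf")

def dijkstra(n, w, source=0):
    """w[i][j]: non-negative arc weight, INF if missing. Costs from source."""
    c = [INF] * n
    permanent = [False] * n
    c[source] = 0
    for _ in range(n):
        # make the cheapest temporary label permanent
        last = min((i for i in range(n) if not permanent[i]),
                   key=lambda i: c[i], default=None)
        if last is None or c[last] == INF:
            break
        permanent[last] = True
        for j in range(n):                  # extend only from the newly
            if not permanent[j]:            # permanent node
                c[j] = min(c[j], c[last] + w[last][j])
    return c

w = [[INF, 4, 1, INF],
     [INF, INF, INF, 3],
     [INF, 2, INF, 6],
     [INF, INF, INF, INF]]
print(dijkstra(4, w))                       # [0, 3, 1, 6]
```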
Shortest trees and arborescences
The Bellman-Ford and Dijkstra algorithms compute the shortest-path arborescence, which contains the shortest paths from the root node to all the other nodes (by the optimality principle). With a slight modification we can obtain Prim's algorithm to compute the minimum spanning tree on weighted (undirected) graphs. All these are examples of dynamic programming algorithms for polynomial-time combinatorial optimization problems.
D.P. for NP-hard problems: the knapsack problem
$\max z = \sum_{j \in N} c_j x_j$
s.t. $\sum_{j \in N} a_j x_j \leq b$
$x_j \in \{0, 1\} \quad \forall j \in N$
Assume b is a known constant and all data are integer. The number of possible solutions is $2^N$ and, even worse, the problem is NP-hard!
D.P. for NP-hard problems: the knapsack problem
Policy: sort the items in N (the variables) from $x_1$ to $x_N$.
State: feasibility depends on the residual capacity; cost does not depend on previous decisions. Hence the state is given by the last item considered (i) and the capacity used so far (u).
R.E.F.: Initialization: $z(0, 0) = 0$; Extension: $z(i, u) = \max\{z(i-1, u),\ z(i-1, u - a_i) + c_i\}$.
D.P. for NP-hard problems: the knapsack problem
[Figure: state (i-1, u) is extended to state (i, u) by setting $x_i = 0$ and to state $(i, u + a_i)$ by setting $x_i = 1$.]
The state graph has a layer for each item (variable) $j \in N$ and b + 1 nodes per layer.
Complexity: the graph has O(Nb) nodes and each of them has only two predecessors. Hence the D.P. algorithm has complexity O(Nb), which is pseudo-polynomial.
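A sketch of this recursion as a table with N + 1 rows and b + 1 columns, where z(i, u) is read as the best profit using the first i items within capacity u (so the base case becomes z(0, u) = 0 for every u, a slight variant of the initialization above).

```python
# A sketch of the pseudo-polynomial knapsack recursion above.

def knapsack(c, a, b):
    """c: profits, a: positive integer weights, b: integer capacity."""
    n = len(c)
    z = [[0] * (b + 1) for _ in range(n + 1)]      # z(0, u) = 0
    for i in range(1, n + 1):
        for u in range(b + 1):
            z[i][u] = z[i - 1][u]                  # x_i = 0
            if a[i - 1] <= u:                      # x_i = 1, if item i fits
                z[i][u] = max(z[i][u], z[i - 1][u - a[i - 1]] + c[i - 1])
    return z[n][b]

print(knapsack(c=[10, 7, 12], a=[4, 3, 5], b=8))   # 19 (items 2 and 3)
```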
D.P. for strongly NP-hard problems: the ESPP
Consider the problem of finding an elementary shortest path from s to t in a weighted, directed, cyclic graph with general weights.
[Figure: a small directed graph with both positive and negative arc weights.]
The constraint that the path must be elementary is necessary in the presence of negative cycles, because otherwise a finite optimal solution may not exist.
D.P. for strongly NP-hard problems: the RCESPP
Then we need to put into the state the information about which nodes have already been visited.
State: (S, i) (visited subset, last reached node). Initialization: $c(\{s\}, s) = 0$. Extension: $c(S, j) = \min_{i \in S \setminus \{j\}} \{c(S \setminus \{j\}, i) + w_{ij}\} \quad \forall j \neq s$.
The problem can also have other constraints, represented as consumptions of resources (capacity, time, ...): this problem is called the Resource Constrained Elementary Shortest Path Problem.
State: (S, q, i) (visited subset, resource consumption, last reached node). Initialization: $c(\{s\}, 0, s) = 0$. Extension: $c(S, q, j) = \min_{i \in S \setminus \{j\}} \{c(S \setminus \{j\}, q - d_j, i) + w_{ij}\} \quad \forall j \neq s$, where $d_j$ is the consumption of resources when visiting node j.
Dominance
A label $L' = (S', q', i')$ dominates a label $L'' = (S'', q'', i'')$ only if:
$S' \subseteq S''$
$q' \leq q''$
$i' = i''$
$c(L') \leq c(L'')$.
Complexity: the number of states grows exponentially with the size of the graph.
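The dominance test translates directly into code. In the sketch below, a label is a tuple (S, q, i, cost) with S a frozenset; the encoding is mine.

```python
# A sketch of the four-part dominance test above.

def dominates(L1, L2):
    """True if L1 = (S', q', i', c') dominates L2 = (S'', q'', i'', c'')."""
    S1, q1, i1, c1 = L1
    S2, q2, i2, c2 = L2
    return i1 == i2 and S1 <= S2 and q1 <= q2 and c1 <= c2   # <= is subset

L1 = (frozenset({0, 2}), 3, 2, 5)
L2 = (frozenset({0, 1, 2}), 4, 2, 7)
print(dominates(L1, L2))    # True: L2 can be discarded
```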
Bi-directional extension
In bi-directional D.P., labels are extended both forward from vertex s to its successors and backward from vertex t to its predecessors. Backward states, recursive extension functions and dominance rules are symmetrical. A path from s to t is detected each time a forward state and a backward state can be feasibly joined through an arc (i, j).
Let $L^{fw} = (S^{fw}, q^{fw}, i)$ be a forward path and $L^{bw} = (S^{bw}, q^{bw}, j)$ be a backward path. When they are joined, the cost of the resulting s-t path is $c(L^{fw}) + w_{ij} + c(L^{bw})$. The two paths can be joined subject to:
$S^{fw} \cap S^{bw} = \emptyset$
$q^{fw} + q^{bw} \leq Q$
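A sketch of the join test, with labels encoded as in the previous snippet; the resource budget Q and the arc weight w_ij are assumed to be given.

```python
# A sketch of the join conditions above.

def can_join(L_fw, L_bw, Q):
    """Disjoint visited sets and total resource consumption within Q."""
    S_fw, q_fw, _, _ = L_fw
    S_bw, q_bw, _, _ = L_bw
    return S_fw.isdisjoint(S_bw) and q_fw + q_bw <= Q

def joined_cost(L_fw, L_bw, w_ij):
    return L_fw[3] + w_ij + L_bw[3]         # c(L_fw) + w_ij + c(L_bw)

L_fw = (frozenset({0, 1}), 2, 1, 4)         # forward path ending at node 1
L_bw = (frozenset({3, 5}), 1, 3, 2)         # backward path ending at node 3
print(can_join(L_fw, L_bw, Q=4), joined_cost(L_fw, L_bw, w_ij=1))  # True 7
```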
Stopping at a half-way point
We can stop extending a path in one direction when we have the guarantee that the remaining part of the path will be generated in the other direction, so that no optimal solution is lost. The extension of forward and backward labels is then stopped while preserving the guarantee that the optimal solution will be found. Without such a criterion, the bi-directional algorithm would simply produce twice as many labels as the mono-directional one.
Possible stop criteria: stop when N/2 arcs have been used; stop when half the overall available amount of a resource has been consumed.
Bounding for fathoming labels
Bounding is used as in branch-and-bound algorithms, to detect labels that are not worth extending because they cannot lead to optimal solutions. For instance, given a label L = (S, q, i) it is possible to compute a lower bound LB(L) on the cost to be paid after visiting node i with the remaining amount Q - q of available resources:
$LB = \min \sum_{j \in N \setminus S} b_j y_j$
subject to $\sum_{j \in N \setminus S} d_j y_j \leq Q - q$
$y_j \in \{0, 1\} \quad \forall j \in N \setminus S$
where $b_j$ is a lower bound on the cost of visiting node $j \notin S$ and $d_j$ is the resource consumption when visiting node j.
Label L can be fathomed if $c(L) + LB \geq UB$, where UB is an incumbent upper bound.
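The bounding subproblem is itself a knapsack. As one valid and cheap choice of LB(L), the sketch below solves its LP relaxation greedily; the choice of relaxation is mine, the lecture only requires some valid lower bound.

```python
# A sketch of the fathoming test above, with a greedy LP-relaxation bound.

def knapsack_lp_bound(items, capacity):
    """items: (b_j, d_j) pairs for j not in S, with d_j > 0.
    capacity: Q - q. Minimization: only negative b_j can lower the bound."""
    lb = 0.0
    # take the most negative cost per unit of resource first
    for b, d in sorted((it for it in items if it[0] < 0),
                       key=lambda it: it[0] / it[1]):
        if d <= capacity:
            lb += b
            capacity -= d
        else:
            lb += b * capacity / d          # last item taken fractionally
            break
    return lb

def can_fathom(label_cost, items, capacity, UB):
    return label_cost + knapsack_lp_bound(items, capacity) >= UB

print(knapsack_lp_bound([(-4, 2), (-3, 1), (5, 1)], capacity=2))   # -5.0
```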
State space relaxation
State space relaxation was introduced by Christofides, Mingozzi and Toth in 1981. The state space S explored by the D.P. algorithm is projected onto a lower-dimensional space T, so that each state in T retains the minimum cost among those of its corresponding states in S (assuming minimization).
$SSR: S \to T$ such that $c(t) = \min_{s \in S: SSR(s) = t} \{c(s)\}$.
In this way the number of states to be explored is drastically reduced, but some infeasible states in S can be projected onto a state t corresponding to a feasible solution in T. The D.P. algorithm exploring T instead of S is faster, but it does not guarantee to find an optimal solution: it rather provides a dual bound.
SSR of the set of visited nodes
We map each state (S, q, i) onto a new state $(\sigma, q, i)$, where $\sigma = |S|$ represents the number of visited nodes (excluding s). The dominance condition $S' \subseteq S''$ is replaced by $\sigma' \leq \sigma''$. Since $\sigma \leq N$, the D.P. algorithm has pseudo-polynomial time complexity.
Since the state no longer keeps information about the set of already visited vertices, cycles are no longer forbidden; the solution is guaranteed to be feasible with respect to the resource constraints, but it is not guaranteed to be elementary. This technique can also be applied to bi-directional search.
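The projection itself is a one-line mapping. A sketch, assuming exact labels are stored in a dictionary keyed by (S, q, i):

```python
# A sketch of the SSR projection above: (S, q, i) -> (|S|, q, i), keeping
# the minimum cost per relaxed state.

INF = float("inf")

def project(exact_labels):
    """exact_labels: dict {(frozenset S, q, i): cost} -> relaxed labels."""
    relaxed = {}
    for (S, q, i), cost in exact_labels.items():
        t = (len(S), q, i)                  # SSR: sigma = |S|
        relaxed[t] = min(cost, relaxed.get(t, INF))
    return relaxed

exact = {(frozenset({0, 1}), 3, 1): 7,
         (frozenset({0, 2}), 3, 1): 5}
print(project(exact))                       # {(2, 3, 1): 5}
```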
Decremental state space relaxation
Decremental SSR allows one to tune a trade-off between D.P. with SSR (using $\sigma$) and exact D.P. (using S): a set $V \subseteq N$ of critical nodes is defined; S is now a subset of V; $\sigma$ counts the overall number of visited nodes. For V = N, DSSR is equivalent to exact D.P. For $V = \emptyset$, DSSR is equivalent to D.P. with SSR.
The algorithm is executed multiple times and after each iteration the nodes visited more than once are inserted into V; see the sketch below. The algorithm ends when the optimal solution is also feasible (the path is elementary). An increasingly better lower bound is produced at each iteration. Computational experiments show that in many cases a critical set containing about 15% of the nodes is enough.
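A sketch of the DSSR outer loop described above; solve_ssr is a hypothetical placeholder for a run of the relaxed D.P. restricted to the current critical set, not a function from the lecture.

```python
# A sketch of the DSSR iteration: relax, solve, add repeated nodes to V.

def dssr(solve_ssr):
    critical = set()                        # start from the full relaxation
    while True:
        path = solve_ssr(critical)          # yields a lower bound each time
        repeated = {v for v in path if path.count(v) > 1}
        if not repeated:                    # elementary path: optimal
            return path
        critical |= repeated                # refine V and iterate

# Toy stand-in solver, just to exercise the loop:
def toy_solver(critical):
    return [0, 2, 3, 5] if 2 in critical else [0, 2, 2, 5]

print(dssr(toy_solver))                     # [0, 2, 3, 5]
```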