Lower bounds for Howard's algorithm for finding Minimum Mean-Cost Cycles




Thomas Dueholm Hansen¹ and Uri Zwick²

¹ Department of Computer Science, Aarhus University.
² School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

Abstract. Howard's policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard's algorithm can be used to find Minimum Mean-Cost Cycles (MMCCs). Experimental studies suggest that Howard's algorithm works extremely well in this context. The theoretical complexity of Howard's algorithm for finding MMCCs is a mystery. No polynomial time bound is known on its running time. Prior to this work, there were only linear lower bounds on the number of iterations performed by Howard's algorithm. We provide the first weighted graphs on which Howard's algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.

1 Introduction

Howard's policy iteration algorithm [11] is one of the most widely used algorithms for solving Markov decision processes (MDPs). The complexity of Howard's algorithm in this setting was unresolved for almost 50 years. Very recently, Fearnley [5], building on results of Friedmann [7], showed that there are MDPs on which Howard's algorithm requires exponential time. In another recent breakthrough, Ye [17] showed that Howard's algorithm is strongly polynomial when applied to discounted MDPs with a fixed discount ratio. Hansen et al. [10] recently improved some of the bounds of Ye and extended them to the 2-player case. Weighted directed graphs may be viewed as Deterministic MDPs (DMDPs), and solving such DMDPs is essentially equivalent to finding minimum mean-cost cycles (MMCCs) in such graphs. Howard's algorithm can thus be used to solve this purely combinatorial problem.
The complexity of Howard's algorithm in this setting is an intriguing open problem. Fearnley's [5] exponential lower bound seems to depend in an essential way on the use of stochastic actions, so it does not extend to the deterministic setting. Similarly, Ye's [17] polynomial upper bound depends in an essential way on the MDPs being discounted and does not extend to the non-discounted case.

* Supported by the Center for Algorithmic Game Theory at Aarhus University, funded by the Carlsberg Foundation.

The MMCC problem is an interesting problem that has various applications. It generalizes the problem of finding a negative cost cycle in a graph. It is also used as a subroutine in algorithms for solving other problems, such as min-cost flow algorithms. (See, e.g., Goldberg and Tarjan [9].) There are several polynomial time algorithms for solving the MMCC problem. Karp [12] gave an O(mn)-time algorithm for the problem, where m is the number of edges and n is the number of vertices in the input graph. Young et al. [18] gave an algorithm whose complexity is O(mn + n² log n). Although this is slightly worse, in some cases, than the running time of Karp's algorithm, the algorithm of Young et al. [18] behaves much better in practice. Dasdan [3] experimented with many different algorithms for the MMCC problem, including Howard's algorithm. He reports that Howard's algorithm usually runs much faster than Karp's algorithm, and is usually almost as fast as the algorithm of Young et al. [18]. A more thorough experimental study of MMCC algorithms was recently conducted by Georgiadis et al. [8].³ Understanding the complexity of Howard's algorithm for MMCCs is interesting from both the applied and theoretical points of view. Howard's algorithm for MMCC is an extremely simple and natural combinatorial algorithm, similar in flavor to the Bellman-Ford algorithm for finding shortest paths [1, 2, 6] and to Karp's [12] algorithm. Yet, its analysis seems to be elusive. Howard's algorithm also has the advantage that it can be applied to the more general problem of finding a cycle with a minimum cost-to-time ratio (see, e.g., Megiddo [14, 15]). Howard's algorithm works in iterations. Each iteration takes O(m) time. It is trivial to construct instances on which Howard's algorithm performs n iterations. (Recall that n and m are the number of vertices and edges in the input graph.) Madani [13] constructed instances on which the algorithm performs 2n − O(1) iterations.
No graphs were known, however, on which Howard's algorithm performed more than a linear number of iterations. We construct the first graphs on which Howard's algorithm performs Ω(n²) iterations, showing, in particular, that there are instances on which its running time is Ω(n⁴), an order of magnitude slower than the running times of the algorithms of Karp [12] and Young et al. [18]. We also construct n-vertex outdegree-2 graphs on which Howard's algorithm performs 2n − O(1) iterations. (Madani's [13] examples used Θ(n²) edges.) This example is interesting as it shows that the number of iterations performed may differ from the number of edges in the graph by only an additive constant. It also sheds some more light on the non-trivial, and perhaps non-intuitive, behavior of Howard's algorithm. Our examples still leave open the possibility that the number of iterations performed by Howard's algorithm is always at most m, the number of edges. (The graphs on which the algorithm performs Ω(n²) iterations also have Ω(n²) edges.) We conjecture that this is always the case.

³ Georgiadis et al. [8] claim that Howard's algorithm is not robust. From personal conversations with the authors of [8] it turns out, however, that the version they used is substantially different from Howard's algorithm [11].
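For comparison with the polynomial-time alternatives mentioned above, here is a compact sketch of Karp's O(mn) algorithm [12]. The function name, the graph encoding as a list of (u, v, cost) triples, and the small example graph are our own illustrative choices; the sketch assumes the input graph is strongly connected. It tabulates D_k(v), the minimum cost of a walk with exactly k edges from a fixed start vertex to v, and applies Karp's formula min_v max_k (D_n(v) − D_k(v)) / (n − k).

```python
from fractions import Fraction

def karp_min_mean_cycle(n, edges):
    """Karp's O(mn) minimum mean-cost cycle algorithm (sketch).
    Vertices are 0..n-1; edges is a list of (u, v, cost) triples.
    The graph is assumed to be strongly connected."""
    INF = float('inf')
    # D[k][v] = minimum cost of a walk with exactly k edges from vertex 0 to v
    D = [[INF] * n for _ in range(n + 1)]
    D[0][0] = 0
    for k in range(1, n + 1):
        for u, v, c in edges:
            if D[k - 1][u] < INF:
                D[k][v] = min(D[k][v], D[k - 1][u] + c)
    # Karp's formula: the minimum cycle mean equals
    #   min over v of max over k of (D[n][v] - D[k][v]) / (n - k)
    best = None
    for v in range(n):
        if D[n][v] == INF:
            continue
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] < INF)
        if best is None or worst < best:
            best = worst
    return best

# The cheapest cycle below is 1 -> 2 -> 1, with mean cost (1 + 1)/2 = 1.
example = [(0, 1, 1), (1, 0, 3), (1, 2, 1), (2, 1, 1)]
print(karp_min_mean_cycle(3, example))  # -> 1
```

Exact rational arithmetic (fractions.Fraction) is used so that cycle means can be compared without floating-point error, which also matters when experimenting with the exponential costs used later in this paper.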

2 Howard's algorithm for minimum mean-cost cycles

We next describe the specialization of Howard's algorithm for deterministic MDPs, i.e., for finding Minimum Mean-Cost Cycles. For Howard's algorithm for general MDPs, see Howard [11], Derman [4] or Puterman [16].

Let G = (V, E, c), where c : E → R, be a weighted directed graph. We assume that each vertex has a unique serial number associated with it. We also assume, without loss of generality, that each vertex v ∈ V has at least one outgoing edge.

If C = v_0 v_1 ... v_{k−1} v_0 is a cycle in G, we let val(C) = (1/k) Σ_{i=0}^{k−1} c(v_i, v_{i+1}), where v_k = v_0, be its mean cost. The vertex on C with the smallest serial number is said to be the head of the cycle. Our goal is to find a cycle C that minimizes val(C).

A policy π is a mapping π : V → V such that (v, π(v)) ∈ E, for every v ∈ V. A policy π defines a subgraph G_π = (V, E_π), where E_π = {(v, π(v)) | v ∈ V}. As the outdegree of each vertex in G_π is 1, we get that G_π is composed of a collection of disjoint directed cycles with directed paths leading into them.

Given a policy π, we assign to each vertex v_0 ∈ V a value val_π(v_0) and a potential pot_π(v_0) in the following way. Let P_π(v_0) = v_0 v_1 ... be the infinite path defined by v_i = π(v_{i−1}), for i > 0. This infinite path is composed of a finite path P leading to a cycle C which is repeated indefinitely. If v_r = v_{r+k} is the first vertex visited for the second time, then P = v_0 v_1 ... v_r and C = v_r v_{r+1} ... v_{r+k}. We let v_l be the head of the cycle C. We now define

  val_π(v_0) = val(C) = (1/k) Σ_{i=0}^{k−1} c(v_{r+i}, v_{r+i+1}),
  pot_π(v_0) = Σ_{i=0}^{l−1} ( c(v_i, v_{i+1}) − val(C) ).

In other words, val_π(v_0) is the mean cost of C, the cycle into which P_π(v_0) is absorbed, while pot_π(v_0) is the distance from v_0 to v_l, the head of this cycle, when the mean cost of the cycle is subtracted from the cost of each edge.
It is easy to check that values and potentials satisfy the following equations:

  val_π(v) = val_π(π(v)),
  pot_π(v) = c(v, π(v)) − val_π(v) + pot_π(π(v)).

The appraisal of an edge (u, v) ∈ E is defined as the pair:

  A_π(u, v) = ( val_π(v), c(u, v) − val_π(v) + pot_π(v) ).

Howard's algorithm starts with an arbitrary policy π_0 and keeps improving it. If π is the current policy, then the next policy π′ produced by the algorithm is defined by

  π′(u) = argmin_{v : (u,v) ∈ E} A_π(u, v).

In other words, for every vertex the algorithm selects the outgoing edge with the lowest appraisal. (In case of ties, the algorithm favors edges in the current policy.) As appraisals are pairs, they are compared lexicographically, i.e., (u, v_1) is better

than (u, v_2) if and only if A_π(u, v_1) ≺ A_π(u, v_2), where (x_1, y_1) ≺ (x_2, y_2) if and only if x_1 < x_2, or x_1 = x_2 and y_1 < y_2. When π′ = π, the algorithm stops. The correctness of the algorithm follows from the following two lemmas, whose proofs can be found in Howard [11], Derman [4] and Puterman [16].

Lemma 1. Suppose that π′ is obtained from π by a policy improvement step. Then, for every v ∈ V we have (val_{π′}(v), pot_{π′}(v)) ⪯ (val_π(v), pot_π(v)). Furthermore, if π′(v) ≠ π(v), then (val_{π′}(v), pot_{π′}(v)) ≺ (val_π(v), pot_π(v)).

Lemma 2. If policy π is not modified by an improvement step, then val_π(v) is the minimum mean weight of a cycle reachable from v in G. Furthermore, by following edges of π from v we get into a cycle of this minimum mean weight.

Each iteration of Howard's algorithm takes only O(m) time and is not much more complicated than an iteration of the Bellman-Ford algorithm.

3 A quadratic lower bound

We next construct a family of weighted directed graphs for which the number of iterations performed by Howard's algorithm is quadratic in the number of vertices. More precisely, we prove the following theorem:

Theorem 1. Let n and m be even integers, with 2n ≤ m ≤ n²/4 + 3n/2. There exists a weighted directed graph with n vertices and m edges on which Howard's algorithm performs m − n + 1 iterations.

All policies generated by Howard's algorithm, when run on the instances of Theorem 1, contain a single cycle, and hence all vertices have the same value. Edges are therefore selected for inclusion in the improved policies based on potentials. (Recall that potentials are essentially adjusted distances.) The main idea behind our construction, which we refer to as the dancing cycles construction, is the use of cycles of very large costs, which makes long paths, i.e., paths containing many edges, to the cycle of the current policy attractive, delaying the discovery of better cycles.
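The policy iteration procedure of Section 2, which our construction is designed to slow down, can be sketched in Python as follows. The graph encoding, function names, and the three-vertex example are our own illustrative choices, not from the paper; values and potentials are computed exactly as defined in Section 2, appraisals are compared lexicographically, and ties are resolved in favor of the current policy.

```python
from fractions import Fraction

def evaluate(policy, cost):
    """Compute val_pi and pot_pi for every vertex, per Section 2."""
    val, pot = {}, {}
    for v0 in policy:
        # Follow the policy path from v0 until a vertex repeats.
        seen, path, u = set(), [], v0
        while u not in seen:
            seen.add(u); path.append(u); u = policy[u]
        cycle = path[path.index(u):]      # the cycle the path is absorbed into
        mean = Fraction(sum(cost[v, policy[v]] for v in cycle), len(cycle))
        head = min(cycle)                 # smallest serial number on the cycle
        p, u = Fraction(0), v0            # pot: mean-adjusted distance to the head
        while u != head:
            p += cost[u, policy[u]] - mean
            u = policy[u]
        val[v0], pot[v0] = mean, p
    return val, pot

def howard(out_edges, cost, policy):
    """Iterate policy improvement until the policy is stable."""
    iterations = 0
    while True:
        val, pot = evaluate(policy, cost)
        appraisal = lambda u, v: (val[v], cost[u, v] - val[v] + pot[v])
        new = {}
        for u, vs in out_edges.items():
            best = policy[u]
            for v in vs:
                if appraisal(u, v) < appraisal(u, best):  # strict: ties keep the old edge
                    best = v
            new[u] = best
        if new == policy:
            return val, iterations
        policy, iterations = new, iterations + 1

# Tiny example: the 2-cycle 1 <-> 2 has mean cost 1, beating the 0 <-> 1 cycle (mean 2).
out_edges = {0: [1], 1: [0, 2], 2: [1]}
cost = {(0, 1): 1, (1, 0): 3, (1, 2): 1, (2, 1): 1}
val, it = howard(out_edges, cost, {0: 1, 1: 0, 2: 1})
print(min(val.values()))  # minimum mean cycle cost -> 1
```

On the example, a single improvement step moves vertex 1 from the cycle of mean cost 2 to the cycle of mean cost 1; exact rationals are used so that mean costs and potentials are compared without rounding error.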
Given a graph and a sequence of policies, it is possible to check, by solving an appropriate linear program, whether there exist costs for which Howard's algorithm generates the given sequence of policies. Experiments with a program that implements this idea helped us obtain the construction presented below. For simplicity, we first prove Theorem 1 for m = n²/4 + 3n/2. We later note that the same construction works when removing pairs of edges, which gives the statement of the theorem.

3.1 The Construction

For every n we construct a weighted directed graph G_n = (V, E, c), on |V| = 2n vertices and |E| = n² + 3n edges, and an initial policy π_0 such that Howard's algorithm performs n² + n + 1 iterations on G_n when it starts with π_0.

The graph G_n itself is fairly simple. (See Figure 1.) Most of the intricacy goes into the definition of the cost function c : E → R. The graph G_n is composed of two symmetric parts. To highlight the symmetry we let V = {v^0_n, ..., v^0_1, v^1_1, ..., v^1_n}. Note that the set of vertices is split in two, according to whether the superscript is 0 or 1. In order to simplify notation when dealing with vertices with different superscripts, we sometimes refer to v^0_1 as v^1_0 and to v^1_1 as v^0_0. The set of edges is:

  E = { (v^0_i, v^0_j), (v^1_i, v^1_j) | 1 ≤ i ≤ n, i − 1 ≤ j ≤ n }.

We next describe a sequence of policies Π_n of length n² + n + 1. We then construct a cost function that causes Howard's algorithm to generate this long sequence of policies. For 1 ≤ l ≤ r ≤ n and s ∈ {0, 1}, and for l = 1, r = 0 and s = 0, we define a policy π^s_{l,r}:

  π^s_{l,r}(v^t_i) = v^t_{i−1}   for t ≠ s or i > l,
  π^s_{l,r}(v^t_i) = v^t_r       for t = s and i = l,
  π^s_{l,r}(v^t_i) = v^t_n       for t = s and i < l.

The policy π^s_{l,r} contains a single cycle C^s_{l,r} = v^s_r v^s_{r−1} ... v^s_l v^s_r, which is determined by its defining edge e^s_{l,r} = (v^s_l, v^s_r). As shown in Figure 1, all vertices to the left of v^s_l, the head of the cycle, choose an edge leading furthest to the right, while all the vertices to the right of v^s_l choose an edge leading furthest to the left. The sequence Π_n is composed of the policies π^s_{l,r}, where 1 ≤ l ≤ r ≤ n and s ∈ {0, 1}, or l = 1, r = 0 and s = 0, with the following ordering. Policy π^{s_1}_{l_1,r_1} precedes policy π^{s_2}_{l_2,r_2} in Π_n if and only if l_1 > l_2, or l_1 = l_2 and r_1 > r_2, or l_1 = l_2 and r_1 = r_2 and s_1 < s_2. (Note that this is a reversed lexicographical ordering on the triplets (l_1, r_1, 1 − s_1) and (l_2, r_2, 1 − s_2).) For every 1 ≤ l ≤ r ≤ n and s ∈ {0, 1}, or l = 1, r = 0 and s = 0, we let f(l, r, s) be the index of π^s_{l,r} in Π_n, where indices start at 0. We can now write:

  Π_n = (π_k)_{k=0}^{n²+n} = ( ( ( π^s_{n−l, n−r} )_{s=0}^{1} )_{r=0}^{l} )_{l=0}^{n−1} , π^0_{1,0}.

We refer to Figure 1 for an illustration of G_4 and the corresponding sequence Π_4.
3.2 The edge costs

Recall that each policy π_k = π^s_{l,r} is determined by an edge e_k = e^s_{l,r} = (v^s_l, v^s_r), where k = f(l, r, s). Let N = n² + n. We assign the edges the following exponential costs:

  c(e_k) = c(v^s_l, v^s_r) = n^{N−k},   0 ≤ k < N,
  c(v^0_1, v^1_1) = c(v^1_1, v^0_1) = −n^N,
  c(v^s_i, v^s_{i−1}) = 0,   2 ≤ i ≤ n, s ∈ {0, 1}.
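The graph and its exponential costs can be written out programmatically. The following Python sketch is our own rendering of Sections 3.1 and 3.2 (the function name, the encoding of v^s_i as a pair (s, i), and the dictionary representation are our choices); the middle vertices are handled through the identifications v^0_0 = v^1_1 and v^1_0 = v^0_1, and the sketch checks the edge count |E| = n² + 3n claimed in Section 3.1.

```python
def build_gn(n):
    """Sketch of G_n from Section 3.1 with the costs of Section 3.2.
    Vertex (s, i), with s in {0, 1} and 1 <= i <= n, stands for v^s_i."""
    N = n * n + n
    # Enumerate the policies pi^s_{l,r} in the order of Pi_n:
    # l decreasing, then r decreasing, then s = 0 before s = 1,
    # with the final policy pi^0_{1,0} at index N.
    order = [(l, r, s) for l in range(n, 0, -1)
                       for r in range(n, l - 1, -1)
                       for s in (0, 1)]
    order.append((1, 0, 0))
    f = {lrs: k for k, lrs in enumerate(order)}   # f(l, r, s): index in Pi_n

    cost = {}
    for s in (0, 1):
        # the two "middle" edges (v^0_1, v^1_1) and (v^1_1, v^0_1) cost -n^N
        cost[(s, 1), (1 - s, 1)] = -(n ** N)
        for i in range(2, n + 1):
            cost[(s, i), (s, i - 1)] = 0          # "backward" edges cost 0
        for i in range(1, n + 1):
            for j in range(i, n + 1):             # defining edges e^s_{i,j}
                cost[(s, i), (s, j)] = n ** (N - f[(i, j, s)])
    return cost

cost = build_gn(4)
assert len(cost) == 4 * 4 + 3 * 4   # |E| = n^2 + 3n = 28 for n = 4
```

The first policy in the ordering is π^0_{n,n}, so its defining self-loop at v^0_n receives the largest positive cost n^N, and each later defining edge is cheaper by a factor of n.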

[Figure 1 omitted.]

Fig. 1. G_4 and the corresponding sequence Π_4. Π_4 is shown in left-to-right order. Policies π_{f(l,r,s)} = π^s_{l,r} are shown in bold, with e^s_{l,r} being highlighted. Numbers below edges define costs: 0 means cost 0, k > 0 means cost n^k, and M means cost −n^N.

We claim that with these exponential edge costs Howard's algorithm does indeed produce the sequence Π_n. To show that π_{k+1} is indeed the policy that Howard's algorithm obtains by improving π_k, we have to show that

  π_{k+1}(u) = argmin_{v : (u,v) ∈ E} A_{π_k}(u, v),   u ∈ V.   (1)

For brevity, we let c(v^s_i, v^s_j) = c^s_{i,j}. The only cycle in π_k = π^s_{l,r} is C^s_{l,r} = v^s_r v^s_{r−1} ... v^s_l v^s_r. As c^s_{i,i−1} = 0, for 2 ≤ i ≤ n and s ∈ {0, 1}, we have

  µ^s_{l,r} = val(C^s_{l,r}) = c^s_{l,r} / (r − l + 1).

As c^s_{l,r} = n^{N−k} and all cycles in our construction are of size at most n, we have n^{N−k−1} ≤ µ^s_{l,r} ≤ n^{N−k}. As all vertices have the same value µ^s_{l,r} under π_k = π^s_{l,r}, edges are compared based on the second component of their appraisals A_{π_k}(u, v). Hence, (1) becomes:

  π_{k+1}(u) = argmin_{v : (u,v) ∈ E} c(u, v) + pot_{π_k}(v),   u ∈ V.   (2)

Note that an edge (u, v_1) is preferred over (u, v_2) if and only if c(u, v_1) − c(u, v_2) < pot_{π_k}(v_2) − pot_{π_k}(v_1). Let v^s_l be the head of the cycle C^s_{l,r}. Keeping in mind that c^s_{i,i−1} = 0, for 2 ≤ i ≤ n and s ∈ {0, 1}, it is not difficult to see that the potentials of the vertices under policy π^s_{l,r} are given by the following expression:

  pot_{π^s_{l,r}}(v^t_i) = c^s_{i,n} − (n − l + 1) µ^s_{l,r}                     if t = s and i < l,
  pot_{π^s_{l,r}}(v^t_i) = −(i − l) µ^s_{l,r}                                    if t = s and i ≥ l,
  pot_{π^s_{l,r}}(v^t_i) = c^t_{1,0} − i µ^s_{l,r} + pot_{π^s_{l,r}}(v^s_1)      if t ≠ s.

It is convenient to note that we have pot_{π^s_{l,r}}(v^t_i) ≤ 0, for every 1 ≤ l ≤ r ≤ n, 1 ≤ i ≤ n and s, t ∈ {0, 1}. In the first case (t = s and i < l), this follows from the fact that (v^s_i, v^s_n) has a larger index than (v^s_l, v^s_r) (note that i < l). In the third case (t ≠ s), it follows from the fact that c^t_{1,0} = −n^N < 0.

3.3 The dynamics

The proof that Howard's algorithm produces the sequence Π_n is composed of three main cases, shown in Figure 2. Each case is broken into several subcases. Each subcase on its own is fairly simple and intuitive.

[Figure 2 omitted.]

Fig. 2. Policies of the transitions (1) π^0_{l,r} to π^1_{l,r}, (2) π^1_{l,r} to π^0_{l,r−1}, and (3) π^1_{l,l} to π^0_{l−1,n}. Vertices of the corresponding subcases have been annotated accordingly.

Case 1. Suppose that π_k = π^0_{l,r}. We need to show that π_{k+1} = π^1_{l,r}. We have the following subcases, shown at the top of Figure 2.

Case 1.1. We show that π_{k+1}(v^0_i) = v^0_{i−1}, for 2 ≤ i ≤ n. We have to show that (v^0_i, v^0_{i−1}) beats (v^0_i, v^0_j), for every 2 ≤ i ≤ j ≤ n, or in other words that

  c^0_{i,i−1} + pot_{π^0_{l,r}}(v^0_{i−1}) < c^0_{i,j} + pot_{π^0_{l,r}}(v^0_j),   2 ≤ i ≤ j ≤ n.

Case 1.1.1. Assume that j < l. We then have

  pot_{π^0_{l,r}}(v^0_{i−1}) = c^0_{i−1,n} − (n − l + 1) µ^0_{l,r},
  pot_{π^0_{l,r}}(v^0_j) = c^0_{j,n} − (n − l + 1) µ^0_{l,r}.

Recalling that c^0_{i,i−1} = 0, the inequality that we have to show becomes

  c^0_{i−1,n} < c^0_{i,j} + c^0_{j,n},   2 ≤ i ≤ j < l ≤ n.

As the edge (v^0_{i−1}, v^0_n) comes after (v^0_i, v^0_j) in our ordering, we have c^0_{i−1,n} < c^0_{i,j}. The other term on the right is non-negative and the inequality follows easily.

Case 1.1.2. Assume that i − 1 < l ≤ j. We then have

  pot_{π^0_{l,r}}(v^0_{i−1}) = c^0_{i−1,n} − (n − l + 1) µ^0_{l,r},
  pot_{π^0_{l,r}}(v^0_j) = −(j − l) µ^0_{l,r},

and the required inequality becomes

  c^0_{i−1,n} < c^0_{i,j} + (n − j + 1) µ^0_{l,r},   1 ≤ i − 1 < l ≤ j ≤ n.

As j ≤ n, the inequality again follows from the fact that c^0_{i−1,n} < c^0_{i,j}.

Case 1.1.3. Assume that l ≤ i − 1 < j. We then have pot_{π^0_{l,r}}(v^0_j) − pot_{π^0_{l,r}}(v^0_{i−1}) = (i − j − 1) µ^0_{l,r}, and the required inequality becomes

  (j − i + 1) µ^0_{l,r} < c^0_{i,j},   1 ≤ l ≤ i − 1 < j ≤ n.

This inequality holds as (j − i + 1) µ^0_{l,r} < n c^0_{l,r} ≤ c^0_{i,j}. The last inequality follows as (v^0_l, v^0_r) appears after (v^0_i, v^0_j) in our ordering. (Note that we are using here, for the first time, the fact that the weights are exponential.)

Case 1.2. We show that π_{k+1}(v^0_1) = v^1_1. We have to show that

  c^0_{1,0} + pot_{π^0_{l,r}}(v^1_1) < c^0_{1,j} + pot_{π^0_{l,r}}(v^0_j),   1 ≤ j ≤ n.

This inequality is easy. Note that c^0_{1,0} = −n^N and pot_{π^0_{l,r}}(v^1_1) ≤ 0, while c^0_{1,j} > 0 and pot_{π^0_{l,r}}(v^0_j) > −n^N.

Case 1.3. We show that π_{k+1}(v^1_i) = v^1_n, for 1 ≤ i < l. We have to show that

  c^1_{i,n} − c^1_{i,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_n),   1 ≤ i < l, i − 1 ≤ j < n.

Case 1.3.1. Suppose that i = 1 and j = 0. We need to verify that c^1_{1,n} − c^1_{1,0} < pot_{π^0_{l,r}}(v^1_0) − pot_{π^0_{l,r}}(v^1_n). As pot_{π^0_{l,r}}(v^1_0) = pot_{π^0_{l,r}}(v^0_1) and pot_{π^0_{l,r}}(v^1_n) = c^1_{1,0} − n µ^0_{l,r} + pot_{π^0_{l,r}}(v^0_1), we have to verify that c^1_{1,n} < n µ^0_{l,r}, which follows from the fact that l > 1 and that (v^0_l, v^0_r) has a smaller index than (v^1_1, v^1_n).

Case 1.3.2. Suppose that j ≥ 1. We have to verify that

  c^1_{i,n} − c^1_{i,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_n) = (n − j) µ^0_{l,r},   1 ≤ i < l, 1 ≤ j < n.

As in Case 1.3.1 we have c^1_{i,n} ≤ µ^0_{l,r} while c^1_{i,j} > 0.

Case 1.4. We show that π_{k+1}(v^1_l) = v^1_r. We have to show that

  c^1_{l,r} − c^1_{l,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_r),   l − 1 ≤ j ≤ n, j ≠ r.

Case 1.4.1. Suppose that l = 1 and j = 0. As in case 1.3.1, the inequality becomes c^1_{1,r} < r µ^0_{l,r}, which is easily seen to hold.

Case 1.4.2. Suppose that l − 1 ≤ j < r and 0 < j. We need to show that

  c^1_{l,r} − c^1_{l,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_r) = (r − j) µ^0_{l,r},   l − 1 ≤ j < r, 0 < j.

As (v^1_l, v^1_r) immediately follows (v^0_l, v^0_r) in our ordering, we have c^1_{l,r} = n^{−1} c^0_{l,r}. Thus c^1_{l,r} ≤ µ^0_{l,r} ≤ (r − j) µ^0_{l,r}. As c^1_{l,j} > 0, the inequality follows.

Case 1.4.3. Suppose that r < j. We need to show that

  c^1_{l,r} − c^1_{l,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_r) = (r − j) µ^0_{l,r},   r < j ≤ n,

or equivalently that c^1_{l,j} − c^1_{l,r} > (j − r) µ^0_{l,r}, for r < j ≤ n. This follows from the fact that (v^1_l, v^1_r) comes after (v^1_l, v^1_j) in the ordering and that c^1_{l,r} > 0.

Case 1.5. We show that π_{k+1}(v^1_i) = v^1_{i−1}, for l < i ≤ n. We have to show that

  c^1_{i,i−1} − c^1_{i,j} < pot_{π^0_{l,r}}(v^1_j) − pot_{π^0_{l,r}}(v^1_{i−1}) = (i − j − 1) µ^0_{l,r},   l < i ≤ j ≤ n.

This is identical to case 1.1.3.

Case 2. Suppose that π_k = π^1_{l,r} and l < r. We need to show that π_{k+1} = π^0_{l,r−1}. The proof is very similar to Case 1 and is omitted.

Case 3. Suppose that π_k = π^1_{l,l}. We need to show that π_{k+1} = π^0_{l−1,n}. The proof is very similar to Cases 1 and 2 and is omitted.

3.4 Remarks

For any l ≤ r < n, if the edges (v^0_l, v^0_r) and (v^1_l, v^1_r) are removed from G_n, then Howard's algorithm skips π^0_{l,r} and π^1_{l,r}, but otherwise Π_n remains the same. This can be repeated any number of times, essentially without modifying the proof given in Section 3.3, thus giving us the statement of Theorem 1. Let us also note that the costs presented here have been chosen to simplify the analysis.
It is possible to define smaller costs, but assuming c^s_{i,i−1} = 0 for s ∈ {0, 1} and 2 ≤ i ≤ n, which can always be enforced using a potential transformation, any costs generating Π_n must satisfy the following subset of the inequalities from cases 1.4.2 and 2.4.2:

  c^1_{1,r} − c^1_{1,r−1} < µ^0_{1,r} = c^0_{1,r} / r,   2 ≤ r ≤ n,
  c^0_{1,r−1} − c^0_{1,r−2} < µ^1_{1,r} = c^1_{1,r} / r,   3 ≤ r ≤ n.

[Figure 3 omitted.]

Fig. 3. G_5 and the corresponding sequence of policies.

Let w_{2r−s} = c^s_{1,r}. Joining the inequalities we get:

  w_k > (k/2) (w_{k−1} − w_{k−3}),   4 ≤ k ≤ 2n.

It is then easy to see that for integral costs the size of the costs must be exponential in n.

4 A 2n − O(1) lower bound for outdegree-2 graphs

In this section we briefly mention a construction of a sequence of outdegree-2 DMDPs on which the number of iterations of Howard's algorithm is only two less than the total number of edges.

Theorem 2. For every n ≥ 3 there exists a weighted directed graph G_n = (V, E, c), where c : E → R, with |V| = 2n + 1 and |E| = 2|V|, on which Howard's algorithm performs |E| − 2 = 4n iterations.

The graph used in the proof of Theorem 2 is simply a bidirected cycle on 2n + 1 vertices. (The graph G_5 is depicted at the top of Figure 3.) The proof of Theorem 2 can be found in Appendix A.

5 Concluding remarks

We presented a quadratic lower bound on the number of iterations performed by Howard's algorithm for finding Minimum Mean-Cost Cycles (MMCCs). Our

lower bound is quadratic in the number of vertices, but is only linear in the number of edges. We conjecture that this is best possible:

Conjecture. The number of iterations performed by Howard's algorithm, when applied to a weighted directed graph, is at most the number of edges in the graph.

Proving (or disproving) our conjecture is a major open problem. Our lower bounds shed some light on the non-trivial behavior of Howard's algorithm, even on deterministic MDPs, and expose some of the difficulties that need to be overcome to obtain non-trivial upper bounds on its complexity. Our lower bounds on the complexity of Howard's algorithm do not undermine the usefulness of Howard's algorithm, as the instances used in our quadratic lower bound are very unlikely to appear in practice.

Acknowledgement We would like to thank Omid Madani for sending us his example [13], and Mike Paterson for helping us to obtain the results of Section 4. We would also like to thank Daniel Andersson, Peter Bro Miltersen, as well as Omid Madani and Mike Paterson, for helpful discussions on policy iteration algorithms.

References

1. R.E. Bellman. Dynamic programming. Princeton University Press, 1957.
2. R.E. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87–90, 1958.
3. A. Dasdan. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Trans. Des. Autom. Electron. Syst., 9(4):385–418, 2004.
4. C. Derman. Finite state Markov decision processes. Academic Press, 1972.
5. J. Fearnley. Exponential lower bounds for policy iteration. In Proc. of 37th ICALP, 2010. Preliminary version available at http://arxiv.org/abs/1003.3418v1.
6. L.R. Ford, Jr. and D.R. Fulkerson. Maximal flow through a network. Canadian Journal of Mathematics, 8:399–404, 1956.
7. O. Friedmann. An exponential lower bound for the parity game strategy improvement algorithm as we know it. In Proc. of 24th LICS, pages 145–156, 2009.
8. L. Georgiadis, A.V. Goldberg, R.E. Tarjan, and R.F.F. Werneck. An experimental study of minimum mean cycle algorithms. In Proc. of 11th ALENEX, pages 1–13, 2009.
9. A.V. Goldberg and R.E. Tarjan. Finding minimum-cost circulations by canceling negative cycles. Journal of the ACM, 36(4):873–886, 1989.
10. T.D. Hansen, P.B. Miltersen, and U. Zwick. Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. CoRR, abs/1008.0530, 2010.
11. R.A. Howard. Dynamic programming and Markov processes. MIT Press, 1960.
12. R.M. Karp. A characterization of the minimum cycle mean in a digraph. Discrete Mathematics, 23(3):309–311, 1978.
13. O. Madani. Personal communication, 2008.

14. N. Megiddo. Combinatorial optimization with rational objective functions. Mathematics of Operations Research, 4(4):414–424, 1979.
15. N. Megiddo. Applying parallel computation algorithms in the design of serial algorithms. Journal of the ACM, 30(4):852–865, 1983.
16. M.L. Puterman. Markov decision processes. Wiley, 1994.
17. Y. Ye. The simplex method is strongly polynomial for the Markov decision problem with a fixed discount rate. Available at http://www.stanford.edu/~yyye/simplexmdp1.pdf, 2010.
18. N.E. Young, R.E. Tarjan, and J.B. Orlin. Faster parametric shortest path and minimum-balance algorithms. Networks, 21:205–221, 1991.

APPENDIX

A  A 2n − O(1) lower bound for outdegree-2 graphs

A.1 The construction

Let V = {v_1, ..., v_{2n+1}}. We let v_{2n+2} = v_1 and v_0 = v_{2n+1}. Let E = {(v_i, v_{i−1}), (v_i, v_{i+1}) | 1 ≤ i ≤ 2n + 1}. The graph G_n is simply a bidirected cycle on 2n + 1 vertices. (The graph G_5 is depicted at the top of Figure 3.) We define costs, for n ≥ 3, as follows:

  i               | 1 | 2 ≤ i ≤ n−2 | n−1 | n | n+1 | n+2 | n+3 ≤ i ≤ 2n | 2n+1
  c(v_i, v_{i−1}) | 1 | 2(n−i+1)    | 3   | 1 | 4   | 2   | 2(i−n)−1     | 0

  i               | 1 ≤ i ≤ 2n | 2n+1
  c(v_i, v_{i+1}) | 0          | Σ_{k=n}^{2n} c(v_k, v_{k−1})

We describe a sequence of policies Π_n = (π_j)_{j=1}^{4n} of length 4n generated by running Howard's algorithm on G_n, starting with π_1. Let us first introduce some notation. We describe a policy π as a sequence of indices [i_1, ..., i_k], such that for every 1 ≤ j ≤ 2n + 1, if i_{l−1} ≤ j < i_l where l is odd, then π(v_j) = v_{j+1}. Otherwise, π(v_j) = v_{j−1}. We always assume that i_0 = 1 and i_{k+1} = 2n + 2. In other words, for every j ∈ [i_0, i_1) ∪ [i_2, i_3) ∪ ..., we have π(v_j) = v_{j+1}, while for every j ∈ [i_1, i_2) ∪ [i_3, i_4) ∪ ..., we have π(v_j) = v_{j−1}. The sequence Π_n is composed of four subsequences Π^1_n, Π^2_n, Π^3_n, Π^4_n that we call stages:

  Π^1_n = [1, 2n+1], ( [2n−k+1], [k+1, 2n] )_{k=1}^{n−2}
  Π^2_n = [n, n+1, n+2], [n, 2n], [n]
  Π^3_n = ( [1, k+1, n, 2n−k, 2n+1] )_{k=0}^{n−2}
  Π^4_n = ( [1, n+2−k, 2n+1] )_{k=1}^{n+1}

Thus, for example:

  Π^1_n = [1, 2n+1], [2n], [2, 2n], [2n−1], [3, 2n], ..., [n+3], [n−1, 2n].

Doing a case analysis, similar to that of Section 3.3, it is possible to prove that the sequence Π_n is indeed the sequence of policies generated by running Howard's algorithm on G_n, starting at [1, 2n+1]. An alternative pictorial description of the sequence Π_n, for n = 15, is given in Figure 4 of the appendix.
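A small Python sketch of our reading of the cost table above (the function name and dictionary encoding are our own; vertex i has a "backward" edge to v_{i−1} and a "forward" edge to v_{i+1}, with indices taken around the cycle). For n = 5 it reproduces the edge labels that survive in Figure 3, including the cost 28 = 1 + 4 + 2 + 5 + 7 + 9 on the long forward edge (v_{2n+1}, v_1).

```python
def cycle_costs(n):
    """Edge costs of the bidirected (2n+1)-cycle G_n (n >= 3), per the table above."""
    m = 2 * n + 1
    def backward(i):                      # cost of the edge (v_i, v_{i-1})
        if i == 1: return 1
        if 2 <= i <= n - 2: return 2 * (n - i + 1)
        if i == n - 1: return 3
        if i == n: return 1
        if i == n + 1: return 4
        if i == n + 2: return 2
        if n + 3 <= i <= 2 * n: return 2 * (i - n) - 1
        return 0                          # i = 2n + 1
    back = {i: backward(i) for i in range(1, m + 1)}
    # forward edges cost 0, except the edge (v_{2n+1}, v_1) closing the cycle
    fwd = {i: 0 for i in range(1, m)}
    fwd[m] = sum(back[k] for k in range(n, 2 * n + 1))
    return back, fwd

back, fwd = cycle_costs(5)
print(fwd[11])  # -> 28, the cost of the long edge (v_11, v_1) in Figure 3
```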

[Figure 4 appears here: a grid with one row per policy of Π_15 and one column per vertex; the raster of 1s and 2s is not reproduced in this text version.]

Fig. 4. Numerical representation of the sequence Π_15. Vertices are ordered as in Figure 3. 1 means left and 2 means right. Background colors have been added to roughly highlight the structure of the sequence. The lines separate different stages.
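The policy sequence depicted in the figure can also be checked by simulating Howard's algorithm directly. The sketch below is our reconstruction, not the paper's code: it assumes the cost table as we read it, evaluates a policy exactly with rational arithmetic (values and potentials, anchoring one potential per cycle at 0), and switches a vertex only on a strict lexicographic improvement in (value, cost + potential), as in the condition of Section A.2.

```python
from fractions import Fraction

def costs(n):
    # cl[i] = c(v_i, v_{i-1}), cr[i] = c(v_i, v_{i+1}); our reading of the cost table
    cl = {}
    for i in range(1, 2 * n + 2):
        if i == 1:
            cl[i] = 1
        elif i <= n - 2:
            cl[i] = 2 * (n - i + 1)
        elif i == n - 1:
            cl[i] = 3
        elif i == n:
            cl[i] = 1
        elif i == n + 1:
            cl[i] = 4
        elif i == n + 2:
            cl[i] = 2
        elif i <= 2 * n:
            cl[i] = 2 * (i - n) - 1
        else:
            cl[i] = 0
    cr = {i: 0 for i in range(1, 2 * n + 2)}
    cr[2 * n + 1] = sum(cl[k] for k in range(n, 2 * n + 1))
    return cl, cr

def evaluate(pol, n, cl, cr):
    """Exact values and potentials of a deterministic policy (pol[i] is 'L' or 'R')."""
    N = 2 * n + 1
    succ = {i: (i % N) + 1 if pol[i] == 'R' else (i - 2) % N + 1 for i in range(1, N + 1)}
    cost = {i: cr[i] if pol[i] == 'R' else cl[i] for i in range(1, N + 1)}
    val, pot = {}, {}
    for s in range(1, N + 1):
        if s in val:
            continue
        path, where = [], {}
        u = s
        while u not in where and u not in val:
            where[u] = len(path)
            path.append(u)
            u = succ[u]
        if u not in val:                    # discovered a new cycle
            cyc = path[where[u]:]
            mean = Fraction(sum(cost[v] for v in cyc), len(cyc))
            for v in cyc:
                val[v] = mean
            pot[cyc[0]] = Fraction(0)       # anchor one potential per cycle
            for v in reversed(cyc[1:]):     # pot(v) = c(v) - val(v) + pot(succ(v))
                pot[v] = cost[v] - mean + pot[succ[v]]
            path = path[:where[u]]
        for v in reversed(path):            # tail vertices leading into a cycle
            val[v] = val[succ[v]]
            pot[v] = cost[v] - val[v] + pot[succ[v]]
    return val, pot

def improve(pol, n, cl, cr):
    """One Howard step: switch only on a strict lexicographic improvement."""
    N = 2 * n + 1
    val, pot = evaluate(pol, n, cl, cr)
    new = dict(pol)
    for i in range(1, N + 1):
        right, left = (i % N) + 1, (i - 2) % N + 1
        appraisal = {'R': (val[right], cr[i] + pot[right]),
                     'L': (val[left], cl[i] + pot[left])}
        other = 'L' if pol[i] == 'R' else 'R'
        if appraisal[other] < appraisal[pol[i]]:
            new[i] = other
    return new

def run_howard(n):
    """Iterate from pi_1 = [1, 2n+1] (all left, v_{2n+1} right) to convergence."""
    cl, cr = costs(n)
    N = 2 * n + 1
    pol = {i: 'L' for i in range(1, N)}
    pol[N] = 'R'
    seq = [pol]
    while True:
        nxt = improve(pol, n, cl, cr)
        if nxt == pol:
            return seq
        pol, seq = nxt, seq + [nxt]
```

For n = 5 this run produces 20 = 4n policies, ending in the policy [1, 1, 2n+1] (every vertex goes right, except v_{2n+1}), whose only cycle is the minimum mean-cost cycle C_{2n+1} of value 0.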

A.2  Correctness of the construction

Before proving Theorem 2 we first describe a bit of the intuition behind the construction. An alternative pictorial description of the sequence Π_n, for n = 15, is given in Figure 4. Consulting both Figure 3 and Figure 4, we see that during Stage 1, the central vertices switch their actions back and forth. All the policies in this stage contain a single cycle, but this cycle jumps around from side to side. At the end of Stage 1, the cycle is almost at the middle. The first policy of Stage 2 contains two cycles. The third and last policy of Stage 2 again contains a single cycle. All policies of Stage 3 contain two cycles, a middle cycle and a right cycle. The middle cycle has a larger value. The right cycle is the minimum mean-cost cycle of the graph. The vertices to the left of the middle cycle are fooled into going left, making them switch back during Stage 4 for the cheaper path to the final cycle.

We also note that by adding a self-loop to every vertex and modifying the sequence we have managed to find costs that realize sequences of length 3n − 5 for a graph with n states, n being odd, and 3n edges.

In order to prove that Π_n is indeed generated by Howard's algorithm, we must show that:

  ∀ 1 ≤ j ≤ 4n−1, ∀ 1 ≤ i ≤ 2n+1 :
  π_{j+1}(v_i) = argmin_{v_k : k ∈ {i−1, i+1}} ( val_{π_j}(v_k), c(v_i, v_k) + pot_{π_j}(v_k) )

where argmin is applied lexicographically, such that value weighs higher than potential. If π_{j+1}(v_i) = v_{i+t_i}, t_i ∈ {−1, 1}, we can, requiring strict inequality, also express this as:

  ∀ 1 ≤ j ≤ 4n−1, ∀ 1 ≤ i ≤ 2n+1 :
  val_{π_j}(v_{i+t_i}) < val_{π_j}(v_{i−t_i}), or
  val_{π_j}(v_{i+t_i}) = val_{π_j}(v_{i−t_i}) and c(v_i, v_{i+t_i}) + pot_{π_j}(v_{i+t_i}) < c(v_i, v_{i−t_i}) + pot_{π_j}(v_{i−t_i}).

We will argue about the four stages separately. One fact that we will be using repeatedly is the following:

Fact. Let π be the current policy, and let π′ be the policy generated from π in one step of Howard's policy iteration algorithm.
If for some 1 ≤ i ≤ 2n+1, π(v_{i−1}) = v_i and π(v_i) = v_{i+1}, then:

  π′(v_i) = v_{i−1}  ⟺  ( c(v_i, v_{i−1}) + c(v_{i−1}, v_i) ) / 2 < val_π(v_i)

Similarly, for π(v_i) = v_{i−1} and π(v_{i+1}) = v_i we get:

  π′(v_i) = v_{i+1}  ⟺  ( c(v_i, v_{i+1}) + c(v_{i+1}, v_i) ) / 2 < val_π(v_i)

To see this, observe that the potential of v_{i−1} in the first case can be expressed as:

  pot_π(v_{i−1}) = c(v_{i−1}, v_i) − val_π(v_i) + pot_π(v_i)
                 = c(v_{i−1}, v_i) + c(v_i, v_{i+1}) − 2·val_π(v_i) + pot_π(v_{i+1})

Since v_{i−1} and v_{i+1} have the same value in π, we see that π′(v_i) = v_{i−1} if and only if:

  c(v_i, v_{i−1}) + pot_π(v_{i−1}) < c(v_i, v_{i+1}) + pot_π(v_{i+1})
  ⟺ c(v_i, v_{i−1}) + c(v_{i−1}, v_i) + c(v_i, v_{i+1}) − 2·val_π(v_i) + pot_π(v_{i+1}) < c(v_i, v_{i+1}) + pot_π(v_{i+1})

from which the result follows. The second case is shown similarly.

Let val(C_i) = ( c(v_i, v_{i−1}) + c(v_{i−1}, v_i) ) / 2 denote the value of the cycle C_i of length two on v_{i−1} and v_i. We note that for all i = 1, …, n−2 we have:

  val(C_i) > val(C_{2n+1−i}) > val(C_{i+1})

and

  val(C_{n+3}) > val(C_{n+1}) > val(C_{n−1}) > val(C_{n+2}) > val(C_n) > val(C_{2n+1})

which specifies the order of all cycles of length two of G_n according to value. We will say that a vertex v ∈ V switches from π to π′ if π(v) ≠ π′(v).

A.3  Stage 1

Let us first note that in all of Stage 1, there is only one cycle in every policy, meaning that all changes are based solely on differences in potential.

Case 1, [1, 2n+1] to [2n]: Since the transition from π_1 = [1, 2n+1] to π_2 = [2n] is a bit different from the rest we will handle it separately. Let us first note that by the Fact all vertices v_1, …, v_{2n−1} switch from π_1 to π_2. We must show that v_{2n+1} also switches, whereas v_{2n} does not. Let us first describe the potentials of v_1, v_{2n−1} and v_{2n} in terms of the potential of v_{2n+1} = v_0:

  pot_{π_1}(v_1) = c(v_1, v_0) − val_{π_1}(v_0) + pot_{π_1}(v_0)
  pot_{π_1}(v_{2n−1}) = ∑_{i=1}^{2n−1} c(v_i, v_{i−1}) − (2n−1)·val_{π_1}(v_0) + pot_{π_1}(v_0)
  pot_{π_1}(v_{2n}) = ∑_{i=1}^{2n} c(v_i, v_{i−1}) − 2n·val_{π_1}(v_0) + pot_{π_1}(v_0)

To see that v_{2n} does not switch, we observe that:

  c(v_{2n}, v_{2n−1}) + pot_{π_1}(v_{2n−1}) < c(v_{2n}, v_{2n+1}) + pot_{π_1}(v_0)
  ⟺ c(v_{2n}, v_{2n−1}) + ∑_{i=1}^{2n−1} c(v_i, v_{i−1}) − (2n−1)·val_{π_1}(v_0) + pot_{π_1}(v_0) < pot_{π_1}(v_0)
  ⟺ ∑_{i=1}^{2n} c(v_i, v_{i−1}) < (2n−1) · ( 1 + ∑_{i=n}^{2n} c(v_i, v_{i−1}) ) / 2

The last inequality is easily satisfied since ∑_{i=1}^{n−1} c(v_i, v_{i−1}) < ∑_{i=n}^{2n} c(v_i, v_{i−1}). Similarly, to see that v_{2n+1} does switch, we observe that:

  c(v_{2n+1}, v_{2n}) + pot_{π_1}(v_{2n}) < c(v_{2n+1}, v_1) + pot_{π_1}(v_1)
  ⟺ pot_{π_1}(v_{2n}) < pot_{π_1}(v_1) + ∑_{i=n}^{2n} c(v_i, v_{i−1})
  ⟺ ∑_{i=1}^{2n} c(v_i, v_{i−1}) − 2n·val_{π_1}(v_0) < c(v_1, v_0) − val_{π_1}(v_0) + ∑_{i=n}^{2n} c(v_i, v_{i−1})
  ⟺ ∑_{i=2}^{n−1} c(v_i, v_{i−1}) < (2n−1) · ( 1 + ∑_{i=n}^{2n} c(v_i, v_{i−1}) ) / 2

which is again satisfied. Hence, we have shown that the transition from π_1 to π_2 is correct.

Case 2, [2n−k+1] to [k+1, 2n]: For 1 ≤ k ≤ n−2, let π = [2n−k+1]; we will show that the next policy π′ will, indeed, be [k+1, 2n]. For all 2 ≤ i ≤ 2n, v_i switches correctly due to the Fact. That is, v_i's with i ≤ k or 2n−k+1 ≤ i ≤ 2n−1 do not switch, whereas the remaining vertices do switch, since the values of the cycles C_{k+1}, …, C_{2n−k} and C_{2n+1} are lower than that of C_{2n−k+1}. Thus, it remains to show that v_1 does not switch and v_{2n+1} does switch, which will be similar to the analysis of the transition from π_1 to π_2 in Case 1.

Let us first show that v_1 does not switch. Rather than defining all the relevant potentials in terms of the potential of v_{2n−k+1}, we simply note that going to v_2 ensures both a longer and a cheaper path to v_{2n−k+1} than going to v_0, and since val(C_{2n−k+1}) > 0 this means that v_2 is the preferred choice. To show that v_{2n+1} switches we need to be a bit more careful. Let us define the potentials of v_1 and v_{2n} in terms of the potential of v_{2n−k+1}:

  pot_π(v_1) = pot_π(v_{2n−k+1}) − (2n−k)·val(C_{2n−k+1})
  pot_π(v_{2n}) = pot_π(v_{2n−k+1}) + ∑_{i=2n−k+2}^{2n} c(v_i, v_{i−1}) − (k−1)·val(C_{2n−k+1})

Now, v_{2n+1} switches if and only if:

  c(v_{2n+1}, v_1) + pot_π(v_1) < c(v_{2n+1}, v_{2n}) + pot_π(v_{2n})
  ⟺ c(v_{2n+1}, v_1) − (2n−k)·val(C_{2n−k+1}) < ∑_{i=2n−k+2}^{2n} c(v_i, v_{i−1}) − (k−1)·val(C_{2n−k+1})
  ⟺ ∑_{i=n}^{2n} c(v_i, v_{i−1}) < ∑_{i=2n−k+2}^{2n} c(v_i, v_{i−1}) + (2n−2k+1)·val(C_{2n−k+1})

The last condition is equivalent to ∑_{i=n}^{2n−k+1} c(v_i, v_{i−1}) < (2n−2k+1)·val(C_{2n−k+1}), and since

  ∑_{i=n}^{2n−k+1} c(v_i, v_{i−1}) = 7 + ∑_{i=n+3}^{2n−k+1} c(v_i, v_{i−1}) < 7 + (n−k−1)·2·val(C_{2n−k+1}) < (2n−2k+1)·val(C_{2n−k+1})

it is always satisfied.

Case 3, [k+1, 2n] to [2n−k]: For 1 ≤ k ≤ n−3, let π = [k+1, 2n]; we will show that the next policy π′ will, indeed, be [2n−k]. We will proceed very similarly to Case 2. We first observe that all vertices but v_{2n−1} and v_{2n} switch correctly due to the Fact. Showing that v_{2n−1} does not switch is done in the same way as showing that v_{2n} does not switch in Case 1. We first express the potentials of v_{2n−2} and v_{2n} in terms of the potential of v_{k+1}:

  pot_π(v_{2n−2}) = ∑_{i=k+2}^{2n−2} c(v_i, v_{i−1}) − (2n−k−3)·val(C_{k+1}) + pot_π(v_{k+1})
  pot_π(v_{2n}) = c(v_{2n+1}, v_1) − (k+2)·val(C_{k+1}) + pot_π(v_{k+1})

We then observe that v_{2n−1} does not switch if and only if:

  c(v_{2n−1}, v_{2n−2}) + pot_π(v_{2n−2}) < c(v_{2n−1}, v_{2n}) + pot_π(v_{2n})
  ⟺ c(v_{2n−1}, v_{2n−2}) + ∑_{i=k+2}^{2n−2} c(v_i, v_{i−1}) − (2n−k−3)·val(C_{k+1}) < c(v_{2n+1}, v_1) − (k+2)·val(C_{k+1})
  ⟺ ∑_{i=k+2}^{2n−1} c(v_i, v_{i−1}) < ∑_{i=n}^{2n} c(v_i, v_{i−1}) + (2n−2k−5)·val(C_{k+1})
  ⟺ ∑_{i=k+2}^{n−1} c(v_i, v_{i−1}) < c(v_{2n}, v_{2n−1}) + (2n−2k−5)·val(C_{k+1})

and since

  ∑_{i=k+2}^{n−1} c(v_i, v_{i−1}) ≤ (n−k−2)·2·val(C_{k+2}) < c(v_{2n}, v_{2n−1}) + (2n−2k−5)·val(C_{k+1})

this is satisfied, because c(v_{2n}, v_{2n−1}) > val(C_{k+1}) and val(C_{k+1}) > val(C_{k+2}). Similarly, to show that v_{2n} switches we first need the potentials of v_{2n−1} and v_{2n+1}:

  pot_π(v_{2n−1}) = ∑_{i=k+2}^{2n−1} c(v_i, v_{i−1}) − (2n−k−2)·val(C_{k+1}) + pot_π(v_{k+1})
  pot_π(v_{2n+1}) = c(v_{2n+1}, v_1) − (k+1)·val(C_{k+1}) + pot_π(v_{k+1})

We observe that v_{2n} switches if and only if:

  c(v_{2n}, v_{2n−1}) + pot_π(v_{2n−1}) < c(v_{2n}, v_{2n+1}) + pot_π(v_{2n+1})
  ⟺ c(v_{2n}, v_{2n−1}) + ∑_{i=k+2}^{2n−1} c(v_i, v_{i−1}) − (2n−k−2)·val(C_{k+1}) < c(v_{2n+1}, v_1) − (k+1)·val(C_{k+1})

The last condition is equivalent to:

  ∑_{i=k+2}^{2n} c(v_i, v_{i−1}) < ∑_{i=n}^{2n} c(v_i, v_{i−1}) + (2n−2k−3)·val(C_{k+1})
  ⟺ ∑_{i=k+2}^{n−1} c(v_i, v_{i−1}) < (2n−2k−3)·val(C_{k+1})

and since

  ∑_{i=k+2}^{n−1} c(v_i, v_{i−1}) ≤ (n−k−2)·2·val(C_{k+2}) < (2n−2k−3)·val(C_{k+1})

this is satisfied, because val(C_{k+1}) > 0 and val(C_{k+1}) > val(C_{k+2}).

Case 4, [n−1, 2n] to [n, n+1, n+2]: In fact, the correctness of the transition from [n−1, 2n] to [n, n+1, n+2] is shown in exactly the same way as in Case 3. We just want to note that, since c(v_{n−1}, v_{n−2}) = 3, c(v_n, v_{n−1}) = 1, c(v_{n+1}, v_n) = 4 and c(v_{n+2}, v_{n+1}) = 2, the Fact causes two cycles to be formed.

A.4  Stage 2

There are only three policies in Stage 2, and we will argue for the correctness of the transition of each of these three policies to the next separately.

In the transition from [n, n+1, n+2] to [n, 2n] the correct behaviour of all vertices but v_1, v_n, v_{n+1} and v_{2n+1} is ensured by the Fact. The correct behaviour of the remaining four vertices follows from val(C_n) < val(C_{n+2}). That is, the switches are based on the vertices having different values rather than on differences in potential.

In the transition from [n, 2n] to [n] the correct behaviour of all vertices but v_{2n−1} and v_{2n} is ensured by the Fact. Rather than specifying the potentials causing v_{2n} to switch, we note that the two paths going from v_{2n} to v_{n−1} have the same cost, but that the path going through v_{2n−1} is one step longer, which causes the switch. Showing that v_{2n−1} does not switch is essentially the same as showing that v_{2n−1} does not switch in the transition from [n−1, 2n] to [n, n+1, n+2] in Case 4 of Stage 1, for which the calculations are shown in Case 3. The only difference is that the current value is val(C_{k+2}) rather than val(C_{k+1}), but the final inequalities are satisfied all the same.

Let π = [n]. In the transition from [n] to [1, 1, n, 2n, 2n+1] = [n, 2n, 2n+1] the correct behaviour of all vertices but v_1 and v_{2n+1} is ensured by the Fact.
v_{2n+1} does not switch because the two paths going from v_{2n+1} to v_{n−1} have the same cost, whereas the path going through v_{2n} is three steps longer. To show that v_1 also does not switch, we will need to consider the potentials of v_2 and v_{2n+1} in terms of the potential of v_{n−1}:

  pot_π(v_2) = −(n−3)·val(C_n) + pot_π(v_{n−1})
  pot_π(v_{2n+1}) = ∑_{i=n}^{2n} c(v_i, v_{i−1}) − (n+2)·val(C_n) + pot_π(v_{n−1})

It follows that v_1 does not switch if and only if:

  c(v_1, v_2) + pot_π(v_2) < c(v_1, v_{2n+1}) + pot_π(v_{2n+1})