Analysis of Different MMAS ACO Algorithms on Unimodal Functions and Plateaus



Similar documents

arxiv: v1 [math.pr] 5 Dec 2011

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

The Trip Scheduling Problem

1 if 1 x 0 1 if 0 x 1

The Goldberg Rao Algorithm for the Maximum Flow Problem

Adaptive Online Gradient Descent

Offline sorting buffers on Line

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

Approximation Algorithms

The van Hoeij Algorithm for Factoring Polynomials

Guessing Game: NP-Complete?

6.852: Distributed Algorithms Fall, Class 2

8.1 Min Degree Spanning Tree

Ant Colony Optimization (ACO)

Design and Analysis of ACO algorithms for edge matching problems

2.3 Convex Constrained Optimization Problems

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

A Turán Type Problem Concerning the Powers of the Degrees of a Graph

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

24. The Branch and Bound Method

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, Notes on Algebra

An Empirical Study of Two MIS Algorithms

The Relative Worst Order Ratio for On-Line Algorithms

LOOKING FOR A GOOD TIME TO BET

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

1 Formulating The Low Degree Testing Problem

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

Broadcasting in Wireless Networks

Large induced subgraphs with all degrees odd

Routing in Line Planning for Public Transport

The Student-Project Allocation Problem

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

Lecture 1: Course overview, circuits, and formulas

Factoring & Primality

IN THIS PAPER, we study the delay and capacity trade-offs

THE SCHEDULING OF MAINTENANCE SERVICE

A 2-factor in which each cycle has long length in claw-free graphs

Load Balancing. Load Balancing 1 / 24

Fairness in Routing and Load Balancing

D-optimal plans in observational studies

1 Approximating Set Cover

Adaptive Linear Programming Decoding

School Timetabling in Theory and Practice

Example 4.1 (nonlinear pendulum dynamics with friction) Figure 4.1: Pendulum. asin. k, a, and b. We study stability of the origin x

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Private Approximation of Clustering and Vertex Cover

A New Approach to Dynamic All Pairs Shortest Paths

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Cycles and clique-minors in expanders

Master s Theory Exam Spring 2006

Lecture 4: AC 0 lower bounds and pseudorandomness

Distributed Computing over Communication Networks: Maximal Independent Set

LOGNORMAL MODEL FOR STOCK PRICES

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

How To Calculate Max Likelihood For A Software Reliability Model

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Semi-Matchings for Bipartite Graphs and Load Balancing

9.2 Summation Notation

20 Selfish Load Balancing

Computer Algorithms. NP-Complete Problems. CISC 4080 Yanjun Li

On an anti-ramsey type result

Online and Offline Selling in Limit Order Markets

Analysis of Approximation Algorithms for k-set Cover using Factor-Revealing Linear Programs

Diagonalization. Ahto Buldas. Lecture 3 of Complexity Theory October 8, Slides based on S.Aurora, B.Barak. Complexity Theory: A Modern Approach.

Swarm Intelligence Algorithms Parameter Tuning

An example of a computable

Analysis of Algorithms, I

Shortest Inspection-Path. Queries in Simple Polygons

Smooth functions statistics

Follow the Perturbed Leader

Supplement to Call Centers with Delay Information: Models and Insights

Graphs without proper subgraphs of minimum degree 3 and short cycles

Strategic planning in LTL logistics increasing the capacity utilization of trucks

Influences in low-degree polynomials

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

ON THE COMPLEXITY OF THE GAME OF SET.

Using Ant Colony Optimization for Infrastructure Maintenance Scheduling

Key words. multi-objective optimization, approximate Pareto set, bi-objective shortest path

How Asymmetry Helps Load Balancing

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA

FIRST YEAR CALCULUS. Chapter 7 CONTINUITY. It is a parabola, and we can draw this parabola without lifting our pencil from the paper.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Lecture 2: Universality

How To Balance In A Distributed System

Labeling outerplanar graphs with maximum degree three

Transcription:

Noname manuscript No. (will be inserted by the editor) Analysis of Different MMAS ACO Algorithms on Unimodal Functions and Plateaus Frank Neumann Dirk Sudholt Carsten Witt the date of receipt and acceptance should be inserted later Abstract Recently, the first rigorous runtime analyses of ACO algorithms appeared, covering variants of the MAX-MIN ant system and their runtime on pseudo-boolean functions. Interestingly, a variant called 1-ANT is very sensitive to the evaporation factor while Gutjahr and Sebastiani proved partly opposite results for their variant called MMAS bs. These algorithms differ in their update mechanisms and, moreover, MMAS bs only accepts strict improvements in contrast to the 1-ANT. The motivation for this work is manifold. Firstly, we prove that the different behavior of the 1-ANT and MMAS bs results from the different update mechanisms. Secondly, we improve results by Gutjahr and Sebastiani and extend their analyses to the important class of unimodal functions. Thirdly, we compare MMAS bs with a variant that also accepts equally fit solutions as this enables the exploration of plateaus. For well-known plateau functions we show theoretically and experimentally that accepting equally fit solutions drastically reduces the optimization time. Keywords Ant colony optimization, MMAS, Runtime analysis, Theory A conference version appeared in SLS 2007 [21]. Frank Neumann Algorithms and Complexity, Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany fne@mpi-inf.mpg.de Dirk Sudholt FB Informatik, LS 2, Universität Dortmund, 44221 Dortmund, Germany dirk.sudholt@cs.uni-dortmund.de Carsten Witt FB Informatik, LS 2, Universität Dortmund, 44221 Dortmund, Germany carsten.witt@cs.uni-dortmund.de

2 1 Introduction Randomized search heuristics have been shown to be good problem solvers with various application domains. Two prominent examples belonging to this class of algorithms are Evolutionary Algorithms (EAs) and Ant Colony Optimization (ACO) [6]. Especially ACO algorithms have been shown to be very successful for solving problems from combinatorial optimization. Indeed, the first problem where an ACO algorithm was applied was the Traveling Salesperson Problem (TSP) [5], which is one of the most studied combinatorial problems in computer science. In contrast to many successful applications, theory lags far behind the practical evidence of all randomized search heuristics. In particular in the case of ACO algorithms, the analysis of such algorithms with respect to their runtime behavior has been started only recently. The analysis of randomized search heuristics (e. g., [7]) is carried out as in the classical algorithm community and makes use of several strong methods for the analysis of randomized algorithms [17,18]. Concerning evolutionary algorithms, a lot of progress has been made in recent years in analyzing their runtime behavior. Initial studies considered the behavior on artificial pseudo-boolean functions. This includes results on unimodal [7] or plateau functions [16] as well as investigations for the class of linear functions. Within these studies many useful tools have been developed that are at this time state of the art when analyzing the computational complexity of evolutionary algorithms. Later on, evolutionary algorithms were successfully analyzed on some well-known combinatorial optimization problems. Problems for which results have been obtained include the single-source shortest path problem [26], maximum matchings [10], Eulerian cycles [1, 19], different kinds of spanning tree problems [20, 22, 23] as well as the NP-hard partition problem [30]. Regarding ACO, only convergence results [11] were known until 2006 and analyzing the runtime of ACO algorithms has been pointed out as a challenging task in [4]. First steps into analyzing the runtime of ACO algorithms have been made by Gutjahr [13], and, independently, the first theorems on the runtime of a simple ACO algorithm called 1-ANT were obtained at the same time by Neumann and Witt [24]. Later on this algorithm was further investigated for the optimization of some well-known pseudo- Boolean functions [3]. A conclusion from these investigations is that the 1-ANT is very sensitive w.r. t. the choice of the evaporation factor ρ. Increasing the value of ρ only by a small amount may lead to a phase transition and turn an exponential runtime into a polynomial one. In contrast to this, a simple ACO algorithm called MMAS bs has been investigated in a recent article by Gutjahr and Sebastiani [15] where this phase transition does not occur. The 1-ANT and MMAS bs differ in their pheromone update mechanisms as well as in their selection strategies. While MMAS bs reinforces the current best-so-far solution in each iteration, the 1-ANT only updates its pheromones in case the best-so-far solution is replaced. Regarding the selection, MMAS bs only accepts strict improvements while the 1-ANT also accepts solutions with equal fitness. Gutjahr [14] conjectures that the different behavior of MMAS bs and 1-ANT is due to their different selection strategies. We will disprove this conjecture and show that the different behavior results from the different pheromone update mechanisms. The 1-ANT needs exponential time to optimize even very simple functions like OneMax in case the evaporation factor is set too low [3,24]. Hence the update mechanism of MMAS bs with pheromone updates in each iteration is to be preferred over the

3 Construct(C, τ) 1. v := s, mark v as visited. 2. Let N v be the set of non-visited successors of v in C. If N v then a) Choose successor w N v with probability τ (v,w) / (v,u) u N v τ (v,u). b) Mark w as visited, set v := w and go to 2. 3. Return the solution x and the path P(x) constructed by this procedure. Fig. 1 The construction procedure for pseudo-boolean optimization. mechanism used by the 1-ANT. This motivates us to study MMAS variants with this more persistent update scheme. First, we consider the MMAS algorithm by Gutjahr and Sebastiani [15], in the present paper referred to as MMAS*, and show improved and extended results. In particular, we make use of the method called fitness-based partitions which is well-known from the analysis of evolutionary algorithms. This allows us to improve previous runtime bounds and to prove general results on the important class of unimodal functions. Nevertheless, the selection strategy does make a significant difference for MMAS*. Accepting equally fit solutions allows MMAS algorithms to explore plateaus of solutions with equal fitness. We compare the runtime behavior of MMAS* with a variant MMAS that accepts solutions of equal fitness on functions with plateaus of exponential and polynomial size, respectively. The analyses reveal that MMAS clearly outperforms MMAS* on both functions. As our theoretical results are purely asymptotic, they are accompanied by experimental results. The empirical data shows that the observed performance difference between MMAS and MMAS* is huge, even for small problem dimensions. The remainder of the paper is structured as follows. In Section 2, we introduce the above-mentioned algorithms that are subject of our investigations. In Section 3, we compare the 1-ANT with MMAS* and MMAS on unimodal functions. Section 4 then compares MMAS* with MMAS on plateau functions, supplemented by additional experiments in Section 5. We finish with some conclusions. 2 Algorithms We consider the runtime behavior of several ACO algorithms that work by a common principle to construct new solutions. Solutions for a given problem, bit strings x {0, 1} n for pseudo-boolean functions f : {0, 1} n R, are constructed by a random walk on a so-called construction graph C = (V, E), which is a directed graph with a designated start vertex s V and pheromone values τ : E R on the edges. The construction procedure is shown in Figure 1. We examine the construction graph given in Figure 2, which is known in the literature as Chain [12]. For bit strings of length n, the graph has 3n + 1 vertices and 4n edges. The decision whether a bit x i, 1 i n, is set to 1 is made at node v 3(i 1). If edge (v 3(i 1), v 3(i 1)+1 ) (called 1-edge) is chosen, x i is set to 1 in the constructed solution. Otherwise the corresponding 0-edge is taken, and x i = 0 holds. After this decision has been made, there is only one single edge which can be traversed in the next step. In case that (v 3(i 1), v 3(i 1)+1 ) has been chosen, the next edge is (v 3(i 1)+1, v 3i ), and otherwise the edge (v 3(i 1)+2, v 3i ) will be traversed.

4 v 1 v 4 v 7 v 3(n 1)+1 v 0 x 1 v 3 x 2... v 6 x 3 v 9 v 3(n 1) x n v 3n v 2 v 5 v 8 v 3(n 1)+2 Fig. 2 Construction graph for pseudo-boolean optimization. 1-ANT 1. Set τ (u,v) = 1/2 for all (u, v) A. 2. Create x using Construct(C, τ). 3. Update pheromones w.r.t. x. 4. Repeat a) Create x using Construct(C, τ). b) If f(x) f(x ) then i) x := x. ii) Update pheromones w.r.t. x. Fig. 3 The 1-ANT defined by Neumann and Witt [24]. Note that a pheromone update is only performed in case the best-so-far solution x is replaced. Hence, these edges have no influence on the constructed solution and we can assume τ (v3(i 1),v 3(i 1)+1 ) = τ (v3(i 1)+1,v 3i) and τ (v3(i 1),v 3(i 1)+2 ) = τ (v3(i 1)+2,v 3i) for 1 i n. We ensure that (u, ) E τ (u, ) = 1 for u = v 3i, 0 i n 1, and (,v) τ (,v) = 1 for v = v 3i, 1 i n. Let p i = Prob(x i = 1) be the probability of setting the bit x i to 1 in the next constructed solution. Due to our setting p i = τ (3(i 1),3(i 1)+1) and 1 p i = τ (3(i 1),3(i 1)+2) holds, i. e., the pheromone values correspond directly to the probabilities for choosing the bits in the constructed solution. In addition, following the MAX-MIN ant system by Stützle and Hoos [27], we restrict each τ (u,v) to the interval [1/n, 1 1/n] such that every solution always has a positive probability of being chosen. Depending on whether the edge (u, v) is contained in the path P(x) of the constructed solution x, the pheromone values are updated to τ in the update procedure as follows: { { τ (u,v) min (1 ρ) = τ(u,v) + ρ, 1 1/n } if (u, v) P(x) max { (1 ρ) τ (u,v), 1/n } otherwise. The first algorithm that has been investigated in terms of rigorous runtime analysis is the 1-ANT introduced by Neumann and Witt [24], which is shown in Figure 3. The attentive reader may notice that in the present paper the pheromone values have been rescaled in comparison to [24]. This rescaling was independently proposed by Doerr and Johannsen [2]. The 1-ANT only performs a pheromone update in case the best-so-far solution x is replaced. If ρ is large, such an update has a large effect on the pheromone values. In that case, the 1-ANT is able to reproduce the new best-so-far solution x quite efficiently and to discover better solutions in the region around x. However, if ρ is small, updates are rare and they only have a small effect on the pheromones. This makes it hard for the 1-ANT to focus on promising regions of the search space. The analyses by Neumann and Witt [24] and Doerr, Neumann, Sudholt and Witt [3] have

5 MMAS* 1. Set τ (u,v) = 1/2 for all (u, v) A. 2. Create x using Construct(C, τ). 3. Update pheromones w.r.t. x. 4. Repeat a) Create x using Construct(C, τ). b) If f(x) > f(x ) then x := x. c) Update pheromones w.r.t. x. MMAS 1. Set τ (u,v) = 1/2 for all (u, v) A. 2. Create x using Construct(C, τ). 3. Update pheromones w.r.t. x. 4. Repeat a) Create x using Construct(C, τ). b) If f(x) f(x ) then x := x. c) Update pheromones w.r.t. x. Fig. 4 The algorithms MMAS* and MMAS, differing only in the acceptance criterion in step 4.b). revealed that the 1-ANT is unable to keep track of the best-so-far solution. As the 1-ANT discovers better and better best-so-far solutions, the hurdles for constructed solutions are set higher and higher. If the update strength is too small, the algorithm fails in storing the knowledge gained by new best-so-far solutions in the pheromones. The probability of reproducing the current best-so-far solution or finding a better one decreases and the 1-ANT slowly gets stuck. In effect, even for really simple functions like OneMax, the 1-ANT needs exponential time with overwhelming probability if ρ is small, e. g., less than 1/log n. Neumann and Witt [24] discovered a phase-transition behavior for the 1-ANT on OneMax. The expected runtime of the 1-ANT turns from exponential to polynomial values with increasing ρ when a certain threshold is passed. The same behavior has also been proved for two other test functions by Doerr, Neumann, Sudholt and Witt [3]. In a recent article [15], Gutjahr and Sebastiani investigate a very similar ACO algorithm where, surprisingly, this phase transition behavior does not occur. Therefore, one aim of the present article is to elaborate on the different behavior of these two algorithms. Gutjahr and Sebastiani call their algorithm MMAS bs, we refer to it as MMAS*. This algorithm is shown on the left-hand side of Figure 4. Compared to the 1-ANT, there are two differences. First of all, MMAS* only accepts strict improvements, i. e., new solutions x with f(x) = f(x ) are rejected by MMAS* while they are accepted by the 1-ANT. A second difference is that MMAS* reinforces the best-so-far solution in every generation, regardless whether it has been replaced or not. Hence in generations without improvements the pheromones are changed towards the best-so-far solution in MMAS* while the pheromones remain unchanged in the 1-ANT. Gutjahr [14] conjectures that the different behavior of MMAS bs and 1-ANT depends on the former difference: the fact that MMAS bs only accepts strict improvements. Therefore, we also investigate a variant of MMAS bs that accepts equally fit solutions. We call this algorithm MMAS, it is displayed in the right-hand half of Figure 4. In Section 3, we will analyze the runtime behavior of MMAS* and MMAS on simple unimodal functions. These results are then compared to known results for the 1-ANT. Moreover, we extend and simplify results by Gutjahr and Sebastiani [15] and show how to transfer a popular and effective method for the analysis of evolutionary algorithms to the analysis of MMAS*. ACO algorithms like MMAS* are not far away from evolutionary algorithms. If the value of ρ is chosen large enough in MMAS*, the pheromone borders 1/n or 1 1/n are touched for every bit of the rewarded solution. In this case, MMAS* equals the algorithm called (1+1) EA*, which is known from the analysis of evolutionary algorithms [16].

6 (1+1) EA* 1. Choose x uniformly at random. 2. Repeat a) Create x by flipping each bit of x independently with probability 1/n. b) If f(x) > f(x ) then x := x. (1+1) EA 1. Choose x uniformly at random. 2. Repeat a) Create x by flipping each bit of x independently with probability 1/n. b) If f(x) f(x ) then x := x. Fig. 5 The algorithms (1+1) EA* and (1+1) EA, differing only in the acceptance criterion in step 2.b). As already pointed out in [16], the (1+1) EA* has difficulties with simple plateaus of constant fitness as no search points of the same fitness as the so far best one are accepted. Accepting solutions with equal fitness enables the algorithm to explore plateaus by random walks. Therefore, it seems more natural to replace search points by new solutions that are at least as good. In the case of evolutionary algorithms, this leads to the well-known (1+1) EA which differs from the (1+1) EA* only in step 2.b) of the algorithm. Both the (1+1) EA* and the (1+1) EA are shown in Figure 5. By essentially the same arguments, we also expect MMAS* to outperform MMAS on plateau functions. The corresponding runtime analyses are presented in Section 4. For the analysis of an algorithm, we consider the number of solutions that are constructed by the algorithm until an optimal solution has been obtained for the first time. This is called the optimization time of the algorithm and is a well-accepted measure in the analysis of evolutionary algorithms since each point of time corresponds to a fitness evaluation. Often the expectation of this value is considered and called the expected optimization time. 3 Unimodal Functions, OneMax, and LeadingOnes In the following, we derive upper bounds on the expected optimization time of MMAS* and MMAS on unimodal functions, especially OneMax and LeadingOnes. These results can then be compared to previous results for the 1-ANT [3, 24]. In contrast to the 1-ANT, there is a similarity between MMAS* and evolutionary algorithms that can be exploited to obtain good upper bounds. Suppose that during a run there is a phase where MMAS* never replaces the best-so-far solution x in step 4.b) of the algorithm. This implies that the best-so-far solution is rewarded again and again until all pheromone values have reached their upper or lower borders corresponding to the setting of the bits in x. We can say that x has been frozen in pheromone. The probability to create a 1 for any bit is now either 1/n or 1 1/n. The distribution of constructed solutions equals the distribution of offspring of the (1+1) EA* and (1+1) EA with x as the current search point. We conclude that, as soon as all pheromone values touch their upper or lower bounds, MMAS* behaves like the (1+1) EA* until a solution with larger fitness is encountered. This similarity between ACO and EAs can be used to transfer a well-known method for the runtime analysis from EAs to ACO. Consider the random time t until all pheromones have reached their bounds. We will refer to this random time as freezing time. Gutjahr and Sebastiani [15] bound t from above by ln(n 1)/ln(1 ρ). This holds since a pheromone value which is only increased during t steps is at least min{1 1/n, 1 (1 1/n)(1 ρ) t } after the iterations, pessimistically assuming the worst-case initialization 1/n for this value and 1 1/n for

7 the complementary pheromone value. In the present paper, we use ln(1 ρ) ρ for 0 ρ 1 and arrive at the handy upper bound t ln n ρ. (1) Following Gutjahr and Sebastiani [15], we can now use the freezing time t to derive a general upper bound on the expected optimization time of MMAS*. This bound makes use of the well-known fitness-level method, also called the method of f-based partitions, from the analysis of evolutionary algorithms (see, e. g., [28]). We present a restricted formulation. Let f 1 < f 2 < < f m be an enumeration of all fitness values and let A i, 1 i m contain all search points with fitness f i. In particular, A m contains only optimal search points. Now, let s i, 1 i m 1, be a lower bound on the probability of the (1+1) EA (or, in this case equivalently, the (1+1) EA*) to create an offspring in A i+1 A m, provided the current population belongs to A i. The expected waiting time until such an offspring is created is at most 1/s i and then the set A i is left for good. As every set A i has to be left at most once, the expected optimization time for the (1+1) EA and the (1+1) EA* is bounded above by m 1 i=1 1 s i. (2) Consider t steps of MMAS* and assume x A i. Either the best-so-far fitness increases during this period or all pheromone values are frozen. In the latter case, the probability to create a solution in A i+1 A m is at least s i and the expected time until the best-so-far fitness increases is t +1/s i. We arrive at the following upper bound for MMAS*: m 1 i=1 (t + 1si ). Note that this is a special case of Eq. (13) in [15]. Since t (ln n)/ρ, we obtain the more concrete bound m 1 m ln n 1 +. (3) ρ s i=1 i The right-hand sum is the upper bound obtained for the (1+1) EA and (1+1) EA* from (2). Applying the fitness-level method to MMAS*, we obtain upper bounds that are only by an additive term (m ln n)/ρ larger than the corresponding bounds for (1+1) EA and (1+1) EA*. This additional term results from the (pessimistic) assumption that on all fitness levels MMAS* has to wait until all pheromones are frozen in order to find a better solution. We will see examples where for large ρ this bound is of the same order of growth as the bound for (1+1) EA and (1+1) EA*. However, if ρ is very small, the bound for MMAS* typically grows large. This reflects the long time needed for MMAS* to move away from the initial random search and to focus on promising regions of the search space. In the article by Gutjahr and Sebastiani [15], the proposed fitness-level method is basically applied in the context of the unimodal functions OneMax and Leading- Ones. Our aim is to show how the method can be applied to arbitrary unimodal functions both w. r. t. MMAS and MMAS*. Moreover, we generalize the upper bounds obtained in [15] for the example functions OneMax and LeadingOnes and, for the

8 first time, we show a lower bound on the expected optimization time of MMAS* on LeadingOnes. This allows us to conclude that the fitness-level method can provide almost tight upper bounds. 3.1 General Results Unimodal functions are an important and well-studied class of fitness functions in the literature on evolutionary computation (e. g., [7]). For the sake of completeness, we repeat the definition of unimodality for pseudo-boolean fitness functions. Definition 1 A function f : {0, 1} n R is called unimodal if there exists for each non-optimal search point x a Hamming neighbor x where f(x ) > f(x). Unimodal functions are often believed to be easy to optimize. This holds if the set of different fitness values is not too large. In the following, we consider unimodal functions attaining D different fitness values. Such a function is optimized by the (1+1) EA and (1+1) EA* in expected time O(nD). This bound is transferred to MMAS* by the following theorem. Theorem 1 The expected optimization time of MMAS* on a unimodal function attaining D different fitness values is O((n + log n/ρ)d). Proof By the introductory argument, we only have to set up an appropriate fitnessbased partition. We choose the D sets of preimages of different fitness values. By the unimodality, there is for each current search point x a better Hamming neighbor x of x in a higher fitness-level set. The probability of the (1+1) EA (or, equivalently, MMAS* with all pheromone values at a border) to produce x in the next step is Ω(1/n). By (3), this completes the proof. In order to freeze pheromones after t steps without an improvement, it is essential that equally good solutions are rejected. The fitness-level argumentation, including the bound from (3), cannot directly be transferred to MMAS as switching between equally fit solutions can prevent the system from freezing. Nevertheless, we are able to prove a similar upper bound on the optimization time of MMAS that is essentially by a factor of n 2 worse than the bound for MMAS* in Theorem 1. Theorem 2 The expected optimization time of MMAS on a unimodal function attaining D different fitness values is O(((n 2 log n)/ρ)d). Proof We only need to show that the expected time for an improvement of the best-sofar solution is at most O((n 2 log n)/ρ). The probability that MMAS produces within O((log n)/ρ) steps a solution being at least as good as (not necessarily better than) the best-so-far solution x is Ω(1) since after at most (ln n)/ρ steps without exchanging x all pheromone values have touched their bounds and then the probability of rediscovering x is Ω(1). We now show that the conditional probability of an improvement if x is replaced is Ω(1/n 2 ). Let x 1,..., x m be an enumeration of all solutions with fitness values equal to the best-so-far fitness value. Due to unimodality, each x i, 1 i m, has some better Hamming neighbor y i ; however, the y i need not be disjoint. Let X and Y denote the event to generate some x i or some y i, resp., in the next step. In the worst case

9 y 1,..., y m are the only possible improvements, hence the theorem follows if we can show Prob(Y X Y ) 1/n 2 which is equivalent to Prob(Y ) Prob(X)/(n 2 1). If p(x i ) is the probability to construct x i, we have p(x i )/p(y i ) (1 1 n )/ 1 n = n 1 as the constructions only differ in one bit. Each y i may appear up to n times in the sequence y 1,..., y m, hence Prob(Y ) 1 n m i=1 Prob(y i) and m m Prob(X) = p(x i ) (n 1) p(y i ) n(n 1) Prob(Y ). i=1 i=1 Therefore, Prob(Y ) Prob(X)/(n 2 1) follows. Theorems 1 and 2 show that the expected optimization times of both MMAS and MMAS* are polynomial for all unimodal functions as long as D = poly(n) and ρ = 1/poly(n). Since MMAS and the 1-ANT do not differ in their replacement strategy, this disproves the conjecture in [14] that accepting equally good solutions leads to the phase transition behavior from polynomial to exponential runtimes of the 1-ANT on the following example functions. 3.2 OneMax The probably most often studied example function in the literature on evolutionary computation is the unimodal function OneMax(x) = x 1 + + x n. In the runtime analysis of the 1-ANT on OneMax by Neumann and Witt [24], it is shown that there exists a threshold value for ρ (in our notation basically ρ = O(1/n ǫ ) for some small constant ǫ > 0) below which no polynomial runtime is possible. As argued above, due to Theorems 1 and 2, this phase transition can occur neither with MMAS* nor with MMAS. The popularity of the function OneMax justifies a closer look at the expected optimization time of MMAS*. Theorem 1 yields an upper bound of O(n 2 +(n log n)/ρ). However, this bound can easily be improved by a direct application of the fitness-level method. The following theorem has already been proven for some values of ρ in [15]. We however obtain polynomial upper bounds for all ρ bounded by some inverse polynomial. Theorem 3 The expected optimization time of MMAS* on OneMax is bounded from above by O((n log n)/ρ). Proof The proof is an application of the above-described fitness-level method with respect to the fitness-level sets A i = {x f(x) = i}, 0 i n. Using the arguments in [15] or, equivalently, the upper bound on the expected runtime of the (1+1) EA on OneMax [7] we obtain that the term m 1 i=0 1/(s i) is O(n log n). Using (3), the upper bound O((n log n)/ρ) follows. The bound is never better than Θ(n log n), which describes the expected runtime of the (1+1) EA and (1+1) EA* on OneMax. At the moment, we are not able to show a matching lower bound Ω(nlog n) on the expected optimization time of MMAS*; however, we can show that the expected optimization time is growing with respect to 1/ρ as the upper bound suggests. We state our result in a more general framework: as known from the considerations by Droste, Jansen and Wegener [7], the mutation probability 1/n of the (1+1) EA is optimal for many functions including OneMax.

10 One argument is that the probability mass has to be quite concentrated around the best-so-far solution to allow the (1+1) EA to rediscover the last accepted solution with good probability. Given a mutation probability of α(n), this probability of rediscovery equals (1 α(n)) n, which converges to zero unless α(n) = O(1/n). The following lemma exploits the last observation for a general lower bound on the expected optimization time of both MMAS and MMAS*. Theorem 4 Let f : {0, 1} n R be a function with a unique global optimum. Choosing ρ = 1/poly(n), the expected optimization time of MMAS and MMAS* on f is Ω((log n)/ρ). Proof W. l. o. g., 1 n is the unique optimum. If, for each bit, the success probability (defined as the probability of creating a one) is bounded from above by 1 1/ n then the solution 1 n is created with only exponentially small probability (1 1/ n) n e n. Using the uniform initialization and pheromone update formula of MMAS and MMAS*, the success probability of a bit after t steps is bounded from above by 1 (1 ρ) t /2. Hence, all success probabilities are bounded as desired within t = (ln(n/4))/(2ρ) steps since 1 1 n ln 4)/2 2 (1 ρ)t e (ln 1 = 1 1. 2 n Since ρ = 1/poly(n) and, therefore t = poly(n), the total probability of creating the optimum in t steps is still at most te n = e Ω( n), implying the lower bound on the expected optimization time. Hence, the expected optimization time of MMAS* on OneMax is bounded by Ω((log n)/ρ). It is an open problem to show matching upper and lower bounds. We conjecture that the lower bound for OneMax is far from optimal and that Ω(n/ρ + n log n) holds. 3.3 LeadingOnes Another prominent unimodal example function, proposed by Rudolph [25], is LeadingOnes(x) = n i=1 j=1 i x j, whose function value equals the number of leading ones in the considered bit string x. A non-optimal solution may always be improved by appending a single one to the leading ones. LeadingOnes differs from OneMax in the essential way that the assignment of the bits after the leading ones do not contribute to the function value. This implies that bits at the beginning of the bit string have a stronger influence on the function value than bits at the end. Because of this, the methods developed by Neumann and Witt [24] for the analysis of the 1-ANT on OneMax cannot be used for analyzing the 1-ANT on LeadingOnes as these methods make particular use of the fact that all bits contribute equally to the function value. In a follow-up paper by Doerr, Neumann, Sudholt and Witt [3], the 1-ANT is studied on LeadingOnes by different techniques and it is shown that a similar phase transition behavior as on OneMax exists: for ρ = o(1/log n) (again using the notation of the present paper), the expected optimization time of the 1-ANT

11 is superpolynomially large whereas it is polynomial for ρ = Ω(1/log n) and even only O(n 2 ) for ρ = Ω(1). We already know that this phase transition cannot occur with MMAS* and MMAS on LeadingOnes. The following theorem, special cases of which are contained in [15], shows a specific upper bound for MMAS*. Theorem 5 The expected optimization time of MMAS* on LeadingOnes is bounded from above by O((n log n)/ρ + n 2 ). Proof The theorem follows again by the bound (3). We use the same fitness-based partition as in the proof of Theorem 3 and the expected optimization time O(n 2 ) of the (1+1) EA on LeadingOnes [7]. It is interesting that an almost tight lower bound can be derived. The following theorem shows that the expected optimization time of MMAS* on LeadingOnes is Ω(n/ρ + n 2 ), hence never better than Ω(n 2 ). The proof is lengthy, however, for the case of large ρ, one essential idea is easy to grasp: already in the early stages of the optimization process, many, more precisely Ω(n), pheromone values on 1-edges reach their lower borders 1/n, and the corresponding bits are set to 0. To flip such a bit, events of probability 1/n are necessary. This can be transformed into the lower bound Ω(n 2 ) on the expected optimization time. Theorem 6 Choosing ρ = 1/poly(n), the expected optimization time of MMAS* on LeadingOnes is bounded from below by Ω(n/ρ + n 2 ). Proof We first show the lower bound Ω(n/ρ) on the expected optimization time. Afterwards, this bound is used to prove the second bound Ω(n 2 ). Throughout the proof, we consider only runs of polynomial length since by the assumption ρ = 1/poly(n) the lower bounds to show are both polynomial. This allows us to ignore events with exponentially small probabilities. For the first part, we use the observation by Doerr, Neumann, Sudholt and Witt [3] on the pheromone values outside the block of leadings ones. If the best-so-far Leading- Ones-value equals k, the pheromone values corresponding to the bits k +2,..., n have never contributed to the LO-value implying that each of these bits is unbiasedly set to 1 with probability exactly 1/2 in the next constructed solution. Note that these pheromone values are not necessarily martingales, however, the probability distribution of such a pheromone value is symmetric with expectation 1/2. Hence we distinguish two states for a bit: if it is right of the leftmost zero in the best-so-far solution, its expected pheromone values on 1- and 0-edges equal 1/2; if it is left of the leftmost zero, the pheromone value of its 1-edges are monotonically increasing in each iteration until the border 1 1/n is reached. We call the first state the random state and the second one the increasing state. While all bits in the block n/4 + 1,..., n/2 are in random state, the probability of obtaining a LeadingOnes-value of at least n/2 is bounded from above by 2 Ω(n). Hence, with probability 1 2 Ω(n), the LeadingOnes-value is in the interval [n/4, n/2] at some point of time. The randomness of the bits after the leftmost zero allows us to apply the standard free-rider arguments by Droste, Jansen and Wegener [7]. Hence, with probability 1 2 Ω(n), at least n/12 improvements of the LeadingOnes-value have occurred when it has entered the interval [n/4 + 1, n/2]. This already completes the first part of the proof if ρ = Ω(1). In the following, we therefore study the case ρ = o(1). Assume for some arbitrarily slowly increasing function α(n) = ω(1) that only

12 n/(α(n) 2 ρ) iterations have happened until the first n/12 improvements are done. Then it must hold (by the pigeon-hole principle) that at most n/α(n) times, the number of iterations between two consecutive improvements is large, which is defined as: not bounded by O(1/(α(n)ρ)). Furthermore, by another pigeon-hole-principle argument, there must be at least n/α(n) independent so-called fast phases defined as follows: each fast phase consists of at least r := α(n)/24 consecutive improvements between which the number of iterations is never large, i. e., each time bounded by O(1/(α(n)ρ)). (For a proof, consider a fixed subdivision of n/12 improvements into blocks of r consecutive improvements and show that at most half of these blocks can contain a large number of iterations between some two consecutive improvements.) In the following, we will show that a fast phase has probability o(1). This contradicts (up to failure probabilities of 2 Ω(n/α(n)) ) the assumption we have at least Ω(n/α(n)) fast phases, hence the number of iterations for the first n/12 improvements cannot be bounded by n/(α(n) 2 ρ). Since α(n) can be made arbitrarily slowly increasing, we obtain the lower bound Ω(n/ρ) on the expected optimization time. Consider the event that a fast phase containing r improvements is sufficient to set at least r bits into the increasing state (the phase is called successful then). In the beginning, all these bits are in random state. Hence, with a failure probability at most 2 Ω(r), less than r/4 pheromone values on the corresponding 1-edges (in the following called success probabilities) are at most 1/2. We assume r/4 bits with this property and estimate the probability of setting all these bits to 1 simultaneously in at least one improving step until the end of the phase (which is necessary for the phase to be successful). The success probability of a bit with initial pheromone value 1/2 is still at most p t := 1 (1 ρ) t /2 if it has been only in increasing state for t steps. The total number of iterations in the phase is O(1/ρ) by the definition of a fast phase. Hence, by the end of the phase, all considered success probabilities are at most 1 (1 ρ) O(1/ρ) /2 = 1 Ω(1). The probability of a single improving step setting the r/4 bits to 1 is therefore at most (1 Ω(1)) r/4 = 2 Ω(r). Adding up over all r improving steps and taking into account the above failure probability, the probability of the phase being successful is at most r2 Ω(r) + 2 Ω(r) = 2 Ω(α(n)) = o(1) as suggested. Having proved that the expected number of steps for n/12 improvements is Ω(n/ρ), we conclude that there is a constant c > 0 such that the time between two improvements is at least c/ρ with probability at least 1/2. Otherwise Chernoff bounds would (up to exponentially small failure probabilities) contradict the bound Ω(n/ρ). We exploit this to show that with high probability, a linear number of bits in random state reaches the lower border 1/n on the success probability during the optimization process. This will prove the second lower bound Ω(n 2 ) on the expected optimization time. Consider a bit in random state and a phase of t (ln n)/ρ iterations. We are interested in the event that the bit is set to zero throughout all improvements of the phase, which implies that the success probability is 1/n until the end of the phase (the phase is finished prematurely if the border 1/n is reached in less than t steps). This event has probability Ω(1) for the following reasons: let p 0 be the initial probability of setting the bit to zero and assume p 0 1/2 (which holds with probability at least 1/2). After t steps of the phase, this probability is at least p t := 1 (1 ρ) t /2 if the bit has only been set to zero in the improvements in the phase. If the time between two improvements (and, therefore, exchanges of the best-so-far solution) is always bounded from below by c/ρ, the considered event has probability at least t=1 p tc/ρ. Using 1 x e x for x R and 1 x e 2x for x 1/2, this term can be bounded from

13 below by ( 1 t=1 ) (1 ρ)tc/ρ 2 ) (1 e ct exp(e ct ) 2 t=1 t=1 ( ) ( = exp e ct e c ) = exp 1 e c = Ω(1). t=1 That the phase is of the desired length only with probability 1/2 is no problem either since we can bound the probability of setting the bit to 0 after a short phase by the probability in the beginning of the phase and argue like in the proof of Theorem 2 in [3]. The event of a short phase corresponds to the occurrence of a free-rider and the number of short phases is geometrically distributed with success probability at most 1/2. Hence, using the same calculations as in the mentioned proof, the probability of the desired event is at least t=1 p tc/ρ 2 p tc/ρ t=1 p tc/ρ 2 = Ω(1) as claimed. Using the observation from the last paragraph, it follows that with probability 1 2 Ω(n), at least Ω(n) bits in random state have reached success probability 1/n by the time the LeadingOnes-value enters the interval [n/4, n/2]. The only problem might be that these success probabilities might increase again. However, in each of the remaining improvements, we distinguish the events whether it is relevant for an improvement to set such a bit to 1 or not. If the bit is not relevant since it is still right of the leftmost zero after the improvement, its success probability does not change with probability 1 1/n. Hence, in O(n) improvements there are in expectation still Ω(n) success probabilities equal to 1/n left. Since, with at least constant probability, Ω(n) improvements are necessary, the lower bound Ω(n 2 ) on the expected optimization time follows. In the proof, we had to carefully look at the random bits and to study the structure of the optimization process. It seems to be even harder to prove a corresponding lower bound for MMAS since accepting equally good solutions implies that more than n exchanges of the best-so-far solution can happen. Also additional ideas are required to transfer the proof of Theorem 6 and to obtain an improved lower bound for MMAS* on OneMax. Nevertheless, our analysis for LeadingOnes is important since it contains preliminary proof techniques for lower bounds on the runtime of MMAS algorithms. Moreover, it rigorously shows that the upper bounds derived by the fitness-level method can be asymptotically tight in special cases and almost tight for general ρ. 4 Plateau Functions The general upper bounds from Theorems 1 and 2 for unimodal functions yield a performance gap of only polynomial size between MMAS* and MMAS. This may give the impression that MMAS* and MMAS behave similarly. However, this only holds for functions with a certain gradient towards better solutions. On plateaus, MMAS* and MMAS can have a totally different behavior. Plateaus are regions in the search space where all search points have the same fitness. Consider a function f : {0, 1} n R and assume that the number of different objective values for that function is D. Then there are at least 2 n /D search points with the same objective vector. Often, the number of different objective values for a given function is polynomially bounded. This implies an exponential number of solutions with

14 the same objective value. In the extreme case, we end up with the function Needle where only one single solution has objective value 1 and the remaining ones get value 0. The function is defined as { 1 if x = xopt, Needle(x) := 0 otherwise, where x OPT is the unique global optimum. Gutjahr and Sebastiani [15] compare MMAS* and (1+1) EA* w.r. t. their runtime behavior. For suitable values of ρ that are exponentially small in n, the MMAS* has expected optimization time O(c n ), c 2 an appropriate constant, and beats the (1+1) EA*. The reason is that MMAS* behaves nearly as random search on the search space while the initial solution of the (1+1) EA* has Hamming distance n to the optimal one with probability 2 n. To obtain from such a solution an optimal one, all n bits have to flip, which has expected waiting time n n, leading in summary to an expected optimization time Ω((n/2) n ). In the following, we show a similar result for MMAS* if ρ decreases only polynomially with the problem dimension n. Theorem 7 Choosing ρ = 1/poly(n), the optimization time of MMAS* on Needle is at least (n/6) n with probability 1 e Ω(n). Proof Let x be the first solution constructed by MMAS* and denote by x OPT the optimal one. As it is chosen uniformly at random from the search space, the expected number of positions where x and x OPT differ is n/2 and there are at least n/3 such positions with probability 1 e Ω(n) using Chernoff bounds [17,18]. At these positions of x the wrong edges are rewarded as long as the optimal solution has not been obtained. This implies that the probability to obtain the optimal solution in the next step is at most 2 n/3. After at most t (ln n)/ρ (see Inequality (1)) iterations, the pheromone values of x have touched their borders provided x OPT has not been obtained. The probability of having obtained x OPT within a phase of t steps is at most t 2 n/3 = e Ω(n). Hence, the probability to produce a solution that touches its pheromone borders and differs from x OPT in at least n/3 positions before producing x OPT is 1 e Ω(n). In this case, the expected number of steps to produce x OPT is (n/3) n and the probability of having reached this goal within (n/6) n steps is at most 2 n. The probability to choose an initial solution x that differs from x OPT by n positions is 2 n, and in this case, after all n bits have reached their corresponding pheromone borders, the probability to create x OPT equals n n. Using the ideas of Theorem 7 the following corollary can be proved which asymptotically matches the lower bound for the (1+1) EA* given in [15]. Corollary 1 Choosing ρ = 1/poly(n), the expected optimization time of MMAS* on Needle is Ω((n/2) n ). It is well known that the (1+1) EA that accepts each new solution has expected optimization time O(2 n ) on Needle (see [8, 29]) even though it samples with high probability in the Hamming neighborhood of the latest solution. On the other hand, MMAS* will have a much larger optimization time unless ρ is superpolynomially small (Theorem 7). In the following, we will show that MMAS is, up to polynomial factors, competitive with the (1+1) EA even for large ρ-values.

15 Theorem 8 Choosing ρ = Ω(1), the expected optimization time of MMAS on Needle is O(poly(n)2 n ). Proof By the symmetry of the construction procedure and uniform initialization, we w. l. o. g. assume that the needle x OPT equals the all-ones string 1 n. As in [29], we study the process on the constant function f(x) = 0. The first hitting times for the needle are the same on Needle and the constant function while the invariant limit distribution for the constant function is easier to study since it is uniform over the search space. The proof idea is to study a kind of mixing time t(n) after which each bit is independently set to 1 with a probability of at least 1/2 1/n regardless of the initial pheromone value on its 1-edge. This implies that the probability of creating the needle is at least (1/2 1/n) n e 3 2 n (for n large enough) in some step after at most t(n) iterations. We successively consider independent phases of length t(n) until the needle is sampled. Estimating by a geometrically distributed waiting time, the expected optimization time is bounded by O(t(n) 2 n ). The theorem follows if we can show that t(n) = poly(n). To bound t(n), we note that the limit distribution of each pheromone value is symmetric with expectation 1/2, hence, each bit is set to 1 with probability 1/2 in the limit. We consider a coupling of two independent copies of the Markov chain for the pheromone values of a bit such that we start the chain from two different states. The task is to bound the coupling time c(n), defined as the first point of time where both chains meet in the same border state 1/n or 1 1/n (taking a supremum over any two different initial states). If we know that E(c(n)) is finite, then Markov s inequality yields that c(n) ne(c(n)) with probability at least 1 1/n, and the coupling lemma [17] implies that the total variation distance to the limit distribution is at most 1/n at time ne(c(n)). Hence, the bit is set to 1 with probability at least 1/2 1/n then and we can use t(n) := ne(c(n)). It remains to prove E(c(n)) = poly(n) for the considered coupling. Since ρ = Ω(1), the pheromone value of a bit reaches one of its borders in t = O(log n) (Inequality (1)) iterations with probability at least t t=0 (1 (1 1/n)(1 ρ)t ) = Ω(1/n) (using the same estimations for pheromone values as in [15]). Hence, with probability 1/2, one chain reaches a border in O(n log n) iterations. A pheromone value that is at a border does not change within O(n log n) iterations with probability at least (1 1/n) O(n log n) = 1/poly(n). In this phase, the pheromone value of the other chain reaches this border with probability Ω(1). Repeating independent phases, we have bounded E(c(n)) as required. The function Needle requires an exponential optimization time for each algorithm that has been considered. Often plateaus are much smaller, and randomized search heuristics have a good chance to leave them within a polynomial number of steps. Gutjahr and Sebastiani [15] consider the function NH-OneMax that consists of the Needle-function on k = log n bits and the function OneMax on n k bits, which can only be optimized if the needle has been found on the needle part. The function is defined as ( k ) n NH-OneMax(x) = x i x i. i=1 i=k+1 Taking into account the logarithmic size of the Needle-function of NH-OneMax, MMAS* with polylogarithmically small ρ cannot optimize the needle part within an

16 expected polynomial number of steps. The proof ideas are similar to the ones used in the proof of Theorem 7. After initialization the expected Hamming distance of the needle part to the needle is (log n)/2, and it is at least (log n)/3 with probability 1 o(1). As ρ = 1/polylog(n) holds, the pheromone borders of such a solution are touched before the needle has been found. After this situation has been reached, the expected number of steps to find the needle is Ω(n (log n)/3 ). Using the prescribed ideas the following theorem can be shown. Theorem 9 Choosing ρ = 1/polylog(n), the expected optimization time of MMAS* on NH-OneMax is 2 Ω(log2 n). As the needle part only consists of log n bits, MMAS with ρ = Ω(1) can find the needle after an expected number of O(poly(n)2 log n ) = O(poly(n)) steps (Theorem 8). After this goal has been achieved, the unimodal function OneMax has to be optimized. Together with our investigations for unimodal functions carried out in the previous section (in particular the upper bound from Theorem 2), the following result can be proved. Theorem 10 Choosing ρ = Ω(1), the expected optimization time of MMAS on NH- OneMax is polynomial. 5 Experiments We supplement our theoretical investigations from Section 4 by experiments. The time bounds presented in the previous sections are asymptotic, hence they do not reveal whether certain effects predicted for large n also occur for small problem dimensions. Moreover, experimental data can reveal details that are not captured by our theorems. 5.1 The Performance of MMAS on Needle First, we take a closer look at the MMAS on Needle for several values of ρ. In case ρ is large, i. e. ρ = Ω(1), Theorem 8 predicts an average runtime of poly(n)2 n. This is due to the fact that the process is rapidly mixing. If ρ is extremely large such that MMAS equals the (1+1) EA, the best-so-far solution performs a random walk through the search space and the expected time until the needle is found is about e/(e 1) 2 n 1.58 2 n due to Garnier, Kallel, and Schoenauer [9]. On the other hand, if ρ is extremely small, MMAS degenerates to almost random search, as exploited in the comparison of MMAS* and (1+1) EA* by Gutjahr and Sebastiani [15]. The expected time to find the needle is then close to 2 n. However, we are faced with the open question how MMAS behaves with intermediate values for ρ. We consider Needle for n {8, 16} and exponentially decreasing values for ρ, ρ = 2 1, 2 2, 2 3,... and measure the average runtime of MMAS during 100 independent runs for each setting. The results are shown in Figure 6. Consider the plot for n = 16. First, observe that the average runtime slightly decreases when decreasing ρ from 2 1 to 2 8. A possible explanation is that the (1+1) EA tends to resample the current solution, which happens with probability approximately 1/e. Hence, the expected waiting time until a new solution is sampled is about 1/(1 1/e) = e/(e 1), which explains the bound of about e/(e 1) 2 n by

17 1000 MMAS 2 n e/(e-1)*2 n generations 100 1e+06 2-19 2-17 2-15 2-13 2-11 2-9 2-7 2-5 2-3 2-1 ρ MMAS 2 n e/(e-1)*2 n generations 100000 10000 2-29 2-27 2-25 2-23 2-21 2-19 2-17 2-15 ρ 2-13 2-11 2-9 2-7 2-5 2-3 2-1 Fig. 6 The average runtime of MMAS on Needle with n = 8 (top) and n = 16 (bottom) and exponentially decreasing values of ρ. The data is averaged over 100 runs for each setting and drawn on logarithmic scales. Garnier, Kallel, and Schoenauer [9]. This bound also holds for MMAS with very large ρ. When decreasing ρ, the probability of resampling the current solution decreases and so does the average runtime. When further decreasing ρ, the average runtime increases very fast. We believe that then the mixing time (cf. proof of Theorem 8) increases with some function of 1/ρ. Due to the slow mixing, MMAS tends to spend a lot of time sampling the same region of the search space over and over again. If we are unlucky, MMAS focuses on regions that are far away from the needle where many (say Ω(n)) pheromone values hit the wrong bounds. In this case the probability of generating the needle is far below 2 n, yielding an overly large average optimization time. Lastly, at some point on the ρ-scale, MMAS suddenly turns into almost random search. Here, ρ is so small that typically the needle is found by random search before the pheromone values get the chance to move towards a pheromone bound. A comparison with the lower horizontal line indicates that the average runtime is highly concentrated around 2 n. For n = 8, MMAS shows a similar behavior, although the effects are less pronounced. We conclude that, surprisingly, MMAS shows a phase transition behavior.

18 Although MMAS is competetive with the (1+1) EA for very large and very small values of ρ, it is much more inefficient for intermediate values of ρ. 5.2 Comparing MMAS and MMAS* on Needle Now we compare the performance of MMAS* and MMAS on Needle. The results from Section 4 predict that MMAS clearly outperforms MMAS*. We keep the experimental setting described in the previous subsection and measure the average runtime for MMAS and MMAS*. After the random freezing time t, all pheromones in MMAS* are frozen until the end of the run. If k bits are frozen opposite to the needle bits, the probability to find the needle with the next constructed solution is exactly (1/n) k (1 1/n) n k. This probability is extremely small. For, say, n = 16 and k = 12, it is less than (1/16) 12 = 2 64 and the expected time needed to find the needle is larger than 2 64. As these values are far too large to allow a complete simulation, the optimization times of MMAS* with n = 16 are estimated by adding the expected time to find the needle to the observed freezing time. (In rare cases where the needle was found before all pheromones were frozen, the real optimization time was recorded.) The resulting values still contain randomness as k is a random variable and the freezing time is random. On the other hand the estimations yield a reduction of variance, which gives us an even better picture of the average performance than a complete simulation. The optimization times for n = 8 were obtained by real simulations, though. Figure 7 shows the resulting average optimization times. The data for MMAS is taken over from Figure 6, but the large peaks from Figure 6 now clearly pale in comparison with the average runtime of MMAS*. We see that MMAS and MMAS* behave almost identically for very small values of ρ, since then both algorithms degenerate to pure random search. Focussing on the remaining ρ-values, there is no doubt that MMAS* clearly outperforms MMAS for a broad range of ρ-values. For n = 8, MMAS* is by a factor of more than 100 slower than MMAS. For n = 16, the differences are even more drastic: the average optimization time of MMAS* is often larger than 10 13 while MMAS never exceeds 10 6. 5.3 Comparing MMAS and MMAS* on NH-OneMax We have seen that for Needle the best strategy is to choose ρ so small that MMAS and MMAS* degenerate to random search. This motivates us to also consider the function NH-OneMax. Due to the underlying OneMax-problem, we can expect MMAS and MMAS* to outperform pure random search with a sufficiently large value of ρ. We investigate NH-OneMax with n = 16, i. e., a needle consisting of 4 bits. The size of the needle may look small, but finding needles is more challenging than for Needle. Since the problem dimension is much larger compared to the size of the needle, the pheromone bounds are more extreme compared to Needle. As a consequence, it is harder to find the needle in case some bits have pheromones at bounds opposite to the needle. Figure 8 shows the average optimization times for MMAS and MMAS*. The results show that it is necessary to increase the value of ρ to deal with the OneMax-part of the function until ρ 2 13. Larger values of ρ increase the optimization time of MMAS* as this algorithm is not able to perform a random walk on the Needle-part of the

19 1e+06 MMAS * MMAS 100000 generations 10000 1000 100 1e+16 1e+14 2-19 2-17 2-15 2-13 2-11 2-9 2-7 2-5 2-3 2-1 ρ MMAS * (estimated) MMAS 1e+12 generations 1e+10 1e+08 1e+06 10000 2-29 2-27 2-25 2-23 2-21 2-19 2-17 2-15 2-13 2-11 2-9 2-7 2-5 2-3 2-1 ρ Fig. 7 The average runtimes of MMAS and MMAS* on Needle with n = 8 (top) and n = 16 (bottom) and exponentially decreasing values of ρ. The data is averaged over 100 runs for each setting and drawn on logarithmic scales. The runtime of MMAS* in case n = 16 is estimated by the (conditional) expected remaining optimization time once all pheromones are frozen. 100000 MMAS * MMAS 10000 generations 1000 100 2-33 2-31 2-29 2-27 2-25 2-23 2-21 2-19 2-17 2-15 2-13 2-11 2-9 2-7 2-5 2-3 2-1 ρ Fig. 8 The average runtimes of MMAS and MMAS* on NH-OneMax with n = 16 bits, containing 4 needle bits, and exponentially decreasing values of ρ. The data is averaged over 100 runs for each setting and drawn on logarithmic scales.

20 function. In contrast to this, MMAS also performs well for large values of ρ as the search for the needle is achieved by replacing equally good solutions. 6 Conclusions The rigorous runtime analysis of ACO algorithms is a challenging task where the first results have been obtained only recently. We have compared previous results for an MMAS variant called 1-ANT with known results for a similar MMAS algorithm called MMAS*. These two algorithms behave extremely different on simple functions in case the evaporation factor is small. Gutjahr [14] conjectured that this difference is due to the different selection mechanisms, i. e., the question whether to accept search point of equal fitness. Therefore, we investigated a variant of MMAS*, called MMAS, that accepts equally fit solutions, so MMAS and the 1-ANT only differ in their update mechanisms. A runtime analysis on unimodal functions revealed that MMAS behaves similar to MMAS*. Hence, we could attribute the different behavior of the 1-ANT and MMAS* to the update mechanism. Thereby, we improved and extended previous results by Gutjahr and Sebastiani [15]. We have shown how fitness-level arguments, a powerful and well-known tool from the analysis of evolutionary algorithms, can be transferred to the analysis of ACO algorithms. This shows that these research areas are closely linked and are able to crossfertilize. Moreover, we have obtained runtime analyses for the broad and important class of all unimodal functions. Another goal of this paper was to elaborate on the differences between MMAS* and MMAS. While these algorithms behave similarly on unimodal functions, they have different strategies to cope with plateaus of equal fitness. MMAS accepts search points of equal fitness, thus it is able to explore plateaus by a random walk of the best-so-far solution. For well-known plateau functions such as Needle, replacing equally fit solutions is essential. Both our asymptotical theoretical results and additional experiments clearly show that the performance gap between MMAS* and MMAS is huge. 7 Acknowledgements Dirk Sudholt and Carsten Witt were supported by the Deutsche Forschungsgemeinschaft (SFB) as a part of the Collaborative Research Center Computational Intelligence (SFB 531). References 1. Doerr, B., Hebbinghaus, N., Neumann, F.: Speeding up evolutionary algorithms through restricted mutation operators. In: Proc. of PPSN 06, LNCS, vol. 4193, pp. 978 987 (2006) 2. Doerr, B., Johannnsen, D.: Refined runtime analysis of a basic ant colony optimization algorithm. In: Proc. of CEC 07, pp. 501 507. IEEE Press (2007) 3. Doerr, B., Neumann, F., Sudholt, D., Witt, C.: On the runtime analysis of the 1-ANT ACO algorithm. In: Proc. of GECCO 07, pp. 33 40. ACM (2007). Best paper award. 4. Dorigo, M., Blum, C.: Ant colony optimization theory: A survey. Theor. Comput. Sci. 344, 243 278 (2005) 5. Dorigo, M., Maniezzo, V., Colorni, A.: The ant system: An autocatalytic optimizing process. Tech. Rep. 91-016 Revised, Politecnico di Milano (1991)

6. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press (2004) 7. Droste, S., Jansen, T., Wegener, I.: On the analysis of the (1+1) evolutionary algorithm. Theor. Comput. Sci. 276, 51 81 (2002) 8. Garnier, J., Kallel, L., Schoenauer, M.: Rigorous hitting times for binary mutations. Evolut. Comput. 7(2), 173 203 (1999) 9. Garnier, J., Kallel, L., Schoenauer, M.: Rigorous hitting times for binary mutations. Evolutionary Computation 7(2), 173 203 (1999) 10. Giel, O., Wegener, I.: Evolutionary algorithms and the maximum matching problem. In: Proc. of STACS 03, LNCS, vol. 2607, pp. 415 426 (2003) 11. Gutjahr, W.J.: ACO algorithms with guaranteed convergence to the optimal solution. Inform. Process. Lett. (82), 145 153 (2002) 12. Gutjahr, W.J.: On the finite-time dynamics of ant colony optimization. Methodol. Comput. Appli. Probab. (8), 105 133 (2006) 13. Gutjahr, W.J.: First steps to the runtime complexity analysis of Ant Colony Optimization. Comput. Oper. Res. (2007). (to appear) 14. Gutjahr, W.J.: Mathematical runtime analysis of ACO algorithms: Survey on an emerging issue. Swarm Intelligence (2007). (to appear) 15. Gutjahr, W.J., Sebastiani, G.: Runtime analysis of ant colony optimization with best-sofar reinforcement. Methodology and Computing in Applied Probability (2007). (to appear. Preprint available from http://homepage.univie.ac.at/walter.gutjahr/) 16. Jansen, T., Wegener, I.: Evolutionary algorithms - how to cope with plateaus of constant fitness and when to reject strings of the same fitness. IEEE Trans. Evolut. Comput. 5(6), 589 599 (2001) 17. Mitzenmacher, M., Upfal, E.: Probability and Computing Randomized Algorithms and Probabilistic Analysis. Cambr. Univ. Press (2005) 18. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambr. Univ. Press (1995) 19. Neumann, F.: Expected runtimes of evolutionary algorithms for the eulerian cycle problem. In: Proc. of CEC 04, vol. 1, pp. 904 910. IEEE Press (2004) 20. Neumann, F.: Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem. European Journal of Operational Research 181 (3), 1620 1629 (2007) 21. Neumann, F., Sudholt, D., Witt, C.: Comparing variants of MMAS ACO algorithms on pseudo-boolean functions. In: Proc. of SLS 2007, LNCS, vol. 4638, pp. 61 75 (2007) 22. Neumann, F., Wegener, I.: Minimum spanning trees made easier via multi-objective optimization. Natural Computing 5(3), 305 319 (2006) 23. Neumann, F., Wegener, I.: Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theor. Comput. Sci. 378(1), 32 40 (2007) 24. Neumann, F., Witt, C.: Runtime analysis of a simple ant colony optimization algorithm. In: Proc. ISAAC 06, LNCS, vol. 4288, pp. 618 627. Springer (2006) 25. Rudolph, G.: Convergence Properties of Evolutionary Algorithms. Kovač (1997) 26. Scharnow, J., Tinnefeld, K., Wegener, I.: The analysis of evolutionary algorithms on sorting and shortest paths problems. Journal of Mathematical Modelling and Algorithms pp. 349 366 (2004) 27. Stützle, T., Hoos, H.H.: MAX-MIN ant system. J. Future Gener. Comput. Syst. 16, 889 914 (2000) 28. Wegener, I.: Methods for the analysis of evolutionary algorithms on pseudo-boolean functions. In: R. Sarker, X. Yao, M. Mohammadian (eds.) Evolutionary Optimization, pp. 349 369. Kluwer (2002) 29. Wegener, I., Witt, C.: On the optimization of monotone polynomials by simple randomized search heuristics. Combin. Probab. Comput. 14, 225 247 (2005) 30. Witt, C.: Worst-case and average-case approximations by simple randomized search heuristics. In: Proc. of STACS 05, LNCS, vol. 3404, pp. 44 56 (2005) 21