Local Search and Optimization, Chapter 4. Mausam (based on slides of Padhraic Smyth, Stuart Russell, Rao Kambhampati, Raj Rao, Dan Weld)



Outline: local search techniques and optimization; hill-climbing; gradient methods; simulated annealing; genetic algorithms; issues with local search.

Local search and optimization. Previous lecture: a path to the goal is the solution to the problem, found by systematic exploration of the search space. This lecture: a state is the solution to the problem; for some problems the path is irrelevant (e.g., 8-queens). Different algorithms can be used: local search.

Goal satisfaction vs. optimization. Goal satisfaction: reach the goal node (constraint satisfaction). Optimization: optimize an objective function (constraint optimization). You can go back and forth between the two problems; they are typically in the same complexity class.

Local search and optimization. Local search: keep track of a single current state, move only to neighboring states, and ignore paths. Advantages: uses very little memory; can often find reasonable solutions in large or infinite (continuous) state spaces. Pure optimization problems: all states have an objective function; the goal is to find the state with the max (or min) objective value; this does not quite fit into the path-cost/goal-state formulation. Local search can do quite well on these problems.

Trivial algorithms. Random sampling: generate a state randomly. Random walk: randomly pick a neighbor of the current state. Both algorithms are asymptotically complete.

Hill-climbing (greedy local search), max version.
function HILL-CLIMBING(problem) returns a state that is a local maximum
    input: problem, a problem
    local variables: current, a node; neighbor, a node
    current ← MAKE-NODE(INITIAL-STATE[problem])
    loop do
        neighbor ← a highest-valued successor of current
        if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
        current ← neighbor
The min version reverses the inequalities and looks for the lowest-valued successor.

Hill-climbing search: a loop that continuously moves towards increasing value and terminates when a peak is reached. Also known as greedy local search. The value can be either an objective function value or a heuristic function value (minimized). Hill climbing does not look ahead beyond the immediate neighbors. It can choose randomly among the set of best successors if several have the best value. It is like climbing Mount Everest in a thick fog with amnesia.
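The loop described above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the function signature, the toy objective, and the names toy_value and toy_successors are assumptions introduced only for the example.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def hill_climbing(start: S,
                  successors: Callable[[S], Iterable[S]],
                  value: Callable[[S], float]) -> S:
    """Steepest-ascent hill climbing (max version)."""
    current = start
    while True:
        # Pick a highest-valued successor of the current state.
        best = max(successors(current), key=value, default=None)
        # Stop when no neighbor is strictly better (a peak or a plateau).
        if best is None or value(best) <= value(current):
            return current
        current = best


# Hypothetical toy problem: maximize -(x - 3)^2 over the integers,
# where the neighbors of x are x - 1 and x + 1.
def toy_value(x: int) -> float:
    return -(x - 3) ** 2


def toy_successors(x: int) -> list:
    return [x - 1, x + 1]


print(hill_climbing(-10, toy_successors, toy_value))  # 3
```

The toy landscape has a single peak, so the greedy loop reaches it from any start; on a landscape with several peaks the same code returns whichever local maximum it climbs first.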

Landscape of search. Hill climbing gets stuck in local optima, depending on what?

Example: n-queens. Put n queens on an n × n board with no two queens on the same row, column, or diagonal. Is it a satisfaction problem or an optimization problem?

Hill-climbing search: 8-queens problem. We need to convert it to an optimization problem: h = number of pairs of queens that are attacking each other. h = 17 for the state shown on the slide.

Search space. State: all 8 queens on the board in some configuration. Successor function: move a single queen to another square in the same column. Example of a heuristic function h(n): the number of pairs of queens that are attacking each other (so we want to minimize this).
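The state, successor function, and heuristic above might be encoded as follows. This is a sketch under one assumed representation: a state is a tuple in which state[c] is the 0-based row of the queen in column c. The function names are illustrative and not from the slides.

```python
from itertools import combinations


def attacking_pairs(state):
    """h: number of pairs of queens that attack each other.

    state[c] is the row of the queen in column c, so two queens never
    share a column; only rows and diagonals need checking.
    """
    return sum(
        1
        for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
        if r1 == r2 or abs(r1 - r2) == abs(c1 - c2)
    )


def successors(state):
    """Yield every state reached by moving one queen within its column."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]


all_in_one_row = (0,) * 8
solution = (0, 4, 7, 5, 2, 6, 1, 3)

print(attacking_pairs(all_in_one_row))         # 28
print(attacking_pairs(solution))               # 0
print(len(list(successors(all_in_one_row))))   # 56
```

Each state has 8 × 7 = 56 successors, and h ranges from 0 (a solution) to 28 (every pair attacking).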

Hill-climbing search: 8-queens problem. Is this a solution? What is h?

Hill-climbing on 8-queens. From randomly generated 8-queens starting states, it solves the problem 14% of the time and gets stuck at a local minimum 86% of the time. However, it takes only 4 steps on average when it succeeds and 3 on average when it gets stuck (for a state space with 8^8 ≈ 17 million states).

Hill climbing drawbacks: local maxima, plateaus, diagonal ridges.

Escaping shoulders: sideways moves. If there are no downhill (uphill) moves, allow sideways moves in the hope that the algorithm can escape. A limit must be placed on the number of sideways moves to avoid infinite loops. For 8-queens, allowing sideways moves with a limit of 100 raises the percentage of problem instances solved from 14% to 94%. However, it then takes 21 steps on average for every successful solution and 64 for each failure.
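A minimal sketch of hill climbing (min version) with a sideways-move limit follows. Two details are assumptions, since the slide does not fix them: the limit counts consecutive sideways moves, and ties among the best neighbors are broken at random. The toy landscape and its helper names are hypothetical.

```python
import random


def hill_climb_sideways(start, successors, h, max_sideways=100, rng=None):
    """Minimize h, allowing up to max_sideways consecutive equal-valued moves."""
    rng = rng or random.Random()
    current, sideways = start, 0
    while True:
        neighbors = list(successors(current))
        if not neighbors:
            return current
        best_h = min(h(s) for s in neighbors)
        if best_h > h(current):
            return current                # strict local minimum
        if best_h == h(current):
            if sideways >= max_sideways:
                return current            # sideways budget exhausted
            sideways += 1
        else:
            sideways = 0                  # real progress resets the counter
        # Break ties at random so the search can drift across a plateau.
        current = rng.choice([s for s in neighbors if h(s) == best_h])


# Hypothetical 1-D landscape with a shoulder at indices 1..3.
LANDSCAPE = [5, 4, 4, 4, 3, 2, 1, 0]


def toy_h(i):
    return LANDSCAPE[i]


def toy_successors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


stuck = hill_climb_sideways(0, toy_successors, toy_h, max_sideways=0)
solved = hill_climb_sideways(0, toy_successors, toy_h,
                             max_sideways=100, rng=random.Random(0))
print(stuck, toy_h(stuck))
print(solved, toy_h(solved))
```

With max_sideways=0 the search stops at the edge of the shoulder; with a budget of 100 it wanders across the flat region and continues downhill.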

Tabu search: prevent returning quickly to the same state. Keep a fixed-length queue (the tabu list): add the most recent state to the queue and drop the oldest. Never make a step that is currently tabu. Properties: as the size of the tabu list grows, hill-climbing asymptotically becomes non-redundant (it won't look at the same state twice). In practice, a reasonably sized tabu list (say 100 or so) improves the performance of hill climbing on many problems.
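The tabu list maps naturally onto a bounded queue. The sketch below is one possible reading, not the slides' definition: it assumes the search always moves to the best non-tabu neighbor, even if that neighbor is worse, and returns the best state seen. The landscape and helper names are hypothetical.

```python
from collections import deque


def tabu_search(start, successors, h, tabu_size=100, max_steps=1000):
    """Minimize h while refusing to revisit recently seen states."""
    current = best = start
    # deque(maxlen=...) drops the oldest entry automatically when full.
    tabu = deque([start], maxlen=tabu_size)
    for _ in range(max_steps):
        candidates = [s for s in successors(current) if s not in tabu]
        if not candidates:
            break                          # every neighbor is tabu
        current = min(candidates, key=h)   # best non-tabu neighbor
        tabu.append(current)
        if h(current) < h(best):
            best = current
    return best


# Hypothetical 1-D landscape: local minimum at index 2, global at index 7.
LANDSCAPE = [3, 2, 1, 2, 3, 2, 1, 0]


def toy_h(i):
    return LANDSCAPE[i]


def toy_successors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


print(tabu_search(0, toy_successors, toy_h, tabu_size=100))              # 7
print(tabu_search(0, toy_successors, toy_h, tabu_size=1, max_steps=50))  # 2
```

With a long tabu list the search is pushed over the hill at index 4; with a list of length 1 it oscillates around the local minimum and never escapes.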

Escaping shoulders/local optima: enforced hill climbing. Perform breadth-first search from a local optimum to find the next state with a better h value. Typically, prolonged periods of exhaustive search are bridged by relatively quick periods of hill-climbing. It is a middle ground between local and systematic search.
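A short sketch of enforced hill climbing is given below. It assumes that h is a heuristic to be minimized with h = 0 at a goal, and that states are hashable; the toy landscape and helper names are hypothetical.

```python
from collections import deque


def enforced_hill_climbing(start, successors, h):
    """Repeatedly breadth-first search for the nearest strictly better state."""
    current = start
    while h(current) > 0:
        frontier = deque([current])
        seen = {current}
        found = None
        while frontier and found is None:
            state = frontier.popleft()
            for s in successors(state):
                if s in seen:
                    continue
                if h(s) < h(current):
                    found = s              # first strictly better state
                    break
                seen.add(s)
                frontier.append(s)
        if found is None:
            return current                 # nothing better is reachable
        current = found
    return current


# Hypothetical 1-D landscape: local minimum at index 2, goal at index 7.
LANDSCAPE = [3, 2, 1, 2, 3, 2, 1, 0]


def toy_h(i):
    return LANDSCAPE[i]


def toy_successors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


print(enforced_hill_climbing(0, toy_successors, toy_h))  # 7
```

On this landscape the first two moves are ordinary greedy steps; at index 2 the breadth-first search has to explore five levels deep before it finds a state with lower h.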

Hill-climbing: stochastic variations. Stochastic hill-climbing: random selection among the uphill moves; the selection probability can vary with the steepness of the uphill move. To avoid getting stuck in local minima: random-walk hill-climbing, random-restart hill-climbing, or hill-climbing with both.

Hill climbing: stochastic variations. When the state-space landscape has local minima, any search that moves only in the greedy direction cannot be complete. Random walk, on the other hand, is asymptotically complete. Idea: put random walk into greedy hill-climbing.

Hill-climbing with random restarts. If at first you don't succeed, try, try again! Different variations: for each restart, run until termination vs. run for a fixed time; run a fixed number of restarts or run indefinitely. Analysis: say each search has probability p of success (e.g., for 8-queens, p = 0.14 with no sideways moves). Expected number of restarts? Expected number of steps taken? If you want to pick one local search algorithm, learn this one!
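The two questions in the analysis can be answered with a short calculation, assuming that runs are independent and each succeeds with probability p. The number of runs is then geometrically distributed with mean 1/p, of which 1/p − 1 are failures on average. The function name and argument names below are illustrative; the step counts are the 8-queens averages quoted on the earlier slides.

```python
def expected_restart_cost(p, steps_success, steps_failure):
    """Return (expected number of runs, expected total steps).

    Assumes independent runs that each succeed with probability p.
    """
    runs = 1 / p                   # mean of a geometric distribution
    failures = runs - 1            # every run but the last one fails
    steps = failures * steps_failure + steps_success
    return runs, steps


# No sideways moves: p = 0.14, 4 steps per success, 3 per failure.
runs, steps = expected_restart_cost(0.14, steps_success=4, steps_failure=3)
print(round(runs, 1), round(steps, 1))   # 7.1 22.4

# Sideways moves allowed: p = 0.94, 21 steps per success, 64 per failure.
runs, steps = expected_restart_cost(0.94, steps_success=21, steps_failure=64)
print(round(runs, 2), round(steps, 1))   # 1.06 25.1
```

So about 7 runs and 22 steps are expected without sideways moves, and about 1.06 runs and 25 steps with them.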

Hill-climbing with random walk. At each step do one of two things. Greedy: with probability p, move to the neighbor with the largest value. Random: with probability 1 − p, move to a random neighbor. Hill-climbing with both: at each step do one of three things. Greedy: move to the neighbor with the largest value. Random walk: move to a random neighbor. Random restart: resample a new current state.
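The first variant (greedy with probability p, random otherwise) could look like the sketch below. The fixed step budget and the choice to return the best state seen are assumptions, since the slide does not specify a stopping rule; the toy landscape is hypothetical.

```python
import random


def hill_climb_random_walk(start, successors, value, p=0.5, steps=2000, rng=None):
    """Maximize value, mixing greedy moves with random-walk moves."""
    rng = rng or random.Random()
    current = best = start
    for _ in range(steps):
        neighbors = list(successors(current))
        if not neighbors:
            break
        if rng.random() < p:
            current = max(neighbors, key=value)   # greedy move
        else:
            current = rng.choice(neighbors)       # random-walk move
        if value(current) > value(best):
            best = current                        # remember the best state seen
    return best


# Hypothetical 1-D landscape: local maximum at index 1, global at index 4.
HEIGHTS = [-2, -1, -2, -1, 0]


def toy_value(i):
    return HEIGHTS[i]


def toy_successors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(HEIGHTS)]


greedy_only = hill_climb_random_walk(0, toy_successors, toy_value, p=1.0)
mixed = hill_climb_random_walk(0, toy_successors, toy_value, p=0.5,
                               rng=random.Random(0))
print(greedy_only, mixed)
```

With p = 1 the search is purely greedy and never leaves the neighborhood of the local maximum; with p = 0.5 the random moves eventually carry it across the dip to the global maximum.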

Simulated annealing = a physics-inspired twist on random walk. Basic ideas: like hill-climbing, identify the quality of the local improvements; instead of picking the best move, pick one randomly; say the change in the objective function is d; if d is positive, then move to that state; otherwise, move to that state with a probability that shrinks as d becomes more negative; thus worse moves (very large negative d) are executed less often; however, there is always a chance of escaping from local maxima; over time, make it less likely to accept locally bad moves. (Can also make the size of the move random as well, i.e., allow large steps in state space.)

Physical interpretation of simulated annealing. A physical analogy: imagine letting a ball roll downhill on the function surface; this is like hill-climbing (for minimization). Now imagine shaking the surface while the ball rolls, gradually reducing the amount of shaking; this is like simulated annealing. Annealing = the physical process of cooling a liquid or metal until the particles achieve a certain frozen crystal state. In simulated annealing, the free variables are like particles that seek a low-energy (high-quality) configuration, with the temperature T slowly reduced while the particles move around randomly.

Simulated annealing.
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
    input: problem, a problem; schedule, a mapping from time to temperature
    local variables: current, a node; next, a node; T, a temperature controlling the probability of downward steps
    current ← MAKE-NODE(INITIAL-STATE[problem])
    for t ← 1 to ∞ do
        T ← schedule[t]
        if T = 0 then return current
        next ← a randomly selected successor of current
        ΔE ← VALUE[next] − VALUE[current]
        if ΔE > 0 then current ← next
        else current ← next only with probability e^(ΔE/T)

Temperature T. High T: the probability of a locally bad move is higher. Low T: the probability of a locally bad move is lower. Typically, T is decreased as the algorithm runs longer, i.e., there is a temperature schedule.
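The SIMULATED-ANNEALING pseudocode and the role of the temperature can be sketched in Python as below. The geometric cooling schedule, its parameters (t0, alpha, t_min), and the toy objective are assumptions chosen for illustration; the slides do not prescribe a particular schedule.

```python
import math
import random


def simulated_annealing(start, random_successor, value, schedule, rng=None):
    """Maximize value; accept a worse move with probability exp(delta_e / T)."""
    rng = rng or random.Random()
    current = start
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random_successor(current, rng)
        delta_e = value(nxt) - value(current)
        # Better moves are always taken; worse ones only occasionally.
        if delta_e > 0 or rng.random() < math.exp(delta_e / T):
            current = nxt
        t += 1


def geometric_schedule(t, t0=5.0, alpha=0.99, t_min=1e-3):
    """Hypothetical schedule: T = t0 * alpha^t, cut to 0 once below t_min."""
    T = t0 * alpha ** t
    return T if T > t_min else 0.0


# Hypothetical toy problem: maximize -(x - 3)^2 over the integers.
def toy_value(x):
    return -(x - 3) ** 2


def toy_random_successor(x, rng):
    return x + rng.choice((-1, 1))


result = simulated_annealing(20, toy_random_successor, toy_value,
                             geometric_schedule, rng=random.Random(1))
print(result)
```

Early on (T near 5) a move that loses one unit of value is accepted with probability about e^(−1/5) ≈ 0.82; near the end (T near 0.001) the same move is essentially never accepted, so the search behaves like plain hill climbing.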

Simulated annealing in practice. The method was proposed in 1983 by IBM researchers for solving VLSI layout problems (Kirkpatrick et al., Science, 220:671-680, 1983). Theoretically, it finds the global optimum with probability approaching 1, provided T is decreased slowly enough. Other applications: traveling salesman, graph partitioning, graph coloring, scheduling, facility layout, image processing. It is useful for some problems, but can be very slow; the slowness comes about because T must be decreased very gradually to retain optimality.

Local beam search. Idea: keeping only one node in memory is an extreme reaction to memory problems. Keep track of k states instead of one. Initially: k randomly selected states. Next: determine all successors of the k states. If any of the successors is a goal, finished. Else select the k best from the successors and repeat.

Local beam search (contd). Not the same as k random-start searches run in parallel! Searches that find good states recruit other searches to join them. Problem: quite often, all k states end up on the same local hill. Idea: stochastic beam search, which chooses k successors randomly, biased towards good ones. Observe the close analogy to natural selection!
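A compact sketch of (deterministic) local beam search follows. The iteration cap, the use of a set to pool successors (which assumes hashable states), and the toy problem are assumptions made for the example.

```python
import heapq


def local_beam_search(initial_states, successors, value, is_goal, max_iters=1000):
    """Keep the k best states among all successors of the current k states."""
    beam = list(initial_states)
    k = len(beam)
    for state in beam:
        if is_goal(state):
            return state
    for _ in range(max_iters):
        # Pool the successors of all k states; this pooling is what makes
        # beam search differ from k independent searches.
        pool = {s for state in beam for s in successors(state)}
        if not pool:
            break
        for s in pool:
            if is_goal(s):
                return s
        beam = heapq.nlargest(k, pool, key=value)
    return max(beam, key=value)


# Hypothetical toy problem: reach x = 3, with value -(x - 3)^2.
def toy_value(x):
    return -(x - 3) ** 2


def toy_successors(x):
    return [x - 1, x + 1]


def toy_is_goal(x):
    return x == 3


print(local_beam_search([-5, 10, 20], toy_successors, toy_value, toy_is_goal))  # 3
```

After the first iteration the successors of the start state 20 are already dropped from the beam, because the successors of the other two starts score better: the good searches have "recruited" its slot.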

Hey! Perhaps sex can improve search?

Sure! Check out ye book.

Genetic algorithms. A twist on local search: a successor is generated by combining two parent states. A state is represented as a string over a finite alphabet (e.g., binary); for 8-queens, state = the positions of the 8 queens, one per column. Start with k randomly generated states (the population). Evaluation function (fitness function): higher values for better states, the opposite of a heuristic function (e.g., number of non-attacking pairs in 8-queens). Produce the next generation of states by simulated evolution: random selection, crossover, random mutation.

String representation: 16257483 (board figure with rows numbered 1 to 8). Can we evolve 8-queens through genetic algorithms?

Evolving 8-queens? Sorry! Wrong queens

Genetic algorithms: 4 states for the 8-queens problem. 2 pairs of 2 states are randomly selected based on fitness; random crossover points are selected; new states are produced by crossover; random mutation is applied. Fitness function: number of non-attacking pairs of queens (min = 0, max = 8 × 7/2 = 28). Selection probabilities: 24/(24+23+20+11) = 31%, 23/(24+23+20+11) = 29%, etc.
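The selection, crossover, and mutation steps above might be written as follows for 8-queens. This is a sketch under stated assumptions: states are tuples of rows 1..8 (one entry per column), selection is fitness-proportional, crossover is single-point, and the mutation rate is a free parameter. The two parent strings in the usage lines are arbitrary illustrative values.

```python
import random
from itertools import combinations


def fitness(state):
    """Number of non-attacking pairs; state[c] is the row of the queen in column c."""
    n = len(state)
    attacking = sum(
        1
        for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
        if r1 == r2 or abs(r1 - r2) == abs(c1 - c2)
    )
    return n * (n - 1) // 2 - attacking


def crossover(x, y, point):
    """Single-point crossover: prefix of x joined to suffix of y."""
    return x[:point] + y[point:]


def mutate(state, rate, rng):
    """Replace each position by a random row with probability rate."""
    n = len(state)
    return tuple(rng.randint(1, n) if rng.random() < rate else r for r in state)


def next_generation(population, rate, rng):
    """One generation; assumes at least one state has non-zero fitness."""
    weights = [fitness(s) for s in population]
    children = []
    for _ in range(len(population)):
        x, y = rng.choices(population, weights=weights, k=2)
        point = rng.randint(1, len(x) - 1)
        children.append(mutate(crossover(x, y, point), rate, rng))
    return children


# Selection probabilities for the fitness values quoted on the slide.
scores = [24, 23, 20, 11]
print([round(s / sum(scores), 2) for s in scores])   # [0.31, 0.29, 0.26, 0.14]

parent_a = (3, 2, 7, 5, 2, 4, 1, 1)
parent_b = (2, 4, 7, 4, 8, 5, 5, 2)
print(crossover(parent_a, parent_b, 3))              # (3, 2, 7, 4, 8, 5, 5, 2)
```

Repeatedly calling next_generation until some state reaches fitness 28 gives a complete, if untuned, genetic algorithm for the problem.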

Genetic algorithms: crossover has the effect of jumping to a completely different new part of the search space (quite non-local).

Comments on genetic algorithms. A genetic algorithm is a variant of stochastic beam search. Positive points: random exploration can find solutions that local search can't (primarily via crossover); appealing connection to human evolution, although "neural" networks and "genetic" algorithms are metaphors! Negative points: large number of tunable parameters; difficult to replicate performance from one problem to another; lack of good empirical studies comparing to simpler methods; useful on some (small?) set of problems, but there is no convincing evidence that GAs are better than hill-climbing with random restarts in general.

Optimization of continuous functions. Discretization: use hill-climbing. Gradient descent: make a move in the downhill direction given by the gradient; gradients can be closed form or empirical.

Gradient descent. Assume we have a continuous function $f(x_1, x_2, \ldots, x_N)$ that we want to minimize over the continuous variables $x_1, x_2, \ldots, x_N$. 1. Compute the gradients for all $i$: $\partial f(x_1, x_2, \ldots, x_N) / \partial x_i$. 2. Take a small step downhill, in the direction of the negative gradient: $x_i \leftarrow x_i - \lambda \, \partial f(x_1, x_2, \ldots, x_N) / \partial x_i$. 3. Repeat. How to select $\lambda$: line search, successively doubling $\lambda$ until $f$ starts to increase again.
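The three steps and the doubling line search can be sketched as below. Several choices are assumptions for illustration: the gradient is estimated empirically by central differences, the initial step lam = 1e-3 is taken to be small enough to decrease f, and the test function is a hypothetical quadratic.

```python
import math


def numerical_gradient(f, x, eps=1e-6):
    """Empirical gradient by central differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def step_along(x, grad, lam):
    """x_i <- x_i - lam * df/dx_i for every coordinate."""
    return [xi - lam * gi for xi, gi in zip(x, grad)]


def doubling_line_search(f, x, grad, lam=1e-3, max_doublings=60):
    """Double lam until f along the negative gradient stops decreasing."""
    current = f(step_along(x, grad, lam))
    for _ in range(max_doublings):
        trial = f(step_along(x, grad, 2 * lam))
        if trial >= current:
            break
        lam, current = 2 * lam, trial
    return lam


def gradient_descent(f, x0, iterations=100, tol=1e-8):
    x = list(x0)
    for _ in range(iterations):
        grad = numerical_gradient(f, x)
        if math.sqrt(sum(g * g for g in grad)) < tol:
            break                              # gradient is (numerically) zero
        x = step_along(x, grad, doubling_line_search(f, x, grad))
    return x


# Hypothetical test function with its minimum at (1, -2).
def bowl(v):
    return (v[0] - 1) ** 2 + 2 * (v[1] + 2) ** 2


x_min = gradient_descent(bowl, [5.0, 5.0])
print([round(c, 4) for c in x_min])
```

The doubling search is cheap but inexact: it overshoots or undershoots the best step along the ray by a bounded factor, which is usually good enough for the next gradient step to correct.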

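Newton-Raphson applied to function minimization. The Newton-Raphson method finds roots of a function (e.g., a polynomial). To find roots of $g(x)$, start with $x$ and iterate $x \leftarrow x - g(x)/g'(x)$. To minimize a function $f(x)$, we need to find the roots of the equation $f'(x) = 0$: $x \leftarrow x - f'(x)/f''(x)$. If $\mathbf{x}$ is a vector, then $\mathbf{x} \leftarrow \mathbf{x} - H^{-1}(\mathbf{x}) \, \nabla f(\mathbf{x})$, where $\nabla f(\mathbf{x})$ is the gradient and $H(\mathbf{x})$ is the Hessian, with entries $H_{ij} = \partial^2 f / \partial x_i \partial x_j$. The Hessian is costly to compute ($n^2$ second-derivative entries for an $n$-dimensional vector), so approximations are used. The update is equivalent to fitting a quadratic function to $f$ in the local neighborhood of $x$.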
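For the one-dimensional case, the update x ← x − f'(x)/f''(x) can be sketched as below. The caller supplies the first and second derivatives; the function name, tolerance, and both test functions are assumptions chosen for illustration, and the sketch assumes f''(x) is non-zero along the way.

```python
import math


def newton_minimize_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Find a stationary point of f by Newton-Raphson on f'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)      # assumes d2f(x) != 0
        x -= step
        if abs(step) < tol:
            break
    return x


# Hypothetical example 1: f(x) = exp(x) - 2x, minimized at x = ln 2.
x_star = newton_minimize_1d(lambda x: math.exp(x) - 2,
                            lambda x: math.exp(x),
                            x0=0.0)
print(abs(x_star - math.log(2)) < 1e-8)   # True

# Hypothetical example 2: f(x) = (x - 3)^2 is exactly quadratic, so the
# local quadratic fit is exact and one Newton step lands on the minimum.
x_quad = newton_minimize_1d(lambda x: 2 * (x - 3), lambda x: 2.0, x0=10.0)
print(x_quad)                              # 3.0
```

The second example illustrates the quadratic-fit interpretation: from x0 = 10 the first step is f'(10)/f''(10) = 14/2 = 7, which reaches the minimum at 3 immediately.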