Implementation exercises for the course Heuristic Optimization




Implementation exercises for the course Heuristic Optimization [1]
jeremie.dubois-lacoste@ulb.ac.be, IRIDIA, CoDE, ULB
February 25, 2015
[1] Slides based on last year's exercises by Dr. Franco Mascia.

Implement perturbative local search algorithms for the LOP
1 Linear Ordering Problem (LOP)
2 First-improvement and best-improvement
3 Transpose, exchange and insert neighborhoods
4 Random initialization vs. CW heuristic
5 Statistical empirical analysis

The Linear Ordering Problem (1/3)
Applications: ranking in sport tournaments, archaeology, aggregation of individual preferences, etc.

Linear Ordering Problem (2/3)
Given: an n x n matrix C, where the entry in row i and column j is denoted c_ij.

        1     2     3     4
  1   c_11  c_12  c_13  c_14
  2   c_21  c_22  c_23  c_24
  3   c_31  c_32  c_33  c_34
  4   c_41  c_42  c_43  c_44

Objective: find a permutation π of the column and row indices {1, ..., n} such that the value
    f(π) = Σ_{i=1}^{n} Σ_{j=i+1}^{n} c_{π(i)π(j)}
is maximized.


Linear Ordering Problem example (3/3)
π = (1, 2, 3, 4)

        1  2  3  4
  1     1  3  2  4
  2     2  1  1  3
  3     1  2  2  1
  4     4  5  1  4

f(π) = c_12 + c_13 + c_14 + c_23 + c_24 + c_34 = 3 + 2 + 4 + 1 + 3 + 1 = 14

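The example above can be checked with a short, naive evaluation of f(π) (a sketch in Python with 0-based indices; the exercise itself may be implemented in any language):

```python
# The 4x4 example matrix from the slide.
C = [[1, 3, 2, 4],
     [2, 1, 1, 3],
     [1, 2, 2, 1],
     [4, 5, 1, 4]]

def lop_value(pi, C):
    """LOP objective: sum C[pi[i]][pi[j]] over all position pairs i < j."""
    n = len(pi)
    return sum(C[pi[i]][pi[j]] for i in range(n) for j in range(i + 1, n))

print(lop_value([0, 1, 2, 3], C))  # identity permutation -> 14
```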


Implement 12 iterative improvement algorithms for the LOP
Pivoting rule:
1 First-improvement
2 Best-improvement
Neighborhood:
1 Transpose
2 Exchange
3 Insert
Initial solution:
1 Random permutation
2 CW heuristic
2 pivoting rules x 3 neighborhoods x 2 initialization methods = 12 combinations

Implement 12 iterative improvement algorithms for the LOP
Do not implement 12 programs! Reuse code and use command-line parameters:
    ./lop11 --first --transpose --cw
    ./lop11 --best --exchange --random
    etc.
Use a parameter to choose the instance file, for instance:
    ./lop11 -i <instance file>
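One possible way to wire up such a command line, sketched in Python with argparse (the flag names follow the slide's examples; everything else here is an assumption, since the exercise does not prescribe a language):

```python
import argparse

# Hypothetical CLI mirroring the slide's examples: exactly one pivoting
# rule, one neighborhood, and one initialization must be selected.
parser = argparse.ArgumentParser(description="Iterative improvement for the LOP")

pivot = parser.add_mutually_exclusive_group(required=True)
pivot.add_argument("--first", action="store_true", help="first-improvement pivoting")
pivot.add_argument("--best", action="store_true", help="best-improvement pivoting")

nbh = parser.add_mutually_exclusive_group(required=True)
nbh.add_argument("--transpose", action="store_true")
nbh.add_argument("--exchange", action="store_true")
nbh.add_argument("--insert", action="store_true")

init = parser.add_mutually_exclusive_group(required=True)
init.add_argument("--random", action="store_true", help="random initial permutation")
init.add_argument("--cw", action="store_true", help="CW constructive heuristic")

parser.add_argument("-i", "--instance", required=True, help="instance file")

# Parsing an explicit argument list, as in the slide's first example:
args = parser.parse_args(["--first", "--transpose", "--cw", "-i", "instance.dat"])
print(args.first, args.transpose, args.cw, args.instance)
```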

Iterative Improvement

π := GenerateInitialSolution()
while π is not a local optimum do
    choose a neighbour π′ ∈ N(π) such that f(π′) > f(π)
    π := π′

(The LOP is a maximization problem, so an improving neighbour is one with a higher objective value.)

Iterative Improvement

π := GenerateInitialSolution()
while π is not a local optimum do
    choose a neighbour π′ ∈ N(π) such that f(π′) > f(π)
    π := π′

Which neighbour to choose? The pivoting rule:
Best improvement: choose the best among all neighbours of π.
  + Better quality
  - Requires evaluating all neighbours in each step
First improvement: evaluate the neighbours in a fixed order and choose the first improving one.
  + More efficient
  - The order of evaluation may impact quality / performance


Iterative Improvement — Initial solution
Random permutation
Chenery and Watanabe (CW) heuristic

Chenery and Watanabe (CW) heuristic
Construct the solution by inserting one row at a time, always selecting the most attractive row: the one that maximizes the sum of the elements that still influence the objective function. The attractiveness of a row i at step s is
    Σ_{j=s+1}^{n} c_{π(i)j}
See example.
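A minimal sketch of this construction, under one plausible reading of the attractiveness formula (at each step, pick the not-yet-placed row whose entries over the still-unplaced columns sum to the most; this interpretation is an assumption, not something the slides spell out):

```python
def cw_heuristic(C):
    """Greedy CW-style construction: repeatedly append the most attractive row."""
    n = len(C)
    remaining = set(range(n))
    pi = []
    while remaining:
        # Attractiveness of row i: sum of c[i][j] over columns j not yet placed.
        best = max(remaining, key=lambda i: sum(C[i][j] for j in remaining if j != i))
        pi.append(best)
        remaining.remove(best)
    return pi

# On the 4x4 example matrix from earlier slides:
C = [[1, 3, 2, 4],
     [2, 1, 1, 3],
     [1, 2, 2, 1],
     [4, 5, 1, 4]]
print(cw_heuristic(C))  # -> [3, 0, 2, 1]
```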

Which neighborhood N(π)?
Transpose
Exchange
Insertion
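The three moves can be sketched as pure functions on a permutation (a sketch assuming 0-based positions; each returns a new list):

```python
def transpose(pi, i):
    """Swap the adjacent positions i and i+1."""
    p = pi[:]
    p[i], p[i + 1] = p[i + 1], p[i]
    return p

def exchange(pi, i, j):
    """Swap the (not necessarily adjacent) positions i and j."""
    p = pi[:]
    p[i], p[j] = p[j], p[i]
    return p

def insert(pi, i, j):
    """Remove the element at position i and re-insert it at position j."""
    p = pi[:]
    x = p.pop(i)
    p.insert(j, x)
    return p

print(transpose([1, 2, 3, 4], 0))   # [2, 1, 3, 4]
print(exchange([1, 2, 3, 4], 0, 3)) # [4, 2, 3, 1]
print(insert([1, 2, 3, 4], 0, 2))   # [2, 3, 1, 4]
```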

Example: Exchange π_i and π_j (i ≠ j), π′ = Exchange(π, i, j)

π = (1, 2, 3, 4):
        1   2   3   4
  1     1   2   3   4
  2     5   6   7   8
  3     9  10  11  12
  4    13  14  15  16

π′ = (1, 4, 3, 2):
        1   4   3   2
  1     1   4   3   2
  4    13  16  15  14
  3     9  12  11  10
  2     5   8   7   6

Only a subset of the changes affect the objective function: do not recompute the evaluation function from scratch! Equivalent speed-ups exist for Transpose and Insertion.

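One way to exploit this for the Exchange move is a delta evaluation (a sketch, assuming 0-based positions with i < j; the slides only state that such a speed-up exists, not this particular derivation). Pairs entirely outside the positions i..j cancel out, so only the pair (i, j) itself and the pairs linking i and j to the positions strictly between them change:

```python
def exchange_delta(pi, C, i, j):
    """Change in f when swapping positions i < j of pi, in O(j - i) time."""
    a, b = pi[i], pi[j]
    delta = C[b][a] - C[a][b]          # the pair (i, j) itself flips order
    for k in range(i + 1, j):
        m = pi[k]
        # pair (i, k): "a before m" becomes "b before m"
        # pair (k, j): "m before b" becomes "m before a"
        delta += (C[b][m] - C[a][m]) + (C[m][a] - C[m][b])
    return delta
```

So f(Exchange(π, i, j)) = f(π) + exchange_delta(π, C, i, j), avoiding the O(n²) full re-evaluation.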

Instances
LOP instances with sizes 150 and 250. More info: http://iridia.ulb.ac.be/~stuetzle/teaching/ho/

Experiments
Apply each algorithm k once to each instance i and record:
1 The relative percentage deviation
    Δ_ki = 100 · (best-known_i − cost_ki) / best-known_i
2 The computation time t_ki
Note: use a constant random seed across algorithms, for each instance.

Report for each algorithm k
Average relative percentage deviation
Sum of computation times

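The reported statistics amount to a few lines (a sketch with hypothetical argument names; `costs`, `best_known` and `times` are the per-instance values for one algorithm):

```python
def report(costs, best_known, times):
    """Average relative percentage deviation and total computation time."""
    rpd = [100.0 * (bk - c) / bk for c, bk in zip(costs, best_known)]
    return {
        "avg_rpd": sum(rpd) / len(rpd),  # average relative percentage deviation
        "total_time": sum(times),        # sum of computation times
    }

# Two toy instances: costs 95/100 and 180/200 give deviations 5% and 10%.
print(report([95, 180], [100, 200], [0.5, 1.5]))
```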

Is there a statistically significant difference between the solution quality generated by the different algorithms?

Background: Statistical hypothesis tests (1)
Statistical hypothesis tests are used to assess the validity of statements about properties of, or relations between, sets of statistical data.
The statement to be tested (or its negation) is called the null hypothesis (H0) of the test. Example: for the Wilcoxon signed-rank test, the null hypothesis is that the median of the differences is zero.
The significance level α determines the maximum allowable probability of incorrectly rejecting the null hypothesis. Typical values of α are 0.05 or 0.01.


Background: Statistical hypothesis tests (2)
Applying a test to a given data set yields a p-value: the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed.
The null hypothesis is rejected iff this p-value is smaller than the previously chosen significance level.
Most common statistical hypothesis tests are already implemented in statistical software such as the R environment (http://www.r-project.org/).


Example in R
best.known <- read.table("best-known.dat")$V1
a.cost <- read.table("lop-best-ex-rand.dat")$V1
a.cost <- 100 * (a.cost - best.known) / best.known
b.cost <- read.table("lop-best-ins-rand.dat")$V1
b.cost <- 100 * (b.cost - best.known) / best.known
t.test(a.cost, b.cost, paired=TRUE)$p.value
[1] 0.8819112   # greater than 0.05: do not reject H0
wilcox.test(a.cost, b.cost, paired=TRUE)$p.value
[1] 0.0019212   # smaller than 0.05: reject H0


Exercise 1.2 VND algorithms for the LOP
Variable Neighbourhood Descent (VND), with k neighborhoods N_1, ..., N_k:

π := GenerateInitialSolution()
i := 1
repeat
    choose the first improving neighbour π′ ∈ N_i(π)
    if there is no such π′ then
        i := i + 1
    else
        π := π′
        i := 1
until i > k
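The pseudocode above can be sketched directly (a sketch assuming, as before, an objective `f` to maximize and a list of neighbourhood generators; these helper names are not from the slides):

```python
def vnd(pi, f, neighbourhoods):
    """Variable Neighbourhood Descent with first-improvement pivoting."""
    i = 0
    while i < len(neighbourhoods):
        current = f(pi)
        improving = next((c for c in neighbourhoods[i](pi) if f(c) > current), None)
        if improving is None:
            i += 1          # no improvement in N_i: move to the next neighbourhood
        else:
            pi = improving  # improvement found: restart from the first neighbourhood
            i = 0
    return pi               # local optimum with respect to all neighbourhoods
```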

Exercise 1.2 VND algorithms for the LOP
Implement 4 VND algorithms for the LOP
Pivoting rule: first-improvement
Neighborhood order:
1 transpose, exchange, insert
2 transpose, insert, exchange
Initial solution: CW heuristic

Exercise 1.2 VND algorithms for the LOP
Instances: same as 1.1
Experiments: one run of each algorithm per instance
Report: same as 1.1
Statistical tests: same as 1.1

Instances and bare-bones code will soon be available at: http://iridia.ulb.ac.be/~stuetzle/teaching/ho/
Deadline: April 10 (23:59). Questions in the meantime? jeremie.dubois-lacoste@ulb.ac.be