No Free Lunch Theorems for Search

David H. Wolpert (dhw@santafe.edu)
William G. Macready (wgm@santafe.edu)
SFI-TR-95-02-010
The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501
February 23, 1996

Abstract

We show that all algorithms that search for an extremum of a cost function perform exactly the same, according to any performance measure, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A. Starting from this we analyze a number of the other a priori characteristics of the search problem, like its geometry and its information-theoretic aspects. This analysis allows us to derive mathematical benchmarks for assessing a particular search algorithm's performance. We also investigate minimax aspects of the search problem, the validity of using characteristics of a partial search over a cost function to predict future behavior of the search algorithm on that cost function, and time-varying cost functions. We conclude with some discussion of the justifiability of biologically-inspired search methods.

1 Introduction

Many problems can be cast as optimization over a "cost" or "fitness" function. In such a problem, we are given such a function, $f: X \to Y$ ($F$ being the set of all such mappings). For that $f$ we seek the set of $x \in X$ which give rise to a particular $y \in Y$. Most often, we seek the $x$'s which extremize $f$ (this will often be implicitly assumed in this paper). Physical examples of such a problem include free energy minimization ($Y = \mathbb{R}$) over spin configurations ($X = \{-1, +1\}^N$), or over bond angles ($X = \{\mathbb{R} \times \mathbb{R} \times \mathbb{R}\}^N$), etc. Examples also abound in combinatorial optimization, ranging from number partitioning to graph coloring to scheduling [4].

There are two common approaches to these optimization problems. The first is a systematic construction of a good $x$ value, $x'$, from good sub-solutions specifying part of $x'$. The most celebrated method of this type is the branch and bound algorithm [9]. For this systematic and exhaustive approach to work in reasonable time, one must have an effective heuristic, $h(n)$, representing the quality of sub-solutions $n$. There is extensive theoretical work [11] linking the cost function to the properties a heuristic must have in order to search efficiently.

A second approach to optimization begins with a population of one or more complete solutions $x \in X$ and the associated $y$ values, and (tries to) iteratively improve upon those $x$ values. There are many algorithms of this type, including hill-climbing, simulated annealing [7], and genetic algorithms [5]. Intuitively, one would expect that for this class of algorithms to work effectively, the biases in how they try to improve the population (i.e., the biases in how they search $X$) must "match" those implicit in the cost function they are optimizing. However almost always these algorithms are directly applied, with little or no modification, to any cost function in a wide class of cost functions. The particulars of the cost functions at hand are almost always ignored. As we will demonstrate though, the "matching" intuition is true; the particulars of the cost function are crucial, and blind faith in an algorithm to search effectively across a broad class of problems is rarely justified.

Indeed, one might expect that there are pairs of search algorithms A and B such that A performs better than B on average, even if B sometimes outperforms A. As an example, one might expect that hill-climbing usually outperforms hill-descending if one's goal is to find a maximum of the cost function. One might also expect it would outperform a random search. In point of fact though, as our central result demonstrates, this is not the case. If we do not take into account any particular biases or properties of our cost function, then the expected performance of all algorithms on that function is exactly the same (regardless of the performance measure used).

In short, there are no "free lunches" for effective optimization; any algorithm performs only as well as the knowledge concerning the cost function put into the cost algorithm. For this reason (and to emphasize the parallel with similar supervised learning results [16, 17]), we have dubbed our central result a "no free lunch" (NFL) theorem.

To prove the NFL theorem a framework has to be developed which addresses the core aspects of search. This framework constitutes the "skeleton" of the optimization problem; it is what can be said concerning search before explicit details of a particular real-world search problem are considered. The construction of such a skeleton provides a language to ask and answer formal questions about search, some of which have never before even been asked, never mind answered. (We pose and answer a number of such questions in this paper.) In addition, such a skeleton indicates where the real "meat" of optimization lies. It clarifies what the core issues are that underlie the effectiveness of the search process.

The paper is organized as follows. We begin in section 2 by presenting our framework and using it to prove the NFL theorem. We prove the theorem for both deterministic and stochastic search algorithms. Section 3 gives a geometric interpretation of the NFL theorem. In particular, in that section we provide a geometric meaning of what it means for an
algorithm to be well "matched" to a cost function.

The rest of the paper goes beyond the NFL theorem. It consists of a preliminary investigation of the statistical nature of the search problem, using the framework developed in section 2.

In some circumstances the average behavior of algorithms is not an interesting quantity by which to compare algorithms. Alternatively, averages may be interesting, but it isn't clear what distribution over cost functions to use to do the averaging. We address such scenarios in section 4 by investigating minimax distinctions between algorithms. Such distinctions hold for any distribution over cost functions.

Section 5 begins the exploration of some of the questions raised in section 2. Some of the answers lead naturally into results concerning the information theoretic aspects of search. (In that those results are derived from the NFL theorem, they illustrate the central importance of the NFL theorem in analyzing optimization.) A myriad of other properties of search may be investigated using techniques similar to those developed in this section. We list a sample of these in section 9.2.

In Section 6 we turn to the important problem of assessing the performance of particular search algorithms. We derive several benchmarks against which to compare such an algorithm's performance. We cannot conceive of any valid demonstration of the "absolute" (rather than relative) efficacy of an algorithm on some search problem that doesn't use these (or similar) benchmarks.

Not all search problems are static; in some cases the cost function changes over time. Section 7 extends our analysis to the case of such time dependent cost functions.

In section 8 we provide some theorems valid for any single fixed cost function, and therefore for any distribution over cost functions. These theorems state that one cannot use a search algorithm's behavior so far on a particular cost function to predict its future behavior on that function. When choosing between algorithms based on their observed performance it does not suffice to make an assumption about the cost function; some (currently poorly understood) assumptions are also being made about how the algorithms in question are related to each other and to the cost function.

Finally, we conclude in Section 9 with a general discussion of the implications of our results, and then of future directions for work.

The paper can be read in stages. A first reading might highlight the NFL theorem and its broad implications. Such a reading should start with section 2 for an understanding of the NFL theorem, Eq. (1). Section 3 then provides a geometric understanding of the theorem. Section 4, which considers minimax distinctions between algorithms, addresses limitations of the NFL theorem. Finally, section 9.1 discusses broad implications of the NFL result.

A second reading might explore the potential richness of the framework developed in Sections 2 and 3. Such a reading should include section 5, which uses our framework to demonstrate some of the information theoretic aspects of search. It would then move on to section 6 which uses the framework to provide useful benchmarks against which other algorithms may be compared.

A final reading would include subjects that may constitute fruitful extensions of the
framework developed in sections 2 and 3. Such a reading would include section 7, which extends the NFL results to a class of time-dependent cost functions. It would also include section 8, which probes what may be learned from a limited amount of search over a single, specific, cost function. This reading would conclude with section 9.2 where we list many directions for future extensions.

We should emphasize that our comparing algorithms based on their having the same number of distinct evaluations of the cost function is simply our choice. Although we consider it quite reasonable, we do not claim to be able to "prove" that one should use it, in any sense. If someone wishes to compare algorithms on some other basis, we wish them luck. However as an aside on one such comparison scheme, we note that comparing based on total evaluations, including repeats, is fraught with difficulties, and results in all kinds of irrelevant a priori distinctions between algorithms. (For example, it says that a global random guesser is better than a hill-climber, averaged over all cost functions, simply because the random guesser will retrace less.)

There are a number of other formal approaches to the issues investigated in this paper, in particular, the field of computational complexity. Unlike the approach taken in this paper, computational complexity ignores the statistical nature of search for the most part, and concentrates instead on computational issues. Much (though by no means all) of computational complexity is concerned with physically unrealizable computational devices (Turing machines) and the worst case amount of resources they require to find optimal solutions. In contrast, the analysis in this paper does not concern itself with the computational engine used by the search algorithm, but rather concentrates exclusively on the underlying statistical nature of the search problem. Future work would involve combining our concern for the statistical nature of search with (realistic) concerns for computational resources.

2 No Free Lunch Theorem for Search

All oracle-based search algorithms rely on extrapolating from an existing set of $m$ points and associated cost values, $(x, y)^m \in (X \times Y)^m$, to a new point $x' \in X$ that hopefully has low cost (high cost if we're searching for a maximum rather than a minimum). The extrapolation may be either deterministic or stochastic. The analysis of such extrapolations can be formalized as follows.

For simplicity take $X$ and $Y$ to be finite. Define $d_m \equiv \{d_m(i)\} \equiv \{d^x_m(i), d^y_m(i)\}$ for $i = 1 \ldots m$ to be a set of $m$ distinct search points (i.e. cost evaluations) and associated cost values ordered in some way (usually according to the time at which they are generated) with the ordering index given by $i$. Let us call this a population of size $m$. We denote the set of all populations of size $m$ by $D_m$.

As above, let $f$ indicate a single-valued function from $X$ to $Y$: $f \in Y^X$. Note that there are a finite number of $f$ if $|X|$ and $|Y|$ are finite. At each stage of a search algorithm, a new point $x' \in X$ is chosen based on the members of the current population $d$; the pair $\{x', f(x')\}$ is added to $d$; and the procedure repeats.
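
The bookkeeping just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the paper: the names `run_search` and `lowest_unvisited`, the integer encoding of $X$ and $Y$, and the toy cost function are all introduced here.

```python
from typing import Callable, Dict, List, Tuple

Population = List[Tuple[int, int]]        # ordered (x, y) pairs, i.e. d_m
Algorithm = Callable[[Population], int]   # maps a population d to a new point x'

def run_search(a: Algorithm, f: Dict[int, int], m: int) -> Population:
    """Build a population of m distinct points by repeatedly asking a for x'."""
    d: Population = []
    for _ in range(m):
        x_new = a(d)
        if any(x == x_new for x, _ in d):
            raise ValueError("algorithm revisited a point")
        d.append((x_new, f[x_new]))       # add the pair {x', f(x')} to d
    return d

def lowest_unvisited(d: Population) -> int:
    """Hypothetical algorithm on X = {0, 1, 2, ...}: smallest x not yet in d^x."""
    visited = {x for x, _ in d}
    x = 0
    while x in visited:
        x += 1
    return x

f = {0: 2, 1: 0, 2: 1}                    # one cost function f in Y^X
d3 = run_search(lowest_unvisited, f, 3)   # population of size m = 3
```

The list keeps the ordering index $i$ implicitly, so `d3[i - 1]` plays the role of $d_m(i)$.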

Any search algorithm of the "second approach" discussed in the introduction is a (perhaps probabilistic) mapping taking any population to a new point in the search space. For simplicity of the presentation, we assume that the new search point has not already been visited. (As discussed below, relaxing this assumption does not affect our results.) So in particular a deterministic search algorithm is a mapping $a: d \in D \to \{x \mid x \notin d^x\}$, where $D \equiv \cup_m D_m$, and in particular contains the empty set. For clarity of the exposition, in this paper we will only explicitly consider such deterministic search algorithms. However as discussed below, all our results also apply to stochastic algorithms.

Note that the population contains all points sampled so far. In particular, in a conventional hill-climber that works by moving from $x$ to that neighbor of $x$ with the highest fitness, it is necessary to evaluate the fitnesses of all the neighbors of $x$. All those evaluated points are contained in the population, not only $x$ and the neighbor of $x$ with highest fitness.

We are interested in the histogram, $\vec{c}$, of cost values that an algorithm, $a$, obtains on a particular cost function, $f$, given $m$ distinct cost evaluations. Note that $\vec{c}$ is given by the $y$ values of the population, $d^y_m$, and is a vector of length $|Y|$ whose $i$th component is the number of members in the population $d_m$ having cost $f_i$. Once we have $\vec{c}$ we can use it to assess the quality of the search in any way we choose. (For example if we are searching for minima we might take the lowest occupied bin in $\vec{c}$ as our performance measure.) Consequently, we are interested in the conditional probability that histogram $\vec{c}$ will be obtained under $m$ iterations of algorithm $a$ on $f$. This quantity is given by the conditional probability $P(\vec{c} \mid f, m, a)$.

A natural question concerning this scenario is how $F_1$, the set of $f$ for which some algorithm $a_1$ outperforms another algorithm $a_2$, compares to $F_2$, the set of $f$ for which the reverse is true. To perform the comparison, we use the trick of comparing the sum over all $f$ of $P(\vec{c} \mid f, m, a_1)$ to the sum over all $f$ of $P(\vec{c} \mid f, m, a_2)$. This comparison provides a major result of this paper: $P(\vec{c} \mid f, m, a)$ is independent of $a$ when we average over all cost functions. In other words, as is proven below,

Theorem: For any pair of algorithms $a_1$ and $a_2$,

$$\sum_f P(\vec{c} \mid f, m, a_1) = \sum_f P(\vec{c} \mid f, m, a_2). \qquad (1)$$

An immediate corollary is that for any performance measure $\Phi(\vec{c})$, the average over all $f$ of $P(\Phi(\vec{c}) \mid f, m, a)$ is independent of $a$. So the precise way that the histogram is mapped to a performance measure is irrelevant.

Note that the no free lunch result implies that if we know nothing about $f$, then $P(\vec{c} \mid m, a)$, which is the probability we obtain histogram $\vec{c}$ after $m$ distinct cost evaluations of algorithm $a$, is independent of $a$. This follows from

$$P(\vec{c} \mid m, a) = \sum_f P(\vec{c} \mid f, m, a)\, P(f \mid m, a) = \sum_f P(\vec{c} \mid f, m, a)\, P(f)$$

(in the last step we have relied on the fact that the cost function doesn't depend on either $m$ or $a$). If we know nothing about $f$ then all $f$ are equally likely, which means that for all $f$, $P(f) = 1/|Y|^{|X|}$. (More generally, $P(f)$ reflects our "prior knowledge" concerning $f$.)
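
For small $X$ and $Y$, Eq. (1) can be checked by brute force. The sketch below is an illustration only: the two algorithms `sweep_up` and `adaptive`, and the sizes $|X| = 3$, $|Y| = 2$, $m = 2$, are assumptions chosen here. For a deterministic algorithm, $P(\vec{c} \mid f, m, a)$ is 0 or 1, so the sum over $f$ is just the number of cost functions that yield $\vec{c}$.

```python
from collections import Counter
from itertools import product

X = range(3)   # search space
Y = range(2)   # cost values
M = 2          # number of distinct cost evaluations

def sweep_up(d):
    """Hypothetical algorithm: visit 0, 1, 2, ... regardless of observed costs."""
    return len(d)

def adaptive(d):
    """Hypothetical algorithm: start at 2, then let the observed cost pick the next point."""
    if not d:
        return 2
    return 0 if d[-1][1] == 0 else 1

def histogram(a, f, m):
    """Run deterministic algorithm a on f for m steps and return the histogram c."""
    d = []
    for _ in range(m):
        x = a(d)
        d.append((x, f[x]))
    ys = [y for _, y in d]
    return tuple(ys.count(y) for y in Y)

def histogram_counts(a, m):
    """For deterministic a, sum_f P(c | f, m, a) is the number of f producing c."""
    return Counter(histogram(a, f, m) for f in product(Y, repeat=len(X)))

counts_1 = histogram_counts(sweep_up, M)
counts_2 = histogram_counts(adaptive, M)
print(counts_1 == counts_2)
```

The two algorithms visit different points and one of them reacts to what it sees, yet the counts per histogram agree, as the theorem requires.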

Accordingly, for this "no knowledge" scenario, $P(\vec{c} \mid m, a) = |Y|^{-|X|} \sum_f P(\vec{c} \mid f, m, a)$, which is independent of $a$ by the no free lunch theorem.

Similarly, you can derive an NFL result for averaging over all priors. (More formally, the result concerns averaging over all $\alpha$ the quantity $P(\vec{c} \mid m, \alpha)$, where $\alpha$ indexes the set of possible $P(f)$.) In this, the uniform $P(f)$ case is not some "pathological case", on the edge of the space. Rather it is the typical case.

Another immediate consequence of the NFL result is that the expected histogram $E(\vec{c} \mid f, m, a) = \sum_{\vec{c}} \vec{c}\, P(\vec{c} \mid f, m, a)$ is, on average, the same for all algorithms. More generally, for any two algorithms, at the point in their search where they have both created a population of size $m$, if algorithm $a_1$ has better performance than algorithm $a_2$ over some subset $\phi \subset F$ of functions, then $a_2$ must perform better on the set of remaining functions $F \setminus \phi$. So for example if simulated annealing outperforms genetic algorithms on some set $\phi$, genetic algorithms must outperform simulated annealing on $F \setminus \phi$. As another example, even if one's goal is to find a maximum of the cost function, hill-climbing and hill-descending are equivalent, on average.

A particularly striking example of this last point is the case where $a_2$ is the algorithm of random search. The NFL result says that there are as many $f$ (appropriately weighted) for which the random algorithm outperforms your favorite search algorithm as vice-versa. There are as many $f$ for which your algorithm's guesses for where to search are worse than random as for which they are better. The risk you take in choosing an algorithm is not that it may perform randomly on the $f$ at hand, but that it may very well perform even worse.

Often in the real world one has some a priori knowledge concerning $f$. However only very rarely is that knowledge explicitly used to help set the algorithm. The unreasonableness of this is demonstrated by the NFL theorem, which illustrates that even if we do know something about $f$ (perhaps specified through $P(f)$), if we fail to explicitly incorporate that knowledge into $a$ then we have no assurances that $a$ will be effective; we are simply relying on a fortuitous matching between $f$ and $a$. This point is formally established in sections 3 and 8, which make no assumptions whatsoever concerning $P(f)$.

Many would readily agree that $a$ must match $P(f)$; that statement borders on the obvious. Similarly, it may seem obvious that if one uniformly averages over all $f$, then all algorithms are equal. (The only reason it takes a whole subsection to establish this formally is because there are a large number of "obvious" things that must be mathematicized.) Yet the implications of the statement are not so obvious; it is extremely easy to contradict them without realizing you are doing so. This is why, for example, it can be surprising that hill-climbing and hill-descending are equivalent on average, or that "smart" choosing procedures perform no better than "dumb" ones (see section 8). In addition, the geometric nature of the matching illustrates some interesting aspects of the search problem (see below).

We emphasize that taking uniform averages over $f$'s is simply a tool for investigating search. It is the only starting point we could think of for investigating the "skeleton" of the search problem, before (assumptions for) the actual distributions in the real world are put in. It should be obvious that we are not claiming that all $f$'s are equally likely in the real world, and the significance of the NFL theorem in no way depends on the validity of such a claim. Results for non-uniform $P(f)$ are discussed below, after the proof of the NFL theorem.
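
The role of the prior can be made concrete with a toy computation of $P(\vec{c} \mid m, a) = \sum_f P(\vec{c} \mid f, m, a) P(f)$ for $m = 1$. Everything in this sketch is an assumption made for illustration: the two one-shot algorithms `pick_left` and `pick_right`, the sizes $|X| = 3$, $|Y| = 2$, and the choice of a prior that is uniform over nondecreasing cost functions.

```python
from fractions import Fraction
from itertools import product

X = range(3)
Y = range(2)
ALL_F = list(product(Y, repeat=len(X)))   # all |Y|^|X| = 8 cost functions

def pick_left(d):
    """Hypothetical algorithm: always evaluate x = 0 first."""
    return 0

def pick_right(d):
    """Hypothetical algorithm: always evaluate x = 2 first."""
    return 2

def prob_min_found(a, prior):
    """P(the single evaluation has cost 0 | a) = sum_f P(cost 0 | f, a) P(f)."""
    return sum(p for f, p in prior.items() if f[a([])] == 0)

# Uniform prior over all cost functions: the "no knowledge" scenario.
uniform = {f: Fraction(1, len(ALL_F)) for f in ALL_F}

# Assumed non-uniform prior: uniform over nondecreasing f, zero elsewhere.
monotone = [f for f in ALL_F if list(f) == sorted(f)]
nondecreasing = {f: Fraction(1, len(monotone)) for f in monotone}

print(prob_min_found(pick_left, uniform), prob_min_found(pick_right, uniform))
print(prob_min_found(pick_left, nondecreasing), prob_min_found(pick_right, nondecreasing))
```

Under the uniform prior the two algorithms are indistinguishable; under the nondecreasing prior the algorithm whose bias matches the prior does better, which is the "matching" point made in the text.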

2.1 Proof for deterministic search

We now show that $\sum_f P(\vec{c} \mid f, m, a)$ has no dependence on $a$. Conceptually, the proof is quite simple; the only reason it takes so long is because there is some book-keeping involved. In addition, because many of our readers may not be conversant with the techniques of probability theory we supply all the details, lengthening it considerably.

The intuition is simple: by summing over all $f$ the past performance of an algorithm has no bearing on its future performance so that all algorithms perform equally. The proof involves the following steps: First, we reduce the distribution over $\vec{c}$ values to one over $d^y_m$ values. Then we use induction to establish the $a$-independence of the distribution over $d^y_m$. The inductive step starts by rearranging the distributions in question. Then $f$ is broken up into two independent parts, one for $x \in d^x_m$ and one for $x \notin d^x_m$. These are evaluated separately, giving the desired result.

Expanding over all possible $y$ components of a population of size $m$, $d^y_m$, we see

$$\sum_f P(\vec{c} \mid f, m, a) = \sum_{f,\, d^y_m} P(\vec{c}, d^y_m \mid f, m, a).$$

Now $P(\vec{c}, d^y_m \mid f, m, a) = P(\vec{c} \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a)$. Moreover, the probability of obtaining a histogram $\vec{c}$ given $f$, $d$, $m$ and $a$, $P(\vec{c} \mid d^y_m, f, m, a)$, depends only on the $y$ values of population $d_m$. Therefore

$$\sum_f P(\vec{c} \mid f, m, a) = \sum_{f,\, d^y_m} P(\vec{c} \mid d^y_m)\, P(d^y_m \mid f, m, a) = \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_f P(d^y_m \mid f, m, a). \qquad (2)$$

To prove that the expression in Eq. (2) is independent of $a$ it suffices to show that for all $m$ and $d^y_m$, $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$, since $P(\vec{c} \mid d^y_m)$ is independent of $a$. We will prove this by induction on $m$.

For $m = 1$ we write the population as $d_1 = \{d^x_1, f(d^x_1)\}$ where $d^x_1$ is set by $a$. The only possible value for $d^y_1$ is $f(d^x_1)$, so we have:

$$\sum_f P(d^y_1 \mid f, m = 1, a) = \sum_f \delta(d^y_1, f(d^x_1))$$

where $\delta$ is the Kronecker delta function.

Now when we sum over all possible cost functions $\delta(d^y_1, f(d^x_1))$ is 1 only for those functions which have cost $d^y_1$ at point $d^x_1$. Therefore that sum equals $|Y|^{|X| - 1}$, independent of $d^x_1$:

$$\sum_f P(d^y_1 \mid f, m = 1, a) = |Y|^{|X| - 1}$$

which is independent of $a$. This bases the induction.

We now establish the inductive step, that if $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$ for all $d^y_m$, then so also is $\sum_f P(d^y_{m+1} \mid f, m+1, a)$. This will complete the proof of the NFL result.
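
The base case rests on a counting fact: the number of cost functions passing through a given point $(d^x_1, d^y_1)$ is $|Y|^{|X| - 1}$, whichever point the algorithm chose. A small enumeration can confirm this; the function name `count_through` and the sizes used are assumptions for illustration.

```python
from itertools import product

def count_through(n_x, n_y, x0, y0):
    """Number of f: X -> Y with f(x0) == y0, i.e. sum_f delta(y0, f(x0))."""
    return sum(1 for f in product(range(n_y), repeat=n_x) if f[x0] == y0)

# With |X| = 4 and |Y| = 3 the count is 3 ** 3 for every choice of (x0, y0).
counts = {(x0, y0): count_through(4, 3, x0, y0)
          for x0 in range(4) for y0 in range(3)}
```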

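We start by writing

$$P(d^y_{m+1} \mid f, m+1, a) = P(\{d^y_{m+1}(1), \ldots, d^y_{m+1}(m)\}, d^y_{m+1}(m+1) \mid f, m+1, a)$$
$$= P(d^y_m, d^y_{m+1}(m+1) \mid f, m+1, a)$$
$$= P(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a)$$

so we have

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_f P(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a).$$

The new $y$ value, $d^y_{m+1}(m+1)$, will depend on the new $x$ value, $f$ and nothing else. So we expand over these possible $x$ values, getting

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{f,\, x} P(d^y_{m+1}(m+1) \mid f, x)\, P(x \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a)$$
$$= \sum_{f,\, x} \delta(d^y_{m+1}(m+1), f(x))\, P(x \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a).$$

Next note that since $x = a(d^x_m, d^y_m)$, it does not depend directly on $f$. Consequently we expand in $d^x_m$ to remove the $f$ dependence in $P(x \mid d^y_m, f, m+1, a)$:

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{f,\, x,\, d^x_m} \delta(d^y_{m+1}(m+1), f(x))\, P(x \mid d_m, a)\, P(d^x_m \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a)$$
$$= \sum_{f,\, d^x_m} \delta(d^y_{m+1}(m+1), f(a(d_m)))\, P(d_m \mid f, m, a)$$

where use was made of the fact that $P(x \mid d_m, a) = \delta(x, a(d_m))$ and the fact that $P(d_m \mid f, m+1, a) = P(d_m \mid f, m, a)$.

We do the sum over cost functions $f$ first. The cost function is defined both over those points restricted to $d^x_m$ and those points outside of $d^x_m$. $P(d_m \mid f, m, a)$ will depend on the $f$ values defined over points inside $d^x_m$ while $\delta(d^y_{m+1}(m+1), f(a(d_m)))$ depends only on the $f$ values defined over points outside $d^x_m$. (Recall that $a(d_m) \notin d^x_m$.) So we have

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{d^x_m} \sum_{f(x \in d^x_m)} P(d_m \mid f, m, a) \sum_{f(x \notin d^x_m)} \delta(d^y_{m+1}(m+1), f(a(d_m))). \qquad (3)$$

The sum $\sum_{f(x \notin d^x_m)}$ contributes a constant, $|Y|^{|X| - m - 1}$, equal to the number of functions defined over points not in $d^x_m$ passing through $(d^x_{m+1}(m+1), f(a(d_m)))$. So

$$\sum_f P(d^y_{m+1} \mid f, m+1, a) = |Y|^{|X| - m - 1} \sum_{f(x \in d^x_m),\, d^x_m} P(d_m \mid f, m, a)$$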
$$= \frac{1}{|Y|} \sum_{f,\, d^x_m} P(d_m \mid f, m, a) = \frac{1}{|Y|} \sum_f P(d^y_m \mid f, m, a).$$

By hypothesis the right hand side of this equation is independent of $a$, so the left hand side must also be. This completes the proof of the NFL result.

We note in passing that the proof of the NFL theorem can be used to derive a stronger result. Since the sum $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$, it follows that the histograms of cost values after $m$ steps must also be independent of $a$. However, it also follows that the distributions over time ordered populations (the $d^y_m$) are also identical for all $a$. So when the ordering of cost values is important (e.g. when you would like to get to low cost quickly) there is still no way to distinguish between algorithms when we average over all $f$.

2.2 More general kinds of search

There are two restrictions on the definition of search algorithms used so far that one might find objectionable. These are: i) the banning of algorithms that might revisit the same points in $X$ after placing them in $d^x$; and ii) the banning of algorithms that work stochastically rather than deterministically. Fortunately, the NFL result can easily be extended to include algorithms that revisit points and/or are stochastic. So there is no loss of generality in our definition of a "search algorithm".

To see this, say we have a deterministic algorithm $a: d \in D \to \{x \mid x \in X\}$, so that given some (perhaps empty) $d$, the algorithm might produce a point $x \in d^x$. Call such an algorithm "potentially retracing". Given a potentially retracing algorithm $a$, produce a new algorithm $a'$ by "skipping over all duplications" in the sequence of $\{x, y\}$ pairs produced by the potentially retracing algorithm. Formally, for any $d$, $a'(d)$ is defined as the first $x$ value from the sequence $\{a(\emptyset), a(d), a(a(d)), \ldots\}$ that is not contained in $d^x$. So long as the original algorithm $a$ cannot get stuck forever in some subset of $d$, we can always produce such an $a'$ from $a$. (We can find no reason to design one's algorithm to not have an "escape mechanism" that ensures that it cannot get stuck forever in some subset of $d$.) We will say that $a'$ is a "compacted" version of $a$.

Now any two compacted algorithms are "search algorithms" in the sense the term is used in the previous subsection. Therefore they obey the NFL result of that subsection. So the NFL result in Eq. (1) holds even for potentially retracing algorithms, if we redefine '$m$' in that equation to be the number of distinct points in the $d^x$'s produced by the algorithms in question, and if we redefine '$\vec{c}$' to be the histogram corresponding to those $m$ distinct points.

Moreover, our real-world cost in using an algorithm is usually set by the number of distinct evaluations of $f(x)$. So it makes sense to compare potentially retracing algorithms not by looking at the $d$'s they produce after being run the same number of times, but rather by looking at the $d$'s they produce after sampling $f(x)$ the same number of times. This is consistent with using our redefined $m$ and $\vec{c}$.
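
One way to picture compaction is as a wrapper that runs a potentially retracing algorithm and records only first visits. The sketch below is an interpretation, not the paper's formal construction: it assumes the retracing algorithm sees its full raw history (repeats included), and the names `run_compacted` and `back_and_forth`, the `max_steps` guard, and the toy cost function are all hypothetical.

```python
def run_compacted(a, f, m, max_steps=10_000):
    """Run a potentially retracing algorithm a until m distinct points are sampled.

    a maps the raw history (a list of (x, y) pairs, repeats included) to the next x.
    Returns the compacted population: the first visit to each distinct x, in order.
    """
    raw, compacted, seen = [], [], set()
    steps = 0
    while len(compacted) < m:
        if steps == max_steps:            # guard: a must have an "escape mechanism"
            raise RuntimeError("algorithm appears stuck on already-visited points")
        x = a(raw)
        steps += 1
        raw.append((x, f[x]))
        if x not in seen:                 # skip over duplications
            seen.add(x)
            compacted.append((x, f[x]))
    return compacted

def back_and_forth(raw):
    """Hypothetical retracing algorithm on X = {0, 1, 2, 3}: visits 0, 1, 0, 2, 0, 3."""
    n = len(raw)
    return 0 if n % 2 == 0 else (n + 1) // 2

f = {0: 5, 1: 3, 2: 4, 3: 1}
d = run_compacted(back_and_forth, f, 3)   # m counts distinct points only
```

Here the raw run takes four steps to collect three distinct points, and it is the three-point compacted population that the redefined $m$ and $\vec{c}$ refer to.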

Note that the $x$ at which a potentially retracing algorithm breaks out of a cycle might be stochastic (e.g. simulated annealing). In this case the compacted version of the algorithm is still well-defined. Only rather than being deterministic, that compacted algorithm is stochastic. This brings us to the general issue of how to adapt our analysis to address stochastic search algorithms.

Let $\sigma$ be a stochastic non-potentially retracing algorithm. Formally, this means that $\sigma$ is a mapping taking any $d$ to a ($d$-dependent) distribution over $X$ that equals zero for all $x \in d^x$. So $\sigma$ can be viewed as a "hyper-parameter", specifying the function $P(d^x_{m+1}(m+1) \mid d_m, \sigma)$ for all $m$ and $d$.

Given this definition of $\sigma$, we can follow along with the derivation of the NFL result for deterministic algorithms, just with $a$ replaced by $\sigma$ throughout. Doing so, everything still holds. So that NFL result holds even for stochastic search algorithms. Therefore, by the same reasoning used to establish the no-free-lunch result for potentially retracing deterministic algorithms, the no-free-lunch result holds for potentially retracing stochastic algorithms.

3 A geometric interpretation

Intuitively, the NFL theorem illustrates that even if we know something about $f$ (perhaps specified through $P(f)$) but don't incorporate that knowledge into $a$, then we have no assurances that $a$ will be effective; we are simply relying on a fortuitous matching between $f$ and $a$. This point is formally established by viewing the NFL theorem from a geometric perspective.

Consider the space $F$ of possible cost functions. As mentioned before, the probability of obtaining some histogram, $\vec{c}$, given $m$ distinct cost evaluations using algorithm $a$ is

$$P(\vec{c} \mid m, a) = \sum_f P(\vec{c} \mid m, a, f)\, P(f),$$

where $P(f)$ is the prior probability that the optimization problem at hand has cost function $f$. We can view the right-hand side of this equality as an inner product in $F$:

Theorem: Define the $F$-space vectors $\vec{v}_{c,a,m}$ and $\vec{p}$ by $\vec{v}_{c,a,m}(f) \equiv P(\vec{c} \mid m, a, f)$ and $\vec{p}(f) \equiv P(f)$. Then

$$P(\vec{c} \mid m, a) = \vec{v}_{c,a,m} \cdot \vec{p}. \qquad (4)$$

This is an important equation. Any global knowledge you have about the properties of your cost function goes into the prior, $\vec{p}$, over cost functions. $\vec{c}$ can be viewed as fixed to the histogram you want to obtain (usually one with a low cost value), and $m$ is given by the constraints on the time we have to run our optimization algorithm. Thus the optimal algorithm is that which has the largest projection onto $\vec{p}$. Alternatively, we can dispense
with $\vec{c}$ by averaging over it, to see that $E(\vec{c} \mid m, a)$ is an inner product between $\vec{p}(f)$ and $E(\vec{c} \mid m, a, f)$. (Similarly for any "performance measure" $\Phi(\vec{c})$.) In either case, we see that $P(f)$ must "match" $a$.

Of course, exploiting this in practice is a difficult exercise. Even writing down a reasonable $P(f)$ can be difficult. Consider, for example, doing TSP problems with $N$ cities. So we're only considering cost functions that correspond to such a problem. Now to the degree that any practitioner would attack all $N$-city TSP cost functions with the same algorithm, that practitioner implicitly ignores distinctions between such cost functions. In this, that practitioner has implicitly agreed that the problem is one of how their fixed algorithm does across the set of $N$-city TSP cost functions, rather than of how well their algorithm does for some particular $N$-city TSP problem they have at hand. In other words, they are acting as though the cost function were not fixed, but is instead described by a $P(f)$ that equals 0 for all cost functions other than $N$-city TSP cost functions. However the details of $P(f)$, beyond the fact that it is restricted to $N$-city TSP problems, may be very difficult to disentangle.

Taking the geometric view, the no free lunch result that $\sum_f P(\vec{c} \mid f, m, a)$ is independent of $a$ has the simple interpretation that for a particular $\vec{c}$ and $m$, all algorithms $a$ have the same projection onto the diagonal, that is $\vec{v}_{c,a,m} \cdot \vec{1} = \mathrm{cst}(\vec{c}, m)$. For deterministic algorithms the components of $\vec{v}_{c,a,m}$ (i.e., the probabilities that algorithm $a$ gives histogram $\vec{c}$ on cost function $f$ after $m$ distinct cost evaluations) are all either 0 or 1 so the no free lunch result also implies $\sum_f P^2(\vec{c} \mid m, a, f) = \mathrm{cst}(\vec{c}, m)$. Geometrically, this means that the length of $\vec{v}_{c,a,m}$ is independent of $a$. Thus all vectors $\vec{v}_{c,a,m}$ have the same length and lie on a cone with constant projection onto $\vec{1}$. Because the components of $\vec{v}_{c,a,m}$ are binary we might also view $\vec{v}_{c,a,m}$ as lying on the subset of the Boolean hypercube having the same Hamming distance from $\vec{0}$.

Now restrict attention to the set of algorithms that have the same probability of some particular $\vec{c}$. The algorithms in this set must lie in the intersection of 2 cones, one about the diagonal, set by the no-free-lunch theorem, and one by having the same probability for $\vec{c}$. This is in general an $|F| - 2$ dimensional manifold (where we recall that $|F| \equiv |Y|^{|X|}$ is the number of possible cost functions). If we require equality of probability on yet more $\vec{c}$, we get yet more constraints.

In Section 5 we calculate two quantities concerning the distribution of $\vec{v}_{c,a,m}$ across vertices of this hypercube.

4 Minimax distinctions between algorithms

The NFL theorem does not address minimax properties of search. For example, say we're considering two deterministic algorithms, $a_1$ and $a_2$. It may very well be that there exist cost functions $f$ such that $a_1$'s histogram is much better (according to some appropriate quality measure) than $a_2$'s, but no cost functions for which the reverse is true. For the NFL theorem to be obeyed in such a scenario, it would have to be true that there are many more $f$ for which $a_2$'s histogram is better than $a_1$'s than vice-versa, but it is only slightly better for all those $f$. For such a scenario, in a certain sense $a_1$ has better "head-to-head"
minimax behavior than $a_2$; there are $f$ for which $a_1$ beats $a_2$ badly, but none for which $a_1$ does substantially worse than $a_2$.

Formally, we say that there exist head-to-head minimax distinctions between two algorithms $a_1$ and $a_2$ iff there exists a $k$ such that for at least one $f$, $E(\vec{c} \mid f, m, a_1) - E(\vec{c} \mid f, m, a_2) = k$, but there is no $f$ such that $E(\vec{c} \mid f, m, a_2) - E(\vec{c} \mid f, m, a_1) = k$. (A similar definition can be used if one is instead interested in $\Phi(\vec{c})$ or $d^y_m$ rather than $\vec{c}$.)

It appears that analyzing head-to-head minimax properties of algorithms is substantially more difficult than analyzing average behavior (like in the NFL theorem). Presently, very little is known about minimax behavior involving stochastic algorithms. In particular, it is not known if in some sense a stochastic version of a deterministic algorithm has better/worse minimax behavior than that deterministic algorithm. In fact, even if we stick completely to deterministic algorithms, only an extremely preliminary understanding of minimax issues has been reached.

What we do know is the following. Consider the quantity

$$\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2),$$

for deterministic algorithms $a_1$ and $a_2$ (by $P_A(a)$ is meant the distribution of a random variable $A$ evaluated at $A = a$). For deterministic algorithms, this quantity is just the number of $f$ such that it is both true that $a_1$ produces a population with $Y$ components $z$ and that $a_2$ produces a population with $Y$ components $z'$.

In appendix B, it is proven by example that this quantity need not be symmetric under interchange of $z$ and $z'$:

Theorem: In general,

$$\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2) \neq \sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z', z \mid f, m, a_1, a_2). \qquad (5)$$

This means that under certain circumstances, even knowing only the $Y$ components of the populations produced by two algorithms run on the same (unknown) $f$, we can infer something concerning what algorithm produced each population.

Now consider the quantity

$$\sum_f P_{C_1, C_2}(z, z' \mid f, m, a_1, a_2),$$

again for deterministic algorithms $a_1$ and $a_2$. This quantity is just the number of $f$ such that it is both true that $a_1$ produces a histogram $z$ and that $a_2$ produces a histogram $z'$. It too need not be symmetric under interchange of $z$ and $z'$ (see appendix B). This is a stronger statement than the asymmetry of $d^y$'s statement, since any particular histogram corresponds to multiple populations.

It would seem that neither of these two results directly implies that there are algorithms $a_1$ and $a_2$ such that for some $f$ $a_1$'s histogram is much better than $a_2$'s, but for no $f$'s is the reverse true. To investigate this problem involves looking over all pairs of histograms (one
for each $f$) such that there is the same relative "quality" between both histograms. Simply having an inequality between the sums presented above does not seem to directly imply that the relative quality between the associated pair of histograms is asymmetric. (To formally establish this would involve creating scenarios in which there is an inequality between the sums, but no head-to-head minimax distinctions. Such an analysis is beyond the scope of this paper.)

On the other hand, having the sums equal does carry obvious implications for whether there are head-to-head minimax distinctions. For example, if both algorithms are deterministic, then for any particular $f$, $P_{d^y_{m,1},\, d^y_{m,2}}(z_1, z_2 \mid f, m, a_1, a_2)$ equals 1 for one $(z_1, z_2)$ pair, and 0 for all others. In such a case, $\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z_1, z_2 \mid f, m, a_1, a_2)$ is just the number of $f$ that result in the pair $(z_1, z_2)$. So $\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2) = \sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z', z \mid f, m, a_1, a_2)$ implies that there are no head-to-head minimax distinctions between $a_1$ and $a_2$. The converse does not appear to hold however.[1]

[Footnote 1: Consider the grid of all $(z, z')$ pairs. Assign to each grid point the number of $f$ that result in that grid point's $(z, z')$ pair. Then our constraints are i) by the hypothesis that there are no head-to-head minimax distinctions, if grid point $(z_1, z_2)$ is assigned a non-zero number, then so is $(z_2, z_1)$; and ii) by the no-free-lunch theorem, the sum of all numbers in row $z$ equals the sum of all numbers in column $z$. These two constraints do not appear to imply that the distribution of numbers is symmetric under interchange of rows and columns. Although again, like before, to formally establish this point would involve explicitly creating search scenarios in which it holds.]

As a preliminary analysis of whether there can be head-to-head minimax distinctions, we can exploit the result in appendix B, which concerns the case where $|X| = |Y| = 3$. First, define the following measure of the "quality" over two-element populations, $Q(d^y_2)$:

i) $Q(y_2, y_3) = Q(y_3, y_2) = 2$.
ii) $Q(y_1, y_2) = Q(y_2, y_1) = 0$.
iii) $Q$ of any other argument $= 1$.

In appendix B we show that for this scenario there exist pairs of algorithms $a_1$ and $a_2$ such that for one $f$ $a_1$ generates the histogram $\{y_1, y_2\}$ and $a_2$ generates the histogram $\{y_2, y_3\}$, but there is no $f$ for which the reverse occurs (i.e., there is no $f$ such that $a_1$ generates the histogram $\{y_2, y_3\}$ and $a_2$ generates $\{y_1, y_2\}$).

So in this scenario, with our defined measure of "quality", there are minimax distinctions between $a_1$ and $a_2$. For one $f$ the qualities of algorithms $a_1$ and $a_2$ are respectively 0 and 2. The difference in the $Q$ values for the two algorithms is 2 for that $f$. However there are no other $f$ for which the difference is $-2$. For this $Q$ then, algorithm $a_2$ is minimax superior to algorithm $a_1$.

It is not currently known what restrictions on $Q(d^y_m)$ are needed for there to be minimax distinctions between the algorithms. As an example, it may well be that for $Q(d^y_m) = \max_i \{d^y_m(i)\}$ there are no minimax distinctions between algorithms.

More generally, at present nothing is known about "how big a problem" these kinds of asymmetries are. All of the examples of the asymmetries arise when the set of $X$ values $a_1$

has visited overlaps with those that $a_2$ has visited. Given such overlap, and certain properties of how the algorithms generated the overlap, asymmetry arises. A precise specification of those "certain properties" is not yet in hand. Nor is it known how generic they are, i.e., for what percentage of pairs of algorithms they arise. Although such issues are easy to state (see appendix B), it is not at all clear how best to answer them.

However consider the case where we are assured that in $m$ steps two particular algorithms do not overlap. Such assurances hold, for example, if we are comparing two hill-climbing algorithms that start far apart (on the scale of $m$) in $X$. It turns out that given such assurances, there are no asymmetries between the two algorithms for $m$-element populations. To see this formally, go through the argument used to prove the NFL theorem, but apply those arguments to the quantity $\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2)$ rather than $P(\vec{c} \mid f, m, a)$. Doing this establishes the following:

Theorem: If there is no overlap between $d^x_{m,1}$ and $d^x_{m,2}$, then

$$\sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2) = \sum_f P_{d^y_{m,1},\, d^y_{m,2}}(z', z \mid f, m, a_1, a_2). \qquad (6)$$

An immediate consequence of this theorem is that under the no-overlap conditions, $\sum_f P_{C_1, C_2}(z, z' \mid f, m, a_1, a_2)$ is symmetric under interchange of $z$ and $z'$, as are all distributions determined from this one over $C_1$ and $C_2$ (e.g., the distribution over the difference between those $C$'s extrema).

Note that with stochastic algorithms, if they give non-zero probability to all $d^x_m$, there is always overlap to consider. So there is always the possibility of asymmetry between algorithms if one of them is stochastic.

5 Information theoretic aspects of search

We first calculate the fraction of cost functions which give rise to a specific histogram $\vec{c}$ using algorithm $a$ with $m$ distinct cost points. This calculation allows us, for example, to answer the following question:

"What fraction of cost functions will give a particular distribution of cost values after $m$ distinct cost evaluations chosen by using a genetic algorithm?"

This may seem an intractable question, but the NFL result allows us to answer it. It does this because it means that the fraction is independent of the algorithm! So we can answer the question by using an algorithm for which the calculation is particularly easy.

The algorithm we will use is one which visits points in $X$ in some canonical order, say $x_1, x_2, \ldots, x_m$. Recall that the histogram $\vec{c}$ is specified by giving the frequencies of occurrence, across the $x_1, x_2, \ldots, x_m$, for each of the $|Y|$ possible cost values. Now the number of $f$'s giving the desired histogram under our specified algorithm is just the multinomial giving the number of ways of distributing the cost values in $\vec{c}$. At the remaining $|X| - m$ points in $X$ the cost can assume any of the $|Y|$ $f$ values.
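The counting argument above can be checked by brute force on a tiny search space. The following is a minimal sketch, not the authors' code: the sizes `X`, `Y`, `m` and the rule `adaptive` are hypothetical choices made only for illustration. It enumerates every cost function, records the histogram each algorithm produces, and compares the tallies with the multinomial count times $|Y|^{|X|-m}$.

```python
from collections import Counter
from itertools import product
from math import factorial


def canonical_order(history, X):
    """Visit the points of X in a fixed order, ignoring the observed costs."""
    return X[len(history)]


def adaptive(history, X):
    """Hypothetical adaptive rule: after seeing cost 0 move to the smallest
    unvisited point, otherwise (including at the start) to the largest."""
    visited = {x for x, _ in history}
    unvisited = [x for x in X if x not in visited]
    if history and history[-1][1] == 0:
        return unvisited[0]
    return unvisited[-1]


def histogram(alg, f, X, Y, m):
    """Run alg for m distinct evaluations of f and return the cost histogram."""
    history = []
    for _ in range(m):
        x = alg(history, X)
        history.append((x, f[x]))
    seen = Counter(y for _, y in history)
    return tuple(seen.get(y, 0) for y in Y)


def count_cost_functions(alg, X, Y, m):
    """Number of cost functions f that yield each histogram under alg."""
    table = Counter()
    for values in product(Y, repeat=len(X)):
        table[histogram(alg, dict(zip(X, values)), X, Y, m)] += 1
    return table


def predicted_count(c, n_x, n_y):
    """Multinomial(m; c) ways to order the costs, times |Y|^(|X|-m) free points."""
    m = sum(c)
    ways = factorial(m)
    for ci in c:
        ways //= factorial(ci)
    return ways * n_y ** (n_x - m)


X, Y, m = [0, 1, 2, 3], [0, 1, 2], 3
counts_canonical = count_cost_functions(canonical_order, X, Y, m)
counts_adaptive = count_cost_functions(adaptive, X, Y, m)
```

Both algorithms give the same tally for every histogram, which is the algorithm-independence that the NFL result is used for here.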

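It will be convenient to define $\vec{\alpha} \equiv \frac{1}{m}\vec{c}$. Note that this is invariant if the contents of all bins in $\vec{c}$ are scaled by the same amount. By the argument of the preceding paragraph, the fraction we are interested in, $\rho_f(\vec{\alpha})$, is given by the following:

Theorem: For any algorithm, the fraction of cost functions that result in the histogram $\vec{c} = m\vec{\alpha}$ is given by

$$\rho_f(\vec{\alpha}) = \frac{\binom{m}{c_1\, c_2 \cdots c_{|Y|}}\, |Y|^{|X|-m}}{|Y|^{|X|}} = \frac{\binom{m}{c_1\, c_2 \cdots c_{|Y|}}}{|Y|^{m}}. \qquad (7)$$

Accordingly, $\rho_f(\vec{\alpha})$ can be related to the entropy of $\vec{c}$ in the standard way by using Stirling's approximation to order $O(1/m)$, which is valid when all of the $c_i$ are large. Up to an additive constant that does not depend on $\vec{\alpha}$,

$$\ln \binom{m}{c_1\, c_2 \cdots c_{|Y|}} = m \ln m - \sum_{i=1}^{|Y|} c_i \ln c_i + \frac{1}{2}\Big[\ln m - \sum_{i=1}^{|Y|} \ln c_i\Big] = m S(\vec{\alpha}) + \frac{1}{2}\Big[(1 - |Y|) \ln m - \sum_{i=1}^{|Y|} \ln \alpha_i\Big],$$

where $S(\vec{\alpha}) = -\sum_{i=1}^{|Y|} \alpha_i \ln \alpha_i$ is the entropy of the histogram $\vec{c}$. Thus for large enough $m$, the fraction of cost functions is given by the following formula:

Corollary:

$$\rho_f(\vec{\alpha}) = C(m, |Y|)\, \frac{e^{m S(\vec{\alpha})}}{\prod_{i=1}^{|Y|} \alpha_i^{1/2}}, \qquad (8)$$

where $C(m, |Y|)$ is a constant depending only on $m$ and $|Y|$.

If some of the $\alpha_i$ are 0, Eq. (8) still holds, only with $Y$ redefined to exclude the $y$'s corresponding to the zero-valued $\alpha_i$. However $Y$ is defined, the normalization constant of Eq. (8) can be found by summing over all $\vec{\alpha}$ lying on the unit simplex. The details of such a calculation can be found in [15].

We next turn to a related question:

"On a given vertex of $F$-space (i.e., for a given cost function), what is the fraction of all algorithms that give rise to a particular $\vec{c}$?"

For this question, the only salient feature of $f$ is its histogram (formed by looking across all $X$) of cost values. Specify this histogram by $\vec{\beta}$; there are $N_i = \beta_i |X|$ points in $X$ for which $f(x)$ has the $i$'th $Y$ value.

Call the fraction we are interested in $\rho_{alg}(\vec{\alpha}, \vec{\beta})$. It turns out that $\rho_{alg}(\vec{\alpha}, \vec{\beta})$ depends to leading order on the Kullback-Leibler "distance" [3] between $\vec{\alpha}$ and $\vec{\beta}$. To see this, we start with the following intuitively reasonable result, formally proven in appendix A: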

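Theorem: For a given $f$ with histogram $\vec{N} = |X|\vec{\beta}$, the fraction of algorithms that give rise to a histogram $\vec{c} = m\vec{\alpha}$ is given by

$$\rho_{alg}(\vec{\alpha}, \vec{\beta}) = \frac{\prod_{i=1}^{|Y|} \binom{N_i}{c_i}}{\binom{|X|}{m}}. \qquad (9)$$

The normalization factor in the denominator is simply the number of ways of selecting $m$ cost values from $X$.[2]

[Footnote 2: It can also be determined from the identity $\sum_{\vec{c}} \delta(\sum_i c_i, m) \prod_i \binom{N_i}{c_i} = \binom{\sum_i N_i}{m}$.]

The product of binomials can be approximated via Stirling's equation when both $N_i$ and $c_i$ are large:

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = \sum_{i=1}^{|Y|} \Big[ -\frac{1}{2}\ln 2\pi + N_i \ln N_i - c_i \ln c_i - (N_i - c_i)\ln(N_i - c_i) + \frac{1}{2}\big(\ln N_i - \ln(N_i - c_i) - \ln c_i\big) \Big].$$

We assume $c_i / N_i \ll 1$, which is reasonable when $m \ll |X|$. So using the expansion $\ln(1 - z) = -z - z^2/2 - \ldots$, to second order in $c_i / N_i$ we have

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = \sum_{i=1}^{|Y|} \Big[ c_i \ln\frac{N_i}{c_i} - \frac{1}{2}\ln c_i + c_i - \frac{1}{2}\ln 2\pi - \frac{c_i}{2N_i}\big(c_i - 1 + \cdots\big) \Big].$$

In terms of $\vec{\alpha}$ and $\vec{\beta}$ we finally obtain (using $m/|X| \ll 1$)

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = -m D_{KL}(\vec{\alpha}, \vec{\beta}) + m - m \ln\frac{m}{|X|} - \frac{|Y|}{2}\ln 2\pi - \sum_{i=1}^{|Y|} \frac{1}{2}\ln(\alpha_i m) + \frac{m}{2|X|} \sum_{i=1}^{|Y|} \frac{\alpha_i}{\beta_i}\big(1 - \alpha_i m + \cdots\big),$$

where $D_{KL}(\vec{\alpha}, \vec{\beta}) \equiv \sum_i \alpha_i \ln(\alpha_i / \beta_i)$ is the Kullback-Leibler distance between the distributions $\vec{\alpha}$ and $\vec{\beta}$.

Thus the fraction of algorithms is given by the following:

Corollary:

$$\rho_{alg}(\vec{\alpha}, \vec{\beta}) = C(m, |X|, |Y|)\, \frac{e^{-m D_{KL}(\vec{\alpha}, \vec{\beta})}}{\prod_{i=1}^{|Y|} \alpha_i^{1/2}}, \qquad (10)$$

where the constant $C$ depends only on $m$, $|X|$, and $|Y|$.

As before, $C$ can be calculated by summing $\vec{\alpha}$ over the unit simplex.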
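Eq. (9) and the normalization identity in the footnote can be checked exactly on a small example. The sketch below is illustrative only: the cost function `f` is made up, and it assumes that counting algorithms is equivalent to counting the $m$-element subsets of $X$ that they visit, which is how the text describes the denominator.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product
from math import comb, prod


def rho_alg(c, N):
    """Eq. (9): product of binomials C(N_i, c_i) divided by C(|X|, m)."""
    return Fraction(prod(comb(Ni, ci) for Ni, ci in zip(N, c)),
                    comb(sum(N), sum(c)))


def subset_fractions(f, Y, m):
    """Fraction of m-element subsets of X whose costs form each histogram."""
    tally = Counter()
    for subset in combinations(list(f), m):
        seen = Counter(f[x] for x in subset)
        tally[tuple(seen.get(y, 0) for y in Y)] += 1
    total = comb(len(f), m)
    return {c: Fraction(n, total) for c, n in tally.items()}


def all_histograms(N, m):
    """Every histogram c with sum(c) == m and c_i <= N_i."""
    for c in product(*(range(min(m, Ni) + 1) for Ni in N)):
        if sum(c) == m:
            yield c


Y = [0, 1, 2]
f = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}   # hypothetical cost function
N = [sum(1 for v in f.values() if v == y) for y in Y]
m = 3
observed = subset_fractions(f, Y, m)
normalization = sum(rho_alg(c, N) for c in all_histograms(N, m))
```

Using `Fraction` keeps the comparison exact, so agreement between `observed` and `rho_alg` is not an artifact of rounding.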

6 Measures of algorithm performance

In this section we calculate certain "benchmark" performance measures that allow us to assess the efficacy of any search algorithm.

Consider the case where low cost is preferable to high cost. Then in general we are interested in $P(\min(\vec{c}) > \epsilon \mid f, m, a)$, which is the probability that the minimum cost an algorithm $a$ finds in $m$ distinct evaluations is larger than $\epsilon$, given that the cost function is $f$. We consider three quantities that are related to this conditional probability that can be used to gauge an algorithm's performance:

i) The first quantity is the average of this probability over all cost functions.
ii) The second is the form this conditional probability takes for the random algorithm, whose behavior is uncorrelated with the cost function.
iii) The third is the fraction of algorithms which, for a particular $f$ and $m$, result in a $\vec{c}$ whose minimum exceeds $\epsilon$.

These measures give us benchmarks which all truly "intelligent" algorithms should surpass when used in the real world; any algorithm that doesn't surpass them is doing a very poor job.

Recall that there are $|Y|$ distinct cost values. With no loss of generality assume the $i$'th cost value equals $i$. So cost values run from a minimum of 1 to a maximum of $|Y|$ in integer increments.

The first of our benchmark measures is

$$\frac{\sum_f P(\min(\vec{c}) > \epsilon \mid f, m, a)}{\sum_f 1} = \frac{\sum_{d^y_m, f} P(\min(d^y_m) > \epsilon \mid d^y_m)\, P(d^y_m \mid f, m, a)}{|Y|^{|X|}}, \qquad (11)$$

where in the last line we have marginalized over $Y$ values of populations of size $m$ and noted that $\min(\vec{c}) = \min(d^y_m)$.

Now consider $\sum_f P(d^y_m \mid f, m, a)$. The summand equals 0 or 1 for all $f$ and deterministic $a$. In particular, it equals 1 if the following conditions are met:

i) $f(d^x_m(1)) = d^y_m(1)$
ii) $f(a[d_m(1)]) = d^y_m(2)$
iii) $f(a[d_m(1), d_m(2)]) = d^y_m(3)$
...

These restrictions will always fix the value of $f(x)$ at exactly $m$ points. $f$ is completely free at all other points. Therefore

$$\sum_f P(d^y_m \mid f, m, a) = |Y|^{|X|-m}.$$
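The count $|Y|^{|X|-m}$ and the first benchmark of Eq. (11) can be evaluated by exhaustive enumeration for a small case. This is a sketch under stated assumptions: `adaptive_rule` is a hypothetical deterministic, non-retracing algorithm, and the sizes are chosen only so that all $|Y|^{|X|}$ cost functions can be listed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def adaptive_rule(dx, dy, n_x):
    """Hypothetical deterministic rule: start at x = 0, then step by 1 after
    seeing the lowest cost (1) and by 2 otherwise, skipping visited points."""
    if not dx:
        return 0
    x = (dx[-1] + (1 if dy[-1] == 1 else 2)) % n_x
    while x in dx:
        x = (x + 1) % n_x
    return x


def population_costs(alg, f, m):
    """Ordered cost population d^y_m produced by alg on cost function f."""
    dx, dy = [], []
    for _ in range(m):
        x = alg(dx, dy, len(f))
        dx.append(x)
        dy.append(f[x])
    return tuple(dy)


n_x, n_y, m = 4, 3, 2
# counts[d^y] = number of cost functions f (costs 1..n_y) that produce d^y
counts = Counter(population_costs(adaptive_rule, f, m)
                 for f in product(range(1, n_y + 1), repeat=n_x))


def benchmark(eps):
    """Left-hand side of Eq. (11): average over f of P(min(c) > eps | f, m, a)."""
    hits = sum(n for dy, n in counts.items() if min(dy) > eps)
    return Fraction(hits, n_y ** n_x)
```

Every cost population is produced by the same number of cost functions, so the benchmark depends only on how many populations have a minimum above `eps`, not on the rule used.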

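Using this result in Eq. (11) we find

$$\frac{1}{|Y|^{|X|}} \sum_f P(\min(\vec{c}) > \epsilon \mid f, m) = \frac{1}{|Y|^{m}} \sum_{d^y_m} P(\min(d^y_m) > \epsilon \mid d^y_m) = \frac{1}{|Y|^{m}} \sum_{d^y_m \,\ni\, \min(d^y_m) > \epsilon} 1 = \frac{1}{|Y|^{m}} (|Y| - \epsilon)^m.$$

This establishes the following:

Theorem:

$$\frac{1}{|Y|^{|X|}} \sum_f P(\min(\vec{c}) > \epsilon \mid f, m) = \omega^m(\epsilon), \qquad (12)$$

where $\omega(\epsilon) \equiv 1 - \epsilon/|Y|$ is the fraction of cost lying above $\epsilon$.

An immediate corollary is the following:

Corollary: In the limit of $|Y| \to \infty$,

$$\frac{1}{|Y|^{|X|}} \sum_f \frac{E(\min(\vec{c}) \mid f, m)}{|Y|} = \frac{1}{m+1}. \qquad (13)$$

Proof sketch: Write the average over $f$ of $E(\min(\vec{c}) \mid f, m)$ as $\sum_{\epsilon=1}^{|Y|} \epsilon\, [\omega^m(\epsilon - 1) - \omega^m(\epsilon)]$ and substitute in for $\omega(\cdot)$. Then replace $\epsilon$ throughout with $\epsilon + 1$. This turns our sum into $\sum_{\epsilon=0}^{|Y|-1} [\epsilon + 1]\,[(1 - \frac{\epsilon}{|Y|})^m - (1 - \frac{\epsilon+1}{|Y|})^m]$. Next, write $|Y| = b/\Delta$ for some $b$. Multiply and divide our summand by $\Delta$. To take the limit of $\Delta \to 0$, apply L'Hopital's rule to the ratio in the summand. Next use the fact that $\Delta$ is going to 0 to cancel terms in the summand. Carrying through the algebra, and dividing by $b/\Delta$, we get a Riemann sum of the form $\frac{m}{b^2} \int_0^b dx\, x\, (1 - x/b)^{m-1}$. Evaluating the integral gives the result claimed. QED.

In a real world scenario, unless one's algorithm has its best-cost-so-far drop faster than the drop associated with these results, one might argue that that algorithm is not searching very well. After all, the algorithm is doing no better than one would expect it to for a randomly chosen cost function. (Benchmarks that take account of the actual cost function at hand are presented below.)

Next we calculate the expected minimum of the cost values in the population as a function of $m$ for the random algorithm, $\tilde{a}$, which picks points in $X$ completely randomly, using no information from the current population. Marginalizing over histograms $\vec{c}$, the performance of $\tilde{a}$ is

$$P(\min(\vec{c}) \geq \epsilon \mid f, m, \tilde{a}) = \sum_{\vec{c}} P(\min(\vec{c}) \geq \epsilon \mid \vec{c})\, P(\vec{c} \mid f, m, \tilde{a}).$$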
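The marginalization over histograms for the random algorithm can be carried out exactly on a small example. The sketch below assumes that $m$ random draws without replacement produce histogram $\vec{c}$ with probability $\prod_i \binom{N_i}{c_i} / \binom{|X|}{m}$, the expression of Eq. (9); the counts in `N` are hypothetical. It compares the marginalized sum with a direct count of the $m$-subsets drawn entirely from points whose cost is at least $\epsilon$.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod


def p_histogram(c, N):
    """Probability of histogram c in sum(c) draws without replacement,
    where N[i] points of X carry cost value i + 1."""
    return Fraction(prod(comb(Ni, ci) for Ni, ci in zip(N, c)),
                    comb(sum(N), sum(c)))


def p_min_at_least(eps, N, m):
    """Sum over histograms c of P(min(c) >= eps | c) * P(c | f, m, random)."""
    total = Fraction(0)
    for c in product(*(range(min(m, Ni) + 1) for Ni in N)):
        if sum(c) != m:
            continue
        lowest_cost = min(i + 1 for i, ci in enumerate(c) if ci > 0)
        if lowest_cost >= eps:
            total += p_histogram(c, N)
    return total


def p_min_direct(eps, N, m):
    """Direct count: all m sampled points must have cost >= eps."""
    n_at_least = sum(N[eps - 1:])
    return Fraction(comb(n_at_least, m), comb(sum(N), m))


N = [3, 4, 2, 1]   # hypothetical cost-value counts: |X| = 10, |Y| = 4
m = 3
table = {eps: p_min_at_least(eps, N, m) for eps in range(1, len(N) + 1)}
```

The agreement of the two routes is an instance of the binomial-sum identity quoted in the footnote on normalization.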

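Now $P(\vec{c} \mid f, m, \tilde{a})$ is the probability of obtaining histogram $\vec{c}$ in $m$ random draws from the histogram $\vec{N}$ of the function $f$. (This can be viewed as the definition of $\tilde{a}$.) This probability has been calculated previously as $\prod_{i=1}^{|Y|} \binom{N_i}{c_i} / \binom{|X|}{m}$. So

$$P(\min(\vec{c}) \geq \epsilon \mid f, m, \tilde{a}) = \frac{1}{\binom{|X|}{m}} \sum_{c_1=0}^{m} \cdots \sum_{c_{|Y|}=0}^{m} \delta\Big(\sum_{i=1}^{|Y|} c_i, m\Big)\, P(\min(\vec{c}) \geq \epsilon \mid \vec{c}) \prod_{i=1}^{|Y|} \binom{N_i}{c_i}$$

$$= \frac{1}{\binom{|X|}{m}} \sum_{c_\epsilon=0}^{m} \cdots \sum_{c_{|Y|}=0}^{m} \delta\Big(\sum_{i=\epsilon}^{|Y|} c_i, m\Big) \prod_{i=\epsilon}^{|Y|} \binom{N_i}{c_i}$$

$$= \frac{\binom{\sum_{i=\epsilon}^{|Y|} N_i}{m}}{\binom{|X|}{m}} \quad \text{(see footnote 2)}$$

$$= \frac{\binom{\Omega(\epsilon)|X|}{m}}{\binom{|X|}{m}}. \qquad (14)$$

This establishes the following:

Theorem: For the random algorithm $\tilde{a}$,

$$P(\min(\vec{c}) \geq \epsilon \mid f, m, \tilde{a}) = \prod_{i=0}^{m-1} \frac{\Omega(\epsilon) - i/|X|}{1 - i/|X|}, \qquad (15)$$

where $\Omega(\epsilon) \equiv \sum_{i=\epsilon}^{|Y|} N_i / |X|$ is the fraction of points in $X$ for which $f(x) \geq \epsilon$.

To first order in $1/|X|$ this theorem gives the following result:

Corollary:

$$P(\min(\vec{c}) \geq \epsilon \mid f, m, \tilde{a}) = \Omega^m(\epsilon) \Big[ 1 - \frac{m(m-1)(1 - \Omega(\epsilon))}{2\,\Omega(\epsilon)\,|X|} + \ldots \Big]. \qquad (16)$$

Note that these results allow us to calculate other quantities of interest, like

$$E(\min(\vec{c}) \mid f, m, \tilde{a}) = \sum_{\epsilon=1}^{|Y|} \epsilon\, \big[ P(\min(\vec{c}) \geq \epsilon \mid f, m, \tilde{a}) - P(\min(\vec{c}) \geq \epsilon + 1 \mid f, m, \tilde{a}) \big].$$

These results also provide a useful benchmark against which any algorithm may be compared. Note in particular that for many cost functions cost values are distributed Gaussianly. For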

such a case, if the mean and variance of the Gaussian are $\mu$ and $\sigma^2$ respectively, then $\Omega(\epsilon) = \mathrm{erfc}\big((\epsilon - \mu)/\sqrt{2}\sigma\big)/2$, where erfc is the complementary error function.

To calculate the third performance measure, note that for fixed $f$ and $m$, for any (deterministic) algorithm $a$, $P(\min(\vec{c}) > \epsilon \mid f, m, a)$ is either 1 or 0. Therefore the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by

$$\frac{\sum_a P(\min(\vec{c}) > \epsilon \mid f, m, a)}{\sum_a 1}.$$

Expanding in terms of $\vec{c}$, we can rewrite the numerator of this ratio as $\sum_{\vec{c}} P(\min(\vec{c}) > \epsilon \mid \vec{c}) \sum_a P(\vec{c} \mid f, m, a)$. However the ratio of this quantity to $\sum_a 1$ is exactly what we calculated when we evaluated measure ii) (see the beginning of the argument deriving Eq. (15)). This establishes the following:

Theorem: For fixed $f$ and $m$, the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by the quantity on the right-hand sides of Eqs. (15) and (16).

So in particular, consider the scenario where, when evaluated for $\epsilon$ equal to the minimum of the $\vec{c}$ produced in a particular run of your algorithm, the quantity given in Eq. (16) is less than 1/2. For such a scenario, your algorithm has done worse than over half of all search algorithms, for the $f$ and $m$ at hand.

Finally, we present a measure explicitly designed to "track" an algorithm's performance as $m$ increases. Here we are interested in whether, as $m$ grows, there is any change in how well the algorithm's performance compares to that of the random algorithm.

Say the population generated by the algorithm $a$ after $m$ steps is $d$, and define $y' \equiv \min(\vec{c}(d))$. Let $k$ be the number of additional steps it takes the algorithm to find an $x$ such that $f(x) < y'$. Now we can estimate the number of steps it would have taken the random search algorithm to search $X - d_x$ and find a point whose $y$ was less than $y'$. The expected value of this number of steps is $1/z(d) - 1$, where $z(d)$ is the fraction of $X - d_x$ for which $f(x) < y'$. Therefore $k + 1 - 1/z(d)$ is how much worse $a$ did than would have the random algorithm, on average.

So now imagine letting $a$ run for many steps over some fitness function $f$. We wish to make a plot of how well $a$ did in comparison to the random algorithm on that run, as $m$ increased. Consider the step where $a$ finds its $n$'th new value of $\min(\vec{c})$. For that step, there is an associated $k$ (the number of steps until the next $\min(\vec{c})$) and $z(d)$. Accordingly, indicate that step on our plot as the point $(n, k + 1 - 1/z(d))$. Put down as many points on our plot as there are successive values of $\min(\vec{c}(d))$ in the run of $a$ over $f$.

If throughout the run $a$ is always a better "match" to $f$ than is the random search algorithm, then all the points in the plot will have their ordinate values lie below 0. If the random algorithm won for any of the comparisons though, that would mean a point lying above 0. In general, even if the points all lie to one side of 0, one would expect that as the search progresses there is corresponding (perhaps systematic) variation in how far away

from 0 the points lie. That variation tells one when the algorithm is entering harder or easier parts of the search.

Note that even for a fixed $f$, by using different starting points for the algorithm one could generate many of these plots and then superimpose them. This would allow you to plot the mean value of $k + 1 - 1/z(d)$ as a function of $n$ along with an associated error bar. (Similarly, one could replace the single number $z(d)$ characterizing the random algorithm with a full distribution over the number of required steps to find a new minimum.)

7 Time-dependent cost functions

Here we establish a set of no free lunch results for a certain class of time-dependent cost functions. The time-dependent functions we are concerned with start with an initial cost function that is present when we sample the first $X$ value. Then just before the beginning of each subsequent iteration of the search algorithm, the cost function is deformed to a new function, as specified by the mapping $T : F \times \mathcal{N} \to F$.[3] We write the function present during the sampling of the $(i+1)$'th point as $f_{i+1} = T_i(f_i)$. We assume that at each step $i$, $T_i$ is a bijection between $F$ and $F$. (Note the mapping induced by $T$ from $F$ to $F$ can vary with the iteration number.) If this weren't the case, the evolution of cost functions could narrow in on a region of $f$'s for which some algorithm, "by luck" as it were, happens to sample $x$ values that lie near the extremizing $x$.

[Footnote 3: An obvious restriction would be to require that $T$ doesn't vary with time, so that it is a mapping simply from $F$ to $F$. An analysis for $T$'s limited this way is beyond the scope of this paper however.]

One difficulty with analyzing time-dependent cost functions is how to assess the quality of the search algorithm. In general there are two histogram-based schemes, involving two different populations of $Y$ values. As before, the population $d^y_m$ is an ordered set of $Y$ values corresponding to the $X$ values in $d^x_m$. The particular $Y$ value in $d^y_m$ matching a particular $X$ value in $d^x_m$ is given by the cost function that was present when $x$ was sampled. In contrast, the population $D^y_m$ is defined to be the $Y$ values from the present cost function for each of the $X$ values in $d^x_m$. Formally if $d^x_m = \{d^x_m(1), \ldots, d^x_m(m)\}$ then we have $d^y_m = \{f_1(d^x_m(1)), \ldots, T_{m-1}(f_{m-1})(d^x_m(m))\}$. Similarly, we have $D^y_m = \{T_{m-1}(f_{m-1})(d^x_m(1)), \ldots, T_{m-1}(f_{m-1})(d^x_m(m))\}$.

In some situations it may be that the members of the population "live" for a long time, on the time scale of the evolution of the cost function. In such situations it may be appropriate to judge the quality of the search algorithm with the histogram induced by $D^y_m$; all those previous elements of the population are still alive, and therefore their (current) fitness is of interest. On the other hand, if members of the population live for only a short time on the time scale of evolution of the cost function, one may instead be concerned with things like how well the living member(s) of the population track the changing cost function. In that kind of situation, it may make more sense to judge the quality of the search algorithm with the histogram induced by $d^y_m$.

Here we derive NFL results for both criteria. In analogy with the NFL theorem, we wish to average over all possible ways a cost function may be time-dependent, i.e., we wish to average over all $T$ (rather than over all $f$, as in the NFL theorem). So consider the sum

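$\sum_T P(\vec{c} \mid f_1, T, m, a)$ where $f_1$ is the initial cost function. Note first that since $T$ only kicks in for $m > 1$, and since $f_1$ is fixed, there are a priori distinctions between algorithms as far as the first member of the population is concerned. So consider only histograms constructed from those elements of the population beyond the first. We will prove the following:

Theorem: For all $\vec{c}$, $m > 1$, algorithms $a_1$ and $a_2$, and initial cost functions $f_1$,

$$\sum_T P(\vec{c} \mid f_1, T, m, a_1) = \sum_T P(\vec{c} \mid f_1, T, m, a_2). \qquad (17)$$

We will show that this result holds whether $\vec{c}$ is constructed from $d^y_m$ or from $D^y_m$. In analogy with the proof of the NFL theorem, we will do this by establishing the $a$-independence of $\sum_T P(\vec{c} \mid f_1, T, m, a)$.

We will begin by replacing each $T$ in the sum with a set of cost functions, $f_i$, one for each iteration of the algorithm. To do this, we start with the following:

$$\sum_T P(\vec{c} \mid f_1, T, m, a) = \sum_T \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m, T, m, a)\, P(f_2 \cdots f_m, d^x_m \mid f_1, T, m, a)$$

$$= \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \sum_T P(f_2 \cdots f_m \mid f_1, T, m, a),$$

where we have indicated the sequence of cost functions, $f_i$, by the vector $\vec{f} = (f_1, \cdots, f_m)$.

Next we decompose the sum over all possible $T$ into a series of sums. Each sum in the series is over the values $T$ can take for one particular iteration of the algorithm. More formally, using $f_{i+1} = T_i(f_i)$, we write

$$\sum_T P(\vec{c} \mid f_1, T, m, a) = \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \sum_{T_1} \delta(f_2, T_1(f_1)) \cdots \sum_{T_{m-1}} \delta\big(f_m, T_{m-1}(T_{m-2}(\cdots T_1(f_1)))\big).$$

(Note that $\sum_T P(\vec{c} \mid f_1, T, m, a)$ is independent of the values of $T_{i > m-1}$, so we can absorb those values into an overall $a$-independent proportionality constant.)

Now look at the innermost sum, over $T_{m-1}$, for some fixed values of the outer sum indices $T_1 \ldots T_{m-2}$. For fixed values of the outer sum indices, $T_{m-2}(T_{m-3}(\cdots T_1(f_1)))$ is just some fixed cost function. Accordingly the innermost sum over $T_{m-1}$ is simply the number of bijections of $F$ that map that fixed cost function to $f_m$. This is just a constant, $(|F| - 1)!$.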
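The constant $(|F| - 1)!$ is easy to confirm by enumeration when $F$ is tiny. In the sketch below the elements of $F$ are simply labelled $0, \ldots, n-1$ (an illustrative encoding, not the paper's), and every bijection of $F$ is represented as a permutation tuple.

```python
from itertools import permutations
from math import factorial


def bijections_mapping(n, source, target):
    """Count the bijections t of {0, ..., n-1} with t(source) == target."""
    return sum(1 for t in permutations(range(n)) if t[source] == target)


n = 4   # |F| = |Y|^|X| for the smallest nontrivial case |X| = |Y| = 2
counts = {(s, t): bijections_mapping(n, s, t)
          for s in range(n) for t in range(n)}
```

The count is the same for every (source, target) pair, which is why the innermost sum contributes only an $a$-independent constant.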

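So we can do the $T_{m-1}$ sum, and arrive at

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \sum_{T_1} \delta(f_2, T_1(f_1)) \cdots \sum_{T_{m-2}} \delta\big(f_{m-1}, T_{m-2}(T_{m-3}(\cdots T_1(f_1)))\big).$$

Now we can do the sum over $T_{m-2}$, in the exact same manner we just did the sum over $T_{m-1}$. In fact, all the sums over all $T_i$ can be done, leaving us with

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) = \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid f_1 \cdots f_{m-1}, m, a). \qquad (18)$$

(In the last step we have exploited the statistical independence of $d^x_m$ and $f_m$.)

To proceed further we must decide if we are interested in histograms formed from $D^y_m$ or $d^y_m$. We begin with analysis of the $D^y_m$ case. For this case $P(\vec{c} \mid \vec{f}, d^x_m) = P(\vec{c} \mid f_m, d^x_m)$, since $D^y_m$ only reflects cost values from the last cost function, $f_m$. Plugging this in we get

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \sum_{f_m} P(\vec{c} \mid f_m, d^x_m).$$

The final sum over $f_m$ is a constant equal to the number of ways of generating the histogram $\vec{c}$ from cost values drawn from $f_m$. This constant will involve the multinomial coefficient $\binom{m}{c_1 \cdots c_{|Y|}}$ and some other factors. The important point is that it is independent of the particular $d^x_m$. Because of this we can evaluate the sum over $d^x_m$ and thereby eliminate the $a$ dependence:

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{f_2 \cdots f_{m-1}} \sum_{d^x_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \propto 1.$$

This completes the proof of Eq. (17) for the case where $\vec{c}$ is constructed from $D^y_m$.

Next we turn to the case where we are interested not in $D^y_m$ but in $d^y_m$. This case is considerably more difficult since we cannot simplify $P(\vec{c} \mid \vec{f}, d^x_m)$ and thus cannot decouple the sums over $f_i$. Nevertheless, the NFL result still holds. To see this we begin by expanding Eq. (18) over possible $d^y_m$ values.

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} \sum_{d^y_m} P(\vec{c} \mid d^y_m)\, P(d^y_m \mid \vec{f}, d^x_m)\, P(d^x_m \mid f_1 \cdots f_{m-1}, m, a)$$

$$= \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_{i=1}^{m} \delta\big(d^y_m(i), f_i(d^x_m(i))\big). \qquad (19)$$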

The sum over the inner-most cost function, $f_m$, only has an effect on the $i = m$ term, $\delta(d^y_m(m), f_m(d^x_m(m)))$. So it contributes $\sum_{f_m} \delta(d^y_m(m), f_m(d^x_m(m)))$. This is a constant, equal to $|Y|^{|X|-1}$. We are left with

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_{i=1}^{m-1} \delta\big(d^y_m(i), f_i(d^x_m(i))\big).$$

The sum over $d^x_m(m)$ is now trivial, so we have

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m(1)} \cdots \sum_{d^x_m(m-1)} \sum_{f_2 \cdots f_{m-1}} P(d^x_{m-1} \mid f_1 \cdots f_{m-2}, m, a) \prod_{i=1}^{m-1} \delta\big(d^y_m(i), f_i(d^x_m(i))\big).$$

Now note that the above equation is of the exact same form as Eq. (19), only with a remaining population of size $m - 1$ rather than $m$. Consequently, in an exactly analogous manner to the scheme we used to evaluate the sums over $f_m$ and $d^x_m(m)$ that existed in Eq. (19), we can evaluate our sums over $f_{m-1}$ and $d^x_m(m-1)$. Doing so simply generates more $a$-independent proportionality constants. Continuing in this manner, we evaluate all the sums over the $f_i$ and arrive at

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m(1)} P(d^x_m(1) \mid m, a)\, \delta\big(d^y_m(1), f_1(d^x_m(1))\big).$$

Now there is still algorithm-dependence in this result. However it is a trivial dependence; as previously discussed, it arises completely from how the algorithm selects the first $x$ point in its population, $d^x_m(1)$. Since we consider only those points in the population that are generated subsequent to the first, our result says that there are no distinctions between algorithms. (Alternatively, we could consider all points in the population, even the first, and still get an NFL result, if in addition to summing over all $T$ we sum over all $f_1$.) So even in the case where we are interested in $d^y_m$ the NFL result still holds, subject to the minor caveats delineated above.

There are other ways of assessing the quality of the search algorithm besides histograms based on $d^y_m$ or $D^y_m$. For example, one may wish to not consider histograms at all; one may judge the quality of the search by the fitness of the most recent member of the population.

Similarly, there are other sums one could look at besides those over $T$. For example, one may wish to characterize what the aspects are of the relationship between $a$ and $T$ that determine $\sum_f P(\vec{c} \mid f, T, m, a)$. In fact, in general there can be a priori distinctions between algorithms as far as this quantity is concerned.
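One concrete case of such a distinction arises when $T$ is held fixed as a cyclic shift, $f(x) \to f(x - 1)$, and the sum runs over the initial cost function instead of over $T$. The sketch below is illustrative: the names `sweep` and `cautious`, the threshold for a "low" cost, and the sizes are all assumptions. It enumerates every initial cost function on a small $X$ and compares an algorithm that steps along with the shift against one that moves elsewhere after a high first cost.

```python
from fractions import Fraction
from itertools import product


def shift(f):
    """T: replace f(x) by f(x - 1), with min(x) - 1 identified with max(x)."""
    return f[-1:] + f[:-1]


def run(alg, f1, m):
    """Costs d^y_m seen by alg, each read from the function present when sampled."""
    f, dx, dy = f1, [], []
    for _ in range(m):
        x = alg(dx, dy, len(f1))
        dx.append(x)
        dy.append(f[x])
        f = shift(f)          # the cost function is deformed before the next step
    return dy


def sweep(dx, dy, n_x):
    """Sample x1, x1 + 1, ... regardless of the observed costs."""
    return 0 if not dx else (dx[-1] + 1) % n_x


def cautious(dx, dy, n_x):
    """Hypothetical rule: follow the shift after the lowest cost, else move two steps."""
    if not dx:
        return 0
    return (dx[-1] + (1 if dy[-1] == 0 else 2)) % n_x


def average_min(alg, n_x, n_y, m):
    """Average over all initial cost functions of the lowest cost in d^y_m."""
    functions = list(product(range(n_y), repeat=n_x))
    return Fraction(sum(min(run(alg, f1, m)) for f1 in functions), len(functions))


n_x, n_y = 4, 3
sweep_avg = average_min(sweep, n_x, n_y, 2)
cautious_avg = average_min(cautious, n_x, n_y, 2)
```

With these choices `sweep` keeps re-reading the same cost value, so its population never improves on its first sample, while `cautious` gets a second independent draw whenever the first cost is not the lowest.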

As an example of such distinctions, say that for all iterations of the search algorithm, $T$ is the shift operator, replacing $f(x)$ by $f(x - 1)$ for all $x$ (with $\min(x) - 1 \equiv \max(x)$, and with $X$ implicitly taken to be a contiguous set of integers). For this $T$, if $a$ is the algorithm that first samples $f$ at $x_1$, next at $x_1 + 1$, etc., regardless of the values in the population, then for any $f$, the histogram induced by $d^y_m$ is always made up of identical $Y$ values. Accordingly, $\sum_f P(\vec{c} \mid f, T, m, a) = 0$ for any $\vec{c}$ containing counts in more than one $Y$ value bin. For other search algorithms, even for the same shift $T$, there is not this restriction on the set of allowed $\vec{c}$. So $\sum_f P(\vec{c} \mid f, T, m, a)$ is not independent of $a$ in general.

Indeed, consider the same shift $T$, but used with a different algorithm, $\hat{a}$. This new algorithm looks at the $Y$ value of its first sample point $x_1$, and if that value is low, it samples at $x_1 + 1$, exactly like algorithm $a$. On the other hand, if that value is high, it samples some point other than $x_1 + 1$. In general, if one's goal is to find minimal $Y$ values, $\hat{a}$ can be expected to outperform $a$, even when one averages over all $f$.

8 Fixed cost function results

One obvious difficulty with the NFL results discussed above is that one can always argue "oh, well in the real world $P(f)$ is not uniform, so the NFL results do not apply, and therefore I'm okay in using my favorite search algorithm". Of course, the conclusion does not follow from the premise. Uniform $P(f)$ is a typical $P(f)$. (The uniform average of all $P(f)$ is the uniform $P(f)$.) So the actual $P(f)$ might just as easily be one for which your algorithm is poorly suited as one for which it is well suited. Simply assuming $P(f)$ is not uniform cannot justify an algorithm. In essence, you must instead make the much bigger assumption that $P(f)$ doesn't fall into the half of the space of all $P(f)$ in which your algorithm performs worse than the uniform $P(f)$.

Ultimately, the only way to justify one's search algorithm is to argue in favor of a particular $P(f)$, and then argue that your algorithm is well suited to that $P(f)$. This is the only (!) legitimate way of defending a particular search algorithm against the implications of the NFL theorems.

Nonetheless, it is clearly of interest to derive NFL-type results that are independent of $P(f)$. Certain such results apply to ways of choosing between search algorithms, and involve averaging over those search algorithms while keeping the cost function fixed. Although less sweeping than the NFL results, these results hold no matter what the real world's distribution over cost functions is.

Let $a$ and $a'$ be two search algorithms. Define a "choosing procedure" as one that examines two populations $d$ and $d'$, produced by $a$ and $a'$ respectively, and based on those populations, decides to use either $a$ or $a'$ for the subsequent part of the search. As an example, one choosing procedure is to choose $a$ if and only if the least cost element in $d$ has lower cost than the least cost element in $d'$. As another example, a "stupid" choosing procedure would choose $a$ if and only if the least cost element in $d$ has higher cost than the least cost element in $d'$.

At the point that you use a choosing procedure, you will have sampled the cost function

reasons,wecanassumethatthesearchalgorithmchosenbythechoosingproceduredoes thehistogramc>mwhichisthehistogramformedfromd>m.inaddition,foralltheusual thatcomeafterusingthechoosingalgorithm,thenthehistogramtheuserisinterestedinis atallthepointsind[d[d0.accordingly,ifd>mreferstothesamplesofthecostfunction function,observinghowwellanalgorithmhasdonesofartellsusnothingabouthowwellit notreturntoanypointsind[,withoutlossofgenerality4. woulddoifwecontinuetouseitonthesamecostfunction.(forsimplicity,weonlyconsider forusinganyparticularchoosingalgorithm.looselyspeaking,nomatterwhatthecost Thefollowingtheorem,proveninappendixC,tellsuswehavenoapriorijustication deterministicalgorithms.) Theorem:Letdandd0betwoxedpopulationsbothofsizem,thataregeneratedwhen dierentchoosingprocedures.letkbethenumberofelementsinc>m.then thealgorithmsaanda0respectivelyarerunonthecostfunction.letaandbbetwo (Itisimplicitinthistheoremthatthesumexcludesthosealgorithmsaanda0thatdonot Xa;a0P(c>mjf;d;d0;k;a;a0;A)=Xa;a0P(c>mjf;d;d0;k;a;a0;B): (20) resultindandd0respectivelywhenrunonf.) equally,whenforanygivenfsomepopulationswillbemorelikelythanothers.howevereven ifoneweightspopulationsaccordingtotheirprobabilityofoccurrence,itisstilltruethat, onaverage,thechoosingprocedureoneuseshasnoeectonlikelyc>m.thisisestablished Onemightthinkthattheprecedingtheoremismisleading,sinceittreatsallpopulations bythefollowingcorollary. Corrolary:Undertheconditionsgivenintheprecedingtheorem, Proof:Let\proc"refertoourchoosingprocedure.Weareinterestedin Xa;a0P(c>mjf;m;k;a;a0;A)=Xa;a0P(c>mj;f;m;k;a;a0;B): (21) Xa;a0P(c>mjf;m;k;a;a0;proc)= a;a0;d;d0p(c>mjf;d;d0;k;a;a0;proc) X ithasn'tseenyetbutthata0has(andvice-versa).ratherthanhavethedenitionofasomehowdepend ontheelementsind0 d(andsimilarlyfora0),wedealwiththisproblembydeningc>mtobesetonlyby 4acanknowtoavoidtheelementsithasseenbefore.Howeverapriori,ahasnowaytoavoidtheelements P(d;d0jf;k;m;a;a0;proc): populationd>m. 
thoseelementsind>mthatlieoutsideofd[.(thisissimilartotheprocedurewedevelopedabovetodeal d[aswellasofd>m.italsomeanstheremaybefewerelementsinthehistogramc>mthanthereareinthe withpotentiallyretracingalgorithms.)formally,thismeansthattherandomvariablec>misafunctionof 26

Pull the sum over $d$ and $d'$ outside the sum over $a$ and $a'$. Consider any term in that sum (i.e., any particular pair of values of $d$ and $d'$). For that term, $P(d, d' \mid f, k, m, a, a', \mathrm{proc})$ is just 1 for those $a$ and $a'$ that result in $d$ and $d'$ respectively when run on $f$, and 0 otherwise. (Recall that we are assuming that $a$ and $a'$ are deterministic.) This means that the $P(d, d' \mid f, k, m, a, a', \mathrm{proc})$ factor simply restricts our sum over $a$ and $a'$ to the $a$ and $a'$ considered in our theorem. Accordingly, our theorem tells us that the summand of the sum over $d$ and $d'$ is the same for choosing procedures $A$ and $B$. Therefore the full sum is the same for both procedures. QED.

These results tell us that there is no assumption for $P(f)$ that, by itself, justifies using some choosing procedure as far as subsequent search is concerned. To have an intelligent choosing procedure, one must take into account not only $P(f)$ but also the search algorithms one will be choosing among.

These results also have interesting implications if one considers the "degenerate" choosing procedures $A \equiv$ {always use algorithm $a$}, and $B \equiv$ {always use algorithm $a'$}. This case means that for fixed $f_1$ and $f_2$, if $f_1$ does better (on average) with the algorithms in some set $\mathcal{A}$, then $f_2$ does better (on average) with the algorithms in the set of all other algorithms. In particular, if for some favorite algorithms a certain "well-behaved" $f$ results in better performance than does the random $f$, then that well-behaved $f$ gives worse than random behavior on the set of all remaining algorithms.

In fact, things may very well be worse than this. In supervised learning, there is a result related to the theorem above [16]. Translated into the current context that result suggests that if one restricts the sums to only be over those algorithms that are a good match to $P(f)$, then stupid choosing procedures (like choosing the algorithm with the less desirable $\vec{c}$) outperform "smart" ones (which are the ones everyone uses in practice). An investigation of what exactly the set of algorithms summed over must be for a smart choosing procedure to be superior to a dumb one is beyond the scope of this paper. But clearly there are many subtle issues to disentangle.

9 Discussion and Future Work

In this paper we present a framework for investigating search. This framework serves as a "skeleton" for the search problem; it tells us what we can know about search before "fleshing in" the details of a particular real world search problem. Phrased differently, it provides a language in which to describe search algorithms, and in which to ask (and answer) questions about them.

Ultimately, of course, the only important question is, "How do I find good solutions for my given cost function $f$?" The proper answer to this question is to start with the given $f$, determine certain salient features of it, and then construct a search algorithm, $a$, specifically tailored to match those features. The inverse procedure, far more popular in some communities, is to investigate how specific algorithms perform on different $f$'s.

This inverse procedure is only of interest to the degree that it helps us with our primary procedure, of going from (features concerning) $f$ to an appropriate $a$.

Note that often the "salient features" concerning $f$ can be stated in terms of a distribution $P(f)$. To understand this, first note that we do in fact know $f$ exactly. But at the same time, there is much about $f$ that we need to know that is effectively unknown to us (e.g., $f$'s extrema). In this, it is as though $f$ is partially unknown. The very nature of the search process is to admit that you don't "know" $f$ in full. As a result, it makes sense to (implicitly or otherwise) replace $f$ with a distribution $P(f)$. In this, the search problem reduces to finding a good $a$ for a particular $P(f)$ - exactly the issue addressed in Section 3 of this paper.

As an example of all this, it is well known that generic methods (like simulated annealing and genetic algorithms) are unable to compete with carefully hand-crafted solutions for specific search problems. The traveling salesman problem (TSP) is an excellent example of such a situation; the best search algorithms for the TSP are hand-tailored for it [12]. Linear programming problems are another example; the simplex algorithm is a search algorithm specifically designed to solve cost functions of a particular type. In both of these situations, the procedure followed by the researcher is to: identify salient aspects of $f$ (e.g., it is a TSP, or it is a linear programming problem); throw away all other knowledge concerning $f$ and thereby effectively replace $f$ with a $P(f)$; and then use a search algorithm explicitly known to work well for that $P(f)$.

In other words, one admits that in a certain sense $f$ is not completely known (for example, its extrema aren't known), and therefore one replaces it with a $P(f)$. For example, if one has a particular traveling salesman problem at hand, one would instead pretend that one simply has a general TSP, particulars unknown, and use an algorithm well-suited to TSPs in general.

In our investigation of the search problem from this match-$a$-to-$f$ perspective, the first question we addressed was whether it may be that some algorithm $A$ performs better than $B$, on average. Our answer to this question, given by the NFL theorem, is that this is impossible. An important implication of this result is the "conservation" nature of search, illustrated by the following example. If a genetic algorithm outperforms simulated annealing over some class of cost functions $\Phi$, then over the remaining cost functions $F \setminus \Phi$, simulated annealing must outperform the genetic algorithm. It should be noted that this conservation applies even if one considers "adaptive" search algorithms [6,18] which modify their search strategy based on properties of the population of $(X, Y)$ pairs observed so far in the search, and which perform this "adaptation" without regard to any knowledge concerning salient features of $f$.

It is important to bear in mind exactly what all of this does (not) imply about the relationship between natural selection in the biological world and optimization (i.e. genetic algorithms). To this end, consider the extremely simplified view in which natural selection is viewed as optimization over a cost or "fitness" function. We further simplify matters by assuming the fitness function is static over time.

In this paper we measure an algorithm's performance based on all $X$ values it has sampled since it began, and therefore we don't allow an algorithm to resample points it had already

visited. Our NFL theorem states that all algorithms are equivalent by this measure. However one might consider different measures. In particular, we may be primarily interested in the evolution through time of "generations" consisting of temporally contiguous subsets of our population, generations that are updated by our search algorithm.

In such a scenario, it does make sense to resample points already visited. Moreover, our NFL theorem does not apply to this alternative kind of performance measure. For example, according to this alternative performance measure, an algorithm that resamples old points in X that are fit and adds them to the current generation will always do better than one that resamples old points that are not fit.

Now when we examine the biological world around us, we are implicitly using this second kind of measure; we only see the organisms from the current generation. In addition, natural selection means that only (essential characteristics of) good points in X are kept around from one generation to the next. Accordingly, using this second kind of performance measure, one expects that the average fitness across a generation improves with time. (Or would if the environment, i.e., the cost function, didn't change in time, etc.) This is nothing more than the tautology that natural selection improves the fitness of the members of a generation.

However the evidence garnered from examining the world around us that natural selection performs well according to this generation-based measure does not mean anything concerning its performance according to the $\vec{c}$-based measure used in this paper. In particular, it does not mean that if we wish to do a search, and are able to keep around all points sampled so far, that we have any reason to believe that natural selection is an effective search strategy. Yet it is precisely this situation that is of interest in the engineering world.
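The equivalence of all non-resampling algorithms under the measure used in this paper can be checked exhaustively on a toy problem. The sketch below is only an illustration under stated assumptions: the four-point search space, the three cost values, and the two algorithms (`lowest_first` and `adaptive`) are hypothetical and are not taken from the paper. It enumerates every cost function f from X to Y and tallies the sequence of cost values each algorithm observes in m steps. Equal tallies of ordered sequences imply equal tallies of the histograms $\vec{c}$.

```python
from collections import Counter
from itertools import product

X = [0, 1, 2, 3]   # toy search space (assumption for illustration)
Y = [0, 1, 2]      # toy set of cost values (assumption for illustration)


def lowest_first(d):
    """Non-adaptive algorithm: visit the lowest-numbered unvisited x."""
    seen = {x for x, _ in d}
    return min(x for x in X if x not in seen)


def adaptive(d):
    """Adaptive algorithm: the next x depends on the last cost value seen."""
    seen = {x for x, _ in d}
    unvisited = [x for x in X if x not in seen]
    if not d:
        return X[-1]
    # Even last cost: take the lowest unvisited x; odd: take the highest.
    return unvisited[0] if d[-1][1] % 2 == 0 else unvisited[-1]


def cost_sequence(algorithm, f, m):
    """Run a deterministic, non-resampling algorithm for m steps on f."""
    d = []  # population: ordered list of (x, f(x)) pairs
    for _ in range(m):
        x = algorithm(d)
        d.append((x, f[x]))
    return tuple(y for _, y in d)


def tally(algorithm, m):
    """Count, over all f: X -> Y, how often each cost sequence occurs."""
    return Counter(
        cost_sequence(algorithm, f, m) for f in product(Y, repeat=len(X))
    )


if __name__ == "__main__":
    m = 3
    print(tally(lowest_first, m) == tally(adaptive, m))
```

Each length-m cost sequence arises for exactly |Y|^(|X| - m) cost functions, whichever of the two algorithms is used, because a deterministic algorithm that never revisits a point pins down f at m distinct x values and leaves it free elsewhere.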
In short, the empirical evidence of the biological world does not indicate in any sense that natural selection is an effective search strategy. It does not even indicate that natural selection is an effective search strategy in the biological world. We simply have not had a chance to observe the behavior of alternative strategies. According to the NFL theorem, for all we know, the strategy of breeding only the least fit members of the population may have done a better job at finding the extrema of the cost function faced by biological organisms. (This is exactly analogous to the fact that hill-descending can beat hill-climbing at finding fitness maxima.) The breed-the-worst strategy will in general result in worse recent generations, but simply the fact that you are using that strategy implies nothing about the quality of the populations over the long term.

In this regard, note that to fairly compare the breed-the-worst strategy with natural selection, one would have to allow the breed-the-worst strategy to exploit the same massive amount of parallelism exploited by natural selection in the real world, where there are a huge number of genomes evolving in parallel. It may well be that the "blind watchmaker" has managed to produce such an amazing biome simply by relying on massive parallelism rather than breed-the-best. Nobody knows; nobody has tried to measure "how well" natural selection works in the biological world before. Indeed, presumably the efficacy of natural selection vs. breed-the-worst varies from ecosystem to ecosystem; it may well be that when the measurements are finally done we will find that natural selection wins in some ecosystems but breed-the-worst wins in others.

On the other hand, if we relax the unrealistic assumption that the fitness function is constant over time, then it is possible that there may be advantages to using natural selection rather than a breed-the-worst strategy, regardless of the ecosystem. (Such advantages could arise from the fact that the cost function is being determined in part by the population, so that the "matching" of search algorithm and cost function required by the inner product formula may somehow be automatic.) Similarly, that strategy may have minimax disadvantages relative to natural selection's breed-the-best strategy. Alternatively, it may turn out that breed-the-worst has advantages over natural selection for varying fitness functions and/or minimax concerns. These are issues for future research.

To summarize, by the NFL theorem, any generation-based scheme that keeps only the worst members of the population for the next generation is equivalent to one that keeps the best members, on average. However, the fitness of the members of the generations will differ between the two search algorithms. This raises some obvious questions for future research: Averaged over all f, how big would one expect the difference to be? For a fixed f, and two identical random search algorithms that are "directed" differently in who they classify as being in the current generation, how big would one expect the difference to be? How does this last calculation compare with the calculation made above of what the best member of the population will (likely) be for a random algorithm as m grows?

9.2 Future work

It is perhaps fitting for a paper about effective search that we conclude with a brief listing of other (research) directions we believe warrant further investigation.

The most important continuation of this work is to turn our framework into a practical tool to solve real problems. This would involve two steps. First we need a method of incorporating broad kinds of knowledge concerning f into the analysis. In this paper we have used P(f) to do this, but perhaps there are other ways that we should also consider. For example, it is not yet clear how to (or even whether one should) encapsulate in a P(f) the knowledge concerning the cost function that is implicit in the heuristics of Branch and Bound strategies. How do we incorporate how the cost of a complete solution (f) is accrued through the assemblage of sub-solutions?

The second step in this suggested program is to determine how best to convert knowledge concerning f into an optimal a. The goal in its broadest sense is to design a system that can take in such knowledge concerning f and then solve for the optimal a given that knowledge. (For example, if the knowledge were in the form of P(f), one would "invert" the inner product formula somehow.) One would then use that a to search the f.

In its fullest sense, this program may well involve many years of work. Nonetheless, there are many important questions related to this program that should be analyzable using only the tools developed in this paper. Many of them were presented in the text. Others, particularly well-suited to help us understand the connection between P(f) and an optimal a, are: How fast does the cost histogram $\vec{c}$ associated with a particular algorithm converge to the histogram of the cost values f takes on across all of X? As P(f) changes from the diagonal in F space (i.e., from being uniform over all f), how will certain a's be hurt and certain a's helped? Could the average over all a's improve? For what P(f)'s besides

the diagonal are all algorithms equal? Given two particular algorithms (rather than all algorithms), for what P(f) is the performance of the algorithms equal? In particular, if P(f) is uniform over some subset $\Phi \subset F$ and zero outside,^5 what are the equivalence classes of search algorithms with identical expected behavior?

^5 As an example, $\Phi$ might be the set of correlated cost functions as in [14].

As a preliminary step in this program, it would make sense to explore the efficacy of currently popular search algorithms in terms of the performance benchmarks we present above. For any algorithm, as the search progresses, the fitness of the best member of the population can only improve. So all previous studies showing that fitness does improve in time for some algorithm a really don't prove anything. What's important is how much better the improvement is than you would expect it to be solely due to the "fittest can only improve" effect. That's what our measures are designed to assess.

Given the recent experience in the supervised learning community [8, 13, 10], it seems quite likely that on a significant fraction of the problems in the standard test suites, one or more of the currently popular search algorithms will fail to perform well, at least for some range of population sizes. Things should be even worse if one randomly samples from the space of real-world search problems. This is because there are "selection effects" ensuring that the most commonly studied search problems (i.e., those in the suites) are those which people consider "reasonable"; in practice, "reasonable" often simply means "a good match to the algorithms I'm familiar with".

Another interesting series of questions concerns differences between stochastic and deterministic algorithms. Are there potential advantages to stochastic algorithms? In particular, does it make sense to "expand" any stochastic algorithm $\sigma$ in terms of deterministic algorithms a? I.e., can one write $P(c \mid f, m, \sigma) = \sum_a k_{a,\sigma} P(c \mid f, m, a)$ for some expansion coefficients $k_{a,\sigma}$? If so, it suggests that as P(f) moves from the diagonal the performance of $\sigma$'s will neither improve nor degrade as much as that of a's. So it may be that stochastic algorithms have certain minimax advantages over deterministic ones.

There are many other issues that remain to be investigated concerning head-to-head minimax distinctions between algorithms. Perhaps the simplest is to characterize when such distinctions occur in "cycles", in which algorithm A is (head-to-head minimax) superior to B, and B to C, but then C is also superior to A. Arguments for choosing between algorithms based on head-to-head minimax distinctions are more persuasive in the absence of such cycles. However it should be noted that even if there are such cycles, if (to carry on with the example) for some reason algorithm C can be ruled out as a candidate algorithm (e.g., it takes too long to compute, or is difficult to deal with, or simply is not in vogue), then the fact that we have a cycle does not preclude choosing algorithm A based on head-to-head minimax distinctions.

Other issues to be explored involve the relation between the statistical view of search adopted in this paper and conventional statistics. In particular the field of optimal experimental design [1] and more precisely active learning [2] is concerned with the following question: There is some unknown probabilistic relationship between X and Y. I have a set of pairs of X-Y values formed by sampling that relationship (the "training set"). At what next

X value should I sample the relationship to "best" help me infer the full X-Y relationship? This question of how best to conduct active learning is obviously very closely related to the search problem; future work involves seeing what results in the field of active learning can be fruitfully applied to search.

Consider again Eq. (4). The left-hand side is what we are interested in (or more generally, what we want to know is set by it). The first term on the right-hand side is set by one's algorithm. Accordingly, this equation provides several ways to measure how "close" two algorithms are to one another. As an example of such a measure, one could simply say that how close two algorithms are is given by how close their vectors $\vec{v}_{c,a,m}$ are. Alternatively, one could measure the closeness of two algorithms for a specific P(f), by seeing how close the ($\vec{c}$-indexed) vectors $P(\vec{c} \mid m, a)$ are for those two algorithms, for that P(f). (One could imagine that for some P(f) two algorithms will be close, while for others they will be far apart.) As a final example, given an algorithm, one could solve for the P(f) that optimizes $P(\vec{c} \mid m, a)$ in some non-trivial sense. One could then see how close the optimal P(f)'s are for two algorithms, and use this to measure the closeness of the algorithms themselves.

With these kinds of measures, one could say things like "this algorithm is very close to simulated annealing, even though its internal workings are completely different". One could also investigate hypotheses like "all algorithms that humans consider 'reasonable' are close to one another". Future work involves exploring these measures of the closeness of algorithms.

Other future work involves exploring the importance of the "encoding" scheme one uses during search. Normally one talks of how the cost function is encoded, and possible changes to that encoding. However in the context of this paper, changing the encoding means changing the search algorithm. The cost function doesn't change when we re-encode; rather how we (the algorithm) view the function changes.
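The point that a change of encoding changes the algorithm rather than the cost function can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the eight cost values, the rule "visit codes 0, 1, 2, ... in order", and the two encodings (the identity and the binary-reflected Gray map, used here only as a convenient bijection on the codes). The same rule stated in code space visits different points of X, and hence produces a different population, under each encoding, while f itself is untouched.

```python
def identity(code):
    """Trivial encoding: code k stands for the point x = k."""
    return code


def gray(code):
    """Binary-reflected Gray map, a bijection on {0, ..., 2**n - 1}."""
    return code ^ (code >> 1)


def sweep(f, decode, m):
    """Visit codes 0, 1, ..., m-1 and sample f at the decoded points.

    Returns the population as a list of (x, f(x)) pairs.
    """
    return [(decode(code), f[decode(code)]) for code in range(m)]


# Hypothetical cost values on X = {0, ..., 7}; f[x] is the cost at x.
f = [5, 1, 4, 2, 7, 3, 6, 0]

if __name__ == "__main__":
    print(sweep(f, identity, 3))
    print(sweep(f, gray, 3))
```

The list `f` is never modified; only the map from codes to points differs, which is exactly a change in which x the algorithm outputs for a given history.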
Nonetheless, one can imagine several ways to couple re-encoding of algorithms with "re-encodings" of cost functions. For example, if $\tau(a)$ is a re-encoding of algorithm a, then one might say that a cost function f becomes $\tau(f)$ under that same re-encoding iff $P(\vec{c} \mid f, m, a) = P(\vec{c} \mid \tau(f), m, \tau(a))$ for all $\vec{c}$. (Alternatively, one might say that $\tau(a)$ is a legal re-encoding scheme for algorithms iff there is an associated $\tau(f)$ for which the foregoing is true.) Future work here involves seeing how changing the encoding scheme interacts with P(f) to determine the efficacy of the search process.

Uniform P(f) can be rewritten as $P(f) = \prod_x P'(f(x))$, where $P'(y)$ is uniform over all $y \in Y$. An interesting question for future research is to see which of the results of this paper must be modified (and how) if we still have $P(f) = \prod_x P'(f(x))$ but no longer have uniform $P'(y)$. (Intuitively, for such a P(f), f(x) is being set after you pick x as the next point to visit, and this is being done without any regard for points you've already seen. Hence, one would expect NFL-results to hold.) Related questions are: What is the most general P(f) for which all algorithms are equal? What is the most general P(f) for which a particular pair of algorithms are equal? And what happens if rather than equal $\prod_x P'(f(x))$, P(f) involves some nearest neighbor coupling?

In relation to the first and last of these questions, it seems plausible that there are P(f)'s that cannot be written as $\prod_x P'(f(x))$ but for which it is still true that all algorithms are equal. For example, say $|Y| > |X|$ and let P(f) be i) uniform over all f such that for no

$x_1, x_2 \in X$ does $f(x_1) = f(x_2)$; and ii) zero for all f that don't obey this condition. This P(f) has extremely strong coupling between the elements of the population, in contrast to P(f)'s that can be written as $\prod_x P'(f(x))$. Yet it seems likely that these P(f)'s also result in NFL-type results, since the points you have seen so far tell you nothing about where you should search next.

In this paper the choice of P(f) (uniform) was motivated by theoretical rather than practical concerns. Yet these broader classes of P(f)'s for which NFL-type results might hold raise an intriguing question: just how far can one push from the uniform P(f) to a more "real-world" P(f) and still have NFL-type results?

Finally, there are many other NFL-type results, for uniform P(f), that we have not had time to explicate here. For example, consider algorithms that keep running until some stopping condition, a function of all populations up to the present, is met. Then intuitively, by NFL, one would expect that averaged over all f, the probability that your algorithm stops after m samples of f is independent of the algorithm being used. The formal proof of these (and similar) results is the subject of future work.

Acknowledgments

We would like to thank Raja Das, Tal Grossman, Paul Helman, and Una-may O'Reilly for helpful conversation, and the SFI for funding. DHW would also like to thank TXN Inc. for funding.

References

[1] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag (1985).
[2] D. Cohn, Neural Network Exploration Using Optimal Experimental Design, MIT AI Memo. 1491.
[3] T. Cover, J. Thomas, Elements of Information Theory, John Wiley & Sons, (1991).
[4] M. R. Garey, D. S. Johnson, Computers and Intractability, Freeman (1979).
[5] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, (1975).
[6] L. Ingber, Adaptive Simulated Annealing, Software package documentation, ftp.alumni.caltech.edu:/pub/ingber/asa.tar.gz.
[7] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, Science, 220, 671, (1983).
[8] R. Kohavi, personal communication. Also see A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, to be presented at IJCAI 1995.
[9] E. L. Lawler, D. E. Wood, Operations Research, 14(4), 699-719, (1966).

[10] P. Murphy, M. Pazzani, Journal of Artificial Intelligence Research, 1, 257-275 (1994).
[11] J. Pearl, Heuristics, intelligent search strategies for computer problem solving, Addison-Wesley, (1984).
[12] Gerhard Reinelt, The Traveling Salesman, computational solutions for TSP applications, Springer Verlag Berlin Heidelberg (1994).
[13] C. Schaffer, Conservation of Generalization: A Case Study.
[14] P. F. Stadler, Europhys. Lett. 20, pp 479-482, (1992).
[15] C. E. M. Strauss, D. H. Wolpert, D. R. Wolf. Alpha, Evidence, and the Entropic Prior in Maximum Entropy and Bayesian Methods, ed. Ali Mohammed-Djafari, pp 113-120, (1992).
[16] D. H. Wolpert, Off-training set error and a priori distinctions between learning algorithms, Technical Report SFI-TR-95-01-003, Santa Fe Institute, 1995.
[17] D. H. Wolpert, On Overfitting Avoidance as Bias, Technical Report SFI-TR-92-03-5001, Santa Fe Institute, 1992.
[18] D. Yuret, M. de la Maza, Dynamic Hill-Climbing: Overcoming the limitations of optimization techniques in The Second Turkish Symposium on Artificial Intelligence and Neural Networks, pp 208-212, (1993).

A Proof related to information theoretic aspects of search

We want to calculate the proportion of all algorithms that give a particular $\vec{c}$ for a particular f. We proceed in several steps.

1) Since X is finite, populations are finite. Therefore any (deterministic) a is a huge, but finite, list. That list is indexed by all possible d's (aside from those that extend over the entire input space). Each entry in the list is the x the a in question outputs for that d-index.

2) Consider any particular unordered set of m x-y pairs where no two of the pairs share the same x value. Such a set is an "unordered path" $\pi$. (Without loss of generality, from now on we implicitly restrict the discussion to unordered paths of length m.) A particular $\pi$ is "in" or "from" a particular f if there is an unordered set of m (x, f(x)) pairs identical to $\pi$. The numerator on the right-hand side of Eq. (9) is the number of unordered paths in the given f that give the desired $\vec{c}$.

3) Claim: The number of unordered paths in f that give the desired $\vec{c}$ (the numerator on the right-hand side of Eq. (9)) is proportional to the number of a's that give the desired

$\vec{c}$ for f. (The proof of this claim will constitute a proof of Eq. (9).) Furthermore, the proportionality constant is independent of f and $\vec{c}$.

4) Proof: We will construct a mapping $\phi: a \to \pi$. $\phi$ takes in an a that gives the desired $\vec{c}$ for f, and from it produces a $\pi$ that is in f and gives the desired $\vec{c}$. We will then show that for any $\pi$ the number of algorithms a such that $\phi(a) = \pi$ is a constant, independent of $\pi$, f, and $\vec{c}$. The proof will then be completed by showing that $\phi$ is single-valued, i.e., by showing that there is no a who has as image under mapping $\phi$ more than one $\pi$.

5) Any unordered path $\pi$ gives a set of m! different ordered paths in the obvious manner. (Note that every x value in an unordered path is distinct.) Each such ordered path $\pi_{ord}$ in turn provides a set of m successive d's (if one includes the null d) and a following x. Indicate by $d(\pi_{ord})$ this set of the first m d's provided by $\pi_{ord}$. (Note that any $\pi_{ord}$ is itself a population, but to avoid confusion we avoid referring to it as such.)

6) For any ordered path $\pi_{ord}$ we can construct a "partial algorithm". This consists of the list of an a, but with only the m $d(\pi_{ord})$ entries in the list filled in; the remaining entries are blank. (We say that m is the "length" of the partial algorithm.) Since there are m! distinct partial a's for each $\pi$ (one for each ordered path corresponding to $\pi$), we have m! such partially filled-in lists for each $\pi$.

7) In the obvious manner we can talk about whether a particular partial algorithm is "consistent" with a particular full algorithm. This allows us to define $\phi^{-1}$ (the inverse of $\phi$): for any $\pi$ that is in f and gives $\vec{c}$, $\phi^{-1}(\pi) \equiv$ (the set of all a that are consistent with at least one partial algorithm generated from $\pi$ and that give $\vec{c}$ when run on f).

8) To complete the first part of our proof we must show that for all $\pi$ that are in f and give $\vec{c}$, $\phi^{-1}(\pi)$ contains the same number of elements, regardless of $\pi$, f, or $\vec{c}$. To that end, first generate all ordered paths induced by $\pi$ and then associate each such ordered path with a distinct m-element partial algorithm. Our question is how many full algorithm lists are consistent with at least one of these partial algorithm partial lists. (How this question is answered is the core of this appendix.)

9) To answer this question, reorder the entries in each of the partial algorithm lists by permuting the indices d of all the lists. Obviously such a reordering won't change the answer to our question.

9) We will perform the permuting by interchanging pairs of d indices. First, interchange any d index of the form $((d^x(1), d^y(1)), \ldots, (d^x(i \le m), d^y(i \le m)))$ whose entry is filled in in any of our partial algorithm lists with $d'(d) \equiv ((d^x(1), z), \ldots, (d^x(i), z))$, where z is some arbitrary constant Y value and $x_j$ refers to the j'th element of X. Next, create some arbitrary but fixed ordering of all $x \in X$: $(x_1, \ldots, x_{|X|})$. Then interchange any d' index of the form $((d^x(1), z), \ldots, (d^x(i \le m), z))$ whose entry is filled in in any of our (new) partial algorithm lists with $d''(d') \equiv ((x_1, z), \ldots, (x_m, z))$. (Recall that all the $d^x(i)$ must be distinct.)

10) By construction, the resultant partial algorithm lists are independent of $\pi$, $\vec{c}$ and f, as is the number of such lists (it's m!). Therefore the number of algorithms consistent with at least one partial algorithm list in $\phi^{-1}(\pi)$ is independent of $\pi$, $\vec{c}$ and f. This completes the first part of the proof.

11) For the second part, first choose any 2 unordered paths that differ from one another, A and B. There is no ordered path $A_{ord}$ constructed from A that equals an ordered path

$B_{ord}$ constructed from B. So choose any such $A_{ord}$ and any such $B_{ord}$. If they disagree for the null d, then we know that there is no (deterministic) a that agrees with both of them. If they agree for the null d, then since they are sampled from the same f, they have the same single-element d. If they disagree for that d, then there is no a that agrees with both of them. If they agree for that d, then they have the same double-element d. Continue in this manner all the way up to the (m-1)-element d. Since the two ordered paths differ, they must have disagreed at some point by now, and therefore there is no a that agrees with both of them.

12) Since this is true for any $A_{ord}$ from A and any $B_{ord}$ from B, we see that there is no a in $\phi^{-1}(A)$ that is also in $\phi^{-1}(B)$. This completes the proof.

B Proof related to minimax distinctions between algorithms

The proof is by example.

Consider three points in X, $x_1$, $x_2$, and $x_3$, and three points in Y, $y_1$, $y_2$, and $y_3$.

1) Let the first point $a_1$ visits be $x_1$, and the first point $a_2$ visits be $x_2$.
2) If at its first point $a_1$ sees a $y_1$ or a $y_2$, it jumps to $x_2$. Otherwise it jumps to $x_3$.
3) If at its first point $a_2$ sees a $y_1$, it jumps to $x_1$. If it sees a $y_2$, it jumps to $x_3$.

Consider the cost function that has as the Y values for the three X values $\{y_1, y_2, y_3\}$, respectively.

For m = 2, $a_1$ will produce a population $(y_1, y_2)$ for this function, and $a_2$ will produce $(y_2, y_3)$.

The proof is completed if we show that there is no cost function so that $a_1$ produces a population containing $y_2$ and $y_3$ and such that $a_2$ produces a population containing $y_1$ and $y_2$. There are four possible pairs of populations to consider:

i) $[(y_2, y_3); (y_1, y_2)]$;
ii) $[(y_2, y_3); (y_2, y_1)]$;
iii) $[(y_3, y_2); (y_1, y_2)]$;
iv) $[(y_3, y_2); (y_2, y_1)]$.

Since if its first point is a $y_2$ $a_1$ jumps to $x_2$, which is where $a_2$ starts, when $a_1$'s first point is a $y_2$ its second point must equal $a_2$'s first point. This rules out possibilities i) and ii).

For possibilities iii) and iv), by $a_1$'s population we know that f must be of the form $\{y_3, s, y_2\}$, for some variable s. For case iii), s would need to equal $y_1$, due to the first point

in $a_2$'s population. However for that case, the second point $a_2$ sees would be the value at $x_1$, which is $y_3$, contrary to hypothesis.

For case iv), we know that the s would have to equal $y_2$, due to the first point in $a_2$'s population. However that would mean that $a_2$ jumps to $x_3$ for its second point, and would therefore see a $y_2$, contrary to hypothesis.

Accordingly, none of the four cases is possible. This is a case both where there is no symmetry under exchange of $d^y$'s between $a_1$ and $a_2$, and no symmetry under exchange of histograms. QED.

C Proof related to NFL results for fixed cost functions

Since any (deterministic) search algorithm is a mapping from $d \in D$ to $x \in X$, any search algorithm is a vector in the space $X^D$. The components of such a vector are indexed by the possible populations, and the value for each component is the x that the algorithm produces given the associated population.

Consider now a particular population d of size m. Given d, we can say whether any other population of size greater than m has the (ordered) elements of d as its first m (ordered) elements. The set of those populations that do start with d this way defines a set of components of any algorithm vector a. Those components will be indicated by $a_{\supseteq d}$.

The remaining components of a are of two types. The first is given by those populations that are equivalent to the first $M < m$ elements in d for some M. The values of those components for the vector algorithm a will be indicated by $a_{\subset d}$. The second type consists of those components corresponding to all remaining populations. Intuitively, these are populations that are not compatible with d. Some examples of such populations are populations that contain as one of their first m elements an element not found in d, and populations that re-order the elements found in d. The values of a for components of this second type will be indicated by $a_{\perp d}$.

Let proc be either A or B. We are interested in

$$\sum_{a, a'} P(c_{>m} \mid f, d, d', k, a, a', \mathrm{proc}) = \sum_{a_{\perp d},\, a'_{\perp d'}} \; \sum_{a_{\subset d},\, a'_{\subset d'}} \; \sum_{a_{\supseteq d},\, a'_{\supseteq d'}} P(c_{>m} \mid f, d, d', k, a, a', \mathrm{proc}).$$

The summand is independent of the values of $a_{\perp d}$ and $a'_{\perp d'}$ for either of our two d's. In addition, the number of such values is a constant. (It is given by the product, over all populations not consistent with d, of the number of possible x each such population could be mapped to.) Therefore, up to an overall constant independent of d, d', f, and proc, our sum equals

$$\sum_{a_{\subset d},\, a'_{\subset d'}} \; \sum_{a_{\supseteq d},\, a'_{\supseteq d'}} P(c_{>m} \mid f, d, d', a_{\subset d}, a'_{\subset d'}, a_{\supseteq d}, a'_{\supseteq d'}, \mathrm{proc}).$$

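By definition, we are implicitly restricting the sum to those a and a' so that our summand is defined. This means that we actually only allow one value for each component in $a_{\subset d}$ (namely, the value that gives the next x element in d), and similarly for $a'_{\subset d'}$. Therefore our sum reduces to

$$\sum_{a_{\supseteq d},\, a'_{\supseteq d'}} P(c_{>m} \mid f, d, d', a_{\supseteq d}, a'_{\supseteq d'}, \mathrm{proc}).$$

Note that no component of $a_{\supseteq d}$ lies in $d^x_{\cup}$. The same is true of $a'_{\supseteq d'}$. So our sum over $a_{\supseteq d}$ is over the same components of a as the sum over $a'_{\supseteq d'}$ is of a'. Now for fixed d and d', proc's choice of a or a' is fixed. Accordingly, without loss of generality, we can rewrite our sum as

$$\sum_{a_{\supseteq d}} P(c_{>m} \mid f, d, d', a_{\supseteq d}),$$

with the implicit assumption that $c_{>m}$ is set by $a_{\supseteq d}$. This sum is independent of proc. QED.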
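The three-point example of Appendix B is small enough to confirm by exhaustive enumeration. The sketch below is an illustration rather than part of the proof. It encodes $y_1, y_2, y_3$ as the integers 1, 2, 3 and $x_1, x_2, x_3$ as indices 0, 1, 2. The appendix does not say where $a_2$ jumps when its first value is $y_3$, so that choice is left as a parameter (`on_y3`, a name introduced here) and both admissible options are tried.

```python
from itertools import product

Y = (1, 2, 3)  # stand-ins for y1, y2, y3; x1, x2, x3 are indices 0, 1, 2


def a1(f):
    """a1 starts at x1; on y1 or y2 it jumps to x2, otherwise to x3."""
    first = f[0]
    nxt = 1 if first in (1, 2) else 2
    return (first, f[nxt])


def a2(f, on_y3):
    """a2 starts at x2; on y1 it jumps to x1, on y2 to x3.

    on_y3 is the (unspecified in the text) jump target when a2 sees y3.
    """
    first = f[1]
    nxt = {1: 0, 2: 2, 3: on_y3}[first]
    return (first, f[nxt])


def exists(pop1, pop2, on_y3):
    """Is there an f for which a1 yields pop1 and a2 yields pop2 (as sets)?"""
    return any(
        set(a1(f)) == pop1 and set(a2(f, on_y3)) == pop2
        for f in product(Y, repeat=3)
    )


if __name__ == "__main__":
    for on_y3 in (0, 2):  # a2 may not revisit x2, so x1 or x3
        print(on_y3, exists({1, 2}, {2, 3}, on_y3), exists({2, 3}, {1, 2}, on_y3))
```

For either choice of `on_y3`, some f gives $a_1$ the values $\{y_1, y_2\}$ and $a_2$ the values $\{y_2, y_3\}$, while no f gives the exchanged pair, in agreement with the case analysis in the appendix.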