No Free Lunch Theorems for Search

David H. Wolpert (dhw@santafe.edu)
William G. Macready (wgm@santafe.edu)

SFI-TR-95-02-010
The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501
February 23, 1996

Abstract

We show that all algorithms that search for an extremum of a cost function perform exactly the same, according to any performance measure, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A. Starting from this we analyze a number of the other a priori characteristics of the search problem, like its geometry and its information-theoretic aspects. This analysis allows us to derive mathematical benchmarks for assessing a particular search algorithm's performance. We also investigate minimax aspects of the search problem, the validity of using characteristics of a partial search over a cost function to predict future behavior of the search algorithm on that cost function, and time-varying cost functions. We conclude with some discussion of the justifiability of biologically-inspired search methods.

1 Introduction

Many problems can be cast as optimization over a "cost" or "fitness" function. In such a problem, we are given such a function, $f : X \to Y$ ($F$ being the set of all such mappings). For that $f$ we seek the set of $x \in X$ which give rise to a particular $y \in Y$. Most often, we seek the $x$'s which extremize $f$ (this will often be implicitly assumed in this paper). Physical examples of such a problem include free energy minimization ($Y = \mathbb{R}$) over spin configurations ($X = \{-1, +1\}^N$), or over bond angles ($X = \{\mathbb{R} \times \mathbb{R} \times \mathbb{R}\}^N$), etc. Examples also abound in combinatorial optimization, ranging from number partitioning to graph coloring to scheduling [4].
There are two common approaches to these optimization problems. The first is a systematic construction of a good $x$ value, $x'$, from good sub-solutions specifying part of $x'$. The most celebrated method of this type is the branch and bound algorithm [9]. For this systematic and exhaustive approach to work in reasonable time, one must have an effective heuristic, $h(n)$, representing the quality of sub-solutions $n$. There is extensive theoretical work [11] linking the cost function to the properties a heuristic must have in order to search efficiently.

A second approach to optimization begins with a population of one or more complete solutions $x \in X$ and the associated $y$ values, and (tries to) iteratively improve upon those $x$ values. There are many algorithms of this type, including hill-climbing, simulated annealing [7], and genetic algorithms [5].

Intuitively, one would expect that for this class of algorithms to work effectively, the biases in how they try to improve the population (i.e., the biases in how they search $X$) must "match" those implicit in the cost function they are optimizing. However almost always these algorithms are directly applied, with little or no modification, to any cost function in a wide class of cost functions. The particulars of the cost functions at hand are almost always ignored. As we will demonstrate though, the "matching" intuition is true; the particulars of the cost function are crucial, and blind faith in an algorithm to search effectively across a broad class of problems is rarely justified.

Indeed, one might expect that there are pairs of search algorithms A and B such that A performs better than B on average, even if B sometimes outperforms A. As an example, one might expect that hill-climbing usually outperforms hill-descending if one's goal is to find a maximum of the cost function. One might also expect it would outperform a random search. In point of fact though, as our central result demonstrates, this is not the case. If we do not take into account any particular biases or properties of our cost function, then the expected performance of all algorithms on that function is exactly the same (regardless of the performance measure used).

In short, there are no "free lunches" for effective optimization; any algorithm performs only as well as the knowledge concerning the cost function put into the cost algorithm. For this reason (and to emphasize the parallel with similar supervised learning results [16, 17]), we have dubbed our central result a "no free lunch" (NFL) theorem.

To prove the NFL theorem a framework has to be developed which addresses the core aspects of search. This framework constitutes the "skeleton" of the optimization problem; it is what can be said concerning search before explicit details of a particular real-world search problem are considered. The construction of such a skeleton provides a language to ask and answer formal questions about search, some of which have never before even been asked, never mind answered. (We pose and answer a number of such questions in this paper.) In addition, such a skeleton indicates where the real "meat" of optimization lies. It clarifies what the core issues are that underlie the effectiveness of the search process.

The paper is organized as follows. We begin in section 2 by presenting our framework and using it to prove the NFL theorem. We prove the theorem for both deterministic and stochastic search algorithms. Section 3 gives a geometric interpretation of the NFL theorem. In particular, in that section we provide a geometric meaning of what it means for an
algorithm to be well "matched" to a cost function.

The rest of the paper goes beyond the NFL theorem. It consists of a preliminary investigation of the statistical nature of the search problem, using the framework developed in section 2.

In some circumstances the average behavior of algorithms is not an interesting quantity by which to compare algorithms. Alternatively, averages may be interesting, but it isn't clear what distribution over cost functions to use to do the averaging. We address such scenarios in section 4 by investigating minimax distinctions between algorithms. Such distinctions hold for any distribution over cost functions.

Section 5 begins the exploration of some of the questions raised in section 2. Some of the answers lead naturally into results concerning the information theoretic aspects of search. (In that those results are derived from the NFL theorem, they illustrate the central importance of the NFL theorem in analyzing optimization.) A myriad of other properties of search may be investigated using techniques similar to those developed in this section. We list a sample of these in section 9.2.

In Section 6 we turn to the important problem of assessing the performance of particular search algorithms. We derive several benchmarks against which to compare such an algorithm's performance. We cannot conceive of any valid demonstration of the "absolute" (rather than relative) efficacy of an algorithm on some search problem that doesn't use these (or similar) benchmarks.

Not all search problems are static; in some cases the cost function changes over time. Section 7 extends our analysis to the case of such time dependent cost functions.

In section 8 we provide some theorems valid for any single fixed cost function, and therefore for any distribution over cost functions. These theorems state that one cannot use a search algorithm's behavior so far on a particular cost function to predict its future behavior on that function. When choosing between algorithms based on their observed performance it does not suffice to make an assumption about the cost function; some (currently poorly understood) assumptions are also being made about how the algorithms in question are related to each other and to the cost function.

Finally, we conclude in Section 9 with a general discussion of the implications of our results, and then of future directions for work.

The paper can be read in stages. A first reading might highlight the NFL theorem and its broad implications. Such a reading should start with section 2 for an understanding of the NFL theorem, Eq. (1). Section 3 then provides a geometric understanding of the theorem. Section 4, which considers minimax distinctions between algorithms, addresses limitations of the NFL theorem. Finally, section 9.1 discusses broad implications of the NFL result.

A second reading might explore the potential richness of the framework developed in Sections 2 and 3. Such a reading should include section 5, which uses our framework to demonstrate some of the information theoretic aspects of search. It would then move on to section 6 which uses the framework to provide useful benchmarks against which other algorithms may be compared.

A final reading would include subjects that may constitute fruitful extensions of the
framework developed in sections 2 and 3. Such a reading would include section 7, which extends the NFL results to a class of time-dependent cost functions. It would also include section 8, which probes what may be learned from a limited amount of search over a single, specific, cost function. This reading would conclude with section 9.2 where we list many directions for future extensions.

We should emphasize that our comparing algorithms based on their having the same number of distinct evaluations of the cost function is simply our choice. Although we consider it quite reasonable, we do not claim to be able to "prove" that one should use it, in any sense. If someone wishes to compare algorithms on some other basis, we wish them luck. However as an aside on one such comparison scheme, we note that comparing based on total evaluations, including repeats, is fraught with difficulties, and results in all kinds of irrelevant a priori distinctions between algorithms. (For example, it says that a global random guesser is better than a hill-climber, averaged over all cost functions, simply because the random guesser will retrace less.)

There are a number of other formal approaches to the issues investigated in this paper, in particular, the field of computational complexity. Unlike the approach taken in this paper, computational complexity ignores the statistical nature of search for the most part, and concentrates instead on computational issues. Much (though by no means all) of computational complexity is concerned with physically unrealizable computational devices (Turing machines) and the worst case amount of resources they require to find optimal solutions. In contrast, the analysis in this paper does not concern itself with the computational engine used by the search algorithm, but rather concentrates exclusively on the underlying statistical nature of the search problem. Future work would involve combining our concern for the statistical nature of search with (realistic) concerns for computational resources.

2 No Free Lunch Theorem for Search

All oracle-based search algorithms rely on extrapolating from an existing set of $m$ points and associated cost values, $(x, y)^m \in (X \times Y)^m$, to a new point $x' \in X$ that hopefully has low cost (high cost if we're searching for a maximum rather than a minimum). The extrapolation may be either deterministic or stochastic. The analysis of such extrapolations can be formalized as follows.

For simplicity take $X$ and $Y$ to be finite. Define $d_m \equiv \{d_m(i)\} \equiv \{d^x_m(i), d^y_m(i)\}$ for $i = 1 \ldots m$ to be a set of $m$ distinct search points (i.e. cost evaluations) and associated cost values ordered in some way (usually according to the time at which they are generated) with the ordering index given by $i$. Let us call this a population of size $m$. We denote the set of all populations of size $m$ by $D_m$.

As above, let $f$ indicate a single-valued function from $X$ to $Y$: $f \in Y^X$. Note that there are a finite number of $f$ if $|X|$ and $|Y|$ are finite. At each stage of a search algorithm, a new point $x' \in X$ is chosen based on the members of the current population $d$; the pair $\{x', f(x')\}$ is added to $d$; and the procedure repeats.
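As an illustrative aside, the loop just described (choose a new point from the current population, evaluate it, append the pair, repeat) can be sketched in a few lines of Python. This is only a sketch under our own assumptions: the names `run_search` and `sweep`, the `(d, X) -> x` signature, and the representation of $f$ as a dictionary are hypothetical choices made for the example and do not come from the analysis itself.

```python
from typing import Callable, Dict, List, Tuple

# Ordered (x, y) pairs: the population d_m of the text.
Population = List[Tuple[int, int]]


def run_search(a: Callable[[Population, List[int]], int],
               f: Dict[int, int], X: List[int], m: int) -> Population:
    """Build a population of m distinct points by repeatedly asking `a` for a new x."""
    d: Population = []
    for _ in range(m):
        x_new = a(d, X)                      # extrapolate from the current population
        if any(x == x_new for x, _ in d):    # this sketch forbids revisiting points
            raise ValueError("algorithm revisited a point")
        d.append((x_new, f[x_new]))          # add the pair {x', f(x')} to d
    return d


def sweep(d: Population, X: List[int]) -> int:
    """Hypothetical algorithm: visit X in its listed order, ignoring cost values."""
    visited = {x for x, _ in d}
    return next(x for x in X if x not in visited)


X = [0, 1, 2, 3]
f = {0: 2, 1: 0, 2: 1, 3: 0}                 # one arbitrary cost function
d = run_search(sweep, f, X, m=3)
print(d)                                     # [(0, 2), (1, 0), (2, 1)]
```

Any rule that maps the current population to an unvisited point can be passed in place of `sweep`; the population always records every point evaluated so far, in the order of evaluation.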
Any search algorithm of the "second approach" discussed in the introduction is a (perhaps probabilistic) mapping taking any population to a new point in the search space. For simplicity of the presentation, we assume that the new search point has not already been visited. (As discussed below, relaxing this assumption does not affect our results.) So in particular a deterministic search algorithm is a mapping $a : d \in D \to \{x \mid x \notin d^x\}$, where $D \equiv \cup_m D_m$, and in particular contains the empty set. For clarity of the exposition, in this paper we will only explicitly consider such deterministic search algorithms. However as discussed below, all our results also apply to stochastic algorithms.

Note that the population contains all points sampled so far. In particular, in a conventional hill-climber that works by moving from $x$ to that neighbor of $x$ with the highest fitness, it is necessary to evaluate the fitnesses of all the neighbors of $x$. All those evaluated points are contained in the population, not only $x$ and the neighbor of $x$ with highest fitness.

We are interested in the histogram, $\vec{c}$, of cost values that an algorithm, $a$, obtains on a particular cost function, $f$, given $m$ distinct cost evaluations. Note that $\vec{c}$ is given by the $y$ values of the population, $d^y_m$, and is a vector of length $|Y|$ whose $i$th component is the number of members in the population $d_m$ having cost $f_i$. Once we have $\vec{c}$ we can use it to assess the quality of the search in any way we choose. (For example if we are searching for minima we might take the lowest occupied bin in $\vec{c}$ as our performance measure.) Consequently, we are interested in the conditional probability that histogram $\vec{c}$ will be obtained under $m$ iterations of algorithm $a$ on $f$. This quantity is given by the conditional probability $P(\vec{c} \mid f, m, a)$.

A natural question concerning this scenario is how $F_1$, the set of $f$ for which some algorithm $a_1$ outperforms another algorithm $a_2$, compares to $F_2$, the set of $f$ for which the reverse is true. To perform the comparison, we use the trick of comparing the sum over all $f$ of $P(\vec{c} \mid f, m, a_1)$ to the sum over all $f$ of $P(\vec{c} \mid f, m, a_2)$. This comparison provides a major result of this paper: $P(\vec{c} \mid f, m, a)$ is independent of $a$ when we average over all cost functions. In other words, as is proven below,

Theorem: For any pair of algorithms $a_1$ and $a_2$,
\[
\sum_f P(\vec{c} \mid f, m, a_1) = \sum_f P(\vec{c} \mid f, m, a_2). \tag{1}
\]

An immediate corollary is that for any performance measure $\Phi(\vec{c})$, the average over all $f$ of $P(\Phi(\vec{c}) \mid f, m, a)$ is independent of $a$. So the precise way that the histogram is mapped to a performance measure is irrelevant.

Note that the no free lunch result implies that if we know nothing about $f$, then $P(\vec{c} \mid m, a)$, which is the probability we obtain histogram $\vec{c}$ after $m$ distinct cost evaluations of algorithm $a$, is independent of $a$. This follows from
\[
P(\vec{c} \mid m, a) = \sum_f P(\vec{c} \mid f, m, a)\, P(f \mid m, a) = \sum_f P(\vec{c} \mid f, m, a)\, P(f)
\]
(in the last step we have relied on the fact that the cost function doesn't depend on either $m$ or $a$). If we know nothing about $f$ then all $f$ are equally likely, which means that for all $f$, $P(f) = 1/|Y|^{|X|}$. (More generally, $P(f)$ reflects our "prior knowledge" concerning $f$.)
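For very small $X$ and $Y$, Eq. (1) can be checked by brute force, since for a deterministic algorithm $\sum_f P(\vec{c} \mid f, m, a)$ is simply the number of cost functions on which $a$ produces the histogram $\vec{c}$. The following Python sketch does this for two algorithms of our own invention: `sweep` ignores the cost values, while `greedy` uses them to decide where to look next. The sizes $|X| = 4$, $|Y| = 3$, $m = 3$ and all function names are assumptions made for the illustration.

```python
from collections import Counter
from itertools import product

X = range(4)          # assumed search space
Y = range(3)          # assumed cost values
m = 3                 # number of distinct cost evaluations


def sweep(d):
    """Non-adaptive: visit the lowest-numbered unvisited point."""
    visited = {x for x, _ in d}
    return min(x for x in X if x not in visited)


def greedy(d):
    """Adaptive: start at x = 3, then go to the unvisited point nearest the lowest-cost point so far."""
    if not d:
        return 3
    visited = {x for x, _ in d}
    best_x = min(d, key=lambda p: p[1])[0]
    return min((x for x in X if x not in visited),
               key=lambda x: (abs(x - best_x), x))


def histogram(a, f):
    """Run algorithm a on cost function f (a tuple indexed by x) and return the histogram c."""
    d = []
    for _ in range(m):
        x = a(d)
        d.append((x, f[x]))
    counts = Counter(y for _, y in d)
    return tuple(counts[y] for y in Y)


def histogram_counts(a):
    """For each histogram c, count the f on which a produces c (the sum over f in Eq. (1))."""
    return Counter(histogram(a, f) for f in product(Y, repeat=len(X)))


counts_sweep = histogram_counts(sweep)
counts_greedy = histogram_counts(greedy)
assert counts_sweep == counts_greedy      # the same count for every histogram
```

The two algorithms generally produce different histograms on any individual $f$; it is only the counts taken over all $|Y|^{|X|} = 81$ cost functions that coincide.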
Accordingly, for this "no knowledge" scenario, $P(\vec{c} \mid m, a) = |Y|^{-|X|} \sum_f P(\vec{c} \mid f, m, a)$, which is independent of $a$ by the no free lunch theorem.

Similarly, you can derive an NFL result for averaging over all priors. (More formally, the result concerns averaging over all $\alpha$ the quantity $P(\vec{c} \mid m, \alpha)$, where $\alpha$ indexes the set of possible $P(f)$.) In this, the uniform $P(f)$ case is not some "pathological case", on the edge of the space. Rather it is the typical case.

Another immediate consequence of the NFL result is that the expected histogram $E(\vec{c} \mid f, m, a) = \sum_{\vec{c}} \vec{c}\, P(\vec{c} \mid f, m, a)$ is, on average, the same for all algorithms. More generally, for any two algorithms, at the point in their search where they have both created a population of size $m$, if algorithm $a_1$ has better performance than algorithm $a_2$ over some subset $\phi \subset F$ of functions, then $a_2$ must perform better on the set of remaining functions $F \setminus \phi$. So for example if simulated annealing outperforms genetic algorithms on some set $\phi$, genetic algorithms must outperform simulated annealing on $F \setminus \phi$. As another example, even if one's goal is to find a maximum of the cost function, hill-climbing and hill-descending are equivalent, on average.

A particularly striking example of this last point is the case where $a_2$ is the algorithm of random search. The NFL result says that there are as many $f$ (appropriately weighted) for which the random algorithm outperforms your favorite search algorithm as vice-versa. There are as many $f$ for which your algorithm's guesses for where to search are worse than random as for which they are better. The risk you take in choosing an algorithm is not that it may perform randomly on the $f$ at hand, but that it may very well perform even worse.

Often in the real world one has some a priori knowledge concerning $f$. However only very rarely is that knowledge explicitly used to help set the algorithm. The unreasonableness of this is demonstrated by the NFL theorem, which illustrates that even if we do know something about $f$ (perhaps specified through $P(f)$), if we fail to explicitly incorporate that knowledge into $a$ then we have no assurances that $a$ will be effective; we are simply relying on a fortuitous matching between $f$ and $a$. This point is formally established in sections 3 and 8, which make no assumptions whatsoever concerning $P(f)$.

Many would readily agree that $a$ must match $P(f)$; that statement borders on the obvious. Similarly, it may seem obvious that if one uniformly averages over all $f$, then all algorithms are equal. (The only reason it takes a whole subsection to establish this formally is because there are a large number of "obvious" things that must be mathematicized.) Yet the implications of the statement are not so obvious; it is extremely easy to contradict them without realizing you are doing so. This is why, for example, it can be surprising that hill-climbing and hill-descending are equivalent on average, or that "smart" choosing procedures perform no better than "dumb" ones (see section 8). In addition, the geometric nature of the matching illustrates some interesting aspects of the search problem (see below).

We emphasize that taking uniform averages over $f$'s is simply a tool for investigating search. It is the only starting point we could think of for investigating the "skeleton" of the search problem, before (assumptions for) the actual distributions in the real world are put in. It should be obvious that we are not claiming that all $f$'s are equally likely in the real world, and the significance of the NFL theorem in no way depends on the validity of such a claim.

Results for non-uniform $P(f)$ are discussed below, after the proof of the NFL theorem.
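The claim that hill-climbing and hill-descending are equivalent on average, even when the goal is a maximum, can be illustrated numerically on a toy problem. The sketch below is our own construction, not part of the analysis: `make_walker` builds crude stand-ins for a hill-climber and a hill-descender on an assumed one-dimensional $X$ with 5 points and $|Y| = 3$, and the performance measure is taken to be the largest cost value found after $m = 3$ distinct evaluations.

```python
from fractions import Fraction
from itertools import product

X = range(5)          # assumed search space: points on a line
Y = range(3)          # assumed cost values
m = 3                 # number of distinct cost evaluations


def make_walker(prefer_high):
    """Crude hill-climber (prefer_high=True) or hill-descender (prefer_high=False)."""
    pick = max if prefer_high else min

    def a(d):
        if not d:
            return 0                                  # both walkers start at x = 0
        visited = {x for x, _ in d}
        anchor = pick(d, key=lambda p: p[1])[0]       # highest- or lowest-cost point so far
        return min((x for x in X if x not in visited),
                   key=lambda x: (abs(x - anchor), x))  # nearest unvisited point
    return a


def average_best(a):
    """Average, over all cost functions, of the maximum cost value found in m evaluations."""
    total, n = 0, 0
    for f in product(Y, repeat=len(X)):
        d = []
        for _ in range(m):
            x = a(d)
            d.append((x, f[x]))
        total += max(y for _, y in d)
        n += 1
    return Fraction(total, n)


avg_climb = average_best(make_walker(True))
avg_descend = average_best(make_walker(False))
print(avg_climb, avg_descend)                         # 5/3 5/3
```

Both averages equal $5/3$, which is also the expected maximum of three independent uniform draws from $\{0, 1, 2\}$; averaged over all $f$, moving toward high cost values buys nothing.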
2.1 Proof for deterministic search

We now show that $\sum_f P(\vec{c} \mid f, m, a)$ has no dependence on $a$. Conceptually, the proof is quite simple; the only reason it takes so long is because there is some book-keeping involved. In addition, because many of our readers may not be conversant with the techniques of probability theory we supply all the details, lengthening it considerably.

The intuition is simple: by summing over all $f$ the past performance of an algorithm has no bearing on its future performance so that all algorithms perform equally. The proof involves the following steps: First, we reduce the distribution over $\vec{c}$ values to one over $d^y_m$ values. Then we use induction to establish the $a$-independence of the distribution over $d^y_m$. The inductive step starts by rearranging the distributions in question. Then $f$ is broken up into two independent parts, one for $x \in d^x_m$ and one for $x \notin d^x_m$. These are evaluated separately, giving the desired result.

Expanding over all possible $y$ components of a population of size $m$, $d^y_m$, we see
\[
\sum_f P(\vec{c} \mid f, m, a) = \sum_{f, d^y_m} P(\vec{c}, d^y_m \mid f, m, a).
\]
Now $P(\vec{c}, d^y_m \mid f, m, a) = P(\vec{c} \mid d^y_m, f, m, a)\, P(d^y_m \mid f, m, a)$. Moreover, the probability of obtaining a histogram $\vec{c}$ given $f$, $d$, $m$ and $a$, $P(\vec{c} \mid d^y_m, f, m, a)$, depends only on the $y$ values of population $d_m$. Therefore
\[
\sum_f P(\vec{c} \mid f, m, a) = \sum_{f, d^y_m} P(\vec{c} \mid d^y_m)\, P(d^y_m \mid f, m, a)
= \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_f P(d^y_m \mid f, m, a). \tag{2}
\]

To prove that the expression in Eq. (2) is independent of $a$ it suffices to show that for all $m$ and $d^y_m$, $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$, since $P(\vec{c} \mid d^y_m)$ is independent of $a$. We will prove this by induction on $m$.

For $m = 1$ we write the population as $d_1 = \{d^x_1, f(d^x_1)\}$ where $d^x_1$ is set by $a$. The only possible value for $d^y_1$ is $f(d^x_1)$, so we have:
\[
\sum_f P(d^y_1 \mid f, m = 1, a) = \sum_f \delta(d^y_1, f(d^x_1))
\]
where $\delta$ is the Kronecker delta function.

Now when we sum over all possible cost functions, $\delta(d^y_1, f(d^x_1))$ is 1 only for those functions which have cost $d^y_1$ at point $d^x_1$. Therefore that sum equals $|Y|^{|X|-1}$, independent of $d^x_1$:
\[
\sum_f P(d^y_1 \mid f, m = 1, a) = |Y|^{|X|-1}
\]
which is independent of $a$. This bases the induction.

We now establish the inductive step, that if $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$ for all $d^y_m$, then so also is $\sum_f P(d^y_{m+1} \mid f, m+1, a)$. This will complete the proof of the NFL result.
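As a small worked instance of the base case (an illustration with assumed values, not part of the proof), take $X = \{x_1, x_2\}$ and $Y = \{0, 1\}$, so that there are $|Y|^{|X|} = 4$ cost functions, written here as $f = (f(x_1), f(x_2))$. Suppose the algorithm's first point is $d^x_1 = x_1$ and we ask for $d^y_1 = 0$:

```latex
\begin{align*}
\sum_f P(d^y_1 = 0 \mid f, m = 1, a)
  &= \sum_f \delta\bigl(0, f(x_1)\bigr) \\
  &= \underbrace{1}_{f = (0,0)} + \underbrace{1}_{f = (0,1)}
   + \underbrace{0}_{f = (1,0)} + \underbrace{0}_{f = (1,1)} \\
  &= 2 = |Y|^{|X| - 1}.
\end{align*}
```

An algorithm that instead starts at $x_2$ picks out the functions $(0,0)$ and $(1,0)$, which again gives 2. The count does not depend on which first point $a$ chooses, which is the $a$-independence asserted for $m = 1$.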
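We start by writing
\begin{align*}
P(d^y_{m+1} \mid f, m+1, a) &= P(\{d^y_{m+1}(1), \ldots, d^y_{m+1}(m)\}, d^y_{m+1}(m+1) \mid f, m+1, a) \\
&= P(d^y_m, d^y_{m+1}(m+1) \mid f, m+1, a) \\
&= P(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a)
\end{align*}
so we have
\[
\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_f P(d^y_{m+1}(m+1) \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a).
\]

The new $y$ value, $d^y_{m+1}(m+1)$, will depend on the new $x$ value, $f$ and nothing else. So we expand over these possible $x$ values, getting
\begin{align*}
\sum_f P(d^y_{m+1} \mid f, m+1, a) &= \sum_{f, x} P(d^y_{m+1}(m+1) \mid f, x)\, P(x \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a) \\
&= \sum_{f, x} \delta(d^y_{m+1}(m+1), f(x))\, P(x \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a).
\end{align*}

Next note that since $x = a(d^x_m, d^y_m)$, it does not depend directly on $f$. Consequently we expand in $d^x_m$ to remove the $f$ dependence in $P(x \mid d^y_m, f, m+1, a)$:
\begin{align*}
\sum_f P(d^y_{m+1} \mid f, m+1, a) &= \sum_{f, x, d^x_m} \delta(d^y_{m+1}(m+1), f(x))\, P(x \mid d_m, a)\, P(d^x_m \mid d^y_m, f, m+1, a)\, P(d^y_m \mid f, m+1, a) \\
&= \sum_{f, d^x_m} \delta(d^y_{m+1}(m+1), f(a(d_m)))\, P(d_m \mid f, m, a)
\end{align*}
where use was made of the fact that $P(x \mid d_m, a) = \delta(x, a(d_m))$ and the fact that $P(d_m \mid f, m+1, a) = P(d_m \mid f, m, a)$.

We do the sum over cost functions $f$ first. The cost function is defined both over those points restricted to $d^x_m$ and those points outside of $d^x_m$. $P(d_m \mid f, m, a)$ will depend on the $f$ values defined over points inside $d^x_m$ while $\delta(d^y_{m+1}(m+1), f(a(d_m)))$ depends only on the $f$ values defined over points outside $d^x_m$. (Recall that $a(d_m) \notin d^x_m$.) So we have
\[
\sum_f P(d^y_{m+1} \mid f, m+1, a) = \sum_{d^x_m} \sum_{f(x \in d^x_m)} P(d_m \mid f, m, a) \sum_{f(x \notin d^x_m)} \delta(d^y_{m+1}(m+1), f(a(d_m))). \tag{3}
\]

The sum $\sum_{f(x \notin d^x_m)}$ contributes a constant, $|Y|^{|X|-m-1}$, equal to the number of functions defined over points not in $d^x_m$ passing through $(d^x_{m+1}(m+1), f(a(d_m)))$. So
\[
\sum_f P(d^y_{m+1} \mid f, m+1, a) = |Y|^{|X|-m-1} \sum_{f(x \in d^x_m),\, d^x_m} P(d_m \mid f, m, a)
\]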
\[
\qquad = \frac{1}{|Y|} \sum_{f, d^x_m} P(d_m \mid f, m, a) = \frac{1}{|Y|} \sum_f P(d^y_m \mid f, m, a).
\]
By hypothesis the right hand side of this equation is independent of $a$, so the left hand side must also be. This completes the proof of the NFL result.

We note in passing that the proof of the NFL theorem can be used to derive a stronger result. Since the sum $\sum_f P(d^y_m \mid f, m, a)$ is independent of $a$, it follows that the histograms of cost values after $m$ steps must also be independent of $a$. However, it also follows that the distributions over time ordered populations (the $d^y_m$) are identical for all $a$. So when the ordering of cost values is important (e.g. when you would like to get to low cost quickly) there is still no way to distinguish between algorithms when we average over all $f$.

2.2 More general kinds of search

There are two restrictions on the definition of search algorithms used so far that one might find objectionable. These are: i) the banning of algorithms that might revisit the same points in $X$ after placing them in $d^x$; and ii) the banning of algorithms that work stochastically rather than deterministically. Fortunately, the NFL result can easily be extended to include algorithms that revisit points and/or algorithms that are stochastic. So there is no loss of generality in our definition of a "search algorithm".

To see this, say we have a deterministic algorithm $a : d \in D \to \{x \mid x \in X\}$, so that given some (perhaps empty) $d$, the algorithm might produce a point $x \in d^x$. Call such an algorithm "potentially retracing". Given a potentially retracing algorithm $a$, produce a new algorithm $a'$ by "skipping over all duplications" in the sequence of $\{x, y\}$ pairs produced by the potentially retracing algorithm. Formally, for any $d$, $a'(d)$ is defined as the first $x$ value from the sequence $\{a(\emptyset), a(d), a(a(d)), \ldots\}$ that is not contained in $d^x$. So long as the original algorithm $a$ cannot get stuck forever in some subset of $d$, we can always produce such an $a'$ from $a$. (We can find no reason to design one's algorithm to not have an "escape mechanism" that ensures that it cannot get stuck forever in some subset of $d$.) We will say that $a'$ is a "compacted" version of $a$.

Now any two compacted algorithms are "search algorithms" in the sense the term is used in the previous subsection. Therefore they obey the NFL result of that subsection. So the NFL result in Eq. (1) holds even for potentially retracing algorithms, if we redefine '$m$' in that equation to be the number of distinct points in the $d^x$'s produced by the algorithms in question, and if we redefine '$\vec{c}$' to be the histogram corresponding to those $m$ distinct points.

Moreover, our real-world cost in using an algorithm is usually set by the number of distinct evaluations of $f(x)$. So it makes sense to compare potentially retracing algorithms not by looking at the $d$'s they produce after being run the same number of times, but rather by looking at the $d$'s they produce after sampling $f(x)$ the same number of times. This is consistent with using our redefined $m$ and $\vec{c}$.
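The "skipping over all duplications" construction can be sketched as follows. This is a minimal Python illustration under our own assumptions: the retracing algorithm `pingpong`, the helper `compact`, the `max_steps` guard standing in for the "escape mechanism" requirement, and the particular cost values are all hypothetical. The retracing algorithm is allowed to see its full trace, repeats included, while the compacted population keeps only the first occurrence of each point.

```python
N = 5                      # assumed size of the search space X = {0, ..., 4}
f = [5, 3, 4, 1, 2]        # one arbitrary cost function, indexed by x


def pingpong(trace):
    """Hypothetical retracing algorithm: returns to x = 0 on every other step."""
    t = len(trace)
    return 0 if t % 2 == 0 else ((t + 1) // 2) % N


def compact(a, f, m, max_steps=10_000):
    """Run the retracing algorithm a until m distinct points have been evaluated.

    Returns the compacted population (duplicates skipped) and the total number
    of steps taken, repeats included. Assumes m >= 1.
    """
    trace = []             # every (x, y) pair a produces, repeats included
    d = []                 # compacted population: first occurrence of each x
    seen = set()
    for _ in range(max_steps):
        x = a(trace)
        trace.append((x, f[x]))
        if x not in seen:
            seen.add(x)
            d.append((x, f[x]))
            if len(d) == m:
                return d, len(trace)
    raise RuntimeError("algorithm appears stuck in a subset of visited points")


d, steps = compact(pingpong, f, m=3)
print(d, steps)            # [(0, 5), (1, 3), (2, 4)] 4
```

Here three distinct evaluations cost four steps of the original algorithm; under the redefinition above, it is the three distinct points, not the four steps, that fix $m$ and $\vec{c}$.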
Note that the $x$ at which a potentially retracing algorithm breaks out of a cycle might be stochastic (e.g. simulated annealing). In this case the compacted version of the algorithm is still well-defined. Only rather than being deterministic, that compacted algorithm is stochastic. This brings us to the general issue of how to adapt our analysis to address stochastic search algorithms.

Let $\sigma$ be a stochastic non-potentially retracing algorithm. Formally, this means that $\sigma$ is a mapping taking any $d$ to a ($d$-dependent) distribution over $X$ that equals zero for all $x \in d^x$. So $\sigma$ can be viewed as a "hyper-parameter", specifying the function $P(d^x_{m+1}(m+1) \mid d_m, \sigma)$ for all $m$ and $d$.

Given this definition of $\sigma$, we can follow along with the derivation of the NFL result for deterministic algorithms, just with $a$ replaced by $\sigma$ throughout. Doing so, everything still holds. So that NFL result holds even for stochastic search algorithms. Therefore, by the same reasoning used to establish the no-free-lunch result for potentially retracing deterministic algorithms, the no-free-lunch result holds for potentially retracing stochastic algorithms.

3 A geometric interpretation

Intuitively, the NFL theorem illustrates that even if we know something about $f$ (perhaps specified through $P(f)$) but don't incorporate that knowledge into $a$, then we have no assurances that $a$ will be effective; we are simply relying on a fortuitous matching between $f$ and $a$. This point is formally established by viewing the NFL theorem from a geometric perspective.

Consider the space of possible cost functions. As mentioned before, the probability of obtaining some histogram, $\vec{c}$, given $m$ distinct cost evaluations using algorithm $a$ is
\[
P(\vec{c} \mid m, a) = \sum_f P(\vec{c} \mid m, a, f)\, P(f)
\]
where $P(f)$ is the prior probability that the optimization problem at hand has cost function $f$. We can view the right-hand side of this equality as an inner product in $F$:

Theorem: Define the $F$-space vectors $\vec{v}_{\vec{c},a,m}$ and $\vec{p}$ by $\vec{v}_{\vec{c},a,m}(f) \equiv P(\vec{c} \mid m, a, f)$ and $\vec{p}(f) \equiv P(f)$. Then
\[
P(\vec{c} \mid m, a) = \vec{v}_{\vec{c},a,m} \cdot \vec{p}. \tag{4}
\]

This is an important equation. Any global knowledge you have about the properties of your cost function goes into the prior, $\vec{p}$, over cost functions. $\vec{c}$ can be viewed as fixed to the histogram you want to obtain (usually one with a low cost value), and $m$ is given by the constraints on the time we have to run our optimization algorithm. Thus the optimal algorithm is that which has the largest projection onto $\vec{p}$. Alternatively, we can dispense
with $\vec{c}$ by averaging over it, to see that $E(\vec{c} \mid m, a)$ is an inner product between $\vec{p}(f)$ and $E(\vec{c} \mid m, a, f)$. (Similarly for any "performance measure" $\Phi(\vec{c})$.) In either case, we see that $P(f)$ must "match" $a$.

Of course, exploiting this in practice is a difficult exercise. Even writing down a reasonable $P(f)$ can be difficult. Consider, for example, doing TSP problems with $N$ cities. So we're only considering cost functions that correspond to such a problem. Now to the degree that any practitioner would attack all $N$-city TSP cost functions with the same algorithm, that practitioner implicitly ignores distinctions between such cost functions. In this, that practitioner has implicitly agreed that the problem is one of how their fixed algorithm does across the set of $N$-city TSP cost functions, rather than of how well their algorithm does for some particular $N$-city TSP problem they have at hand. In other words, they are acting as though the cost function were not fixed, but is instead described by a $P(f)$ that equals 0 for all cost functions other than $N$-city TSP cost functions. However the details of $P(f)$, beyond the fact that it is restricted to $N$-city TSP problems, may be very difficult to disentangle.

Taking the geometric view, the no free lunch result that $\sum_f P(\vec{c} \mid f, m, a)$ is independent of $a$ has the simple interpretation that for a particular $\vec{c}$ and $m$, all algorithms $a$ have the same projection onto the diagonal, that is $\vec{v}_{\vec{c},a,m} \cdot \vec{1} = \mathrm{cst}(\vec{c}, m)$. For deterministic algorithms the components of $\vec{v}_{\vec{c},a,m}$ (i.e., the probabilities that algorithm $a$ gives histogram $\vec{c}$ on cost function $f$ after $m$ distinct cost evaluations) are all either 0 or 1 so the no free lunch result also implies $\sum_f P^2(\vec{c} \mid m, a, f) = \mathrm{cst}(\vec{c}, m)$. Geometrically, this means that the length of $\vec{v}_{\vec{c},a,m}$ is independent of $a$.

Thus all vectors $\vec{v}_{\vec{c},a,m}$ have the same length and lie on a cone with constant projection onto $\vec{1}$. Because the components of $\vec{v}_{\vec{c},a,m}$ are binary we might also view $\vec{v}_{\vec{c},a,m}$ as lying on the subset of the Boolean hypercube having the same Hamming distance from $\vec{0}$.

Now restrict attention to the set of algorithms that have the same probability of some particular $\vec{c}$. The algorithms in this set must lie in the intersection of 2 cones: one about the diagonal, set by the no-free-lunch theorem, and one set by having the same probability for $\vec{c}$. This is in general an $|F| - 2$ dimensional manifold (where we recall that $|F| \equiv |Y|^{|X|}$ is the number of possible cost functions). If we require equality of probability on yet more $\vec{c}$, we get yet more constraints.

In Section 5 we calculate two quantities concerning the distribution of $\vec{v}_{\vec{c},a,m}$ across vertices of this hypercube.

4 Minimax distinctions between algorithms

The NFL theorem does not address minimax properties of search. For example, say we're considering two deterministic algorithms, $a_1$ and $a_2$. It may very well be that there exist cost functions $f$ such that $a_1$'s histogram is much better (according to some appropriate quality measure) than $a_2$'s, but no cost functions for which the reverse is true. For the NFL theorem to be obeyed in such a scenario, it would have to be true that there are many more $f$ for which $a_2$'s histogram is better than $a_1$'s than vice-versa, but it is only slightly better for all those $f$. For such a scenario, in a certain sense $a_1$ has better "head-to-head"
minimax behavior than $a_2$; there are $f$ for which $a_1$ beats $a_2$ badly, but none for which $a_1$ does substantially worse than $a_2$.

Formally, we say that there exist head-to-head minimax distinctions between two algorithms $a_1$ and $a_2$ iff there exists a $k$ such that for at least one $f$, $E(\vec{c} \mid f, m, a_1) - E(\vec{c} \mid f, m, a_2) = k$, but there is no $f$ such that $E(\vec{c} \mid f, m, a_2) - E(\vec{c} \mid f, m, a_1) = k$. (A similar definition can be used if one is instead interested in $\Phi(\vec{c})$ or $d^y_m$ rather than $\vec{c}$.)

It appears that analyzing head-to-head minimax properties of algorithms is substantially more difficult than analyzing average behavior (like in the NFL theorem). Presently, very little is known about minimax behavior involving stochastic algorithms. In particular, it is not known if in some sense a stochastic version of a deterministic algorithm has better/worse minimax behavior than that deterministic algorithm. In fact, even if we stick completely to deterministic algorithms, only an extremely preliminary understanding of minimax issues has been reached.

What we do know is the following. Consider the quantity
\[
\sum_f P_{d^y_{m,1}, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2),
\]
for deterministic algorithms $a_1$ and $a_2$ (by $P_A(a)$ is meant the distribution of a random variable $A$ evaluated at $A = a$). For deterministic algorithms, this quantity is just the number of $f$ such that it is both true that $a_1$ produces a population with $Y$ components $z$ and that $a_2$ produces a population with $Y$ components $z'$.

In appendix B, it is proven by example that this quantity need not be symmetric under interchange of $z$ and $z'$:

Theorem: In general,
\[
\sum_f P_{d^y_{m,1}, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2) \neq \sum_f P_{d^y_{m,1}, d^y_{m,2}}(z', z \mid f, m, a_1, a_2). \tag{5}
\]

This means that under certain circumstances, even knowing only the $Y$ components of the populations produced by two algorithms run on the same (unknown) $f$, we can infer something concerning what algorithm produced each population.

Now consider the quantity
\[
\sum_f P_{C_1, C_2}(z, z' \mid f, m, a_1, a_2),
\]
again for deterministic algorithms $a_1$ and $a_2$. This quantity is just the number of $f$ such that it is both true that $a_1$ produces a histogram $z$ and that $a_2$ produces a histogram $z'$. It too need not be symmetric under interchange of $z$ and $z'$ (see appendix B). This is a stronger statement than the asymmetry of $d^y$'s statement, since any particular histogram corresponds to multiple populations.

It would seem that neither of these two results directly implies that there are algorithms $a_1$ and $a_2$ such that for some $f$ $a_1$'s histogram is much better than $a_2$'s, but for no $f$'s is the reverse true. To investigate this problem involves looking over all pairs of histograms (one
for each $f$) such that there is the same relative "quality" between both histograms. Simply having an inequality between the sums presented above does not seem to directly imply that the relative quality between the associated pair of histograms is asymmetric. (To formally establish this would involve creating scenarios in which there is an inequality between the sums, but no head-to-head minimax distinctions. Such an analysis is beyond the scope of this paper.)

On the other hand, having the sums equal does carry obvious implications for whether there are head-to-head minimax distinctions. For example, if both algorithms are deterministic, then for any particular $f$, $P_{d^y_{m,1}, d^y_{m,2}}(z_1, z_2 \mid f, m, a_1, a_2)$ equals 1 for one $(z_1, z_2)$ pair, and 0 for all others. In such a case, $\sum_f P_{d^y_{m,1}, d^y_{m,2}}(z_1, z_2 \mid f, m, a_1, a_2)$ is just the number of $f$ that result in the pair $(z_1, z_2)$. So $\sum_f P_{d^y_{m,1}, d^y_{m,2}}(z, z' \mid f, m, a_1, a_2) = \sum_f P_{d^y_{m,1}, d^y_{m,2}}(z', z \mid f, m, a_1, a_2)$ implies that there are no head-to-head minimax distinctions between $a_1$ and $a_2$. The converse does not appear to hold however. [Footnote 1]

Footnote 1: Consider the grid of all $(z, z')$ pairs. Assign to each grid point the number of $f$ that result in that grid point's $(z, z')$ pair. Then our constraints are i) by the hypothesis that there are no head-to-head minimax distinctions, if grid point $(z_1, z_2)$ is assigned a non-zero number, then so is $(z_2, z_1)$; and ii) by the no-free-lunch theorem, the sum of all numbers in row $z$ equals the sum of all numbers in column $z$. These two constraints do not appear to imply that the distribution of numbers is symmetric under interchange of rows and columns. Although again, like before, to formally establish this point would involve explicitly creating search scenarios in which it holds.

As a preliminary analysis of whether there can be head-to-head minimax distinctions, we can exploit the result in appendix B, which concerns the case where $|X| = |Y| = 3$. First, define the following measure of the "quality" over two-element populations, $Q(d^y_2)$:

i) $Q(y_2, y_3) = Q(y_3, y_2) = 2$.
ii) $Q(y_1, y_2) = Q(y_2, y_1) = 0$.
iii) $Q$ of any other argument $= 1$.

In appendix B we show that for this scenario there exist pairs of algorithms $a_1$ and $a_2$ such that for one $f$ $a_1$ generates the histogram $\{y_1, y_2\}$ and $a_2$ generates the histogram $\{y_2, y_3\}$, but there is no $f$ for which the reverse occurs (i.e., there is no $f$ such that $a_1$ generates the histogram $\{y_2, y_3\}$ and $a_2$ generates $\{y_1, y_2\}$).

So in this scenario, with our defined measure of "quality", there are minimax distinctions between $a_1$ and $a_2$. For one $f$ the qualities of algorithms $a_1$ and $a_2$ are respectively 0 and 2. The difference in the $Q$ values for the two algorithms is 2 for that $f$. However there are no other $f$ for which the difference is $-2$. For this $Q$ then, algorithm $a_2$ is minimax superior to algorithm $a_1$.

It is not currently known what restrictions on $Q(d^y_m)$ are needed for there to be minimax distinctions between the algorithms. As an example, it may well be that for $Q(d^y_m) = \max_i \{d^y_m(i)\}$ there are no minimax distinctions between algorithms.

More generally, at present nothing is known about "how big a problem" these kinds of asymmetries are. All of the examples of the asymmetries arise when the set of $x$ values $a_1$
those\certainproperties"isnotyetinhand.norisitknownhowgenerictheyare,i.e.,for ofhowthealgorithmsgeneratedtheoverlap,asymmetryarises.aprecisespecicationof whatpercentageofpairsofalgorithmstheyarise.althoughsuchissuesareeasytostate hasvisitedoverlapswiththosethata2hasvisited.givensuchoverlap,andcertainproperties (seeappendixb),itisnotatallclearhowbesttoanswerthem. donotoverlap.suchassuranceshold,forexample,ifwearecomparingtwohill-climbing assurances,therearenoasymmetriesbetweenthetwoalgorithmsform-elementpopulations. algorithmsthatstartfarapart(onthescaleofm)inx.itturnsoutthatgivensuch Howeverconsiderthecasewhereweareassuredthatinmstepstwoparticularalgorithms Doingthisestablishesthefollowing: thoseargumentstothequantitypfpdym;1;dym;2(z;z0jf;m;a1;a2)ratherthanp(~cjf;m;a). Toseethisformally,gothroughtheargumentusedtoprovetheNFLtheorem,butapply Theorem:Ifthereisnooverlapbetweendxm;1anddxm;2,then Animmediateconsequenceofthistheoremisthatundertheno-overlapconditions,PfPC1;C2(z;z0j XfPdym;1;dym;2(z;z0jf;m;a1;a2)=XfPdym;1;dym;2(z0;zjf;m;a1;a2): (6) f;m;a1;a2)issymmetricunderinterchangeofzandz0,asarealldistributionsdetermined isalwaysoverlaptoconsider.sothereisalwaysthepossibilityofasymmetrybetween extrema). fromthisoneoverc1andc2(e.g.,thedistributionoverthedierencebetweenthosec's algorithmsifoneofthemisstochastic. Notethatwithstochasticalgorithms,iftheygivenon-zeroprobabilitytoalldxm,there Werstcalculatethefractionofcostfunctionswhichgiverisetoaspecichistogram~cusing 5algorithmawithmdistinctcostpoints.Thiscalculationallowsus,forexample,toanswer Informationtheoreticaspectsofsearch thefollowingquestion: distinctcostevaluationschosenbyusingageneticalgorithm?" \Whatfractionofcostfunctionswillgiveaparticulardistributionofcostvaluesafterm thisbecauseitmeansthatthefractionisindependentofthealgorithm!sowecananswer thequestionbyusinganalgorithmforwhichthecalculationisparticularlyeasy. 
Thismayseemanintractablequestion,buttheNFLresultallowsustoanswerit.Itdoes x1;x2;:::;xm.recallthatthehistogram~cisspeciedbygivingthefrequenciesofoccurrence, acrossthex1;x2;:::;xm,foreachofthejyjpossiblecostvalues. ThealgorithmwewilluseisonewhichvisitspointsinXinsomecanonicalorder,say justthemultinomialgivingthenumberofwaysofdistributingthecostvaluesin~c.atthe remainingjxj mpointsinxthecostcanassumeanyofthejyjfvalues. Nowthenumberoff'sgivingthedesiredhistogramunderourspeciedalgorithmis 14
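The counting argument above can be checked by brute force on a tiny search space. The sketch below is only an illustration: the function names, the two example search rules, and the sizes $|X| = 4$, $|Y| = 3$, $m = 3$ are assumptions made for the example, not part of the original analysis. It enumerates every cost function, runs a deterministic non-retracing rule for $m$ steps, and compares the observed fraction for each histogram with the multinomial coefficient divided by $|Y|^m$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def histogram_fractions(n_x, n_y, m, algorithm):
    """Fraction of all f: X -> Y giving each histogram after m distinct evaluations."""
    counts = Counter()
    for f in product(range(n_y), repeat=n_x):  # f[x] is the cost at point x
        visited, costs = [], []
        for _ in range(m):
            x = algorithm(visited, costs, n_x)
            visited.append(x)
            costs.append(f[x])
        hist = tuple(costs.count(y) for y in range(n_y))
        counts[hist] += 1
    total = n_y ** n_x
    return {hist: Fraction(k, total) for hist, k in counts.items()}


def multinomial_fraction(hist):
    """Multinomial coefficient of the histogram divided by |Y|**m."""
    m = sum(hist)
    coeff = factorial(m)
    for c in hist:
        coeff //= factorial(c)
    return Fraction(coeff, len(hist) ** m)


def canonical_order(visited, costs, n_x):
    """Visit x = 0, 1, 2, ... regardless of the costs seen (the easy case)."""
    return len(visited)


def adaptive_rule(visited, costs, n_x):
    """Hypothetical adaptive rule: the next point depends on the last cost seen."""
    if not visited:
        return n_x - 1
    unvisited = [x for x in range(n_x) if x not in visited]
    return unvisited[0] if costs[-1] == 0 else unvisited[-1]


if __name__ == "__main__":
    easy = histogram_fractions(4, 3, 3, canonical_order)
    hard = histogram_fractions(4, 3, 3, adaptive_rule)
    print(easy == hard)
    print(all(multinomial_fraction(h) == p for h, p in easy.items()))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is an equality test rather than a floating-point tolerance check.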
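It will be convenient to define $\vec{\alpha} \equiv \frac{1}{m}\vec{c}$. Note that this is invariant if the contents of all bins in $\vec{c}$ are scaled by the same amount. By the argument of the preceding paragraph, the fraction we are interested in, $\rho_f(\vec{\alpha})$, is given by the following:

Theorem: For any algorithm, the fraction of cost functions that result in the histogram $\vec{c} = m\vec{\alpha}$ is given by

$$\rho_f(\vec{\alpha}) = \frac{\binom{m}{c_1\, c_2 \cdots c_{|Y|}}\, |Y|^{|X|-m}}{|Y|^{|X|}} = \frac{\binom{m}{c_1\, c_2 \cdots c_{|Y|}}}{|Y|^{m}}. \qquad (7)$$

Accordingly, $\rho_f(\vec{\alpha})$ can be related to the entropy of $\vec{c}$ in the standard way by using Stirling's approximation to order $O(1/m)$, which is valid when all of the $c_i$ are large:

$$\ln \binom{m}{c_1\, c_2 \cdots c_{|Y|}} = m \ln m - \sum_{i=1}^{|Y|} c_i \ln c_i + \frac{1}{2}\Big[\ln m - \sum_{i=1}^{|Y|} \ln c_i\Big] = m S(\vec{\alpha}) + \frac{1}{2}\Big[(1 - |Y|)\ln m - \sum_{i=1}^{|Y|} \ln \alpha_i\Big],$$

where $S(\vec{\alpha}) = -\sum_{i=1}^{|Y|} \alpha_i \ln \alpha_i$ is the entropy of the histogram $\vec{c}$. Thus for large enough $m$, the fraction of cost functions is given by the following formula:

Corollary:

$$\rho_f(\vec{\alpha}) = C(m, |Y|)\, \frac{e^{m S(\vec{\alpha})}}{\prod_{i=1}^{|Y|} \alpha_i^{1/2}}, \qquad (8)$$

where $C(m, |Y|)$ is a constant depending only on $m$ and $|Y|$.

If some of the $\alpha_i$ are 0, Eq. (8) still holds, only with $Y$ redefined to exclude the $y$'s corresponding to the zero-valued $\alpha_i$. However $Y$ is defined, the normalization constant of Eq. (8) can be found by summing over all $\vec{\alpha}$ lying on the unit simplex. The details of such a calculation can be found in [15].

We next turn to a related question:

"On a given vertex of $\mathcal{F}$-space (i.e., for a given cost function), what is the fraction of all algorithms that give rise to a particular $\vec{c}$?"

For this question, the only salient feature of $f$ is its histogram (formed by looking across all $X$) of cost values. Specify this histogram by $\vec{N}$; there are $N_i = \beta_i |X|$ points in $X$ for which $f(x)$ has the $i$'th $Y$ value.

Call the fraction we are interested in $\rho_{alg}(\vec{\alpha}, \vec{\beta})$. It turns out that $\rho_{alg}(\vec{\alpha}, \vec{\beta})$ depends to leading order on the Kullback-Leibler "distance" [3] between $\vec{\alpha}$ and $\vec{\beta}$. To see this, we start with the following intuitively reasonable result, formally proven in appendix A: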
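Theorem: For a given $f$ with histogram $\vec{N} = |X|\vec{\beta}$, the fraction of algorithms that give rise to a histogram $\vec{c} = m\vec{\alpha}$ is given by

$$\rho_{alg}(\vec{\alpha}, \vec{\beta}) = \frac{\prod_{i=1}^{|Y|} \binom{N_i}{c_i}}{\binom{|X|}{m}}. \qquad (9)$$

The normalization factor in the denominator is simply the number of ways of selecting $m$ cost values from $X$.$^{2}$

$^{2}$ It can also be determined from the identity $\sum_{\vec{c}} \delta\big(\sum_i c_i, m\big) \prod_i \binom{N_i}{c_i} = \binom{\sum_i N_i}{m}$.

The product of binomials can be approximated via Stirling's equation when both $N_i$ and $c_i$ are large:

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = \sum_{i=1}^{|Y|} \Big[ -\tfrac{1}{2}\ln 2\pi + N_i \ln N_i - c_i \ln c_i - (N_i - c_i)\ln(N_i - c_i) + \tfrac{1}{2}\big(\ln N_i - \ln(N_i - c_i) - \ln c_i\big) \Big].$$

We assume $c_i / N_i \ll 1$, which is reasonable when $m \ll |X|$. So using the expansion $\ln(1 - z) = -z - z^2/2 - \cdots$, to second order in $c_i / N_i$ we have

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = \sum_{i=1}^{|Y|} \Big[ c_i \ln\frac{N_i}{c_i} - \tfrac{1}{2}\ln c_i + c_i - \tfrac{1}{2}\ln 2\pi - \frac{c_i}{2N_i}\big(c_i - 1 + \cdots\big) \Big].$$

In terms of $\vec{\alpha}$ and $\vec{\beta}$ we finally obtain (using $m / |X| \ll 1$)

$$\ln \prod_{i=1}^{|Y|} \binom{N_i}{c_i} = -m D_{KL}(\vec{\alpha}, \vec{\beta}) + m - m \ln\frac{m}{|X|} - \frac{|Y|}{2}\ln 2\pi - \sum_{i=1}^{|Y|} \tfrac{1}{2}\ln(\alpha_i m) + \frac{m}{2|X|} \sum_{i=1}^{|Y|} \frac{\alpha_i}{\beta_i}\big(1 - \alpha_i m + \cdots\big),$$

where $D_{KL}(\vec{\alpha}, \vec{\beta}) \equiv \sum_i \alpha_i \ln(\alpha_i / \beta_i)$ is the Kullback-Leibler distance between the distributions $\vec{\alpha}$ and $\vec{\beta}$. Thus the fraction of algorithms is given by the following:

Corollary:

$$\rho_{alg}(\vec{\alpha}, \vec{\beta}) = C(m, |X|, |Y|)\, \frac{e^{-m D_{KL}(\vec{\alpha}, \vec{\beta})}}{\prod_{i=1}^{|Y|} \alpha_i^{1/2}}, \qquad (10)$$

where the constant $C$ depends only on $m$, $|X|$, and $|Y|$. As before, $C$ can be calculated by summing $\vec{\alpha}$ over the unit simplex.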
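The product-of-binomials fraction above can be checked numerically on a small example. The sketch below makes one explicit assumption, namely that every set of $m$ distinct points of $X$ is visited by the same number of algorithms, so that counting algorithms reduces to counting $m$-element subsets of $X$. The function names and the example cost function are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def alg_fraction_formula(N, c):
    """Product of binomials C(N_i, c_i) divided by C(|X|, m)."""
    numerator = 1
    for n_i, c_i in zip(N, c):
        numerator *= comb(n_i, c_i)
    return Fraction(numerator, comb(sum(N), sum(c)))


def alg_fraction_enumerated(f, n_y, m):
    """Histogram fractions obtained by listing every m-subset of X.

    Assumes each m-subset is visited by equally many algorithms.
    """
    counts = Counter()
    for subset in combinations(range(len(f)), m):
        hist = [0] * n_y
        for x in subset:
            hist[f[x]] += 1
        counts[tuple(hist)] += 1
    total = comb(len(f), m)
    return {hist: Fraction(k, total) for hist, k in counts.items()}


if __name__ == "__main__":
    f = (0, 0, 0, 1, 1, 2)                     # example cost function on |X| = 6 points
    N = tuple(f.count(y) for y in range(3))    # histogram of f: (3, 2, 1)
    for hist, p in sorted(alg_fraction_enumerated(f, 3, 3).items()):
        print(hist, p, alg_fraction_formula(N, hist))
```

For the example above the two columns printed agree for every histogram, and the fractions sum to one, which is the identity quoted in the footnote.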
6 Measures of algorithm performance

In this section we calculate certain "benchmark" performance measures that allow us to assess the efficacy of any search algorithm.

Consider the case where low cost is preferable to high cost. Then in general we are interested in $P(\min(\vec{c}) > \epsilon \mid f, m, a)$, which is the probability that the minimum cost an algorithm $a$ finds in $m$ distinct evaluations is larger than $\epsilon$, given that the cost function is $f$. We consider three quantities that are related to this conditional probability that can be used to gauge an algorithm's performance:

i) The first quantity is the average of this probability over all cost functions.
ii) The second is the form this conditional probability takes for the random algorithm, whose behavior is uncorrelated with the cost function.
iii) The third is the fraction of algorithms which, for a particular $f$ and $m$, result in a $\vec{c}$ whose minimum exceeds $\epsilon$.

These measures give us benchmarks which all truly "intelligent" algorithms should surpass when used in the real world; any algorithm that doesn't surpass them is doing a very poor job.

Recall that there are $|Y|$ distinct cost values. With no loss of generality assume the $i$'th cost value equals $i$. So cost values run from a minimum of 1 to a maximum of $|Y|$ in integer increments.

The first of our benchmark measures is

$$\frac{\sum_f P(\min(\vec{c}) > \epsilon \mid f, m, a)}{\sum_f 1} = \frac{\sum_{d^y_m, f} P(\min(d^y_m) > \epsilon \mid d^y_m)\, P(d^y_m \mid f, m, a)}{|Y|^{|X|}}, \qquad (11)$$

where in the last line we have marginalized over $y$ values of populations of size $m$ and noted that $\min(\vec{c}) = \min(d^y_m)$.

Now consider $\sum_f P(d^y_m \mid f, m, a)$. The summand equals 0 or 1 for all $f$ and deterministic $a$. In particular, it equals 1 if the following conditions are met:

i) $f(d^x_m(1)) = d^y_m(1)$
ii) $f(a[d_m(1)]) = d^y_m(2)$
iii) $f(a[d_m(1), d_m(2)]) = d^y_m(3)$
...

These restrictions will always fix the value of $f(x)$ at exactly $m$ points. $f$ is completely free at all other points. Therefore

$$\sum_f P(d^y_m \mid f, m, a) = |Y|^{|X| - m}.$$
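The counting step above can be verified directly for a small case. The sketch below is a minimal illustration under stated assumptions: the search rule `stepping_rule` is a made-up deterministic, non-retracing algorithm, and the sizes $|X| = 4$, $|Y| = 3$, $m = 2$ are chosen only to keep the enumeration tiny. It confirms that every ordered $y$-population is produced by exactly $|Y|^{|X|-m}$ cost functions, from which it follows that the fraction of cost functions whose observed minimum exceeds $\epsilon$ is $((|Y| - \epsilon)/|Y|)^m$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def run(algorithm, f, m):
    """Return the ordered y-population produced by m steps of `algorithm` on f."""
    xs, ys = [], []
    for _ in range(m):
        x = algorithm(xs, ys, len(f))
        xs.append(x)
        ys.append(f[x])
    return tuple(ys)


def stepping_rule(xs, ys, n_x):
    """Hypothetical deterministic, non-retracing search rule (illustration only)."""
    if not xs:
        return 0
    # Step by 1 after seeing the lowest cost value, by 2 otherwise,
    # then skip forward past any point already visited.
    step = 1 if ys[-1] == 1 else 2
    x = (xs[-1] + step) % n_x
    while x in xs:
        x = (x + 1) % n_x
    return x


def population_counts(algorithm, n_x, n_y, m):
    """Number of cost functions (costs 1..n_y) producing each y-population."""
    counts = Counter()
    for f in product(range(1, n_y + 1), repeat=n_x):
        counts[run(algorithm, f, m)] += 1
    return counts


def fraction_min_exceeds(counts, eps):
    """Fraction of cost functions whose observed minimum cost exceeds eps."""
    total = sum(counts.values())
    good = sum(k for ys, k in counts.items() if min(ys) > eps)
    return Fraction(good, total)


if __name__ == "__main__":
    counts = population_counts(stepping_rule, n_x=4, n_y=3, m=2)
    print(set(counts.values()))             # one value: |Y| ** (|X| - m)
    print(fraction_min_exceeds(counts, 1))  # compare with ((3 - 1) / 3) ** 2
```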
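Using this result in Eq. (11) we find

$$\frac{\sum_f P(\min(\vec{c}) > \epsilon \mid f, m)}{\sum_f 1} = \frac{1}{|Y|^m} \sum_{d^y_m} P(\min(d^y_m) > \epsilon \mid d^y_m) = \frac{1}{|Y|^m} \sum_{d^y_m \ni \min(d^y_m) > \epsilon} 1 = \frac{1}{|Y|^m}\,(|Y| - \epsilon)^m.$$

This establishes the following:

Theorem:

$$\frac{\sum_f P(\min(\vec{c}) > \epsilon \mid f, m)}{\sum_f 1} = \omega^m(\epsilon), \qquad (12)$$

where $\omega(\epsilon) \equiv 1 - \epsilon/|Y|$ is the fraction of cost lying above $\epsilon$.

An immediate corollary is the following:

Corollary: In the limit of $|Y| \to \infty$,

$$\frac{\sum_f E(\min(\vec{c}) \mid f, m)}{|Y| \sum_f 1} = \frac{1}{m + 1}. \qquad (13)$$

Proof sketch: Write $\sum_f E(\min(\vec{c}) \mid f, m) / \sum_f 1 = \sum_{\epsilon=1}^{|Y|} \epsilon\,[\omega^m(\epsilon - 1) - \omega^m(\epsilon)]$ and substitute in for $\omega(\cdot)$. Then replace $\epsilon$ throughout with $\epsilon + 1$. This turns our sum into $\sum_{\epsilon=0}^{|Y|-1} [\epsilon + 1]\big[(1 - \tfrac{\epsilon}{|Y|})^m - (1 - \tfrac{\epsilon + 1}{|Y|})^m\big]$. Next, write $|Y| = b/\Delta$ for some $b$. Multiply and divide our summand by $\Delta$. To take the limit of $\Delta \to 0$, apply L'Hopital's rule to the ratio in the summand. Next use the fact that $\Delta$ is going to 0 to cancel terms in the summand. Carrying through the algebra, and dividing by $b/\Delta$, we get a Riemann sum of the form $\frac{m}{b^2} \int_0^b dx\, x\,(1 - x/b)^{m-1}$. Evaluating the integral gives the result claimed. QED.

In a real world scenario, unless one's algorithm has its best-cost-so-far drop faster than the drop associated with these results, one might argue that that algorithm is not searching very well. After all, the algorithm is doing no better than one would expect it to for a randomly chosen cost function. (Benchmarks that take account of the actual cost function at hand are presented below.)

Next we calculate the expected minimum of the cost values in the population as a function of $m$ for the random algorithm, $\tilde{a}$, which picks points in $X$ completely randomly, using no information from the current population. Marginalizing over histograms $\vec{c}$, the performance of $\tilde{a}$ is

$$P(\min(\vec{c}) \ge \epsilon \mid f, m, \tilde{a}) = \sum_{\vec{c}} P(\min(\vec{c}) \ge \epsilon \mid \vec{c})\, P(\vec{c} \mid f, m, \tilde{a}).$$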
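For the random algorithm $\tilde{a}$ just introduced, the benchmark can also be computed by direct counting, which gives a useful cross-check on a small example. The sketch below assumes that $\tilde{a}$ draws $m$ distinct points uniformly, so that the probability that every sampled cost is at least $\epsilon$ is the chance that all $m$ draws land among the points with $f(x) \ge \epsilon$. The function names and the example cost function are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_min_at_least(f, m, eps):
    """P(min of m distinct uniformly drawn costs >= eps), by counting subsets."""
    n_good = sum(1 for y in f if y >= eps)
    return Fraction(comb(n_good, m), comb(len(f), m))


def p_min_at_least_product(f, m, eps):
    """The same probability written as a product over the m successive draws."""
    size_x = len(f)
    omega = Fraction(sum(1 for y in f if y >= eps), size_x)
    result = Fraction(1)
    for i in range(m):
        result *= (omega - Fraction(i, size_x)) / (1 - Fraction(i, size_x))
    return max(result, Fraction(0))  # clamp: later factors can turn negative after a zero factor


def p_min_at_least_enumerated(f, m, eps):
    """Brute-force check: list every m-subset of X and test its minimum cost."""
    subsets = list(combinations(range(len(f)), m))
    hits = sum(1 for s in subsets if min(f[x] for x in s) >= eps)
    return Fraction(hits, len(subsets))


def expected_min(f, m):
    """Expected minimum cost, from differences of the tail probabilities."""
    return sum(
        eps * (p_min_at_least(f, m, eps) - p_min_at_least(f, m, eps + 1))
        for eps in range(1, max(f) + 1)
    )


if __name__ == "__main__":
    f = (1, 2, 2, 3, 3, 3)  # example cost function with integer costs 1..3
    print(p_min_at_least(f, 2, 2), p_min_at_least_enumerated(f, 2, 2))
    print(expected_min(f, 2))
```

The subset-counting form and the product form are algebraically the same quantity; the product form is the one that is convenient to expand in powers of $1/|X|$.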
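Now $P(\vec{c} \mid f, m, \tilde{a})$ is the probability of obtaining histogram $\vec{c}$ in $m$ random draws from the histogram $\vec{N}$ of the function $f$. (This can be viewed as the definition of $\tilde{a}$.) This probability has been calculated previously as $\prod_{i=1}^{|Y|} \binom{N_i}{c_i} / \binom{|X|}{m}$. So

$$P(\min(\vec{c}) \ge \epsilon \mid f, m, \tilde{a}) = \frac{1}{\binom{|X|}{m}} \sum_{c_1=0}^{m} \cdots \sum_{c_{|Y|}=0}^{m} \delta\Big(\sum_{i=1}^{|Y|} c_i, m\Big)\, P(\min(\vec{c}) \ge \epsilon \mid \vec{c}) \prod_{i=1}^{|Y|} \binom{N_i}{c_i}$$

$$= \frac{1}{\binom{|X|}{m}} \sum_{c_\epsilon=0}^{m} \cdots \sum_{c_{|Y|}=0}^{m} \delta\Big(\sum_{i=\epsilon}^{|Y|} c_i, m\Big) \prod_{i=\epsilon}^{|Y|} \binom{N_i}{c_i}$$

$$= \frac{\binom{\sum_{i=\epsilon}^{|Y|} N_i}{m}}{\binom{|X|}{m}} \quad \text{(see footnote 2)}$$

$$= \frac{\binom{\Omega(\epsilon)|X|}{m}}{\binom{|X|}{m}}. \qquad (14)$$

This establishes the following:

Theorem: For the random algorithm $\tilde{a}$,

$$P(\min(\vec{c}) \ge \epsilon \mid f, m, \tilde{a}) = \prod_{i=0}^{m-1} \frac{\Omega(\epsilon) - i/|X|}{1 - i/|X|}, \qquad (15)$$

where $\Omega(\epsilon) \equiv \sum_{i=\epsilon}^{|Y|} N_i / |X|$ is the fraction of points in $X$ for which $f(x) \ge \epsilon$.

To first order in $1/|X|$ this theorem gives the following result:

Corollary:

$$P(\min(\vec{c}) \ge \epsilon \mid f, m, \tilde{a}) = \Omega^m(\epsilon)\Big(1 - \frac{m(m-1)(1 - \Omega(\epsilon))}{2\,\Omega(\epsilon)}\,\frac{1}{|X|} + \cdots\Big). \qquad (16)$$

Note that these results allow us to calculate other quantities of interest, like

$$E(\min(\vec{c}) \mid f, m, \tilde{a}) = \sum_{\epsilon=1}^{|Y|} \epsilon\,\big[P(\min(\vec{c}) \ge \epsilon \mid f, m, \tilde{a}) - P(\min(\vec{c}) \ge \epsilon + 1 \mid f, m, \tilde{a})\big].$$

These results also provide a useful benchmark against which any algorithm may be compared. Note in particular that for many cost functions cost values are distributed Gaussianly. For such a case, if the mean and variance of the Gaussian are $\mu$ and $\sigma^2$ respectively, then $\Omega(\epsilon) = \mathrm{erfc}\big((\epsilon - \mu)/\sqrt{2}\sigma\big)/2$, where erfc is the complementary error function.

To calculate the third performance measure, note that for fixed $f$ and $m$, for any (deterministic) algorithm $a$, $P(\min(\vec{c}) > \epsilon \mid f, m, a)$ is either 1 or 0. Therefore the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by

$$\frac{\sum_a P(\min(\vec{c}) > \epsilon \mid f, m, a)}{\sum_a 1}.$$

Expanding in terms of $\vec{c}$, we can rewrite the numerator of this ratio as $\sum_{\vec{c}} P(\min(\vec{c}) > \epsilon \mid \vec{c}) \sum_a P(\vec{c} \mid f, m, a)$. However the ratio of this quantity to $\sum_a 1$ is exactly what we calculated when we evaluated measure ii) (see the beginning of the argument deriving Eq. (15)). This establishes the following:

Theorem: For fixed $f$ and $m$, the fraction of algorithms which result in a $\vec{c}$ whose minimum exceeds $\epsilon$ is given by the quantity on the right-hand sides of Eqs. (15) and (16), evaluated at $\epsilon + 1$ (for integer-valued costs, $\min(\vec{c}) > \epsilon$ is the same event as $\min(\vec{c}) \ge \epsilon + 1$).

So in particular, consider the scenario where, when evaluated for $\epsilon$ equal to the minimum of the $\vec{c}$ produced in a particular run of your algorithm, the quantity given in Eq. (16) is less than 1/2. For such a scenario, your algorithm has done worse than over half of all search algorithms, for the $f$ and $m$ at hand.

Finally, we present a measure explicitly designed to "track" an algorithm's performance as $m$ increases. Here we are interested in whether, as $m$ grows, there is any change in how well the algorithm's performance compares to that of the random algorithm.

Say the population generated by the algorithm $a$ after $m$ steps is $d$, and define $y' \equiv \min(\vec{c}(d))$. Let $k$ be the number of additional steps it takes the algorithm to find an $x$ such that $f(x) < y'$. Now we can estimate the number of steps it would have taken the random search algorithm to search $X - d_X$ and find a point whose $y$ was less than $y'$. The expected value of this number of steps is $1/z(d) - 1$, where $z(d)$ is the fraction of $X - d_X$ for which $f(x) < y'$. Therefore $k + 1 - 1/z(d)$ is how much worse $a$ did than would have the random algorithm, on average.

So now imagine letting $a$ run for many steps over some fitness function $f$. We wish to make a plot of how well $a$ did in comparison to the random algorithm on that run, as $m$ increased. Consider the step where $a$ finds its $n$'th new value of $\min(\vec{c})$. For that step, there is an associated $k$ (the number of steps until the next $\min(\vec{c})$) and $z(d)$. Accordingly, indicate that step on our plot as the point $(n, k + 1 - 1/z(d))$. Put down as many points on our plot as there are successive values of $\min(\vec{c}(d))$ in the run of $a$ over $f$.

If throughout the run $a$ is always a better "match" to $f$ than is the random search algorithm, then all the points in the plot will have their ordinate values lie below 0. If the random algorithm won for any of the comparisons though, that would mean a point lying above 0. In general, even if the points all lie to one side of 0, one would expect that as the search progresses there is corresponding (perhaps systematic) variation in how far away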
from 0 the points lie. That variation tells one when the algorithm is entering harder or easier parts of the search.

Note that even for a fixed $f$, by using different starting points for the algorithm one could generate many of these plots and then superimpose them. This would allow you to plot the mean value of $k + 1 - 1/z(d)$ as a function of $n$ along with an associated error bar. (Similarly, one could replace the single number $z(d)$ characterizing the random algorithm with a full distribution over the number of required steps to find a new minimum.)

7 Time-dependent cost functions

Here we establish a set of no free lunch results for a certain class of time-dependent cost functions. The time-dependent functions we are concerned with start with an initial cost function that is present when we sample the first $x$ value. Then just before the beginning of each subsequent iteration of the search algorithm, the cost function is deformed to a new function, as specified by the mapping $T : \mathcal{F} \times \mathbb{N} \to \mathcal{F}$.$^{3}$ We write the function present during the sampling of the $i$'th point as $f_i$, with $f_{i+1} = T_i(f_i)$. We assume that at each step $i$, $T_i$ is a bijection between $\mathcal{F}$ and $\mathcal{F}$. (Note the mapping induced by $T$ from $\mathcal{F}$ to $\mathcal{F}$ can vary with the iteration number.) If this weren't the case, the evolution of cost functions could narrow in on a region of $f$'s for which some algorithm, "by luck" as it were, happens to sample $x$ values that lie near the extremizing $x$.

$^{3}$ An obvious restriction would be to require that $T$ doesn't vary with time, so that it is a mapping simply from $\mathcal{F}$ to $\mathcal{F}$. An analysis for $T$'s limited this way is beyond the scope of this paper however.

One difficulty with analyzing time-dependent cost functions is how to assess the quality of the search algorithm. In general there are two histogram-based schemes, involving two different populations of $y$ values. As before, the population $d^y_m$ is an ordered set of $y$ values corresponding to the $x$ values in $d^x_m$. The particular $y$ value in $d^y_m$ matching a particular $x$ value in $d^x_m$ is given by the cost function that was present when $x$ was sampled. In contrast, the population $D^y_m$ is defined to be the $y$ values from the present cost function for each of the $x$ values in $d^x_m$. Formally, if $d^x_m = \{d^x_m(1), \ldots, d^x_m(m)\}$ then we have $d^y_m = \{f_1(d^x_m(1)), \ldots, T_{m-1}(f_{m-1})(d^x_m(m))\}$. Similarly, we have $D^y_m = \{T_{m-1}(f_{m-1})(d^x_m(1)), \ldots, T_{m-1}(f_{m-1})(d^x_m(m))\}$.

In some situations it may be that the members of the population "live" for a long time, on the time scale of the evolution of the cost function. In such situations it may be appropriate to judge the quality of the search algorithm with the histogram induced by $D^y_m$; all those previous elements of the population are still alive, and therefore their (current) fitness is of interest. On the other hand, if members of the population live for only a short time on the time scale of evolution of the cost function, one may instead be concerned with things like how well the living member(s) of the population track the changing cost function. In that kind of situation, it may make more sense to judge the quality of the search algorithm with the histogram induced by $d^y_m$.

Here we derive NFL results for both criteria. In analogy with the NFL theorem, we wish to average over all possible ways a cost function may be time-dependent, i.e., we wish to average over all $T$ (rather than over all $f$, as in the NFL theorem). So consider the sum $\sum_T P(\vec{c} \mid f_1, T, m, a)$ where $f_1$ is the initial cost function. Note first that since $T$ only kicks in for $m > 1$, and since $f_1$ is fixed, there are a priori distinctions between algorithms as far as the first member of the population is concerned. So consider only histograms constructed from those elements of the population beyond the first. We will prove the following:

Theorem: For all $\vec{c}$, $m > 1$, algorithms $a_1$ and $a_2$, and initial cost functions $f_1$,

$$\sum_T P(\vec{c} \mid f_1, T, m, a_1) = \sum_T P(\vec{c} \mid f_1, T, m, a_2). \qquad (17)$$

We will show that this result holds whether $\vec{c}$ is constructed from $d^y_m$ or from $D^y_m$. In analogy with the proof of the NFL theorem, we will do this by establishing the $a$-independence of $\sum_T P(\vec{c} \mid f_1, T, m, a)$.

We will begin by replacing each $T$ in the sum with a set of cost functions, $f_i$, one for each iteration of the algorithm. To do this, we start with the following:

$$\sum_T P(\vec{c} \mid f_1, T, m, a) = \sum_T \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m, T, m, a)\, P(f_2 \cdots f_m, d^x_m \mid f_1, T, m, a)$$

$$= \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \sum_T P(f_2 \cdots f_m \mid f_1, T, m, a),$$

where we have indicated the sequence of cost functions, $f_i$, by the vector $\vec{f} = (f_1, \ldots, f_m)$.

Next we decompose the sum over all possible $T$ into a series of sums. Each sum in the series is over the values $T$ can take for one particular iteration of the algorithm. More formally, using $f_{i+1} = T_i(f_i)$, we write

$$\sum_T P(\vec{c} \mid f_1, T, m, a) = \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \times \sum_{T_1} \delta(f_2, T_1(f_1)) \cdots \sum_{T_{m-1}} \delta\big(f_m, T_{m-1}(T_{m-2}(\cdots T_1(f_1)))\big).$$

(Note that $\sum_T P(\vec{c} \mid f_1, T, m, a)$ is independent of the values of $T_{i > m-1}$, so we can absorb those values into an overall $a$-independent proportionality constant.)

Now look at the innermost sum, over $T_{m-1}$, for some fixed values of the outer sum indices $T_1 \ldots T_{m-2}$. For fixed values of the outer sum indices, $T_{m-2}(T_{m-3}(\cdots T_1(f_1)))$ is just some fixed cost function. Accordingly the innermost sum over $T_{m-1}$ is simply the number of bijections of $\mathcal{F}$ that map that fixed cost function to $f_m$. This is just a constant, $(|\mathcal{F}| - 1)!$.
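So we can do the $T_{m-1}$ sum, and arrive at

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) \times \sum_{T_1} \delta(f_2, T_1(f_1)) \cdots \sum_{T_{m-2}} \delta\big(f_{m-1}, T_{m-2}(T_{m-3}(\cdots T_1(f_1)))\big).$$

Now we can do the sum over $T_{m-2}$, in the exact same manner we just did the sum over $T_{m-1}$. In fact, all the sums over all $T_i$ can be done, leaving us with

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid \vec{f}, m, a) = \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(\vec{c} \mid \vec{f}, d^x_m)\, P(d^x_m \mid f_1 \cdots f_{m-1}, m, a). \qquad (18)$$

(In the last step we have exploited the statistical independence of $d^x_m$ and $f_m$.)

To proceed further we must decide if we are interested in histograms formed from $D^y_m$ or $d^y_m$. We begin with analysis of the $D^y_m$ case. For this case $P(\vec{c} \mid \vec{f}, d^x_m) = P(\vec{c} \mid f_m, d^x_m)$, since $D^y_m$ only reflects cost values from the last cost function, $f_m$. Plugging this in we get

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \sum_{f_m} P(\vec{c} \mid f_m, d^x_m).$$

The final sum over $f_m$ is a constant equal to the number of ways of generating the histogram $\vec{c}$ from cost values drawn from $f_m$. This constant will involve the multinomial coefficient $\binom{m}{c_1 \cdots c_{|Y|}}$ and some other factors. The important point is that it is independent of the particular $d^x_m$. Because of this we can evaluate the sum over $d^x_m$ and thereby eliminate the $a$ dependence:

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{f_2 \cdots f_{m-1}} \sum_{d^x_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \propto 1.$$

This completes the proof of Eq. (17) for the case where $\vec{c}$ is constructed from $D^y_m$.

Next we turn to the case where we are interested not in $D^y_m$ but in $d^y_m$. This case is considerably more difficult since we cannot simplify $P(\vec{c} \mid \vec{f}, d^x_m)$ and thus cannot decouple the sums over $f_i$. Nevertheless, the NFL result still holds. To see this we begin by expanding Eq. (18) over possible $d^y_m$ values:

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^x_m} \sum_{f_2 \cdots f_m} \sum_{d^y_m} P(\vec{c} \mid d^y_m)\, P(d^y_m \mid \vec{f}, d^x_m)\, P(d^x_m \mid f_1 \cdots f_{m-1}, m, a)$$

$$= \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_m} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_{i=1}^{m} \delta\big(d^y_m(i), f_i(d^x_m(i))\big). \qquad (19)$$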
The sum over the inner-most cost function, $f_m$, only has an effect on the $i = m$ term of the product, $\delta(d^y_m(m), f_m(d^x_m(m)))$. So it contributes $\sum_{f_m} \delta(d^y_m(m), f_m(d^x_m(m)))$. This is a constant, equal to $|Y|^{|X| - 1}$. We are left with

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m} \sum_{f_2 \cdots f_{m-1}} P(d^x_m \mid f_1 \cdots f_{m-1}, m, a) \prod_{i=1}^{m-1} \delta\big(d^y_m(i), f_i(d^x_m(i))\big).$$

The sum over $d^x_m(m)$ is now trivial, so we have

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m(1)} \cdots \sum_{d^x_m(m-1)} \sum_{f_2 \cdots f_{m-1}} P(d^x_{m-1} \mid f_1 \cdots f_{m-2}, m, a) \prod_{i=1}^{m-1} \delta\big(d^y_m(i), f_i(d^x_m(i))\big).$$

Now note that the above equation is of the exact same form as Eq. (19), only with a remaining population of size $m - 1$ rather than $m$. Consequently, in an exactly analogous manner to the scheme we used to evaluate the sums over $f_m$ and $d^x_m(m)$ that existed in Eq. (19), we can evaluate our sums over $f_{m-1}$ and $d^x_m(m-1)$. Doing so simply generates more $a$-independent proportionality constants. Continuing in this manner, we evaluate all the sums over the $f_i$ and arrive at

$$\sum_T P(\vec{c} \mid f_1, T, m, a) \propto \sum_{d^y_m} P(\vec{c} \mid d^y_m) \sum_{d^x_m(1)} P(d^x_m(1) \mid m, a)\, \delta\big(d^y_m(1), f_1(d^x_m(1))\big).$$

Now there is still algorithm-dependence in this result. However it is a trivial dependence; as previously discussed, it arises completely from how the algorithm selects the first $x$ point in its population, $d^x_m(1)$. Since we consider only those points in the population that are generated subsequent to the first, our result says that there are no distinctions between algorithms. (Alternatively, we could consider all points in the population, even the first, and still get an NFL result, if in addition to summing over all $T$ we sum over all $f_1$.) So even in the case where we are interested in $d^y_m$ the NFL result still holds, subject to the minor caveats delineated above.

There are other ways of assessing the quality of the search algorithm besides histograms based on $d^y_m$ or $D^y_m$. For example, one may wish to not consider histograms at all; one may judge the quality of the search by the fitness of the most recent member of the population.

Similarly, there are other sums one could look at besides those over $T$. For example, one may wish to characterize what the aspects are of the relationship between $a$ and $T$ that determine $\sum_f P(\vec{c} \mid f, T, m, a)$. In fact, in general there can be a priori distinctions between algorithms as far as this quantity is concerned.
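The $a$-independence of the sum over $T$ can be checked by exhaustive enumeration in the smallest non-trivial case. The sketch below is only an illustration under stated assumptions: it takes $|X| = 3$, $|Y| = 2$ and $m = 2$, so that a single bijection $T_1$ of the eight-element function space is involved and the histogram of elements beyond the first is just the second cost value. The two search rules and all names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def second_cost_counts(algorithm, f1, n_x, n_y):
    """Count, over every bijection T_1 of the function space, the second cost seen.

    The first point is sampled from f1; the cost function is then replaced by
    f2 = T_1(f1) before the second point is sampled.
    """
    functions = list(product(range(n_y), repeat=n_x))  # all cost functions
    start = functions.index(f1)
    x1 = algorithm([], [])
    y1 = f1[x1]
    x2 = algorithm([x1], [y1])
    counts = Counter()
    for perm in permutations(range(len(functions))):   # every bijection T_1
        f2 = functions[perm[start]]
        counts[f2[x2]] += 1
    return counts


def sweep(xs, ys):
    """Hypothetical rule: visit x = 0, then x = 1, ignoring the costs."""
    return len(xs)


def cost_sensitive(xs, ys):
    """Hypothetical rule: start at x = 2; the next point depends on the cost seen."""
    if not xs:
        return 2
    return 0 if ys[-1] == 0 else 1


if __name__ == "__main__":
    f1 = (1, 0, 0)  # initial cost function on X = {0, 1, 2}
    print(second_cost_counts(sweep, f1, 3, 2))
    print(second_cost_counts(cost_sensitive, f1, 3, 2))
```

With this initial cost function the two rules sample different second points, yet the two printed tallies coincide, since each possible $f_2$ is the image of $f_1$ under the same number of bijections. Larger $m$ would require a product over several bijections and quickly becomes too expensive to enumerate this way.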
As an example of such distinctions, say that for all iterations of the search algorithm, $T$ is the shift operator, replacing $f(x)$ by $f(x - 1)$ for all $x$ (with $\min(x) - 1 \equiv \max(x)$, and with $X$ implicitly taken to be a contiguous set of integers). For this $T$, if $a$ is the algorithm that first samples $f$ at $x_1$, next at $x_1 + 1$, etc., regardless of the values in the population, then for any $f$, the histogram induced by $d^y_m$ is always made up of identical $Y$ values. Accordingly, $\sum_f P(\vec{c} \mid f, T, m, a) = 0$ for any $\vec{c}$ containing counts in more than one $Y$ value bin. For other search algorithms, even for the same shift $T$, there is not this restriction on the set of allowed $\vec{c}$. So $\sum_f P(\vec{c} \mid f, T, m, a)$ is not independent of $a$ in general.

Indeed, consider the same shift $T$, but used with a different algorithm, $\hat{a}$. This new algorithm looks at the $y$ value of its first sample point $x_1$, and if that value is low, it samples at $x_1 + 1$, exactly like algorithm $a$. On the other hand, if that value is high, it samples some point other than $x_1 + 1$. In general, if one's goal is to find minimal $y$ values, $\hat{a}$ can be expected to outperform $a$, even when one averages over all $f$.

8 Fixed cost function results

One obvious difficulty with the NFL results discussed above is that one can always argue "oh, well in the real world $P(f)$ is not uniform, so the NFL results do not apply, and therefore I'm okay in using my favorite search algorithm". Of course, the conclusion does not follow from the premise. Uniform $P(f)$ is a typical $P(f)$. (The uniform average of all $P(f)$ is the uniform $P(f)$.) So the actual $P(f)$ might just as easily be one for which your algorithm is poorly suited as one for which it is well suited. Simply assuming $P(f)$ is not uniform cannot justify an algorithm. In essence, you must instead make the much bigger assumption that $P(f)$ doesn't fall into the half of the space of all $P(f)$ in which your algorithm performs worse than the uniform $P(f)$.

Ultimately, the only way to justify one's search algorithm is to argue in favor of a particular $P(f)$, and then argue that your algorithm is well suited to that $P(f)$. This is the only (!) legitimate way of defending a particular search algorithm against the implications of the NFL theorems.

Nonetheless, it is clearly of interest to derive NFL-type results that are independent of $P(f)$. Certain such results apply to ways of choosing between search algorithms, and involve averaging over those search algorithms while keeping the cost function fixed. Although less sweeping than the NFL results, these results hold no matter what the real world's distribution over cost functions is.

Let $a$ and $a'$ be two search algorithms. Define a "choosing procedure" as one that examines two populations $d$ and $d'$, produced by $a$ and $a'$ respectively, and based on those populations, decides to use either $a$ or $a'$ for the subsequent part of the search. As an example, one choosing procedure is to choose $a$ if and only if the least cost element in $d$ has lower cost than the least cost element in $d'$. As another example, a "stupid" choosing procedure would choose $a$ if and only if the least cost element in $d$ has higher cost than the least cost element in $d'$.

At the point that you use a choosing procedure, you will have sampled the cost function
at all the points in $d_\cup \equiv d \cup d'$. Accordingly, if $d_{>m}$ refers to the samples of the cost function that come after using the choosing algorithm, then the histogram the user is interested in is the histogram $c_{>m}$ which is the histogram formed from $d_{>m}$. In addition, for all the usual reasons, we can assume that the search algorithm chosen by the choosing procedure does not return to any points in $d_\cup$, without loss of generality.$^{4}$

$^{4}$ $a$ can know to avoid the elements it has seen before. However a priori, $a$ has no way to avoid the elements it hasn't seen yet but that $a'$ has (and vice-versa). Rather than have the definition of $a$ somehow depend on the elements in $d' - d$ (and similarly for $a'$), we deal with this problem by defining $c_{>m}$ to be set only by those elements in $d_{>m}$ that lie outside of $d_\cup$. (This is similar to the procedure we developed above to deal with potentially retracing algorithms.) Formally, this means that the random variable $c_{>m}$ is a function of $d_\cup$ as well as of $d_{>m}$. It also means there may be fewer elements in the histogram $c_{>m}$ than there are in the population $d_{>m}$.

The following theorem, proven in appendix C, tells us we have no a priori justification for using any particular choosing algorithm. Loosely speaking, no matter what the cost function, observing how well an algorithm has done so far tells us nothing about how well it would do if we continue to use it on the same cost function. (For simplicity, we only consider deterministic algorithms.)

Theorem: Let $d$ and $d'$ be two fixed populations both of size $m$, that are generated when the algorithms $a$ and $a'$ respectively are run on the cost function $f$. Let $A$ and $B$ be two different choosing procedures. Let $k$ be the number of elements in $c_{>m}$. Then

$$\sum_{a, a'} P(c_{>m} \mid f, d, d', k, a, a', A) = \sum_{a, a'} P(c_{>m} \mid f, d, d', k, a, a', B). \qquad (20)$$

(It is implicit in this theorem that the sum excludes those algorithms $a$ and $a'$ that do not result in $d$ and $d'$ respectively when run on $f$.)

One might think that the preceding theorem is misleading, since it treats all populations equally, when for any given $f$ some populations will be more likely than others. However even if one weights populations according to their probability of occurrence, it is still true that, on average, the choosing procedure one uses has no effect on likely $c_{>m}$. This is established by the following corollary.

Corollary: Under the conditions given in the preceding theorem,

$$\sum_{a, a'} P(c_{>m} \mid f, m, k, a, a', A) = \sum_{a, a'} P(c_{>m} \mid f, m, k, a, a', B). \qquad (21)$$

Proof: Let "proc" refer to our choosing procedure. We are interested in

$$\sum_{a, a'} P(c_{>m} \mid f, m, k, a, a', \mathrm{proc}) = \sum_{a, a', d, d'} P(c_{>m} \mid f, d, d', k, a, a', \mathrm{proc}) \times P(d, d' \mid f, k, m, a, a', \mathrm{proc}).$$
Pull the sum over $d$ and $d'$ outside the sum over $a$ and $a'$. Consider any term in that sum (i.e., any particular pair of values of $d$ and $d'$). For that term, $P(d, d' \mid f, k, m, a, a', \mathrm{proc})$ is just 1 for those $a$ and $a'$ that result in $d$ and $d'$ respectively when run on $f$, and 0 otherwise. (Recall that we are assuming that $a$ and $a'$ are deterministic.) This means that the $P(d, d' \mid f, k, m, a, a', \mathrm{proc})$ factor simply restricts our sum over $a$ and $a'$ to the $a$ and $a'$ considered in our theorem. Accordingly, our theorem tells us that the summand of the sum over $d$ and $d'$ is the same for choosing procedures $A$ and $B$. Therefore the full sum is the same for both procedures. QED.

These results tell us that there is no assumption for $P(f)$ that, by itself, justifies using some choosing procedure as far as subsequent search is concerned. To have an intelligent choosing procedure, one must take into account not only $P(f)$ but also the search algorithms one will be choosing among.

These results also have interesting implications if one considers the "degenerate" choosing procedures $A \equiv$ {always use algorithm $a$}, and $B \equiv$ {always use algorithm $a'$}. This case means that for fixed $f_1$ and $f_2$, if $f_1$ does better (on average) with the algorithms in some set $\mathcal{A}$, then $f_2$ does better (on average) with the algorithms in the set of all other algorithms. In particular, if for some favorite algorithms a certain "well-behaved" $f$ results in better performance than does the random $f$, then that well-behaved $f$ gives worse than random behavior on the set of all remaining algorithms.

In fact, things may very well be worse than this. In supervised learning, there is a result related to the theorem above [16]. Translated into the current context that result suggests that if one restricts the sums to only be over those algorithms that are a good match to $P(f)$, then stupid choosing procedures (like choosing the algorithm with the less desirable $\vec{c}$) outperform "smart" ones (which are the ones everyone uses in practice). An investigation of what exactly the set of algorithms summed over must be for a smart choosing procedure to be superior to a dumb one is beyond the scope of this paper. But clearly there are many subtle issues to disentangle.

9 Discussion and Future Work

9.1

In this paper we present a framework for investigating search. This framework serves as a "skeleton" for the search problem; it tells us what we can know about search before "fleshing in" the details of a particular real world search problem. Phrased differently, it provides a language in which to describe search algorithms, and in which to ask (and answer) questions about them.

Ultimately, of course, the only important question is, "How do I find good solutions for my given cost function $f$?" The proper answer to this question is to start with the given $f$, determine certain salient features of it, and then construct a search algorithm, $a$, specifically tailored to match those features. The inverse procedure, far more popular in some communities, is to investigate how specific algorithms perform on different $f$'s.
This inverse procedure is only of interest to the degree that it helps us with our primary procedure, of going from (features concerning) $f$ to an appropriate $a$.

Note that often the "salient features" concerning $f$ can be stated in terms of a distribution $P(f)$. To understand this, first note that we do in fact know $f$ exactly. But at the same time, there is much about $f$ that we need to know that is effectively unknown to us (e.g., $f$'s extrema). In this, it is as though $f$ is partially unknown. The very nature of the search process is to admit that you don't "know" $f$ in full. As a result, it makes sense to (implicitly or otherwise) replace $f$ with a distribution $P(f)$. In this, the search problem reduces to finding a good $a$ for a particular $P(f)$, which is exactly the issue addressed in section 3 of this paper.

As an example of all this, it is well known that generic methods (like simulated annealing and genetic algorithms) are unable to compete with carefully hand-crafted solutions for specific search problems. The traveling salesman problem (TSP) is an excellent example of such a situation; the best search algorithms for the TSP are hand-tailored for it [12]. Linear programming problems are another example; the simplex algorithm is a search algorithm specifically designed to solve cost functions of a particular type. In both of these situations, the procedure followed by the researcher is to: identify salient aspects of $f$ (e.g., it is a TSP, or it is a linear programming problem); throw away all other knowledge concerning $f$ and thereby effectively replace $f$ with a $P(f)$; and then use a search algorithm explicitly known to work well for that $P(f)$.

In other words, one admits that in a certain sense $f$ is not completely known (for example, its extrema aren't known), and therefore one replaces it with a $P(f)$. For example, if one has a particular TSP at hand, one would instead pretend that one simply has a general TSP, particulars unknown, and use an algorithm well-suited to TSPs in general.

In our investigation of the search problem from this match-$a$-to-$f$ perspective, the first question we addressed was whether it may be that some algorithm $A$ performs better than $B$, on average. Our answer to this question, given by the NFL theorem, is that this is impossible. An important implication of this result is the "conservation" nature of search, illustrated by the following example. If a genetic algorithm outperforms simulated annealing over some class of cost functions, then over the remaining cost functions, simulated annealing must outperform the genetic algorithm. It should be noted that this conservation applies even if one considers "adaptive" search algorithms [6, 18] which modify their search strategy based on properties of the population of $(X, Y)$ pairs observed so far in the search, and which perform this "adaptation" without regard to any knowledge concerning salient features of $f$.

It is important to bear in mind exactly what all of this does (not) imply about the relationship between natural selection in the biological world and optimization (i.e. genetic algorithms). To this end, consider the extremely simplified view in which natural selection is viewed as optimization over a cost or "fitness" function. We further simplify matters by assuming the fitness function is static over time.

In this paper we measure an algorithm's performance based on all $X$ values it has sampled since it began, and therefore we don't allow an algorithm to resample points it had already visited. Our NFL theorem states that all algorithms are equivalent by this measure. However one might consider different measures. In particular, we may be primarily interested in the evolution through time of "generations" consisting of temporally contiguous subsets of our population, generations that are updated by our search algorithm.

In such a scenario, it does make sense to resample points already visited. Moreover, our NFL theorem does not apply to this alternative kind of performance measure. For example, according to this alternative performance measure, an algorithm that resamples old points in $X$ that are fit and adds them to the current generation will always do better than one that resamples old points that are not fit.

Now when we examine the biological world around us, we are implicitly using this second kind of measure; we only see the organisms from the current generation. In addition, natural selection means that only (essential characteristics of) good points in $X$ are kept around from one generation to the next. Accordingly, using this second kind of performance measure, one expects that the average fitness across a generation improves with time. (Or would if the environment, i.e., the cost function, didn't change in time, etc.) This is nothing more than the tautology that natural selection improves the fitness of the members of a generation.

However the evidence garnered from examining the world around us that natural selection performs well according to this generation-based measure does not mean anything concerning its performance according to the $\vec{c}$-based measure used in this paper. In particular, it does not mean that if we wish to do a search, and are able to keep around all points sampled so far, that we have any reason to believe that natural selection is an effective search strategy. Yet it is precisely this situation that is of interest in the engineering world.
In short, the empirical evidence of the biological world does not indicate in any sense that natural selection is an effective search strategy. It does not even indicate that natural selection is an effective search strategy in the biological world. We simply have not had a chance to observe the behavior of alternative strategies. According to the NFL theorem, for all we know, the strategy of breeding only the least fit members of the population may have done a better job at finding the extrema of the cost function faced by biological organisms. (This is exactly analogous to the fact that hill-descending can beat hill-climbing at finding fitness maxima.) The breed-the-worst strategy will in general result in worse recent generations, but simply the fact that you are using that strategy implies nothing about the quality of the populations over the long term.

In this regard, note that to fairly compare the breed-the-worst strategy with natural selection, one would have to allow the breed-the-worst strategy to exploit the same massive amount of parallelism exploited by natural selection in the real world, where there are a huge number of genomes evolving in parallel. It may well be that the "blind watchmaker" has managed to produce such an amazing biome simply by relying on massive parallelism rather than breed-the-best. Nobody knows; nobody has tried to measure "how well" natural selection works in the biological world before. Indeed, presumably the efficacy of natural selection vs. breed-the-worst varies from ecosystem to ecosystem; it may well be that when the measurements are finally done we will find that natural selection wins in some ecosystems but breed-the-worst wins in others.

On the other hand, if we relax the unrealistic assumption that the fitness function is constant over time, then it is possible that there may be advantages to using natural selection rather than a breed-the-worst strategy, regardless of the ecosystem. (Such advantages could arise from the fact that the cost function is being determined in part by the population, so that the "matching" of search algorithm and cost function required by the inner product formula may somehow be automatic.) Similarly, that strategy may have minimax disadvantages relative to natural selection's breed-the-best strategy. Alternatively, it may turn out that breed-the-worst has advantages over natural selection for varying fitness functions and/or minimax concerns. These are issues for future research.

To summarize, by the NFL theorem, any generation-based scheme that keeps only the worst members of the population for the next generation is equivalent to one that keeps the best members, on average. However, the fitness of the members of the generations will differ between the two search algorithms. This raises some obvious questions for future research: Averaged over all $f$, how big would one expect the difference to be? For a fixed $f$, and two identical random search algorithms that are "directed" differently in who they classify as being in the current generation, how big would one expect the difference to be? How does this last calculation compare with the calculation made above of what the best member of the population will (likely) be for a random algorithm as $m$ grows?

9.2 Future work

It is perhaps fitting for a paper about effective search that we conclude with a brief listing of other (research) directions we believe warrant further investigation.

The most important continuation of this work is to turn our framework into a practical tool to solve real problems. This would involve two steps. First we need a method of incorporating broad kinds of knowledge concerning $f$ into the analysis. In this paper we have used $P(f)$ to do this, but perhaps there are other ways that we should also consider. For example, it is not yet clear how to (or even whether one should) encapsulate in a $P(f)$ the knowledge concerning the cost function that is implicit in the heuristics of branch and bound strategies. How do we incorporate how the cost of a complete solution ($f$) is accrued through the assemblage of sub-solutions?

The second step in this suggested program is to determine how best to convert knowledge concerning $f$ into an optimal $a$. The goal in its broadest sense is to design a system that can take in such knowledge concerning $f$ and then solve for the optimal $a$ given that knowledge. (For example, if the knowledge were in the form of $P(f)$, one would "invert" the inner product formula somehow.) One would then use that $a$ to search the $f$.

In its fullest sense, this program may well involve many years of work. Nonetheless, there are many important questions related to this program that should be analyzable using only the tools developed in this paper. Many of them were presented in the text. Others, particularly well-suited to help us understand the connection between $P(f)$ and an optimal $a$, are: How fast does the cost histogram $\vec{c}$ associated with a particular algorithm converge to the histogram of the cost values $f$ takes on across all of $\mathcal{X}$? As $P(f)$ changes from the diagonal in $\mathcal{F}$ space (i.e., from being uniform over all $f$), how will certain $a$'s be hurt and certain $a$'s helped? Could the average over all $a$'s improve? For what $P(f)$'s besides
the diagonal are all algorithms equal? Given two particular algorithms (rather than all algorithms), for what $P(f)$ is the performance of the algorithms equal? In particular, if $P(f)$ is uniform over some subset of $\mathcal{F}$ and zero outside (as an example, that subset might be the set of correlated cost functions as in [14]), what are the equivalence classes of search algorithms with identical expected behavior?

As a preliminary step in this program, it would make sense to explore the efficacy of currently popular search algorithms in terms of the performance benchmarks we present above. For any algorithm, as the search progresses, the fitness of the best member of the population can only improve. So all previous studies showing that fitness does improve in time for some algorithm $a$ really don't prove anything. What's important is how much better the improvement is than you would expect it to be solely due to the "fittest can only improve" effect. That's what our measures are designed to assess.

Given the recent experience in the supervised learning community [8, 13, 10], it seems quite likely that on a significant fraction of the problems in the standard test suites, one or more of the currently popular search algorithms will fail to perform well, at least for some range of population sizes. Things should be even worse if one randomly samples from the space of real-world search problems. This is because there are "selection effects" ensuring that the most commonly studied search problems (i.e., those in the suites) are those which people consider "reasonable"; in practice, "reasonable" often simply means "a good match to the algorithms I'm familiar with".

Another interesting series of questions concerns differences between stochastic and deterministic algorithms. Are there potential advantages to stochastic algorithms? In particular, does it make sense to "expand" any stochastic algorithm $\sigma$ in terms of deterministic algorithms $a$? I.e., can one write $P(\vec{c} \mid f, m, \sigma) = \sum_a k_{a,\sigma} P(\vec{c} \mid f, m, a)$ for some expansion coefficients $k_{a,\sigma}$? If so, it suggests that as $P(f)$ moves from the diagonal the performance of $\sigma$'s will neither improve nor degrade as much as that of $a$'s. So it may be that stochastic algorithms have certain minimax advantages over deterministic ones.

There are many other issues that remain to be investigated concerning head-to-head minimax distinctions between algorithms. Perhaps the simplest is to characterize when such distinctions occur in "cycles", in which algorithm A is (head-to-head minimax) superior to B, and B to C, but then C is also superior to A. Arguments for choosing between algorithms based on head-to-head minimax distinctions are more persuasive in the absence of such cycles. However it should be noted that even if there are such cycles, if (to carry on with the example) for some reason algorithm C can be ruled out as a candidate algorithm (e.g., it takes too long to compute, or is difficult to deal with, or simply is not in vogue), then the fact that we have a cycle does not preclude choosing algorithm A based on head-to-head minimax distinctions.

Other issues to be explored involve the relation between the statistical view of search adopted in this paper and conventional statistics. In particular the field of optimal experimental design [1] and more precisely active learning [2] is concerned with the following question: There is some unknown probabilistic relationship between $\mathcal{X}$ and $\mathcal{Y}$. I have a set of pairs of $\mathcal{X}$-$\mathcal{Y}$ values formed by sampling that relationship (the "training set"). At what next
$\mathcal{X}$ value should I sample the relationship to "best" help me infer the full $\mathcal{X}$-$\mathcal{Y}$ relationship? This question of how best to conduct active learning is obviously very closely related to the search problem; future work involves seeing what results in the field of active learning can be fruitfully applied to search.

Consider again Eq. (4). The left-hand side is what we are interested in (or more generally, what we want to know is set by it). The first term on the right-hand side is set by one's algorithm. Accordingly, this equation provides several ways to measure how "close" two algorithms are to one another. As an example of such a measure, one could simply say that how close two algorithms are is given by how close their vectors $\vec{v}_{\vec{c},a,m}$ are. Alternatively, one could measure the closeness of two algorithms for a specific $P(f)$, by seeing how close the ($\vec{c}$-indexed) vectors $P(\vec{c} \mid m, a)$ are for those two algorithms, for that $P(f)$. (One could imagine that for some $P(f)$ two algorithms will be close, while for others they will be far apart.) As a final example, given an algorithm, one could solve for the $P(f)$ that optimizes $P(\vec{c} \mid m, a)$ in some non-trivial sense. One could then see how close the optimal $P(f)$'s are for two algorithms, and use this to measure the closeness of the algorithms themselves.

With these kinds of measures, one could say things like "this algorithm is very close to simulated annealing, even though its internal workings are completely different". One could also investigate hypotheses like "all algorithms that humans consider 'reasonable' are close to one another". Future work involves exploring these measures of the closeness of algorithms.

Other future work involves exploring the importance of the "encoding" scheme one uses during search. Normally one talks of how the cost function is encoded, and possible changes to that encoding. However in the context of this paper, changing the encoding means changing the search algorithm. The cost function doesn't change when we re-encode; rather how we (the algorithm) view the function changes.
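The remark that re-encoding changes the algorithm rather than the cost function can be made concrete with a small sketch. Everything named here is an illustrative assumption rather than part of the paper's formalism: a four-point search space, a toy deterministic rule `toy_algorithm`, and an encoding modeled as a permutation `pi` of the points. Under those assumptions the re-encoded rule visits different points on the same $f$, yet the costs it sees are exactly those the original rule would see on $f$ viewed through the encoding.

```python
from itertools import permutations, product

X = (0, 1, 2, 3)  # hypothetical four-point search space


def toy_algorithm(history):
    """Deterministic rule: map the ordered (x, y) pairs seen so far to a new x."""
    visited = {x for x, _ in history}
    unvisited = [x for x in X if x not in visited]
    if not history:
        return unvisited[0]
    last_cost = history[-1][1]
    # After a low cost take the lowest unvisited point, otherwise the highest.
    return unvisited[0] if last_cost < 2 else unvisited[-1]


def reencode(algorithm, pi):
    """Let `algorithm` work in coordinates z, where the true point is x = pi[z]."""
    inverse = {x: z for z, x in enumerate(pi)}

    def reencoded(history):
        z_history = tuple((inverse[x], y) for x, y in history)
        return pi[algorithm(z_history)]

    return reencoded


def run(algorithm, f, m):
    """Evaluate f at m distinct points; return (visited points, observed costs)."""
    history = []
    for _ in range(m):
        x = algorithm(tuple(history))
        history.append((x, f[x]))
    return [x for x, _ in history], [y for _, y in history]


f = (3, 0, 2, 1)                # f[x] is the cost at point x
pi = (2, 0, 3, 1)               # coordinate z stands for point pi[z]
g = tuple(f[pi[z]] for z in X)  # the same f as seen through the encoding

path_a, costs_a = run(toy_algorithm, f, 3)
path_b, costs_b = run(reencode(toy_algorithm, pi), f, 3)
_, costs_view = run(toy_algorithm, g, 3)

# Exhaustive check over every encoding and every cost function with values 0..3.
consistent = all(
    run(reencode(toy_algorithm, p), h, 4)[1]
    == run(toy_algorithm, tuple(h[p[z]] for z in X), 4)[1]
    for p in permutations(X)
    for h in product(range(4), repeat=len(X))
)
```

Here `path_a` and `path_b` differ although `f` is untouched, which is the sense in which the encoding belongs to the algorithm; `consistent` records that the identity between `costs_b`-style and `costs_view`-style runs holds for this toy rule across the whole (small) space.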
Nonetheless, one can imagine several ways to couple re-encoding of algorithms with "re-encodings" of cost functions. For example, if $\phi(a)$ is a re-encoding of algorithm $a$, then one might say that a cost function $f$ becomes $\phi(f)$ under that same re-encoding iff $P(\vec{c} \mid f, m, a) = P(\vec{c} \mid \phi(f), m, \phi(a))$ for all $\vec{c}$. (Alternatively, one might say that $\phi(a)$ is a legal re-encoding scheme for algorithms iff there is an associated $\phi(f)$ for which the foregoing is true.) Future work here involves seeing how changing the encoding scheme interacts with $P(f)$ to determine the efficacy of the search process.

Uniform $P(f)$ can be rewritten as $P(f) = \prod_x P'(f(x))$, where $P'(y)$ is uniform over all $y \in \mathcal{Y}$. An interesting question for future research is to see which of the results of this paper must be modified (and how) if we still have $P(f) = \prod_x P'(f(x))$ but no longer have uniform $P'(y)$. (Intuitively, for such a $P(f)$, $f(x)$ is being set after you pick $x$ as the next point to visit, and this is being done without any regard for points you've already seen. Hence, one would expect NFL-results to hold.) Related questions are: What is the most general $P(f)$ for which all algorithms are equal? What is the most general $P(f)$ for which a particular pair of algorithms are equal? And what happens if rather than equal $\prod_x P'(f(x))$, $P(f)$ involves some nearest neighbor coupling?

In relation to the first and last of these questions, it seems plausible that there are $P(f)$'s that cannot be written as $\prod_x P'(f(x))$ but for which it is still true that all algorithms are equal. For example, say $|\mathcal{Y}| > |\mathcal{X}|$ and let $P(f)$ be i) uniform over all $f$ such that for no
distinct $x_1, x_2 \in \mathcal{X}$ does $f(x_1) = f(x_2)$; and ii) zero for all $f$ that don't obey this condition. This $P(f)$ has extremely strong coupling between the elements of the population, in contrast to $P(f)$'s that can be written as $\prod_x P'(f(x))$. Yet it seems likely that these $P(f)$'s also result in NFL-type results, since the points you have seen so far tell you nothing about where you should search next.

In this paper the choice of $P(f)$ (uniform) was motivated by theoretical rather than practical concerns. Yet these broader classes of $P(f)$'s for which NFL-type results might hold raise an intriguing question: just how far can one push from the uniform $P(f)$ to a more "real-world" $P(f)$ and still have NFL-type results?

Finally, there are many other NFL-type results, for uniform $P(f)$, that we have not had time to explicate here. For example, consider algorithms that keep running until some stopping condition, a function of all populations up to the present, is met. Then intuitively, by NFL, one would expect that averaged over all $f$, the probability that your algorithm stops after $m$ samples of $f$ is independent of the algorithm being used. The formal proof of these (and similar) results is the subject of future work.

Acknowledgments

We would like to thank Raja Das, Tal Grossman, Paul Helman, and Una-May O'Reilly for helpful conversation, and the SFI for funding. DHW would also like to thank TXN Inc. for funding.

References

[1] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, Springer-Verlag (1985).
[2] D. Cohn, Neural Network Exploration Using Optimal Experimental Design, MIT AI Memo. 1491.
[3] T. Cover, J. Thomas, Elements of Information Theory, John Wiley & Sons, (1991).
[4] M. R. Garey, D. S. Johnson, Computers and Intractability, Freeman (1979).
[5] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, (1975).
[6] L. Ingber, Adaptive Simulated Annealing, Software package documentation, ftp.alumni.caltech.edu:/pub/ingber/asa.tar.gz.
[7] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, Science, 220, 671, (1983).
[8] R. Kohavi, personal communication. Also see A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, to be presented at IJCAI 1995.
[9] E. L. Lawler, D. E. Wood, Operations Research, 14(4), 699-719, (1966).
[10] P. Murphy, M. Pazzani, Journal of Artificial Intelligence Research, 1, 257-275 (1994).
[11] J. Pearl, Heuristics, intelligent search strategies for computer problem solving, Addison-Wesley, (1984).
[12] Gerhard Reinelt, The Traveling Salesman, computational solutions for TSP applications, Springer Verlag Berlin Heidelberg (1994).
[13] C. Schaffer, Conservation of Generalization: A Case Study.
[14] P. F. Stadler, Europhys. Lett. 20, pp 479-482, (1992).
[15] C. E. M. Strauss, D. H. Wolpert, D. R. Wolf. Alpha, Evidence, and the Entropic Prior in Maximum Entropy and Bayesian Methods, ed. Ali Mohammed-Djafari, pp 113-120, (1992).
[16] D. H. Wolpert, Off-training set error and a priori distinctions between learning algorithms, Technical Report SFI-TR-95-01-003, Santa Fe Institute, 1995.
[17] D. H. Wolpert, On Overfitting Avoidance as Bias, Technical Report SFI-TR-92-03-5001, Santa Fe Institute, 1992.
[18] D. Yuret, M. de la Maza, Dynamic Hill-Climbing: Overcoming the limitations of optimization techniques in The Second Turkish Symposium on Artificial Intelligence and Neural Networks, pp 208-212, (1993).

A Proof related to information theoretic aspects of search

We want to calculate the proportion of all algorithms that give a particular $\vec{c}$ for a particular $f$. We proceed in several steps.

1) Since $\mathcal{X}$ is finite, populations are finite. Therefore any (deterministic) $a$ is a huge, but finite, list. That list is indexed by all possible $d$'s (aside from those that extend over the entire input space). Each entry in the list is the $x$ the $a$ in question outputs for that $d$-index.

2) Consider any particular unordered set of $m$ $\mathcal{X}$-$\mathcal{Y}$ pairs where no two of the pairs share the same $x$ value. Such a set is an "unordered path" $\pi$. (Without loss of generality, from now on we implicitly restrict the discussion to unordered paths of length $m$.) A particular $\pi$ is "in" or "from" a particular $f$ if there is an unordered set of $m$ $(x, f(x))$ pairs identical to $\pi$. The numerator on the right-hand side of Eq. (9) is the number of unordered paths in the given $f$ that give the desired $\vec{c}$.

3) Claim: The number of unordered paths in $f$ that give the desired $\vec{c}$ (the numerator on the right-hand side of Eq. (9)) is proportional to the number of $a$'s that give the desired
$\vec{c}$ for $f$. (The proof of this claim will constitute a proof of Eq. (9).) Furthermore, the proportionality constant is independent of $f$ and $\vec{c}$.

4) Proof: We will construct a mapping $\psi: a \to \pi$. $\psi$ takes in an $a$ that gives the desired $\vec{c}$ for $f$, and from it produces a $\pi$ that is in $f$ and gives the desired $\vec{c}$. We will then show that for any $\pi$ the number of algorithms $a$ such that $\psi(a) = \pi$ is a constant, independent of $\pi$, $f$, and $\vec{c}$. The proof will then be completed by showing that $\psi$ is single-valued, i.e., by showing that there is no $a$ that has as its image under $\psi$ more than one $\pi$.

5) Any unordered path $\pi$ gives a set of $m!$ different ordered paths in the obvious manner. (Note that every $x$ value in an unordered path is distinct.) Each such ordered path $\pi_{ord}$ in turn provides a set of $m$ successive $d$'s (if one includes the null $d$) and a following $x$. Indicate by $d(\pi_{ord})$ this set of the first $m$ $d$'s provided by $\pi_{ord}$. (Note that any $\pi_{ord}$ is itself a population, but to avoid confusion we avoid referring to it as such.)

6) For any ordered path $\pi_{ord}$ we can construct a "partial algorithm". This consists of the list of an $a$, but with only the $m$ $d(\pi_{ord})$ entries in the list filled in; the remaining entries are blank. (We say that $m$ is the "length" of the partial algorithm.) Since there are $m!$ distinct partial $a$'s for each $\pi$ (one for each ordered path corresponding to $\pi$), we have $m!$ such partially filled-in lists for each $\pi$.

7) In the obvious manner we can talk about whether a particular partial algorithm is "consistent" with a particular full algorithm. This allows us to define (the inverse of) $\psi$: for any $\pi$ that is in $f$ and gives $\vec{c}$, $\psi^{-1}(\pi) \equiv$ (the set of all $a$ that are consistent with at least one partial algorithm generated from $\pi$ and that give $\vec{c}$ when run on $f$).

8) To complete the first part of our proof we must show that for all $\pi$ that are in $f$ and give $\vec{c}$, $\psi^{-1}(\pi)$ contains the same number of elements, regardless of $\pi$, $f$, or $\vec{c}$. To that end, first generate all ordered paths induced by $\pi$ and then associate each such ordered path with a distinct $m$-element partial algorithm. Our question is how many full algorithm lists are consistent with at least one of these partial algorithm lists. (How this question is answered is the core of this appendix.)

9) To answer this question, reorder the entries in each of the partial algorithm lists by permuting the indices $d$ of all the lists. Obviously such a reordering won't change the answer to our question.

We will perform the permuting by interchanging pairs of $d$ indices. First, interchange any $d$ index of the form $((d^x(1), d^y(1)), \ldots, (d^x(i \le m), d^y(i \le m)))$ whose entry is filled in in any of our partial algorithm lists with $d'(d) \equiv ((d^x(1), z), \ldots, (d^x(i), z))$, where $z$ is some arbitrary constant $\mathcal{Y}$ value and $x_j$ refers to the $j$'th element of $\mathcal{X}$. Next, create some arbitrary but fixed ordering of all $x \in \mathcal{X}$: $(x_1, \ldots, x_{|\mathcal{X}|})$. Then interchange any $d'$ index of the form $((d^x(1), z), \ldots, (d^x(i \le m), z))$ whose entry is filled in in any of our (new) partial algorithm lists with $d''(d') \equiv ((x_1, z), \ldots, (x_m, z))$. (Recall that all the $d^x(i)$ must be distinct.)

10) By construction, the resultant partial algorithm lists are independent of $\pi$, $\vec{c}$ and $f$, as is the number of such lists (it's $m!$). Therefore the number of algorithms consistent with at least one partial algorithm list in $\psi^{-1}(\pi)$ is independent of $\pi$, $\vec{c}$ and $f$. This completes the first part of the proof.

11) For the second part, first choose any 2 unordered paths that differ from one another, A and B. There is no ordered path $A_{ord}$ constructed from A that equals an ordered path
$B_{ord}$ constructed from B. So choose any such $A_{ord}$ and any such $B_{ord}$. If they disagree for the null $d$, then we know that there is no (deterministic) $a$ that agrees with both of them. If they agree for the null $d$, then since they are sampled from the same $f$, they have the same single-element $d$. If they disagree for that $d$, then there is no $a$ that agrees with both of them. If they agree for that $d$, then they have the same double-element $d$. Continue in this manner all the way up to the $(m-1)$-element $d$. Since the two ordered paths differ, they must have disagreed at some point by now, and therefore there is no $a$ that agrees with both of them.

12) Since this is true for any $A_{ord}$ from A and any $B_{ord}$ from B, we see that there is no $a$ in $\psi^{-1}(A)$ that is also in $\psi^{-1}(B)$. This completes the proof.

B Proof related to minimax distinctions between algorithms

The proof is by example.

Consider three points in $\mathcal{X}$, $x_1$, $x_2$, and $x_3$, and three points in $\mathcal{Y}$, $y_1$, $y_2$, and $y_3$.

1) Let the first point $a_1$ visits be $x_1$, and the first point $a_2$ visits be $x_2$.

2) If at its first point $a_1$ sees a $y_1$ or a $y_2$, it jumps to $x_2$. Otherwise it jumps to $x_3$.

3) If at its first point $a_2$ sees a $y_1$, it jumps to $x_1$. If it sees a $y_2$, it jumps to $x_3$.

Consider the cost function that has as the $\mathcal{Y}$ values for the three $\mathcal{X}$ values $\{y_1, y_2, y_3\}$, respectively.

For $m = 2$, $a_1$ will produce a population $(y_1, y_2)$ for this function, and $a_2$ will produce $(y_2, y_3)$.

The proof is completed if we show that there is no cost function so that $a_1$ produces a population containing $y_2$ and $y_3$ and such that $a_2$ produces a population containing $y_1$ and $y_2$. There are four possible pairs of populations to consider:

i) $[(y_2, y_3), (y_1, y_2)]$;
ii) $[(y_2, y_3), (y_2, y_1)]$;
iii) $[(y_3, y_2), (y_1, y_2)]$;
iv) $[(y_3, y_2), (y_2, y_1)]$.

Since if its first point is a $y_2$ $a_1$ jumps to $x_2$, which is where $a_2$ starts, when $a_1$'s first point is a $y_2$ its second point must equal $a_2$'s first point. This rules out possibilities i) and ii).

For possibilities iii) and iv), by $a_1$'s population we know that $f$ must be of the form $\{y_3, s, y_2\}$, for some variable $s$. For case iii), $s$ would need to equal $y_1$, due to the first point
in $a_2$'s population. However for that case, the second point $a_2$ sees would be the value at $x_1$, which is $y_3$, contrary to hypothesis.

For case iv), we know that the $s$ would have to equal $y_2$, due to the first point in $a_2$'s population. However that would mean that $a_2$ jumps to $x_3$ for its second point, and would therefore see a $y_2$, contrary to hypothesis.

Accordingly, none of the four cases is possible. This is a case both where there is no symmetry under exchange of $d^y$'s between $a_1$ and $a_2$, and no symmetry under exchange of histograms. QED.

C Proof related to NFL results for fixed cost functions

Since any (deterministic) search algorithm is a mapping from $d \in \mathcal{D}$ to $x \in \mathcal{X}$, any search algorithm is a vector in the space $\mathcal{X}^{\mathcal{D}}$. The components of such a vector are indexed by the possible populations, and the value for each component is the $x$ that the algorithm produces given the associated population.

Consider now a particular population $d$ of size $m$. Given $d$, we can say whether any other population of size greater than $m$ has the (ordered) elements of $d$ as its first $m$ (ordered) elements. The set of those populations that do start with $d$ this way defines a set of components of any algorithm vector $a$. Those components will be indicated by $a_{\supseteq d}$.

The remaining components of $a$ are of two types. The first is given by those populations that are equivalent to the first $M < m$ elements in $d$ for some $M$. The values of those components for the vector algorithm $a$ will be indicated by $a_{\subset d}$. The second type consists of those components corresponding to all remaining populations. Intuitively, these are populations that are not compatible with $d$. Some examples of such populations are populations that contain as one of their first $m$ elements an element not found in $d$, and populations that re-order the elements found in $d$. The values of $a$ for components of this second type will be indicated by $a_{\perp d}$.

Let proc be either A or B. We are interested in

$$\sum_{a, a'} P(c_{>m} \mid f, d, d', k, a, a', proc) = \sum_{a_{\supseteq d},\, a'_{\supseteq d'}} \; \sum_{a_{\subset d},\, a'_{\subset d'}} \; \sum_{a_{\perp d},\, a'_{\perp d'}} P(c_{>m} \mid f, d, d', k, a, a', proc).$$

The summand is independent of the values of $a_{\perp d}$ and $a'_{\perp d'}$ for either of our two $d$'s. In addition, the number of such values is a constant. (It is given by the product, over all populations not consistent with $d$, of the number of possible $x$ each such population could be mapped to.) Therefore, up to an overall constant independent of $d$, $d'$, $f$, and proc, our sum equals

$$\sum_{a_{\supseteq d},\, a'_{\supseteq d'}} \; \sum_{a_{\subset d},\, a'_{\subset d'}} P(c_{>m} \mid f, d, d', a_{\supseteq d}, a'_{\supseteq d'}, a_{\subset d}, a'_{\subset d'}, proc).$$
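By definition, we are implicitly restricting the sum to those $a$ and $a'$ so that our summand is defined. This means that we actually only allow one value for each component in $a_{\subset d}$ (namely, the value that gives the next $x$ element in $d$), and similarly for $a'_{\subset d'}$. Therefore our sum reduces to

$$\sum_{a_{\supseteq d},\, a'_{\supseteq d'}} P(c_{>m} \mid f, d, d', a_{\supseteq d}, a'_{\supseteq d'}, proc).$$

Note that no component of $a_{\supseteq d}$ lies in $d^x_{\cup}$. The same is true of $a'_{\supseteq d'}$. So our sum over $a_{\supseteq d}$ is over the same components of $a$ as the sum over $a'_{\supseteq d'}$ is of $a'$. Now for fixed $d$ and $d'$, proc's choice of $a$ or $a'$ is fixed. Accordingly, without loss of generality, we can rewrite our sum as

$$\sum_{a_{\supseteq d}} P(c_{>m} \mid f, d, d', a_{\supseteq d}),$$

with the implicit assumption that $c_{>m}$ is set by $a_{\supseteq d}$. This sum is independent of proc. QED.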
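The example used in Appendix B is small enough to check by exhaustive enumeration, and the following sketch does so. Some modeling choices are assumptions made only for this illustration: the points $x_1, x_2, x_3$ are indices 0, 1, 2; the values $y_1, y_2, y_3$ are the integers 1, 2, 3; and because the appendix does not say where $a_2$ jumps after first seeing a $y_3$, the hypothetical parameter `jump_on_y3` tries both remaining points.

```python
from itertools import product

Y = (1, 2, 3)  # stand-ins for y1, y2, y3; indices 0, 1, 2 stand for x1, x2, x3


def run_a1(f):
    """a1 starts at x1; on y1 or y2 it jumps to x2, otherwise to x3."""
    first = f[0]
    second_x = 1 if first in (1, 2) else 2
    return (first, f[second_x])


def run_a2(f, jump_on_y3):
    """a2 starts at x2; on y1 it jumps to x1, on y2 to x3.

    The jump after seeing y3 is not specified in the appendix, so it is
    passed in explicitly (an assumption of this sketch).
    """
    first = f[1]
    if first == 1:
        second_x = 0
    elif first == 2:
        second_x = 2
    else:
        second_x = jump_on_y3
    return (first, f[second_x])


def find_counterexamples():
    """Search all 27 cost functions for the 'swapped' pair of populations."""
    hits = []
    for f in product(Y, repeat=3):
        for jump in (0, 2):
            a1_sees = set(run_a1(f))
            a2_sees = set(run_a2(f, jump))
            if a1_sees == {2, 3} and a2_sees == {1, 2}:
                hits.append((f, jump))
    return hits


example = (1, 2, 3)  # the cost function (y1, y2, y3) used in the appendix
a1_population = run_a1(example)
a2_population = run_a2(example, 0)
counterexamples = find_counterexamples()
```

An empty `counterexamples` list corresponds to the appendix's conclusion that no cost function swaps the two populations, for either choice of the unspecified jump.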