Call Forwarding: A Simple Interprocedural Optimization Technique for Dynamically Typed Languages




Koen De Bosschere†, Saumya Debray‡, David Gudeman‡, Sampath Kannan‡

† Department of Electronics, Universiteit Gent, B-9000 Gent, Belgium
‡ Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA

K. De Bosschere was supported by the National Fund for Scientific Research of Belgium and by the Belgian National Incentive Program for fundamental research in Artificial Intelligence. S. Debray and D. Gudeman were supported in part by the National Science Foundation under grant number CCR-9123520. S. Kannan was supported in part by the National Science Foundation under grant number CCR-9108969.

Copyright 1994 ACM. Appeared in the Proceedings of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1994, pp. 409-420.

Abstract

This paper discusses call forwarding, a simple interprocedural optimization technique for dynamically typed languages. The basic idea behind the optimization is straightforward: find an ordering for the "entry actions" of a procedure, and generate multiple entry points for the procedure, so as to maximize the savings realized from different call sites bypassing different sets of entry actions. We show that the problem of computing optimal solutions to arbitrary call forwarding problems is NP-complete, and describe an efficient greedy algorithm for the problem. Experimental results indicate that (i) this algorithm is effective, in that the solutions produced are generally close to optimal; and (ii) the resulting optimization leads to significant performance improvements for a number of benchmarks tested.

1 Introduction

The code generated for a function or procedure in a dynamically typed language typically has to carry out various type and range checks on its arguments before it can operate on them. These runtime tests can incur a significant performance overhead. As a very simple example, consider the following function to compute the average of a list of numbers:
    ave(l, sum, count) =
        if null(l) then sum/count
        else ave(tail(l), sum+head(l), count+1)

In a straightforward implementation of this function, the code generated checks the type of each of its arguments each time around the loop: the first argument must be a (empty or non-empty) list, while the second and third arguments must be numbers (see Footnote 1). Notice, however, that some of this type checking is unnecessary: the expression sum+head(l) evaluates correctly only if sum is a number, in which case its value is also a number; similarly, count+1 evaluates correctly only if count is a number, and in that case it also evaluates to a number. Thus, once the types of sum and count have been checked at the entry to the loop, further type checks on the second and third arguments are not necessary.

The function in this example is tail recursive, making it easy to recognize the iterative nature of its computation and use some form of invariant code motion to move the type check out of the loop. In general, however, such redundant actions may be encountered where the definitions are not tail recursive and where the loop structure is not as easy to recognize. An alternative approach, which works in general, is to generate multiple entry points for the function ave, so that a particular call site can enter at the "appropriate" entry point, bypassing any code it does not need to execute. In the example above, this would give exactly the desired result: tail call optimization would compile the recursive call to ave into a jump instruction, and noticing that the recursive call does not need to test the types of its second and third arguments, the target of this jump would be chosen to bypass these tests.

Footnote 1: In reality, the generated code would distinguish between the numeric types int and float, e.g., using "message splitting" techniques as in [5,6]; the distinction is not important here, and we assume a single numeric type for simplicity of exposition.

However, notice that in the example above, even if we generate multiple entry points for ave, the optimization works only if the tests are generated in the right order: since it is necessary to test the type of the first argument each time around the loop, the tests on the second and third arguments cannot be bypassed if the type test on the first argument precedes those on the other two arguments. As this example illustrates, the order in which the tests are generated influences the amount of unnecessary code that can be bypassed at runtime, and therefore the performance of the program.

In general, functions and procedures in dynamically typed languages contain a set of (idempotent) "entry actions," such as type tests, initialization actions (especially for variadic procedures), etc., that are executed at entry to the procedure. Moreover, these actions can typically be carried out in any of a number of different "legal" orders (in general, not all orderings of entry actions may be legal, since some actions may depend on the outcomes of others; for example, the type of an expression head(x) cannot be checked until x has been verified to be of type list). The code generated for a procedure therefore consists of a set of entry actions in some order, followed by code for its body. There are a number of different call sites for each procedure, and at each call site we have some information about the actual parameters at that call site, allowing that call to skip some of these entry actions. Moreover, each call site has a different execution frequency (estimated, for example, from profile information or from the structure of the call graph). In general, different call sites have different information available about their actual parameters, so that an order for the entry actions of a procedure that is good for one call site, in terms of the number of unnecessary entry actions that can be skipped, may not be as good for another call site. A good compiler should therefore attempt to find an ordering on the entry actions that maximizes the benefits, over all call sites, due to bypassing unnecessary code. We refer to determining such an order for the entry actions and then "forwarding" the branch instructions at different call sites so as to bypass unnecessary code as "call forwarding."
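The effect of test ordering on the ave example can be made concrete with a small count. The following is only an illustrative sketch, not code from the paper: the function name `checks_executed`, the check labels, and the list length of 10 are assumptions chosen for the illustration, and it assumes that no code is copied to call sites.

```python
def checks_executed(order, needed_by_recursive_call, list_length):
    """Count entry checks run while averaging a list of `list_length` numbers.

    Assumption for this sketch: no code is copied to call sites, so a call
    enters at the first check it still needs and runs every check after it.
    """
    # The initial call knows nothing about its arguments and runs all checks.
    initial = len(order)
    # The recursive call may skip only the leading checks it does not need.
    entry = 0
    while entry < len(order) and order[entry] not in needed_by_recursive_call:
        entry += 1
    per_recursive_call = len(order) - entry
    return initial + list_length * per_recursive_call


# Hypothetical labels for the three type tests on ave's arguments.
list_first = ["list(l)", "number(sum)", "number(count)"]
list_last = ["number(sum)", "number(count)", "list(l)"]
recursive_needs = {"list(l)"}  # the recursive call must still test its first argument

print(checks_executed(list_first, recursive_needs, 10))  # 33
print(checks_executed(list_last, recursive_needs, 10))   # 13
```

With the list test generated first, every recursive call re-executes all three tests; with it generated last, each recursive call executes only the one test it needs.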
While many systems compile functions with multiple entry points, we do not know of any that attempt to order the entry actions carefully in order to exploit this to the fullest. In this paper, we address the problem of determining a "good" order for the set of tests a function or procedure has to carry out. We show that generating an optimal order is NP-complete in general, and give an efficient algorithm for selecting an ordering using a greedy heuristic. The result generalizes a number of optimizations for traditional compilers, such as jump chain collapsing and invariant code motion out of loops. Experimental results indicate that (i) the heuristic is good, in that the orderings it generates are usually not far from the optimal; and (ii) the resulting optimization is effective, in the sense that it typically leads to significant speed improvements.

The issues and optimizations discussed in this paper are primarily at the intermediate code level: for this reason, we do not make many assumptions about the source language, except that a call to a procedure typically involves executing a set of idempotent "entry actions." This covers a wide variety of dynamically typed languages, e.g., functional programming languages such as Lisp and Scheme (e.g., see [15]), logic programming languages such as Prolog [4], GHC [17] and Janus [11,13], imperative languages such as SETL [14], and object-oriented languages such as Smalltalk [10] and SELF [6]. The optimization we discuss is likely to be most beneficial for languages and programs where procedure calls are common, and which are therefore liable to benefit significantly from reducing the cost of procedure calls. However, the title of the paper notwithstanding, the optimization is not limited, a priori, to dynamically typed languages: it is also applicable, in principle, to idempotent entry actions, such as initialization and array bound checks, in statically typed languages, and some optimizations used in statically typed languages, such as inverse eta-reduction/uncurrying/argument flattening in Standard ML of New Jersey [1], can also be thought of as instances of call forwarding (see Section 6).
2 The Call Forwarding Problem

As discussed in the previous section, the code generated for a procedure consists of a set of entry actions, which can be carried out in a number of different legal orders, followed by the code for its body. Each procedure has a number of call sites, and at each call site there is some information about the actual parameters for calls issued from that site, specifying which entry actions must be executed and which may be skipped (see Footnote 2). This is modelled by associating, with each call site, a set of entry actions that must be executed by that call site. Moreover, each call site has associated with it an estimate of its execution frequency: such estimates can be obtained from profile information, or from the structure of the call graph of the program (see, for example, [3,19]). Finally, different entry actions may require a different number of machine instructions to execute, and therefore have different "sizes."

Footnote 2: The precise mechanism by which this information is obtained, e.g., dataflow analysis, user declarations, etc., is orthogonal to the issues discussed in this paper, and so is not addressed here.

Our objective is to order the entry actions of the procedures in a program, and redirect calls so as to bypass unnecessary actions where possible, in such a way that the total number of instructions that are skipped, over the entire execution of the program, is as large as possible. However, it is not difficult to see that for any procedure p in a program, the code to set up and execute procedure calls in the body of p is separate from the entry actions of p. Because of this, the order of p's entry actions, and therefore the number of instructions that are skipped by calls to p in an execution of the program, neither influence nor are influenced by the order of the entry actions for any other procedure in the program. The problem of maximizing the total number of instructions skipped by call forwarding for the entire program, then, reduces to the problem of maximizing, for each procedure, the number of instructions skipped by calls to that procedure. For our purposes, therefore, the call forwarding problem is the problem of determining a "good" order for the entry actions of a procedure so that the savings accruing from bypassing unnecessary entry actions over all call sites for that procedure, weighted by execution frequency, is as large as possible.

The problem can be generalized by allowing code to be copied from a procedure to the call sites for that procedure. As an example, suppose we have a procedure with entry actions a and b, and two call sites: A, which can skip a but must execute b; and B, which can skip b but must execute a. If the entry actions are generated in the order $\langle a, b \rangle$, then call site A can skip a, but B cannot skip b and therefore executes unnecessary code (a symmetric problem arises if the other possible order is chosen). A solution is to copy the entry action a at the call site B, i.e., execute the entry action at B before jumping to the callee. If we allow arbitrarily many entry actions to be copied to call sites in this manner, it is trivial to generate an optimal solution to any call forwarding problem: simply copy to each call site the entry actions that call site must execute, then branch into the callee bypassing all entry actions at the callee. This obviously produces an optimal solution, since each call site executes exactly those entry actions that it must execute, and can be done efficiently in polynomial time. However, it has the problem that such unrestricted copying can lead to significant code bloat, since there may be many call sites for a procedure, each of them getting a copy of most of the entry actions for that procedure (we have observed this phenomenon in a number of application programs).

The best solution to this problem is to impose a global bound on the total number of entry actions that may be copied, across all the call sites occurring in a program, but this turns out to be complicated to implement because when performing call forwarding on any particular procedure, we have to keep track of the number of entry actions copied for all the procedures in the program, including those that have not yet been processed by the optimizer! A simple and effective approximation to this approach is to assign, for each procedure, a bound on the number of entry actions that can be copied to each call site for that procedure. If we start with a global bound on the total number of entry actions that can be copied, such per-procedure bounds can be obtained by "dividing up" the global bound among the procedures (possibly taking into account, for each procedure, the number of call sites for it and their execution frequencies, so that procedures with deeply nested call sites can copy more entry actions and thereby effect greater optimization). A discussion of heuristics for establishing such per-procedure bounds is beyond the scope of this abstract: we simply assume, in the discussion that follows, that for each procedure there is a bound on the number of its entry actions that can be copied to any call site.

The call forwarding problem can therefore be formulated in the abstract as follows:

Definition 2.1 A call forwarding problem is a 5-tuple $\langle E, C, w, s, k \rangle$, where:

- $E$ is a finite set (representing the entry actions of the procedure concerned);
- $C$ is a multiset of subsets of $E$ (representing the entry actions that each call site must execute);
- $w : C \to N$, where $N$ is the set of natural numbers, is a function that maps each call site to its "weight", i.e., execution frequency;
- $s : E \to N$ represents the "size" of each element of $E$ (representing the number of machine instructions needed to realize the corresponding entry action); and
- $k \ge 0$ represents a bound on the number of entry actions that can be copied to call sites.

A solution to a call forwarding problem $\langle E, C, w, s, k \rangle$ is a permutation $\pi$ of $E$, i.e., a 1-1 function $\pi : E \to \{1, \ldots, |E|\}$. The cost of a solution $\pi$ is, intuitively, the total number of machine instructions executed, over all call sites, given that the entry actions are generated in the order $\pi$. Given a call forwarding problem $\langle E, C, w, s, k \rangle$, the cost of a solution $\pi$ for it is defined as follows. First, let $copied(c, \pi, i)$ denote (the indices of) those entry actions in $\pi$ that have to be copied to a call site $c$ if the entry point for $c$ is to bypass the first $i$ elements of $\pi$:

$$copied(c, \pi, i) = \{\, j \mid j \le i \wedge \pi^{-1}(j) \in c \,\}.$$

Here, $\pi^{-1}(j)$ denotes the element of $E$ that is the $j$-th element of the permutation $\pi$. For any call site $c \in C$, given the bound $k$ on the number of actions that can be copied to $c$, the maximum number of entry actions that can be skipped by $c$, either because $c$ does not have to execute that action or because it has been copied from the callee to the call site, is given by

$$Skip(c, \pi) = \max\{\, i : |copied(c, \pi, i)| \le k \,\}.$$

The cost of a solution $\pi$ can then be expressed as the weighted sum, over all call sites, of (the sizes of) the instructions that cannot be skipped by the call sites:

$$cost(\pi) = \sum_{c \in C} \; \sum_{I \in E,\ \pi(I) > Skip(c, \pi)} w(c)\, s(I).$$

3 Algorithmic Issues

We first consider the complexity of determining optimal solutions to call forwarding problems. The following result shows that the existence of efficient algorithms for this is unlikely:

Theorem 3.1 The determination of an optimal solution to a call forwarding problem is NP-complete. It remains NP-complete even if all entry actions have equal size.

Proof By reduction from the Optimal Linear Arrangement problem, which is known to be NP-complete [8,9]. See the Appendix for details.

This result might very well be of only academic interest if the number of entry actions encountered in typical programs could be guaranteed to be small. However, our experience has been that this is not the case in many actual applications. The reason for this is that, even if the number of arguments to procedures is small for most programs encountered in practice, it is not unusual to have a number of entry actions associated with a single argument (e.g., see Section 4), involving type and range checks, pattern matching and indexing code, pointer chain dereferencing (a common operation in logic programming languages), and so on. Because of this, the total number of entry actions in a procedure can be quite large, making exhaustive search for an optimal solution impractical. We therefore seek efficient polynomial time heuristics for call forwarding that produce good solutions for common cases.
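The Skip and cost definitions from Section 2 can be evaluated directly for a given ordering. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the names `skip` and `cost`, and the dictionary-based encoding of call sites, weights and sizes, are choices made for this example. It follows the cost formula literally, so actions copied to a call site are not charged. The data is the two-action, two-call-site example from Section 2, with unit weights and sizes assumed.

```python
def skip(needed, order, k):
    """Largest i such that at most k of the first i actions in `order`
    are actions the call site must execute (and so would need copying)."""
    copied = 0
    best = 0
    for i, action in enumerate(order, start=1):
        if action in needed:
            copied += 1
        if copied > k:
            break  # |copied| never decreases, so no larger i can qualify
        best = i
    return best


def cost(order, call_sites, weights, sizes, k):
    """Weighted size of the entry actions that each call site cannot skip."""
    total = 0
    for name, needed in call_sites.items():
        first_executed = skip(needed, order, k)
        total += weights[name] * sum(sizes[a] for a in order[first_executed:])
    return total


# Example from Section 2: site A must execute b, site B must execute a.
call_sites = {"A": {"b"}, "B": {"a"}}
weights = {"A": 1, "B": 1}   # assumed unit execution frequencies
sizes = {"a": 1, "b": 1}     # assumed unit instruction counts

print(cost(["a", "b"], call_sites, weights, sizes, k=0))  # 3
print(cost(["a", "b"], call_sites, weights, sizes, k=1))  # 0
```

With k = 0, call site B must enter at the top and runs the unnecessary action b; allowing one copied action per call site removes all callee-side entry actions from both calls.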
3.1 A Greedy Algorithm

While the problem of computing optimal solutions for arbitrary call forwarding problems is NP-complete in general, a greedy algorithm appears to work quite well in practice (see Table 1). Given a call forwarding problem for a procedure with a bound of k on the number of actions that can be copied from the callee to the call sites, the general idea is to pick actions one at a time, at each step choosing an action that minimizes the cost to be paid at that step. The algorithm maintains a list of call sites that do not need to execute more than k of the actions chosen up to that point, and therefore can still have some actions copied to them; such call sites are said to be active. Each active call site c has associated with it a counter, denoted by count[c] in Figure 1, that keeps track of how many more actions can be copied to that call site. The weight of an action, at any point in the algorithm, is computed as the sum of the weights of the active call sites that need to execute that action, divided by the "size" of that action (recall that the size of an action represents the number of machine instructions needed to implement it); thus, everything else being equal, an action that is more expensive in terms of the number of machine instructions it requires will have a smaller weight than one with smaller size, and hence be picked earlier, thereby allowing more call sites to bypass it.

Since in general there may be dependencies between instructions that restrict the set of legal orderings (e.g., see the example in Section 4), the algorithm first constructs a dependency graph whose nodes are the entry actions under consideration, and where there is an edge from a node e1 to a node e2 if e1 must precede e2 in any legal execution; the set of predecessors of a node x in this graph is denoted by preds(x). The algorithm is simple: it repeatedly picks an "available" action (i.e., an action whose predecessors in the dependency graph G have already been picked) of least weight, then updates the counters of the appropriate call sites as well as the list of active call sites, deleting from this list any call site that has reached its limit of the number of actions that can be copied from the callee. This process continues until all actions have been enumerated. The algorithm is described in Figure 1.
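The greedy procedure described above can be sketched in Python as follows. This is a reconstruction from the prose only (Figure 1 is not reproduced here), so the function name `greedy_order`, the data layout, and the alphabetical tie-breaking rule are assumptions of this sketch rather than details confirmed by the paper. The instance used is the ave problem formulated in Section 4: six unit-size actions, an initial call site of weight 1 that needs all of them, a recursive call site of weight 10 that needs only a and d, and k = 0.

```python
def greedy_order(actions, call_sites, weights, sizes, k, preds):
    """Order entry actions greedily, always picking the cheapest available one."""
    remaining_copies = {c: k for c in call_sites}  # plays the role of count[c]
    active = set(call_sites)
    picked = []
    while len(picked) < len(actions):
        done = set(picked)
        # An action is available once all its predecessors have been picked.
        available = [a for a in actions
                     if a not in done and preds.get(a, set()) <= done]
        if not available:
            raise ValueError("dependency graph has a cycle")

        def weight(a):
            needing = sum(weights[c] for c in active if a in call_sites[c])
            return needing / sizes[a]

        # Assumed tie-break: alphabetical order among equal-weight actions.
        choice = min(available, key=lambda a: (weight(a), a))
        picked.append(choice)
        for c in list(active):
            if choice in call_sites[c]:
                if remaining_copies[c] == 0:
                    active.remove(c)      # c must enter at or before this action
                else:
                    remaining_copies[c] -= 1  # the action is copied to c
    return picked


actions = ["a", "b", "c", "d", "e", "f"]
call_sites = {"c1": set("abcdef"), "c2": {"a", "d"}}
weights = {"c1": 1, "c2": 10}
sizes = {a: 1 for a in actions}
preds = {"d": {"a"}, "e": {"b"}, "f": {"c"}}  # each deref precedes its type test

print(greedy_order(actions, call_sites, weights, sizes, 0, preds))
# ['b', 'c', 'e', 'f', 'a', 'd']
```

The result places the dereferencing and type tests for the second and third arguments first, so the heavily weighted recursive call site can enter just before a and bypass them.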
4 An Example

In this section we consider in more detail the ave function from Section 1 to see the effect of call forwarding on the code generated. To illustrate the fact that this optimization is not limited to code for type checking, we consider here a realization of this function in Prolog. As in other logic programming languages, unification between variables in Prolog can set up chains of pointers, and loading the value of a variable requires dereferencing such chains. A number of authors have shown that significant performance improvements are possible if the lengths of these pointer chains can be predicted via compile-time analysis, so that unnecessary dereferencing code can be deleted [7,12,16]; however, the analyses involved are fairly complex. Here we show how, in many cases, unnecessary dereference operations can be eliminated using call forwarding. The procedure is defined as follows:

    ave([], Sum, Count, Avg) :-
        Avg is Sum/Count.
    ave([H|L], Sum, Count, Avg) :-
        Sum1 is Sum+H, Count1 is Count+1,
        ave(L, Sum1, Count1, Avg).

Assume that, as in many modern Lisp and Prolog implementations, parameters are passed in (virtual machine) registers, so that the first parameter is in register Arg1, the second parameter in register Arg2, and so on. Figure 2(a) gives the intermediate code that might be generated in a straightforward way. (In reality, the generated code would distinguish between the numeric types int and float, e.g., using "message splitting" techniques as in [5,6]; the distinction is not important here, and we assume a single numeric type for simplicity of exposition.) The first six instructions of ave are entry actions that can be executed in any order where the dereferencing of a register precedes its use. Moreover, at the (recursive) call site for ave, we know from the semantics of the add instruction that Arg2 and Arg3 are both numbers, and that there is no need for either dereferencing or type checking of these registers. The entry actions corresponding to dereferencing and type checking of these registers can therefore be bypassed by the recursive call site. Assume that apart from the recursive call, there is another call site (the "initial" call) for the procedure ave. For notational brevity in the discussion that follows, denote the instructions above as follows:

    Arg1 := deref(Arg1)          ↦ a
    Arg2 := deref(Arg2)          ↦ b
    Arg3 := deref(Arg3)          ↦ c
    if ¬list(Arg1) goto Err      ↦ d
    if ¬number(Arg2) goto Err    ↦ e
    if ¬number(Arg3) goto Err    ↦ f

Finally, assume that no copying of code to call sites is allowed. Then, we can formulate this as a call forwarding problem $\langle E, C, w, s, k \rangle$ as follows:

- $E = \{a, b, c, d, e, f\}$;
- $C = \{c_1, c_2\}$, where $c_1 = \{a, b, c, d, e, f\}$ is the initial call site, and $c_2 = \{a, d\}$ is the recursive call site;
- $w = \{c_1 \mapsto 1, c_2 \mapsto 10\}$, i.e., we assume that loops iterate about 10 times on the average;
- the "size function" $s$ maps each entry action in $E$ to 1 (for simplicity); and
- $k = 0$, i.e., no copying of code to call sites is allowed.
Initially, the set of available actions is $\{a, b, c\}$, and both call sites are active, so the weights computed for these actions are: a: 11; b: 1; c: 1. There are two actions, b and c, that have lowest weight, and one of them, say b, is picked by the algorithm. As a result, the call site $c_1$ becomes inactive. The set of available actions at this point is $\{a, c, e\}$, with weights 10, 0, 0 respectively. There are two actions, c and e, with lowest weight, and one of them, say c, is picked. The algorithm proceeds in this manner, eventually producing the sequence $\langle b, c, e, f, a, d \rangle$ as a solution to this call forwarding problem. In other words, call forwarding orders the entry actions so that the dereferencing and type tests on Arg2 and Arg3 come first, and can be skipped by the recursive call to ave. The resulting code is shown in Figure 2(b). Notice that the code for dereferencing and type checking the second and third arguments has effectively been "hoisted" out of the loop. Moreover, this has been accomplished, not by recognizing and dealing with loops in some special way, but simply by using the information available at call sites. It is applicable, therefore, even to computations that are not iterative (i.e., not tail recursive), including procedures that involve arbitrary linear, nonlinear, and mutual recursion.

5 Experimental Results

We ran experiments on a number of small benchmarks to gauge (i) the efficacy of the greedy algorithm, i.e., the quality of its solutions compared to the optimal; and (ii) the efficacy of the optimization, i.e., the performance improvements resulting from it. The numbers presented reflect the performance of jc [11], an implementation of a logic programming language called Janus [13], on a Sparcstation-1 (see Footnote 3). This system is currently available by anonymous FTP from cs.arizona.edu.

Table 1 gives, for each benchmark, the number of machine instructions that would be executed over all call sites for the entry actions in the procedures only, using (i) no call forwarding; (ii) call forwarding using the greedy algorithm; and (iii) optimal call forwarding.
The weights for the call sites were estimated using the structure of the call graph: we assumed that on the average, each loop iterates about 10 times, and the branches of a conditional are taken with equal frequency. While the optimizations were carried out at the intermediate code level, we used counts of the number of Sparc assembly instructions for each intermediate code instruction, together with the execution frequencies estimated from the call graph structure, to estimate the runtime cost of the different solutions. The results indicate that the greedy heuristic has uniformly good performance: on the benchmarks, it attains the optimal solution in each case.

Footnote 3: Our implementation uses a variant of call forwarding where entry actions are copied from the callee to the call sites as long as this will allow a later action to be skipped.

Table 2 gives the improvements in speed resulting from our optimizations, and serves to evaluate the efficacy of call forwarding. The time reported for each benchmark, in milliseconds, is the time taken to execute the program once. This time was obtained by iterating the program long enough to eliminate most effects due to multiprogramming and clock granularity, then dividing the total time taken by the number of iterations. The experiments were repeated 20 times for each benchmark, and the average time was taken in each case. Call forwarding accounts for improvements ranging from about 12% to over 45%. Most of this improvement comes from code motion out of inner loops: the vast majority of type tests etc. in a procedure appear as entry actions that are bypassed in recursive calls due to call forwarding, effectively "hoisting" such tests out of inner loops. As a result, much of the runtime overhead from dynamic type checking is optimized away.

Table 3 puts these numbers in perspective by comparing the performance of jc to Quintus and Sicstus Prologs, two widely used commercial Prolog systems. On comparing the performance numbers from Table 2 for jc before and after optimization, it can be seen that the performance of jc is competitive with these systems even before the application of the optimizations discussed in this paper. It is easy to take a poorly engineered system with a lot of inefficiencies and get huge performance improvements by eliminating some of these inefficiencies. The point of this table is that when evaluating the efficacy of our optimizations, we were careful to begin with a system with good performance, so as to avoid drawing overly optimistic conclusions.
Finally, Table 4 compares the performance of our Janus system with C code for some small benchmarks (see Footnote 4). Again, these were run on a Sparcstation 1, with cc as the C compiler. The programs were written in the style one would expect of a competent C programmer: no recursion (except in tak and in nrev, an O(n^2) "naive reverse" program for reversing a linked list of integers, where it is hard to avoid), destructive updates, and the use of arrays rather than linked lists (except in nrev, which by definition traverses a list). The source code for these benchmarks is given in Appendix B. It can be seen that the performance of jc is not very far from that of C, attaining approximately the same performance as unoptimized C code, and being only about a factor of 2, on the average, slower than C code optimized at level -O4. On some benchmarks, such as nrev, jc outperforms unoptimized C and is not much slower than optimized C, even though the C program uses destructive assignment and does not allocate new cons cells, while Janus is a single assignment language where the program allocates new cons cells at each iteration; its performance can be attributed at least in part to the benefits of call forwarding.

Footnote 4: The Janus version of qsort used in this table is slightly different from that of Table 3: in this case there are explicit integer type tests in the program source, to be consistent with int declarations in the C program and allow a fair comparison between the two programs. The presence of these tests provides additional information to the jc compiler and allows some additional optimizations.

6 Related Work

The optimizations described here can be seen as generalizing some optimizations for traditional imperative languages [2]. In the special case of a (conditional or unconditional) jump whose target is a (conditional or unconditional) jump instruction, call forwarding generalizes the flow-of-control optimization that collapses chains of jump instructions. Call forwarding is able to deal with conditional jumps to conditional jumps (this turns out to be an important source of performance improvement in practice), while traditional compilers for imperative languages such as C and Fortran typically deal only with jump chains where there is at most one conditional jump (see, for example, [2], p. 556).
When we consider call forwarding for the last call in a recursive procedure, what we get is essentially a generalization of code motion out of loops, in the sense that the code that is bypassed due to call forwarding at a particular call site need not be invariant with respect to the entire loop. The point is best illustrated by an example: consider a function

    f(x) = if x = 0 then 1
           else if p(x) then f(g(x-1))    /* 1 */
           else f(h(x-1))                 /* 2 */

Assume that the entry actions for this function include a test that its argument is an integer, and suppose that we know, from dataflow analysis, that g() returns an integer, but do not know anything about the return type of h(). From the conventional definition of a "loop" in a flow graph (see, for example, [2]), there is one loop in the flow graph of this function that includes both the tail recursive call sites for f(). Because of our lack of knowledge about the return type of h(), we cannot claim that "the argument to f() is an integer" is an invariant for the entire loop. However, using call forwarding, the integer test in the portion of the loop arising from call site 1 can be bypassed. Effectively, this moves some code out of "part of" a loop. Moreover, our algorithm implements interprocedural optimization and can deal with both direct and mutual recursion, as well as non-tail-recursive code, without having to do anything special, while traditional code motion algorithms handle only the intra-procedural case.

The idea of compiling functions with multiple entry points is not new: many Lisp systems do this, Standard ML of New Jersey and Yale Haskell generate dual entry points for functions, and Aquarius Prolog generates multiple entry points for primitive operations [18]. However, we do not know of any system that attempts to order the entry actions carefully in order to maximize the savings from bypassing entry actions.

Some optimizations used in statically typed languages can also be thought of in terms of call forwarding. For example, Standard ML of New Jersey uses a combination of three transformations (inverse eta-reduction, uncurrying, and argument flattening) to optimize functions where all of the known call sites pass tuples of the same size as arguments, but where the function may "escape," i.e., not all of its call sites are known at compile time [1]. The idea is to have the known call sites pass arguments in registers instead of constructing and deconstructing tuples on the heap, while call sites that are unknown at compile time execute additional code to correctly deconstruct the tuples they pass. This optimization can be thought of in terms of call forwarding as follows: suppose that each known call site for a function constructs and passes an n-tuple as the argument, which is then deconstructed with n select operations at the callee. We can copy the n select operations from the callee to each known call site, and forward the calls to enter the callee bypassing these operations. At each of these call sites, the construction of the argument n-tuple followed by n selects on it can easily be recognized as inverse operations that can be optimized to avoid having to actually build tuples on the heap. Thus, known call sites can be executed efficiently, while call sites that are not known at compile time enter at the original entry point and execute the select operations in the expected way. Indeed, the whole point of inverse eta-reduction is to generate two entry points for a function so that known call sites can bypass unnecessary code: call forwarding can be seen as a way of extending this idea to get more than two entry points where necessary.
Chambers and Ungar consider compile-time optimization techniques to reduce runtime type checking in dynamically typed object-oriented languages [5, 6]. Their approach uses type analysis to generate multiple copies of program fragments, in particular loop bodies, where each copy is specialized to a particular type and therefore can omit some type tests. Some of the effects of the optimization we discuss, e.g., "hoisting" type tests out of loops (see Section 4), are similar to effects achieved by the optimization of Chambers and Ungar. In general, however, it is essentially orthogonal to the work described here, in that it is concerned primarily with type inference and code specialization rather than with code ordering. Because of this, the two optimizations are complementary: even if the body of a procedure has been optimized using the techniques of Chambers and Ungar, it may contain type tests etc. at the entry, which are candidates for the optimization we discuss; conversely, the "message splitting" optimization of Chambers and Ungar can enhance the effects of call forwarding considerably.

7 Conclusions

This paper discusses call forwarding, a simple interprocedural optimization technique for dynamically typed languages. The basic idea behind the optimization is extremely straightforward: find an ordering for the "entry actions" of a procedure such that the savings realized from different call sites bypassing different sets of entry actions, weighted by their estimated execution frequencies, is as large as possible. It turns out, however, to be quite effective for improving program performance. We show that the problem of computing optimal solutions to arbitrary call forwarding problems is NP-complete, and describe an efficient heuristic for the problem. Experimental results indicate that the solutions produced are generally optimal or close to optimal, and lead to significant performance improvements for a number of benchmarks tested. A variant of these ideas has been implemented in jc, a logic programming system that is available by anonymous FTP from cs.arizona.edu.

References

[1] A. Appel, Compiling with Continuations, Cambridge University Press, 1992.
[2] A. V. Aho, R. Sethi and J. D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.
[3] T. Ball and J. Larus, "Optimally Profiling and Tracing Programs", Proc. 19th ACM Symp. on Principles of Programming Languages, Albuquerque, NM, Jan. 1992, pp. 59–70.
[4] M. Carlsson and J. Widen, SICStus Prolog User's Manual, Swedish Institute of Computer Science, Oct. 1988.
[5] C. Chambers and D. Ungar, "Iterative Type Analysis and Extended Message Splitting: Optimizing Dynamically Typed Object-Oriented Programs", Proc. SIGPLAN '90 Conference on Programming Language Design and Implementation, White Plains, NY, June 1990, pp. 150–164. SIGPLAN Notices vol. 25 no. 6.
[6] C. Chambers, D. Ungar and E. Lee, "An Efficient Implementation of SELF, A Dynamically Typed Object-Oriented Language Based on Prototypes", Proc. OOPSLA '89, New Orleans, LA, 1989, pp. 49–70.
[7] S. K. Debray, "A Simple Code Improvement Scheme for Prolog", J. Logic Programming, vol. 13 no. 1, May 1992, pp. 57–88.
[8] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York, 1979.
[9] M. R. Garey, D. S. Johnson, and L. Stockmeyer, "Some Simplified NP-complete Graph Problems", Theoretical Computer Science vol. 1, pp. 237–267, 1976.
[10] A. Goldberg and D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983.
[11] D. Gudeman, K. De Bosschere, and S. K. Debray, "jc: An Efficient and Portable Implementation of Janus", Proc. Joint International Conference and Symposium on Logic Programming, Washington DC, Nov. 1992. MIT Press.
[12] A. Marien, G. Janssens, A. Mulkers, and M. Bruynooghe, "The Impact of Abstract Interpretation: An Experiment in Code Generation", Proc. Sixth International Conference on Logic Programming, Lisbon, June 1989, pp. 33–47. MIT Press.
[13] V. Saraswat, K. Kahn, and J. Levy, "Janus: A step towards distributed constraint programming", in Proc. 1990 North American Conference on Logic Programming, Austin, TX, Oct. 1990, pp. 431–446. MIT Press.
[14] J. T. Schwartz, R. B. K. Dewar, E. Dubinsky, and E. Schonberg, Programming with Sets: An Introduction to SETL, Springer-Verlag, 1986.
[15] G. L. Steele Jr., Common Lisp: The Language, Digital Press, 1984.
[16] A. Taylor, "Removal of Dereferencing and Trailing in Prolog Compilation", Proc. Sixth International Conference on Logic Programming, Lisbon, June 1989, pp. 48–60. MIT Press.
[17] K. Ueda, "Guarded Horn Clauses", in Concurrent Prolog: Collected Papers, vol. 1, ed. E. Shapiro, pp. 140–156, 1987. MIT Press.
[18] P. Van Roy, Can Logic Programming Execute as Fast as Imperative Programming?, PhD Dissertation, University of California, Berkeley, Nov. 1990.
[19] D. W. Wall, "Predicting Program Behavior Using Real or Estimated Profiles", Proc. SIGPLAN-91 Conf. on Programming Language Design and Implementation, June 1991, pp. 59–70.
A Appendix: Proof of NP-Completeness

The following problem is useful in discussing the complexity of optimal call forwarding:

Definition A.1 The Optimal Linear Arrangement problem (OLA) is defined as follows: given a graph G = (V, E) and an integer k, find a permutation, f, from the vertices in V to 1, ..., n such that defining the length of edge (i, j) to be |f(i) − f(j)|, the total length of all edges is less than or equal to k.

The following result is due to Garey, Johnson, and Stockmeyer [8, 9]:

Theorem A.1 The Optimal Linear Arrangement problem is NP-complete.

The following result gives the complexity of optimal call forwarding:

Theorem 3.1 The determination of an optimal solution to a call forwarding problem is NP-complete. It remains NP-complete even if every entry action has equal size.

Proof: We first formulate optimal call forwarding as a decision problem, as follows: "given a call forwarding problem I and an integer K ≥ 0, is there a solution to I with cost no greater than K?" We refer to this problem as CF. The proof is by reduction from the Optimal Linear Arrangement problem, which, from Theorem A.1, is NP-complete. Let G = (V, E), k be a particular instance of OLA. We make the following transformation to an instance ⟨A, C, w, s, k⟩ of CF, where:

– A is the set of vertices 1, ..., n in V along with two dummy vertices s and t;
– The elements of C are all doubleton sets:
    – corresponding to each edge (u, v) ∈ E, there is an element {u, v} in C with weight 1: for terminological simplicity in the discussion that follows, we refer to these elements as normal sets;
    – let Δ be the maximum degree of any vertex in G, then corresponding to each vertex i ∈ G of degree d_i, there is an element {i, s} in C with weight (1/2)(Δ − d_i) (some of these sets could have zero weight, in which case they can effectively be removed): we refer to these elements as special sets;
    – finally, there is an element {s, t} in C of weight M, where M is large enough to ensure that s and t have to be the last two elements in any optimal ordering of the vertices (M can be chosen to be n^3 or greater): we refer to this element as a heavy set.
– s(i) = 1 for every i ∈ A.
– k = 0 (this is the last component of the CF instance, not the bound of the OLA instance).

We also have to define the number K that is to bound the cost of the call forwarding problem so constructed. Let K = (1/4)Δn(n+5) + 3M + k/2, where k is the bound from the OLA instance. We claim that the instance of CF so defined has a solution with cost no greater than K if and only if the given instance of OLA has a solution.

Consider any proposed order of elements in a solution to the instance of CF defined above. The cost of this solution can be decomposed as follows:

As we march along the list of elements, at each point we charge Δ/2 to each of the elements we have seen so far but not to either of the special elements. If vertex i ∈ G is encountered, the charge of Δ/2 on vertex i from then on can be thought of as paying 1/2 towards each of the normal sets that contain i and paying the entire cost of the special set that contains i. Now if both elements of a normal set have been encountered, the total cost of the set will from then on be picked up by these charges to the vertices. For a normal set {i, j}, after i has been encountered and before j has been encountered the extra charge of 1/2 at each stage will be charged to the edge (i, j). Breaking up the charges as above, one finds that for any order in which s and t finish last, the charge to the vertices is a constant independent of the order and is equal to (1/4)Δn(n+5), and the charge for the heavy set is fixed at 3M. The only variable is the charge to the edges, and this charge will be exactly half the total length of the edges, since an edge gets charged only after one of its endpoints has been encountered and before the other endpoint has been encountered, i.e. for the "duration" of its length.

Thus there is a YES answer to the instance of CF created if and only if the total length of all "normal" edges is kept to k or less, or, in other words, if and only if the instance of OLA is a YES-instance. (Note that since the cost of the special sets is entirely picked up by the vertices, the lengths of the special edges do not matter.)
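The Optimal Linear Arrangement problem of Definition A.1 can be made concrete with a brute-force sketch. This is only an illustration of the decision problem itself, not of the reduction; the function names are hypothetical, and the exhaustive search over permutations is exponential, as one would expect for an NP-complete problem.

```python
from itertools import permutations


def arrangement_length(edges, position):
    """Total edge length |f(i) - f(j)| summed over all edges, where
    `position` plays the role of the permutation f."""
    return sum(abs(position[u] - position[v]) for u, v in edges)


def ola_decision(vertices, edges, k):
    """Return True if some arrangement of `vertices` onto 1..n has total
    edge length at most k (exhaustive search, exponential time)."""
    for perm in permutations(vertices):
        position = {v: index + 1 for index, v in enumerate(perm)}
        if arrangement_length(edges, position) <= k:
            return True
    return False


# A path a - b - c can be laid out with total length 2 but not 1.
path_edges = [("a", "b"), ("b", "c")]
# A triangle always has total length 4 on three positions.
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
```

For instance, `ola_decision(["a", "b", "c"], path_edges, 2)` holds while the same query with bound 1 does not.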
B Source Code for Some Benchmarks

The source code for the benchmarks used in the comparison between jc and C is given below. For space reasons, only the code for the main functions is given.

nrev:

C:
    typedef struct s {
        int head;
        struct s *tail;
    } cons_node;

    cons_node *append(l1, l2)
    cons_node *l1, *l2;
    {
        cons_node *l3;

        if (l1 == NULL) return l2;
        else {
            for (l3 = l1; l3->tail != NULL; l3 = l3->tail)
                ;
            l3->tail = l2;
            return l1;
        }
    }

    cons_node *nrev(l)
    cons_node *l;
    {
        cons_node *l1;

        if (l == NULL) return NULL;
        else {
            l1 = l->tail;
            l->tail = NULL;    /* reclaim head node */
            return append(nrev(l1), l);
        }
    }

Janus:
    nrev([], ^[]).
    nrev([H|L1], ^R) :- nrev(L1, ^R1), app(R1, [H], ^R).
    app([], L, ^L).
    app([H|L1], L2, ^[H|L3]) :- app(L1, L2, ^L3).

binomial:

C:
    /* fact() as in the factorial benchmark */
    int pow(x, i)
    int x, i;
    {
        int prod;

        for (prod = 1; i > 0; i--) prod *= x;
        return prod;
    }

    int choose(n, k)
    int n, k;
    {
        return fact(n) / (fact(k) * fact(n-k));
    }

    int binomial(x, y, n)
    int x, y;
    {
        int i, prod = 0;

        for (i = 0; i <= n; i++)
            prod += choose(n, i) * pow(x, i) * pow(y, n-i);
        return prod;
    }

Janus:
    /* fact() as in the factorial benchmark */
    pow(X, N, ^P) :- int(X) | pow(X, N, ^P, 1).
    pow(X, 0, ^P, A) :- int(X), int(A) | P = A.
    pow(X, N, ^P, A) :-
        int(X), int(N), int(A), N > 0 | pow(X, N-1, ^P, X*A).
    choose(N, K, ^C) :- int(N), int(K) |
        fact(N, ^F1), fact(K, ^F2), fact(N-K, ^F3),
        C = F1 // (F2 * F3).
    binomial(X, Y, N, ^Z) :-
        int(X), int(Y), int(N), N >= 0 | binomial(X, Y, N, ^Z, N).
    binomial(_, _, _, ^0, 0).
    binomial(X, Y, N, ^Z, K) :-
        int(X), int(Y), int(N), int(K), K > 0 |
        binomial(X, Y, N, ^Z1, K-1),
        choose(N, K, ^C),
        pow(X, K, ^Xp),
        pow(Y, N-K, ^Yp),
        Z = Z1 + C*Xp*Yp.

dnf:

C:
    dnf(In, R, W, B)
    int In[], R, W, B;
    {
        int temp;

        while (R <= W) {
            if (In[W] == 0) {
                temp = In[W]; In[W] = In[R]; In[R] = temp;
                R += 1;
            } else if (In[W] == 1)
                W -= 1;
            else if (In[W] == 2) {
                temp = In[W]; In[W] = In[B]; In[B] = temp;
                B -= 1; W -= 1;
            }
        }
    }

Janus:
    dnf(In, R, W, B, ^Out) :-
        int(R), int(W), R > W | Out = In.
    dnf(In, R, W, B, ^Out) :-
        int(R), int(W), R =< W, In.W = red |
        dnf(In[R -> In.W, W -> In.R], R+1, W, B, ^Out).
    dnf(In, R, W, B, ^Out) :-
        int(R), int(W), R =< W, In.W = white |
        dnf(In, R, W-1, B, ^Out).
    dnf(In, R, W, B, ^Out) :-
        int(R), int(W), R =< W, In.W = blue |
        dnf(In[B -> In.W, W -> In.B], R, W-1, B-1, ^Out).

tak:

C:
    int tak(x, y, z)
    int x, y, z;
    {
        if (x <= y) return z;
        return tak(tak(x-1, y, z),
                   tak(y-1, z, x),
                   tak(z-1, x, y));
    }

Janus:
    tak(X, Y, Z, ^A) :-
        int(X), int(Y), int(Z), X > Y |
        tak(X-1, Y, Z, ^A1),
        tak(Y-1, Z, X, ^A2),
        tak(Z-1, X, Y, ^A3),
        tak(A1, A2, A3, ^A).
    tak(X, Y, Z, ^A) :-
        int(X), int(Y), int(Z), X =< Y | A = Z.

factorial:

C:
    int fact(n)
    int n;
    {
        int prod;

        for (prod = 1; n > 0; n--)
            prod *= n;
        return prod;
    }

Janus:
    fact(N, ^X) :-
        int(N), N >= 0 | fact(N, ^X, 1).
    fact(N, ^F, A) :-
        int(A), int(N), N > 0 | fact(N-1, ^F, A*N).
    fact(0, ^F, A) :- int(A) | F = A.

Input: A call forwarding problem I = ⟨E, C, w, s, k⟩.
Output: A solution to I, i.e., a permutation π of E.
Method:
begin
    ActiveSites := C;
    π := ⟨⟩;
    for each c ∈ C do count[c] := k od;
    construct the dependency graph G for legal execution orders;
    Processed := ∅;
    AvailInstrs := the root nodes of G;
    while AvailInstrs ≠ ∅ do
        for each I ∈ AvailInstrs do
            compute the weight of I as (Σ{w(c) | c ∈ ActiveSites and I ∈ c}) / s(I);
        od;
        I := an element of AvailInstrs with the least weight so computed;
        π := append I to the end of π;                      /* extend solution */
        Processed := Processed ∪ {I};
        for each c ∈ ActiveSites s.t. I ∈ c do              /* update list of active sites */
            if count[c] = 0 then
                delete c from ActiveSites;
            else count[c] := count[c] − 1;
        od;
        AvailInstrs := (AvailInstrs \ {I}) ∪ {J ∈ E | preds(J) ⊆ Processed};    /* update list of available instructions */
    od;
    return π;
end

Figure 1: A Greedy Algorithm for Call Forwarding
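The greedy heuristic of Figure 1 can be sketched in Python as follows. This is one possible reading of the figure, not the jc implementation, and it rests on several assumptions that are not stated in this part of the paper: each call site in C is taken to be the set of entry actions that the site cannot bypass; a site with count 0 is dropped from the active set as soon as one of its actions is placed; and instructions that have already been placed are never made available again. The name `greedy_order` and its parameters are hypothetical.

```python
def greedy_order(actions, sites, weight, size, preds=None, k=0):
    """Order entry actions greedily, lightest first.

    actions: iterable of entry-action names (the set E)
    sites:   dict mapping a call-site name to a set of actions (the set C);
             ASSUMPTION: these are the actions the site cannot bypass
    weight:  dict call-site name -> execution-frequency estimate (w)
    size:    dict action name -> size of the action (s)
    preds:   optional dict action -> set of actions that must precede it
    k:       initial per-site counter, as in the figure
    """
    preds = preds or {}
    active = set(sites)
    count = {c: k for c in sites}
    processed, order = set(), []
    avail = {a for a in actions if not preds.get(a)}   # roots of the dependency graph

    while avail:
        def score(action):
            # Total weight of active sites that contain the action, per unit size.
            return sum(weight[c] for c in active if action in sites[c]) / size[action]

        chosen = min(sorted(avail), key=score)         # sorted() makes ties deterministic
        order.append(chosen)
        processed.add(chosen)

        for c in sorted(active):                       # update the active call sites
            if chosen in sites[c]:
                if count[c] == 0:
                    active.discard(c)
                else:
                    count[c] -= 1

        # Newly available: unplaced actions whose predecessors are all placed.
        avail = {a for a in actions
                 if a not in processed and preds.get(a, set()) <= processed}
    return order


example_sites = {"s1": {"a"}, "s2": {"b"}, "s3": {"c"}}
example_weight = {"s1": 10, "s2": 1, "s3": 5}
example_size = {"a": 1, "b": 1, "c": 1}
```

With the example data, the action needed only by the rarely executed site is placed first, so the frequently executed sites can enter later and skip it; adding a dependency such as `preds={"b": {"a"}}` forces `a` ahead of `b` regardless of weight.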

(a) before call forwarding

    ave: Arg1 := deref(Arg1)
         if ¬list(Arg1) goto Err
         Arg2 := deref(Arg2)
         if ¬number(Arg2) goto Err
         Arg3 := deref(Arg3)
         if ¬number(Arg3) goto Err
         if Arg1 == nil goto L1
         t1 := head(Arg1)
         t1 := deref(t1)
         if ¬number(t1) goto Err
         Arg2 := add(Arg2, t1)
         Arg3 := add(Arg3, 1)
         Arg1 := tail(Arg1)
         goto ave
    L1:  t1 := div(Arg2, Arg3)
         Arg4 := deref(Arg4)
         assign(Arg4, t1)

(b) after call forwarding

    ave: Arg2 := deref(Arg2)
         if ¬number(Arg2) goto Err
         Arg3 := deref(Arg3)
         if ¬number(Arg3) goto Err
    L0:  Arg1 := deref(Arg1)
         if ¬list(Arg1) goto Err
         if Arg1 == nil goto L1
         t1 := head(Arg1)
         t1 := deref(t1)
         if ¬number(t1) goto Err
         Arg2 := add(Arg2, t1)
         Arg3 := add(Arg3, 1)
         Arg1 := tail(Arg1)
         goto L0
    L1:  t1 := div(Arg2, Arg3)
         Arg4 := deref(Arg4)
         assign(Arg4, t1)

Figure 2: The Effect of Call Forwarding on Intermediate Code for the ave procedure
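The saving shown in Figure 2 can be estimated with a small simulation. The sketch below is an illustrative Python model, not output of the jc compiler: it mirrors the control flow of the two versions of ave and counts how many type tests each one executes. The function names and the `tests` counter are hypothetical, and dereferencing is not modelled.

```python
def is_number(value):
    """Stand-in for the number() type test of the intermediate code."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def ave_before(items, total, count):
    """Every recursive call re-enters at `ave` and repeats all entry tests."""
    tests = 0
    while True:                              # label `ave`
        tests += 3                           # list(Arg1), number(Arg2), number(Arg3)
        if not (isinstance(items, list) and is_number(total) and is_number(count)):
            raise TypeError("bad argument")
        if not items:                        # Arg1 == nil
            return total / count, tests
        head = items[0]
        tests += 1                           # number(t1)
        if not is_number(head):
            raise TypeError("bad element")
        total, count, items = total + head, count + 1, items[1:]


def ave_after(items, total, count):
    """The recursive call is forwarded to `L0`, bypassing the number tests
    on Arg2 and Arg3, which are known to hold after the additions."""
    tests = 2                                # number(Arg2), number(Arg3), once
    if not (is_number(total) and is_number(count)):
        raise TypeError("bad argument")
    while True:                              # label `L0`
        tests += 1                           # list(Arg1)
        if not isinstance(items, list):
            raise TypeError("bad argument")
        if not items:                        # Arg1 == nil
            return total / count, tests
        head = items[0]
        tests += 1                           # number(t1)
        if not is_number(head):
            raise TypeError("bad element")
        total, count, items = total + head, count + 1, items[1:]
```

For a list of n numbers, this model executes 4n + 3 type tests before forwarding and 2n + 3 after it, while both versions return the same average.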

Program     no optimization    greedy    optimal
hanoi       1776               225       225
tak         492                172       172
nrev        574                360       360
qsort       726                450       450
factorial, merge, dnf, pi: 129 720 330 24 5963 124 306 1304 25 1304 330 24 25

Table 1: Efficacy of the greedy Call Forwarding heuristic (in Sparc assembly instructions)

Program     w/o forwarding (ms)    with forwarding (ms)    % improvement
binomial    5.95                   5.14                    13.6
hanoi       186                    163                     12.4
tak         299                    207                     30.8
nrev        1.17                   0.716                   38.8
qsort       2.31                   1.87                    19.0
merge       0.745                  0.613                   17.7
dnf         0.356                  0.191                   46.3

Table 2: Performance Improvement due to Call Forwarding

Program          jc (J) (ms)    Sicstus (S) (ms)    Quintus (Q) (ms)    S/J     Q/J
hanoi            163            300                 690                 1.84    4.23
tak              207            730                 2200                3.53    10.63
nrev             0.716          1.8                 7.9                 2.51    11.03
qsort            1.87           5.1                 9.4                 2.73    5.03
factorial        0.049          0.44                0.27                8.98    5.51
Geometric Mean:                                                         3.31    6.72

Table 3: The performance of jc, compared with Sicstus and Quintus Prolog

Program          jc (J) (ms)    C (unopt) (ms)    C (opt: -O4)    J/C-unopt    J/C-opt
nrev             0.716          0.89              0.52            0.80         1.38
binomial         5.14           4.76              3.17            1.08         1.62
dnf              0.191          0.191             0.061           1.00         3.13
qsort            1.33           1.25              0.34            1.06         3.91
tak              207            208               72              1.00         2.88
factorial        0.049          0.049             0.036           1.00         1.36
Geometric Mean:                                                   0.98         2.18

Table 4: The performance of jc compared to C
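The ratio columns and geometric means reported with the jc, Sicstus and Quintus timings of Table 3 can be recomputed from the raw times. The sketch below does this in Python; the variable and function names are hypothetical, and the timing values are the ones listed for Table 3.

```python
import math

# program: (jc, Sicstus, Quintus) execution times in milliseconds, from Table 3
timings = {
    "hanoi": (163, 300, 690),
    "tak": (207, 730, 2200),
    "nrev": (0.716, 1.8, 7.9),
    "qsort": (1.87, 5.1, 9.4),
    "factorial": (0.049, 0.44, 0.27),
}


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    values = list(values)
    return math.prod(values) ** (1 / len(values))


# Slowdown of each Prolog system relative to jc, per benchmark.
s_over_j = {name: s / j for name, (j, s, q) in timings.items()}
q_over_j = {name: q / j for name, (j, s, q) in timings.items()}

gm_s_over_j = geometric_mean(s_over_j.values())
gm_q_over_j = geometric_mean(q_over_j.values())
```

The geometric mean is the natural summary here because the entries are ratios: it treats a twofold speedup on one benchmark and a twofold slowdown on another as cancelling out.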