
Software Pipelining with Register Allocation and Spilling

Jian Wang†, Andreas Krall, M. Anton Ertl, Christine Eisenbeis‡

Institut für Computersprachen
Technische Universität Wien
Argentinierstr. 8
A-1040 Wien, Austria

Abstract

Simultaneous register allocation and software pipelining is still poorly understood and remains an open problem. In this paper, we first present the Register Requirement Graph (RRG), which can dynamically reflect the register requirement during software pipelining. Then, using the RRG as a basis, we develop a Register-Pressure-Sensitive (RPS) scheduling technique and study the problem of register spilling for software pipelining. We also present three algorithms: RPS without spilling, RPS with spilling, and software pipelining with a limited number of registers. The preliminary experimental results show that the first two algorithms can efficiently reduce the register requirement without degradation of the optimal performance, and that the third can effectively exploit instruction-level parallelism within loops even for machines with a small register file.

Keywords: Instruction-level Parallelism, Loop Scheduling, Software Pipelining, Register Allocation, Spilling, Data Dependence Graph

1 Introduction

It is well known that exploiting Instruction-Level Parallelism (ILP) within loops has become a key compilation issue for instruction-level parallel processors such as Very Long Instruction Word (VLIW) and superscalar machines [1,2,3]. Software pipelining has been proposed for exploiting ILP within loops; it can effectively overlap the execution of operations from different iterations [4,5,6,7,8,9,10,11,12].

Register allocation is another key compilation issue [13,14,15,16,17]. It is well known that performing register allocation before software pipelining may introduce unacceptable anti-dependences due to the reuse of registers, which may limit software pipelining [7,]. On the other hand, if software pipelining is done before register allocation, more registers than necessary may be needed, which may cause unnecessary register spilling and severely degrade the performance of the pipelined loop []. However, simultaneous register allocation and software pipelining is still poorly understood and remains open.

† This work was supported by the Lise Meitner Stipendium funded by the Austrian Science Foundation (FWF) and the Austrian Science and Research Ministry.
‡ Dr. Eisenbeis is with INRIA-Rocquencourt, Domaine de Voluceau, BP 105 - 78153, Le Chesnay Cedex, France.

The interaction between register allocation and loop-free code scheduling has been studied since the mid-1980s [0,8,,6,9], and register allocation for software pipelined loops has been studied by many researchers, with some efficient techniques proposed [0,,7,5]. However, the interaction between register allocation and software pipelining has only recently been considered, and in few studies. Mangione-Smith et al. developed a lower bound on the number of registers needed for a given modulo scheduled loop [22]. Ning and Gao have presented a framework of register allocation for software pipelining by which they deduce the minimal number of registers needed for finding some optimal software pipelined loop [21], but they do not consider the resource constraints. A technique called lifetime-sensitive modulo scheduling has been presented by Huff [23], in which he uses the idea of bidirectional slack-scheduling to perform the modulo scheduling while trying to shorten the lifetime of a variable, but he does not consider the register spilling problem.

Our approaches presented in this paper are different from all of the above. In order to understand the interaction between register allocation and software pipelining, we present a novel framework, called Register Requirement Graph (RRG), which can dynamically reflect the register requirement during software pipelining. During software pipelining, the RRG is used to control the register pressure caused by software pipelining itself. On one hand, the RRG gives the register-related information to guide the scheduling process such that no more registers than necessary are needed. On the other hand, from the RRG we can dynamically estimate the register requirement such that the spilling decision and the tradeoff between the initiation interval and register pressure are made efficiently.

The next section gives a background to make this paper self-contained. The work reported in this paper can be summarized as follows: (1) Present the RRG to estimate the register requirement during software pipelining (Section 3); (2) Use the RRG to develop a Register-Pressure-Sensitive (RPS) scheduling technique (Section 4); (3) Study the problem of register spilling to reduce the register pressure without degradation of the optimal performance (Section 5); (4) Present three software pipelining algorithms: RPS without spilling, RPS with spilling, and software pipelining with a limited number of registers (Section 6); (5) Give the preliminary experimental results to indicate the efficiency of the three algorithms (Section 7).

2 Decomposed Software Pipelining (DESP)

The data dependences of a loop can be represented by a Loop Data Dependence Graph (LDDG), $(O, E, \lambda, \delta)$, where $O$ is the operation set and $E$ the dependence edge set; the dependence distance $\lambda$ and the delay $\delta$ are two non-negative integers associated with each edge. For example, $e = (op, op')$ and $(\lambda(e), \delta(e))$ denote that $op'$ can only be issued $\delta(e)$ cycles after the start of the operation $op$ of the $\lambda(e)$-th previous iteration [,9].

Footnote: For all examples in this paper, the loop-independent dependence edges are solid edges whereas loop-carried ones are dotted if we do not attach $(\lambda, \delta)$ to each edge.

DESP is a novel modulo scheduling approach, and its idea can be illustrated by Figure 2.1 as an example. First, we modify the LDDG by removing some edges so that the graph becomes acyclic; secondly, we apply the list scheduling technique on the modified graph to generate the software pipelined loop body under the resource constraints, and use the row-number to denote the cycle-number of each operation in the loop body; thirdly, we determine the iteration number (denoted as column-number in the context of DESP) of each operation such that all data dependences in the LDDG are satisfied.

Formally, DESP theoretically decomposes the loop schedule into two functions, row-number

and column-number.

[Figure 2.1: Decomposed ... (panels: LDDG; MLDDG; steps 1 to 3 with row-numbers rn)]

Definition 2.1. Let $G = (O, E, \lambda, \delta)$ be the LDDG of a loop, and $\sigma$ a valid loop schedule for $G$ with initiation interval $II$. We define the row-number $rn$ and the column-number $cn$, two mappings from $O$ to $N$ (the non-negative integer set), such that

$\sigma(op, 1) = rn(op) + II \cdot (cn(op) - 1)$ and $\sigma(op, i) = \sigma(op, 1) + II \cdot (i - 1)$.

Thus, software pipelining can be described below with the concepts of row-number and column-number.

Definition 2.2. (Decomposed Software Pipelining) Let $G = (O, E, \lambda, \delta)$ be the LDDG of a loop. We say that the row-number, $rn$, and the column-number, $cn$, are valid for the loop if and only if the following constraints are satisfied:

1. Resource constraints: $\forall op_i, op_j \in O$, if $rn(op_i) = rn(op_j)$, then $op_i$ and $op_j$ must not be in resource conflict;
2. Dependence constraints: $\exists II \in N$, $\forall e = (op, op') \in E$, $rn(op') - rn(op) + II \cdot (\lambda(e) + cn(op') - cn(op)) \ge \delta(e)$.

Footnote: Here, we only consider the pipelined operations and the single-cycle operations, but the definition is easily extended to the case of multi-cycle non-pipelined operations.

$II$ is called the initiation interval or the length of the software pipelined loop body. The goal of decomposed software pipelining is to find a valid row-number and column-number with minimum $II$.

In our previous papers [24,25,26], we have proven the following theoretical results.

Theorem 2.1. For a given LDDG, suppose we have constructed a row-number $rn$ which satisfies the resource constraints. We can construct a column-number $cn$ such that the data dependence constraints are also satisfied, if and only if, for each cycle $C$ of the LDDG,

$\sum_{e \in C} \phi(e) \le 0$,

where $\phi(e) = -\lambda(e) + \lceil (\delta(e) + rn(op) - rn(op')) / II \rceil$, $e = (op, op')$.
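The definitions and the theorem above can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: the four-operation graph, the names `EDGES`, `RN`, `CN`, and the helper functions are hypothetical and chosen only for illustration. It evaluates the schedule function from Definition 2.1, tests the dependence constraints of Definition 2.2, and evaluates the per-edge term of Theorem 2.1 on a cycle (called `phi` here).

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers, b > 0 (also correct for negative a)."""
    return -(-a // b)

def sigma(rn, cn, ii, op, i):
    """Issue cycle of operation `op` in iteration `i` (rows, columns, iterations start at 1)."""
    return rn[op] + ii * (cn[op] - 1) + ii * (i - 1)

def dependences_ok(edges, rn, cn, ii):
    """Dependence constraints of Definition 2.2, checked for every edge."""
    return all(
        rn[dst] - rn[src] + ii * (lam + cn[dst] - cn[src]) >= delta
        for (src, dst, lam, delta) in edges
    )

def phi(edge, rn, ii):
    """Per-edge term of Theorem 2.1: -lambda + ceil((delta + rn(op) - rn(op')) / II)."""
    src, dst, lam, delta = edge
    return -lam + ceil_div(delta + rn[src] - rn[dst], ii)

# Hypothetical LDDG: edges are (source, target, distance lambda, delay delta).
EDGES = [("a", "b", 0, 1), ("b", "c", 0, 2), ("c", "a", 1, 1), ("c", "d", 0, 2)]
CYCLE = EDGES[:3]           # a -> b -> c -> a, total delay 4, total distance 1
II = 4                      # so the cycle forces II >= 4
RN = {"a": 1, "b": 2, "c": 4, "d": 2}
CN = {"a": 1, "b": 1, "c": 1, "d": 2}

print(dependences_ok(EDGES, RN, CN, II))        # all four edges hold
print(sum(phi(e, RN, II) for e in CYCLE))       # cycle sum, must be <= 0
print(sigma(RN, CN, II, "d", 1))                # d issues 2 cycles after c
```

With these assumed values, moving `c` one row earlier (`rn(c) = 3`) makes the cycle sum positive, and no column-number can then repair the schedule, which is the direction of the theorem that matters during row scheduling.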

The following corollary follows directly from Theorem 2.1.

Corollary 2.1. For an LDDG without cycles, if we have constructed the row-number taking into account the resource constraints, then we can always construct a column-number such that the data dependence constraints are also satisfied.

Our software pipelining framework is based on DESP as shown in Figure 2.1. In the first step, we use the following method to modify the LDDG [24,25,26]:

(1) find all strongly connected components (SCCs) in the LDDG, and remove all edges which are not included in the SCCs;
(2) under unlimited resources, generate a software pipelined loop for the SCCs, denoted as $(rn_0, cn_0)$;
(3) for each edge $e = (op_i, op_j)$ of the SCCs, if $rn_0(op_j) - rn_0(op_i) < \delta(e)$, then remove $e$ from the SCCs.

The remaining graph is acyclic, denoted as MLDDG. We have proven that any row-numbers satisfying the data dependences of the MLDDG must satisfy the condition of Theorem 2.1.

3 Register Requirement Graph

In decomposed software pipelining, the column-number is an important parameter for controlling the register requirement of each variable. In fact, the register requirement is mainly determined by the difference between the column-numbers of two operations which have a data dependence (denoted as $dcn_{ij}$). For example, suppose variable $u$ is written by $op_i$ and read by $op_j$; then $dcn_{ij}$ gives the estimate of the lifetime of $u$. Thus, we first present the Register Requirement Graph (RRG), which can dynamically estimate $dcn_{ij}$. The RRG gives the heuristics to guide the scheduling process (determining the row-number).

Given the LDDG $(O, E, \lambda, \delta)$ of a loop, after the first step of decomposed software pipelining, we obtain an acyclic dependence graph MLDDG $= (O, E_m, \delta)$. A new graph, called the Register Requirement Graph, is defined as RRG $= (O, E, \omega)$, where $\omega$ is a weight on each edge which represents the estimated difference between the column-numbers of two operations in the worst case. Let $MII$ be the estimated minimum initiation interval. Before scheduling the software pipelined loop body, we initially define $\omega$ as follows:

(1) $\omega(e) = -\lambda(e)$, $\forall e \in E_m$;
(2) $\omega(e) = -\lambda(e) + \lceil (\delta(e) + MII - 1) / MII \rceil$, $\forall e \in E - E_m$.

While scheduling the software pipelined loop body, we recompute $\omega(e)$ for each $e = (op_i, op_j) \in E - E_m$ as follows:

(1) $\omega(e) = -\lambda(e) + \lceil (\delta(e) - 1 + rn(op_i)) / MII \rceil$, if $rn(op_i)$ is determined but $rn(op_j)$ is not;
(2) $\omega(e) = -\lambda(e) + 1 + \lceil (\delta(e) - rn(op_j)) / MII \rceil$, if $rn(op_j)$ is determined but $rn(op_i)$ is not;
(3) $\omega(e) = -\lambda(e) + \lceil (\delta(e) - (rn(op_j) - rn(op_i))) / MII \rceil$, if $rn(op_i)$ and $rn(op_j)$ are both determined.

An example of an RRG is given in Figures 3.1 and 3.2. Figure 3.1(1) is the loop and (2) the machine model. Its LDDG and MLDDG are shown in Figure 3.2(1) and (2), respectively. Figure 3.2(3) is the initial RRG. Figure 3.2(4) is the RRG when rn(op ) = rn(op ) = rn(op5) = rn(op6) =  and rn(op ) = rn(op4) =  .
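The weight rules above are all instances of one formula with the unknown row-numbers replaced by their worst case (row 1 for the target, row MII for the source). A minimal Python sketch of that reading follows; the function name `omega` and its keyword arguments are hypothetical, and rows are assumed to range over 1..MII.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers, b > 0 (also correct for negative a)."""
    return -(-a // b)

def omega(lam, delta, mii, rn_i=None, rn_j=None, in_mlddg=False):
    """Worst-case column-number difference for edge e = (op_i, op_j).

    lam, delta : dependence distance and delay of e
    mii        : estimated minimum initiation interval
    rn_i, rn_j : row-numbers if already determined, else None
    in_mlddg   : True if e is kept in the MLDDG (E_m)
    """
    if in_mlddg:
        return -lam
    if rn_i is None and rn_j is None:        # initial weight
        return -lam + ceil_div(delta + mii - 1, mii)
    if rn_j is None:                         # only the source is placed
        return -lam + ceil_div(delta - 1 + rn_i, mii)
    if rn_i is None:                         # only the target is placed
        return -lam + 1 + ceil_div(delta - rn_j, mii)
    return -lam + ceil_div(delta - (rn_j - rn_i), mii)   # both placed

# Hypothetical edge with distance 0, delay 2, under MII = 2:
print(omega(0, 2, 2))                    # before scheduling
print(omega(0, 2, 2, rn_i=1))            # source placed in row 1
print(omega(0, 2, 2, rn_i=1, rn_j=2))    # both placed
```

In this sketch the weight can only stay equal or shrink as more row-numbers become known, which is what lets the scheduler treat it as an upper estimate while it works.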

(1) The Loop

The original loop:

    for i = 1 to n do
        s = s + a[i]
        a[i] = s*s*a[i]
    enddo

The code of the loop body:

    1. t0 = t0 + 1;
    2. t1 = a[t0];
    3. s = s + t1;
    4. t2 = s*s;
    5. t3 = t2*t1;
    6. a[t0] = t3

(2) The Machine Model

    Pipeline       Number   Operation    Latency
    Memory port             Load
                            Store
    Address ALU             Add/Sub
    Adder                   FAdd/FSub
                            IAdd/ISub
    Multiplier     1        FMUL
                            IMUL

Figure 3.1 An Example

A definition-use path is defined as a path from the operation writing a variable to any operation reading the variable in the LDDG. The critical definition-use path of variable $u$, $cdup_u$, is defined by

$\sum_{e \in cdup_u} \omega(e) = \max_{\text{any } dup \text{ of } u} \left( \sum_{e \in dup_u} \omega(e) \right)$.

Let RRG $= (O, E, \omega)$. For each edge $e \in E$, $\mu(e)$ is defined as the number of variables whose critical definition-use paths include $e$.

[Figure 3.2: LDDG, MLDDG and RRGs (panels: (1) LDDG; (2) MLDDG; (3) and (4) RRGs)]

The RRG has the following two properties:

(1) Let RRG $= (O, E, \omega)$ and let $cdup_u$ be the critical definition-use path of $u$; then $\sum_{e \in cdup_u} \omega(e)$ gives the estimate of the register requirement of $u$.
(2) Let RRG $= (O, E, \omega)$. During scheduling of the software pipelined loop body, for any edge $e$ which is in the LDDG but not included in the MLDDG, if $e$ is satisfied (that is, $rn(op_j) - rn(op_i) \ge \delta(e)$, $e = (op_i, op_j)$), then the register requirement may be decreased by up to $(\omega(e) + \lambda(e)) \cdot \mu(e)$ registers compared to the case when $e$ is not satisfied.

4 RPS Scheduling

We present the following two heuristics to direct the scheduling process:

(1) Delay the scheduling of some operations such that some dependence edges can be satisfied in the software pipelined loop body;
(2) Develop register-pressure-sensitive heuristics to determine the scheduling priorities for operations.

In the second step of our software pipelining framework, we use list scheduling on the

MLDDG (obtained in the first step) to determine the row-numbers for all operations. First, we find all schedulable operations at the current cycle and put them into the Data Ready Set (DRS); then we select the operations with the highest scheduling priority to schedule.

As most dependence edges originally in the LDDG have been removed in the MLDDG, there may be many schedulable operations in the DRS at each cycle. Without increasing the estimated II, it is quite possible that some operations can be delayed such that some dependence edges avoid being unnecessarily broken.

Footnote: The estimated II can be derived from the critical cycle of the LDDG and the number of operations using the critical resources.

We suggest that an operation can be delayed and removed from the current DRS only if

(1) The operation does not use the critical resources. $res$ is one of the critical resources if $t - 1 + \lceil n / N \rceil \ge$ the estimated II, where $t$ is the current cycle, $n$ is the number of operations using $res$ and $N$ is the number of $res$ in the machine; and
(2) The lengths of the resulting dependence paths are not greater than the estimated II. That is, $t + \delta(e) + height(op) \le$ the estimated II, where $t$ is the current cycle, $e$ is the dependence edge which we are willing to hold and $height(op)$ is the height of $op$ in the MLDDG.

For the example of Figure 3.2(2), at the first cycle, all operations are schedulable and can be put into the DRS, but only operation   can be delayed and removed from the DRS.

When more than one operation can be delayed, we first consider the operation with the greatest value of $(\omega(e) + \lambda(e)) \cdot \mu(e)$, where $e$ is the dependence edge which we are willing to hold.

Next we discuss how to determine the scheduling priorities for the operations of the DRS. In order to obtain the optimal time efficiency, we consider the height of an operation in the MLDDG as the first heuristic. The second heuristic is sensitive to the register pressure and is derived from the RRG.

At the current cycle $t$, suppose $op_i$ and $op_j$ are the operations with the greatest value of height in the MLDDG. If $op_i$ and $op_j$ are not in resource conflict, then they should both be scheduled at $t$. If $op_i$ and $op_j$ are in resource conflict, then we use the second heuristic to determine their scheduling priorities as follows:

(1) If one operation is scheduled at $t$, then the other should be scheduled after the $t$-th cycle;
(2) Suppose $op_i$ is scheduled at $t$, and let $DES(op_i)$ be the dependence edge set which includes all edges adjacent to $op_i$. Letting $rn(op_i) = t$, we re-compute the new value of $\omega$ of each edge in $DES(op_i)$, denoted as $\omega_{new}$. Thus, we can compute the register-benefit of $op_i$,
    $\beta(op_i, t) = \sum_{e \in DES(op_i)} (\omega(e) - \omega_{new}(e)) \cdot \mu(e)$;
(3) By the same method as step (2), we compute $\beta(op_j, t)$;
(4) The operation with the greater value of $\beta$ (the register-benefit) is the one with the higher scheduling priority.

4.1 Register Spilling

Spilling decisions are conventionally made only when a register conflict occurs, that is, the number of simultaneously live variables is greater than the number of available machine registers. The

effect of spilling is keeping the result of a computation in memory rather than in a register such that the register can be re-used to keep the result of a new computation, at the cost of increasing the number of load/store operations and probably degrading the code performance. Software pipelining overlaps the execution of the operations from different iterations, increasing register pressure and generating excessive spill code in the case of small machine register files.

This section discusses the register spilling problem for software pipelining. Our starting-point is that the spilling decision should be made during software pipelining such that the interactions between register allocation and loop scheduling can be seen. The RRG can dynamically reflect the change in the register requirement during software pipelining and makes our starting-point feasible.

Two problems to be discussed are as follows: (1) When is a spilling decision made during software pipelining? (2) How is a spill done?

In the loop body, we suppose that a variable has only one definition (the operation defining the variable) but may have more than one use (the operations using the variable). We first want to make a remark: the meaning of spilling in the context of this paper is somewhat different from the conventional spilling problem [14]. We speak of spilling a use (or a group of uses) rather than spilling a variable (that is, spilling all its uses). By spilling a use, we mean that a store operation after the definition and a load operation before the spilled use are inserted, and other uses still reference the value of the variable in a register.

From the RRG, we can dynamically estimate the register requirement at each cycle. Spilling is needed only if the number of required registers is greater than the number of available machine registers. In fact, other measures such as delaying some operations and introducing some dependence edges into the MLDDG can also decrease the register requirement.

Another necessary condition for spilling is that the load/store operations caused by spilling do not increase the estimated II. In the case that there are not enough available machine registers to reach the estimated II, we first increase the estimated II and then consider spilling or other measures to decrease the register requirement (see next section).

The spilling process consists of two steps: (1) Select a use (or a group of uses) for spilling; (2) Modify the MLDDG and the RRG by adding the necessary load/store operations and recomputing the values of the corresponding $\omega$ and $\mu$.

The spilling-benefit of a use is defined as the number of saved registers per inserted load/store operation. More precisely, given a use, $use(op, u)$, where $op$ is the operation using variable $u$, under the assumption that $use(op, u)$ has been spilled, we re-compute the minimal register requirement of variable $u$ and the newly introduced variable, denoted as $K'_u$. Thus, the spilling-benefit of $use(op, u)$ is $\lceil (K_u - K'_u) / 2 \rceil$, as a store and a load are inserted into the MLDDG and the RRG for spilling a use.

Obviously, a use with a greater value of spilling-benefit is the one with higher spilling priority.

We take the loop shown in Figure 3.1 as an example to illustrate the above ideas. We discuss two cases: (1) scheduling without spilling; (2) scheduling with spilling.

For the first case, the estimated II is 2 since the machine has one multiplier but the loop body contains two multiplications. The software pipelined loop body can be found under the constraints of the MLDDG (shown in Figure 3.2(2)) and the initial RRG (shown in Figure 3.2(3)). By delaying operation  , we obtain rn( ) = rn( ) = rn(5) = rn(6) =  and rn( ) = rn(4) =  . It is easy to compute the number of required registers, which is  .

For the second case, the estimated II is also 2. After computing the spilling-benefits of all

uses, we find that use(op6, t0) has the greatest value of spilling-benefit, which is $\lceil ( \ 7 \ ) / 2 \rceil$ =  , so use(op6, t0) has the highest spilling priority. After spilling use(op6, t0), the modified MLDDG and the modified initial RRG are shown in Figure 5.1. By delaying operation  , we obtain rn( ) = rn( ) = rn(5) = rn(6) = rn(s) =  and rn( ) = rn(4) = rn(l) =  . It is easy to compute the number of required registers, which is  .

[Figure 5.1: Scheduling with Register Spilling (panels: (1) the modified MLDDG; (2) the modified initial RRG; l and s denote the inserted load and store)]

An important observation is that spilling can decrease the register requirement without degradation of the optimal software pipelining performance if the spilling decision can be efficiently controlled.

5 Algorithms

On the basis of the last three sections, we present three software pipelining algorithms. The first two perform software pipelining to minimize the register requirement and the third performs software pipelining with a limited number of registers.

5.1 RPS Scheduling without Spilling

The algorithm is described as follows:

Algorithm RPS-without-Spilling;
INPUT: The loop to be software pipelined and its LDDG;
OUTPUT: The software pipelined loop;
BEGIN
1. Construct the MLDDG, determine the estimated II;
2. Compute the height of each operation in the MLDDG;
3. Find all definition-use paths of each variable, construct the RRG;
4. Find all schedulable operations and put them in the DRS;
5. Find those operations which can be delayed one by one, remove them from the DRS;
6. Determine the scheduling priorities of all operations in the DRS;
7. Under the constraint of resources, select the operation with the highest scheduling priority

from the DRS and place it in the current cycle, update the DRS. This step repeats until no operation can be placed in the current cycle;
8. If all operations of the loop have been scheduled then goto step 9; else update the DRS and the RRG and goto step 5;
9. For each operation, let its row-number be its cycle-number. From Theorem 2.1, the column-number of each operation is computed in terms of the row-numbers and the II;
10. Generate the software pipelined loop in terms of the row-numbers and the column-numbers;
END;

5.2 RPS Scheduling with Spilling

This algorithm differs from the RPS-without-Spilling algorithm in that a new spill-checking step is inserted between step 5 and step 6. The new step calls a spill-checking algorithm which is described as follows:

Algorithm Spill-Checking;
BEGIN
1. If the memory access unit is one of the critical resources, then return;
2. Compute the spilling-benefit of each use; we actually only consider those uses which are on the critical definition-use paths;
3. Under the constraint of not increasing the estimated II, select a use (or a group of uses) for spilling. In this step, if no use can be selected then return;
4. Update the MLDDG, the RRG and the DRS; return;
END;

5.3 Software Pipelining with a Limited Number of Registers

The above two algorithms try to obtain the optimal software pipelined loop with the minimal register requirement. In this section we present an approach for software pipelining with a limited number of registers.

Our idea is that we first estimate the register requirement; if the number of required registers is greater than the given number of available machine registers, then we increase the estimated II such that the register requirement is reduced.

However, it is difficult and complicated to precisely estimate the register requirement. The RRG only estimates the register requirement of each variable. The problem of which variables can share the same registers remains open during software pipelining.

We present the following heuristics: Let $K_0$ be the given number of available machine registers and $K_{est}$ be the estimated number of required registers from the RRG. A non-negative integer $N_0$ is introduced. If $K_{est} - N_0 \le K_0$ then we call the algorithm of RPS scheduling with spilling; else we first increase the estimated II (maybe also increasing $N_0$ in some cases) to satisfy $K_{est} - N_0 \le K_0$.

After getting the software pipelined loop body, we can precisely compute the number of required registers. If the number is greater than $K_0$, then we increase the estimated II and call the algorithm of RPS with spilling again. The process repeats until a software pipelined loop body is obtained whose register requirement is not greater than $K_0$.

We do not yet have any theoretical analysis of $N_0$, but we believe that $N_0$ can be estimated empirically.

6 Preliminary Experimental Results

The effort to implement the algorithms presented in this paper is underway. Before carrying out extensive experimental tests, we select six examples to verify our algorithms. Except for example 1, the other five examples are selected from the Livermore Benchmarks, shown in Table 1. As our preliminary experiments are mainly conducted by manual simulation, we try to select some simple loops in a random way. The machine model we use in the experiments is shown in Figure 3.1(2).

Table 1. Experimental Examples

    Example | L | MII | with lcd? | Remarks
    0 no Figure.() no Kernel 7 yes Kernel 4 8 yes Kernel yes Kernel 6 7 no Kernel

    note 1: L = the length of the longest dependence path in the loop body.
    note 2: MII = the Minimal II.
    note 3: lcd = loop-carried dependence.

Table 2 gives the register requirements for the optimal software pipelining performance by three scheduling approaches: DESP, RPS without spilling and RPS with spilling. Although DESP itself adopts measures to reduce the register requirement when it determines the column-numbers, the algorithm of RPS without spilling can still obtain an improvement over DESP from 7:4% to 7:9% in register use without degradation of the optimal performance, except for example  and 5. For example  and 5, no improvement in register use can be obtained since the initiation intervals (II) of the software pipelined loops are  . For example  and  , the algorithm of RPS with spilling can further obtain an improvement over DESP in register use of  :% and  :%, respectively, without degradation of the optimal performance.

Table 2. Register Requirement for Three Scheduling Approaches

    Example | II | DESP | RPS without Spilling | RPS with Spilling

    note: II = the initiation interval of the software pipelined loop.

The results of our algorithm for software pipelining with a limited number of registers are presented in Table 3 and Figure 7.1. Table 3 gives the initiation intervals (II) obtained by our

algorithm for the six examples when the number of available machine registers ($K_0$) is 8, 16 and 32, respectively. The relations between $K_0$ and the speedup are shown in Figure 7.1. The speedup is defined as $L/II$, where $L$ is the length of the longest dependence path in the loop body (shown in Table 1), representing the optimal performance when we only exploit the ILP within the loop body. The results show that our algorithm can obtain the optimal speedup when $K_0 = 32$ (the minimal size of register file in the current ILP processors) and an average speedup of .4 when $K_0 = 8$, indicating that our algorithm can still efficiently exploit the ILP across iterations for loops even for a small register file ($K_0 = 8$).

Table 3. Software Pipelining with a Limited Number of Registers (The Initiation Interval of the Software Pipelined Loop)

    Example | The number of available machine registers: K0 = 8 | 16 | 32

[Figure 7.1: Software Pipelining with a Limited Number of Registers (speedup per example, for K0 = 8, 16 and 32)]

7 Conclusion

This paper presents the Register Requirement Graph (RRG), which can dynamically reflect the register requirement during software pipelining. On the basis of the RRG, a Register-Pressure-Sensitive (RPS) scheduling technique is developed and the problem of register spilling for software pipelining is studied. We also present three algorithms: RPS without spilling, RPS with spilling, and software pipelining with a limited number of registers. The preliminary experimental results indicate that the first two algorithms can efficiently improve the register use without degradation of the optimal performance, and that the third can effectively exploit the ILP across iterations for loops even for machines with a small register file.

The three algorithms are being implemented on our compiler testbed. We expect to conduct extensive experimental tests.
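The third algorithm (software pipelining with a limited number of registers) can be read as an outer loop around RPS scheduling with spilling. The Python sketch below illustrates only that control flow, under stated assumptions: the three callbacks (`estimate_registers`, `rps_with_spilling`, `count_registers`), the `max_ii` cut-off, and the reading of the test as K_est - N0 <= K0 are hypothetical stand-ins, not part of the paper.

```python
def pipeline_with_limited_registers(k0, n0, ii, estimate_registers,
                                    rps_with_spilling, count_registers,
                                    max_ii=64):
    """Raise the estimated II until a schedule fits into k0 registers.

    k0                 : number of available machine registers
    n0                 : non-negative slack applied to the RRG-based estimate
    ii                 : starting (estimated) initiation interval
    estimate_registers : ii -> estimated register requirement (from the RRG)
    rps_with_spilling  : ii -> schedule (any object)
    count_registers    : schedule -> exact register requirement
    Returns (ii, schedule), or None if max_ii is exceeded (assumed cut-off).
    """
    while ii <= max_ii:
        # Phase 1: increase II until the estimate is within the budget.
        while estimate_registers(ii) - n0 > k0:
            ii += 1
            if ii > max_ii:
                return None
        # Phase 2: schedule, then verify with the exact register count.
        schedule = rps_with_spilling(ii)
        if count_registers(schedule) <= k0:
            return ii, schedule
        ii += 1   # estimate was too optimistic: relax II and retry
    return None

# Toy stand-ins: register need falls by 2 per extra cycle of II.
result = pipeline_with_limited_registers(
    k0=8, n0=2, ii=2,
    estimate_registers=lambda ii: 20 - 2 * ii,
    rps_with_spilling=lambda ii: ("schedule", ii),
    count_registers=lambda s: 19 - 2 * s[1],
)
print(result)
```

The slack `n0` lets scheduling start while the estimate is still slightly above the budget, on the assumption that register sharing between variables, which the RRG does not model, will make up the difference; the exact count after scheduling is what decides.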

References

[1] J. A. Fisher, D. Landskov, and B. D. Shriver. Microcode compaction: Looking backward and looking forward. In Proceedings of the 1981 National Computer Conference.
[2] B. R. Rau and J. A. Fisher. Instruction-level parallel processing: History, overview and perspective. The Journal of Supercomputing, 7(1), January 1993.
[3] F. Gasperoni. Compilation techniques for VLIW architectures. Technical Report TR435, New York University, March 1989.
[4] B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the 14th International Symposium on Microprogramming and Microarchitectures (MICRO-14), pages 183-198, October 1981.
[5] A. Aiken and A. Nicolau. Perfect pipelining: A new loop parallelization technique. In Proceedings of European Symposium on Programming, Lecture Notes in Computer Science, No. 300, pages 221-235. Springer-Verlag, June 1988.
[6] P. Y. T. Hsu. Highly Concurrent Scalar Processing. PhD thesis, University of Illinois, Urbana-Champaign, 1986.
[7] K. Ebcioglu. A compilation technique for software pipelining of loops with conditional jumps. In Proceedings of the 20th International Symposium on Microprogramming and Microarchitectures (MICRO-20), pages 69-79, 1987.
[8] B. Su, S. Ding, and J. Xia. URPR - an extension of URCR for software pipelining. In Proceedings of the 19th International Symposium on Microprogramming and Microarchitectures (MICRO-19), pages 104-108, 1986.
[9] Bogong Su and Jian Wang. Loop-carried dependence and the general URPR software pipelining approach. In Proceedings of the 24th Annual Hawaii International Conference on System Sciences, pages 366-372. IEEE and ACM, January 1991.
[10] R. F. Touzeau. A Fortran compiler for the FPS-164 scientific computer. In Proceedings of ACM SIGPLAN Symposium on Compiler Construction, 1984.
[11] A. E. Charlesworth. An approach to scientific array processing: The architecture design of the AP-120B/FPS-164 family. Computer, pages 18-27, September 1981.
[12] M. S. Lam. A Systolic Array Optimizing Compiler. PhD thesis, CMU, 1987. CMU-CS-87-
[13] D. G. Bradlee, S. J. Eggers, and R. R. Henry. Integrated register allocation and instruction scheduling for RISCs. In Proceedings of the 4th International Conference on ASPLOS, 1991.
[14] G. J. Chaitin. Register allocation and spilling via graph coloring. In Proceedings of ACM SIGPLAN Symp. on Compiler Construction, 1982.
[15] L. J. Hendren, G. R. Gao, E. R. Altman, and C. Mukerji. Register allocation using cyclic interval graph: A new approach to an old problem. Technical Report ACAPS Technical Memo 33, McGill University, 1992.
[16] S. S. Pinter. Register allocation with instruction scheduling: A new approach. In Proceedings of ACM SIGPLAN PLDI, 1993.
[17] B. R. Rau, M. Lee, P. P. Tirumalai, and M. S. Schlansker. Register allocation for software pipelined loops. In Proceedings of PLDI, 1992.
[18] J. R. Goodman and W. Hsu. Code scheduling and register allocation in large basic blocks. In Proceedings of International Conference on Supercomputing, 1988.
[19] S. A. Mahlke, W. Y. Chen, P. P. Chang, and W. W. Hwu. Scalar program performance on multiple-instruction-issue processors with a limited number of registers. In Proceedings of the 25th Hawaii International Conference on System Sciences, January 1992.
[20] C. Eisenbeis, W. Jalby, and A. Lichnewsky. Compile-time optimization of memory and register usage on the Cray-2. In Proceedings of the Second Workshop on Languages and Compilers, 1989.
[21] Qi Ning and Guang R. Gao. A novel framework of register allocation for software pipelining. Technical Report ACAPS Technical Memo 42, McGill University, 1992.
[22] William Mangione-Smith, S. G. Abraham, and E. S. Davidson. Register requirements of pipelined processors. In Proceedings of 1992 ACM International Conference on Supercomputing, 1992.
[23] R. Huff. Lifetime-sensitive modulo scheduling. In Proceedings of ACM SIGPLAN PLDI, pages 258-267, June 1993.
[24] Jian Wang and Christine Eisenbeis. Decomposed Software Pipelining: A new approach to exploit instruction level parallelism for loop programs. In Michel Cosnard, Kemal Ebcioglu, and Jean-Luc Gaudiot, editors, Proceedings of IFIP WG 10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, pages 3-15. IFIP, North-Holland, January 1993.
[25] Jian Wang and Christine Eisenbeis. Decomposed Software Pipelining. Research Report RR-1838, INRIA-Rocquencourt, France, 1993.
[26] Jian Wang, Christine Eisenbeis, Martin Jourdan, and Bogong Su. Decomposed Software Pipelining: A new perspective and a new approach. International Journal of Parallel Programming, 22(3):357-379, 1994.