Implementing Multiprocessor Scheduling Disciplines

Eric W. Parsons and Kenneth C. Sevcik
Computer Systems Research Institute
University of Toronto

Abstract. An important issue in multiprogrammed multiprocessor systems is the scheduling of parallel jobs. Consequently, there has been a considerable amount of analytic research in this area recently. A frequent criticism, however, is that proposed disciplines that are studied analytically are rarely ever implemented and even more rarely incorporated into commercial scheduling software. In this paper, we seek to bridge this gap by describing how at least one commercial scheduling system, namely Platform Computing's Load Sharing Facility, can be extended to support a wide variety of new scheduling disciplines. We then describe the design and implementation of a number of multiprocessor scheduling disciplines, each differing considerably in terms of the type of preemption that is assumed to be available and in terms of the flexibility allowed in allocating processors. In evaluating the performance of these disciplines, we find that preemption can significantly reduce overall response times, but that the performance of disciplines that must commit to allocations when a job is first activated can be significantly affected by transient loads.

1 Introduction

As large-scale multiprocessor systems become available to a growing user population, mechanisms to share such systems among users are becoming increasingly necessary. Users of these systems run applications that range from computationally-intensive scientific modeling to I/O-intensive databases, for the purpose of obtaining computational results, measuring application performance, or simply debugging new parallel codes. While in the past, systems may have been acquired exclusively for use by a small number of individuals, they are now being installed for the benefit of large user communities, making the efficient scheduling of these systems an important problem.

Although much analytic research has been done in this area, one of the frequent criticisms made is that proposed disciplines are rarely implemented and even more rarely ever become part of commercial scheduling systems. The commercial scheduling systems presently available, for the most part, only support run-to-completion (RTC) disciplines and have very little flexibility in adjusting

processor allocations. These constraints can lead to both high response times and low system utilizations. On the other hand, most research results support the need for both preemption and mechanisms for adjusting processor allocations of jobs. Given that a number of high-performance computing centers have begun to develop their own scheduling software [Hen95, Lif95, SCZL96, WMKS96], it is clear that existing commercial scheduling software is often inadequate. To support these centers, however, mechanisms to extend existing systems with external (customer-provided) policies are starting to become available in commercial software [SCZL96]. This allows new scheduling policies to be easily implemented, without having to re-implement much of the base functionality typically found in this type of software.

The primary objective of this paper is to help bridge the gap between some of the analytic research and practical implementations of scheduling disciplines. As such, we describe the implementation of a number of scheduling disciplines, involving various types of job preemption and processor allocation flexibility. Furthermore, we describe how different types of knowledge (e.g., amount of computational work or speedup characteristics) can be included in the design of these disciplines. A secondary objective of our work is to briefly examine the benefits preemption and knowledge may have on the performance of parallel scheduling disciplines.

The remainder of the paper is organized as follows. In the next section, we present motivation for the types of scheduling disciplines that we chose to implement. In Sect. 3, we describe Load Sharing Facility (LSF), the commercial scheduling software on which we based our implementation. In Sects. 4 and 5, we describe an extension library we have developed to facilitate the development of multiprocessor scheduling disciplines, followed by the set of disciplines we have implemented. Finally, we present our experimental results in Sect. 6 and our conclusions in Sect. 7.

2 Background

There have been many analytic studies done on parallel-job scheduling since it was first examined in the late eighties. Much of this work has led to three basic observations.

First, the performance of a system can be significantly degraded if a job is not given exclusive use of the processors on which it is running. Otherwise, the threads of a job may have to wait for significant amounts of time at synchronization points. This can either result in large context-switch overheads or wasted processor cycles. In general, a single thread is associated with each processor, an approach which is known as coordinated or gang scheduling [Ous82, FR92]. Sometimes, however, it is possible to multiplex threads of the same job on a reduced number of processors and still achieve good performance [MZ94]. (In the latter case, it is still assumed that only threads from a single job are simultaneously active on any given processor.)
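Coordinated (gang) scheduling of this kind is commonly pictured as a matrix with one column per processor and one row per time slot; all threads of a job are placed in one row so that they always run simultaneously. The sketch below is our own illustration of that idea (the class name and the first-fit row packing are assumptions, not the paper's implementation):

```python
# Sketch of a gang-scheduling matrix (assumed packing: first fit).
# Each row is one time slot; scheduling round-robins over rows, so every job
# has all of its threads co-scheduled whenever its row is active.

class GangMatrix:
    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.rows = []  # each row: list of (job_name, width) entries

    def place(self, job_name, width):
        """Place a job needing `width` processors into the first row with room."""
        if width > self.num_processors:
            raise ValueError("job wider than machine")
        for row in self.rows:
            used = sum(w for _, w in row)
            if used + width <= self.num_processors:
                row.append((job_name, width))
                return
        self.rows.append([(job_name, width)])  # open a new time slot

    def schedule(self, time_slot):
        """Jobs whose threads run simultaneously during `time_slot`."""
        row = self.rows[time_slot % len(self.rows)]
        return [name for name, _ in row]

m = GangMatrix(8)
m.place("A", 6)
m.place("B", 4)
m.place("C", 2)       # fits beside A in row 0
print(m.schedule(0))  # → ['A', 'C']
print(m.schedule(1))  # → ['B']
```

Note how the matrix trades packing efficiency for coordination: the two free processors in row 1 stay idle rather than run a thread of another job.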

Second, jobs generally make more efficient use of the processing resources given smaller processor allocations. As a result, providing the scheduler with some flexibility in allocating processors can significantly improve overall performance [GST91, Sev94, NSS93, RSD+94]. In most systems, users specify precisely the number of processors which should be allocated to each job, a practice that is known as rigid scheduling. In adaptive scheduling disciplines, the user specifies a minimum processor allocation, usually resulting from constraints due to memory, and a maximum, corresponding to the point after which no further processors are likely to be beneficial. In some cases, it may also be necessary to specify additional constraints on the allocation, such as being a power of two. If available, specific knowledge about jobs, such as amount of work or speedup characteristics, can further aid the scheduler in allocating processors in excess of minimum allocations.

In adaptive disciplines, jobs can be allocated a large number of processors at light loads, giving them good response times. As the load increases, however, allocation sizes can be decreased so as to improve the efficiency with which the processors are utilized, hence allowing a higher load to be sustained (i.e., a higher sustainable throughput). Also, adaptive disciplines can better utilize processors than rigid ones because, with the latter, processors are often left idle due to packing inefficiencies, while adaptive disciplines can adjust allocations to make use of all available processors.

The third observation is that workloads found in practice tend to have a very high degree of variability in the amount of computational work (also known as service demand) [CMV94, FN95, Gib96]. In other words, most jobs have very small service demands but a few jobs can run for a very long time. Run-to-completion (RTC) disciplines exhibit very high response times because once a long-running job is dispatched, short jobs must wait a considerable amount of time before processors become available. Preemption can significantly reduce the mean response times of these workloads relative to run-to-completion disciplines [PS95].

Unlike the sequential case, preemption of parallel jobs can be quite expensive and complex to support. Fortunately, results indicate that preemption does not need to be invoked frequently to be useful, since only long-running jobs ever need to be preempted. In this paper, we consider three distinct types of preemption, in increasing order of implementation complexity.

Simple In simple preemption, a job may be preempted but its threads may not be migrated to another processor. This type of preemption is the easiest to support (as threads need only be stopped), and may be the only type available on message-passing systems.

Migratable In migratable preemption, a job may be preempted and its threads migrated. Normally, this type of preemption can be easily supported in shared-memory systems, but ensuring that data accessed by each thread is also migrated appropriately can be difficult. In message-passing systems, operating-system support for migration is not usually provided, but checkpointing can often be employed instead.1 For example, the Condor system provides a transparent checkpointing facility for parallel applications that use either MPI or PVM [PL96]. When a checkpoint is requested, the run-time library flushes any network communications and I/O and saves the images of each process involved in the computation to disk; when the job is restarted, the run-time library re-establishes the necessary network connections and resumes the computation from the point at which the last checkpoint was taken. As such, using checkpointing to preempt a job is similar in cost to swapping, except that all kernel resources are relinquished.

Malleable In malleable preemption, the size of a job's processor allocation may be changed after it has begun execution, a feature that normally requires explicit support within the application.2 In the process control approach, the application must be designed to adapt dynamically to changes in processor allocation while it is running [TG89, GTS91, NVZ96]. As this type of support is uncommon, a simpler strategy may be to rely on application-level checkpointing, often used by long-running jobs to tolerate system failures. For these cases, it might be possible to modify the application so as to store checkpoints in a format that is independent of allocated processors, thus allowing the job to be subsequently restarted on a different number of processors.

A representative sample of coordinated scheduling disciplines that have been previously studied is presented in Table 1, classified according to the type of preemption available and the flexibility in processor allocation (i.e., rigid versus adaptive). Adaptive disciplines are further categorized by the type of information they assume to be available, which can include service demand, speedup characteristics, and memory requirements.3 All types of preemption (simple, migratable, malleable) can be applied to all adaptive disciplines, but only simple and migratable preemption are meaningful for rigid disciplines. The disciplines proposed in this paper are highlighted in italics. (A more complete version of this table can be found elsewhere [Par97].)
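The checkpoint/restart cycle used for migratable preemption can be sketched as follows. All names and record fields here are our own illustration of the sequence the text describes (flush communication, save one image per process, relinquish resources, later restart on a possibly different processor set); this is not Condor's or LSF's API:

```python
# Sketch of checkpoint-based migratable preemption. Names are illustrative.

def checkpoint(job):
    """Quiesce the job and save one image per process."""
    job["in_flight_messages"] = 0          # flush network communication and I/O
    images = [{"proc": p, "state": job["state"]} for p in job["processors"]]
    job["running"] = False                 # all kernel resources relinquished
    return images

def restart(job, images, new_processors):
    """Restart from the saved images; migratable preemption restarts with the
    same allocation size, just possibly on different processors."""
    assert len(new_processors) == len(images), "migratable: same width"
    job["processors"] = list(new_processors)
    job["running"] = True                  # connections re-established, resume
    return job

job = {"state": "step-42", "processors": [0, 1, 2, 3], "running": True,
       "in_flight_messages": 7}
imgs = checkpoint(job)
job = restart(job, imgs, [4, 5, 6, 7])     # resumed on a different set of nodes
print(job["running"], job["processors"])   # → True [4, 5, 6, 7]
```

A malleable job would differ only in the final step: its checkpoint format would have to be processor-count independent so that `restart` could take an allocation of a different width.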
LoadLeveler is a commercial scheduling system designed primarily for the IBM SP-2 system. A recent extension to LoadLeveler that has become popular is EASY [Lif95, SCZL96]. This is a rigid RTC scheduler that uses execution-time information provided by the user to offer both greater predictability and better system utilization. When a user submits a job, the scheduler indicates immediately a time by which that job will be run; jobs that are subsequently submitted may be run before this job only if they do not delay the start of any previously-scheduled job's execution (i.e., a gap exists in the schedule containing enough processors for sufficient time).

1 Although the costs of this approach may appear to be large, we have found that significant reductions in mean response times can be achieved with minimal impact on throughput, even with large checkpointing overheads.
2 Malleable preemption is often termed dynamic partitioning in the literature, but we find it more convenient to treat it as a type of preemption.
3 Some rigid schedulers do use service-demand information if available, but this distinction is not shown in this table.

Table 1. Representative set of disciplines that have been proposed and evaluated in the literature. Disciplines presented in this paper are italicized and have the prefix "LSF-"; for the adaptive ones, a regular and a "SUBSET" version are provided.

Preemption | Rigid                      | Adaptive                 | Work   Speedup  Mem.
-----------+----------------------------+--------------------------+----------------------
RTC        | RTC [ZM90], NQS,           | PPJ [RSD+94]             | yes    min/max  no
           | LoadLeveler, EASY [Lif95], | ASP [ST93]               |
           | LSF, LSF-RTC               | A+, A+&mM [Sev89]        |
           |                            | Equal, IP [RSD+94]       |
           |                            | SDF [CMV94]              | yes
           |                            | PWS [GST91]              | yes    pws      no
           |                            | AVG, Adapt-AVG [CMV94]   | no     avg      no
           |                            | LSF-RTC-AD(SUBSET)       | either either   either
simple     | Cosched (matrix) [Ous82],  | LSF-PREEMPT-AD(SUBSET)   | either either   either
           | LSF-PREEMPT                |                          |
migratable | Cosched (other) [Ous82],   | RRJob [MVZ93]            |
           | Round-Robin [ZM90],        | FB-ASP, FB-PWS           | no     pws      no
           | LSF-MIG                    | LSF-MIG-AD(SUBSET)       | either either   either
malleable  | (not applicable)           | BUDDY, EPOCH [MZ95]      | no
           |                            | Partition [TG89, MVZ93]  | no
           |                            | FOLD, EQUI [MZ94]        |
           |                            | Equi/Dynamic W&E [BG96]  | yes
           |                            | MPA [PS96b, PS96a]       | yes    yes      no
           |                            | LSF-MALL-AD(SUBSET)      | either either   either

The disciplines that we present in this paper have been implemented as extensions to another commercial scheduling system, called Load Sharing Facility (LSF). By building on top of LSF, we found that we could make direct use of LSF for many aspects of job management, including the user interfaces for submitting and monitoring jobs, as well as the low-level mechanisms for starting, stopping, and resuming jobs. LSF runs on a large number of platforms, including the SP-2, SGI Challenge, SGI Origin, and HP Exemplar, making it an attractive vehicle for this type of scheduling research. Our work is based on LSF version 2.2a.

3 Load Sharing Facility

Although originally designed for load balancing in workstation clusters, LSF is now becoming popular for parallel job scheduling on multiprocessor systems. Of greatest relevance to this work is the batch subsystem.

Queues provide the basis for much of the control over the scheduling of jobs. Each queue is associated with a set of processors, a priority, and many other parameters not described here. By default, jobs are selected in FCFS order from the highest-priority non-empty queue and run until completion, but it is possible to configure queues so that higher-priority jobs preempt lower-priority ones (a feature that is currently available only for the sequential-job case). The priority of a job is defined by the queue to which the job has been submitted.

To illustrate the use of queues, consider a policy where shorter jobs have higher priority than longer jobs (see Fig. 1). An administrator could define several queues, each in turn corresponding to increasing service demand and having decreasing priority. If jobs are submitted to the correct queue, short jobs will be executed before long ones. Moreover, LSF can be configured to preempt lower-priority jobs if higher-priority ones arrive, giving short jobs still better responsiveness. To permit enforcement of the policy, LSF can be configured to terminate any job that exceeds the execution-time threshold defined for the queue.
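The selection behaviour of such a queue configuration can be modeled in a few lines. This is a toy model of the policy only (queue names, priorities, and run limits taken from Fig. 1; the selection loop is our own illustration, not LSF code):

```python
# Toy model of the Fig. 1 policy: FCFS within queues, highest-priority
# non-empty queue served first; each queue carries the run limit that LSF
# would use to terminate jobs exceeding the queue's threshold.

from collections import deque

QUEUES = [  # (name, priority, run_limit_minutes); None = no limit
    ("short", 10, 5),
    ("medium", 5, 60),
    ("long", 0, None),
]

pending = {name: deque() for name, _, _ in QUEUES}

def submit(job, queue_name):
    pending[queue_name].append(job)

def next_job():
    """Pick FCFS from the highest-priority non-empty queue."""
    for name, _prio, limit in sorted(QUEUES, key=lambda q: -q[1]):
        if pending[name]:
            return pending[name].popleft(), limit
    return None, None

submit("sim-run", "long")
submit("compile", "short")
job, limit = next_job()
print(job, limit)   # → compile 5  (short queue wins despite arriving later)
```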
The current version of LSF provides only limited support for parallel jobs. As part of submitting a job, a user can specify the number of processors required. When LSF finds a sufficient number of processors satisfying the resource constraints for the job, it spawns an application "master" process on one of the processors, passing to this process a list of processors. The master process can then use this list of processors to spawn a number of "slave" processes to perform the parallel computation. The slave processes are completely under the control of the master process, and as such, are not known to the LSF batch scheduling system. LSF does provide, however, a library that simplifies several distributed programming activities, such as spawning remote processes, propagating Unix signals, and managing terminal output.
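The master/slave startup sequence just described can be sketched as follows. This is a local stand-in, not LSF's remote-execution library: slaves are plain records rather than remote processes, the host names are invented, and whether the master's own processor also runs a slave is an assumption:

```python
# Illustrative stand-in for the master/slave startup sequence: the scheduler
# hands the master a processor list; the master spawns the slaves itself,
# so the slaves are invisible to the batch scheduler.

def start_master(job_name, processors):
    """Mimics LSF creating the master on the first processor in the list."""
    master = {"job": job_name, "on": processors[0], "slaves": []}
    for p in processors[1:]:   # assumed: one slave per remaining processor
        master["slaves"].append(spawn_slave(job_name, p))
    return master

def spawn_slave(job_name, processor):
    # Managed entirely by the master, not by the batch scheduling system.
    return {"job": job_name, "on": processor}

m = start_master("fft", ["hostA", "hostB", "hostC"])
print(m["on"], [s["on"] for s in m["slaves"]])   # → hostA ['hostB', 'hostC']
```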

[Figure: three queues feeding a pool of processors. Short Jobs: Priority=10, Preemptive, Run Limit=5 mins; Medium Jobs: Priority=5, Preemptive/Preemptable, Run Limit=60 mins; Long Jobs: Priority=0, Preemptable, No Run Limit.]

Fig. 1. Example of a possible sequential-job queue configuration in LSF to favour short-running jobs. Jobs submitted to the short-job queue have the highest priority, followed by medium- and long-job queues. The queues are configured to be preemptable (allowing jobs in the queue to be preempted by higher-priority jobs) and preemptive (allowing jobs in the queue to preempt lower-priority jobs). Execution-time limits associated with each queue enforce the intended policy.

4 Scheduling Extension Library

The ideal approach to developing new scheduling disciplines is one that does not require any LSF source code modifications, as this allows any existing users of LSF to experiment with the new disciplines. For this purpose, LSF provides an extensive application-programmer interface (API), allowing many aspects of job scheduling to be controlled. Our scheduling disciplines are implemented within a process distinct from LSF, and are thus called scheduling extensions.

The LSF API, however, is designed to implement LSF-related commands rather than scheduling extensions. As a result, the interfaces are very low level and can be quite complex to use. For example, to determine the accumulated run time for a job (information commonly required by a scheduler), the programmer must use a set of LSF routines to open the LSF event-logging file, process each log item in turn, and compute the time between each pair of suspend/resume events for the job. Since the event-logging file is typically several megabytes in size, requiring several seconds to process in its entirety, it is necessary to cache information whenever possible. Clearly, it is difficult for a scheduling extension to take care of such details and to obtain the information efficiently.

One of our goals was thus to design a scheduling extension library that would provide simple and efficient access to information about jobs (e.g., processors currently used by a job), as well as to manipulate the state of jobs in the system

(e.g., suspend or migrate a job). This functionality is logically divided into two components:

Job and System Information Cache (JSIC) This component serves as a cache of system and job information obtained from LSF. It also allows a discipline to associate auxiliary, discipline-specific information with processors, queues, and jobs for its own book-keeping purposes.4

LSF Interaction Layer (LIL) This component provides a generic interface to all LSF-related activities. In particular, it updates the JSIC data structures by querying the LSF batch system and translates high-level parallel-job scheduling operations (e.g., suspend job) into the appropriate LSF-specific ones.

The basic designs of all our scheduling disciplines are quite similar. Each discipline is associated with a distinct set of LSF queues, which the discipline uses to manage its own set of jobs. All LSF jobs in this set of queues are assumed to be scheduled by the corresponding scheduling discipline. Normally, one LSF queue is designated as the submit queue, and other queues are used by the scheduling discipline as a function of a job's state. For example, pending jobs may be placed in one LSF queue, stopped jobs in another, and running jobs in a third. A scheduling discipline never explicitly dispatches or manipulates the processes of a job directly; rather, it implicitly requests LSF to perform such actions by switching jobs from one LSF queue to another. Continuing the same example, a pending queue would be configured so that it accepts jobs but never dispatches them, and a running queue would be configured so that LSF immediately dispatches any job in this queue on the processors specified for the job. In this way, a user submits a job to be scheduled by a particular discipline simply by specifying the appropriate LSF queue, and can track the progress of the job using all the standard LSF utilities.

Although it is possible for a scheduling discipline to contain internal job queues and data structures, we have found that this is rarely necessary because any state information that needs to be persistent can be encoded by the queue in which each job resides. This approach greatly simplifies the re-initialization of the scheduling extension in the event that the extension fails at some point, an important property of any production scheduling system.
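The queue-encoded state machine described above can be sketched as follows. The queue names and the `Discipline` class are our own illustration of the design, not the library's actual interface:

```python
# Sketch of the design above: a discipline never touches job processes; it
# moves jobs between LSF queues, and a job's current queue *is* its state,
# which is what makes recovery after an extension failure trivial.

class Discipline:
    QUEUES = ("pending", "stopped", "running")

    def __init__(self):
        self.queue_of = {}   # job id -> queue name (persistently held by LSF)

    def submit(self, job_id):
        self.queue_of[job_id] = "pending"    # submit queue; never dispatched

    def dispatch(self, job_id):
        # Switching to the running queue implicitly asks LSF to start the job.
        self.queue_of[job_id] = "running"

    def preempt(self, job_id):
        self.queue_of[job_id] = "stopped"

    def recover(self):
        """After a failure, rebuild all state from queue residency alone."""
        return {q: sorted(j for j, qq in self.queue_of.items() if qq == q)
                for q in self.QUEUES}

d = Discipline()
d.submit(1); d.submit(2); d.dispatch(1); d.preempt(1)
print(d.recover())   # → {'pending': [2], 'stopped': [1], 'running': []}
```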
Given our design, it is possible for several scheduling disciplines to coexist within the same extension process, a feature that is most useful in reducing overheads if different disciplines are being used in different partitions of the system. (For example, one partition could be used for production workloads while another could be used to experiment with a new scheduling discipline.) Retrieving system and job information from LSF can place significant load on the master processor,5 imposing a limit on the number of extension processes that can be run concurrently.

4 In future versions of LSF, it will be possible for information associated with jobs to be saved in log files so that it will not be lost in the event that the scheduler fails.
5 LSF runs its batch scheduler on a single, centralized processor.

Since each scheduling discipline is associated with a

different set of LSF queues, the set of processors associated with each discipline can be defined by assigning processors to the corresponding queues using the LSF queue administration tools. (Normally, each discipline uses a single queue for processor information.)

The extension library described here has also been used by Gibbons in studying a number of rigid scheduling disciplines, including two variants of EASY [Lif95, SCZL96, Gib96, Gib97]. One of the goals of Gibbons' work was to determine whether historical information about a job could be exploited in scheduling. He found that, for many workloads, historical information could provide up to 75% of the benefits of having perfect information. For the purpose of his work, Gibbons added an additional component to the extension library to gather, store, and analyze historical information about jobs. He then adapted the original EASY discipline to take into account this knowledge and showed how performance could be improved. The historical database and details of the scheduling disciplines studied by Gibbons are described elsewhere [Gib96, Gib97].

The high-level organization of the scheduling extension library (not including the historical database) is shown in Fig. 2. The extension process contains the extension library and each of the disciplines configured for the system. The extension process mainline essentially sleeps until a scheduling event or a timeout (corresponding to the scheduling quantum) occurs. The mainline then prompts the LIL to update the JSIC and calls a designated method for each of the configured disciplines. Next, we describe each component of the extension library in detail.

[Figure: the extension process contains the new scheduling disciplines (Sched Disc 1, Sched Disc 2, Sched Disc 3, ...) layered over the scheduling extension library, whose JSIC, data objects, and LIL poll the LSF batch subsystem.]

Fig. 2. High-level design of the scheduling extension library. As shown, the extension library supports multiple scheduling disciplines running concurrently within the same process.
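The mainline just described can be sketched as a simple loop. All class and method names below are illustrative assumptions (the real library is C/C++ against the LSF API); the fakes merely make the control flow concrete:

```python
# Sketch of the extension mainline: sleep until a scheduling event or the
# quantum timeout, ask the LIL to refresh the JSIC, then let each configured
# discipline act. Names are illustrative, not the library's interface.

QUANTUM_SECONDS = 5   # decisions are made at most once every five seconds

def mainline(lil, jsic, disciplines, wait_for_event, rounds):
    trace = []
    for _ in range(rounds):
        reason = wait_for_event(timeout=QUANTUM_SECONDS)  # "event" or "timeout"
        lil.update(jsic)                 # LIL refreshes the JSIC from LSF
        for d in disciplines:
            d.schedule(jsic)             # designated per-discipline method
        trace.append(reason)
    return trace

class FakeLIL:
    def __init__(self): self.updates = 0
    def update(self, jsic): self.updates += 1

class FakeDiscipline:
    def __init__(self): self.calls = 0
    def schedule(self, jsic): self.calls += 1

lil, disc = FakeLIL(), FakeDiscipline()
events = iter(["event", "timeout", "event"])
trace = mainline(lil, {}, [disc], lambda timeout: next(events), rounds=3)
print(trace, lil.updates, disc.calls)   # → ['event', 'timeout', 'event'] 3 3
```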

4.1 Job and System Information Cache

The Job and System Information Cache (JSIC) contains all the information about jobs, queues, and processors that is relevant to the scheduling disciplines that are part of the extension. Our data structures were designed taking into consideration the types of operations that we found to be most critical to the design of our scheduling disciplines:

- A scheduler must be able to scan sequentially through the jobs associated with a particular LSF queue. For each job, it must then be able to access in a simple manner any job-related information obtained from LSF (e.g., run times, processors on which a job is running, LSF job state).
- It must be able to scan the processors associated with any LSF queue and determine the state of each one of these (e.g., available or unavailable).
- Finally, a scheduler must be able to associate book-keeping information with either jobs or processors (e.g., the set of jobs running on a given processor).

In our library, information about each active job is stored in a JobInfo object. Pointers to instances of these objects are stored in a job hash table keyed by LSF job identifiers (jobId), allowing efficient lookup of individual jobs. Also, a list of job identifiers is maintained for each queue, permitting efficient scanning of jobs in any given queue (in the order submitted to LSF).

The information associated with a job is global, in that a single JobInfo object instance exists for each job. For processors, on the other hand, we found it convenient (for experimental reasons) to have distinct processor information objects associated with each queue. (Using a global approach similar to that for jobs would also be suitable if it is guaranteed that a processor is never associated with more than one discipline within an extension, but this was not necessarily the case on our system.) Similar to jobs, processors associated with a queue can be scanned sequentially, or can be accessed through a hash table keyed on the processor name. For each, the state of the processor and a list of jobs running on the processor can be obtained.

4.2 LSF Interaction Layer (LIL)

The most significant function of the LSF interaction layer is to update the JSIC data structures to reflect the current state of the system when prompted. Since LSF only supports a polling interface, however, the LIL must, for each update request, fetch all data from LSF and compare it to that which is currently stored in the JSIC. As part of this update, the LIL must also process an event-logging file, since certain types of information (e.g., total time spent pending, suspended, and running) are not provided directly by LSF. As such, the JSIC update code represents a large fraction of the total extension library code. (The extension library is approximately 1.5 KLOC.)
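The run-time accounting that the LIL derives from the event log can be sketched as follows: accumulated run time is the sum of the intervals between each dispatch/resume event and the matching suspend/finish event. This is a minimal model of that bookkeeping, with an invented event-record format, not the library's code:

```python
# Minimal model of deriving a job's accumulated run time from an event log:
# sum the intervals between each start/resume and the matching suspend/finish.
# Event names and the (timestamp, kind) log format are illustrative.

def accumulated_run_time(events, now):
    """events: time-ordered list of (timestamp, kind), with kind in
    {"dispatch", "suspend", "resume", "finish"}."""
    total, running_since = 0, None
    for t, kind in events:
        if kind in ("dispatch", "resume"):
            running_since = t
        elif kind in ("suspend", "finish") and running_since is not None:
            total += t - running_since
            running_since = None
    if running_since is not None:        # still running: count up to `now`
        total += now - running_since
    return total

log = [(100, "dispatch"), (160, "suspend"), (200, "resume")]
print(accumulated_run_time(log, now=230))   # → 90 (60s + 30s since resume)
```

Since the real log holds every job's events and is megabytes long, the library processes only events newer than the last update and caches the running totals, as described above.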

To update the JSIC, the LIL performs the following three actions:

- It obtains the list of all active jobs in the system from LSF. Each job record returned by LSF contains some static information, such as the submit time, start time, and resource requirements, as well as some dynamic information, such as the job status (e.g., running, stopped), processor set, and queue. All this information about each job is recorded in the JSIC.
- It opens the event-logging file, reads any new events that have occurred since the last update, and re-computes the pending time, aggregate processor run time, and wall-clock run time for each job. As well, aggregate processor and wall-clock run times since the job was last resumed (termed residual run times) are computed.
- It obtains the list of processors associated with each queue and queries LSF for the status of each of these processors.

LSF provides a mechanism by which the resources, such as physical memory, licenses, or swap space, required by a job can be specified upon submission. In our extensions, we do not use the default set of resources, to avoid having LSF make any scheduling decisions, but rather add a new set of pseudo-resources that are used to pass parameters or information about a job, such as minimum and maximum processor allocations or service demand, directly to the scheduling extension. As part of the first action performed by the LIL update routine, this information is extracted from the pseudo-resource specifications and stored in the JobInfo structure.

The remaining LIL functions, illustrated in Table 2, basically translate high-level scheduling operations into low-level LSF calls.

Table 2. High-level scheduling functions provided by the LSF Interaction Layer.

Operation      Description
setprocessors  This operation defines the list of processors to be allocated to a job. LSF dispatches the job by creating a master process on the first processor in the list; as described before, the master process uses the list to spawn its slave processes.
switch         This operation moves a job from one queue to another.
suspend        This operation suspends a job. The processes of the job hold onto virtual resources they possess, but normally release any physical resources (e.g., physical memory).
resume         This operation resumes a job that has previously been suspended.
migrate        This operation initiates the migration procedure for a job. It does not actually migrate the job, but rather places the job in a pending state, allowing it to be subsequently restarted on a different set of processors.

Preemption Considerations The LSF interaction layer makes certain assumptions about the way in which jobs can be preempted. For simple preemption, a job can be suspended by sending it a SIGTSTP signal, which is delivered

to the master process; this process must then propagate the signal to its slaves (which is automated in the distributed programming library provided by LSF) to ensure that all processes belonging to the job are stopped. Similarly, a job can be resumed by sending it a SIGCONT signal.

In contrast, we assume that migratable and malleable preemption are implemented via a checkpointing facility, as described in Sect. 2. As a result, preempted jobs do not occupy any kernel resources, allowing any number of jobs to be in this state (assuming disk space for checkpointing is abundant).

To identify migratable jobs, we set an LSF flag in the submission request indicating that the job is re-runnable. To migrate such a job, we first send it a checkpoint signal (in our case, the SIGUSR2 signal), and then send LSF a migrate request for the job. This would normally cause LSF to terminate the job (with a SIGTERM signal) and restart it on the set of processors specified (using the setprocessors interface). In most cases, however, we switch such a job to a queue that has been configured to not dispatch jobs prior to submitting the migration request, causing the job to be simply terminated and requeued as a pending job.

The interface for changing the processor allocation of a malleable job is identical to that for migrating a job, the only difference being the way it is used. In the migratable case, the scheduling discipline always restarts a job using the same number of processors as in the initial allocation, while in the malleable case, any number of processors can be specified.

4.3 A Simple Example

To illustrate how the extension library can be used to implement a discipline, consider a sequential-job, multi-level feedback discipline that degrades the priority of jobs as they acquire processing time. If the workload has a high degree of variability in service demands, as is typically the case even for batch sequential workloads, this approach will greatly improve response times without requiring users to specify the service demands of jobs in advance. For this discipline, we can use the same queue configuration as shown in Fig. 1; we eliminate the run-time limits, however, as the scheduling discipline will automatically move jobs from higher-priority queues to lower-priority ones as they acquire processing time.

Users initially submit their jobs to the high-priority queue (labeled Short Jobs in Fig. 1); when a job has acquired a certain amount of processing time, the scheduling extension switches the job to the medium-priority queue, and after some more processing time, to the low-priority queue. In this way, the extension relies on the LSF batch system to dispatch, suspend, and resume jobs as a function of the jobs in each queue. Users can track the progress of jobs simply by examining the jobs in each of the three queues.

5 Parallel-Job Scheduling Disciplines

We now turn our attention to the parallel-job scheduling disciplines that we have implemented as LSF extensions. Important to the design of these disciplines are

the costs associated with using LSF on our platform. It can take up to thirty seconds to dispatch a job once it is ready to run. Migratable or malleable preemption typically requires more than a minute to release the processors associated with a job; these processors are considered to be unavailable during this time. Finally, scheduling decisions are made at most once every five seconds to keep the load on the master (scheduling) processor at an acceptable level.

The disciplines described in this section all share a common job queue configuration. A pending queue is defined and configured to allow jobs to be submitted (i.e., open) but preventing any of these jobs from being dispatched automatically by LSF (i.e., inactive). A second queue, called the run queue, is used by the scheduler to start jobs. This queue is open, active, and possesses absolutely no load constraints. A scheduling extension uses this queue by first specifying the processors associated with a job (i.e., setprocessors) and then moving the job to this queue; given the queue configuration, LSF immediately dispatches jobs in this queue. Finally, a third queue, called the stopped queue, is defined to assist in migrating jobs. It too is configured to be open but inactive. When LSF is prompted to migrate a job in this queue, it terminates and requeues the job, preserving its job identifier. In all our disciplines, preempted jobs are left in this queue to distinguish them from jobs that have not had a chance to run yet (in the pending queue).

Each job in our system is associated with a minimum, desired, and maximum processor allocation, the desired value lying between the minimum and maximum. Rigid disciplines use the desired value while adaptive disciplines are free to choose any allocation between the minimum and the maximum values.
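The allocation rule above can be stated compactly in code. The record layout and the two chooser functions are our own illustration of the rule, not the extension's implementation (in particular, how an adaptive discipline splits free processors among competing jobs is deliberately simplified to a single job):

```python
# Sketch of the allocation rule: rigid disciplines use the job's desired
# allocation; adaptive ones may pick anything in [minimum, maximum].

def make_job(minimum, desired, maximum):
    assert minimum <= desired <= maximum
    return {"min": minimum, "desired": desired, "max": maximum}

def rigid_allocation(job, free):
    """Rigid: exactly the desired size, or wait (None) if it doesn't fit."""
    return job["desired"] if free >= job["desired"] else None

def adaptive_allocation(job, free):
    """Adaptive: any size in [min, max]; here, as much as is free."""
    if free < job["min"]:
        return None
    return min(free, job["max"])

job = make_job(minimum=4, desired=16, maximum=32)
print(rigid_allocation(job, free=12))     # → None (16 > 12 free, so it waits)
print(adaptive_allocation(job, free=12))  # → 12  (shrinks to what is free)
```

This is exactly the gap the adaptive disciplines exploit: the rigid job leaves 12 processors idle, while the adaptive one runs immediately at a smaller, more efficient allocation.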
If provided to the scheduler, service-demand information is specified in terms of the amount of computation required on a single processor, and speedup characteristics are specified in terms of the fraction of work that is sequential. Basically, service-demand information is used to run jobs having the least remaining processing time (to minimize mean response times), and speedup information is used to favour efficient jobs in allocating processors. Since jobs can vary considerably in terms of their speedup characteristics, computing the remaining processing time will only be accurate if speedup information is available.

5.1 Run-to-Completion Disciplines

Next, we describe the run-to-completion disciplines. All three variants listed in Table 1 (i.e., LSF-RTC, LSF-RTC-AD, and LSF-RTC-ADSUBSET) are quite similar and, as such, are implemented in a single module of the scheduling extension. The LSF-RTC discipline is defined as follows:

LSF-RTC Whenever a job arrives or departs, the scheduler repeatedly scans the pending queue until it finds the first job for which enough processors are available. It assigns processors to the job and switches the job to the run queue.

The LSF system, and hence the JSIC, maintains jobs in order of arrival, so the default RTC discipline is FCFS (skipping any jobs at the head of the queue for which not enough processors are available). If service-demand information is provided to the scheduler, then jobs are scanned in order of increasing service demand, resulting in a shortest processing time (SPT) discipline (again with skipping).

The LSF-RTC-AD discipline is very similar to the ASP discipline proposed by Setia et al. [ST93], except that jobs are selected for execution differently, because the LSF-based disciplines take into account memory requirements of jobs (and hence cannot be called ASP).

LSF-RTC-AD Whenever a job arrives or departs, the scheduler scans the pending queue, selecting the first job for which enough processors remain to satisfy the job's minimum processor requirements. When no more jobs fit, leftover processors are used to equalize processor allocations among selected jobs (i.e., giving processors to jobs having the smallest allocation). The scheduler then assigns processors to the selected jobs and switches these jobs to the run queue.

If speedup information is available, the scheduler allocates each leftover processor, in turn, to the job whose efficiency will be highest after the allocation. This approach minimizes both the processor and memory occupancy in a distributed-memory environment, leading to the highest possible sustainable throughput [PS96a].

The SUBSET variant seeks to improve the efficiency with which processors are utilized by applying an algorithm known as a subset-sum algorithm [MT90]. The basic principle is to try to minimize the number of processors allocated to jobs in excess of each job's minimum processor allocation (termed surplus processors). Since we assume that a job utilizes processors more efficiently as its allocation size decreases (down to the minimum allocation size), then this
principle allows the system to run at a higher overall efficiency.

LSF-RTC-ADSUBSET Let L be the number of jobs in the system and N be the number of jobs selected by the first-fit algorithm used in LSF-RTC-AD. The scheduler only commits to running the first N' of these jobs, where

    N' = N * max(1 - L/(βN), 0)

(β is a tunable parameter that determines how aggressively the scheduler seeks to minimize surplus processors as the load increases; for our experiments, we chose β = 5.) Using any leftover processors and leftover jobs, the scheduler applies the subset-sum algorithm to select the set of jobs that minimizes the number of surplus processors. The jobs chosen by the subset-sum algorithm are added to the list of jobs selected to run, and any surplus processors are allocated as in LSF-RTC-AD.

Simple Preemptive Disciplines In simple preemptive disciplines, jobs may be suspended but their processes may not be migrated. Since the resources used by

Simple Preemptive Disciplines. In simple preemptive disciplines, jobs may be suspended but their processes may not be migrated. Since the resources used by jobs are not released when they are in a preempted state, however, one must be careful not to over-commit system resources. In our disciplines, this is achieved by ensuring that no more than a certain number of processes ever exist on any given processor. In a more sophisticated implementation, we might instead ensure that the swap space associated with each processor is never overcommitted.

The two variants of the preemptive disciplines are quite different. In the rigid discipline, we allow a job to preempt another only if it possesses the same desired processor allocation. This minimizes the possibility of packing losses that might occur if jobs were not aligned in this way.6 In the adaptive discipline, we found this approach to be problematic. Consider a long-running job, either arriving during an idle period or having a large minimum processor requirement, that is dispatched by the scheduler. Any subsequent jobs preempting this first one would be configured for a large allocation size, causing them, and hence the entire system, to run inefficiently. As a result, we do not attempt to reduce packing losses with the adaptive, simple preemptive discipline.

LSF-PREEMPT: Whenever a job arrives or departs or when a quantum expires, the scheduler re-evaluates the selection of jobs currently running. Available processors are first allocated in the same way as in LSF-RTC. Then, the scheduler determines if any running job should be preempted by a pending or stopped job, according to the following criteria:

1. A stopped job can only preempt a job running on the same set of processors as those for which it is configured. A pending job can preempt any running job that has the same desired processor allocation value.
2. If no service-demand information is available, the aggregate cumulative processor time of the pending or stopped job must be some fraction less than that of the running job (in our case, we use the value of 50%); otherwise, the service demand of the preempting job must be a (different) fraction less than that of the running job (in our case, we use the value of 10%).
3. The running job must have been running for at least a certain specified amount of time (one minute in our case, since suspension and resumption only consist of sending a Unix signal to all processes of the job).
4. The number of processes present on any processor cannot exceed a pre-specified number (in our case, five processes).
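The four criteria can be expressed as a single predicate. The thresholds come from the text above; the job and per-processor load records are our own hypothetical structures, not LSF's.

```python
from collections import namedtuple

# Hypothetical records; LSF's real data structures differ.
Job = namedtuple("Job", "stopped procs desired demand cpu_time started")

TIME_FRACTION = 0.50      # criterion 2, without service-demand knowledge
DEMAND_FRACTION = 0.10    # criterion 2, with service-demand knowledge
MIN_RUN_SECONDS = 60      # criterion 3
MAX_PROCS_PER_CPU = 5     # criterion 4

def may_preempt(cand, running, now, load):
    """Check whether a pending or stopped job `cand` may preempt
    `running` under the four LSF-PREEMPT criteria."""
    # 1. A stopped job must reuse its own processors; a pending job
    #    must match the running job's desired allocation.
    if cand.stopped:
        if cand.procs != running.procs:
            return False
    elif cand.desired != running.desired:
        return False
    # 2. The preemptor must be "smaller": by cumulative processor time
    #    if service demand is unknown, by service demand otherwise.
    if cand.demand is None or running.demand is None:
        if cand.cpu_time > TIME_FRACTION * running.cpu_time:
            return False
    elif cand.demand > DEMAND_FRACTION * running.demand:
        return False
    # 3. The running job must have run for at least one minute.
    if now - running.started < MIN_RUN_SECONDS:
        return False
    # 4. No processor may exceed five resident processes.
    return all(load.get(p, 0) < MAX_PROCS_PER_CPU for p in running.procs)
```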
If several jobs can preempt a given running job, the one which has the least acquired aggregate processing time is chosen first if no service-demand knowledge is available, or the one with the shortest remaining service demand if service-demand knowledge is available.

Our adaptive, simple preemptive discipline uses a matrix approach to scheduling jobs, where each row of the matrix represents a different set of jobs to run and the columns represent the processors in the system.

6 Packing losses occur when processors are left idle, either because there is an insufficient number to meet the minimum processor requirements of pending jobs or because only some of the processors required by stopped jobs are available.
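Placement into such an Ousterhout-style matrix — first row with enough free columns, else a new row — can be sketched as follows (the representation, with `None` marking a free slot, is our own):

```python
def place(matrix, n_procs, job, need):
    """Put `job`, needing `need` processors, into the first row of the
    matrix with enough free columns (Ousterhout co-scheduling);
    open a new row (time slice) if no row fits.  Returns the row used."""
    for r, row in enumerate(matrix):
        free = [c for c in range(n_procs) if row[c] is None]
        if len(free) >= need:
            for c in free[:need]:
                row[c] = job
            return r
    matrix.append([job] * need + [None] * (n_procs - need))
    return len(matrix) - 1
```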

In Ousterhout's co-scheduling discipline, an incoming job is placed in the first row of the matrix that has enough free processors for the job; if no such row exists, then a new one is created. In our disciplines, we use a more dynamic approach.

LSF-PREEMPT-AD: Whenever the scheduler is awakened (due either to an arrival or departure or to a quantum expiry), the set of jobs currently running or stopped (i.e., preempted) is organized into the matrix just described, using the first row for those jobs that are running. Each row is then examined in turn. For each, the scheduler populates the uncommitted processors with the best pending, stopped, or running jobs. (If service-demand information is available, currently-stopped or running jobs may be preferable to a pending job; these jobs can switch rows if all processors being used by the job are uncommitted in the row currently being examined.) The scheduler also ensures that jobs that are currently running, but which have run for less than the minimum time since last being started or resumed, continue to run. If such jobs cannot be accommodated in the row being examined, then the scheduler skips to the next row.

Once the set of jobs that might be run in each row has been determined, the scheduler chooses the row containing the job having the least acquired processing time or, if service-demand information is available, the job having the shortest remaining service demand. Processors in the selected row available for pending jobs are distributed as before (i.e., equi-allocation if no speedup knowledge is available, or favouring efficient jobs if it is).

Migratable and Malleable Preemptive Disciplines. In contrast to the simple preemptive disciplines, the migratable and malleable ones assume that a job can be checkpointed and restarted at a later point in time. The primary difference between the two types is that, in the migratable case, jobs are always resumed with the same number of processors allocated when the job first started, whereas in the malleable case, a job can be restarted with a different number of processors.
LSF-MIG: Whenever a job arrives or departs or when a quantum expires, the scheduler re-evaluates the selection of jobs currently running. First, currently-running jobs which have not run for at least a certain configurable amount of time (in our case, ten minutes, since migration and processor reconfiguration are relatively expensive) are allowed to continue running. Processors not used by these jobs are considered to be available for reassignment. The scheduler then uses a first-fit algorithm to select the jobs from those remaining to run next, using a job's desired processor allocation. As before, if service-demand information is available, jobs are selected in order of least remaining service demand.
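The LSF-MIG re-evaluation can be sketched as follows; the field names and in-memory job list are our own assumptions (the real implementation drives LSF queues):

```python
from collections import namedtuple

# Hypothetical record: procs = desired allocation, remaining = known
# remaining service demand (None when no such knowledge is available).
Job = namedtuple("Job", "name running resumed procs remaining")

MIN_RUN_SECONDS = 600   # ten minutes: migration is relatively expensive

def reselect(jobs, total_procs, now):
    """LSF-MIG sketch: recently started or resumed jobs keep running;
    the rest compete again in a first-fit scan, ordered by least
    remaining service demand when that knowledge is available."""
    keep = [j for j in jobs
            if j.running and now - j.resumed < MIN_RUN_SECONDS]
    free = total_procs - sum(j.procs for j in keep)
    rest = [j for j in jobs if j not in keep]
    if all(j.remaining is not None for j in rest):
        rest.sort(key=lambda j: j.remaining)       # SRPT ordering
    for j in rest:                                 # first-fit scan
        if j.procs <= free:
            keep.append(j)
            free -= j.procs
    return keep
```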

LSF-MIG-AD and LSF-MALL-AD: Apart from their adaptiveness, these two disciplines are very similar to the LSF-MIG discipline. In the malleable version, the scheduler uses the same first-fit algorithm as in LSF-MIG to select jobs, except that it always uses a job's minimum processor allocation to determine if a job fits. Any leftover processors are then allocated as before, using an equi-allocation approach if no speedup information is available, and favouring efficient jobs otherwise. In the migratable version, the scheduler uses the size of a job's current processor allocation instead of its minimum if the job has already run (i.e., has been preempted) in the first-fit algorithm, and does not change the size of such a job's processor allocation if it is selected to run.

Similar to the run-to-completion case, SUBSET variants of the adaptive disciplines have also been implemented.

6 Performance Results

The evaluation of the disciplines described in the previous section is primarily qualitative in nature. There are two reasons for this. First, experiments must be performed in real time rather than in simulated time, requiring a considerable amount of time to execute a relatively small number of jobs. Moreover, failures that can (and do) occur during the experiments can significantly influence the results, although such failures can be tolerated by the disciplines. Second, we intend our implementations to demonstrate the practicality of a discipline and to observe its performance in a real context, rather than to analyze its performance under a wide variety of conditions (for which a simulation would be more suitable).

The experimental platform for the implementation is a network of workstations (NOW), consisting of sixteen IBM 43P (133 MHz, PowerPC 604) systems, connected by three independent networks (155 Mbps ATM, 100 Mbps Ethernet, 10 Mbps Ethernet).

To exercise the scheduling software, we use a parameterizable synthetic application designed to represent real applications. The basic reason for using a synthetic application is that it could be designed to not use any processing resources, yet behave in other respects (e.g., execution time, preemption) as a real parallel application.
This is important in the context of our network of workstations, because the system is being actively used by a number of other researchers. Using real (compute-intensive) applications would have prevented the system from being used by others during the tests, or would have caused the tests to be inconclusive if jobs were run at low priority.

Each of our scheduling disciplines ensures that only a single one of its jobs is ever running on a given processor and that all processes associated with the job are running simultaneously. As such, the behaviour of our disciplines, when used in conjunction with our synthetic application, is identical to that of a dedicated system running compute-intensive applications. In fact, by associating a different set of queues with each discipline, each one configured to use all processors, it was possible to conduct several experiments concurrently. (The jobs submitted to each submit queue for the different disciplines were generated independently.)

The synthetic application possesses three important features. First, it can be easily parameterized with respect to speedup and service demand, allowing it to model a wide range of real applications. Second, it supports adaptive processor allocations using the standard mechanism provided by LSF. Finally, it can be checkpointed and restarted, to model both migratable and malleable jobs.

An experiment consists of submitting a sequence of jobs to the scheduler according to a Poisson arrival process, using an arrival rate that reflects a moderately-heavy load. A small initial number of these jobs (e.g., 200) are tagged for mean response time and makespan measurements. (The makespan is the maximum completion time of any job in the set of jobs under consideration, assuming that the first job arrives at time zero.) Each experiment terminates only when all jobs in this initial set have left the system. To make the experiment more representative of large systems, we assume that each processor corresponds to eight processors in reality. Thus, all processor allocations are multiples of eight, and the minimum allocation is eight processors. Scaling the number of processors in this way affects the synthetic application in determining the amount of time it should execute, and the scheduling disciplines in determining the expected remaining service demand for a job.

6.1 Workload Model

Service demands for jobs are drawn from a hyper-exponential distribution, with a mean of 8000 seconds (2.2 hours) and coefficient of variation (CV) of 4, a distribution whose median is 2985 seconds.7 The parameters are consistent with measurements made over the past year at the Cornell Theory Center (scaled to 128 processors) [Hot96b, Hot96a]. The most significant difference is that the mean is about a quarter of that actually observed, which should not unduly affect results, as it only magnifies scheduling overheads. (Recall that in the migratable and malleable preemption cases, we only preempt a job if it has run at least 10 minutes, since preemption requires at least one minute.) All disciplines received exactly the same sequence of jobs in any particular experiment, and in general, individual experiments required anywhere from 24 to 48 hours to complete.
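The service-demand distribution can be reproduced with a standard two-branch hyperexponential. The balanced-means fit below is our assumption — the paper does not state which fit was used — but it matches the quoted mean and CV exactly:

```python
import math
import random

MEAN, CV = 8000.0, 4.0

# Balanced-means fit: each branch contributes half of the mean.
p = 0.5 * (1.0 + math.sqrt((CV**2 - 1.0) / (CV**2 + 1.0)))
mean_fast = MEAN / (2.0 * p)          # short-job branch
mean_slow = MEAN / (2.0 * (1.0 - p))  # long-job branch

def service_demand(rng=random):
    """Draw one service demand (seconds) from the hyperexponential."""
    m = mean_fast if rng.random() < p else mean_slow
    return rng.expovariate(1.0 / m)

# Analytic first two moments of this fit
# (an exponential with mean m has E[X^2] = 2 m^2).
m1 = p * mean_fast + (1.0 - p) * mean_slow
m2 = 2.0 * p * mean_fast**2 + 2.0 * (1.0 - p) * mean_slow**2
cv = math.sqrt(m2 - m1**2) / m1
```

With these parameters, the fit's median also comes out very close to the quoted 2985 seconds, which suggests a balanced-means hyperexponential is close to what was actually used.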
Minimum processor allocation sizes are uniformly chosen from one to sixteen processors, and maximum sizes are set at sixteen.8 This distribution is similar to those used in previous studies in this area [PS96a, MZ95, Set95]. The processor allocation size used for rigid disciplines is chosen from a uniform distribution between the minimum and the maximum processor allocations for the job.

It has been shown previously that the performance benefits of knowing speedup information can only be obtained if a large fraction of the total work in the workload has good speedup and, moreover, if larger-sized jobs tend to have better speedup than smaller-sized ones [PS96a].

7 The 25%, 50%, and 75% quantiles are 1230, 2985, and 6100 seconds, respectively.
8 Note that maximum processor allocation information is only useful at lighter loads, since at heavy loads jobs seldom receive many more processors than their minimum allocation.
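The figure of a speedup of 114 on 128 processors for 99.9% parallelizable work, quoted for the good-speedup class below, is consistent with Amdahl's law; a quick check (our own arithmetic, not from the paper):

```python
def amdahl(parallel_fraction, procs):
    """Amdahl's-law speedup for work with the given parallelizable
    fraction on `procs` processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / procs)

# 99.9% perfectly parallelizable work on 128 processors.
good_128 = amdahl(0.999, 128)   # about 113.6, i.e. the quoted 114
```

The poor-speedup class (6.4 on 8 processors, 9.3 on 128) does not fit a single Amdahl fraction, so the synthetic application presumably uses a more general speedup profile for that class.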

As such, we let 75% of the jobs have good speedup, where 99.9% of the work is perfectly parallelizable (corresponding to a speedup of 114 on 128 processors). Poor-speedup jobs have a speedup of 6.4 on 8 processors and a speedup of 9.3 on 128 processors.9

6.2 Results and Lessons Learned

The performance results of all disciplines under the four knowledge cases (no knowledge, service-demand knowledge, speedup knowledge, or both) are given in Table 3 and summarized in Figs. 3 and 4. As can be seen, the response times for the run-to-completion disciplines are much higher (by up to an order of magnitude) than for the migratable or malleable preemptive disciplines. The simple preemptive, rigid discipline does not offer any advantages over the corresponding run-to-completion version. The reason is that there is insufficient flexibility in allowing a job to preempt only another that has the same desired processor requirement. The adaptive preemptive discipline is considerably better in this regard.

Adaptability appears to have the most positive effect for the run-to-completion and malleable disciplines (see Fig. 4). In the former case, makespans decreased by nearly 50% from the rigid to the adaptive variant using the subset-sum algorithm. To achieve this improvement, however, the mean response times generally increased, because processor allocations tended to be smaller (leading to longer average run times). In the malleable case, adaptability resulted in smaller but noticeable decreases in makespans (5-10%). It should be noted that the opportunity for improvement is much lower than in the RTC case, because the minimum makespan is 65412 seconds for this experiment (compared to actual observed makespans of approximately 78000 seconds).
Service-demand and speedup knowledge appeared to be most effective when either the mean response time (for the former) or the makespan (for the latter) were large, but may not be as significant as one might expect. Service-demand knowledge had limited benefit in the run-to-completion disciplines, because the high response times result from long-running jobs being activated, which the scheduler must do at some point. In the migratable and malleable preemptive disciplines, the multilevel feedback approach achieved the majority of the benefits of having service-demand information. Highlighting this difference, we often found queue lengths for run-to-completion disciplines to grow as high as 60 jobs, while for migratable or malleable disciplines they were rarely larger than five.

Given our workload, we found speedup knowledge to be of limited benefit, because poor-speedup jobs can rarely run efficiently. (To utilize processors efficiently, such a job must have a low minimum processor requirement, and must be started at the same time as a high-efficiency job; even in the best case, the maximum efficiency of a poor-speedup job will only be 58% given a minimum processor allocation of eight after scaling.) From the results, one can observe that …

9 Such a two-speedup-class workload appears to be supported by data from the Cornell Theory Center if we examine the amount of CPU time consumed by each job relative to its elapsed time [Par97].

Table 3. Performance of LSF-based scheduling disciplines: mean response time (MRT) and makespan for each discipline under the No Knowledge, Service-Demand, Speedup, and Both cases. In some trials, the discipline did not terminate within a reasonable amount of time; in these cases, a minimum bound on the mean response times is reported (indicated by a >) and the number of unfinished jobs is given in parenthesis. [The numeric body of the table did not survive extraction; among the legible fragments are bounds such as an MRT of >1342 with a makespan of >192031 and one unfinished job.]

Fig. 3. Observed mean response times for each discipline.

Fig. 4. Observed makespans for each discipline.

…service-demand knowledge can sometimes negate the benefits of having speedup knowledge, as jobs having the least remaining service demand (rather than least acquired processing time) are given higher priority.

Fig. 5. Effects of highly variable service demands on the ability of a run-to-completion scheduler to activate jobs having large minimum processor requirements. Because of the long-running jobs, the system rarely reaches a state in which all processors are available, which is necessary to schedule a job having a large minimum processor requirement.

While performing our experiments, we monitored the behaviour of each of our schedulers in order to further understand the performance results. Our observations can be summarized as follows:

- Jobs having large minimum processor requirements can often experience significant delays in run-to-completion disciplines. Since service demands have a high degree of variability, there is often at least one job running having a large service demand, making it difficult to ever schedule a job having a large minimum processor requirement. This behaviour is illustrated in Fig. 5. Even at light loads, it is quite likely for some processors to be occupied, preventing the dispatching of a job having a large processor requirement. Even the use of the SUBSET variant of the RTC disciplines cannot counteract this effect, because the scheduler still requires all processors to be available at the time it makes its scheduling decision.

- Adaptive run-to-completion disciplines can lead to more variable makespans. In a 200-job workload, the makespan is dictated essentially by the long-running jobs in the system (e.g., in one of our experiments, one job had a sequential service demand of 265000 seconds, or almost 74 hours). The makespan of a rigid discipline will be relatively predictable, because the execution time of these long jobs is set in advance. In the adaptive case, a scheduler may allocate such jobs a small number of processors, which is good from an efficiency standpoint, but can lead to much longer makespans.
Also, if long jobs are allocated few processors, which tends to occur in most adaptive disciplines as the load increases, these long jobs will occupy processors for longer periods of time (relative to the rigid case). This can make it even more difficult for jobs with large minimum processor requirements to ever find enough available processors.

The conclusion is that run-to-completion disciplines are even more problematic than originally indicated. It has previously been shown how high variability in service demands can lead to poor response times if memory is abundant; these observations show that highly variable service demands can also lead to starvation for jobs having large minimum processor requirements.

- Migratable disciplines can significantly reduce response times relative to RTC ones. However, adaptive versions of migratable disciplines can exhibit unpredictable completion times for long-running jobs, as the scheduler must commit to an allocation when a job is first activated. In some cases, the scheduler allocates a small number of processors to long-running jobs, only to have other processors subsequently become available. In a production environment, this may encourage users submitting high service-demand jobs to specify a large minimum processor allocation simply to ensure that their jobs complete within a more desirable amount of time, but having a negative effect on the sustainable throughput. In other cases, long-running jobs were allocated a large number of processors, leading to potential starvation problems. (This was the cause of the large makespans in the full-knowledge LSF-MIG-AD and LSF-MIG-ADSUBSET experiments.) In order to resume such a job once stopped, the scheduler must be capable of preempting a sufficient number of running jobs to satisfy the stopped job's processor requirement. This can be difficult at high loads, where jobs with small processor allocations are continuously being started, suspended, and resumed, since we only preempt jobs that have run at least ten minutes. In a real workload, we believe this problem will become less important as the ratio of the migration overhead to the mean service demand becomes smaller.

- From a user's perspective, malleable disciplines are most attractive.
During periods of heavy load, the system allocates jobs a small number of processors, and as the load becomes lighter, long-running jobs receive more processors. Unused processors arising from imperfect packing are never a problem, allowing a high level of utilization to be achieved. Also, jobs rarely experience starvation, because the scheduler does not commit itself to a processor allocation upon activating a job for the first time. As a result, adaptive malleable disciplines consistently performed best, and have the highest potential for low response times and high throughputs (even given a 10% re-allocation overhead).

7 Conclusions

In this paper, we presented the design of parallel-job scheduling implementations based on Platform Computing's Load Sharing Facility (LSF). We consider a wide range of disciplines, from run-to-completion to malleable preemptive ones, each with varying degrees of knowledge of job characteristics. Although these disciplines were implemented on a network of workstations, they can be used on any distributed-memory multiprocessor system supporting LSF.


<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended

More information

Announcements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading

Announcements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading Announcements Reading Chapter 5 Chapter 7 (Monday or Wednesday) Basic Concepts CPU I/O burst cycle Process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution What are the

More information

Linux Process Scheduling Policy

Linux Process Scheduling Policy Lecture Overview Introduction to Linux process scheduling Policy versus algorithm Linux overall process scheduling objectives Timesharing Dynamic priority Favor I/O-bound process Linux scheduling algorithm

More information

A High Performance Computing Scheduling and Resource Management Primer

A High Performance Computing Scheduling and Resource Management Primer LLNL-TR-652476 A High Performance Computing Scheduling and Resource Management Primer D. H. Ahn, J. E. Garlick, M. A. Grondona, D. A. Lipari, R. R. Springmeyer March 31, 2014 Disclaimer This document was

More information

CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS

CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS CPU SCHEDULING CPU SCHEDULING (CONT D) Aims to assign processes to be executed by the CPU in a way that meets system objectives such as response time, throughput, and processor efficiency Broken down into

More information

Automatic load balancing and transparent process migration

Automatic load balancing and transparent process migration Automatic load balancing and transparent process migration Roberto Innocente rinnocente@hotmail.com November 24,2000 Download postscript from : mosix.ps or gzipped postscript from: mosix.ps.gz Nov 24,2000

More information

Design and Implementation of Distributed Process Execution Environment

Design and Implementation of Distributed Process Execution Environment Design and Implementation of Distributed Process Execution Environment Project Report Phase 3 By Bhagyalaxmi Bethala Hemali Majithia Shamit Patel Problem Definition: In this project, we will design and

More information

Operating Systems. III. Scheduling. http://soc.eurecom.fr/os/

Operating Systems. III. Scheduling. http://soc.eurecom.fr/os/ Operating Systems Institut Mines-Telecom III. Scheduling Ludovic Apvrille ludovic.apvrille@telecom-paristech.fr Eurecom, office 470 http://soc.eurecom.fr/os/ Outline Basics of Scheduling Definitions Switching

More information

Scheduling Algorithms and Support Tools for Parallel Systems

Scheduling Algorithms and Support Tools for Parallel Systems Scheduling Algorithms and Support Tools for Parallel Systems Igor Grudenić Fakultet elektrotehnike i računarstva, Unska 3, Zagreb Abstract High Perfomance Computing (HPC) is an evolving trend in computing

More information

Programming and Scheduling Model for Supporting Heterogeneous Architectures in Linux

Programming and Scheduling Model for Supporting Heterogeneous Architectures in Linux Programming and Scheduling Model for Supporting Heterogeneous Architectures in Linux Third Workshop on Computer Architecture and Operating System co-design Paris, 25.01.2012 Tobias Beisel, Tobias Wiersema,

More information

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances: Scheduling Scheduling Scheduling levels Long-term scheduling. Selects which jobs shall be allowed to enter the system. Only used in batch systems. Medium-term scheduling. Performs swapin-swapout operations

More information

Multi-GPU Load Balancing for Simulation and Rendering

Multi-GPU Load Balancing for Simulation and Rendering Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks

More information

Scheduling in SAS 9.3

Scheduling in SAS 9.3 Scheduling in SAS 9.3 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc 2011. Scheduling in SAS 9.3. Cary, NC: SAS Institute Inc. Scheduling in SAS 9.3

More information

Parallel and Sequential Job Scheduling in Heterogeneous Clusters: A Simulation Study using Software in the Loop

Parallel and Sequential Job Scheduling in Heterogeneous Clusters: A Simulation Study using Software in the Loop 21, HCS Research Lab. All Rights Reserved. Parallel and Sequential Job Scheduling in Heterogeneous Clusters: A Simulation Study using Software in the Loop Dave E. Collins* and Alan D. George** High-performance

More information

Introduction to Scheduling Theory

Introduction to Scheduling Theory Introduction to Scheduling Theory Arnaud Legrand Laboratoire Informatique et Distribution IMAG CNRS, France arnaud.legrand@imag.fr November 8, 2004 1/ 26 Outline 1 Task graphs from outer space 2 Scheduling

More information

Enhancing the Monitoring of Real-Time Performance in Linux

Enhancing the Monitoring of Real-Time Performance in Linux Master of Science Thesis Enhancing the Monitoring of Real-Time Performance in Linux Author: Nima Asadi nai10001@student.mdh.se Supervisor: Mehrdad Saadatmand mehrdad.saadatmand@mdh.se Examiner: Mikael

More information

CPU Scheduling Outline

CPU Scheduling Outline CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different

More information

Employee Tracker Time & Attendance System. Time Banking

Employee Tracker Time & Attendance System. Time Banking Employee Tracker Time & Attendance System Time Banking Table of Contents 2. Overview 3. Absent Codes 5. Time Bank Setup 7. Assign Time Banks to Employees 11. Time Bank Withdrawals from Transactions 13.

More information

A CP Scheduler for High-Performance Computers

A CP Scheduler for High-Performance Computers A CP Scheduler for High-Performance Computers Thomas Bridi, Michele Lombardi, Andrea Bartolini, Luca Benini, and Michela Milano {thomas.bridi,michele.lombardi2,a.bartolini,luca.benini,michela.milano}@

More information

ICS 143 - Principles of Operating Systems

ICS 143 - Principles of Operating Systems ICS 143 - Principles of Operating Systems Lecture 5 - CPU Scheduling Prof. Nalini Venkatasubramanian nalini@ics.uci.edu Note that some slides are adapted from course text slides 2008 Silberschatz. Some

More information

Overview of Presentation. (Greek to English dictionary) Different systems have different goals. What should CPU scheduling optimize?

Overview of Presentation. (Greek to English dictionary) Different systems have different goals. What should CPU scheduling optimize? Overview of Presentation (Greek to English dictionary) introduction to : elements, purpose, goals, metrics lambda request arrival rate (e.g. 200/second) non-preemptive first-come-first-served, shortest-job-next

More information

Ecole des Mines de Nantes. Journée Thématique Emergente "aspects énergétiques du calcul"

Ecole des Mines de Nantes. Journée Thématique Emergente aspects énergétiques du calcul Ecole des Mines de Nantes Entropy Journée Thématique Emergente "aspects énergétiques du calcul" Fabien Hermenier, Adrien Lèbre, Jean Marc Menaud menaud@mines-nantes.fr Outline Motivation Entropy project

More information

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum Scheduling Yücel Saygın These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum 1 Scheduling Introduction to Scheduling (1) Bursts of CPU usage alternate with periods

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Scheduling Algorithms for Dynamic Workload

Scheduling Algorithms for Dynamic Workload Managed by Scheduling Algorithms for Dynamic Workload Dalibor Klusáček (MU) Hana Rudová (MU) Ranieri Baraglia (CNR - ISTI) Gabriele Capannini (CNR - ISTI) Marco Pasquali (CNR ISTI) Outline Motivation &

More information

Load Balancing in Distributed System. Prof. Ananthanarayana V.S. Dept. Of Information Technology N.I.T.K., Surathkal

Load Balancing in Distributed System. Prof. Ananthanarayana V.S. Dept. Of Information Technology N.I.T.K., Surathkal Load Balancing in Distributed System Prof. Ananthanarayana V.S. Dept. Of Information Technology N.I.T.K., Surathkal Objectives of This Module Show the differences between the terms CPU scheduling, Job

More information

Cloud Management: Knowing is Half The Battle

Cloud Management: Knowing is Half The Battle Cloud Management: Knowing is Half The Battle Raouf BOUTABA David R. Cheriton School of Computer Science University of Waterloo Joint work with Qi Zhang, Faten Zhani (University of Waterloo) and Joseph

More information

2. How many years has your charter school been in operation?

2. How many years has your charter school been in operation? Oregon Charter School Director Survey 1. Who is the sponsor of your charter school? Local District 96.0% 72 Oregon Department of Education 4.0% 3 Name of Sponsoring District 69 answered question 75 skipped

More information

Batch Scheduling and Resource Management

Batch Scheduling and Resource Management Batch Scheduling and Resource Management Luke Tierney Department of Statistics & Actuarial Science University of Iowa October 18, 2007 Luke Tierney (U. of Iowa) Batch Scheduling and Resource Management

More information

Scheduling algorithms for Linux

Scheduling algorithms for Linux Scheduling algorithms for Linux Anders Peter Fugmann IMM-THESIS-2002-65 IMM Trykt af IMM, DTU Foreword This report is the result of a masters thesis entitled Scheduling algorithms for Linux. The thesis

More information

Real-Time Scheduling 1 / 39

Real-Time Scheduling 1 / 39 Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A

More information

Processor Scheduling. Queues Recall OS maintains various queues

Processor Scheduling. Queues Recall OS maintains various queues Processor Scheduling Chapters 9 and 10 of [OS4e], Chapter 6 of [OSC]: Queues Scheduling Criteria Cooperative versus Preemptive Scheduling Scheduling Algorithms Multi-level Queues Multiprocessor and Real-Time

More information

U-LITE Network Infrastructure

U-LITE Network Infrastructure U-LITE: a proposal for scientific computing at LNGS S. Parlati, P. Spinnato, S. Stalio LNGS 13 Sep. 2011 20 years of Scientific Computing at LNGS Early 90s: highly centralized structure based on VMS cluster

More information

CS4410 - Fall 2008 Homework 2 Solution Due September 23, 11:59PM

CS4410 - Fall 2008 Homework 2 Solution Due September 23, 11:59PM CS4410 - Fall 2008 Homework 2 Solution Due September 23, 11:59PM Q1. Explain what goes wrong in the following version of Dekker s Algorithm: CSEnter(int i) inside[i] = true; while(inside[j]) inside[i]

More information

CPU Scheduling. CPU Scheduling

CPU Scheduling. CPU Scheduling CPU Scheduling Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling

More information

A Study on the Application of Existing Load Balancing Algorithms for Large, Dynamic, Heterogeneous Distributed Systems

A Study on the Application of Existing Load Balancing Algorithms for Large, Dynamic, Heterogeneous Distributed Systems A Study on the Application of Existing Load Balancing Algorithms for Large, Dynamic, Heterogeneous Distributed Systems RUPAM MUKHOPADHYAY, DIBYAJYOTI GHOSH AND NANDINI MUKHERJEE Department of Computer

More information

Common Approaches to Real-Time Scheduling

Common Approaches to Real-Time Scheduling Common Approaches to Real-Time Scheduling Clock-driven time-driven schedulers Priority-driven schedulers Examples of priority driven schedulers Effective timing constraints The Earliest-Deadline-First

More information

Scheduling and Resource Management in Computational Mini-Grids

Scheduling and Resource Management in Computational Mini-Grids Scheduling and Resource Management in Computational Mini-Grids July 1, 2002 Project Description The concept of grid computing is becoming a more and more important one in the high performance computing

More information

Scheduling Support for Heterogeneous Hardware Accelerators under Linux

Scheduling Support for Heterogeneous Hardware Accelerators under Linux Scheduling Support for Heterogeneous Hardware Accelerators under Linux Tobias Wiersema University of Paderborn Paderborn, December 2010 1 / 24 Tobias Wiersema Linux scheduler extension for accelerators

More information

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting

More information

Grid Scheduling Dictionary of Terms and Keywords

Grid Scheduling Dictionary of Terms and Keywords Grid Scheduling Dictionary Working Group M. Roehrig, Sandia National Laboratories W. Ziegler, Fraunhofer-Institute for Algorithms and Scientific Computing Document: Category: Informational June 2002 Status

More information

Real-Time Scheduling (Part 1) (Working Draft) Real-Time System Example

Real-Time Scheduling (Part 1) (Working Draft) Real-Time System Example Real-Time Scheduling (Part 1) (Working Draft) Insup Lee Department of Computer and Information Science School of Engineering and Applied Science University of Pennsylvania www.cis.upenn.edu/~lee/ CIS 41,

More information

Process design. Process design. Process design. Operations strategy. Supply network design. Layout and flow Design. Operations management.

Process design. Process design. Process design. Operations strategy. Supply network design. Layout and flow Design. Operations management. Process design Source: Joe Schwarz, www.joyrides.com Process design Process design Supply network design Operations strategy Layout and flow Design Operations management Improvement Process technology

More information

LoadLeveler Overview. January 30-31, 2012. IBM Storage & Technology Group. IBM HPC Developer Education @ TIFR, Mumbai

LoadLeveler Overview. January 30-31, 2012. IBM Storage & Technology Group. IBM HPC Developer Education @ TIFR, Mumbai IBM HPC Developer Education @ TIFR, Mumbai IBM Storage & Technology Group LoadLeveler Overview January 30-31, 2012 Pidad D'Souza (pidsouza@in.ibm.com) IBM, System & Technology Group 2009 IBM Corporation

More information

Efficiency of Batch Operating Systems

Efficiency of Batch Operating Systems Efficiency of Batch Operating Systems a Teodor Rus rus@cs.uiowa.edu The University of Iowa, Department of Computer Science a These slides have been developed by Teodor Rus. They are copyrighted materials

More information

Distributed Operating Systems. Cluster Systems

Distributed Operating Systems. Cluster Systems Distributed Operating Systems Cluster Systems Ewa Niewiadomska-Szynkiewicz ens@ia.pw.edu.pl Institute of Control and Computation Engineering Warsaw University of Technology E&IT Department, WUT 1 1. Cluster

More information

Operating Systems Lecture #6: Process Management

Operating Systems Lecture #6: Process Management Lecture #6: Process Written by based on the lecture series of Dr. Dayou Li and the book Understanding 4th ed. by I.M.Flynn and A.McIver McHoes (2006) Department of Computer Science and Technology,., 2013

More information

Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,

More information

Networking Virtualization Using FPGAs

Networking Virtualization Using FPGAs Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,

More information

AASPI SOFTWARE PARALLELIZATION

AASPI SOFTWARE PARALLELIZATION AASPI SOFTWARE PARALLELIZATION Introduction Generation of multitrace and multispectral seismic attributes can be computationally intensive. For example, each input seismic trace may generate 50 or more

More information

Abstract: Motivation: Description of proposal:

Abstract: Motivation: Description of proposal: Efficient power utilization of a cluster using scheduler queues Kalyana Chadalvada, Shivaraj Nidoni, Toby Sebastian HPCC, Global Solutions Engineering Bangalore Development Centre, DELL Inc. {kalyana_chadalavada;shivaraj_nidoni;toby_sebastian}@dell.com

More information

Near-Dedicated Scheduling

Near-Dedicated Scheduling Near-Dedicated Scheduling Chris Brady, CRI, Boulder, Colorado, USA, Mary Ann Ciuffini, NCAR, Boulder, Colorado, USA, Bryan Hardy, CRI, Boulder, Colorado, USA ABSTRACT: With the advent of high-performance

More information

A Review on Load Balancing In Cloud Computing 1

A Review on Load Balancing In Cloud Computing 1 www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 6 June 2015, Page No. 12333-12339 A Review on Load Balancing In Cloud Computing 1 Peenaz Pathak, 2 Er.Kamna

More information

Comparative Study of Distributed Resource Management Systems SGE, LSF, PBS Pro, and LoadLeveler

Comparative Study of Distributed Resource Management Systems SGE, LSF, PBS Pro, and LoadLeveler Comparative Study of Distributed Resource Management Systems SGE, LSF, PBS Pro, and LoadLeveler Yonghong Yan, Barbara Chapman {yanyh,chapman}@cs.uh.edu Department of Computer Science University of Houston

More information

CPU Scheduling. CSC 256/456 - Operating Systems Fall 2014. TA: Mohammad Hedayati

CPU Scheduling. CSC 256/456 - Operating Systems Fall 2014. TA: Mohammad Hedayati CPU Scheduling CSC 256/456 - Operating Systems Fall 2014 TA: Mohammad Hedayati Agenda Scheduling Policy Criteria Scheduling Policy Options (on Uniprocessor) Multiprocessor scheduling considerations CPU

More information

Embedded Systems. 6. Real-Time Operating Systems

Embedded Systems. 6. Real-Time Operating Systems Embedded Systems 6. Real-Time Operating Systems Lothar Thiele 6-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition Chapter 5: CPU Scheduling Silberschatz, Galvin and Gagne 2009 Chapter 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Operating

More information

Predictable response times in event-driven real-time systems

Predictable response times in event-driven real-time systems Predictable response times in event-driven real-time systems Automotive 2006 - Security and Reliability in Automotive Systems Stuttgart, October 2006. Presented by: Michael González Harbour mgh@unican.es

More information