Scheduling Multithreaded Computations by Work Stealing

Robert D. Blumofe
The University of Texas at Austin

Charles E. Leiserson
MIT Laboratory for Computer Science

This research was supported in part by the Advanced Research Projects Agency under Contract N
This research was done while Robert D. Blumofe was at the MIT Laboratory for Computer Science and was supported in part by an ARPA High-Performance Computing Graduate Fellowship.

Abstract

This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies.

Specifically, our analysis shows that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. We also show that the expected total communication of the algorithm is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This communication bound justifies the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.

1 Introduction

For efficient execution of a dynamically growing "multithreaded" computation on a MIMD-style parallel computer, a scheduling algorithm must ensure that enough threads are active concurrently to keep the processors busy. Simultaneously, it should ensure that the number of concurrently active threads remains within reasonable limits so that memory requirements are not unduly large. Moreover, the scheduler should also try to maintain related threads
on the same processor, if possible, so that communication between them can be minimized. Needless to say, achieving all these goals simultaneously can be difficult.

Two scheduling paradigms have arisen to address the problem of scheduling multithreaded computations: work sharing and work stealing. In work sharing, whenever a processor generates new threads, the scheduler attempts to migrate some of them to other processors in hopes of distributing the work to underutilized processors. In work stealing, however, underutilized processors take the initiative: they attempt to "steal" threads from other processors. Intuitively, the migration of threads occurs less frequently with work stealing than with work sharing, since when all processors have work to do, no threads are migrated by a work-stealing scheduler, but threads are always migrated by a work-sharing scheduler.

The work-stealing idea dates back at least as far as Burton and Sleep's research on parallel execution of functional programs [16] and Halstead's implementation of Multilisp [30]. These authors point out the heuristic benefits of work stealing with regard to space and communication. Since then, many researchers have implemented variants on this strategy [11, 21, 23, 29, 34, 37, 46]. Rudolph, Slivkin-Allalouf, and Upfal [43] analyzed a randomized work-stealing strategy for load balancing independent jobs on a parallel computer, and Karp and Zhang [33] analyzed a randomized work-stealing strategy for parallel backtrack search. Recently, Zhang and Ortynski [48] have obtained good bounds on the communication requirements of this algorithm.

In this paper, we present and analyze a work-stealing algorithm for scheduling "fully strict" (well-structured) multithreaded computations. This class of computations encompasses both backtrack search computations [33, 48] and divide-and-conquer computations [47], as well as dataflow computations [2] in which threads may stall due to a data dependency. We analyze our algorithms in a stringent atomic-access model similar to the atomic message-passing model of [36] in which concurrent accesses to the same data structure are serially queued by an adversary.

Our main contribution is a randomized work-stealing scheduling algorithm for fully strict multithreaded computations which is provably efficient in terms of time, space, and communication. We prove that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. In addition, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. These bounds are better than previous bounds for work-sharing schedulers [10], and the work-stealing scheduler is much simpler and eminently practical. Part of this improvement is due to our focusing on fully strict computations, as compared to the (general) strict computations studied in [10]. We also prove that the expected total communication of the execution is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This bound is existentially tight to within a constant factor, meeting the lower bound of Wu and Kung [47] for communication in parallel divide-and-conquer. In contrast, work-sharing schedulers have nearly worst-case behavior for communication. Thus, our results bolster the folk wisdom that work stealing is superior to work sharing.

Others have studied and continue to study the problem of efficiently managing the space requirements of parallel computations. Culler and Arvind [19] and Ruggiero and Sargeant
[44] give heuristics for limiting the space required by dataflow programs. Burton [14] shows how to limit space in certain parallel computations without causing deadlock. More recently, Burton [15] has developed and analyzed a scheduling algorithm with provably good time and space bounds. Blelloch, Gibbons, Matias, and Narlikar [3, 4] have also recently developed and analyzed scheduling algorithms with provably good time and space bounds. It is not yet clear whether any of these algorithms are as practical as work stealing.

The remainder of this paper is organized as follows. In Section 2 we review the graph-theoretic model of multithreaded computations introduced in [10], which provides a theoretical basis for analyzing schedulers. Section 3 gives a simple scheduling algorithm which uses a central queue. This "busy-leaves" algorithm forms the basis for our randomized work-stealing algorithm, which we present in Section 4. In Section 5 we introduce the atomic-access model that we use to analyze execution time and communication costs for the work-stealing algorithm, and we present and analyze a combinatorial "balls and bins" game that we use to derive a bound on the contention that arises in random work stealing. We then use this bound along with a delay-sequence argument [41] in Section 6 to analyze the execution time and communication cost of the work-stealing algorithm. To conclude, in Section 7 we briefly discuss how the theoretical ideas in this paper have been applied to the Cilk programming language and runtime system [8, 25], as well as make some concluding remarks.

2 A model of multithreaded computation

This section reprises the graph-theoretic model of multithreaded computation introduced in [10]. We also define what it means for computations to be "fully strict." We conclude with a statement of the greedy-scheduling theorem, which is an adaptation of theorems by Brent [13] and Graham [27, 28] on dag scheduling.

A multithreaded computation is composed of a set of threads, each of which is a sequential ordering of unit-time instructions. The instructions are connected by dependency edges, which provide a partial ordering on which instructions must execute before which other instructions. In Figure 1, for example, each shaded block is a thread with circles representing instructions and the horizontal edges, called continue edges, representing the sequential ordering. Thread $\Gamma_5$ of this example contains 3 instructions: $v_{10}$, $v_{11}$, and $v_{12}$. The instructions of a thread must execute in this sequential order from the first (leftmost) instruction to the last (rightmost) instruction. In order to execute a thread, we allocate for it a chunk of memory, called an activation frame, that the instructions of the thread can use to store the values on which they compute.

A $P$-processor execution schedule for a multithreaded computation determines which processors of a $P$-processor parallel computer execute which instructions at each step. An execution schedule depends on the particular multithreaded computation and the number $P$ of processors. In any given step of an execution schedule, each processor executes at most one instruction.

During the course of its execution, a thread may create, or spawn, other threads. Spawning a thread is like a subroutine call, except that the spawning thread can operate concurrently with the spawned thread. We consider spawned threads to be children of the thread that did the spawning, and a thread may spawn as many children as it desires. In this way,
threads are organized into a spawn tree as indicated in Figure 1 by the downward-pointing, shaded dependency edges, called spawn edges, that connect threads to their spawned children. The spawn tree is the parallel analog of a call tree. In our example computation, the spawn tree's root thread $\Gamma_1$ has two children, $\Gamma_2$ and $\Gamma_6$, and thread $\Gamma_2$ has three children, $\Gamma_3$, $\Gamma_4$, and $\Gamma_5$. Threads $\Gamma_3$, $\Gamma_4$, $\Gamma_5$, and $\Gamma_6$, which have no children, are leaf threads.

Figure 1: A multithreaded computation. This computation contains 23 instructions $v_1, v_2, \ldots, v_{23}$ and 6 threads $\Gamma_1, \Gamma_2, \ldots, \Gamma_6$.

Each spawn edge goes from a specific instruction (the instruction that actually does the spawn operation) in the parent thread to the first instruction of the child thread. An execution schedule must obey this edge in that no processor may execute an instruction in a spawned child thread until after the spawning instruction in the parent thread has been executed. In our example computation (Figure 1), due to the spawn edge $(v_6, v_7)$, instruction $v_7$ cannot be executed until after the spawning instruction $v_6$. Consistent with our unit-time model of instructions, a single instruction may spawn at most one child. When the spawning instruction executes, it allocates an activation frame for the new child thread. Once a thread has been spawned and its frame has been allocated, we say the thread is alive or living. When the last instruction of a thread executes, it deallocates its frame and the thread dies.

An execution schedule generally respects other dependencies besides those represented by continue and spawn edges. Consider an instruction that produces a data value to be consumed by another instruction. Such a producer/consumer relationship precludes the consuming instruction from executing until after the producing instruction. To enforce such orderings, other dependency edges, called join edges, may be required, as shown in Figure 1 by the curved edges. If the execution of a thread arrives at a consuming instruction before the producing instruction has executed, execution of the consuming thread cannot continue; the thread stalls. Once the producing instruction executes, the join dependency is resolved, which enables the consuming thread to resume its execution; the thread becomes ready. A multithreaded computation does not model the means by which join dependencies get resolved or by which unresolved join dependencies get detected. In implementation, resolution and detection can be accomplished using mechanisms such as join counters [8], futures [30], or I-structures [2].

We make two technical assumptions regarding join edges. We first assume that each instruction has at most a constant number of join edges incident on it. This assumption
is consistent with our unit-time model of instructions. The second assumption is that no join edges enter the instruction immediately following a spawn. This assumption means that when a parent thread spawns a child thread, the parent cannot immediately stall. It continues to be ready to execute for at least one more instruction.

An execution schedule must obey the constraints given by the spawn, continue, and join edges of the computation. These dependency edges form a directed graph of instructions, and no processor may execute an instruction until after all of the instruction's predecessors in this graph have been executed. So that execution schedules exist, this graph must be acyclic. That is, it must be a directed acyclic graph, or dag. At any given step of an execution schedule, an instruction is ready if all of its predecessors in the dag have been executed.

We make the simplifying assumption that a parent thread remains alive until all its children die, and thus, a thread does not deallocate its activation frame until all its children's frames have been deallocated. Although this assumption is not absolutely necessary, it gives the execution a natural structure, and it will simplify our analyses of space utilization. In accounting for space utilization, we also assume that the frames hold all the values used by the computation; there is no global storage available to the computation outside the frames (or if such storage is available, then we do not account for it). Therefore, the space used at a given time in executing a computation is the total size of all frames used by all living threads at that time, and the total space used in executing a computation is the maximum such value over the course of the execution.

To summarize, a multithreaded computation can be viewed as a dag of instructions connected by dependency edges. The instructions are connected by continue edges into threads, and the threads form a spawn tree with the spawn edges. When a thread is spawned, an activation frame is allocated and this frame remains allocated as long as the thread remains alive. A living thread may be either ready or stalled due to an unresolved dependency.

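The readiness rule summarized above can be illustrated with a small sketch. The dag below is a hypothetical toy computation, not the one in Figure 1, and the names `NODES`, `EDGES`, and `ready_instructions` are assumptions introduced only for illustration: a parent thread a1, a2, a3 spawns a child b1, b2 at a1, and a join edge runs from b2 to a3 (so no join edge enters a2, the instruction immediately following the spawn).

```python
from collections import defaultdict

# Hypothetical toy computation (not Figure 1).
NODES = {"a1", "a2", "a3", "b1", "b2"}
EDGES = [
    ("a1", "a2"), ("a2", "a3"),  # continue edges of the parent thread
    ("b1", "b2"),                # continue edge of the child thread
    ("a1", "b1"),                # spawn edge: a1 spawns the child
    ("b2", "a3"),                # join edge: a3 consumes a value produced by b2
]


def ready_instructions(nodes, edges, executed):
    """Return the unexecuted instructions whose dag predecessors have all executed."""
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    return {v for v in nodes if v not in executed and preds[v] <= executed}


if __name__ == "__main__":
    print(ready_instructions(NODES, EDGES, set()))
    print(ready_instructions(NODES, EDGES, {"a1", "a2", "b1"}))
```

In this toy dag, once a1, a2, and b1 have executed, the parent thread is stalled at a3 on the unresolved join from b2, while the child thread is ready at b2.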
A given multithreaded program when run on a given input can sometimes generate more than one multithreaded computation. In that case, we say the program is nondeterministic. If the same multithreaded computation is generated by the program on the input no matter how the computation is scheduled, then the program is deterministic. In this paper, we shall analyze multithreaded computations, not multithreaded programs. Specifically, we shall not worry about how the multithreaded computation is generated. Instead, we shall study its properties in an a posteriori fashion.

Because multithreaded computations with arbitrary dependencies can be impossible to schedule efficiently [10], we study subclasses of general multithreaded computations in which the kinds of synchronizations that can occur are restricted. A strict multithreaded computation is one in which all join edges from a thread go to an ancestor of the thread in the activation tree. In a strict computation, the only edge into a subtree (emanating from outside the subtree) is the spawn edge that spawns the subtree's root thread. For example, the computation of Figure 1 is strict, and the only edge into the subtree rooted at $\Gamma_2$ is the spawn edge $(v_2, v_3)$. Thus, strictness means that a thread cannot be invoked before all of its arguments are available, although the arguments can be garnered in parallel. A fully strict computation is one in which all join edges from a thread go to the thread's parent. A fully strict computation is, in a sense, a "well-structured" computation, in that all join edges from a subtree (of the spawn tree) emanate from the subtree's root. The example computation of Figure 1 is fully strict. Any multithreaded computation that can be executed in a depth-first manner on a single processor can be made either strict or fully strict by altering the dependency structure, possibly affecting the achievable parallelism, but not affecting the semantics of the computation [5].

We quantify and bound the execution time of a computation on a $P$-processor parallel computer in terms of the computation's "work" and "critical-path length." We define the work of the computation to be the total number of instructions and the critical-path length to be the length of a longest directed path in the dag. Our example computation (Figure 1) has work 23 and critical-path length 10. For a given computation, let $T(X)$ denote the time to execute the computation using $P$-processor execution schedule $X$, and let

$$T_P = \min_X T(X)$$

denote the minimum execution time with $P$ processors, the minimum being taken over all $P$-processor execution schedules for the computation. Then $T_1$ is the work of the computation, since a 1-processor computer can only execute one instruction at each step, and $T_\infty$ is the critical-path length, since even with arbitrarily many processors, each instruction on a path must execute serially. Notice that we must have $T_P \ge T_1/P$, because $P$ processors can execute only $P$ instructions per time step, and of course, we must have $T_P \ge T_\infty$.

Early work on dag scheduling by Brent [13] and Graham [27, 28] shows that there exist $P$-processor execution schedules $X$ with $T(X) \le T_1/P + T_\infty$. As the sum of two lower bounds, this upper bound is universally optimal to within a factor of 2. The following theorem, proved in [10, 20], extends these results minimally to show that this upper bound on $T_P$ can be obtained by greedy schedules: those in which at each step of the execution, if at least $P$ instructions are ready, then $P$ instructions execute, and if fewer than $P$ instructions are ready, then all execute.

Theorem 1 (The greedy-scheduling theorem) For any multithreaded computation with work $T_1$ and critical-path length $T_\infty$, and for any number $P$ of processors, any greedy $P$-processor execution schedule $X$ achieves $T(X) \le T_1/P + T_\infty$.

Generally, we are interested in schedules that achieve linear speedup, that is $T(X) = O(T_1/P)$. For a greedy schedule, linear speedup occurs when the parallelism, which we define to be $T_1/T_\infty$, satisfies $T_1/T_\infty = \Omega(P)$.

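As a rough illustration of Theorem 1, the following sketch computes the work and critical-path length of a small dag and simulates one greedy schedule. The dag is a hypothetical five-instruction example rather than Figure 1, the function names are assumptions made for this sketch, and the critical-path length is counted in instructions so that it equals the execution time with unboundedly many processors.

```python
from collections import defaultdict

# Hypothetical toy dag (not Figure 1): a1 -> a2 -> a3, a1 -> b1 -> b2 -> a3.
NODES = ["a1", "a2", "a3", "b1", "b2"]
EDGES = [("a1", "a2"), ("a2", "a3"), ("a1", "b1"), ("b1", "b2"), ("b2", "a3")]


def _graph(nodes, edges):
    succs = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    return succs, indeg


def work_and_span(nodes, edges):
    """Return (T_1, T_inf): instruction count and longest path, in instructions."""
    succs, indeg = _graph(nodes, edges)
    depth = {v: 1 for v in nodes}
    frontier = [v for v in nodes if indeg[v] == 0]
    visited = 0
    while frontier:  # Kahn-style topological traversal
        u = frontier.pop()
        visited += 1
        for v in succs[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    assert visited == len(nodes), "dependency graph must be acyclic"
    return len(nodes), max(depth.values())


def greedy_schedule_length(nodes, edges, p):
    """Simulate a greedy schedule: each step runs min(p, #ready) ready instructions."""
    succs, indeg = _graph(nodes, edges)
    ready = [v for v in nodes if indeg[v] == 0]
    steps = 0
    while ready:
        batch, ready = ready[:p], ready[p:]
        steps += 1
        for u in batch:
            for v in succs[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)  # becomes ready only for the next step
    return steps


if __name__ == "__main__":
    t1, tinf = work_and_span(NODES, EDGES)
    for p in (1, 2, 3):
        t = greedy_schedule_length(NODES, EDGES, p)
        print(p, t, max(t1 / p, tinf) <= t <= t1 / p + tinf)
```

For this toy dag the work is 5 and the critical-path length is 4, so with 2 processors any greedy schedule must finish in at most 5/2 + 4 steps; the simulated schedule takes 4.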
To quantify the space used by a given execution schedule of a computation, we define the stack depth of a thread to be the sum of the sizes of the activation frames of all its ancestors, including itself. The stack depth of a multithreaded computation is the maximum stack depth of any of its threads. We shall denote by $S_1$ the minimum amount of space possible for any 1-processor execution of a multithreaded computation, which is equal to the stack depth of the computation. Let $S(X)$ denote the space used by a $P$-processor execution schedule $X$ of a multithreaded computation. We shall be interested in those execution schedules that exhibit at most linear expansion of space, that is, $S(X) = O(S_1 P)$, which is existentially optimal to within a constant factor [10].
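The stack-depth definition can be made concrete with a short sketch. The spawn-tree shape below follows the description of Figure 1 (root $\Gamma_1$ with children $\Gamma_2$ and $\Gamma_6$, and $\Gamma_2$ with children $\Gamma_3$, $\Gamma_4$, $\Gamma_5$), but the frame sizes are invented for illustration, as are the names `PARENT`, `FRAME_SIZE`, and `stack_depth`.

```python
# Spawn tree shaped like the one described for Figure 1; None marks the root.
PARENT = {"G1": None, "G2": "G1", "G6": "G1", "G3": "G2", "G4": "G2", "G5": "G2"}

# Hypothetical activation-frame sizes (the paper does not give any).
FRAME_SIZE = {"G1": 4, "G2": 2, "G3": 1, "G4": 3, "G5": 1, "G6": 5}


def stack_depth(parent, frame_size):
    """Return S_1: the largest sum of frame sizes along any root-to-thread path."""
    memo = {}

    def depth(thread):
        if thread not in memo:
            up = parent[thread]
            memo[thread] = frame_size[thread] + (0 if up is None else depth(up))
        return memo[thread]

    return max(depth(t) for t in parent)


if __name__ == "__main__":
    s1 = stack_depth(PARENT, FRAME_SIZE)
    p = 4  # hypothetical processor count
    print("S_1 =", s1, "linear-expansion budget S_1 * P =", s1 * p)
```

With these made-up sizes, the deepest paths are through G4 (4 + 2 + 3) and G6 (4 + 5), so the stack depth is 9, and a schedule with linear expansion of space on 4 processors would use space on the order of 36.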

3 The busy-leaves property

Once a thread $\Gamma$ has been spawned in a strict computation, a single processor can complete the execution of the entire subcomputation rooted at $\Gamma$ even if no other progress is made on other parts of the computation. In other words, from the time the thread $\Gamma$ is spawned until the time $\Gamma$ dies, there is always at least one thread from the subcomputation rooted at $\Gamma$ that is ready. In particular, no leaf thread in a strict multithreaded computation can stall. As we shall see, this property allows an execution schedule to keep the leaves "busy." By combining this "busy-leaves" property with the greedy property, we derive execution schedules that simultaneously exhibit linear speedup and linear expansion of space.

In this section, we show that for any number $P$ of processors and any strict multithreaded computation with work $T_1$, critical-path length $T_\infty$, and stack depth $S_1$, there exists a $P$-processor execution schedule $X$ that achieves time $T(X) \le T_1/P + T_\infty$ and space $S(X) \le S_1 P$ simultaneously. We give a simple online $P$-processor parallel algorithm, the Busy-Leaves Algorithm, to compute such a schedule. This simple algorithm will form the basis for the randomized work-stealing algorithm presented in Section 4.

The Busy-Leaves Algorithm operates online in the following sense. Before the $t$th step, the algorithm has computed and executed the first $t - 1$ steps of the execution schedule. At the $t$th step, the algorithm uses only information from the portion of the computation revealed so far in the execution to compute and execute the $t$th step of the schedule. In particular, it does not use any information from instructions not yet executed or threads not yet spawned.

The Busy-Leaves Algorithm maintains all living threads in a single thread pool which is uniformly available to all $P$ processors. When spawns occur, new threads are added to this global pool, and when a processor needs work, it removes a ready thread from the pool. Though we describe the algorithm as a $P$-processor parallel algorithm, we shall not analyze it as such. Specifically, in computing the $t$th step of the schedule, we allow each processor to add threads to the thread pool and delete threads from it. Thus, we ignore the effects of processors contending for access to the pool. In fact, we shall only analyze properties of the schedule itself and ignore the cost incurred by the algorithm in computing the schedule. (Scheduling overheads will be analyzed for the randomized work-stealing algorithm, however.)

The Busy-Leaves Algorithm operates as follows. The algorithm begins with the root thread in the global thread pool and all processors idle. At the beginning of each step, each processor either is idle or has a thread to work on. Those processors that are idle begin the step by attempting to remove any ready thread from the pool. If there are sufficiently many ready threads in the pool to satisfy all of the idle processors, then every idle processor gets a ready thread to work on. Otherwise, some processors remain idle. Then, each processor that has a thread to work on executes the next instruction from that thread. In general, once a processor has a thread, call it $\Gamma_a$, to work on, it executes an instruction from $\Gamma_a$ at each step until the thread either spawns, stalls, or dies, in which case, it performs according to the following rules.

➊ Spawns: If the thread $\Gamma_a$ spawns a child $\Gamma_b$, then the processor finishes the current step by returning $\Gamma_a$ to the thread pool. The processor begins the next step working on $\Gamma_b$.
➋ Stalls: If the thread $\Gamma_a$ stalls, then the processor finishes the current step by returning $\Gamma_a$ to the thread pool. The processor begins the next step idle.

➌ Dies: If the thread $\Gamma_a$ dies, then the processor finishes the current step by checking to see if $\Gamma_a$'s parent thread $\Gamma_b$ currently has any living children. If $\Gamma_b$ has no live children and no other processor is working on $\Gamma_b$, then the processor takes $\Gamma_b$ from the pool and begins the next step working on $\Gamma_b$. Otherwise, the processor begins the next step idle.

[Figure 2 table: columns are step, thread pool, and processor activity for $p_1$ and $p_2$.]

Figure 2: A 2-processor execution schedule computed by the Busy-Leaves Algorithm for the computation of Figure 1. This schedule lists the living threads in the global thread pool at each step just after each idle processor has removed a ready thread. It also lists the ready thread being worked on and the instruction executed by each of the 2 processors, $p_1$ and $p_2$, at each step. Living threads that are ready are listed in bold. The other living threads are stalled.

Figure 2 illustrates these three rules in a 2-processor execution schedule computed by the Busy-Leaves Algorithm on the computation of Figure 1. Rule ➊: At step 2, processor $p_1$ working on thread $\Gamma_1$ executes $v_2$ which spawns the child $\Gamma_2$, so $p_1$ places $\Gamma_1$ back in the pool (to be picked up at the beginning of the next step by the idle $p_2$) and begins the next step working on $\Gamma_2$. Rule ➋: At step 8, processor $p_2$ working on thread $\Gamma_1$ executes $v_{21}$ and $\Gamma_1$ stalls, so $p_2$ returns $\Gamma_1$ to the pool and begins the next step idle (and remains idle since the thread pool contains no ready threads). Rule ➌: At step 13, processor $p_1$ working on $\Gamma_2$ executes $v_{15}$ and $\Gamma_2$ dies, so $p_1$ retrieves the parent $\Gamma_1$ from the pool and begins the next step working on $\Gamma_1$.

Besides being greedy, for any strict computation, the schedule computed by the Busy-Leaves Algorithm maintains the busy-leaves property: at every time step during the execution, every leaf in the "spawn subtree" has a processor working on it. We define the spawn subtree at any time step $t$ to be the portion of the spawn tree consisting of just
those threads that are alive at step $t$. To restate the busy-leaves property, at every time step, every living thread that has no living descendants has a processor working on it. We shall now prove this fact and show that it implies linear expansion of space. It is worth noting that not every multithreaded computation has a schedule that maintains the busy-leaves property, but every strict multithreaded computation does. We begin by showing that any schedule that maintains the busy-leaves property exhibits linear expansion of space.

Lemma 2 For any multithreaded computation with stack depth $S_1$, any $P$-processor execution schedule $X$ that maintains the busy-leaves property uses space bounded by $S(X) \le S_1 P$.

Proof: The busy-leaves property implies that at all time steps $t$, the spawn subtree has at most $P$ leaves. For each such leaf, the space used by it and all of its ancestors is at most $S_1$, and therefore, the space in use at any time step $t$ is at most $S_1 P$.

For schedules that maintain the busy-leaves property, the upper bound $S_1 P$ is conservative. By charging $S_1$ space for each busy leaf, we may be overcharging. For some computations, by knowing that the schedule preserves the busy-leaves property, we can appeal directly to the fact that the spawn subtree never has more than $P$ leaves to obtain tight bounds on space usage [6].

We finish this section by showing that for strict computations, the Busy-Leaves Algorithm computes a schedule that is both greedy and maintains the busy-leaves property.

Theorem 3 For any number $P$ of processors and any strict multithreaded computation with work $T_1$, critical-path length $T_\infty$, and stack depth $S_1$, the Busy-Leaves Algorithm computes a $P$-processor execution schedule $X$ whose execution time satisfies $T(X) \le T_1/P + T_\infty$ and whose space satisfies $S(X) \le S_1 P$.

Proof: The time bound follows directly from the greedy-scheduling theorem (Theorem 1), since the Busy-Leaves Algorithm computes a greedy schedule. The space bound follows from Lemma 2 if we can show that the Busy-Leaves Algorithm maintains the busy-leaves property. We prove this fact by induction on the number of steps. At the first step of the algorithm, the spawn subtree contains just the root thread which is a leaf, and some processor is working on it. We must show that all of the algorithm rules preserve the busy-leaves property. When a processor has a thread $\Gamma_a$ to work on, it executes instructions from that thread until it either spawns, stalls, or dies. Rule ➊: If $\Gamma_a$ spawns a child $\Gamma_b$, then $\Gamma_a$ is not a leaf (even if it was before) and $\Gamma_b$ is a leaf. In this case, the processor works on $\Gamma_b$, so the new leaf is busy. Rule ➋: If $\Gamma_a$ stalls, then $\Gamma_a$ cannot be a leaf since in a strict computation, the unresolved dependency must come from a descendant. Rule ➌: If $\Gamma_a$ dies, then its parent thread $\Gamma_b$ may turn into a leaf. In this case, the processor works on $\Gamma_b$ unless some other processor already is, so the new leaf is guaranteed to be busy.

We now know that every strict multithreaded computation has an efficient execution schedule, and we know how to find it. But these facts take us only so far. Execution schedules must be computed efficiently online, and though the Busy-Leaves Algorithm does compute efficient execution schedules and does operate online, it surely does not do so efficiently, except possibly in the case of small-scale symmetric multiprocessors. This lack of scalability is a consequence of employing a single centralized thread pool at which all processors must contend for access. In the next section, we present a distributed online scheduling algorithm, and in the following sections, we prove that it is both efficient and scalable.

4 A randomized work-stealing algorithm

In this section, we present an online, randomized work-stealing algorithm for scheduling multithreaded computations on a parallel computer. Also, we present an important structural lemma which is used at the end of this section to show that for fully strict computations, this algorithm causes at most a linear expansion of space. This lemma reappears in Section 6 to show that for fully strict computations, this algorithm achieves linear speedup and generates existentially optimal amounts of communication.

In the Work-Stealing Algorithm, the centralized thread pool of the Busy-Leaves Algorithm is distributed across the processors. Specifically, each processor maintains a ready deque data structure of threads. The ready deque has two ends: a top and a bottom. Threads can be inserted on the bottom and removed from either end. A processor treats its ready deque like a call stack, pushing and popping from the bottom. Threads that are migrated to other processors are removed from the top.

In general, a processor obtains work by removing the thread at the bottom of its ready deque. It starts working on the thread, call it $\Gamma_a$, and continues executing $\Gamma_a$'s instructions until $\Gamma_a$ spawns, stalls, dies, or enables a stalled thread, in which case, it performs according to the following rules.

➊ Spawns: If the thread $\Gamma_a$ spawns a child $\Gamma_b$, then $\Gamma_a$ is placed on the bottom of the ready deque, and the processor commences work on $\Gamma_b$.

➋ Stalls: If the thread $\Gamma_a$ stalls, its processor checks the ready deque. If the deque contains any threads, then the processor removes and begins work on the bottommost thread. If the ready deque is empty, however, the processor begins work stealing: it steals the topmost thread from the ready deque of a randomly chosen processor and begins work on it. (This work-stealing strategy is elaborated below.)

➌ Dies: If the thread $\Gamma_a$ dies, then the processor follows rule ➋ as in the case of $\Gamma_a$ stalling.

➍ Enables: If the thread $\Gamma_a$ enables a stalled thread $\Gamma_b$, the now-ready thread $\Gamma_b$ is placed on the bottom of the ready deque of $\Gamma_a$'s processor.

A thread can simultaneously enable a stalled thread and stall or die, in which case we first perform rule ➍ for enabling and then rule ➋ for stalling or rule ➌ for dying. Except for rule ➍ for the case when a thread enables a stalled thread, these rules are analogous to the rules of the Busy-Leaves Algorithm, and as we shall see, rule ➍ is needed to ensure that the algorithm maintains important structural properties, including the busy-leaves property.

The Work-Stealing Algorithm begins with all ready deques empty. The root thread of the multithreaded computation is placed in the ready deque of one processor, while the other processors start work stealing.

When a processor begins work stealing, it operates as follows. The processor becomes a thief and attempts to steal work from a victim processor chosen uniformly at random. The thief queries the ready deque of the victim, and if it is nonempty, the thief removes and begins work on the top thread. If the victim's ready deque is empty, however, the thief tries again, picking another victim at random.
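The deque discipline behind the four rules can be sketched sequentially as follows. This is only an illustration under stated assumptions, not the paper's implementation: threads are plain strings, there is no real concurrency or contention, the names `Processor`, `spawn`, `enable`, and `next_thread` are invented here, and the sketch assumes a thief never picks itself as the victim.

```python
import random
from collections import deque


class Processor:
    """A processor and its ready deque (left end = top, right end = bottom)."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()


def spawn(proc, parent, child):
    """Rule 1: place the parent on the bottom of the deque and work on the child."""
    proc.ready.append(parent)
    return child


def enable(proc, thread):
    """Rule 4: the newly enabled thread goes on the bottom of the enabler's deque."""
    proc.ready.append(thread)


def next_thread(proc, processors, rng):
    """Rules 2 and 3: pop the bottom of the own deque, else make one steal attempt.

    Returns None if the randomly chosen victim's deque is empty; the caller
    is expected to try again, picking another victim at random.
    """
    if proc.ready:
        return proc.ready.pop()  # bottommost thread
    victims = [q for q in processors if q is not proc]  # assumption: no self-steal
    if not victims:
        return None
    victim = rng.choice(victims)
    return victim.ready.popleft() if victim.ready else None  # topmost thread


if __name__ == "__main__":
    rng = random.Random(0)
    p0, p1 = Processor("p0"), Processor("p1")
    current = spawn(p0, "root", "child1")   # p0 now works on child1
    current = spawn(p0, current, "child2")  # p0 now works on child2
    print(next_thread(p1, [p0, p1], rng))   # p1 steals from the top of p0's deque
    print(next_thread(p0, [p0, p1], rng))   # child2 dies: p0 pops its own bottom
```

The asymmetry is the point of the sketch: the owning processor pushes and pops at the bottom, like a call stack, while thieves remove from the top, so in this run the thief obtains the oldest thread in the deque, "root", and the owner later resumes "child1".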

Figure 3: The structure of a processor's ready deque. The black instruction in each thread indicates the thread's currently ready instruction. Only thread $\Gamma_k$ may have been worked on since it last spawned a child. The dashed edges are the "deque edges" introduced in Section 6.

We now state and prove an important lemma on the structure of threads in the ready deque of any processor during the execution of a fully strict computation. This lemma is used later in this section to analyze execution space and in Section 6 to analyze execution time and communication. Figure 3 illustrates the lemma.

Lemma 4 In the execution of any fully strict multithreaded computation by the Work-Stealing Algorithm, consider any processor $p$ and any given time step at which $p$ is working on a thread. Let $\Gamma_0$ be the thread that $p$ is working on, let $k$ be the number of threads in $p$'s ready deque, and let $\Gamma_1, \Gamma_2, \ldots, \Gamma_k$ denote the threads in $p$'s ready deque ordered from bottom to top, so that $\Gamma_1$ is the bottommost and $\Gamma_k$ is the topmost. If we have $k > 0$, then the threads in $p$'s ready deque satisfy the following properties:

➀ For $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$.

➁ If we have $k > 1$, then for $i = 1, 2, \ldots, k - 1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$.

Proof: The proof is a straightforward induction on execution time. Execution begins with the root thread in some processor's ready deque and all other ready deques empty, so the lemma vacuously holds at the outset. Now, consider any step of the algorithm at which processor $p$ executes an instruction from thread $\Gamma_0$. Let $\Gamma_1, \Gamma_2, \ldots, \Gamma_k$ denote the $k$ threads in $p$'s ready deque before the step, and suppose that either $k = 0$ or both properties hold. Let $\Gamma'_0$ denote the thread (if any) being worked on by $p$ after the step, and let $\Gamma'_1, \Gamma'_2, \ldots, \Gamma'_{k'}$ denote the $k'$ threads in $p$'s ready deque after the step. We now look at the rules of the algorithm and show that they all preserve the lemma. That is, either $k' = 0$ or both properties hold after the step.

Rule ➊: If $\Gamma_0$ spawns a child, then $p$ pushes $\Gamma_0$ onto the bottom of the ready deque and commences work on the child. Thus, $\Gamma'_0$ is the child, we have $k' = k + 1 > 0$, and for $j = 1, 2, \ldots, k'$, we have $\Gamma'_j = \Gamma_{j-1}$. See Figure 4. Now, we can check both properties. Property ➀: If $k' > 1$, then for $j = 2, 3, \ldots, k'$, thread $\Gamma'_j$ is the parent of $\Gamma'_{j-1}$, since before the spawn we have $k > 0$, which means that for $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$.
Moreover, $\Gamma'_1$ is obviously the parent of $\Gamma'_0$. Property ➁: If $k' > 2$, then for $j = 2, 3, \ldots, k' - 1$, thread $\Gamma'_j$ has not been worked on since it spawned $\Gamma'_{j-1}$, because before the spawn we have $k > 1$, which means that for $i = 1, 2, \ldots, k - 1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$. Finally, thread $\Gamma'_1$ has not been worked on since it spawned $\Gamma'_0$, because the spawn only just occurred.

Figure 4: The ready deque of a processor before and after the thread $\Gamma_0$ that it is working on spawns a child. (Note that the threads $\Gamma_0$ and $\Gamma'_0$ are not actually in the deque; they are the threads being worked on before and after the spawn.) Panels: (a) Before spawn. (b) After spawn.

Rules ➋ and ➌: If $\Gamma_0$ stalls or dies, then we have two cases to consider. If $k = 0$, the ready deque is empty, so the processor commences work stealing, and when the processor steals and begins work on a thread, we have $k' = 0$. If $k > 0$, the ready deque is not empty, so the processor pops the bottommost thread off the deque and commences work on it. Thus, we have $\Gamma'_0 = \Gamma_1$ (the popped thread) and $k' = k - 1$, and for $j = 1, 2, \ldots, k'$, we have $\Gamma'_j = \Gamma_{j+1}$. See Figure 5. Now, if $k' > 0$, we can check both properties. Property ➀: For $j = 1, 2, \ldots, k'$, thread $\Gamma'_j$ is the parent of $\Gamma'_{j-1}$, since for $i = 1, 2, \ldots, k$, thread $\Gamma_i$ is the parent of $\Gamma_{i-1}$. Property ➁: If $k' > 1$, then for $j = 1, 2, \ldots, k' - 1$, thread $\Gamma'_j$ has not been worked on since it spawned $\Gamma'_{j-1}$, because before the stall or death we have $k > 2$, which means that for $i = 2, 3, \ldots, k - 1$, thread $\Gamma_i$ has not been worked on since it spawned $\Gamma_{i-1}$.

Figure 5: The ready deque of a processor before and after the thread $\Gamma_0$ that it is working on dies. (Note that the threads $\Gamma_0$ and $\Gamma'_0$ are not actually in the deque; they are the threads being worked on before and after the death.) Panels: (a) Before death. (b) After death.

Rule ➍: If $\Gamma_0$ enables a stalled thread, then due to the fully strict condition, that previously stalled thread must be $\Gamma_0$'s parent. First, we observe that we must have $k = 0$. If
we have $k > 0$, then the processor's ready deque is not empty, and this parent thread must be bottommost in the ready deque. Thus, this parent thread is ready and rule ➍ does not apply. With $k = 0$, the ready deque is empty and the processor places the parent thread on the bottom of the ready deque. We have $\Gamma'_0 = \Gamma_0$ and $k' = k + 1 = 1$ with $\Gamma'_1$ denoting the newly enabled parent. We only have to check the first property. Property ➀: Thread $\Gamma'_1$ is obviously the parent of $\Gamma'_0$.

If some other processor steals a thread from processor $p$, then we must have $k > 0$, and after the steal we have $k' = k - 1$. If $k' > 0$ holds, then both properties are clearly preserved. All other actions by processor $p$, such as work stealing or executing an instruction that does not invoke any of the above rules, clearly preserve the lemma.

Before moving on, it is worth pointing out how it may happen that thread $\Gamma_k$ has been worked on since it spawned $\Gamma_{k-1}$, since Property ➁ excludes $\Gamma_k$. This situation arises when $\Gamma_k$ is stolen from processor $p$ and then stalls on its new processor. Later, $\Gamma_k$ is reenabled by $\Gamma_{k-1}$ and brought back to processor $p$'s ready deque. The key observation is that when $\Gamma_k$ is reenabled, processor $p$'s ready deque is empty and $p$ is working on $\Gamma_{k-1}$. The other threads $\Gamma_{k-2}, \Gamma_{k-3}, \ldots, \Gamma_0$ shown in Figure 3 were spawned after $\Gamma_k$ was reenabled.

We conclude this section by bounding the space used by the Work-Stealing Algorithm executing a fully strict computation.

Theorem 5 For any fully strict multithreaded computation with stack depth $S_1$, the Work-Stealing Algorithm run on a computer with $P$ processors uses at most $S_1 P$ space.

Proof: Like the Busy-Leaves Algorithm, the Work-Stealing Algorithm maintains the busy-leaves property: at every time step of the execution, every leaf in the current spawn subtree has a processor working on it. If we can establish this fact, then Lemma 2 completes the proof.

That the Work-Stealing Algorithm maintains the busy-leaves property is a simple consequence of Lemma 4. At every time step, every leaf in the current spawn subtree must be ready and therefore must either have a processor working on it or be in the ready deque of some processor. But Lemma 4 guarantees that no leaf thread sits in a processor's ready deque while the processor works on some other thread.

With the space bound in hand, we now turn attention to analyzing the time and communication bounds for the Work-Stealing Algorithm. Before we can proceed with this analysis, however, we must take care to define a model for coping with the contention that may arise when multiple thief processors simultaneously attempt to steal from the same victim.

5 Atomic accesses and the recycling game

This section presents the "atomic-access" model that we use to analyze contention during the execution of a multithreaded computation by the Work-Stealing Algorithm. We introduce a combinatorial "balls and bins" game, which we use to bound the total amount of delay incurred by random, asynchronous accesses in this model. We shall use the results of this section in Section 6, where we analyze the Work-Stealing Algorithm.

The atomic-access model is the machine model we use to analyze the Work-Stealing Algorithm. We assume that the machine is an asynchronous parallel computer with $P$
14 themodelofkarpandzhang[33].theyassumethatifconcurrentstealrequestsaremade theatomicmessage-passingmodelof[36].thisassumptionismorestringentthanthatin processors,anditsmemorycanbeeitherdistributedorshared.ouranalysisassumesthat toadeque,inonetimestep,onerequestissatisedandalltheothersaredenied.inthe concurrentaccessestothesamedatastructureareseriallyqueuedbyanadversary,asin Theonlyconstraintontheadversaryisthatifthereisatleastonerequestforadeque,then byanadversary,ratherthanbeingdenied.moreover,fromthecollectionofwaitingrequests foragivendeque,theadversarygetstochoosewhichisservicedandwhichcontinuetowait. atomic-accessmodel,wealsoassumethatonerequestissatised,buttheothersarequeued theadversarycannotchoosethatnonebeserviced. islikelytobeproportionaltothetotalnumbermofrequests,nomatterwhichprocessors processorstopdequeswitheachprocessorallowedatmostoneoutstandingrequest,then thetotalamountoftimethattheprocessorsspendwaitingfortheirrequeststobesatised ThemainresultofthissectionistoshowthatifrequestsaremaderandomlybyP maketherequestsandnomatterhowtherequestsaredistributedovertime.inorderto bytheadversary. provethisresult,weintroducea\ballsandbins"gamethatmodelstheeectsofqueueing executedbytheadversary.initially,allpballsareinareservoirseparatefromthepbins. whichisequaltothenumberofbins.theparametermisthetotalnumberofballtosses ballsaretossedatrandomintobins.theparameterpisthenumberofballsinthegame, The(P;M)-recyclinggameisacombinatorialgameplayedbytheadversary,inwhich Ateachstepofthegame,theadversaryexecutesthefollowingtwooperationsinsequence: 1.Theadversarychoosessomeoftheballsinthereservoir(possiblyallandpossibly none),andthenforeachoftheseballs,theadversaryremovesitfromthereservoir, 2.TheadversaryinspectseachofthePbinsinturn,andforeachbinthatcontainsat selectsoneofthepbinsuniformlyandindependentlyatrandom,andtossestheball leastoneball,theadversaryremovesanyoneoftheballsinthebinandreturnsitto intoit. 
tosseshavebeenmadeandallballshavebeenremovedfromthebinsandplacedbackinthe TheadversaryispermittedtomakeatotalofMballtosses.ThegameendswhenMball thereservoir. reservoir. isinthereservoir,itmeansthattheball'sownerisnotmakingastealrequest.ifaballis rithm.wecanvieweachballandeachbinasbeingownedbyadistinctprocessor.ifaball inabin,itmeansthattheball'sownerhasmadeastealrequesttothedequeofthebin's TherecyclinggamemodelstheservicingofstealrequestsbytheWork-StealingAlgo- andreturnedtothereservoir,itmeansthattherequesthasbeenserviced. owner,butthattherequesthasnotyetbeensatised.whenaballisremovedfromabin adversaryistomakethetotaldelayaslargeaspossible.thenextlemmashowsthatdespite delayd=ptt=1nt,wheretisthetotalnumberofstepsinthegame.thegoalofthe correspondtostealrequeststhathavenotbeensatised.weshallbeinterestedinthetotal Aftereachsteptofthegame,therearesomenumberntofballsleftinthebins,which 14

15 Lemma6Forany>0,withprobabilityatleast1,thetotaldelayinthe(P;M)-recycling tothereservoir,thetotaldelayisunlikelytobelarge. thechoicesthattheadversarymakesaboutwhichballstotossintobinsandwhichtoreturn modeliso(m+plgp+plg(1=))withprobabilityatleast1,andtheexpectedtotaldelay isatmostm. thetotaldelayincurredbymrandomrequestsmadebypprocessorsintheatomic-access gameiso(m+plgp+plg(1=)).1theexpectedtotaldelayisatmostm.inotherwords, ballfromeachbinisimmaterial,andthus,wecanassumethatballsarequeuedintheirbins Proof: andwhentheadversarytossesaball,itisplacedonthebackofthequeue.ifseveralballs inarst-in-rst-out(fifo)order.theadversaryremovesballsfromthefrontofthequeue, Werstmaketheobservationthatthestrategybywhichtheadversarychoosesa aretossedintothesamebinatthesamestep,theycanbeplacedonthebackofthequeue ballistossed. inanyorder.thereasonthatassumingafifodisciplineforqueuingballsinabindoesnot aectthetotaldelayisthatthenumberofballsinagivenbinatagivenstepisthesame nomatterwhichballisremoved,andwhereballsaretossedhasnothingtodowithwhich totalnumberofstepsthatnishwithballrinabin.then,wehave orinthereservoir.denethedelayofballrtobetherandomvariablerdenotingthe Foranygivenballandanygivenstep,thestepeithernisheswiththetheballinabin ithtimeitistosseduntilitisreturnedtothereservoir.denealsotheithdelayofaball Denetheithcycleofaballtobethosestepsinwhichtheballremainsinabinfromthe D=PXr=1r: (1) tobethenumberofstepsinitsithcycle. have=pmi=1di. 
ofball1.ifweletmdenotethenumberoftimesthatball1istossedbytheadversary,and fori=1;2;:::;m,letdibetherandomvariabledenotingtheithdelayofball1,thenwe Weshallanalyzethetotaldelaybyfocusing,withoutlossofgenerality,onthedelay=1 byanotherballreitheronceornotatall.consequently,wecandecomposeeachrandom theadversaryfollowsthefiforule,itfollowsthattheithcycleofball1canbedelayed placesitinsomebinkandballrisremovedfrombinkduringtheithcycleofball1.since Wesaythattheithcycleofball1isdelayedbyanotherballriftheithtossofball1 variablediintoasumdi=xi2+xi3++ximofindicatorrandomvariables,where Thus,wehave xir=(1iftheithcycleofball1isdelayedbyballr; 0otherwise. 1=isatmostpolynomialinMandP[40]. 1GregPlaxtonoftheUniversityofTexas,AustinhasimprovedthisboundtoO(M)forthecasewhen =mxi=1pxr=2xir: (2) 15
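The game and the two ways of counting delay can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `play_recycling_game`, the `toss_prob` parameter, and the coin-flipping adversary are hypothetical choices and not part of the analysis, which must hold for every adversary. The bins are FIFO queues, as the proof allows. The simulation records $n_t$ after each step and $\delta_r$ for each ball, so that Equation (1) can be checked numerically.

```python
import random


def play_recycling_game(P, M, toss_prob=0.5, seed=0):
    """Play one (P, M)-recycling game against a simple randomized adversary.

    Assumption: the adversary tosses each reservoir ball with probability
    toss_prob (0 < toss_prob <= 1) while tosses remain. The analysis in the
    text covers arbitrary adversaries; this is just one of them.
    Returns (n, delta): n[t] is the number of balls left in bins after step
    t+1, and delta[r] is the number of steps that finish with ball r in a bin.
    """
    if not 0 < toss_prob <= 1:
        raise ValueError("toss_prob must be in (0, 1]")
    rng = random.Random(seed)
    bins = [[] for _ in range(P)]   # each bin is a FIFO queue of ball ids
    reservoir = set(range(P))       # initially all P balls are in the reservoir
    tosses = 0
    n = []
    delta = [0] * P

    # The game ends once M tosses are made and every ball is back in the reservoir.
    while tosses < M or len(reservoir) < P:
        # Operation 1: toss some reservoir balls into uniformly random bins.
        for ball in sorted(reservoir):
            if tosses < M and rng.random() < toss_prob:
                reservoir.remove(ball)
                bins[rng.randrange(P)].append(ball)
                tosses += 1
        # Operation 2: remove one ball (the front one) from every nonempty bin.
        for queue in bins:
            if queue:
                reservoir.add(queue.pop(0))
        # Balls still in bins at the end of the step are the delayed requests.
        waiting = [ball for queue in bins for ball in queue]
        n.append(len(waiting))
        for ball in waiting:
            delta[ball] += 1
    return n, delta


n, delta = play_recycling_game(P=8, M=200)
D = sum(n)                 # total delay D = sum over steps of n_t
assert D == sum(delta)     # Equation (1): D is also the sum of per-ball delays
```

Running the game with different seeds and comparing `D` against `M` gives an informal feel for the bound in Lemma 6, although a single randomized adversary says nothing about the worst case.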

We now prove an important property of these indicator random variables. Consider any set $S$ of pairs $(i, r)$, each of which corresponds to the event that the $i$th cycle of ball 1 is delayed by ball $r$. For any such set $S$, we claim that

$$\Pr\Big\{ \bigwedge_{(i,r) \in S} (x_{ir} = 1) \Big\} \le P^{-|S|} . \qquad (3)$$

The crux of proving the claim is to show that

$$\Pr\Big\{ x_{ir} = 1 \;\Big|\; \bigwedge_{(i',r') \in S'} (x_{i'r'} = 1) \Big\} \le 1/P , \qquad (4)$$

where $S' = S - \{(i, r)\}$, whence the claim (3) follows from Bayes's Theorem.

We can derive Inequality (4) from a careful analysis of dependencies. Because the adversary follows the FIFO rule, we have that $x_{ir} = 1$ only if, when the adversary executes the $i$th toss of ball 1, it falls into whatever bin contains ball $r$, if any. A priori, this event happens with probability either $1/P$ or $0$, and hence, with probability at most $1/P$. Conditioning on any collection of events relating which balls delay this or other cycles of ball 1 cannot increase this probability, as we now argue in two cases. In the first case, the indicator random variables $x_{i'r'}$, where $i' \ne i$, tell whether other cycles of ball 1 are delayed. This information tells nothing about where the $i$th toss of ball 1 goes. Therefore, these random variables are independent of $x_{ir}$, and thus, the $1/P$ upper bound on the probability is not affected. In the second case, the indicator random variables $x_{ir'}$ tell whether the $i$th toss of ball 1 goes to the bin containing ball $r'$, but this information tells us nothing about whether it goes to the bin containing ball $r$, because the indicator random variables tell us nothing to relate where ball $r$ and ball $r'$ are located. Moreover, no "collusion" among the indicator random variables provides any more information, and thus Inequality (4) holds.

Equation (2) shows that the delay $\delta$ encountered by ball 1 throughout all of its cycles can be expressed as a sum of $m(P-1)$ indicator random variables. In order for $\delta$ to equal or exceed a given value $\Delta$, there must be some set containing $\Delta$ of these indicator random variables, each of which must be 1. For any specific such set, Inequality (3) says that the probability is at most $P^{-\Delta}$ that all random variables in the set are 1. Since there are $\binom{m(P-1)}{\Delta} \le (emP/\Delta)^{\Delta}$ such sets, where $e$ is the base of the natural logarithm, we have

$$\Pr\{\delta \ge \Delta\} \le \Big(\frac{emP}{\Delta}\Big)^{\Delta} P^{-\Delta} = \Big(\frac{em}{\Delta}\Big)^{\Delta} \le \epsilon/P ,$$

whenever $\Delta \ge \max\{2em, \lg P + \lg(1/\epsilon)\}$. The last inequality holds because $\Delta \ge 2em$ gives $em/\Delta \le 1/2$, so $(em/\Delta)^{\Delta} \le 2^{-\Delta} \le 2^{-(\lg P + \lg(1/\epsilon))} = \epsilon/P$.

Although our analysis was performed for ball 1, it applies to any other ball as well. Consequently, for any given ball $r$ which is tossed $m_r$ times, the probability that its delay $\delta_r$ exceeds $\max\{2em_r, \lg P + \lg(1/\epsilon)\}$ is at most $\epsilon/P$. By Boole's inequality and Equation (1), it follows that with probability at least $1 - \epsilon$, the total delay $D$ is at most

$$D \le \sum_{r=1}^{P} \max\{2em_r, \lg P + \lg(1/\epsilon)\} = \Theta(M + P \lg P + P \lg(1/\epsilon)) ,$$

since $M = \sum_{r=1}^{P} m_r$.

The upper bound $E[D] \le M$ can be obtained as follows. Recall that each $\delta_r$ is the sum of $(P-1)m_r$ indicator random variables, each of which has expectation at most $1/P$. Therefore, by linearity of expectation, $E[\delta_r] \le m_r$. Using Equation (1) and again using linearity of expectation, we obtain $E[D] \le M$.

With this bound on the total delay incurred by $M$ random requests now in hand, we turn back to the Work-Stealing Algorithm.

6 Analysis of the work-stealing algorithm

In this section, we analyze the time and communication cost of executing a fully strict multithreaded computation with the Work-Stealing Algorithm. For any fully strict computation with work $T_1$ and critical-path length $T_\infty$, we show that the expected running time with $P$ processors, including scheduling overhead, is $T_1/P + O(T_\infty)$. Moreover, for any $\epsilon > 0$, the execution time on $P$ processors is $T_1/P + O(T_\infty + \lg P + \lg(1/\epsilon))$, with probability at least $1 - \epsilon$. We also show that the expected total communication during the execution of a fully strict computation is $O(P T_\infty (1 + n_d) S_{\max})$, where $n_d$ is the maximum number of join edges from a thread to its parent and $S_{\max}$ is the largest size of any activation frame.

Unlike in the Busy-Leaves Algorithm, the "ready pool" in the Work-Stealing Algorithm is distributed, and so there is no contention at a centralized data structure. Nevertheless, it is still possible for contention to arise when several thieves happen to descend on the same victim simultaneously. In this case, as we have indicated in the previous section, we make the conservative assumption that an adversary serially queues the work-stealing requests. We further assume that it takes unit time for a processor to respond to a work-stealing request. This assumption can be relaxed without materially affecting the results so that a work-stealing response takes any constant amount of time.

To analyze the running time of the Work-Stealing Algorithm executing a fully strict multithreaded computation with work $T_1$ and critical-path length $T_\infty$ on a computer with $P$ processors, we use an accounting argument. At each step of the algorithm, we collect $P$ dollars, one from each processor. At each step, each processor places its dollar in one of three buckets according to its actions at that step. If the processor executes an instruction at the step, then it places its dollar into the Work bucket. If the processor initiates a steal attempt at the step, then it places its dollar into the Steal bucket. And, if the processor merely waits for a queued steal request at the step, then it places its dollar into the Wait bucket. We shall derive the running-time bound by bounding the number of dollars in each bucket at the end of the execution, summing these three bounds, and then dividing by $P$.

We first bound the total number of dollars in the Work bucket.
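The bookkeeping behind the accounting argument can be sketched in a few lines. This is a minimal illustration, assuming a recorded schedule in which every processor performs exactly one of the three actions at every step; the names `tally_buckets`, `running_time`, and the string labels are hypothetical and chosen only for this example.

```python
from collections import Counter

# The three actions a processor can take at a step (labels are assumptions).
ACTIONS = ("work", "steal", "wait")


def tally_buckets(schedule, P):
    """Place one dollar per processor per step into the Work, Steal, or Wait bucket.

    schedule is a list of steps; each step is a sequence of P action labels,
    one for each processor.
    """
    buckets = Counter({action: 0 for action in ACTIONS})
    for step_actions in schedule:
        if len(step_actions) != P:
            raise ValueError("each step needs exactly one action per processor")
        for action in step_actions:
            if action not in ACTIONS:
                raise ValueError(f"unknown action: {action!r}")
            buckets[action] += 1
    return buckets


def running_time(buckets, P):
    """Recover the number of steps: total dollars collected, divided by P."""
    total = sum(buckets.values())
    assert total % P == 0, "P dollars are collected at every step"
    return total // P


# A toy 3-step execution on P = 2 processors.
schedule = [
    ("work", "steal"),
    ("work", "wait"),
    ("work", "work"),
]
buckets = tally_buckets(schedule, P=2)
assert buckets["work"] == 4 and buckets["steal"] == 1 and buckets["wait"] == 1
assert running_time(buckets, P=2) == len(schedule)
```

Because exactly $P$ dollars are collected per step, any upper bound on the three bucket totals translates directly into an upper bound on the number of steps after dividing by $P$.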

Lemma 7 The execution of a fully strict multithreaded computation with work $T_1$ by the Work-Stealing Algorithm on a computer with $P$ processors terminates with exactly $T_1$ dollars in the Work bucket.

Proof: A processor places a dollar in the Work bucket only when it executes an instruction. Thus, since there are $T_1$ instructions in the computation, the execution ends with exactly $T_1$ dollars in the Work bucket.

Bounding the total dollars in the Steal bucket requires a significantly more involved "delay-sequence" argument. We first introduce the notion of a "round" of work-steal attempts, and we must also define an augmented dag that we then use to define "critical" instructions. The idea is as follows. If, during the course of the execution, a large number of steals are attempted, then we can identify a sequence of instructions (the delay sequence) in the augmented dag such that each of these steal attempts was initiated while some instruction from the sequence was critical. We then show that a critical instruction is unlikely to remain critical across a modest number of steal attempts. We can then conclude that such a delay sequence is unlikely to occur, and therefore, an execution is unlikely to suffer a large number of steal attempts.

A round of steal attempts is a set of at least $3P$ but fewer than $4P$ consecutive steal attempts such that if a steal attempt that is initiated at time step $t$ occurs in a particular round, then all other steal attempts initiated at time step $t$ are also in the same round. We can partition all of the steal attempts that occur during an execution into rounds as follows. The first round contains all steal attempts initiated at time steps $1, 2, \ldots, t_1$, where $t_1$ is the earliest time such that at least $3P$ steal attempts were initiated at or before $t_1$. We say that the first round starts at time step 1 and ends at time step $t_1$. In general, if the $i$th round ends at time step $t_i$, then the $(i+1)$st round begins at time step $t_i + 1$ and ends at the earliest time step $t_{i+1} > t_i + 1$ such that at least $3P$ steal attempts were initiated at time steps between $t_i + 1$ and $t_{i+1}$, inclusive. These steal attempts belong to round $i+1$. By definition, each round contains at least $3P$ consecutive steal attempts. Moreover, since at most $P-1$ steal attempts can be initiated in a single time step, each round contains fewer than $4P-1$ steal attempts, and each round takes at least 4 steps.

The sequence of instructions that make up the delay sequence is defined with respect to an augmented dag obtained by modifying the original dag slightly. Let $G$ denote the original dag, that is, the dag consisting of the computation's instructions as vertices and its continue, spawn, and join edges as edges. The augmented dag $G'$ is the original dag $G$ together with some new edges, as follows. For every set of instructions $u$, $v$, and $w$ such that $(u, v)$ is a spawn edge and $(u, w)$ is a continue edge, the deque edge $(w, v)$ is placed in $G'$. These deque edges are shown dashed in Figure 3. In Section 2 we made the technical assumption that instruction $w$ has no incoming join edges, and so $G'$ is a dag. If $T_\infty$ is the length of a longest path in $G$, then the longest path in $G'$ has length at most $2T_\infty$. It is worth pointing out that $G'$ is only an analytical tool. The deque edges have no effect on the scheduling and execution of the computation by the Work-Stealing Algorithm.

The deque edges are the key to defining critical instructions. At any time step during the execution, we say that an unexecuted instruction $v$ is critical if every instruction that precedes $v$ (either directly or indirectly) in $G'$ has been executed, that is, if for every instruction $w$ such that there is a directed path from $w$ to $v$ in $G'$, instruction $w$ has been executed. A critical instruction must be ready, since $G'$ contains every edge of $G$, but a ready instruction may or may not be critical. Intuitively, the structural properties of a ready deque enumerated in Lemma 4 guarantee that if a thread is deep in a ready deque, then its current instruction cannot be critical, because the predecessor of the thread's current instruction across the deque edge has not yet been executed.

We now formalize our definition of a delay sequence.

Definition 8 A delay sequence is a 3-tuple $(U, R, \Pi)$ satisfying the following conditions:

- $U = (u_1, u_2, \ldots, u_L)$ is a maximal directed path in $G'$. Specifically, for $i = 1, 2, \ldots, L-1$, the edge $(u_i, u_{i+1})$ belongs to $G'$, instruction $u_1$ has no incoming edges in $G'$ (instruction $u_1$ must be the first instruction of the root thread), and instruction $u_L$ has no outgoing edges in $G'$ (instruction $u_L$ must be the last instruction of the root thread).

- $R$ is a positive integer number of steal-attempt rounds.

- $\Pi = (\pi_1, \pi'_1, \pi_2, \pi'_2, \ldots, \pi_L, \pi'_L)$ is a partition of $R$ (that is, $R = \sum_{i=1}^{L} (\pi_i + \pi'_i)$), such that $\pi'_i \in \{0, 1\}$ for each $i = 1, 2, \ldots, L$.

The partition $\Pi$ induces a partition of a sequence of $R$ rounds as follows. The first piece of the partition corresponds to the first $\pi_1$ rounds. The second piece corresponds to the next $\pi'_1$ consecutive rounds after the first $\pi_1$ rounds. The third piece corresponds to the next $\pi_2$ consecutive rounds after the first $(\pi_1 + \pi'_1)$ rounds, and so on. We are interested primarily in the pieces corresponding to the $\pi_i$, not the $\pi'_i$, and so we define the $i$th group of rounds to be the $\pi_i$ consecutive rounds starting after the $r_i$th round, where $r_i = \sum_{j=1}^{i-1} (\pi_j + \pi'_j)$. Because $\Pi$ is a partition of $R$ and $\pi'_i \in \{0, 1\}$, for $i = 1, 2, \ldots, L$, we have

$$\sum_{i=1}^{L} \pi_i \ge R - L . \qquad (5)$$

We say that a given round of steal attempts occurs while instruction $v$ is critical if all of the steal attempts that comprise the round are initiated at time steps when $v$ is critical. In other words, $v$ must be critical throughout the entire round. A delay sequence $(U, R, \Pi)$ is said to occur during an execution if for each $i = 1, 2, \ldots, L$, all $\pi_i$ rounds in the $i$th group occur while instruction $u_i$ is critical. In other words, $u_i$ must be critical throughout all $\pi_i$ rounds.

The following lemma states that if at least $R$ rounds take place during an execution, then some delay sequence $(U, R, \Pi)$ must occur. In particular, if we look at any execution in which at least $R$ rounds occur, then we can identify a path $U = (u_1, u_2, \ldots, u_L)$ in the dag $G'$ and a partition $\Pi = (\pi_1, \pi'_1, \pi_2, \pi'_2, \ldots, \pi_L, \pi'_L)$ of the first $R$ rounds, such that for each $i = 1, 2, \ldots, L$, all of the $\pi_i$ rounds in the $i$th group occur while $u_i$ is critical. Each $\pi'_i$ indicates whether $u_i$ is critical at the beginning of a round but gets executed before the round ends. Such a round cannot be part of any group, because no instruction is critical throughout.

Lemma 9 Consider the execution of a fully strict multithreaded computation with critical-path length $T_\infty$ by the Work-Stealing Algorithm on a computer with $P$ processors. If at least $4PR$ steal attempts occur during the execution, then some delay sequence $(U, R, \Pi)$ must occur.
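Before turning to the proof, the two pieces of bookkeeping introduced in this section (the partition of steal attempts into rounds, and the group offsets $r_i$ of Definition 8) can be illustrated with a short sketch. The function names and the input encoding (a list giving the number of steal attempts initiated at each time step) are assumptions made for this example only. Trailing steal attempts that do not fill a complete round are simply left out.

```python
def partition_into_rounds(attempts_per_step, P):
    """Split steal attempts into rounds of at least 3P attempts each.

    attempts_per_step[t-1] is the number of steal attempts initiated at time
    step t; at most P-1 attempts can be initiated in a single step.
    Returns a list of (first_step, last_step) pairs, one per complete round.
    """
    rounds = []
    start, count = 1, 0
    for t, attempts in enumerate(attempts_per_step, start=1):
        if not 0 <= attempts <= P - 1:
            raise ValueError("at most P-1 steal attempts per time step")
        count += attempts
        if count >= 3 * P:
            # Earliest step at which the current round holds at least 3P attempts.
            rounds.append((start, t))
            start, count = t + 1, 0
    return rounds


def group_offsets(pi, pi_prime):
    """Return ([r_1, ..., r_L], R) where r_i = sum over j < i of (pi_j + pi'_j)."""
    if len(pi) != len(pi_prime) or any(b not in (0, 1) for b in pi_prime):
        raise ValueError("pi and pi_prime must have equal length; pi'_i in {0, 1}")
    offsets, r = [], 0
    for p, b in zip(pi, pi_prime):
        offsets.append(r)   # the i-th group starts after the r_i-th round
        r += p + b
    return offsets, r


# With P = 4, every round needs at least 12 attempts and each step has at most 3.
rounds = partition_into_rounds([3] * 10, P=4)
assert rounds == [(1, 4), (5, 8)]

# A partition with L = 3 and R = 5; Inequality (5) requires sum(pi) >= R - L.
pi, pi_prime = [2, 0, 1], [1, 0, 1]
offsets, R = group_offsets(pi, pi_prime)
assert offsets == [0, 3, 3] and R == 5
assert sum(pi) >= R - len(pi)
```

In the toy input each round spans exactly 4 steps, matching the observation that a round takes at least 4 steps when no more than $P-1$ attempts can start in one step.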

Proof: For a given execution in which at least $4PR$ steal attempts take place, we construct a delay sequence $(U, R, \Pi)$ and show that it occurs. With at least $4PR$ steal attempts, there must be at least $R$ rounds. We construct the delay sequence by first identifying a set of instructions on a directed path in $G'$ such that for every time step during the execution, one of these instructions is critical. Then, we partition the first $R$ rounds according to when each round occurs relative to when each instruction on the path is critical.

To construct the path $U$, we work backwards from the last instruction of the root thread, which we denote by $v_1$. Let $v_{l_1}$ denote a (not necessarily immediate) predecessor instruction of $v_1$ in $G'$ with the latest execution time. Let $(v_{l_1}, \ldots, v_2, v_1)$ denote a directed path from $v_{l_1}$ to $v_1$ in $G'$. We extend this path back to the first instruction of the root thread by iterating this construction as follows. At the $i$th iteration we have an instruction $v_{l_i}$ and a directed path in $G'$ from $v_{l_i}$ to $v_1$. We let $v_{l_{i+1}}$ denote a predecessor of $v_{l_i}$ in $G'$ with the latest execution time, and let $(v_{l_{i+1}}, \ldots, v_{l_i + 1}, v_{l_i})$ denote a directed path from $v_{l_{i+1}}$ to $v_{l_i}$ in $G'$. We finish iterating the construction when we get to an iteration $k$ in which $v_{l_k}$ is the first instruction of the root thread. Our desired sequence is then $U = (u_1, u_2, \ldots, u_L)$, where $L = l_k$ and $u_i = v_{L-i+1}$ for $i = 1, 2, \ldots, L$. One can verify that at every time step of the execution, one of the $v_{l_i}$ is critical.

Now, to construct the partition $\Pi = (\pi_1, \pi'_1, \pi_2, \pi'_2, \ldots, \pi_L, \pi'_L)$, we partition the sequence of the first $R$ rounds according to when each round occurs. We would like our partition to be such that for each round (among the first $R$ rounds), we have the property that if the round occurs while some instruction $u_i$ is critical, then the round belongs to the $i$th group. Start with $\pi_1$, and let $\pi_1$ equal the number of rounds that occur while $u_1$ is critical. All of these rounds are consecutive at the beginning of the sequence, so these rounds comprise the 1st group; that is, they are the $\pi_1$ consecutive rounds starting after the $r_1 = 0$ first rounds. Next, if the round that immediately follows those first $\pi_1$ rounds begins after $u_1$ has been executed, then we set $\pi'_1 = 0$, and we go on to $\pi_2$. Otherwise, that round begins while $u_1$ is critical and ends after $u_1$ is executed (for otherwise, it would be part of the first group), so we set $\pi'_1 = 1$, and we go on to $\pi_2$. For $\pi_2$, we let $\pi_2$ equal the number of rounds that occur while $u_2$ is critical. Note that all of these rounds are consecutive beginning after the first $r_2 = \pi_1 + \pi'_1$ rounds, so these rounds comprise the 2nd group. We continue in this fashion, letting each $\pi_i$ be the number of rounds that occur while $u_i$ is critical and letting each $\pi'_i$ be the number of rounds that begin while $u_i$ is critical but do not end until after $u_i$ is executed. As an example, we may have a round that begins while $u_i$ is critical and then ends while $u_{i+2}$ is critical, and in this case, we set $\pi'_i = 1$ and $\pi'_{i+1} = 0$. In this example, the $(i+1)$st group is empty, so we also set $\pi_{i+1} = 0$.

We conclude the proof by verifying that the $(U, R, \Pi)$ as just constructed is a delay sequence and that it occurs. By construction, $U$ is a maximal path in $G'$. Now considering $\Pi$, we observe that each round among the first $R$ rounds is counted exactly once in either a $\pi_i$ or a $\pi'_i$, so $\Pi$ is indeed a partition of $R$. Moreover, for $i = 1, 2, \ldots, L$, at most one round can begin while the instruction $u_i$ is critical and end after $u_i$ is executed, so we have $\pi'_i \in \{0, 1\}$. Thus, $(U, R, \Pi)$ is a delay sequence. Finally, we observe that, by construction, for $i = 1, 2, \ldots, L$, the $\pi_i$ rounds that comprise the $i$th group all occur while the instruction $u_i$ is critical. Therefore, the delay sequence $(U, R, \Pi)$ occurs.

We now establish that a critical instruction is unlikely to remain critical across a modest number of rounds. Specifically, we first show that a critical instruction must be the ready


More information

Reviewing All Applications & Critiques for a Review Meeting

Reviewing All Applications & Critiques for a Review Meeting proposalcentral Reviewing All Applications & Critiques for a Review Meeting If you need assistance, contact Customer Service by email at pcsupport@altum.com or by phone at 1-800-875-2562 or phone 703-964-5840

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;

More information

Thread level parallelism

Thread level parallelism Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process

More information

Setting up your own Internet Radio Station

Setting up your own Internet Radio Station Setting up your own Internet Radio Station Author: Streamit B.V. Version: 1.0.0 Date: March 2007 Copyright Agreement Notitions about trademarks LUKAS IS A REGISTERED TRADEMARK OF STREAMIT, SEE WWW.STREAMIT.EU.

More information

DIPLOMADO DE JAVA - OCA

DIPLOMADO DE JAVA - OCA DIPLOMADO DE JAVA - OCA TABLA DE CONTENIDO INTRODUCCION... 3 ESTRUCTURA DEL DIPLOMADO... 4 Nivel I:... 4 Fundamentals of the Java Programming Language Java SE 7... 4 Introducing the Java Technology...

More information

Notes on Network Security - Introduction

Notes on Network Security - Introduction Notes on Network Security - Introduction Security comes in all shapes and sizes, ranging from problems with software on a computer, to the integrity of messages and emails being sent on the Internet. Network

More information

GREATEST COMMON DIVISOR

GREATEST COMMON DIVISOR DEFINITION: GREATEST COMMON DIVISOR The greatest common divisor (gcd) of a and b, denoted by (a, b), is the largest common divisor of integers a and b. THEOREM: If a and b are nonzero integers, then their

More information

REAL TIME OPERATING SYSTEMS. Lesson-18:

REAL TIME OPERATING SYSTEMS. Lesson-18: REAL TIME OPERATING SYSTEMS Lesson-18: Round Robin Time Slicing of tasks of equal priorities 1 1. Common scheduling models 2 Common scheduling models Cooperative Scheduling of ready tasks in a circular

More information

FAQ: BroadLink Multi-homing Load Balancers

FAQ: BroadLink Multi-homing Load Balancers FAQ: BroadLink Multi-homing Load Balancers BroadLink Overview Outbound Traffic Inbound Traffic Bandwidth Management Persistent Routing High Availability BroadLink Overview 1. What is BroadLink? BroadLink

More information

Common Approaches to Real-Time Scheduling

Common Approaches to Real-Time Scheduling Common Approaches to Real-Time Scheduling Clock-driven time-driven schedulers Priority-driven schedulers Examples of priority driven schedulers Effective timing constraints The Earliest-Deadline-First

More information

Managing Stop-Go Capital Flows in EM Asia: So Far, So Good

Managing Stop-Go Capital Flows in EM Asia: So Far, So Good Managing Stop-Go Capital Flows in EM Asia: So Far, So Good Andrew Filardo Bank for International Settlements Prepared for the Central Banks of Finland and Austria Joint Conference on European Economic

More information

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff

Process Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff Process Scheduling CS 241 February 24, 2012 Copyright University of Illinois CS 241 Staff 1 Announcements Mid-semester feedback survey (linked off web page) MP4 due Friday (not Tuesday) Midterm Next Tuesday,

More information

Communication Cost in Big Data Processing

Communication Cost in Big Data Processing Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1 Queries on Big Data Big Data processing on distributed

More information

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING

IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING ABSTRACT: IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING Hakan Wiman Department of Photogrammetry, Royal Institute of Technology S - 100 44 Stockholm, Sweden (e-mail hakanw@fmi.kth.se) ISPRS Commission

More information

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and

Breaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study

More information

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline

More information

Contributions to Gang Scheduling

Contributions to Gang Scheduling CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,

More information

Using a Digital Recorder with Dragon NaturallySpeaking

Using a Digital Recorder with Dragon NaturallySpeaking Using a Digital Recorder with Dragon NaturallySpeaking For those desiring to record dictation on the go and later have it transcribed by Dragon, the use of a portable digital dictating device is a perfect

More information

Applying Fixed Route Principles To Improve Paratransit Runcutting. Keith Forstall

Applying Fixed Route Principles To Improve Paratransit Runcutting. Keith Forstall Applying Fixed Route Principles To Improve Paratransit Runcutting Keith Forstall Why is runcutting important? Scheduling algorithms are designed to schedule trips efficiently They depend on vehicle capacity

More information

Shanghai R&D Vacancies August 2014 PV, PE, Intern

Shanghai R&D Vacancies August 2014 PV, PE, Intern RD Shanghai R&D Vacancies August 2014 PV, PE, Intern 1. Lead Software Engineer- Routing (Req#: 9528) Responsible for development and maintenance of signal routing in EDI platform (NanoRoute). Implementation

More information

Scheduling Parallel Jobs with Monotone Speedup 1

Scheduling Parallel Jobs with Monotone Speedup 1 Scheduling Parallel Jobs with Monotone Speedup 1 Alexander Grigoriev, Marc Uetz Maastricht University, Quantitative Economics, P.O.Box 616, 6200 MD Maastricht, The Netherlands, {a.grigoriev@ke.unimaas.nl,

More information

Kerrighed: use cases. Cyril Brulebois. Kerrighed. Kerlabs

Kerrighed: use cases. Cyril Brulebois. Kerrighed. Kerlabs Kerrighed: use cases Cyril Brulebois cyril.brulebois@kerlabs.com Kerrighed http://www.kerrighed.org/ Kerlabs http://www.kerlabs.com/ 1 / 23 Introducing Kerrighed What s Kerrighed? Single-System Image (SSI)

More information

Portfolio Replication Variable Annuity Case Study. Curt Burmeister Senior Director Algorithmics

Portfolio Replication Variable Annuity Case Study. Curt Burmeister Senior Director Algorithmics Portfolio Replication Variable Annuity Case Study Curt Burmeister Senior Director Algorithmics What is Portfolio Replication? To find a portfolio of assets whose value is equal to the value of a liability

More information

Guided Performance Analysis with the NVIDIA Visual Profiler

Guided Performance Analysis with the NVIDIA Visual Profiler Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided

More information

Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H. Yarmand and Douglas G.

Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H. Yarmand and Douglas G. Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H Yarmand and Douglas G Down Appendix A Lemma 1 - the remaining cases In this appendix,

More information

CPU Scheduling Outline

CPU Scheduling Outline CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different

More information

P900 SERIES PORTABLE HYDROSTATIC TESTER

P900 SERIES PORTABLE HYDROSTATIC TESTER P900 SERIES PORTABLE HYDROSTATIC TESTER MODELMODEL P900-SM ( STANDARD MODEL) SHOWN IN PHOTO coated square tubular frame on wheels for easy transport. PANEL MOUNTED CONTROLS: 4 dial gauges, pump regulator

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Lecture 13: The Knapsack Problem

Lecture 13: The Knapsack Problem Lecture 13: The Knapsack Problem Outline of this Lecture Introduction of the 0-1 Knapsack Problem. A dynamic programming solution to this problem. 1 0-1 Knapsack Problem Informal Description: We have computed

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

Welcome to the course Algorithm Design

Welcome to the course Algorithm Design HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Welcome to the course Algorithm Design Summer Term 2011 Friedhelm Meyer auf der Heide Lecture 6, 20.5.2011 Friedhelm Meyer auf

More information

120 Inch Telescope Dome Crane Load Testing

120 Inch Telescope Dome Crane Load Testing State Required Crane Inspection Annual Crane Inspection An annual inspection of our crane is required by law as indicated below. Division 1. Department of Industrial Relations Chapter 4. Division of Industrial

More information

70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5

70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5 70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5 Course Introduction Course Introduction Chapter 01 - Windows Forms and Controls Windows Forms Demo

More information

Decentralized Utility-based Sensor Network Design

Decentralized Utility-based Sensor Network Design Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu

More information

OpenFlow Based Load Balancing

OpenFlow Based Load Balancing OpenFlow Based Load Balancing Hardeep Uppal and Dane Brandon University of Washington CSE561: Networking Project Report Abstract: In today s high-traffic internet, it is often desirable to have multiple

More information

Chapter 8: Bags and Sets

Chapter 8: Bags and Sets Chapter 8: Bags and Sets In the stack and the queue abstractions, the order that elements are placed into the container is important, because the order elements are removed is related to the order in which

More information

ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S

ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email info.social@ecpd.co.za ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S Page 1 RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email

More information

Main TVM functions of a BAII Plus Financial Calculator

Main TVM functions of a BAII Plus Financial Calculator Main TVM functions of a BAII Plus Financial Calculator The BAII Plus calculator can be used to perform calculations for problems involving compound interest and different types of annuities. (Note: there

More information

Certificate IV in Human Resources. Name Other. Tutorial Support if required - Bundaberg Information Version Control

Certificate IV in Human Resources. Name Other. Tutorial Support if required - Bundaberg Information Version Control Location Start Date: Start Date: Start Date: Start Date: Thursday, 14 July 2016 Start Date: End Date: End Date: End Date: End Date: Thursday, 15 September 2016 End Date: Start Time: Start Time: Start Time:

More information

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems

CPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems Based on original slides by Silberschatz, Galvin and Gagne 1 Basic Concepts CPU I/O Burst Cycle Process execution

More information

LOAD BALANCING AND ADMISSION CONTROL OF A PARLAY X APPLICATION SERVER

LOAD BALANCING AND ADMISSION CONTROL OF A PARLAY X APPLICATION SERVER This is an author produced version of a paper presented at the 17th Nordic Teletraffic Seminar (NTS 17), Fornebu, Norway, 25-27 August, 2004. This paper may not include the final publisher proof-corrections

More information

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput Import Settings: Base Settings: Brownstone Default Highest Answer Letter: D Multiple Keywords in Same Paragraph: No Chapter: Chapter 5 Multiple Choice 1. Which of the following is true of cooperative scheduling?

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

341 - Bioinformatics Android Coursework

341 - Bioinformatics Android Coursework 341 - Bioinformatics Android Coursework 1 Important This coursework must be submitted electronically via CATE. This coursework is intended for groups of 4. Each group must contain at least one Computing

More information

Finding the Measure of Segments Examples

Finding the Measure of Segments Examples Finding the Measure of Segments Examples 1. In geometry, the distance between two points is used to define the measure of a segment. Segments can be defined by using the idea of betweenness. In the figure

More information

Business Life Path - Red Hat, CFS roadmap

Business Life Path - Red Hat, CFS roadmap The Kernel Report Vision 2007 edition Jonathan Corbet LWN.net corbet@lwn.net The Plan 1) A very brief history overview 2) The development process 3) Guesses about the future History 1 An extremely rushed

More information

Minimizing the Number of Machines in a Unit-Time Scheduling Problem

Minimizing the Number of Machines in a Unit-Time Scheduling Problem Minimizing the Number of Machines in a Unit-Time Scheduling Problem Svetlana A. Kravchenko 1 United Institute of Informatics Problems, Surganova St. 6, 220012 Minsk, Belarus kravch@newman.bas-net.by Frank

More information

Real-Time Scheduling 1 / 39

Real-Time Scheduling 1 / 39 Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A

More information

English Schools' Athletic Association Track & Field Championships 2016 Timetable

English Schools' Athletic Association Track & Field Championships 2016 Timetable Some hurdles events will be held on the back straight, denoted (BS). Friday Track Event T1 10:00 hrs. 1500 metres JUNIOR BOYS 1st Round 2 Heats, Event T66 12:51 hrs. Saturday T2 10:12 hrs. 1500 metres

More information

The Management of Logistics in Large Scale Inventory Systems to Support Weapon System Maintenance

The Management of Logistics in Large Scale Inventory Systems to Support Weapon System Maintenance The Management of Logistics in Large Scale Inventory Systems to Support Weapon System Maintenance Eugene A. Beardslee, SAIC and Dr. Hank Grant Department of Industrial Engineering, University of Oklahoma

More information

- 221 - - 222 - - 223 - - 224 - - 225 - - 226 - - 227 - - 228 - - 229 - - 230 - - 231 - - 232 - - 233 - - 234 - - 235 - - 236 - - 237 - - 238 - - 239 - - 240 - - 241 - - 242 - - 243 - - 244 - - 245 - -

More information

CIEL A universal execution engine for distributed data-flow computing

CIEL A universal execution engine for distributed data-flow computing Reviewing: CIEL A universal execution engine for distributed data-flow computing Presented by Niko Stahl for R202 Outline 1. Motivation 2. Goals 3. Design 4. Fault Tolerance 5. Performance 6. Related Work

More information

Internal Audit Report Credit Cards (C4/69, C4/70)

Internal Audit Report Credit Cards (C4/69, C4/70) INFORMATION REPORT Audit Committee 20 October 2011 Governance and Compliance Internal Audit Report Credit Cards (C4/69, C4/70) As part of the 2011 Internal Audit Plan an audit was undertaken on the Credit

More information

1 Formulating The Low Degree Testing Problem

1 Formulating The Low Degree Testing Problem 6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,

More information

An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

More information

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background

More information

Accelerating File Transfers Increase File Transfer Speeds in Poorly-Performing Networks

Accelerating File Transfers Increase File Transfer Speeds in Poorly-Performing Networks Accelerating File Transfers Increase File Transfer Speeds in Poorly-Performing Networks Contents Introduction... 2 Common File Delivery Methods... 2 Understanding FTP... 3 Latency and its effect on FTP...

More information

Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings

Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operatin g Systems: Internals and Design Principle s Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Bear in mind,

More information