v 16 v 17 v 21 v 22 v 23
|
|
- Evan Clark
- 9 years ago
- Views:
Transcription
1 SchedulingMultithreadedComputations byworkstealing TheUniversityofTexasatAustin CharlesE.Leiserson RobertD.Blumofe MITLaboratoryforComputerScience ticalmethodofschedulingthiskindofdynamicmimd-stylecomputationis\work structured)multithreadedcomputationsonparallelcomputers.apopularandprac- Thispaperstudiestheproblemofecientlyschedulingfullystrict(i.e.,well- Abstract multithreadedcomputationswithdependencies. stealing,"inwhichprocessorsneedingworkstealcomputationalthreadsfromother processors.inthispaper,wegivetherstprovablygoodwork-stealingschedulerfor theminimumexecutiontimewithaninnitenumberofprocessors.moreover,the computationonpprocessorsusingourwork-stealingschedulerist1=p+o(t1),where T1istheminimumserialexecutiontimeofthemultithreadedcomputationandT1is Specically,ouranalysisshowsthattheexpectedtimetoexecuteafullystrict atmosto(pt1(1+nd)smax),wheresmaxisthesizeofthelargestactivationrecordof requirement.wealsoshowthattheexpectedtotalcommunicationofthealgorithmis anythreadandndisthemaximumnumberoftimesthatanythreadsynchronizeswith spacerequiredbytheexecutionisatmosts1p,wheres1istheminimumserialspace threeoftheseboundsareexistentiallyoptimaltowithinaconstantfactor. schedulersaremorecommunicationecientthantheirwork-sharingcounterparts.all itsparent.thiscommunicationboundjustiesthefolkwisdomthatwork-stealing 1Forecientexecutionofadynamicallygrowing\multithreaded"computationonaMIMD- styleparallelcomputer,aschedulingalgorithmmustensurethatenoughthreadsareactive Introduction ofconcurrentlyactivethreadsremainswithinreasonablelimitssothatmemoryrequirements concurrentlytokeeptheprocessorsbusy.simultaneously,itshouldensurethatthenumber arenotundulylarge.moreover,theschedulershouldalsotrytomaintainrelatedthreads andwassupportedinpartbyanarpahigh-performancecomputinggraduatefellowship ThisresearchwasdonewhileRobertD.BlumofewasattheMITLaboratoryforComputerScience ThisresearchwassupportedinpartbytheAdvancedResearchProjectsAgencyunderContractN
2 computations:worksharingandworkstealing.inworksharing,wheneveraprocessor Needlesstosay,achievingallthesegoalssimultaneouslycanbedicult. onthesameprocessor,ifpossible,sothatcommunicationbetweenthemcanbeminimized. generatesnewthreads,theschedulerattemptstomigratesomeofthemtootherprocessors inhopesofdistributingtheworktounderutilizedprocessors.inworkstealing,however, Twoschedulingparadigmshavearisentoaddresstheproblemofschedulingmultithreaded underutilizedprocessorstaketheinitiative:theyattemptto\steal"threadsfromother processors.intuitively,themigrationofthreadsoccurslessfrequentlywithworkstealing byawork-stealingscheduler,butthreadsarealwaysmigratedbyawork-sharingscheduler. thanwithworksharing,sincewhenallprocessorshaveworktodo,nothreadsaremigrated communication.sincethen,manyresearchershaveimplementedvariantsonthisstrategy Theseauthorspointouttheheuristicbenetsofworkstealingwithregardstospaceand allelexecutionoffunctionalprograms[16]andhalstead'simplementationofmultilisp[30]. Thework-stealingideadatesbackatleastasfarasBurtonandSleep'sresearchonpar- search.recently,zhangandortynski[48]haveobtainedgoodboundsonthecommunication [11,21,23,29,34,37,46].Rudolph,Slivkin-Allalouf,andUpfal[43]analyzedarandomizedwork-stealingstrategyforloadbalancingindependentjobsonaparallelcomputer,and KarpandZhang[33]analyzedarandomizedwork-stealingstrategyforparallelbacktrack requirementsofthisalgorithm. aswellasdataowcomputations[2]inwhichthreadsmaystallduetoadatadependency. strict"(well-structured)multithreadedcomputations.thisclassofcomputationsencompassesbothbacktracksearchcomputations[33,48]anddivide-and-conquercomputations[47], Inthispaper,wepresentandanalyzeawork-stealingalgorithmforscheduling\fully Weanalyzeouralgorithmsinastringentatomic-accessmodelsimilartotheatomicmessagepassingmodelof[36]inwhichconcurrentaccessestothesamedatastructureareserially queuedbyanadversary. multithreadedcomputationswhichisprovablyecientintermsoftime,space,andcommunication.weprovethattheexpectedtimetoexecuteafullystrictcomputationonp processorsusingourwork-stealingschedulerist1=p+o(t1),wheret1istheminimum Ourmaincontributionisarandomizedwork-stealingschedulingalgorithmforfullystrict timewithaninnitenumberofprocessors.inaddition,thespacerequiredbytheexecution isatmosts1p,wheres1istheminimumserialspacerequirement.theseboundsarebetterthanpreviousboundsforwork-sharingschedulers[10],andthework-stealingscheduler serialexecutiontimeofthemultithreadedcomputationandt1istheminimumexecution ismuchsimplerandeminentlypractical.partofthisimprovementisduetoourfocusingonfullystrictcomputations,ascomparedtothe(general)strictcomputationsstudied O(PT1(1+nd)Smax),whereSmaxisthesizeofthelargestactivationrecordofanythread andndisthemaximumnumberoftimesthatanythreadsynchronizeswithitsparent.this boundisexistentiallytighttowithinaconstantfactor,meetingthelowerboundofwu andkung[47]forcommunicationinparalleldivide-and-conquer.incontrast,work-sharing in[10].wealsoprovethattheexpectedtotalcommunicationoftheexecutionisatmost requirementsofparallelcomputations.cullerandarvind[19]andruggieroandsargeant schedulershavenearlyworst-casebehaviorforcommunication.thus,ourresultsbolsterthe folkwisdomthatworkstealingissuperiortoworksharing. Othershavestudiedandcontinuetostudytheproblemofecientlymanagingthespace 2
3 [44]giveheuristicsforlimitingthespacerequiredbydataowprograms.Burton[14]shows andanalyzedschedulingalgorithmswithprovablygoodtimeandspacebounds.itisnot spacebounds.blelloch,gibbons,matias,andnarlikar[3,4]havealsorecentlydeveloped Burton[15]hasdevelopedandanalyzedaschedulingalgorithmwithprovablygoodtimeand howtolimitspaceincertainparallelcomputationswithoutcausingdeadlock.morerecently, theoreticmodelofmultithreadedcomputationsintroducedin[10],whichprovidesatheo- reticalbasisforanalyzingschedulers.section3givesasimpleschedulingalgorithmwhich Theremainderofthispaperisorganizedasfollows.InSection2wereviewthegraph- yetclearwhetheranyofthesealgorithmsareaspracticalasworkstealing. usesacentralqueue.this\busy-leaves"algorithmformsthebasisforourrandomizedworkstealingalgorithm,whichwepresentinsection4.insection5weintroducetheatomic-access modelthatweusetoanalyzeexecutiontimeandcommunicationcostsforthework-stealing andcommunicationcostofthework-stealingalgorithm.toconclude,insection7webriey boundalongwithadelay-sequenceargument[41]insection6toanalyzetheexecutiontime algorithm,andwepresentandanalyzeacombinatorial\ballsandbins"gamethatweuse discusshowthetheoreticalideasinthispaperhavebeenappliedtothecilkprogramming toderiveaboundonthecontentionthatarisesinrandomworkstealing.wethenusethis languageandruntimesystem[8,25],aswellasmakesomeconcludingremarks. 2Thissectionreprisesthegraph-theoreticmodelofmultithreadedcomputationintroduced in[10].wealsodenewhatitmeansforcomputationstobe\fullystrict."weconclude Amodelofmultithreadedcomputation withastatementofthegreedy-schedulingtheorem,whichisanadaptationoftheoremsby Brent[13]andGraham[27,28]ondagscheduling. quentialorderingofunit-timeinstructions.theinstructionsareconnectedbydependency edges,whichprovideapartialorderingonwhichinstructionsmustexecutebeforewhich otherinstructions.infigure1,forexample,eachshadedblockisathreadwithcircles Amultithreadedcomputationiscomposedofasetofthreads,eachofwhichisase- representinginstructionsandthehorizontaledges,calledcontinueedges,representingthe sequentialordering.thread 5ofthisexamplecontains3instructions:v10,v11,andv12. usetostorethevaluesonwhichtheycompute. itachunkofmemory,calledanactivationframe,thattheinstructionsofthethreadcan Theinstructionsofathreadmustexecuteinthissequentialorderfromtherst(leftmost) instructiontothelast(rightmost)instruction.inordertoexecuteathread,weallocatefor processorsofap-processorparallelcomputerexecutewhichinstructionsateachstep.an executionscheduledependsontheparticularmultithreadedcomputationandthenumberp ofprocessors.inanygivenstepofanexecutionschedule,eachprocessorexecutesatmost AP-processorexecutionscheduleforamultithreadedcomputationdetermineswhich rentlywiththespawnedthread.weconsiderspawnedthreadstobechildrenofthethread ingathreadislikeasubroutinecall,exceptthatthespawningthreadcanoperateconcur- oneinstruction. thatdidthespawning,andathreadmayspawnasmanychildrenasitdesires.inthisway, Duringthecourseofitsexecution,athreadmaycreate,orspawn,otherthreads.Spawn- 3
4 Γ 1 v 1 v 2 v 16 v 17 v 21 v 22 v 23 Γ 2 Γ 6 Figure1:Amultithreadedcomputation.Thiscomputationcontains23instructionsv1;v2;:::;v23 v 3 v 6 v 9 v 13 v 14 v v 18 v 19 v and6threads 1; 2;:::; Γ 3 Γ 4 Γ 5 v 4 v 5 v 7 v 8 v 10 v 11 v dren.thespawntreeistheparallelanalogofacalltree.inourexamplecomputation,the spawntree'srootthread 1hastwochildren, 2and 6,andthread 2hasthreechildren, threadsareorganizedintoaspawntreeasindicatedinfigure1bythedownward-pointing, shadeddependencyedges,calledspawnedges,thatconnectthreadstotheirspawnedchil- 12 executionschedulemustobeythisedgeinthatnoprocessormayexecuteaninstructionin thespawnoperation intheparentthreadtotherstinstructionofthechildthread.an 3, 4,and 5.Threads 3, 4, 5,and 6,whichhavenochildren,areleafthreads. aspawnedchildthreaduntilafterthespawninginstructionintheparentthreadhasbeen Eachspawnedgegoesfromaspecicinstruction theinstructionthatactuallydoes v7cannotbeexecuteduntilafterthespawninginstructionv6.consistentwithourunit-time instructionexecutes,itallocatesanactivationframeforthenewchildthread.onceathread modelofinstructions,asingleinstructionmayspawnatmostonechild.whenthespawning executed.inourexamplecomputation(figure1),duetothespawnedge(v6;v7),instruction Whenthelastinstructionofathreadexecutes,itdeallocatesitsframeandthethreaddies. bycontinueandspawnedges.consideraninstructionthatproducesadatavaluetobe hasbeenspawnedanditsframehasbeenallocated,wesaythethreadisaliveorliving. consumedbyanotherinstruction.suchaproducer/consumerrelationshipprecludesthe consuminginstructionfromexecutinguntilaftertheproducinginstruction.toenforce Anexecutionschedulegenerallyrespectsotherdependenciesbesidesthoserepresented suchorderings,otherdependencyedges,calledjoinedges,mayberequired,asshownin beforetheproducinginstructionhasexecuted,executionoftheconsumingthreadcannot continue thethreadstalls.oncetheproducinginstructionexecutes,thejoindependencyis Figure1bythecurvededges.Iftheexecutionofathreadarrivesataconsuminginstruction resolutionanddetectioncanbeaccomplishedusingmechanismssuchasjoincounters[8], ready.amultithreadedcomputationdoesnotmodelthemeansbywhichjoindependencies getresolvedorbywhichunresolvedjoindependenciesgetdetected.inimplementation, resolved,whichenablestheconsumingthreadtoresumeitsexecution thethreadbecomes futures[30],ori-structures[2]. instructionhasatmostaconstantnumberofjoinedgesincidentonit.thisassumption Wemaketwotechnicalassumptionsregardingjoinedges.Werstassumethateach 4
5 isconsistentwithourunit-timemodelofinstructions.thesecondassumptionisthatno continuestobereadytoexecuteforatleastonemoreinstruction. joinedgesentertheinstructionimmediatelyfollowingaspawn.thisassumptionmeans thatwhenaparentthreadspawnsachildthread,theparentcannotimmediatelystall.it inthisgraphhavebeenexecuted.sothatexecutionschedulesexist,thisgraphmustbe andnoprocessormayexecuteaninstructionuntilafteralloftheinstruction'spredecessors edgesofthecomputation.thesedependencyedgesformadirectedgraphofinstructions, Anexecutionschedulemustobeytheconstraintsgivenbythespawn,continue,andjoin executed. executionschedule,aninstructionisreadyifallofitspredecessorsinthedaghavebeen acyclic.thatis,itmustbeadirectedacyclicgraph,ordag.atanygivenstepofan frameshavebeendeallocated.althoughthisassumptionisnotabsolutelynecessary,itgives childrendie,andthus,athreaddoesnotdeallocateitsactivationframeuntilallitschildren's theexecutionanaturalstructure,anditwillsimplifyouranalysesofspaceutilization.in Wemakethesimplifyingassumptionthataparentthreadremainsaliveuntilallits (orifsuchstorageisavailable,thenwedonotaccountforit).therefore,thespaceused thecomputation;thereisnoglobalstorageavailabletothecomputationoutsidetheframes accountingforspaceutilization,wealsoassumethattheframesholdallthevaluesusedby threadsatthattime,andthetotalspaceusedinexecutingacomputationisthemaximum atagiventimeinexecutingacomputationisthetotalsizeofallframesusedbyallliving suchvalueoverthecourseoftheexecution. activationframeisallocatedandthisframeremainsallocatedaslongasthethreadremains nectedbydependencyedges.theinstructionsareconnectedbycontinueedgesintothreads, andthethreadsformaspawntreewiththespawnedges.whenathreadisspawned,an Tosummarize,amultithreadedcomputationcanbeviewedasadagofinstructionscon- alive.alivingthreadmaybeeitherreadyorstalledduetoanunresolveddependency. thanonemultithreadedcomputation.inthatcase,wesaytheprogramisnondeterministic.ifthesamemultithreadedcomputationisgeneratedbytheprogramontheinput Agivenmultithreadedprogramwhenrunonagiveninputcansometimesgeneratemore nomatterhowthecomputationisscheduled,thentheprogramisdeterministic.inthis cally,weshallnotworryabouthowthemultithreadedcomputationisgenerated.instead, weshallstudyitspropertiesinanaposteriorifashion. paper,weshallanalyzemultithreadedcomputations,notmultithreadedprograms.speci- thekindsofsyncrhonizationsthatcanoccurarerestricted.astrictmultithreadedcomputationisoneinwhichalljoinedgesfromathreadgotoanancestorofthethreadin Becausemultithreadedcomputationswitharbitrarydependenciescanbeimpossibleto scheduleeciently[10],westudysubclassesofgeneralmultithreadedcomputationsinwhich theactivationtree.inastrictcomputation,theonlyedgeintoasubtree(emanatingfrom itsargumentsareavailable,althoughtheargumentscanbegarneredinparallel.afully spawnedge(v2;v3).thus,strictnessmeansthatathreadcannotbeinvokedbeforeallof thecomputationoffigure1isstrict,andtheonlyedgeintothesubtreerootedat 2isthe outsidethesubtree)isthespawnedgethatspawnsthesubtree'srootthread.forexample, strictcomputationisoneinwhichalljoinedgesfromathreadgotothethread'sparent.a fullystrictcomputationis,inasense,a\well-structured"computation,inthatalljoinedges fromasubtree(ofthespawntree)emanatefromthesubtree'sroot.theexamplecompu- 5
6 tationoffigure1isfullystrict.anymultithreadedcomputationthatcanbeexecutedina depth-rstmanneronasingleprocessorcanbemadeeitherstrictorfullystrictbyaltering thedependencystructure,possiblyaectingtheachievableparallelism,butnotaectingthe semanticsofthecomputation[5]. lengthtobethelengthofalongestdirectedpathinthedag.ourexamplecomputation workofthecomputationtobethetotalnumberofinstructionsandthecritical-path computerintermsofthecomputation's\work"and\critical-pathlength."wedenethe WequantifyandboundtheexecutiontimeofacomputationonaP-processorparallel (Figure1)haswork23andcritical-pathlength10.Foragivencomputation,letT(X)denote thetimetoexecutethecomputationusingp-processorexecutionschedulex,andlet denotetheminimumexecutiontimewithpprocessors theminimumbeingtakenoverallpprocessorexecutionschedulesforthecomputation.thent1istheworkofthecomputation, TP=min XT(X) sincea1-processorcomputercanonlyexecuteoneinstructionateachstep,andt1isthe critical-pathlength,sinceevenwitharbitrarilymanyprocessors,eachinstructiononapath mustexecuteserially.noticethatwemusthavetpt1=p,becausepprocessorscan executeonlypinstructionspertimestep,andofcourse,wemusthavetpt1. provedin[10,20],extendstheseresultsminimallytoshowthatthisupperboundontpcan thisupperboundisuniversallyoptimaltowithinafactorof2.thefollowingtheorem, processorexecutionschedulesxwitht(x)t1=p+t1.asthesumoftwolowerbounds, EarlyworkondagschedulingbyBrent[13]andGraham[27,28]showsthatthereexistP- ready,thenallexecute. Pinstructionsareready,thenPinstructionsexecute,andiffewerthanPinstructionsare beobtainedbygreedyschedules:thoseinwhichateachstepoftheexecution,ifatleast executionschedulexachievest(x)t1=p+t1. T1andcritical-pathlengthT1,andforanynumberPofprocessors,anygreedyP-processor Theorem1(Thegreedy-schedulingtheorem)Foranymultithreadedcomputationwithwork Generally,weareinterestedinschedulesthatachievelinearspeedup,thatisT(X)= O(T1=P).Foragreedyschedule,linearspeedupoccurswhentheparallelism,whichwe denetobet1=t1,satisest1=t1=(p). stackdepthofathreadtobethesumofthesizesoftheactivationframesofallitsancestors, includingitself.thestackdepthofamultithreadedcomputationisthemaximumstack depthofanyofitsthreads.weshalldenotebys1theminimumamountofspacepossiblefor Toquantifythespaceusedbyagivenexecutionscheduleofacomputation,wedenethe any1-processorexecutionofamultithreadedcomputation,whichisequaltothestackdepth ofthecomputation.lets(x)denotethespaceusedbyap-processorexecutionschedule Xofamultithreadedcomputation.Weshallbeinterestedinthoseexecutionschedulesthat exhibitatmostlinearexpansionofspace,thatis,s(x)=o(s1p),whichisexistentially optimaltowithinaconstantfactor[10]. 6
7 Onceathread hasbeenspawnedinastrictcomputation,asingleprocessorcancomplete 3theexecutionoftheentiresubcomputationrootedat evenifnootherprogressismade Thebusy-leavesproperty stall.asweshallsee,thispropertyallowsanexecutionscheduletokeeptheleaves\busy." at thatisready.inparticular,noleafthreadinastrictmultithreadedcomputationcan untilthetime dies,thereisalwaysatleastonethreadfromthesubcomputationrooted onotherpartsofthecomputation.inotherwords,fromthetimethethread isspawned computationwithworkt1,critical-pathlengtht1,andstackdepths1,thereexistsapprocessorexecutionschedulexthatachievestimet(x)t1=p+t1andspaces(x)s1p Inthissection,weshowthatforanynumberPofprocessorsandanystrictmultithreaded Bycombiningthis\busy-leaves"propertywiththegreedyproperty,wederiveexecution schedulesthatsimultaneouslyexhibitlinearspeedupandlinearexpansionofspace. simultaneously.wegiveasimpleonlinep-processorparallelalgorithm thebusy-leaves thealgorithmhascomputedandexecutedtherstt 1stepsoftheexecutionschedule. randomizedwork-stealingalgorithmpresentedinsection4. Algorithm tocomputesuchaschedule.thissimplealgorithmwillformthebasisforthe revealedsofarintheexecutiontocomputeandexecutethetthstepoftheschedule.in Atthetthstep,thealgorithmusesonlyinformationfromtheportionofthecomputation TheBusy-LeavesAlgorithmoperatesonlineinthefollowingsense.Beforethetthstep, particular,itdoesnotuseanyinformationfrominstructionsnotyetexecutedorthreadsnot yetspawned. ThoughwedescribethealgorithmasaP-processorparallelalgorithm,weshallnotanalyzeit thisglobalpool,andwhenaprocessorneedswork,itremovesareadythreadfromthepool. isuniformlyavailabletoallpprocessors.whenspawnsoccur,newthreadsareaddedto TheBusy-LeavesAlgorithmmaintainsalllivingthreadsinasinglethreadpoolwhich contendingforaccesstothepool.infact,weshallonlyanalyzepropertiesoftheschedule itselfandignorethecostincurredbythealgorithmincomputingtheschedule.(scheduling assuch.specically,incomputingthetthstepoftheschedule,wealloweachprocessortoadd threadstothethreadpoolanddeletethreadsfromit.thus,weignoretheeectsofprocessors processoreitherisidleorhasathreadtoworkon.thoseprocessorsthatareidlebeginthe threadintheglobalthreadpoolandallprocessorsidle.atthebeginningofeachstep,each overheadswillbeanalyzedfortherandomizedwork-stealingalgorithm,however.) stepbyattemptingtoremoveanyreadythreadfromthepool.iftherearesucientlymany TheBusy-LeavesAlgorithmoperatesasfollows.Thealgorithmbeginswiththeroot readythreadsinthepooltosatisfyalloftheidleprocessors,theneveryidleprocessorgets thathasathreadtoworkonexecutesthenextinstructionfromthatthread.ingeneral, areadythreadtoworkon.otherwise,someprocessorsremainidle.then,eachprocessor tothefollowingrules. onceaprocessorhasathread,callit a,toworkon,itexecutesaninstructionfrom aat eachstepuntilthethreadeitherspawns,stalls,ordies,inwhichcase,itperformsaccording ➊Spawns:Ifthethread aspawnsachild b,thentheprocessornishesthecurrent stepbyreturning atothethreadpool.theprocessorbeginsthenextstepworking on b. 7
8 step threadpool processoractivity 321 1:v1 2:v3 p1v2 1:v16 p :v4 2:v6 4:v7 v5 6:v18 v17 v :v9 5:v10 v8 1:v21 2:v13 v :v15 1:v23 v11 v12 1:v22 v14 workedonandtheinstructionexecutedbyeachofthe2processors,p1andp2,ateachstep.living justaftereachidleprocessorhasremovedareadythread.italsoliststhereadythreadbeing putationoffigure1.thisscheduleliststhelivingthreadsintheglobalthreadpoolateachstep Figure2:A2-processorexecutionschedulecomputedbytheBusy-LeavesAlgorithmforthecom- threadsthatarereadyarelistedinbold.theotherlivingthreadsarestalled. ➋Stalls:Ifthethread astalls,thentheprocessornishesthecurrentstepbyreturning ➌Dies:Ifthethread adies,thentheprocessornishesthecurrentstepbycheckingto atothethreadpool.theprocessorbeginsthenextstepidle. idle. andnootherprocessorisworkingon b,thentheprocessortakes bfromthepool andbeginsthenextstepworkingon b.otherwise,theprocessorbeginsthenextstep seeif a'sparentthread bcurrentlyhasanylivingchildren.if bhasnolivechildren thebusy-leavesalgorithmonthecomputationoffigure1.rule➊:atstep2,processor p1workingonthread 1executesv2whichspawnsthechild 2,sop1places 1backinthe pool(tobepickedupatthebeginningofthenextstepbytheidlep2)andbeginsthenext Figure2illustratesthesethreerulesina2-processorexecutionschedulecomputedby 2executesv15and 2dies,sop1retrievestheparent 1fromthepoolandbeginsthenext 1stalls,sop2returns 1tothepoolandbeginsthenextstepidle(andremainsidlesince stepworkingon 2.Rule➋:Atstep8,processorp2workingonthread 1executesv21and stepworkingon 1. thethreadpoolcontainsnoreadythreads).rule➌:atstep13,processorp1workingon spawnsubtreeatanytimestepttobetheportionofthespawntreeconsistingofjust execution,everyleafinthe\spawnsubtree"hasaprocessorworkingonit.wedenethe LeavesAlgorithmmaintainsthebusy-leavesproperty:ateverytimestepduringthe Besidesbeinggreedy,foranystrictcomputation,theschedulecomputedbytheBusy- 8
9 thosethreadsthatarealiveatstept.torestatethebusy-leavesproperty,ateverytimestep, property,buteverystrictmultithreadedcomputationdoes.webeginbyshowingthatany nowprovethisfactandshowthatitimplieslinearexpansionofspace.itisworthnoting thatnoteverymultithreadedcomputationhasaschedulethatmaintainsthebusy-leaves everylivingthreadthathasnolivingdescendantshasaprocessorworkingonit.weshall schedulethatmaintainsthebusy-leavespropertyexhibitslinearexpansionofspace. Proof: schedulexthatmaintainsthebusy-leavespropertyusesspaceboundedbys(x)s1p. Lemma2ForanymultithreadedcomputationwithstackdepthS1,anyP-processorexecution andtherefore,thespaceinuseatanytimesteptisatmosts1p. mostpleaves.foreachsuchleaf,thespaceusedbyitandallofitsancestorsisatmosts1, Forschedulesthatmaintainthebusy-leavesproperty,theupperboundS1Pisconser- Thebusy-leavespropertyimpliesthatatalltimestepst,thespawnsubtreehasat vative.bychargings1spaceforeachbusyleaf,wemaybeovercharging.forsomecom- putations,byknowingthattheschedulepreservesthebusy-leavesproperty,wecanappeal directlytothefactthatthespawnsubtreeneverhasmorethanpleavestoobtaintight boundsonspaceusage[6]. Theorem3ForanynumberPofprocessorsandanystrictmultithreadedcomputationwith computesaschedulethatisbothgreedyandmaintainsthebusy-leavesproperty. Wenishthissectionbyshowingthatforstrictcomputations,theBusy-LeavesAlgorithm whosespacesatisess(x)s1p. ap-processorexecutionschedulexwhoseexecutiontimesatisest(x)t1=p+t1and workt1,critical-pathlengtht1,andstackdepths1,thebusy-leavesalgorithmcomputes Lemma2ifwecanshowthattheBusy-LeavesAlgorithmmaintainsthebusy-leavesproperty. Weprovethisfactbyinductiononthenumberofsteps.Attherststepofthealgorithm,the Proof: sincethebusy-leavesalgorithmcomputesagreedyschedule.thespaceboundfollowsfrom Thetimeboundfollowsdirectlyfromthegreedy-schedulingtheorem(Theorem1), eitherspawns,stalls,ordies.rule➊:if aspawnsachild b,then aisnotaleaf(evenifit aprocessorhasathread atoworkon,itexecutesinstructionsfromthatthreaduntilit onit.wemustshowthatallofthealgorithmrulespreservethebusy-leavesproperty.when spawnsubtreecontainsjusttherootthreadwhichisaleaf,andsomeprocessorisworking mayturnintoaleaf.inthiscase,theprocessorworkson bunlesssomeotherprocessor wasbefore)and bisaleaf.inthiscase,theprocessorworkson b,sothenewleafisbusy. alreadyis,sothenewleafisguaranteedtobebusy. Rule➋:If astalls,then acannotbealeafsinceinastrictcomputation,theunresolved dependencymustcomefromadescendant.rule➌:if adies,thenitsparentthread b ecientexecutionschedulesanddoesoperateonline,itsurelydoesnotdosoeciently, mustbecomputedecientlyonline,andthoughthebusy-leavesalgorithmdoescompute schedule,andweknowhowtondit.butthesefactstakeusonlysofar.executionschedules Wenowknowthateverystrictmultithreadedcomputationhasanecientexecution andinthefollowingsections,weprovethatitisbothecientandscalable. contendforaccess.inthenextsection,wepresentadistributedonlineschedulingalgorithm, isaconsequenceofemployingasinglecentralizedthreadpoolatwhichallprocessorsmust exceptpossiblyinthecaseofsmall-scalesymmetricmultiprocessors.thislackofscalability 9
10 4tithreadedcomputationsonaparallelcomputer.Also,wepresentanimportantstructural Inthissection,wepresentanonline,randomizedwork-stealingalgorithmforschedulingmul- Arandomizedwork-stealingalgorithm algorithmcausesatmostalinearexpansionofspace.thislemmareappearsinsection6to lemmawhichisusedattheendofthissectiontoshowthatforfullystrictcomputations,this showthatforfullystrictcomputations,thisalgorithmachieveslinearspeedupandgenerates existentiallyoptimalamountsofcommunication. Algorithmisdistributedacrosstheprocessors.Specically,eachprocessormaintainsaready Threadscanbeinsertedonthebottomandremovedfromeitherend.Aprocessortreats dequedatastructureofthreads.thereadydequehastwoends:atopandabottom. IntheWork-StealingAlgorithm,thecentralizedthreadpooloftheBusy-Leaves migratedtootherprocessorsareremovedfromthetop. deque.itstartsworkingonthethread,callit a,andcontinuesexecuting a'sinstructions itsreadydequelikeacallstack,pushingandpoppingfromthebottom.threadsthatare until aspawns,stalls,dies,orenablesastalledthread,inwhichcase,itperformsaccording tothefollowingrules. Ingeneral,aprocessorobtainsworkbyremovingthethreadatthebottomofitsready ➊Spawns:Ifthethread aspawnsachild b,then aisplacedonthebottomofthe ➋Stalls:Ifthethread astalls,itsprocessorchecksthereadydeque.ifthedeque containsanythreads,thentheprocessorremovesandbeginsworkonthebottommost readydeque,andtheprocessorcommencesworkon b. beginsworkonit.(thiswork-stealingstrategyiselaboratedbelow.) stealsthetopmostthreadfromthereadydequeofarandomlychosenprocessorand thread.ifthereadydequeisempty,however,theprocessorbeginsworkstealing:it ➌Dies:Ifthethread adies,thentheprocessorfollowsrule➋asinthecaseof a ➍Enables:Ifthethread aenablesastalledthread b,thenow-readythread bis placedonthebottomofthereadydequeof a'sprocessor. stalling. rule➍forthecasewhenathreadenablesastalledthread,theserulesareanalogoustothe rulesofthebusy-leavesalgorithm,andasweshallsee,rule➍isneededtoensurethatthe performrule➍forenablingandthenrule➋forstallingorrule➌fordying.exceptfor Athreadcansimultaneouslyenableastalledthreadandstallordie,inwhichcasewerst algorithmmaintainsimportantstructuralproperties,includingthebusy-leavesproperty. themultithreadedcomputationisplacedinthereadydequeofoneprocessor,whiletheother processorsstartworkstealing. TheWork-StealingAlgorithmbeginswithallreadydequesempty.Therootthreadof beginsworkonthetopthread.ifthevictim'sreadydequeisempty,however,thethieftries Thethiefqueriesthereadydequeofthevictim,andifitisnonempty,thethiefremovesand athiefandattemptstostealworkfromavictimprocessorchosenuniformlyatrandom. Whenaprocessorbeginsworkstealing,itoperatesasfollows.Theprocessorbecomes again,pickinganothervictimatrandom. 10
11 Γ k ready deque Γ spawnedachild.thedashededgesarethe\dequeedges"introducedinsection6. Figure3:Thestructureofaprocessor'sreadydeque.Theblackinstructionineachthreadindicates thethread'scurrentlyreadyinstruction.onlythread kmayhavebeenworkedonsinceitlast 2 Γ 1 Γ 0 executing Wenowstateandproveanimportantlemmaonthestructureofthreadsintheready thread timeandcommunication.figure3illustratesthelemma. usedlaterinthissectiontoanalyzeexecutionspaceandinsection6toanalyzeexecution dequeofanyprocessorduringtheexecutionofafullystrictcomputation.thislemmais Lemma4IntheexecutionofanyfullystrictmultithreadedcomputationbytheWork-Stealing thread.let 0bethethreadthatpisworkingon,letkbethenumberofthreadsinp'sready Algorithm,consideranyprocessorpandanygiventimestepatwhichpisworkingona inp'sreadydequesatisfythefollowingproperties: top,sothat 1isthebottommostand kisthetopmost.ifwehavek>0,thenthethreads deque,andlet 1; 2;:::; kdenotethethreadsinp'sreadydequeorderedfrombottomto ➀Fori=1;2;:::;k,thread iistheparentof i 1. Proof: ➁Ifwehavek>1,thenfori=1;2;:::;k 1,thread ihasnotbeenworkedonsince itspawned i 1. processorpexecutesaninstructionfromthread 0.Let 1; 2;:::; kdenotethekthreads therootthreadinsomeprocessor'sreadydequeandallotherreadydequesempty,sothe lemmavacuouslyholdsattheoutset.now,consideranystepofthealgorithmatwhich Theproofisastraightforwardinductiononexecutiontime.Executionbeginswith inp'sreadydequebeforethestep,andsupposethateitherk=0orbothpropertieshold. propertiesholdafterthestep. algorithmandshowthattheyallpreservethelemma.thatis,eitherk0=0orboth denotethek0threadsinp'sreadydequeafterthestep.wenowlookattherulesofthe Let 0denotethethread(ifany)beingworkedonbypafterthestep,andlet 01; 02;:::; 0k0 Property➀:Ifk0>1,thenforj=2;3;:::;k0,thread 0jistheparentof 0j 1,sincebefore andcommencesworkonthechild.thus, 0isthechild,wehavek0=k+1>0,and forj=1;2;:::;k0,wehave 0j= j 1.SeeFigure4.Now,wecancheckbothproperties. Rule➊:If 0spawnsachild,thenppushes 0ontothebottomofthereadydeque thespawnwehavek>0,whichmeansthatfori=1;2;:::;k,thread iistheparentof i 1. 11
12 Moreover, 01isobviouslytheparentof 0.Property➁:Ifk0>2,thenforj=2;3;:::;k0 1, spawnonlyjustoccurred. k>1,whichmeansthatfori=1;2;:::;k 1,thread ihasnotbeenworkedonsinceit thread 0jhasnotbeenworkedonsinceitspawned 0j 1,becausebeforethespawnwehave spawned i 1.Finally,thread 01hasnotbeenworkedonsinceitspawned 0,becausethe Γ k Γ 2 Γ Figure4:Thereadydequeofaprocessorbeforeandafterthethread 0thatitisworkingon 3 Γ 1 Γ spawnsachild.(notethatthethreads 0and 0arenotactuallyinthedeque;theyarethe (a)beforespawn. (b)afterspawn. 2 Γ 0 Γ 1 Γ readydequeisempty,sotheprocessorcommencesworkstealing,andwhentheprocessor threadsbeingworkedonbeforeandafterthespawn.) stealsandbeginsworkonathread,wehavek0=0.ifk>0,thereadydequeisnot empty,sotheprocessorpopsthebottommostthreadothedequeandcommencesworkon Rules➋and➌:If 0stallsordies,thenwehavetwocasestoconsider.Ifk=0,the 0 Forj=1;2;:::;k0,thread 0jistheparentof 0j 1,sincefori=1;2;:::;k,thread iisthe have 0j= j+1.seefigure5.now,ifk0>0,wecancheckbothproperties.property➀: parentof i 1.Property➁:Ifk0>1,thenforj=1;2;:::;k0 1,thread 0jhasnotbeen it.thus,wehave 0= 1(thepoppedthread)andk0=k 1,andforj=1;2;:::;k0,we meansthatfori=2;3;:::;k 1,thread ihasnotbeenworkedonsinceitspawned i 1. workedonsinceitspawned 0j 1,becausebeforethestallordeathwehavek>2,which Γ k Γ k Γ k Γ 2 Γ (Notethatthethreads 0and 0arenotactuallyinthedeque;theyarethethreadsbeingworked Figure5:Thereadydequeofaprocessorbeforeandafterthethread 0thatitisworkingondies. (a)beforedeath. (b)afterdeath. 1 Γ 1 Γ 0 Γ viouslystalledthreadmustbe 0'sparent.First,weobservethatwemusthavek=0.If onbeforeandafterthedeath.) Rule➍:If 0enablesastalledthread,thenduetothefullystrictcondition,thatpre- 0 12
13 thebottomofthereadydeque.wehave 0= 0andk0=k+1=1with 01denotingthe apply.withk=0,thereadydequeisemptyandtheprocessorplacestheparentthreadon bebottommostinthereadydeque.thus,thisparentthreadisreadyandrule➍doesnot wehavek>0,thentheprocessor'sreadydequeisnotempty,andthisparentthreadmust newlyenabledparent.weonlyhavetochecktherstproperty.property➀:thread 01is afterthestealwehavek0=k 1.Ifk0>0holds,thenbothpropertiesareclearlypreserved. obviouslytheparentof 0. notinvokeanyoftheaboverules clearlypreservethelemma. Allotheractionsbyprocessorp suchasworkstealingorexecutinganinstructionthatdoes Ifsomeotherprocessorstealsathreadfromprocessorp,thenwemusthavek>0,and k 1andbroughtbacktoprocessorp'sreadydeque.Thekeyobservationisthatwhen kis kisstolenfromprocessorpandthenstallsonitsnewprocessor.later, kisreenabledby workedonsinceitspawned k 1,sinceProperty➁excludes k.thissituationariseswhen Beforemovingon,itisworthpointingouthowitmayhappenthatthread khasbeen k 2; k 3;:::; 0showninFigure3werespawnedafter kwasreenabled. reenabled,processorp'sreadydequeisemptyandpisworkingon k 1.Theotherthreads Theorem5ForanyfullystrictmultithreadedcomputationwithstackdepthS1,theWork- executingafullystrictcomputation. WeconcludethissectionbyboundingthespaceusedbytheWork-StealingAlgorithm Proof: StealingAlgorithmrunonacomputerwithPprocessorsusesatmostS1Pspace. hasaprocessorworkingonit.ifwecanestablishthisfact,thenlemma2completesthe proof. leavesproperty:ateverytimestepoftheexecution,everyleafinthecurrentspawnsubtree LiketheBusy-LeavesAlgorithm,theWork-StealingAlgorithmmaintainsthebusy- ofsomeprocessor.butlemma4guaranteesthatnoleafthreadsitsinaprocessor'sready readyandthereforemusteitherhaveaprocessorworkingonitorbeinthereadydeque sequenceoflemma4.ateverytimestep,everyleafinthecurrentspawnsubtreemustbe ThattheWork-StealingAlgorithmmaintainsthebusy-leavespropertyisasimplecon- dequewhiletheprocessorworksonsomeotherthread. whenmultiplethiefprocessorssimultaneouslyattempttostealfromthesamevictim. however,wemusttakecaretodeneamodelforcopingwiththecontentionthatmayarise nicationboundsforthework-stealingalgorithm.beforewecanproceedwiththisanalysis, Withthespaceboundinhand,wenowturnattentiontoanalyzingthetimeandcommu- executionofamultithreadedcomputationbythework-stealingalgorithm.weintroduce Thissectionpresentsthe\atomic-access"modelthatweusetoanalyzecontentionduringthe 5 Atomicaccessesandtherecyclinggame incurredbyrandom,asynchronousaccessesinthismodel.weshallusetheresultsofthis acombinatorial\ballsandbins"game,whichweusetoboundthetotalamountofdelay sectioninsection6,whereweanalyzethework-stealingalgorithm. Algorithm.WeassumethatthemachineisanasynchronousparallelcomputerwithP Theatomic-accessmodelisthemachinemodelweusetoanalyzetheWork-Stealing 13
14 themodelofkarpandzhang[33].theyassumethatifconcurrentstealrequestsaremade theatomicmessage-passingmodelof[36].thisassumptionismorestringentthanthatin processors,anditsmemorycanbeeitherdistributedorshared.ouranalysisassumesthat toadeque,inonetimestep,onerequestissatisedandalltheothersaredenied.inthe concurrentaccessestothesamedatastructureareseriallyqueuedbyanadversary,asin Theonlyconstraintontheadversaryisthatifthereisatleastonerequestforadeque,then byanadversary,ratherthanbeingdenied.moreover,fromthecollectionofwaitingrequests foragivendeque,theadversarygetstochoosewhichisservicedandwhichcontinuetowait. atomic-accessmodel,wealsoassumethatonerequestissatised,buttheothersarequeued theadversarycannotchoosethatnonebeserviced. islikelytobeproportionaltothetotalnumbermofrequests,nomatterwhichprocessors processorstopdequeswitheachprocessorallowedatmostoneoutstandingrequest,then thetotalamountoftimethattheprocessorsspendwaitingfortheirrequeststobesatised ThemainresultofthissectionistoshowthatifrequestsaremaderandomlybyP maketherequestsandnomatterhowtherequestsaredistributedovertime.inorderto bytheadversary. provethisresult,weintroducea\ballsandbins"gamethatmodelstheeectsofqueueing executedbytheadversary.initially,allpballsareinareservoirseparatefromthepbins. whichisequaltothenumberofbins.theparametermisthetotalnumberofballtosses ballsaretossedatrandomintobins.theparameterpisthenumberofballsinthegame, The(P;M)-recyclinggameisacombinatorialgameplayedbytheadversary,inwhich Ateachstepofthegame,theadversaryexecutesthefollowingtwooperationsinsequence: 1.Theadversarychoosessomeoftheballsinthereservoir(possiblyallandpossibly none),andthenforeachoftheseballs,theadversaryremovesitfromthereservoir, 2.TheadversaryinspectseachofthePbinsinturn,andforeachbinthatcontainsat selectsoneofthepbinsuniformlyandindependentlyatrandom,andtossestheball leastoneball,theadversaryremovesanyoneoftheballsinthebinandreturnsitto intoit. tosseshavebeenmadeandallballshavebeenremovedfromthebinsandplacedbackinthe TheadversaryispermittedtomakeatotalofMballtosses.ThegameendswhenMball thereservoir. reservoir. isinthereservoir,itmeansthattheball'sownerisnotmakingastealrequest.ifaballis rithm.wecanvieweachballandeachbinasbeingownedbyadistinctprocessor.ifaball inabin,itmeansthattheball'sownerhasmadeastealrequesttothedequeofthebin's TherecyclinggamemodelstheservicingofstealrequestsbytheWork-StealingAlgo- andreturnedtothereservoir,itmeansthattherequesthasbeenserviced. owner,butthattherequesthasnotyetbeensatised.whenaballisremovedfromabin adversaryistomakethetotaldelayaslargeaspossible.thenextlemmashowsthatdespite delayd=ptt=1nt,wheretisthetotalnumberofstepsinthegame.thegoalofthe correspondtostealrequeststhathavenotbeensatised.weshallbeinterestedinthetotal Aftereachsteptofthegame,therearesomenumberntofballsleftinthebins,which 14
15 Lemma6Forany>0,withprobabilityatleast1,thetotaldelayinthe(P;M)-recycling tothereservoir,thetotaldelayisunlikelytobelarge. thechoicesthattheadversarymakesaboutwhichballstotossintobinsandwhichtoreturn modeliso(m+plgp+plg(1=))withprobabilityatleast1,andtheexpectedtotaldelay isatmostm. thetotaldelayincurredbymrandomrequestsmadebypprocessorsintheatomic-access gameiso(m+plgp+plg(1=)).1theexpectedtotaldelayisatmostm.inotherwords, ballfromeachbinisimmaterial,andthus,wecanassumethatballsarequeuedintheirbins Proof: andwhentheadversarytossesaball,itisplacedonthebackofthequeue.ifseveralballs inarst-in-rst-out(fifo)order.theadversaryremovesballsfromthefrontofthequeue, Werstmaketheobservationthatthestrategybywhichtheadversarychoosesa aretossedintothesamebinatthesamestep,theycanbeplacedonthebackofthequeue ballistossed. inanyorder.thereasonthatassumingafifodisciplineforqueuingballsinabindoesnot aectthetotaldelayisthatthenumberofballsinagivenbinatagivenstepisthesame nomatterwhichballisremoved,andwhereballsaretossedhasnothingtodowithwhich totalnumberofstepsthatnishwithballrinabin.then,wehave orinthereservoir.denethedelayofballrtobetherandomvariablerdenotingthe Foranygivenballandanygivenstep,thestepeithernisheswiththetheballinabin ithtimeitistosseduntilitisreturnedtothereservoir.denealsotheithdelayofaball Denetheithcycleofaballtobethosestepsinwhichtheballremainsinabinfromthe D=PXr=1r: (1) tobethenumberofstepsinitsithcycle. have=pmi=1di. ofball1.ifweletmdenotethenumberoftimesthatball1istossedbytheadversary,and fori=1;2;:::;m,letdibetherandomvariabledenotingtheithdelayofball1,thenwe Weshallanalyzethetotaldelaybyfocusing,withoutlossofgenerality,onthedelay=1 byanotherballreitheronceornotatall.consequently,wecandecomposeeachrandom theadversaryfollowsthefiforule,itfollowsthattheithcycleofball1canbedelayed placesitinsomebinkandballrisremovedfrombinkduringtheithcycleofball1.since Wesaythattheithcycleofball1isdelayedbyanotherballriftheithtossofball1 variablediintoasumdi=xi2+xi3++ximofindicatorrandomvariables,where Thus,wehave xir=(1iftheithcycleofball1isdelayedbyballr; 0otherwise. 1=isatmostpolynomialinMandP[40]. 1GregPlaxtonoftheUniversityofTexas,AustinhasimprovedthisboundtoO(M)forthecasewhen =mxi=1pxr=2xir: (2) 15
16 delayedbyballr.foranysuchsets,weclaimthat setsofpairs(i;r),eachofwhichcorrespondstotheeventthattheithcycleofball1is Wenowproveanimportantpropertyoftheseindicatorrandomvariables.Considerany Thecruxofprovingtheclaimistoshowthat Pr8<:^ (i;r)2s(xir=1)9=;p jsj: (3) wheres0=s f(i;r)g,whencetheclaim(3)followsfrombayes'stheorem. Pr8<:xir=1^ (i0;r0)2s0(xi0r0=1)9=;1=p; (4) withprobabilityeither1=por0,andhence,withprobabilityatmost1=p.conditioningon tossofball1,itfallsintowhateverbincontainsballr,ifany.apriori,thiseventhappens saryfollowsthefiforule,wehavethatxir=1onlyif,whentheadversaryexecutestheith WecanderiveInequality(4)fromacarefulanalysisofdependencies.Becausetheadver- tellsnothingaboutwheretheithtossofball1goes.therefore,theserandomvariablesare creasethisprobability,aswenowargueintwocases.intherstcase,theindicatorrandom variablesxi0r0,wherei06=i,tellwhetherothercyclesofball1aredelayed.thisinformation anycollectionofeventsrelatingwhichballsdelaythisorothercyclesofball1cannotin- independentofxir,andthus,theprobability1=pupperboundisnotaected.inthesecond containingballr0,butthisinformationtellsusnothingaboutwhetheritgoestothebin case,theindicatorrandomvariablesxir0tellwhethertheithtossofball1goestothebin randballr0arelocated.moreover,no\collusion"amongtheindicatorrandomvariables providesanymoreinformation,andthusinequality(4)holds. containingballr,becausetheindicatorrandomvariablestellusnothingtorelatewhereball orexceedagivenvalue,theremustbesomesetcontainingoftheseindicatorrandom canbeexpressesasasumofm(p 1)indicatorrandomvariables.Inorderfortoequal variables,eachofwhichmustbe1.foranyspecicsuchset,inequality(3)saysthatthe Equation(2)showsthatthedelayencounteredbyball1throughoutallofitscycles probabilityisatmostp thatallrandomvariablesinthesetare1.sincethereare m(p 1) (emp=)suchsets,whereeisthebaseofthenaturallogarithm,wehave PrfgemP =em P whenevermaxf2em;lgp+lg(1=)g. Althoughouranalysiswasperformedforball1,itappliestoanyotherballaswell. =P; exceedsmaxf2emr;lgp+lg(1=)gisatmost=p.byboole'sinequalityandequation(1), Consequently,foranygivenballrwhichistossedmrtimes,theprobabilitythatitsdelayr 16
17 itfollowsthatwithprobabilityatleast1,thetotaldelaydisatmost DPXr=1maxf2emr;lgP+lg(1=)g sincem=ppr=1mr. TheupperboundE[D]Mcanbeobtainedasfollows.Recallthateachristhe =(M+PlgP+Plg(1=)); sumof(p 1)mrindicatorrandomvariables,eachofwhichhasexpectationatmost1=P. turnbacktothework-stealingalgorithm. linearityofexpectation,weobtaine[d]m. Therefore,bylinearityofexpectation,E[r]mr.UsingEquation(1)andagainusing WiththisboundonthetotaldelayincurredbyMrandomrequestsnowinhand,we 6tithreadedcomputationwiththeWork-StealingAlgorithm.Foranyfullystrictcomputation Inthissection,weanalyzethetimeandcommunicationcostofexecutingafullystrictmul- Analysisofthework-stealingalgorithm withworkt1andcritical-pathlengtht1,weshowthattheexpectedrunningtimewith Pprocessors,includingschedulingoverhead,isT1=P+O(T1).Moreover,forany>0, theexecutiontimeonpprocessorsist1=p+o(t1+lgp+lg(1=)),withprobabilityat fullystrictcomputationiso(pt1(1+nd)smax),wherendisthemaximumnumberofjoin least1.wealsoshowthattheexpectedtotalcommunicationduringtheexecutionofa edgesfromathreadtoitsparentandsmaxisthelargestsizeofanyactivationframe. victimsimultaneously.inthiscase,aswehaveindicatedintheprevioussection,wemake isdistributed,andsothereisnocontentionatacentralizeddatastructure.nevertheless,it isstillpossibleforcontentiontoarisewhenseveralthieveshappentodescendonthesame UnlikeintheBusy-LeavesAlgorithm,the\readypool"intheWork-StealingAlgorithm work-stealingresponsetakesanyconstantamountoftime. request.thisassumptioncanberelaxedwithoutmateriallyaectingtheresultssothata Wefurtherassumethatittakesunittimeforaprocessortorespondtoawork-stealing theconservativeassumptionthatanadversaryseriallyqueuesthework-stealingrequests. dollars,onefromeachprocessor.ateachstep,eachprocessorplacesitsdollarinoneof multithreadedcomputationwithworkt1andcritical-pathlengtht1onacomputerwith Pprocessors,weuseanaccountingargument.Ateachstepofthealgorithm,wecollectP ToanalyzetherunningtimeoftheWork-StealingAlgorithmexecutingafullystrict threebucketsaccordingtoitsactionsatthatstep.iftheprocessorexecutesaninstruction bucket.weshallderivetherunning-timeboundbyboundingthenumberofdollarsineach merelywaitsforaqueuedstealrequestatthestep,thenitplacesitsdollarintothewait atthestep,thenitplacesitsdollarintotheworkbucket.iftheprocessorinitiatesasteal bucketattheendoftheexecution,summingthesethreebounds,andthendividingbyp. attemptatthestep,thenitplacesitsdollarintothestealbucket.and,iftheprocessor WerstboundthetotalnumberofdollarsintheWorkbucket. 17
18 Lemma7TheexecutionofafullystrictmultithreadedcomputationwithworkT1bythe Proof:AprocessorplacesadollarintheWorkbucketonlywhenitexecutesaninstruction. intheworkbucket. Work-StealingAlgorithmonacomputerwithPprocessorsterminateswithexactlyT1dollars Thus,sincethereareT1instructionsinthecomputation,theexecutionendswithexactlyT1 tempts,andwemustalsodeneanaugmenteddagthatwethenusetodene\critical" \delay-sequence"argument.werstintroducethenotionofa\round"ofwork-stealat- dollarsintheworkbucket. instructions.theideaisasfollows.if,duringthecourseoftheexecution,alargenumberof BoundingthetotaldollarsintheStealbucketrequiresasignicantlymoreinvolved stealsareattempted,thenwecanidentifyasequenceofinstructions thedelaysequence in theaugmenteddagsuchthateachofthesestealattemptswasinitiatedwhilesomeinstructionfromthesequencewascritical.wethenshowthatacriticalinstructionisunlikelyto remaincriticalacrossamodestnumberofstealattempts.wecanthenconcludethatsuch adelaysequenceisunlikelytooccur,andtherefore,anexecutionisunlikelytosueralarge attemptssuchthatifastealattemptthatisinitiatedattimesteptoccursinaparticular round,thenallotherstealattemptsinitiatedattimesteptarealsointhesameround.we canpartitionallofthestealattemptsthatoccurduringanexecutionintoroundsasfollows. Aroundofstealattemptsisasetofatleast3Pbutfewerthan4Pconsecutivesteal numberofstealattempts. therstroundstartsattimestep1andendsattimestept1.ingeneral,iftheithround endsattimestepti,thenthe(i+1)stroundbeginsattimestepti+1andendsatthe Therstroundcontainsallstealattemptsinitiatedattimesteps1;2;:::;t1,wheret1isthe earliesttimesuchthatatleast3pstealattemptswereinitiatedatorbeforet1.wesaythat denition,eachroundcontainsatleast3pconsecutivestealattempts.moreover,sinceat mostp 1stealattemptscanbeinitiatedinasingletimestep,eachroundcontainsfewer stepsbetweenti+1andti+1,inclusive.thesestealattemptsbelongtoroundi+1.by earliesttimestepti+1>ti+1suchthatatleast3pstealattemptswereinitiatedattime anaugmenteddagobtainedbymodifyingtheoriginaldagslightly.letgdenotetheoriginal than4p 1stealattempts,andeachroundtakesatleast4steps. spawn,andjoinedgesasedges.theaugmenteddagg0istheoriginaldaggtogetherwith dag,thatis,thedagconsistingofthecomputation'sinstructionsasverticesanditscontinue, Thesequenceofinstructionsthatmakeupthedelaysequenceisdenedwithrespectto dequeedgesareshowndashedinfigure3.insection2wemadethetechnicalassumption spawnedgeand(u;w)isacontinueedge,thedequeedge(w;v)isplaceding0.these somenewedges,asfollows.foreverysetofinstructionsu,v,andwsuchthat(u;v)isa outthatg0isonlyananalyticaltool.thedequeedgeshavenoeectontheschedulingand executionofthecomputationbythework-stealingalgorithm. longestpathing,thenthelongestpathing0haslengthatmost2t1.itisworthpointing thatinstructionwhasnoincomingjoinedges,andsog0isadag.ift1isthelengthofa structionwsuchthatthereisadirectedpathfromwtoving0,instructionwhasbeen theexecution,wesaythatanunexecutedinstructionviscriticalifeveryinstructionthat precedesv(eitherdirectlyorindirectly)ing0hasbeenexecuted,thatis,ifforeveryin- Thedequeedgesarethekeytodeningcriticalinstructions.Atanytimestepduring 18
19 readyinstructionmayormaynotbecritical.intuitively,thestructuralpropertiesofaready executed.acriticalinstructionmustbeready,sinceg0containseveryedgeofg,buta instructionacrossthedequeedgehasnotyetbeenexecuted. dequeenumeratedinlemma4guaranteethatifathreadisdeepinareadydeque,then itscurrentinstructioncannotbecritical,becausethepredecessorofthethread'scurrent Denition8Adelaysequenceisa3-tuple(U;R;)satisfyingthefollowingconditions: U=(u1;u2;:::;uL)isamaximaldirectedpathinG0.Specically,fori=1;2;:::;L Wenowformalizeourdenitionofadelaysequence. structionu1mustbetherstinstructionoftherootthread),andinstructionulhasno outgoingedgesing0(instructionulmustbethelastinstructionoftherootthread). 1,theedge(ui;ui+1)belongstoG0,instructionu1hasnoincomingedgesinG0(in- Risapositiveintegernumberofsteal-attemptrounds. =(1;01;2;02;:::;L;0L)isapartitionofR(thatisR=PLi=1(i+0i)),such ofthepartitioncorrespondstotherst1rounds.thesecondpiececorrespondstothenext ThepartitioninducesapartitionofasequenceofRroundsasfollows.Therstpiece that0i2f0;1gforeachi=1;2;:::;l. tobetheiconsecutiveroundsstartingaftertherithround,whereri=pi 1 inthepiecescorrespondingtothei,notthe0i,andsowedenetheithgroupofrounds consecutiveroundsaftertherst(1+01)rounds,andsoon.weareinterestedprimarily 01consecutiveroundsaftertherst1rounds.Thethirdpiececorrespondstothenext2 BecauseisapartitionofRand0i2f0;1g,fori=1;2;:::;L,wehave LXi=1iR L: j=1(j+0j). ofthestealattemptsthatcomprisetheroundareinitiatedattimestepswhenviscritical. Wesaythatagivenroundofstealattemptsoccurswhileinstructionviscriticalifall (5) rounds. occurwhileinstructionuiiscritical.inotherwords,uimustbecriticalthroughoutalli issaidtooccurduringanexecutionifforeachi=1;2;:::;l,alliroundsintheithgroup Inotherwords,vmustbecriticalthroughouttheentireround.Adelaysequence(U;R;) G0andapartition=(1;01;2;02;:::;L;0L)oftherstRrounds,suchthatforeach thensomedelaysequence(u;r;)mustoccur.inparticular,ifwelookatanyexecutionin whichatleastrroundsoccur,thenwecanidentifyapathu=(u1;u2;:::;ul)inthedag ThefollowinglemmastatesthatifatleastRroundstakeplaceduringanexecution, Sucharoundcannotbepartofanygroup,becausenoinstructioniscriticalthroughout. whetheruiiscriticalatthebeginningofaroundbutgetsexecutedbeforetheroundends. i=1;2;:::l,alloftheiroundsintheithgroupoccurwhileuiiscritical.each0iindicates occur. 4PRstealattemptsoccurduringtheexecution,thensomedelaysequence(U;R;)must pathlengtht1bythework-stealingalgorithmonacomputerwithpprocessors.ifatleast Lemma9Considertheexecutionofafullystrictmultithreadedcomputationwithcritical- 19
20 instructionsonadirectedpathing0suchthatforeverytimestepduringtheexecution, Proof: adelaysequence(u;r;)andshowthatitoccurs.withatleast4prstealattempts,there mustbeatleastrrounds.weconstructthedelaysequencebyrstidentifyingasetof Foragivenexecutioninwhichatleast4PRstealattemptstakeplace,weconstruct oneoftheseinstructionsiscritical.then,wepartitiontherstrroundsaccordingtowhen eachroundoccursrelativetowheneachinstructiononthepathiscritical. whichwedenotebyv1.letvl1denotea(notnecessarilyimmediate)predecessorinstruction ofv1ing0withthelatestexecutiontime.let(vl1;:::;v2;v1)denoteadirectedpathfrom vl1tov1ing0.weextendthispathbacktotherstinstructionoftherootthreadby ToconstructthepathU,weworkbackwardsfromthelastinstructionoftherootthread, ing0.wenishiteratingtheconstructionwhenwegettoaniterationkinwhichvlkisthe latestexecutiontime,andlet(vli+1;:::;vli+1;vli)denoteadirectedpathfromvli+1tovli directedpathing0fromvlitov1.weletvli+1denoteapredecessorofvliing0withthe iteratingthisconstructionasfollows.attheithiterationwehaveaninstructionvlianda rstinstructionoftherootthread.ourdesiredsequenceisthenu=(u1;u2;:::;ul),where L=lkandui=vL i+1fori=1;2;:::;l.onecanverifythatateverytimestepofthe execution,oneofthevliiscritical. oftherstrroundsaccordingtowheneachroundoccurs.wewouldlikeourpartitionto besuchthatforeachround(amongtherstrrounds),wehavethepropertythatifthe roundoccurswhilesomeinstructionuiiscritical,thentheroundbelongstotheithgroup. Now,toconstructthepartition=(1;01;2;02;:::;L;0L),wepartitionthesequence theseroundsareconsecutiveatthebeginningofthesequence,sotheseroundscomprisethe Startwith1,andlet1equalthenumberofroundsthatoccurwhileu1iscritical.Allof 1stgroup thatis,theyarethe1consecutiveroundsstartingafterther1=0rstrounds. Next,iftheroundthatimmediatelyfollowsthoserst1roundsbeginsafteru1hasbeen criticalandendsafteru1isexecuted(forotherwise,itwouldbepartoftherstgroup),so executed,thenweset01=0,andwegoonto2.otherwise,thatroundbeginswhileu1is weset01=1,andwegoonto2.for2,welet2equalthenumberofroundsthatoccur thenumberofroundsthatbeginwhileuiiscriticalbutdonotenduntilafteruiisexecuted. lettingeachibethenumberofroundsthatoccurwhileuiiscriticalandlettingeach0ibe r2=1+01rounds,sotheseroundscomprisethe2ndgroup.wecontinueinthisfashion, whileu2iscritical.notethatalloftheseroundsareconsecutivebeginningaftertherst Asanexample,wemayhavearoundthatbeginswhileuiiscriticalandthenendswhile sequenceandthatitoccurs.byconstruction,uisamaximalpathing0.nowconsidering ui+2iscritical,andinthiscase,weset0i=1and0i+1=0.inthisexample,the(i+1)st groupisempty,sowealsoseti+1=0.,weobservethateachroundamongtherstrroundsiscountedexactlyonceineither aiora0i,soisindeedapartitionofr.moreover,fori=1;2;:::;l,atmostone Weconcludetheproofbyverifyingthatthe(U;R;)asjustconstructedisadelay uiiscritical.therefore,thedelaysequence(u;r;)occurs. fori=1;2;:::;l,theiroundsthatcomprisetheithgroupalloccurwhiletheinstruction 0i2f0;1g.Thus,(U;R;)isadelaysequence.Finally,weobservethat,byconstruction, roundcanbeginwhiletheinstructionuiiscriticalandendafteruiisexecuted,sowehave numberofrounds.specically,werstshowthatacriticalinstructionmustbetheready Wenowestablishthatacriticalinstructionisunlikelytoremaincriticalacrossamodest 20
M.S. in Business and Information Systems
1 M.S. in Business and Information Systems (30 Credits) M.S. in Business and Information Systems (courses only) Bridge Courses Select one of the following: 3 CS 100 Roadmap to Computing CS 113 Introduction
More informationElemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus
Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus A simple C/C++ language extension construct for data parallel operations Robert Geva robert.geva@intel.com Introduction Intel
More informationIn Proceedings of Performance Tools 98 Lecture Notes in Computer Science, Vol. 1468, pp. 231-242, September 1998
In Proceedings of Performance Tools 98 OnChoosingaTaskAssignmentPolicyfora Lecture Notes in Computer Science, Vol. 1468, pp. 231-242, September 1998 MorHarchol-Balter?;1,MarkE.Crovella??;2,andCristinaD.Murta???;2
More informationTutorial 8. NP-Complete Problems
Tutorial 8 NP-Complete Problems Decision Problem Statement of a decision problem Part 1: instance description defining the input Part 2: question stating the actual yesor-no question A decision problem
More informationAn Oracle White Paper July 2012. Load Balancing in Oracle Tuxedo ATMI Applications
An Oracle White Paper July 2012 Load Balancing in Oracle Tuxedo ATMI Applications Introduction... 2 Tuxedo Routing... 2 How Requests Are Routed... 2 Goal of Load Balancing... 3 Where Load Balancing Takes
More information2.0. Specification of HSN 2.0 JavaScript Static Analyzer
2.0 Specification of HSN 2.0 JavaScript Static Analyzer Pawe l Jacewicz Version 0.3 Last edit by: Lukasz Siewierski, 2012-11-08 Relevant issues: #4925 Sprint: 11 Summary This document specifies operation
More informationBachelor of Technology (Computer Engineering.) Scheme of Courses/Examination. (3 rd SEMESTER) 1 HUT-211 Organizational Behaviour 2 1-3 60 40-100 3 2.
Bachelor of Technology (Computer Engineering.) Scheme of s/examination Sl. (3 rd SEMESTER) Teaching Schedule Examination Schedule 1 HUT-211 Organizational Behaviour 2 1-3 60 40-100 3 2.5 2 COT-201 Programming
More informationCode and Process Migration! Motivation!
Code and Process Migration! Motivation How does migration occur? Resource migration Agent-based system Details of process migration Lecture 6, page 1 Motivation! Key reasons: performance and flexibility
More informationJava SE 7 Programming
Java SE 7 Programming The second of two courses that cover the Java Standard Edition 7 (Java SE 7) Platform, this course covers the core Application Programming Interfaces (API) you will use to design
More informationJob Scheduling Model
Scheduling 1 Job Scheduling Model problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run for theith job, we have an arrival timea i and a run
More informationMANAGEMENT CONSOLE PERFORMANCE COMPARISON
MANAGEMENT CONSOLE PERFORMANCE COMPARISON Performance comparison between Blancco Management Console 3.3.2 and 3.6.0 www.blanccotechnologygroup.com TABLE OF CONTENTS 1. Overview... 3 Method... 3 2. Performance
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationOracle Health Sciences Network Patient Recruiter Cloud Service - Overview
Oracle Health Sciences Network Patient Recruiter Cloud Service Services Description Version: 2.0 Effective Date: 01-April-2013 Oracle Health Sciences Network Patient Recruiter Cloud Service - Services
More informationDynamic Thread Pool based Service Tracking Manager
Dynamic Thread Pool based Service Tracking Manager D.V.Lavanya, V.K.Govindan Department of Computer Science & Engineering National Institute of Technology Calicut Calicut, India e-mail: lavanya.vijaysri@gmail.com,
More informationEECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun
EECS 750: Advanced Operating Systems 01/28 /2015 Heechul Yun 1 Recap: Completely Fair Scheduler(CFS) Each task maintains its virtual time V i = E i 1 w i, where E is executed time, w is a weight Pick the
More informationTechnology: Enterprise Storage
Accelerated Capital Allowances Eligibility Criteria Category: Information and Communications Technology (ICT) Technology: Enterprise Storage Enterprise Storage equipment is defined as a storage device
More informationCapacity Scheduler Guide
Table of contents 1 Purpose...2 2 Features... 2 3 Picking a task to run...2 4 Installation...3 5 Configuration... 3 5.1 Using the Capacity Scheduler... 3 5.2 Setting up queues...3 5.3 Configuring properties
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
More informationTIME VALUE OF MONEY. Return of vs. Return on Investment: We EXPECT to get more than we invest!
TIME VALUE OF MONEY Return of vs. Return on Investment: We EXPECT to get more than we invest! Invest $1,000 it becomes $1,050 $1,000 return of $50 return on Factors to consider when assessing Return on
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationApplication. Performance Testing
Application Performance Testing www.mohandespishegan.com شرکت مهندش پیشگان آزمون افسار یاش Performance Testing March 2015 1 TOC Software performance engineering Performance testing terminology Performance
More informationCurriculum. for the Master s degree programme. Applied Informatics. Programme code L 066 911. Effective date: 1 st of October 2013
APPENDIX 1 to the University Bulletin Issue 20, No. 159.1-2012/2013, 19.06.2013 Curriculum for the Master s degree programme Applied Informatics Programme code L 066 911 Effective date: 1 st of October
More informationMemory Characterization to Analyze and Predict Multimedia Performance and Power in an Application Processor
WHITE PAPER Memory Characterization to Analyze and Predict Multimedia Performance and Power in an Application Processor Yu Bai Staff Engineer, APSE Marvell November 2011 www.marvell.com Introduction: Nowadays,
More informationDeadlock Detection and Recovery!
Deadlock Detection and Recovery! Richard M. Fujimoto! Professor!! Computational Science and Engineering Division! College of Computing! Georgia Institute of Technology! Atlanta, GA 30332-0765, USA!! http://www.cc.gatech.edu/~fujimoto/!
More informationLoad Balancing & Termination
Load Load & Termination Lecture 7 Load and Termination Detection Load Load Want all processors to operate continuously to minimize time to completion. Load balancing determines what work will be done by
More informationCASE STUDY: Do Press Releases Work?
CASE STUDY: Do Press Releases Work? Do Press Releases Work? Case Study: Is Press Release USELESS, or USEFUL? Copyright 2014 MarketersMedia - Press Release Distribution All Rights Reserved 2 Introduction:
More informationTechnician High Pressure Pump Guide for the 7.3 Power Stroke Engine
Technician High Pressure Pump Guide for the 7.3 Power Stroke Engine HIGH PRESSURE PUMP. PUMP LEAKS. ICP SYSTEM DIAGNOSTICS. REPAIR PARTS. TOOLS IPR TEST TOOLS AND ICP PUMP LEAK REPAIR High pressure pumps
More informationMath. Rounding Decimals. Answers. 1) Round to the nearest tenth. 8.54 8.5. 2) Round to the nearest whole number. 99.59 100
1) Round to the nearest tenth. 8.54 8.5 2) Round to the nearest whole number. 99.59 100 3) Round to the nearest tenth. 310.286 310.3 4) Round to the nearest whole number. 6.4 6 5) Round to the nearest
More informationComp 204: Computer Systems and Their Implementation. Lecture 12: Scheduling Algorithms cont d
Comp 204: Computer Systems and Their Implementation Lecture 12: Scheduling Algorithms cont d 1 Today Scheduling continued Multilevel queues Examples Thread scheduling 2 Question A starvation-free job-scheduling
More informationJava SE 7 Programming
Oracle University Contact Us: 1.800.529.0165 Java SE 7 Programming Duration: 5 Days What you will learn This Java SE 7 Programming training explores the core Application Programming Interfaces (API) you'll
More informationJava SE 7 Programming
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 4108 4709 Java SE 7 Programming Duration: 5 Days What you will learn This Java Programming training covers the core Application Programming
More informationArtificial Intelligence Beating Human Opponents in Poker
Artificial Intelligence Beating Human Opponents in Poker Stephen Bozak University of Rochester Independent Research Project May 8, 26 Abstract In the popular Poker game, Texas Hold Em, there are never
More informationCS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014)
CS Standards Crosswalk: CSTA K-12 Computer Science Standards and Oracle Java Programming (2014) CSTA Website Oracle Website Oracle Contact http://csta.acm.org/curriculum/sub/k12standards.html https://academy.oracle.com/oa-web-introcs-curriculum.html
More informationLoad balancing in SOAJA (Service Oriented Java Adaptive Applications)
Load balancing in SOAJA (Service Oriented Java Adaptive Applications) Richard Olejnik Université des Sciences et Technologies de Lille Laboratoire d Informatique Fondamentale de Lille (LIFL UMR CNRS 8022)
More informationDynamic Load Balancing. Using Work-Stealing 35.1 INTRODUCTION CHAPTER. Daniel Cederman and Philippas Tsigas
CHAPTER Dynamic Load Balancing 35 Using Work-Stealing Daniel Cederman and Philippas Tsigas In this chapter, we present a methodology for efficient load balancing of computational problems that can be easily
More informationAnnouncements. Basic Concepts. Histogram of Typical CPU- Burst Times. Dispatcher. CPU Scheduler. Burst Cycle. Reading
Announcements Reading Chapter 5 Chapter 7 (Monday or Wednesday) Basic Concepts CPU I/O burst cycle Process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution What are the
More informationTenacity and rupture degree of permutation graphs of complete bipartite graphs
Tenacity and rupture degree of permutation graphs of complete bipartite graphs Fengwei Li, Qingfang Ye and Xueliang Li Department of mathematics, Shaoxing University, Shaoxing Zhejiang 312000, P.R. China
More informationNUMERICAL CALCULATION OF THE DENSITY OF PRIME NUMBERS WITH A GIVEN LEAST PRIMITIVE ROOT
MATHEMATICS OF COMPUTATION Volume 71, Number 240, Pages 1781 1797 S 0025-5718(01)01382-5 Article electronically published on November 28, 2001 NUMERICAL CALCULATION OF THE DENSITY OF PRIME NUMBERS WITH
More informationImpact of Cloud Computing on Healthcare
Volume 1, Issue 3, October 2013 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Impact of Cloud Computing on Healthcare Yogesh Khullar Department of Computer
More informationMDS UK Patient Support Group Feedback: London Local AA/MDS Support Group Meeting 19/07/12
UK Patient Support Group Feedback: London Local AA/ Support Group Meeting 9/07/ Attendees How did you hear about this regional meeting? Attended* Feedback AA 7 (33%) 3 (67%) 7 5 *Attendance figures depend
More informationAnalysis and Comparison of CPU Scheduling Algorithms
Analysis and Comparison of CPU Scheduling Algorithms Pushpraj Singh 1, Vinod Singh 2, Anjani Pandey 3 1,2,3 Assistant Professor, VITS Engineering College Satna (MP), India Abstract Scheduling is a fundamental
More informationBachelor of Engineering (B.Eng.)
General description of degree program Name of degree program: Degree awarded: Specialization: Educational and professional goals: Program duration: Pre-study work experience: Conditions of admission: Traineeship
More informationReviewing All Applications & Critiques for a Review Meeting
proposalcentral Reviewing All Applications & Critiques for a Review Meeting If you need assistance, contact Customer Service by email at pcsupport@altum.com or by phone at 1-800-875-2562 or phone 703-964-5840
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;
More informationThread level parallelism
Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process
More informationSetting up your own Internet Radio Station
Setting up your own Internet Radio Station Author: Streamit B.V. Version: 1.0.0 Date: March 2007 Copyright Agreement Notitions about trademarks LUKAS IS A REGISTERED TRADEMARK OF STREAMIT, SEE WWW.STREAMIT.EU.
More informationDIPLOMADO DE JAVA - OCA
DIPLOMADO DE JAVA - OCA TABLA DE CONTENIDO INTRODUCCION... 3 ESTRUCTURA DEL DIPLOMADO... 4 Nivel I:... 4 Fundamentals of the Java Programming Language Java SE 7... 4 Introducing the Java Technology...
More informationNotes on Network Security - Introduction
Notes on Network Security - Introduction Security comes in all shapes and sizes, ranging from problems with software on a computer, to the integrity of messages and emails being sent on the Internet. Network
More informationGREATEST COMMON DIVISOR
DEFINITION: GREATEST COMMON DIVISOR The greatest common divisor (gcd) of a and b, denoted by (a, b), is the largest common divisor of integers a and b. THEOREM: If a and b are nonzero integers, then their
More informationREAL TIME OPERATING SYSTEMS. Lesson-18:
REAL TIME OPERATING SYSTEMS Lesson-18: Round Robin Time Slicing of tasks of equal priorities 1 1. Common scheduling models 2 Common scheduling models Cooperative Scheduling of ready tasks in a circular
More informationFAQ: BroadLink Multi-homing Load Balancers
FAQ: BroadLink Multi-homing Load Balancers BroadLink Overview Outbound Traffic Inbound Traffic Bandwidth Management Persistent Routing High Availability BroadLink Overview 1. What is BroadLink? BroadLink
More informationCommon Approaches to Real-Time Scheduling
Common Approaches to Real-Time Scheduling Clock-driven time-driven schedulers Priority-driven schedulers Examples of priority driven schedulers Effective timing constraints The Earliest-Deadline-First
More informationManaging Stop-Go Capital Flows in EM Asia: So Far, So Good
Managing Stop-Go Capital Flows in EM Asia: So Far, So Good Andrew Filardo Bank for International Settlements Prepared for the Central Banks of Finland and Austria Joint Conference on European Economic
More informationProcess Scheduling CS 241. February 24, 2012. Copyright University of Illinois CS 241 Staff
Process Scheduling CS 241 February 24, 2012 Copyright University of Illinois CS 241 Staff 1 Announcements Mid-semester feedback survey (linked off web page) MP4 due Friday (not Tuesday) Midterm Next Tuesday,
More informationCommunication Cost in Big Data Processing
Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1 Queries on Big Data Big Data processing on distributed
More informationIMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING
ABSTRACT: IMPROVEMENT OF DIGITAL IMAGE RESOLUTION BY OVERSAMPLING Hakan Wiman Department of Photogrammetry, Royal Institute of Technology S - 100 44 Stockholm, Sweden (e-mail hakanw@fmi.kth.se) ISPRS Commission
More informationBreaking The Code. Ryan Lowe. Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and
Breaking The Code Ryan Lowe Ryan Lowe is currently a Ball State senior with a double major in Computer Science and Mathematics and a minor in Applied Physics. As a sophomore, he took an independent study
More informationLecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses
Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline
More informationContributions to Gang Scheduling
CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,
More informationUsing a Digital Recorder with Dragon NaturallySpeaking
Using a Digital Recorder with Dragon NaturallySpeaking For those desiring to record dictation on the go and later have it transcribed by Dragon, the use of a portable digital dictating device is a perfect
More informationApplying Fixed Route Principles To Improve Paratransit Runcutting. Keith Forstall
Applying Fixed Route Principles To Improve Paratransit Runcutting Keith Forstall Why is runcutting important? Scheduling algorithms are designed to schedule trips efficiently They depend on vehicle capacity
More informationShanghai R&D Vacancies August 2014 PV, PE, Intern
RD Shanghai R&D Vacancies August 2014 PV, PE, Intern 1. Lead Software Engineer- Routing (Req#: 9528) Responsible for development and maintenance of signal routing in EDI platform (NanoRoute). Implementation
More informationScheduling Parallel Jobs with Monotone Speedup 1
Scheduling Parallel Jobs with Monotone Speedup 1 Alexander Grigoriev, Marc Uetz Maastricht University, Quantitative Economics, P.O.Box 616, 6200 MD Maastricht, The Netherlands, {a.grigoriev@ke.unimaas.nl,
More informationKerrighed: use cases. Cyril Brulebois. Kerrighed. Kerlabs
Kerrighed: use cases Cyril Brulebois cyril.brulebois@kerlabs.com Kerrighed http://www.kerrighed.org/ Kerlabs http://www.kerlabs.com/ 1 / 23 Introducing Kerrighed What s Kerrighed? Single-System Image (SSI)
More informationPortfolio Replication Variable Annuity Case Study. Curt Burmeister Senior Director Algorithmics
Portfolio Replication Variable Annuity Case Study Curt Burmeister Senior Director Algorithmics What is Portfolio Replication? To find a portfolio of assets whose value is equal to the value of a liability
More informationGuided Performance Analysis with the NVIDIA Visual Profiler
Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided
More informationOnline Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H. Yarmand and Douglas G.
Online Supplement for Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers by Mohammad H Yarmand and Douglas G Down Appendix A Lemma 1 - the remaining cases In this appendix,
More informationCPU Scheduling Outline
CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different
More informationP900 SERIES PORTABLE HYDROSTATIC TESTER
P900 SERIES PORTABLE HYDROSTATIC TESTER MODELMODEL P900-SM ( STANDARD MODEL) SHOWN IN PHOTO coated square tubular frame on wheels for easy transport. PANEL MOUNTED CONTROLS: 4 dial gauges, pump regulator
More informationLuby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
More informationLecture 13: The Knapsack Problem
Lecture 13: The Knapsack Problem Outline of this Lecture Introduction of the 0-1 Knapsack Problem. A dynamic programming solution to this problem. 1 0-1 Knapsack Problem Informal Description: We have computed
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More informationWelcome to the course Algorithm Design
HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Welcome to the course Algorithm Design Summer Term 2011 Friedhelm Meyer auf der Heide Lecture 6, 20.5.2011 Friedhelm Meyer auf
More information120 Inch Telescope Dome Crane Load Testing
State Required Crane Inspection Annual Crane Inspection An annual inspection of our crane is required by law as indicated below. Division 1. Department of Industrial Relations Chapter 4. Division of Industrial
More information70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5
70-563 (VB) - Pro: Designing and Developing Windows Applications Using the Microsoft.NET Framework 3.5 Course Introduction Course Introduction Chapter 01 - Windows Forms and Controls Windows Forms Demo
More informationDecentralized Utility-based Sensor Network Design
Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu
More informationOpenFlow Based Load Balancing
OpenFlow Based Load Balancing Hardeep Uppal and Dane Brandon University of Washington CSE561: Networking Project Report Abstract: In today s high-traffic internet, it is often desirable to have multiple
More informationChapter 8: Bags and Sets
Chapter 8: Bags and Sets In the stack and the queue abstractions, the order that elements are placed into the container is important, because the order elements are removed is related to the order in which
More informationONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S
RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email info.social@ecpd.co.za ONLINE CPD FOR SOCIAL SERVICES BUYING YOUR COURSE/S Page 1 RSA Tel 086 1000 381 International Tel +27 21 975 2602 Email
More informationMain TVM functions of a BAII Plus Financial Calculator
Main TVM functions of a BAII Plus Financial Calculator The BAII Plus calculator can be used to perform calculations for problems involving compound interest and different types of annuities. (Note: there
More informationCertificate IV in Human Resources. Name Other. Tutorial Support if required - Bundaberg Information Version Control
Location Start Date: Start Date: Start Date: Start Date: Thursday, 14 July 2016 Start Date: End Date: End Date: End Date: End Date: Thursday, 15 September 2016 End Date: Start Time: Start Time: Start Time:
More informationCPU Scheduling. Basic Concepts. Basic Concepts (2) Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems
Basic Concepts Scheduling Criteria Scheduling Algorithms Batch systems Interactive systems Based on original slides by Silberschatz, Galvin and Gagne 1 Basic Concepts CPU I/O Burst Cycle Process execution
More informationLOAD BALANCING AND ADMISSION CONTROL OF A PARLAY X APPLICATION SERVER
This is an author produced version of a paper presented at the 17th Nordic Teletraffic Seminar (NTS 17), Fornebu, Norway, 25-27 August, 2004. This paper may not include the final publisher proof-corrections
More information2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput
Import Settings: Base Settings: Brownstone Default Highest Answer Letter: D Multiple Keywords in Same Paragraph: No Chapter: Chapter 5 Multiple Choice 1. Which of the following is true of cooperative scheduling?
More informationwww.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING
www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging
More information341 - Bioinformatics Android Coursework
341 - Bioinformatics Android Coursework 1 Important This coursework must be submitted electronically via CATE. This coursework is intended for groups of 4. Each group must contain at least one Computing
More informationFinding the Measure of Segments Examples
Finding the Measure of Segments Examples 1. In geometry, the distance between two points is used to define the measure of a segment. Segments can be defined by using the idea of betweenness. In the figure
More informationBusiness Life Path - Red Hat, CFS roadmap
The Kernel Report Vision 2007 edition Jonathan Corbet LWN.net corbet@lwn.net The Plan 1) A very brief history overview 2) The development process 3) Guesses about the future History 1 An extremely rushed
More informationMinimizing the Number of Machines in a Unit-Time Scheduling Problem
Minimizing the Number of Machines in a Unit-Time Scheduling Problem Svetlana A. Kravchenko 1 United Institute of Informatics Problems, Surganova St. 6, 220012 Minsk, Belarus kravch@newman.bas-net.by Frank
More informationReal-Time Scheduling 1 / 39
Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A
More informationEnglish Schools' Athletic Association Track & Field Championships 2016 Timetable
Some hurdles events will be held on the back straight, denoted (BS). Friday Track Event T1 10:00 hrs. 1500 metres JUNIOR BOYS 1st Round 2 Heats, Event T66 12:51 hrs. Saturday T2 10:12 hrs. 1500 metres
More informationThe Management of Logistics in Large Scale Inventory Systems to Support Weapon System Maintenance
The Management of Logistics in Large Scale Inventory Systems to Support Weapon System Maintenance Eugene A. Beardslee, SAIC and Dr. Hank Grant Department of Industrial Engineering, University of Oklahoma
More information- 221 - - 222 - - 223 - - 224 - - 225 - - 226 - - 227 - - 228 - - 229 - - 230 - - 231 - - 232 - - 233 - - 234 - - 235 - - 236 - - 237 - - 238 - - 239 - - 240 - - 241 - - 242 - - 243 - - 244 - - 245 - -
More informationCIEL A universal execution engine for distributed data-flow computing
Reviewing: CIEL A universal execution engine for distributed data-flow computing Presented by Niko Stahl for R202 Outline 1. Motivation 2. Goals 3. Design 4. Fault Tolerance 5. Performance 6. Related Work
More informationInternal Audit Report Credit Cards (C4/69, C4/70)
INFORMATION REPORT Audit Committee 20 October 2011 Governance and Compliance Internal Audit Report Credit Cards (C4/69, C4/70) As part of the 2011 Internal Audit Plan an audit was undertaken on the Credit
More information1 Formulating The Low Degree Testing Problem
6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,
More informationAn Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,
More informationA Scalable VISC Processor Platform for Modern Client and Cloud Workloads
A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background
More informationAccelerating File Transfers Increase File Transfer Speeds in Poorly-Performing Networks
Accelerating File Transfers Increase File Transfer Speeds in Poorly-Performing Networks Contents Introduction... 2 Common File Delivery Methods... 2 Understanding FTP... 3 Latency and its effect on FTP...
More informationOperatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings
Operatin g Systems: Internals and Design Principle s Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Bear in mind,
More information